2011-12-16

SICP Exercise 2.60: Duplicate Sets

We specified that a set would be represented as a list with no duplicates. Now suppose we allow duplicates. For instance, the set {1,2,3} could be represented as the list (2 3 2 1 3 2 2). Design procedures element-of-set?, adjoin-set, union-set, and intersection-set that operate on this representation. How does the efficiency of each compare with the corresponding procedure for the non-duplicate representation? Are there applications for which you would use this representation in preference to the non-duplicate one?

We're still using a list as our representation here. We've just removed the restriction on duplicates. This means that those set operations which create a new set by taking an existing set and potentially adding new items to it no longer need to check if those items are present before adding them. This means adjoin-set and union-set no longer need to perform those checks. We'll look at those set operations in detail in a minute, but let's first look at the other two set operations: element-of-set and intersection-set.

The implementation of element-of-set given in the book simply iterates through the list representation of the set and returns true as soon as it encounters a matching element, or false if there is no such matching element. Allowing duplicates in the list representation doesn't change this approach. We still need to scan through the list to see if the element is present. It's just that, where duplicates exist, we may compare against the same value multiple times. So element-of-set can be used as is - it may run slower though, as the presence of duplicates means that the lists we're scanning may be longer than their equivalents when duplicates are not allowed.

We can see that the same holds true for intersection-set. The implementation in the book iterates through the list representation of set1 and, for each element, adds it to the result set iff it's also present in set2. If there are duplicates in either set1 or set2 this doesn't really change this approach. We still need to examine each item in set1 and only include it in the result set if it's also in set2, so we can reuse the existing implementation of intersection-set. Note that the result set could contain duplicates: any element that appears in both sets will appear as many times as it is present in set1. As with element-of-set this means that it may also run slower and may generate larger sets in comparison with using equivalent sets where duplicates are not allowed.

Now onto adjoin-set... In the "no duplicates" implementation provided in the book, adjoin-set checks to see whether the element is already in the representation before appending it. As we're allowed duplicates in our representation we don't need this test - we can just cons it onto the head. This gives us a very simple implementation:
(define (adjoin-set x set)
  (cons x set))
...or, to put it more succinctly...
(define adjoin-set cons)
As this simply puts the item on the head of the list this makes it an Θ(1) operation, and so is much more efficient than the "no duplicates" implementation (which is Θ(n)). Note that if we wanted to be slightly smart about it, still allow duplicates generally, and still retain the Θ(1) efficiency, we could simply check the head of the list to ensure that we're not putting another identical element onto the head of the list:
(define (adjoin-set x set)
  (if (or (null? set) (not (equal? x (car set))))
      (cons x set)
      set))
However, given that the example representation of the set {1,2,3} ((2 3 2 1 3 2 2)) includes two '2' values adjacent to each other, this isn't a necessary optimization.

The last operation, union-set needs to produce the set of all values in both set1 and set2. As we no longer need to worry about duplicates we can simply append the two sets together to produce the result we need:
(define (union-set set1 set2)
  (append set1 set2))
...or, more succinctly...
(define union-set append)
Okay, so let's build some sets:
> (define evens
    (adjoin-set 0 (adjoin-set 2 (adjoin-set 4 (adjoin-set 6 (adjoin-set 8 '()))))))
> (define odds
    (adjoin-set 1 (adjoin-set 3 (adjoin-set 5 (adjoin-set 7 (adjoin-set 9 '()))))))
> evens
'(0 2 4 6 8)
> odds
'(1 3 5 7 9)
> (adjoin-set 2 evens)
'(2 0 2 4 6 8)
> (adjoin-set 2 odds)
'(2 1 3 5 7 9)
> (intersection-set evens odds)
'()
> (intersection-set evens evens)
'(0 2 4 6 8)
> (union-set evens odds)
'(0 2 4 6 8 1 3 5 7 9)
> (union-set evens evens)
'(0 2 4 6 8 0 2 4 6 8)
Let's compare efficiencies of implementations:

OperationNo DuplicatesAllow Duplicates
element-of-set?Θ(n)Θ(n)
adjoin-setΘ(n)Θ(1)
intersection-setΘ(n2)Θ(n2)
union-setΘ(n2)Θ(n)

Of course we need to be slightly careful in the comparison here... While element-of-set? is Θ(n) regardless of whether or not we allow duplicates, the n here is the number of elements in the list representation of the set, not the number of distinct elements in the set. As a result, n could be much larger in the "allow duplicates" case than in the "no duplicates" case, and so the operation could be much slower. A similar issue arises for intersection-set too (except we're dealing with Θ(n2) here, so the effect can be much more exacerbated).

We also need to be aware of this issue when comparing the two union-set efficiencies. While the "allow duplicates" case is definitely more efficient (at Θ(n) as opposed to the no duplicates efficiency of Θ(n2)), the value of n in the "allow duplicates" case could potentially be much higher. For example, consider the set {1, 2, 3}. In the "no duplicates" case the size of the underlying list we have to process (and so the n we are dealing with) will always be 3. However, in the "allow duplicates" case all we know for sure is that it will be at least 3 - but (theoretically) there's no upper bound on the size of the underlying list we have to process, so with certain list representations we may find that the Θ(n) operation in the "allow duplicates" case may still perform worse than the Θ(n2) operation in the "no duplicates case.

With adjoin-set there is no such issue. The "allow duplicates" case will take constant time regardless of the size of the underlying representation (or at least we're assuming that this is how cons works). As a result it doesn't matter that there may be duplicates in the set - this has no effect on the efficiency of the operation, and so the "allow duplicates" case will generally be quicker than the "no duplicates" case.

So when would we use this representation in preference to the non-duplicate one? Well, in applications where we're going to use adjoin-set much more frequently than any of the other operations (and where memory is not a concern) it may be preferable to use the implementation that allows duplicates.

SICP Exercise 2.59: Unioning Unordered Sets

Implement the union-set operation for the unordered-list representation of sets.

The implementation of union-set is similar in many respects to intersection-set. We can use a recursive strategy and, if we know how to form the union of set2 and the cdr of set1 then we only need to decide whether to include the car of set1 in this, which depends upon whether (car set1) is also in set2. However, the logic gets flipped around.

To begin with, the union of any set S with the empty set is the set S (as opposed to the intersection of any set with the empty set, which is the empty set). We'll need two clauses in our cond to cope with this: if set1 is null then the union of the two sets is set2, while if set2 is null then the union of the two sets is set1.

Next, while the decision as to whether to include the car of set1 in the result depends upon whether (car set1) is also in set2, the decision is inverted. For intersections we include (car set1) in the result if it's also in set2. For unions we reverse this and only include (car set1) if it's not in set2. Why do we invert it? We need to invert this logic as our base case is when set1 is null, in which case we use set2 as the result. We're therefore adding elements of set1 onto set2 to produce the union of the two sets and so, in order to ensure there are no duplicates, we need to exclude any elements of set1 that are already in set2.

Here's the procedure:
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((element-of-set? (car set1) set2)
         (union-set (cdr set1) set2))
        (else (cons (car set1)
                    (union-set (cdr set1) set2)))))
...and here it is in action:
(union-set '() (list 1 2 3 4 5))
'(1 2 3 4 5)
(union-set (list 1 2 3 4 5) '())
'(1 2 3 4 5)
(union-set (list 1 2 4 6 7) (list 2 3 4 5 8))
'(1 6 7 2 3 4 5 8)

2011-12-06

SICP Exercise 2.58: Infix Operators... Part (b)... The Hard Version

Suppose we want to modify the differentiation program so that it works with ordinary mathematical notation, in which + and * are infix rather than prefix operators. Since the differentiation program is defined in terms of abstract data, we can modify it to work with different representations of expressions solely by changing the predicates, selectors, and constructors that define the representation of the algebraic expressions on which the differentiator is to operate.
  1. Show how to do this in order to differentiate algebraic expressions presented in infix form, such as (x + (3 * (x + (y + 2)))). To simplify the task, assume that + and * always take two arguments and that expressions are fully parenthesized.
  2. The problem becomes substantially harder if we allow standard algebraic notation, such as (x + 3 * (x + y + 2)), which drops unnecessary parentheses and assumes that multiplication is done before addition. Can you design appropriate predicates, selectors, and constructors for this notation such that our derivative program still works?
In the previous post we dealt with part (a) of this exercise which allowed us to ignore the associativity of the operations and just handle infix expressions with two operands. In this post we not only need to deal with variable numbers of operands (which we dealt with for prefix notation in exercise 2.57)... The example above contains a mix of operators at the same level within the expression. This means we'll also need to deal with operator precedence.

Dealing With Precedence
At the moment our program only deals with three different kinds of operations: exponentiation, multiplication and addition. The normal order of precedence for these operations is the order in which they've just been listed. I.e. exponentiation binds more tightly than multiplication, which binds more tightly than addition. Of course we've also got parenthesized sub-expressions to consider - but, as we're representing our expressions as Scheme lists this doesn't actually cause us any additional precedence headaches. Provided we simply treat the sub-expression (which is a list) as an operand of the operator to which it is bound, and then process the sub-expression itself as if it were its own expression, all will be fine.

So how can we deal with standard algebraic notation and still have deriv work then? Well, one way is to treat the expression as if it were appropriately parenthesized according to the precedence rules, binding the operands for the highest precedence operators first. For example, we can treat the expression (a + b * c ** d * e + f ** g) as if it were parenthesized as (a + (b * (c ** d) * e) + (f ** g)). I.e. we should treat this expression as a sum with three operands - the symbol a, a sub-expression that's the product b * (c ** d) * e, and the exponentiation f ** g. We should then also be able to apply the reduction process we built in exercise 2.57 (or a variant of it) to this to simplify expressions as much as is possible (or at least as much as is possible using reduce!).

In order to do this we first need a way to identify what the lowest precedence operator in an expression is (ignoring sub-expressions). This effectively provides us with our groupings. For example, the lowest precedence operator in (a + b * c ** d * e + f ** g) is +. If we then divide the expression up into sub-expressions around the occurrences of + we get the expression (a + (b * c ** d * e) + (f ** g)). We can then repeat this for the sub-expressions we've created: the lowest precedence operation in (b * c ** d * e) is *, so that allows us to divide that sub-expression up into (b * (c ** d) * e), while the lowest precedence operation in (f ** g) is **, so there's no further division to do there.

Let's start by defining our operators in a list, ordered by precedence:
(define operators '(+ * **))
We can then get the precedence of an operator by getting its index in the list. First, let's define a general procedure for getting the index of an element in a list. This can simply iterate through the list, accumulating a count until it finds a matching element. I'll use zero-based indexes, and have my procedure return -1 if the element isn't present:
(define (get-index element alist)
  (define (find-element-index index remaining)
    (cond ((null? remaining) -1)
          ((eq? element (car remaining)) index)
          (else (find-element-index (+ index 1) (cdr remaining)))))
  (find-element-index 0 alist))
Here it is in action:
> (get-index '+ operators)
0
> (get-index '** operators)
2
> (get-index 'wibble operators)
-1
We can then wrap get-index in a procedure that gets the precedence of a given operator, or reports an error if it's not a valid operator:
(define (precedence op)
  (let ((index (get-index op operators)))
    (if (>= index 0)
        index
        (error "Unknown operation -- PRECEDENCE" op))))
So let's test that with the same symbols:
> (precedence '+)
0
> (precedence '**)
2
> (precedence 'wibble)
. . Unknown operation -- PRECEDENCE wibble
Of course it's worth noting that, were we dealing with a larger or different set of operations then this simplistic approach to operator precedence would not necessarily work. This is due to some operations, such as subtraction or division, sharing the same precedence as other operations: subtraction has the same precedence as addition while division has the same precedence as multiplication. We could solve this by redefining operators as a list of lists such as '((+ -) (* /) (**)), and then find the index of the list that the operator is a member of, rather than just finding the index of the matching element. However, to keep everything simple, both here and in the rest of the exercise, I'm going to stick to our limited set of three operations!

Determining an Expression's Operation Type
As noted above, we want to treat expressions as if they were appropriately parenthesized according to the precedence rules, and to do this we want to identify what the lowest precedence operator in an expression is. We've produced precedence, which will tell us what the precedence of an operator is. All we need to do now is iterate through the elements of an expression, find all the operators in it and figure out which has the lowest precedence.

First, let's build a helper procedure to figure out whether or not a value is an operator. We can do this simply by asking for its index in the list of operators and checking that it's non-negative:
(define (operator? op)
  (>= (get-index op operators) 0))
We can then use accumulate (introduced in the Sequence Operations sub-section in section 2.2.3) to process the expression. For each element of the expression this needs to check whether or not it's an operator and, if it is, check its precedence against that of the lowest precedence operator found so far. If it's got a lower precedence then we change the accumulated value to this new operator, otherwise we keep the current operator.
(define (get-operation expression)
  (accumulate
    (lambda (x y)
      (if (and (operator? x)
               (or (null? y) (< (precedence x) (precedence y))))
          x
          y))
    '()
    expression))
You'll note that I've used '() as the initial value. I'm not overly happy with this, and it probably betrays my C/C++/Java background somewhat... I could have made the assumption that every expression has a valid operator in it and used the highest precedence operator (i.e. **) as the initial value. If I'd done this then I could omit the null? check here. However, it feels more correct to me to start from a position of not making such an assumption and not knowing what the operator is. To do this we need a value which is easily testable and is guaranteed not to be an operator. I really want to use a null value here (like I say, C/C++/Java background), but as Scheme doesn't, as far as I'm aware, support these, I'm using the empty list instead...

Anyway, here it is in action:
> (get-operation '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'+
> (get-operation '(1 * 3))
'*
> (get-operation '(1 ** 3 * 2))
'*
> (get-operation '(1 + x * 3))
'+
> (get-operation '(1 ** y))
'**
We can then redefine sum?, product? and exponentiation in terms of get-operation, as all they need to do is check whether or not the operator returned by get-operation is the correct one for the type in question. As all three procedures perform the same test, differing only in the operation symbol, I extracted a helper procedure, is-operation? and had all three procedures delegate to this:
(define (is-operation? op expression)
  (and (pair? expression) (eq? (get-operation expression) op)))

(define (sum? x)
  (is-operation? '+ x))

(define (product? x)
  (is-operation? '* x))

(define (exponentiation? x)
  (is-operation? '** x))
Using the same expressions as before we can put this to the test:
> (sum? '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
#t
> (sum? '(1 * 3))
#f
> (sum? '(1 ** 3 * 2))
#f
> (sum? '(1 + x * 3))
#t
> (sum? '(1 ** y))
#f
> (product? '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
#f
> (product? '(1 * 3))
#t
> (product? '(1 ** 3 * 2))
#t
> (product? '(1 + x * 3))
#f
> (product? '(1 ** y))
#f
> (exponentiation? '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
#f
> (exponentiation? '(1 * 3))
#f
> (exponentiation? '(1 ** 3 * 2))
#f
> (exponentiation? '(1 + x * 3))
#f
> (exponentiation? '(1 ** y))
#t
Selectors
In exercise 2.57 we are told that we can cope with arbitrary numbers of terms by having the selector for the left-hand side of an operator return the first term and the selector for the right hand side of an operator returning the rest of the terms: "For example, the addend of a sum would be the first term, and the augend would be the sum of the rest of the terms."

To do this with infix expressions we need to identify the first occurrence of the operator in a given expression and return the sub-list either preceding or following this element as appropriate. We can then use these to iterate through an expression with a known lowest precedence operator and divide it up into sub-expressions. Of course we'll need to cope with the situation that will then occur when we get to the last sub-expression - there will be no occurrence of the lowest precedence operator left. In this case the sub-list preceding the "first occurrence" of the operator should be the last sub-expression, while the sub-list following the "first occurrence" of the operator should be the empty list. This is consistent with exercise 2.57 and allows us to easily test when we've processed the last sub-expression (i.e. we test whether or not the sub-list following the first occurrence of the operator is null).

Now Scheme provides a procedure, list-tail, that returns the tail of a list from a particular index onwards... and we've already built a procedure, get-index, that returns the index of an element in a list. We can plug these together to produce a procedure that will give us what we need:
(define (get-list-expression-after-op op expression)
  (let ((index (get-index op expression)))
    (if (< index 0)
        '()
        (list-tail expression (+ index 1)))))
Note that we need to put in an explicit check of the index as, under DrRacket at least, calling list-tail with a negative number returns the entire list, whereas we want it to return the empty list in such a case.

Also note that this procedure always returns a list. However, to keep our selectors for the right-hand sides of operators consistent with the preceding exercises, if there's only a single element in the list we need it to return the element itself, not the list. While we could modify the procedure appropriately, I'm actually going to produce a wrapper procedure that gives us what we need (as we'll need the original procedure when we come on to modifying reduce):
(define (get-expression-after-op op expression)
  (let ((tail (get-list-expression-after-op op expression)))
    (if (and (pair? tail) (null? (cdr tail)))
        (car tail)
        tail)))
Unfortunately, scheme doesn't appear to provide a complementary procedure to list-tail for getting the head of a list. However, it's not that difficult to build up a procedure that gets the first n items of a list. As before we'll have two procedures here: one to get the list preceding the first occurrence of an operation (or the whole list if it doesn't appear), and one to get that list and, if it only contains one element, return the element instead of the list:
(define (get-list-expression-before-op op expression)
  (define (build-expression l)
    (if (or (null? l) (eq? (car l) op))
        '()
        (cons (car l) (build-expression (cdr l)))))
  (build-expression expression))

(define (get-expression-before-op op expression)
  (let ((head (get-list-expression-before-op op expression)))
    (if (null? (cdr head))
        (car head)
        head)))
Let's put these to the test:
> (get-expression-before-op '+ '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(1 * 3 * 5 ** 7)
> (get-expression-after-op '+ '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(3 ** 4 + 4)
> (get-expression-before-op '* '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
1
> (get-expression-after-op '* '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(3 * 5 ** 7 + 3 ** 4 + 4)
> (get-expression-before-op '** '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(1 * 3 * 5)
> (get-expression-after-op '** '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(7 + 3 ** 4 + 4)
> (get-expression-before-op '+ '(1 * 3 * 5 ** 7 + 3))
'(1 * 3 * 5 ** 7)
> (get-expression-after-op '+ '(1 * 3 * 5 ** 7 + 3))
3
> (get-expression-before-op '- '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(1 * 3 * 5 ** 7 + 3 ** 4 + 4)
> (get-expression-after-op '- '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'()
It's worth noting the last case here - if the operator is not present in the expression then the whole expression is "before" it, and nothing is "after" it.

Okay, now we can easily build our selectors using these:
(define (addend s) (get-expression-before-op '+ s))
(define (augend s) (get-expression-after-op '+ s))

(define (multiplier s) (get-expression-before-op '* s))
(define (multiplicand s) (get-expression-after-op '* s))

(define (base s) (get-expression-before-op '** s))
(define (exponent s) (get-expression-after-op '** s))
And here they are in action:
> (sum? '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
#t
> (addend '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(1 * 3 * 5 ** 7)
> (augend '(1 * 3 * 5 ** 7 + 3 ** 4 + 4))
'(3 ** 4 + 4)
> (addend '(3 ** 4 + 4))
'(3 ** 4)
> (augend '(3 ** 4 + 4))
4
> (product? '(1 ** 3 * 2))
#t
> (multiplier '(1 ** 3 * 2))
'(1 ** 3)
> (multiplicand '(1 ** 3 * 2))
2
> (exponentiation? '(1 ** y))
#t
> (base '(1 ** y))
1
> (exponent '(1 ** y))
'y
Reduce, Reuse, Recycle
In exercise 2.57 we produced the procedure reduce that would reduce an expression to a simpler form. We should be able to reuse it here, with some adjustments, to perform the same reductions for infix expressions. We can reuse the main body of the procedure and count-accumulate directly, as these deal with expressions as atomic units - they don't deal with any of the internals of the representation. However, both generate and count-accumulate need a few changes. Let's start with generate.

We still want generate to check whether or not we have any expression counts and just return the accumulation of the numeric values if we don't. However, we're trying to generate a different expression now. I had a think about it and decided that a better way to approach the production of expressions was to:
  1. No longer delegate the insertion of operators to passed in procedures - they're all using infix notation, so rather than repeat the same pattern of insertion in each constructor we can simply have the constructor pass in the appropriate symbols for the two operations we need to know about: the normal operation to insert between each sub-expression and the operation to use when promoting a sub-expression that occurs multiple times. I've assumed these are in scope as operation and promoted-op respectively.
  2. Include the accumulation into the sub-expression counts. If we append it to the end of the list of counts as having occurred once then we can process all of the sub-expressions and the accumulation at once.
  3. Map the counts across into a list of sub-expressions (where they occurred once) and promoted sub-expressions (where they occurred multiple times). I.e. we can use map to examine each {sub-expression, count} tuple and, if the count is 1, map it to the sub-expression, otherwise promote it.
  4. Iterate through the mapped list and insert the operation's symbol between each mapped sub-expression. To do this we can use fold-left from exercise 2.38, start with an initial value of a list containing the first mapped sub-expression only and then, for each subsequent sub-expression, append the operator and the sub-expression to this list.
Here's my implementation:
(define (generate counts accumulation)
    (if (null? counts)
        accumulation
        (let ((full-reduction (if (= accumulation identity-element)
                                  counts
                                  (append counts (list (list accumulation 1))))))
          (let ((promoted-reduction (map (lambda (x) (if (= (cadr x) 1)
                                                         (car x)
                                                         (list (car x)
                                                               promoted-op
                                                               (cadr x))))
                                         full-reduction)))
            (if (null? (cdr promoted-reduction))
                (car promoted-reduction)
                (fold-left (lambda (x y) (append x (list operation y)))
                           (list (car promoted-reduction))
                           (cdr promoted-reduction)))))))
Now let's move on to process... First note that we've managed to simplify the null case. The updated generate now copes with the case when the accumulation equals the identity element, so we can simplify process's handling so that it just calls generate. Numbers can just be dealt with as before, as can the case when the head of the expression is not a sub-expression of the operation type we're reducing for. The only remaining case is when the head of the expression is a sub-expression of the operation type we're reducing for. In this case we can use is-operation? as the test - after all we've made the assumption that the operation we're processing for is in scope as operation, so there's no need to have an explicit procedure to test this passed in! The append that takes place is slightly different though. We're no longer dealing with prefix operators, so we can't just lop the top off the sub-expression and append that in front of the tail of the expression. Instead we need to split the sub-expression around the operator, and append both of these sub-expressions, in order, in front of the tail of the expression.

As usual, this can be expressed more cleaning in code than in English:
(define (process expression counts accumulation)
  (cond ((null? expression) (generate counts accumulation))
        ((number? (car expression))
         (process (cdr expression)
                  counts
                  (accumulator (car expression) accumulation)))
        ((is-operation? operation (car expression))
         (process (append (get-list-expression-before-op operation (car expression))
                          (get-list-expression-after-op operation (car expression))
                          (cdr expression))
                  counts
                  accumulation))
        (else (process (cdr expression)
                       (count-accumulate (car expression) counts)
                       accumulation))))
Putting all this together gives us a new version of reduce as follows:
(define (reduce operation promoted-op accumulator identity-element expression)
  (define (generate counts accumulation)
    (if (null? counts)
        accumulation
        (let ((full-reduction (if (= accumulation identity-element)
                                  counts
                                  (append counts (list (list accumulation 1))))))
          (let ((promoted-reduction (map (lambda (x) (if (= (cadr x) 1)
                                                         (car x)
                                                         (list (car x)
                                                               promoted-op
                                                               (cadr x))))
                                         full-reduction)))
            (if (null? (cdr promoted-reduction))
                (car promoted-reduction)
                (fold-left (lambda (x y) (append x (list operation y)))
                           (list (car promoted-reduction))
                           (cdr promoted-reduction)))))))
  (define (count-accumulate expression counts)
    (cond ((null? counts) (list (list expression 1)))
          ((equal? expression (caar counts))
           (cons (list expression (+ (cadar counts) 1)) (cdr counts)))
          (else (cons (car counts) (count-accumulate expression (cdr counts))))))
  (define (process expression counts accumulation)
    (cond ((null? expression) (generate counts accumulation))
          ((number? (car expression))
           (process (cdr expression)
                    counts
                    (accumulator (car expression) accumulation)))
          ((is-operation? operation (car expression))
           (process (append (get-list-expression-before-op operation (car expression))
                            (get-list-expression-after-op operation (car expression))
                            (cdr expression))
                    counts
                    accumulation))
          (else (process (cdr expression)
                         (count-accumulate (car expression) counts)
                         accumulation))))
  (if (and (pair? expression) (not (null? (cdr expression))))
      (process expression '() identity-element)
      (error "invalid expression -- REDUCE" expression)))
It's then a simple matter to modify make-sum and make-product to use this:
(define (make-sum . a)
  (reduce '+ '* + 0 a))

(define (make-product . p)
  (let ((reduction (reduce '* '** * 1 p)))
    (if (and (pair? reduction)
             (number? (last reduction))
             (= (last reduction) 0))
        0
        reduction)))
And to complete the set we can modify make-exponentiation so that it too uses infix notation:
(define (make-exponentiation base exponent)
  (cond ((=number? base 1) 1)
        ((=number? exponent 0) 1)
        ((=number? exponent 1) base)
        ((and (number? base) (number? exponent)) (expt base exponent))
        (else (list base '** exponent))))
Let's make some expressions!
> (make-sum 'a 'b 'c 1 2 3 'c 'd 'e 4 5 6 'a 'd 'f)
'((a * 2) + b + (c * 2) + (d * 2) + e + f + 21)
> (make-product 'a 'b 'c 1 2 3 'c 'd 'e 4 5 6 'a 'd 'f)
'((a ** 2) * b * (c ** 2) * (d ** 2) * e * f * 720)
> (make-sum 1 2 3)
6
> (make-product 4 5 6)
120
> (make-sum 'a 'a 'a)
'(a * 3)
> (make-product (make-sum 'a 'b) (make-sum 'a 'b))
'((a + b) ** 2)
Derive Me Crazy!
And, without further ado (or any changes to deriv), here it is in action:
> (deriv (make-product 'x 'y (make-sum 'x 3)) 'x)
'((x * y) + (y * (x + 3)))
> (deriv (make-product (make-product 'x 'y) (make-sum 'x 3)) 'x)
'((x * y) + (y * (x + 3)))
> (deriv (make-exponentiation 'a (make-sum 'a 'b)) 'a)
'((a + b) * (a ** (a + b + -1)))