2011-10-03

SICP Exercise 2.38: MapReduce

The accumulate procedure is also known as fold-right, because it combines the first element of the sequence with the result of combining all the elements to the right. There is also a fold-left, which is similar to fold-right, except that it combines elements working in the opposite direction:
(define (fold-left op initial sequence)
  (define (iter result rest)
    (if (null? rest)
        result
        (iter (op result (car rest))
              (cdr rest))))
  (iter initial sequence))
What are the values of
(fold-right / 1 (list 1 2 3))
(fold-left / 1 (list 1 2 3))
(fold-right list nil (list 1 2 3))
(fold-left list nil (list 1 2 3))
Give a property that op should satisfy to guarantee that fold-right and fold-left will produce the same values for any sequence.

What the authors omit to point out in this exercise is that fold-left is also known as reduce. Not particularly relevant, but reduce, in combination with map, inspired a pretty nifty distributed processing framework that you may have heard of.

Okay, let's spin up the interpreter:
> (fold-right / 1 (list 1 2 3))
3/2
> (fold-left / 1 (list 1 2 3))
1/6
> (fold-right list nil (list 1 2 3))
'(1 (2 (3 ())))
> (fold-left list nil (list 1 2 3))
'(((() 1) 2) 3)
So what property does op have to satisfy to guarantee that fold-right and fold-left will produce the same values for any sequence?

Well fold-left and fold-right process the elements in opposite directions. Both start with the initial value. I.e. fold-left starts by applying op to the initial value and the first element, takes the result of this and applies op to this value and the second element and so on. However, fold-right starts by applying op to the initial value and the last element, takes the result of this and applies op to this value and the second-last element and so on.

This means that op must return the same result regardless of which order the operators are applied to them. In other words, op must be commutative.

SICP Exercise 2.37: Matrix Manipulation

Suppose we represent vectors v = (vi) as sequences of numbers, and matrices m = (mij) as sequences of vectors (the rows of the matrix). For example, the matrix
⎡ 1 2 3 4 ⎤
⎢ 4 5 6 6 ⎥
⎣ 6 7 8 9 ⎦
is represented as the sequence ((1 2 3 4) (4 5 6 6) (6 7 8 9)). With this representation, we can use sequence operations to concisely express the basic matrix and vector operations. These operations (which are described in any book on matrix algebra) are the following:
(dot-product v w)       returns the sum Σiviwi
(matrix-*-vector m v)   returns the vector t, where ti = Σjmijvj
(matrix-*-matrix m n)   returns the matrix p, where pij = Σkmiknkj
(transpose m)           returns the matrix n, where nij = mji
We can define the dot product as
(define (dot-product v w)
  (accumulate + 0 (map * v w)))
Fill in the missing expressions in the following procedures for computing the other matrix operations. (The procedure accumulate-n is defined in exercise 2.36.)
(define (matrix-*-vector m v)
  (map <??> m))
(define (transpose mat)
  (accumulate-n <??> <??> mat))
(define (matrix-*-matrix m n)
  (let ((cols (transpose n)))
    (map <??> m)))
The formal definition of matrix-*-vector given above effectively indicates that the generated vector should contain the results of calculating the dot products of each of the rows in m and the vector v. Now as the matrix m is represented as a sequence of sequences, with each sub-sequence representing the corresponding row in the matrix, mapping across the matrix will perform an operation on each row of the sequence. So all we need to do is provide a procedure that calculates the dot product of the row and v and we have our implementation:
(define (matrix-*-vector m v)
  (map (lambda (x) (dot-product x v)) m))
The procedure transpose needs to transform the matrix so that the rows become columns and vice versa. In other words, we need to take the first item from each row in turn and put them together, in order, into a sequence, then we take the second item from each row in turn and put them into a sequence, and so on, and put these resulting sequences together to form our transformed matrix.

Now we know that the procedure accumulate-n applies accumulate to the sequence containing the first items from each sequence in a sequence of sequences, and then cons the result onto the sequence produced by applying accumulate-n to the sequence of sequences produced by taking the tails of each of the original sequences. It just so happens that the sequence passed to accumulate in this process is the sequence we want, so all we need to do is to pass an appropriate operator and base case to accumulate-n so that it effectively turns the application of accumulate into an identity function. In this case the operator cons and the base case nil will achieve the desired effect:
(define (transpose mat)
  (accumulate-n cons nil mat))
Finally onto matrix-*-matrix. Matrix multiplication takes an a×b matrix, m, and a b×c matrix, n, and produces a new a×c matrix, p, where the value pij is calculated as the dot product of the ith row of m and the jth column of n. Handily the template we're given for matrix-*-matrix transforms the matrix n so that its rows become columns and vice versa. This means that each sub-sequence of cols will have the same length as each sub-sequence of m, and we can generate each row of the resulting matrix by taking the corresponding sub-sequence from m and applying matrix-*-vector to the matrix cols and that sub-sequence of m:
(define (matrix-*-matrix m n)
  (let ((cols (transpose n)))
    (map (lambda (x) (matrix-*-vector cols x))
         m)))
Okay, let's see these matrix operations in action:
> (define v (list 1 2 3))
> (define m1 (list (list 1 2 3) (list 4 5 6)))
> (define m2 (list (list 1 2) (list 3 4) (list 5 6)))
> (matrix-*-vector m1 v)
'(14 32)
> (transpose m1)
'((1 4) (2 5) (3 6))
> (matrix-*-matrix m1 m2)
'((22 28) (49 64))

2011-10-02

SICP Exercise 2.36: n-way Accumulation

The procedure accumulate-n is similar to accumulate except that it takes as its third argument a sequence of sequences, which are all assumed to have the same number of elements. It applies the designated accumulation procedure to combine all the first elements of the sequences, all the second elements of the sequences, and so on, and returns a sequence of the results. For instance, if s is a sequence containing four sequences, ((1 2 3) (4 5 6) (7 8 9) (10 11 12)), then the value of (accumulate-n + 0 s) should be the sequence (22 26 30). Fill in the missing expressions in the following definition of accumulate-n:
(define (accumulate-n op init seqs)
  (if (null? (car seqs))
      nil
      (cons (accumulate op init <??>)
            (accumulate-n op init <??>))))
What we need to do here is to get the sequence containing all the first elements of the sequences in seqs and apply accumulate to those, and then cons that result onto the head of the list produced by applying accumulate-n to the sequence of sequences where each sub-sequence is the tail of the corresponding sequence in seqs. We're given most of this in the template procedure. It applies accumulate to some sequence and then cons it onto the sequence returned by applying accumulate-n onto a sequence of sequences. All we need to do is to:
  • Get the sequence containing all the first elements of the sequences in seqs, so we can pass this to accumulate.
  • Get the sequence of sequences where each sub-sequence is the tail of the corresponding sequence in seqs, so we can pass this to the recursive call to accumulate-n.
We can achieve both of these via map. We know that we can extract the first element of a sequence by using car, so mapping seqs with the operation car will give us the sequence containing the first element of each sequence in seqs. Similarly, we know that we can extract the tail of a sequence by using cdr, so mapping seqs with the operation cdr will give us the sequence of sequences containing the tails of each sequence in seqs.

So here's the completed procedure:
(define (accumulate-n op init seqs)
  (if (null? (car seqs))
      nil
      (cons (accumulate op init (map car seqs))
            (accumulate-n op init (map cdr seqs)))))
And here it is in action:
> (define s (list (list 1 2 3) (list 4 5 6) (list 7 8 9) (list 10 11 12)))
> (accumulate-n + 0 s)
(22 26 30)
> (accumulate-n * 1 s)
(280 880 1944)