2011-12-22

SICP Exercise 2.69: Generating Huffman Trees

The following procedure takes as its argument a list of symbol-frequency pairs (where no symbol appears in more than one pair) and generates a Huffman encoding tree according to the Huffman algorithm.
(define (generate-huffman-tree pairs)
  (successive-merge (make-leaf-set pairs)))
Make-leaf-set is the procedure given above that transforms the list of pairs into an ordered set of leaves. Successive-merge is the procedure you must write, using make-code-tree to successively merge the smallest-weight elements of the set until there is only one element left, which is the desired Huffman tree. (This procedure is slightly tricky, but not really complicated. If you find yourself designing a complex procedure, then you are almost certainly doing something wrong. You can take significant advantage of the fact that we are using an ordered set representation.)

The procedure generate-huffman-tree calls successive-merge with the results of (make-leaf-set pairs). As noted above, this is an ordered set of leaves. To be more precise, it is a list of leaves, ordered by increasing frequency of occurrence. This means that, initially at least, we can find the smallest-weight elements of the set at the start of the set. If this were an invariant of the procedure then this would mean that we could always find the smallest-weight elements of the set at the start of the list. We can do this by defining successive-merge as a recursive procedure that, at each call, picks the first two nodes off the list, merges them, inserts the result of the merge into the appropriate place in the remainder of the list (to maintain the invariant) and then makes the recursive call with this result.

So provided we can perform two actions (merge a pair of elements and insert a node into the appropriate place in the list) then we will be able to do this... and thankfully the authors provide us with a couple of procedures that will let us do this. make-code-tree takes two nodes and produces a new node with an appropriate symbols list and weight and with those nodes as its children. adjoin-set takes a node and inserts it into the appropriate position in a list of nodes that are ordered by increasing weight.

All we need to do now is to provide a base case and we're done. Or, as we'll actually do, a couple of base cases. The first base case is when we have completed merging all of the nodes in a non-empty list of nodes. In this situation we have a list with one node in it, and we can simply return that node as the result. The second base case is an edge case - we need to deal with what happens when pairs is an empty list. In that case make-leaf-set also returns an empty list. To keep it consistent we'll do the same.

Okay, so we've got all the components we need, we know what we need them to do, so let's put them together:
(define (successive-merge set)
  (cond ((null? set) '())
        ((null? (cdr set)) (car set))
        (else (successive-merge
               (adjoin-set (make-code-tree (car set) (cadr set))
                           (cddr set))))))
To test this, let's use the frequencies that were used to generate sample-tree for exercise 2.67, and compare the result against sample-tree:
> (generate-huffman-tree '((A 4) (B 2) (C 1) (D 1)))
'((leaf A 4) ((leaf B 2) ((leaf D 1) (leaf C 1) (D C) 2) (B D C) 4) (A B D C) 8)
> sample-tree
'((leaf A 4) ((leaf B 2) ((leaf D 1) (leaf C 1) (D C) 2) (B D C) 4) (A B D C) 8)

No comments:

Post a Comment