2011-12-23

SICP Exercise 2.71: Huffman With Power-of-2 Frequencies

Suppose we have a Huffman tree for an alphabet of n symbols, and that the relative frequencies of the symbols are 1, 2, 4, ..., 2n-1. Sketch the tree for n=5; for n=10. In such a tree (for general n) how many bits are required to encode the most frequent symbol? the least frequent symbol?

Before we sketch the trees out, let's check what the interpreter gives us:
> (generate-huffman-tree '((A 1) (B 2) (C 4) (D 8) (E 16)))
'(((((leaf A 1) (leaf B 2) (A B) 3)
    (leaf C 4)
    (A B C)
    7)
   (leaf D 8)
   (A B C D)
   15)
  (leaf E 16)
  (A B C D E)
  31)
> (generate-huffman-tree '((A 1) (B 2) (C 4) (D 8) (E 16)
                           (F 32) (G 64) (H 128) (I 256) (J 512)))
'((((((((((leaf A 1) (leaf B 2) (A B) 3)
         (leaf C 4)
         (A B C)
         7)
        (leaf D 8)
        (A B C D)
        15)
       (leaf E 16)
       (A B C D E)
       31)
      (leaf F 32)
      (A B C D E F)
      63)
     (leaf G 64)
     (A B C D E F G)
     127)
    (leaf H 128)
    (A B C D E F G H)
    255)
   (leaf I 256)
   (A B C D E F G H I)
   511)
  (leaf J 512)
  (A B C D E F G H I J)
  1023)
You'll note that in the tree formed, each non-leaf node has a leaf node as its right-child. When you sketch them out you get the following for n=5:
...and the following for n=10:
You'll note that:
  • When n=5, the tree is 5 levels deep. When we use a Huffman tree for encoding we generate a bit for each edge we traverse in the tree (not for each node we visit), so the longest path from the root to a leaf node, which is for the least frequent symbol, will generate a 4-bit encoding.
  • When n=10, the tree is 10 levels deep, so the longest path from the root to a leaf node will generate a 9-bit encoding.
  • In both trees the shortest path from the root to a leaf node, which is for the most frequent symbol, traverses a single edge and so will generate a 1-bit encoding
In general we can therefore state that, for an alphabet of n symbols where relative frequencies of the symbols are 1, 2, 4, ..., 2n-1, a Huffman tree will encode the most frequent symbol using a 1-bit encoding, while the least frequent symbol will be encoded using n-1 bits.

No comments:

Post a Comment