2011-12-23

SICP Exercise 2.70: Yip a Yip Wah Boom!

The following eight-symbol alphabet with associated relative frequencies was designed to efficiently encode the lyrics of 1950s rock songs. (Note that the "symbols" of an "alphabet" need not be individual letters.)

A     2    NA   16
BOOM  1    SHA   3
GET   2    YIP   9
JOB   2    WAH   1

Use generate-huffman-tree (exercise 2.69) to generate a corresponding Huffman tree, and use encode (exercise 2.68) to encode the following message:

Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip yip yip yip yip yip
Sha boom

How many bits are required for the encoding? What is the smallest number of bits that would be needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?

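Before diving in, here's roughly the supporting code this solution leans on. The tree representation and ordered-set helpers are taken straight from section 2.3.4 of the book; encode-symbol and successive-merge are one way of filling in exercises 2.68 and 2.69, so your own versions may differ slightly.

; Huffman tree representation from SICP section 2.3.4.
(define (make-leaf symbol weight) (list 'leaf symbol weight))
(define (leaf? object) (eq? (car object) 'leaf))
(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))

(define (make-code-tree left right)
  (list left
        right
        (append (symbols left) (symbols right))
        (+ (weight left) (weight right))))
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))
(define (symbols tree)
  (if (leaf? tree) (list (symbol-leaf tree)) (caddr tree)))
(define (weight tree)
  (if (leaf? tree) (weight-leaf tree) (cadddr tree)))

; Ordered-set helpers, also from the book: keep nodes sorted by
; ascending weight.
(define (adjoin-set x set)
  (cond ((null? set) (list x))
        ((< (weight x) (weight (car set))) (cons x set))
        (else (cons (car set) (adjoin-set x (cdr set))))))
(define (make-leaf-set pairs)
  (if (null? pairs)
      '()
      (let ((pair (car pairs)))
        (adjoin-set (make-leaf (car pair) (cadr pair))
                    (make-leaf-set (cdr pairs))))))

; Exercise 2.68: encode walks the tree for each symbol, emitting 0
; for a left branch and 1 for a right branch.
(define (encode message tree)
  (if (null? message)
      '()
      (append (encode-symbol (car message) tree)
              (encode (cdr message) tree))))
(define (encode-symbol symbol tree)
  (cond ((leaf? tree) '())
        ((memq symbol (symbols (left-branch tree)))
         (cons 0 (encode-symbol symbol (left-branch tree))))
        ((memq symbol (symbols (right-branch tree)))
         (cons 1 (encode-symbol symbol (right-branch tree))))
        (else (error "symbol not in tree:" symbol))))

; Exercise 2.69: repeatedly merge the two lightest nodes until a
; single tree remains.
(define (generate-huffman-tree pairs)
  (successive-merge (make-leaf-set pairs)))
(define (successive-merge leaf-set)
  (if (null? (cdr leaf-set))
      (car leaf-set)
      (successive-merge
       (adjoin-set (make-code-tree (car leaf-set) (cadr leaf-set))
                   (cdr leaf-set)))))
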
Okay, first let's build our Huffman tree:
> (define fifties-song-tree
    (generate-huffman-tree
      '((A 2) (BOOM 1) (GET 2) (JOB 2) (NA 16) (SHA 3) (YIP 9) (WAH 1))))
> fifties-song-tree
'((leaf NA 16)
  ((leaf YIP 9)
   (((leaf A 2) ((leaf WAH 1) (leaf BOOM 1) (WAH BOOM) 2) (A WAH BOOM) 4)
    ((leaf SHA 3) ((leaf JOB 2) (leaf GET 2) (JOB GET) 4) (SHA JOB GET) 7)
    (A WAH BOOM SHA JOB GET)
    11)
   (YIP A WAH BOOM SHA JOB GET)
   20)
  (NA YIP A WAH BOOM SHA JOB GET)
  36)
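Reading the codes off this tree (0 for a left branch, 1 for a right branch) gives:

NA    0
YIP   10
A     1100
SHA   1110
WAH   11010
BOOM  11011
JOB   11110
GET   11111

As you'd expect, the two most frequent symbols, NA and YIP, get the shortest codes.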
And now let's encode our song:
> (define encoded-song (encode '(GET A JOB
                                 SHA NA NA NA NA NA NA NA NA
                                 GET A JOB
                                 SHA NA NA NA NA NA NA NA NA
                                 WAH YIP YIP YIP YIP YIP YIP YIP YIP YIP 
                                 SHA BOOM)
                               fifties-song-tree))
> encoded-song
'(1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
  1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 0 1
  0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1)
> (length encoded-song)
84
So we need 84 bits to encode the song using our Huffman tree.
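That tallies with a quick count by hand: each "Get a job / Sha na na na na na na na na" verse takes 5 + 4 + 5 + 4 + 8 × 1 = 26 bits, the "Wah yip yip..." line takes 5 + 9 × 2 = 23 bits, and "Sha boom" takes 4 + 5 = 9 bits, for a total of 2 × 26 + 23 + 9 = 84 bits.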

How about if we use a fixed-length code? Well, we have eight symbols to encode, so a fixed-length code would need 3 bits per symbol (since 2³ = 8). As the message contains a total of 36 symbols, encoding it with a fixed-length code would take 36 × 3 = 108 bits.

So using Huffman encoding has saved us 24 bits - a 22% reduction in the size of the message.
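
As a final sanity check, the decode procedure from section 2.3.4 of the book should turn encoded-song back into the original list of symbols:

(define (choose-branch bit branch)
  (cond ((= bit 0) (left-branch branch))
        ((= bit 1) (right-branch branch))
        (else (error "bad bit -- CHOOSE-BRANCH" bit))))

(define (decode bits tree)
  (define (decode-1 bits current-branch)
    (if (null? bits)
        '()
        (let ((next-branch (choose-branch (car bits) current-branch)))
          (if (leaf? next-branch)
              ; At a leaf: emit its symbol and restart from the root.
              (cons (symbol-leaf next-branch)
                    (decode-1 (cdr bits) tree))
              (decode-1 (cdr bits) next-branch)))))
  (decode-1 bits tree))

; (decode encoded-song fifties-song-tree) should evaluate to the
; original lyrics, (GET A JOB SHA NA NA ...), confirming the round trip.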
