2011-12-30

SICP Exercise 2.74: Insatiable Insanity

Insatiable Enterprises, Inc., is a highly decentralized conglomerate company consisting of a large number of independent divisions located all over the world. The company's computer facilities have just been interconnected by means of a clever network-interfacing scheme that makes the entire network appear to any user to be a single computer. Insatiable's president, in her first attempt to exploit the ability of the network to extract administrative information from division files, is dismayed to discover that, although all the division files have been implemented as data structures in Scheme, the particular data structure used varies from division to division. A meeting of division managers is hastily called to search for a strategy to integrate the files that will satisfy headquarters' needs while preserving the existing autonomy of the divisions.

Show how such a strategy can be implemented with data-directed programming. As an example, suppose that each division's personnel records consist of a single file, which contains a set of records keyed on employees' names. The structure of the set varies from division to division. Furthermore, each employee's record is itself a set (structured differently from division to division) that contains information keyed under identifiers such as address and salary. In particular:
  1. Implement for headquarters a get-record procedure that retrieves a specified employee's record from a specified personnel file. The procedure should be applicable to any division's file. Explain how the individual divisions' files should be structured. In particular, what type information must be supplied?
  2. Implement for headquarters a get-salary procedure that returns the salary information from a given employee's record from any division's personnel file. How should the record be structured in order to make this operation work?
  3. Implement for headquarters a find-employee-record procedure. This should search all the divisions' files for the record of a given employee and return the record. Assume that this procedure takes as arguments an employee's name and a list of all the divisions' files.
  4. When Insatiable takes over a new company, what changes must be made in order to incorporate the new personnel information into the central system?
Before we even begin on the exercise, it's worth noting that keying employment records by employee name is generally a bad idea. Names are not guaranteed to be unique identifiers of people. As a result Insatiable Enterprises may find itself in a situation where two employees in the same division share one employment record, where a division cannot employ someone because the division already employs someone with the same name, or where searching for an employee using find-employee-record returns only the first record encountered in all of the divisions' files, despite there being employees in different divisions with the same name.

Human Resource Management Systems (HRMS) that have had even the smallest amount of thought put into their data model will uses something like an employee ID or a government-issued ID (such as a social security number or national insurance number) as a key. Such IDs are intended to be unique, and so avoid the collisions.

Anyway, we'll run with employee name as the key since that's what the exercise states...

(a) Retrieving an Employee's Record for a Given Division
Before we start to address get-record itself, we should first have a think about the general problem. We have a potentially large number of divisions, each of which structures its data file in a different way. We wish to expose a common interface for accessing this data, regardless of which division the data is from, and so abstract away these differences. In order to do this we will need to be able to apply operations (such as get-record) to any of the divisions' files and successfully navigate the data structures and perform the appropriate processing.

As noted in the exercise, a strategy based upon data-directed programming can be used to achieve this without ending up with a large amount of per-data structure logic (and a large maintenance headache!) embedded in each procedure in our common interface. Instead we can install operations on a per-type basis in a table, tag each data structure with an appropriate type tag and then use the tag embedded in a particular piece of data to retrieve the appropriate version of an operation we wish to apply.

In section 2.4.3 the authors provide us with two procedures for installing and retrieving operations: get and put respectively (and, if you're using DrRacket, I gave the appropriate magic for getting these working in the last exercise). These two procedures have a pair of common operands: the name of the operation and the type of the data the operation is applied to. Given this, it seems obvious that the name of the operation should correspond to the operation we are trying to apply (such as get-record), while the type of the data should be a unique per-division identifier. If we continue with the "names as keys" theme then this could be as simple as the division's name.

Section 2.4.3 also shows us that, provided the data is tagged with the type, we can then extract the type from the data and use that to retrieve operations appropriate to the data type. This then answers the question posed in the exercise: "what type information must be supplied?" Each division's file must be tagged with the division's identifier under which its operations are registered. We can perform the tagging and extract both the tag and the data using the procedures attach-tag, type-tag and contents, which were introduced in section 2.4.2.

Now let's start thinking about implementing get-record...

First let's extend the statement from the previous paragraph a bit further. Not only must each division's file must be tagged with the division's identifier under which its operations are registered, but any data structure returned by our common interface to which further data-directed operations may be applied must be tagged with the division's identifier.

For example, get-record is going to return a record to which get-salary may be applied. In order for get-salary to find the appropriate per-division operation to apply to retrieve the salary the record itself must be tagged with the division's identifier. Rather than requiring each division's implementation of get-record to perform this tagging we can actually move the tagging into the common get-record implementation, simplifying the process.

We also need to think about what we want the actual interface to be. Let's make it a requirement that each division's get-record operation returns the record (untagged) if it's present, and returns #f if it's not. That way we can try to retrieve the record, check to see if we've got it and, if so, tag it with the division's tag.

That should give us enough to implement get-record:
(define (get-record employee-name personnel-file)
  (let ((record ((get 'get-record (type-tag personnel-file))
                   employee-name
                   (contents personnel-file))))
    (if record
        (attach-tag (type-tag personnel-file) record)
        #f)))
We'll test these out at the end, where I'll also introduce a couple of divisions' personnel files and some extensions.

(b) Retrieving an Employee's Salary from their Employee Record
It's worth noting that we've already answered the question posed in this part of the exercise: "How should the record be structured in order to make this operation work?" We previously stated that "...any data structure returned by our common interface to which further data-directed operations may be applied must be tagged with the division's identifier" and ensured that the implementation of get-record applied the tag for us automatically.

Given that employee records returned by the common interface are tagged in this way we can then similarly use the tag to retrieve a pre-installed per-division operation with the appropriate name (i.e. 'get-salary) and then apply that to retrieve the employee's salary. In this case, however, get-salary doesn't need to tag it with the division's identifier, as the salary is not a "data structure returned by our common interface to which further data-directed operations may be applied" - it's simply a numeric value.

All we need to do then for get-salary is to use the tag from the record to get the appropriate operation to apply to extract the salary and then apply this operation to the content of the record:
(define (get-salary record)
  ((get 'get-salary (type-tag record)) (contents record)))
We can similarly produce a get-address procedure:
(define (get-address record)
  ((get 'get-address (type-tag record)) (contents record)))

(c) Finding an Employee Across Multiple Divisions
For this part we're told to assume that the find-employee-record procedure we implement takes not only the employee name, but also a list of all the divisions' files as arguments. Personally I'd prefer it if the latter argument weren't necessary and that part of installing a division's procedures were also registering the file. If I've got some spare time I'll post a followup showing how this could be achieved. However, in the meantime, we've got a list of files...

Now we've already written a procedure, get-record, that retrieves a employee's record by employee name from a specified division's file or, by our definition, returns #f if the record is not present. So finding an employee's record across a list of divisions' files should just be a case of iterating through the files, trying to find the employee's record in each one, returning it as soon as we find it or returning #f if it's not present in any of the divisions' files.

Here's one implementation that will achieve this:
(define (find-employee-record employee-name personnel-files)
  (if (null? personnel-files)
      #f
      (let ((record (get-record employee-name (car personnel-files))))
        (if record
            record
            (find-employee-record employee-name (cdr personnel-files))))))
(d) Insatiable Ingesting Another Company
Given the framework we've put in place this should provide a fairly low bar for Insatiable Enterprises, Inc. to incorporate the personnel files of a newly acquired company into its systems... Assuming, of course, that they happen to have implemented their HRMS in Scheme, or in a way that is easily interfaced to Scheme. All that should be required is to:
  1. Define a unique division ID under which procedures specific to the division can be installed.
  2. Write the set of procedures that comply with the common interface requirements and that retrieve the required information from the division's personnel file.
  3. Write a procedure to install these procedures as operations under the appropriate keys.
  4. Invoke this procedure, installing the operations.
  5. Update the list of division files to include the new division's file, suitably tagged.
Note the last step. I've assumed there's a list of all divisions' files kicking around somewhere that has each division's file tagged with its division ID so that this can be used for find-employee-record.

Putting it Into Practice
Okay, that's the common interface defined and we've also specified a set of steps for adding a newly acquired company's personnel file into the system. Let's put it into practice. We'll define a couple of divisions' personnel files using different data structures, give the divisions IDs, produce the required common interface procedures, install them as operations, build a list of divisions' files and try it out...

For our first division's representation we can take the binary search tree set representation introduced in the subsection Sets as binary trees, and the lookup mechanism introduced in the subsection Sets and information retrieval, and modify it so that it supports Scheme String keys (using the string=?, string<? and string>? comparators):
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))
(define (make-tree entry left right)
  (list entry left right))

(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts)
                                              right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

(define (key record)
  (car record))

(define (lookup given-key set-of-records)
  (if (null? set-of-records)
      #f
      (let ((entry-key (key (entry set-of-records))))
        (cond ((string=? given-key entry-key) (entry set-of-records))
              ((string<? given-key entry-key)
               (lookup given-key (left-branch set-of-records)))
              ((string>? given-key entry-key)
               (lookup given-key (right-branch set-of-records)))))))
Using this we can unveil to the world Unquenchable Industries, which uses a tree keyed on employee name to hold each record, with each record consisting of a tree where the fields are keyed by field name (i.e. "address" and "salary"):
              
(define unquenchable-industries-employees
  (list->tree (list (cons "John Doe"
                          (list->tree (list (cons "address" "55 Ghetto Grove")
                                            (cons "salary" 100))))
                    (cons "Myddle Mann"
                          (list->tree (list (cons "address" "24 Suburb Street")
                                            (cons "salary" 40000))))
                    (cons "Ritchie Rich"
                          (list->tree (list (cons "address" "1 Park Avenue")
                                            (cons "salary" 250000)))))))
We'll give our division the following ID:
(define unquenchable-industries-id 'unquenchable-industries)
Now let's implement our common interface procedures. There are just three of these in our definitions above: get-record, get-salary and get-address. The functionality we require of the 'get-record operation is provided directly by lookup - it looks up an entry by its key, returning the entry if it's present or #f if it's not. Note that lookup returns a pair with the key as the first element and the data as the second. This means that our 'get-salary and 'get-address operations will need to first extract the tree holding the record, lookup the appropriate field and, if it's present, extract the data it holds. We'll need to wrap all these up in a procedure that installs them under the appropriate operation and division keys.

Here's the procedures defined and installed:
(define (install-unquenchable-industries)
  (define (get-salary record)
    (let ((salary (lookup "salary" (cdr record))))
      (if salary
          (cdr salary)
          #f)))
  (define (get-address record)
    (let ((address (lookup "address" (cdr record))))
      (if address
          (cdr address)
          #f)))
  ; Install the procedures
  (put 'get-record unquenchable-industries-id lookup)
  (put 'get-salary unquenchable-industries-id get-salary)
  (put 'get-address unquenchable-industries-id get-address))

(install-unquenchable-industries)
As noted above, the last step is to tag the division's file and create the list of divisions' files:
(define personnel-files
  (list
   (attach-tag unquenchable-industries-id unquenchable-industries-employees)))
Using this we can find employee records and extract information about them:
> (get-salary (find-employee-record "Ritchie Rich" personnel-files))
250000
> (get-address (find-employee-record "John Doe" personnel-files))
"55 Ghetto Grove"
Now in a dramatic and stock-market unsettling move, Insatiable Industries, Inc. acquires Rapacious Ventures. Rapacious Ventures happen to also use a Scheme-based HRMS but, being a bit more forward thinking, they use nested R6RS hashtables. Here's Rapacious Ventures' personnel file:
(require rnrs/hashtables-6)

(define rapacious-ventures-employees (make-hashtable string-hash string=?))
(hashtable-set! rapacious-ventures-employees
           "Fred Blogs"
           (let ((h (make-hashtable string-hash string=?)))
             (hashtable-set! h "salary" 12345)
             (hashtable-set! h "address" "123 Wibble Street")
             h))
(hashtable-set! rapacious-ventures-employees
           "John Smith"
           (let ((h (make-hashtable string-hash string=?)))
             (hashtable-set! h "salary" 30000)
             (hashtable-set! h "address" "42 Nowhere Road")
             h))
(hashtable-set! rapacious-ventures-employees
           "Jack Copper"
           (let ((h (make-hashtable string-hash string=?)))
             (hashtable-set! h "salary" 25000)
             (hashtable-set! h "address" "99 Letsbee Avenue")
             h))
Let's bring Rapacious Ventures into the Insatiable Enterprises, Inc. fold...

First we give the new division an ID:
(define rapacious-ventures-id 'rapacious-ventures)
Next we implement and install the common interface procedures. We can look up entries in their personnel file using hashtable-ref, which takes the hashtable to use, the key we want to lookup and a default value to return if the key is not present. If we're happy with the employee record missing the employee's name then this is pretty close to the 'get-record operation, so we can simply wrap this in a procedure that wires through the appropriate operands, and fixing #f as the default value. Similarly, our implementations for the 'get-salary and 'get-address operations can simply call hashtable-ref with the appropriate key:
(define (install-rapacious-ventures)
  (define (get-record employee-name personnel-file)
    (hashtable-ref personnel-file employee-name #f))
  (define (get-salary record)
    (hashtable-ref record "salary" #f))
  (define (get-address record)
    (hashtable-ref record "address" #f))
  ; Install the procedures
  (put 'get-record rapacious-ventures-id get-record)
  (put 'get-salary rapacious-ventures-id get-salary)
  (put 'get-address rapacious-ventures-id get-address))

(install-rapacious-ventures)
Finally, we extend the tag the division's file and add it in to the the list of divisions' files:
(define personnel-files
  (list
   (attach-tag unquenchable-industries-id unquenchable-industries-employees)
   (attach-tag rapacious-ventures-id rapacious-ventures-employees)))
Using this we should still be able to access Unquenchable Industries' employee records and access Rapacious Ventures employee records:
> (get-salary (find-employee-record "Myddle Mann" personnel-files))
40000
> (get-address (find-employee-record "Ritchie Rich" personnel-files))
"1 Park Avenue"
> (get-salary (find-employee-record "John Smith" personnel-files))
30000
> (get-address (find-employee-record "Jack Copper" personnel-files))
"99 Letsbee Avenue"

2011-12-24

SICP Exercise 2.73: Data-Directed Derivatives

Section 2.3.2 described a program that performs symbolic differentiation:
(define (deriv exp var)
  (cond ((number? exp) 0)
        ((variable? exp) (if (same-variable? exp var) 1 0))
        ((sum? exp)
         (make-sum (deriv (addend exp) var)
                   (deriv (augend exp) var)))
        ((product? exp)
         (make-sum
           (make-product (multiplier exp)
                         (deriv (multiplicand exp) var))
           (make-product (deriv (multiplier exp) var)
                         (multiplicand exp))))
        (else (error "unknown expression type -- DERIV" exp))))
We can regard this program as performing a dispatch on the type of the expression to be differentiated. In this situation the "type tag" of the datum is the algebraic operator symbol (such as +) and the operation being performed is deriv. We can transform this program into data-directed style by rewriting the basic derivative procedure as
(define (deriv exp var)
   (cond ((number? exp) 0)
         ((variable? exp) (if (same-variable? exp var) 1 0))
         (else ((get 'deriv (operator exp)) (operands exp)
                                            var))))
(define (operator exp) (car exp))
(define (operands exp) (cdr exp))
  1. Explain what was done above. Why can't we assimilate the predicates number? and same-variable? into the data-directed dispatch?
  2. Write the procedures for derivatives of sums and products, and the auxiliary code required to install them in the table used by the program above.
  3. Choose any additional differentiation rule that you like, such as the one for exponents (exercise 2.56), and install it in this data-directed system.
  4. In this simple algebraic manipulator the type of an expression is the algebraic operator that binds it together. Suppose, however, we indexed the procedures in the opposite way, so that the dispatch line in deriv looked like
    ((get (operator exp) 'deriv) (operands exp) var)
    
    What corresponding changes to the derivative system are required?
Before we get started let's give ourselves the get and put procedures we need... As the book notes, we discuss the construction of such procedures in section 3.3.3. To save you looking ahead, here's the appropriate code you'll need if you're following along... I'll note (yet again) that I'm using DrRacket as my scheme interpreter - in order to get this to fly I had to include a couple of packages:
#lang racket

(require rnrs/base-6)
(require rnrs/mutable-pairs-6)

(define (assoc key records)
  (cond ((null? records) false)
        ((equal? key (caar records)) (car records))
        (else (assoc key (cdr records)))))

(define (make-table)
  (let ((local-table (list '*table*)))
    (define (lookup key-1 key-2)
      (let ((subtable (assoc key-1 (cdr local-table))))
        (if subtable
            (let ((record (assoc key-2 (cdr subtable))))
              (if record
                  (cdr record)
                  false))
            false)))
    (define (insert! key-1 key-2 value)
      (let ((subtable (assoc key-1 (cdr local-table))))
        (if subtable
            (let ((record (assoc key-2 (cdr subtable))))
              (if record
                  (set-cdr! record value)
                  (set-cdr! subtable
                            (cons (cons key-2 value)
                                  (cdr subtable)))))
            (set-cdr! local-table
                      (cons (list key-1
                                  (cons key-2 value))
                            (cdr local-table)))))
      'ok)    
    (define (dispatch m)
      (cond ((eq? m 'lookup-proc) lookup)
            ((eq? m 'insert-proc!) insert!)
            (else (error "Unknown operation -- TABLE" m))))
    dispatch))

(define operation-table (make-table))
(define get (operation-table 'lookup-proc))
(define put (operation-table 'insert-proc!))

(a) What Was Done

In the original version the type of an expression was tested through various predicates. Two of these, sum? and product?, relied upon the prefix notation used for the expressions to extract the car of the expression and compare it to the symbol corresponding to the operation. In order to move to a data-directed style of implementation a new procedure, operator, has been introduced to extract the symbol corresponding to the current operation. A corresponding procedure, operands, has also been introduced that extracts the operands for the operation. This also makes use of the fact that expressions defined using prefix notation, simply returning the cdr of the expression.

The former of these two procedures is used to obtain the type tag (i.e. '+ or '*) to use with get to retrieve the appropriate 'deriv procedure to apply. The latter procedure is then used to obtain the operands to pass to the obtained procedure, and the obtained procedure is invoked.

This allows the deriv procedure to support differentiation of any number of different operations to be supported without requiring continual updates to the procedure - provided, of course, that the operations are expressed using prefix notation. Adding a new operation is simply a case of registering the corresponding procedure under the 'deriv operation with the appropriate symbol corresponding to the type.

Unfortunately, number? and variable? cannot be handled with the implementation as it stands, as these are not expressed as a list with an operator as their first element and operands as their tail. As a result the implementations of operator and operands will not be able to cope with them - the interpreter will raise an error if they are applied to numbers or variables. This prevents us from registering appropriate procedures to cope with them (and so, by proxy, prevents us from encapsulating same-variable? in a procedure and registering that).

Note that we could provide a work-around for this by extending operator and operands appropriately. We could extend operator so that it checks whether or not an expression is a number or variable before returning the car of the expression. If it's either of the two then we could return proxy operations (e.g. 'number and 'variable) as appropriate. This would allow us to register procedures. Similarly, we could extend operator so that it checks whether it's dealing with a list or not before returning the cdr of the expression. If it's not a list then we can simply return a single element list. This would lead to the following implementation:
(define (deriv exp var)
  ((get 'deriv (operator exp)) (operands exp) var))

(define (operator exp)
  (cond ((number? exp) 'number)
        ((variable? exp) 'variable)
        (else (car exp))))

(define (operands exp)
  (if (pair? exp)
      (cdr exp)
      (list exp)))
Following the packaging mechanism introduced in section 2.4.3 we could then install the appropriate procedures as follows:
(define (install-number-routines)
  (define (derivative ops var) 0)
  (put 'deriv 'number derivative))
    
(define (install-variable-routines)
  (define (derivative ops var)
    (if (same-variable? (car ops) var) 1 0))
  (put 'deriv 'variable derivative))

(install-number-routines)
(install-variable-routines)
We won't use this though - we'll stick to the exercise!

(b) Data-Directed Derivatives of Sums and Products

The procedures for derivatives of sums and products correspond very closely to the appropriate consequent expressions in their corresponding clauses in the original implementations. However, note that the implementations of addend, augend, multiplier and multiplicand provided in the book will not work for us here as they assume that they are operating on a complete expression in prefix form (i.e. a list of 3 elements with the operator as the first element and the operands as the second and third elements), whereas the procedures will be dealing with a list containing just the operands.

Rather than redefine the selectors, we'll simply process the operands directly. We can access the first operand using car, and the second operand using cadr. So we simply replace invocations of addend and multiplier on the expression with caring the operands and, similarly, replace invocations of augend and multiplicand on the expression with cadring the operands.

We then need to install the procedures into the table used by the program. We saw an example of doing this above, where we saw how we could deal with number? and variable? by abusing the procedures operator and operands... You may have noticed above that the inner procedures that were actually installed above were both called derivative rather than deriv. There was a reason for that, and it's here that we can see the reason...

The calculation of derivatives for both sums and products requires recursive calls to the deriv procedure defined above. If we don't rename the inner procedures that calculate the derivatives for sums and products to something other than deriv then the in-scope association for deriv will be to the inner procedure being defined, and so the inner procedures' recursive calls will be to themselves. To avoid this (which would cause the calls to ultimately fail) we must rename the inner procedures to something other than deriv. I chose to go with the expanded name, derivative, and applied this name change to the procedures for dealing with numbers and variables too in order to maintain consistency among the packages.

Anyway, here are the packages for sums and products:
(define (install-sum-routines)
  (define (derivative ops var)
    (make-sum
     (deriv (car ops) var)
     (deriv (cadr ops) var)))
  (put 'deriv '+ derivative))

(define (install-product-routines)
  (define (derivative ops var)
    (make-sum
     (make-product (car ops)
                   (deriv (cadr ops) var))
     (make-product (deriv (car ops) var)
                   (cadr ops))))
  (put 'deriv '* derivative))

(install-sum-routines)
(install-product-routines)
As usual, we'll test them out. In this case I got a somewhat unexpected result for the third one:
> (deriv '(+ x 3) 'x)
1
> (deriv '(* x y) 'x)
'y
> (deriv '(* (* x y) (+ x 3)) 'x)
(mcons
 '+
 (mcons
  (mcons '* (mcons 'x (mcons 'y '())))
  (mcons (mcons '* (mcons 'y (mcons (mcons '+ (mcons 'x (mcons 3 '()))) '()))) '())))
If you recall from the book, the result for calculating the derivative of the last expression should be as follows:
> (deriv '(* (* x y) (+ x 3)) 'x)
'(+ (* x y) (* y (+ x 3)))
Now if you look closely at the result we actually get then you'll notice that there is some similarity to the expected result. In fact, if you were to replace mcons with cons in the produced result and evaluate it then you would get the expected result.

So what's going on? Well, the R6RS specification banished procedures such as set-cdr! to their own library, as these can give rise to mutable pairs. LISP and Scheme are often referred to as functional programming languages and it's my understanding that this change in R6RS is an effort to isolate procedures that have side-effects from the standard language, as these are not true functions. It's also my understanding that including this library under DrRacket effectively results in replacing the normal (immutable pair) implementations of cons, car, etc. with their mutable pair equivalents, which includes replacing cons with mcons, and that's what we're seeing here.

If you want more details on this then you can read your way through the DrRacket team's original post where they discuss these changes. Googling for "mutable pair racket" will also throw up a bunch of results that will give you further info.

(c) Data-Directed Derivatives of Exponents

In exercise 2.56 we extended the original deriv procedure to support exponents by using '** as the operator representing an exponent and adding in the following clause into the cond:
        …
        ((exponentiation? exp)
         (make-product
           (make-product (exponent exp)
                         (make-exponentiation (base exp)
                                              (make-sum (exponent exp) -1)))
           (deriv (base exp) var)))
        …
Converting this to the data-directed version is again straightforward. The two operands to the operator are the base and exponent, in that order, so we can use car and cadr to extract these in an analogous manner to sums and products and so convert the consequent expression into a derivative procedure. All that remains is to install it into the table for the 'deriv operation under the '** type. This gives the following implementation:
(define (install-exponent-routines)
  (define (derivative ops var)
    (make-product
     (make-product (cadr ops)
                   (make-exponentiation (car ops)
                                        (make-sum (cadr ops) -1)))
     (deriv (car ops) var)))
  (put 'deriv '** derivative))
After installing this we can - in order to get this to fly I st examples as we used in exercise 2.56:
> (deriv (make-exponentiation 'a 5) 'a)
(mcons '* (mcons 5 (mcons (mcons '** (mcons 'a (mcons 4 '()))) '())))
> (deriv (make-exponentiation 'a 'b) 'a)
(mcons
 '*
 (mcons
  'b
  (mcons (mcons '** (mcons 'a (mcons (mcons '+ (mcons 'b (mcons -1 '()))) '()))) '())))
> (deriv (make-exponentiation 'a (make-sum 'a 'b)) 'a)
(mcons
 '*
 (mcons
  (mcons '+ (mcons 'a (mcons 'b '())))
  (mcons
   (mcons
    '**
    (mcons
     'a
     (mcons (mcons '+ (mcons (mcons '+ (mcons 'a (mcons 'b '()))) (mcons -1 '()))) '())))
   '())))
For comparison, here's the results we got in exercise 2.56:
> (deriv (make-exponentiation 'a 5) 'a)
(* 5 (** a 4))
> (deriv (make-exponentiation 'a 'b) 'a)
(* b (** a (+ b -1)))
> (deriv (make-exponentiation 'a (make-sum 'a 'b)) 'a)
(* (+ a b) (** a (+ (+ a b) -1)))
Again, if you were to replace mcons with cons in the produced results and evaluate them then you would get the results we originally got.

(d) Inverted Indexing

For the final part of the exercise, the authors question what changes would be required if we were to index the procedures in the opposite way, replacing the dispatch line in deriv with:
((get (operator exp) 'deriv) (operands exp) var)
Now the naïve solution would be to simply modify all of our install-XXX-routines procedures above, swapping the 'deriv and operator symbol entries around as the operands to put, so that the operator symbol became the operation and 'deriv became the type. This would work. However, it doesn't quite capture the subtlety of the change that is made...

By swapping the order the authors have changed the types of packages that should be registered in our system. In the first three parts of the exercise a package encapsulates operations related to a particular arithmetic operation. We have a package for sums, a package for products and a package for exponents. Sure, each package only installs one procedure, a way of calculating derivatives involving that arithmetic operation, but this is the implied packaging. However, for the fourth part of the operation, a package should encapsulate a set of procedures for processing an arithmetic expression, and so we should have a single package that installs procedures that can be used to calculate the derivatives of an arithmetic expression that involves various arithmetic operations. In this vein we could also, for example, have a single package that defines how to evaluate installs procedures that can be used to evaluate an arithmetic expression, or reduce it to minimal terms.

So our solution here should have a single procedure that contains inner procedures for each supported arithmetic operation and registers all of those. Here's one possible implementation:
(define (install-derivative-routines)
  (define (sum ops var)
    (make-sum
     (deriv (car ops) var)
     (deriv (cadr ops) var)))
  (define (product ops var)
    (make-sum
     (make-product (car ops)
                   (deriv (cadr ops) var))
     (make-product (deriv (car ops) var)
                   (cadr ops))))
  (define (exponent ops var)
    (make-product
     (make-product (cadr ops)
                   (make-exponentiation (car ops)
                                        (make-sum (cadr ops) -1)))
     (deriv (car ops) var)))
  (put '+ 'deriv sum)
  (put '* 'deriv product)
  (put '** 'deriv exponent))
If we update our deriv procedure as described and install the derivative routines we can, of course, run all the tests again:
> (deriv '(+ x 3) 'x)
1
> (deriv '(* x y) 'x)
'y
> (deriv '(* (* x y) (+ x 3)) 'x)
(mcons
 '+
 (mcons
  (mcons '* (mcons 'x (mcons 'y '())))
  (mcons (mcons '* (mcons 'y (mcons (mcons '+ (mcons 'x (mcons 3 '()))) '()))) '())))
> (deriv (make-exponentiation 'a 5) 'a)
(mcons '* (mcons 5 (mcons (mcons '** (mcons 'a (mcons 4 '()))) '())))
> (deriv (make-exponentiation 'a 'b) 'a)
(mcons
 '*
 (mcons
  'b
  (mcons (mcons '** (mcons 'a (mcons (mcons '+ (mcons 'b (mcons -1 '()))) '()))) '())))
> (deriv (make-exponentiation 'a (make-sum 'a 'b)) 'a)
(mcons
 '*
 (mcons
  (mcons '+ (mcons 'a (mcons 'b '())))
  (mcons
   (mcons
    '**
    (mcons
     'a
     (mcons (mcons '+ (mcons (mcons '+ (mcons 'a (mcons 'b '()))) (mcons -1 '()))) '())))
   '())))
As you can see, these results match the mutable pair results we saw previously.

2011-12-23

SICP Exercise 2.72: Efficiency of Huffman Encoding

Consider the encoding procedure that you designed in exercise 2.68. What is the order of growth in the number of steps needed to encode a symbol? Be sure to include the number of steps needed to search the symbol list at each node encountered. To answer this question in general is difficult. Consider the special case where the relative frequencies of the n symbols are as described in exercise 2.71, and give the order of growth (as a function of n) of the number of steps needed to encode the most frequent and least frequent symbols in the alphabet.

Here's the encoding procedure, encode-symbol, we designed in exercise 2.68:
(define (encode-symbol symbol tree)
  (cond ((not (memq symbol (symbols tree)))
         (error "bad symbol -- ENCODE-SYMBOL" symbol))
        ((leaf? tree) '())
        ((memq symbol (symbols (left-branch tree)))
         (cons 0 (encode-symbol symbol (left-branch tree))))
        ((memq symbol (symbols (right-branch tree)))
         (cons 1 (encode-symbol symbol (right-branch tree))))))
And to recap, the summary of exercise 2.71 was that, for an alphabet of n symbols where relative frequencies of the symbols are 1, 2, 4, ..., 2n-1, a Huffman tree will encode the most frequent symbol using a 1-bit encoding, while the least frequent symbol will be encoded using n-1 bits. The reasoning for these bit lengths is that for such an alphabet generate-huffman-tree will produce a tree that is n levels deep in which each non-leaf node has a right child node that is a leaf. The most frequent symbol will be in the right child of the root node (and so only one edge needs to be traversed to get to the symbol) while the least frequent symbol will be in the nth level of the tree, in the left-most branch of the tree (and so n-1 edges need to be traversed to get to the symbol).

As with previous exercises we will assume that list operations (such as list, cons, car, cadr and caddr) are all Θ(1) operations. Since left-branch and right-branch simply delegate to car and cadr respectively this means we can assume they're Θ(1) too. If we also assume that eq? is Θ(1) then this allows us to assume that leaf? is Θ(1) too... and if leaf? is Θ(1) then that means symbols will also be Θ(1), as it performs a fixed number of operations, all of which are Θ(1).

Now, how about memq? Well, memq iterates through a list and compares each element to the specified symbol. If it matches then it returns the tail of the list starting at the first matching element, otherwise it returns #f. Now iterating through a list is Θ(n), and we've already stated that we're assuming list operations are Θ(1), so this will make memq an Θ(n) operation.

Okay, now we've stated all our assumptions about the orders of growth of the operations used by encode-symbol. Now we can go on to calculate the best case (when the symbol is the most frequent in the alphabet) and worst case (when the symbol is the least frequent in the alphabet) orders of growth.

Let's start with the best case: encoding the most frequent symbol. In this case the symbol's corresponding leaf node is the right child node of the root, and so the order of operations is:
  1. Start at the root of the tree.
  2. Ensure that symbol is in the list of symbols for the root - otherwise we'd throw an error. The check uses symbols and memq, and so this is Θ(n+1)=Θ(n).
  3. Check whether the root is a leaf node (which it's not) using leaf?, an Θ(1) operation.
  4. Check whether the symbol is under the left branch of the tree (which it's not). This uses left-branch, symbols and memq, and so this is Θ(1+n+1)=Θ(n).
  5. Check whether the symbol is under the right branch of the tree which it is). This uses right-branch, symbols and memq, and so is also Θ(1+n+1)=Θ(n).
  6. cons the value 1 onto the result of invoking encode-symbol on the right branch of the tree. cons is an Θ(1) operation. The recursive call works out at Θ(n+1+1)=Θ(n), using the following two steps:
    1. Ensure that symbol is in the list of symbols for the node (Θ(n+1)).
    2. Check whether the node is a leaf node, which it is, so return the empty list (Θ(1)).
Put that all together and you get an order of growth when the symbol is the most frequent in the alphabet of Θ(4n+2)=Θ(n).

Now let's move onto the worst case: encoding the least frequent symbol. In this case the symbol's corresponding leaf node is the left most child node at the lowest level of the tree. That means that at each non-leaf node we'll need to perform the following steps:
  1. Ensure that symbol is in the list of symbols for the root (Θ(n)).
  2. Check whether the root is a leaf node, which it's not (Θ(1)).
  3. Check whether the symbol is under the left branch of the tree, which it is (Θ(n)).
  4. cons the value 0 onto the result of invoking encode-symbol on the left branch of the tree (Θ(1), plus the cost of the recursive call).
So, if we ignore the the recursive call, for any non-leaf node we process the order of growth is Θ(2n+2)=Θ(n). Starting at the root there'll be n-1 non-leaf nodes we'll need to process before we reach the leaf node corresponding to the symbol we're looking for, so the order of growth of processing all of the non-leaf nodes will be Θ(n(n-1))=Θ(n2-n)=Θ(n2).

We just need to incorporate the order of growth of processing a leaf node into this. We already calculated this as Θ(n) in step 6 for the best case, so the total order of growth for encoding the least frequent symbol is Θ(n2+n)=Θ(n2).

So, to summarize, for an alphabet of n symbols where relative frequencies of the symbols are 1, 2, 4, ..., 2n-1, a Huffman tree will encode:
  • The most frequent symbol with an order of growth of Θ(n)
  • The least frequent symbol with an order of growth of Θ(n2)

SICP Exercise 2.71: Huffman With Power-of-2 Frequencies

Suppose we have a Huffman tree for an alphabet of n symbols, and that the relative frequencies of the symbols are 1, 2, 4, ..., 2n-1. Sketch the tree for n=5; for n=10. In such a tree (for general n) how many bits are required to encode the most frequent symbol? the least frequent symbol?

Before we sketch the trees out, let's check what the interpreter gives us:
> (generate-huffman-tree '((A 1) (B 2) (C 4) (D 8) (E 16)))
'(((((leaf A 1) (leaf B 2) (A B) 3)
    (leaf C 4)
    (A B C)
    7)
   (leaf D 8)
   (A B C D)
   15)
  (leaf E 16)
  (A B C D E)
  31)
> (generate-huffman-tree '((A 1) (B 2) (C 4) (D 8) (E 16)
                           (F 32) (G 64) (H 128) (I 256) (J 512)))
'((((((((((leaf A 1) (leaf B 2) (A B) 3)
         (leaf C 4)
         (A B C)
         7)
        (leaf D 8)
        (A B C D)
        15)
       (leaf E 16)
       (A B C D E)
       31)
      (leaf F 32)
      (A B C D E F)
      63)
     (leaf G 64)
     (A B C D E F G)
     127)
    (leaf H 128)
    (A B C D E F G H)
    255)
   (leaf I 256)
   (A B C D E F G H I)
   511)
  (leaf J 512)
  (A B C D E F G H I J)
  1023)
You'll note that in the tree formed, each non-leaf node has a leaf node as its right-child. When you sketch them out you get the following for n=5:
...and the following for n=10:
You'll note that:
  • When n=5, the tree is 5 levels deep. When we use a Huffman tree for encoding we generate a bit for each edge we traverse in the tree (not for each node we visit), so the longest path from the root to a leaf node, which is for the least frequent symbol, will generate a 4-bit encoding.
  • When n=10, the tree is 10 levels deep, so the longest path from the root to a leaf node will generate a 9-bit encoding.
  • In both trees the shortest path from the root to a leaf node, which is for the most frequent symbol, traverses a single edge and so will generate a 1-bit encoding
In general we can therefore state that, for an alphabet of n symbols where relative frequencies of the symbols are 1, 2, 4, ..., 2n-1, a Huffman tree will encode the most frequent symbol using a 1-bit encoding, while the least frequent symbol will be encoded using n-1 bits.

SICP Exercise 2.70: Yip a Yip Wah Boom!

The following eight-symbol alphabet with associated relative frequencies was designed to efficiently encode the lyrics of 1950s rock songs. (Note that the ``symbols'' of an ``alphabet'' need not be individual letters.)

A2NA16
BOOM1SHA3
GET2YIP9
JOB2WAH1

Use generate-huffman-tree (exercise 2.69) to generate a corresponding Huffman tree, and use encode (exercise 2.68) to encode the following message:

Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip yip yip yip yip yip
Sha boom

How many bits are required for the encoding? What is the smallest number of bits that would be needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?

Okay, first let's build our Huffman tree:
> (define fifties-song-tree
    (generate-huffman-tree
      '((A 2) (BOOM 1) (GET 2) (JOB 2) (NA 16) (SHA 3) (YIP 9) (WAH 1))))
> fifties-song-tree
'((leaf NA 16)
  ((leaf YIP 9)
   (((leaf A 2) ((leaf WAH 1) (leaf BOOM 1) (WAH BOOM) 2) (A WAH BOOM) 4)
   ((leaf SHA 3) ((leaf JOB 2) (leaf GET 2) (JOB GET) 4) (SHA JOB GET) 7)
   (A WAH BOOM SHA JOB GET)
   11)
  (YIP A WAH BOOM SHA JOB GET)
  20)
 (NA YIP A WAH BOOM SHA JOB GET)
 36)
And now let's encode our song:
> (define encoded-song (encode '(GET A JOB
                                 SHA NA NA NA NA NA NA NA NA
                                 GET A JOB
                                 SHA NA NA NA NA NA NA NA NA
                                 WAH YIP YIP YIP YIP YIP YIP YIP YIP YIP 
                                 SHA BOOM)
                               fifties-song-tree))
> encoded-song
'(1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
  1 0 0 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 0 1
  0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1)
> (length encoded-song)
84
So we need 84 bits to encode the song using our Huffman tree.

How about if we use a fixed-length code? Well, we have eight symbols to encode so, using a fixed-length code, we could encode the message using 3 bits per symbol (as 23=8). As there are a total of 36 symbols to encode, that means we could encode this using a fixed-length code using 36×3=108 bits.

So using Huffman encoding has saved us 24 bits - a 22% reduction in the size of the message.

2011-12-22

SICP Exercise 2.69: Generating Huffman Trees

The following procedure takes as its argument a list of symbol-frequency pairs (where no symbol appears in more than one pair) and generates a Huffman encoding tree according to the Huffman algorithm.
(define (generate-huffman-tree pairs)
  (successive-merge (make-leaf-set pairs)))
Make-leaf-set is the procedure given above that transforms the list of pairs into an ordered set of leaves. Successive-merge is the procedure you must write, using make-code-tree to successively merge the smallest-weight elements of the set until there is only one element left, which is the desired Huffman tree. (This procedure is slightly tricky, but not really complicated. If you find yourself designing a complex procedure, then you are almost certainly doing something wrong. You can take significant advantage of the fact that we are using an ordered set representation.)

The procedure generate-huffman-tree calls successive-merge with the results of (make-leaf-set pairs). As noted above, this is an ordered set of leaves. To be more precise, it is a list of leaves, ordered by increasing frequency of occurrence. This means that, initially at least, we can find the smallest-weight elements of the set at the start of the set. If this were an invariant of the procedure then this would mean that we could always find the smallest-weight elements of the set at the start of the list. We can do this by defining successive-merge as a recursive procedure that, at each call, picks the first two nodes off the list, merges them, inserts the result of the merge into the appropriate place in the remainder of the list (to maintain the invariant) and then makes the recursive call with this result.

So provided we can perform two actions (merge a pair of elements and insert a node into the appropriate place in the list) then we will be able to do this... and thankfully the authors provide us with a couple of procedures that will let us do this. make-code-tree takes two nodes and produces a new node with an appropriate symbols list and weight and with those nodes as its children. adjoin-set takes a node and inserts it into the appropriate position in a list of nodes that are ordered by increasing weight.

All we need to do now is to provide a base case and we're done. Or, as we'll actually do, a couple of base cases. The first base case is when we have completed merging all of the nodes in a non-empty list of nodes. In this situation we have a list with one node in it, and we can simply return that node as the result. The second base case is an edge case - we need to deal with what happens when pairs is an empty list. In that case make-leaf-set also returns an empty list. To keep it consistent we'll do the same.

Okay, so we've got all the components we need, we know what we need them to do, so let's put them together:
(define (successive-merge set)
  (cond ((null? set) '())
        ((null? (cdr set)) (car set))
        (else (successive-merge
               (adjoin-set (make-code-tree (car set) (cadr set))
                           (cddr set))))))
To test this, let's use the frequencies that were used to generate sample-tree for exercise 2.67, and compare the result against sample-tree:
> (generate-huffman-tree '((A 4) (B 2) (C 1) (D 1)))
'((leaf A 4) ((leaf B 2) ((leaf D 1) (leaf C 1) (D C) 2) (B D C) 4) (A B D C) 8)
> sample-tree
'((leaf A 4) ((leaf B 2) ((leaf D 1) (leaf C 1) (D C) 2) (B D C) 4) (A B D C) 8)

SICP Exercise 2.68: Encoding Symbols

The encode procedure takes as arguments a message and a tree and produces the list of bits that gives the encoded message.
(define (encode message tree)
  (if (null? message)
      '()
      (append (encode-symbol (car message) tree)
              (encode (cdr message) tree))))
Encode-symbol is a procedure, which you must write, that returns the list of bits that encodes a given symbol according to a given tree. You should design encode-symbol so that it signals an error if the symbol is not in the tree at all. Test your procedure by encoding the result you obtained in exercise 2.67 with the sample tree and seeing whether it is the same as the original sample message.

First things first, let's deal with the error case. We're given a procedure, symbols, that will give us a list of all the symbols in a particular node of the tree. We can use this to test whether a particular node contains a symbol or not by using it in conjunction with memq which was introduced way back in section 2.3.1. memq will return false if the symbol isn't in the list, and a non-empty list otherwise, so we can use this as a predicate (since non-false values are treated as true).

With the error case out of the way, let's consider how we can generate our list of bits. We have a binary tree, so we can recurse down this to find the leaf node corresponding to the symbol we're trying to encode. We know (from the error case) that we can test whether or not our symbol is under the left or right child nodes via symbols and memq, and we know that we want to put a 0 onto the list for each time we move to a left child and a 1 onto the list for each time we move to a right child. So all we need to do is determine which branch to take at each (non-leaf) node and then cons the appropriate binary digit onto the list returned by recursively calling encode-symbol for that branch. Oh, and we'll also need a base case... Once we reach a leaf node we have no further children to navigate, so we can simply return the empty list.

This gives us the following implementation:
(define (encode-symbol symbol tree)
  (cond ((not (memq symbol (symbols tree)))
         (error "bad symbol -- ENCODE-SYMBOL" symbol))
        ((leaf? tree) '())
        ((memq symbol (symbols (left-branch tree)))
         (cons 0 (encode-symbol symbol (left-branch tree))))
        ((memq symbol (symbols (right-branch tree)))
         (cons 1 (encode-symbol symbol (right-branch tree))))))
Let's see it in action:
> (encode '(A D A B B C A) sample-tree)
'(0 1 1 0 0 1 0 1 0 1 1 1 0)

SICP Exercise 2.67: Decoding a Message

Define an encoding tree and a sample message:
(define sample-tree
  (make-code-tree (make-leaf 'A 4)
                  (make-code-tree
                   (make-leaf 'B 2)
                   (make-code-tree (make-leaf 'D 1)
                                   (make-leaf 'C 1)))))

(define sample-message '(0 1 1 0 0 1 0 1 0 1 1 1 0))
Use the decode procedure to decode the message, and give the result.

Okay, let's stick it in the interpreter and see what we get:
> (decode sample-message sample-tree)
'(A D A B B C A)

SICP Exercise 2.66: Looking up Data

Implement the lookup procedure for the case where the set of records is structured as a binary tree, ordered by the numerical values of the keys.

First, let's define our record structure. To keep it simple, a record will be a list, with the numerical value of the key held at the head of the list. This means we can define key as follows:
(define (key record)
  (car record))
Then we can simply take element-of-set? and make the following modifications to it:
  • Instead of checking whether the entry matches the required value, it will check whether the value returned by applying key to the entry matches the requested key.
  • If it finds the requested entry then, instead of returning true, it will return the entry itself.
  • The recursive calls will need to pass the requested key through.
  • To save us having to extract the key of the entry of the current node each time, if the current node isn't null then we'll extract it once.
Here's element-of-set? again as a reminder:
(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (entry set)) true)
        ((< x (entry set))
         (element-of-set? x (left-branch set)))
        ((> x (entry set))
         (element-of-set? x (right-branch set)))))
...and here's my version of lookup:
(define (lookup given-key set-of-records)
  (if (null? set-of-records)
      #f
      (let ((entry-key (key (entry set-of-records))))
        (cond ((= given-key entry-key) (entry set-of-records))
              ((< given-key entry-key)
               (lookup given-key (left-branch set-of-records)))
              ((> given-key entry-key)
               (lookup given-key (right-branch set-of-records)))))))
Right, let's put it to the test:
         
> (define records (list->tree '((1 one) (2 two) (4 four) (5 five) (7 seven) (10 ten))))
> records
'((4 four) ((1 one) () ((2 two) () ())) ((7 seven) ((5 five) () ()) ((10 ten) () ())))
> (lookup 1 records)
'(1 one)
> (lookup 4 records)
'(4 four)
> (lookup 7 records)
'(7 seven)
> (lookup 3 records)
#f

2011-12-21

SICP Exercise 2.65: Unioning and Intersection Binary Search Trees

Use the results of exercises 2.63 and 2.64 to give Θ(n) implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.

Okay, so in exercise 2.63 we established that the procedure tree->list-2 is an Θ(n) procedure, and in exercise 2.64 we established that list->tree is also an Θ(n) procedure. This means that we can convert a balanced binary tree to a list and back again in Θ(n). Further to this, we can convert a balanced binary tree to a list, transform that list into another list, and then convert the resulting list to a balanced binary tree and, provided the transformation of the list is no worse than Θ(n), we can do all this in Θ(n).

If we assume that we're dealing with binary search trees, and not just binary trees, as the authors imply, then we know that tree->list-2 will produce an ordered list. We also know that, given an ordered list, list->tree will produce a balanced binary search tree. So one way of achieving our goal would be to take the balanced binary search trees to be unioned or intersected, convert them to ordered lists using tree->list-2, perform the union or intersection on the ordered lists, and then convert the resulting set back to a balanced binary search tree using list->tree.

If only we had some way of performing union-set and intersection-set on ordered list representations with Θ(n) growth. Oh look! We produced such a union-set in exercise 2.62, and the authors give us such an intersection-set in the "Sets as ordered lists" section. It's almost like that was planned...

Okay, so what we'll do is we'll convert these previously defined procedures to inner procedures in the corresponding union-set and intersection-set procedures for binary trees:
(define (union-set set1 set2)
  (define (union s1 s2)
    (cond ((null? s1) s2)
          ((null? s2) s1)
          (else (let ((x1 (car s1))
                      (x2 (car s2)))
                  (cond ((= x1 x2) (cons x1 (union (cdr s1) (cdr s2))))
                        ((< x1 x2) (cons x1 (union (cdr s1) s2)))
                        ((< x2 x1) (cons x2 (union s1 (cdr s2)))))))))
  (let ((list1 (tree->list-2 set1))
        (list2 (tree->list-2 set2)))
    (list->tree (union list1 list2))))

(define (intersection-set set1 set2)
  (define (intersect s1 s2)
    (if (or (null? s1) (null? s2))
        '()    
        (let ((x1 (car s1)) (x2 (car s2)))
          (cond ((= x1 x2) (cons x1 (intersect (cdr s1) (cdr s2))))
                ((< x1 x2) (intersect (cdr s1) s2))
                ((< x2 x1) (intersect s1 (cdr s2)))))))
  (let ((list1 (tree->list-2 set1))
        (list2 (tree->list-2 set2)))
    (list->tree (intersect list1 list2))))
Okay, so let's see these in action:
> (define odd-set (list->tree '(1 3 5)))
> (define even-set (list->tree '(2 4 6)))
> (define low-set (list->tree '(1 2 3 4)))
> (define high-set (list->tree '(3 4 5 6)))
> odd-set
'(3 (1 () ()) (5 () ()))
> even-set
'(4 (2 () ()) (6 () ()))
> low-set
'(2 (1 () ()) (3 () (4 () ())))
> high-set
'(4 (3 () ()) (5 () (6 () ())))
> (intersection-set odd-set odd-set)
'(3 (1 () ()) (5 () ()))
> (intersection-set odd-set even-set)
'()
> (intersection-set even-set odd-set)
'()
> (intersection-set low-set high-set)
'(3 () (4 () ()))
> (union-set odd-set odd-set)
'(3 (1 () ()) (5 () ()))
> (union-set odd-set even-set)
'(3 (1 () (2 () ())) (5 (4 () ()) (6 () ())))
> (union-set even-set odd-set)
'(3 (1 () (2 () ())) (5 (4 () ()) (6 () ())))
> (union-set low-set high-set)
'(3 (1 () (2 () ())) (5 (4 () ()) (6 () ())))

2011-12-20

SICP Exercise 2.64: Constructing Balanced Trees

The following procedure list->tree converts an ordered list to a balanced binary tree. The helper procedure partial-tree takes as arguments an integer n and list of at least n elements and constructs a balanced tree containing the first n elements of the list. The result returned by partial-tree is a pair (formed with cons) whose car is the constructed tree and whose cdr is the list of elements not included in the tree.
(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts)
                                              right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

  1. Write a short paragraph explaining as clearly as you can how partial-tree works. Draw the tree produced by list->tree for the list (1 3 5 7 9 11).
  2. What is the order of growth in the number of steps required by list->tree to convert a list of n elements?
(a) Describe How partial-tree Works
Before we give our short paragraph let's first note what quotient does as it's used here without any description. It's official definition in the Scheme R5RS specification can be found here. However, its functionality can be summarized as returning the result of dividing the first numeric operand by the second numeric operand and, if this is not an integer value, rounding it down to the nearest integer towards zero.

Okay, so given the assumptions about the operands to partial-tree listed above let's now describe how partial-tree works...

In the terminating case, when n=0, partial-tree simply returns a pair containing a representation of an empty binary tree and the unaltered list of elements it was called with (as these elements were not added to the binary tree generated). However, in the non-terminating case partial-tree first calculates the number of elements that should go into the left sub-tree of a balanced binary tree of size n, then invokes partial-tree with the elements and that value which both produces such a sub-tree and the list of elements not in that sub-tree. It then takes the head of the unused elements as the value for the current node, the tail as the potential elements for the right sub-tree and calculates how many elements should go into the right sub-tree of a balanced binary tree of size n. To build the right sub-tree it then invokes partial-tree with the unused elements and the required size of the right sub-tree to build it and get any remaining elements. Finally it builds the balanced tree with the generated components and returns this tree along with any remaining elements.

Note that a more succinct way of putting this would be to say that, for the non-terminating case, partial-tree splits the list into four: elements preceding the (n/2)th position, the element at the (n/2)th position, elements between the (n/2)th position (exclusive) and the nth position (inclusive) in the list and elements after the nth position in the list. It then builds binary trees for the first and third of these lists using recursive calls to partial-tree, puts these sub-trees together, along with the element at the (n/2)th position, to produce a tree and returns this along with the fourth list. However, while this captures the effect of the procedure, it doesn't describe the actual implementation.

I'd like to add a further note to this part of the exercise. The authors state that "list->tree converts an ordered list to a balanced binary tree". I'd like to correct this a little... list->tree will convert any list into a balanced binary tree; however, if the list is ordered then list->tree will convert it into a balanced binary search tree.

As for the tree produced by list->tree for the list (1 3 5 7 9 11), well here's the procedure in action:
> (list->tree '(1 3 5 7 9 11))
'(5 (1 () (3 () ())) (9 (7 () ()) (11 () ())))
If, as in the book, we omit empty lists from the tree diagram, then this gives us the following tree:
(b) Order of Growth
First, lets make the assumption that all of the list and pair operations (i.e. car, cdr, cons and list) are Θ(1) operations. I include list in this as make-tree uses it.

Now, excluding the recursive calls, partial-tree performs a fixed set of these operations for each list it processes. As a result we can say that, excluding recursive calls partial-tree runs in constant time (i.e. Θ(1)). We also know that, due to the way in which partial-tree operates, it makes recursive calls such that uses each index in the range 1..n exactly once. Now if you perform n invocations of an Θ(1) operation that gives you an order of growth of Θ(n).

Finally, list->tree simply makes a single call to partial-tree using the length of the list of elements as n. If we make a further assumption that length is, in the worst case, Θ(n) (i.e. if it has to iterate through the list and count all the elements), then this means that list->tree is Θ(n).

2011-12-18

SICP Exercise 2.63: Binary Trees to Ordered Lists

Each of the following two procedures converts a binary tree to a list.
(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree)
                                          result-list)))))
  (copy-to-list tree '()))
  1. Do the two procedures produce the same result for every tree? If not, how do the results differ? What lists do the two procedures produce for the trees in figure 2.16?
  2. Do the two procedures have the same order of growth in the number of steps required to convert a balanced tree with n elements to a list? If not, which one grows more slowly?
(a) What Do the Procedures Do?
Before we actually look in detail at the internals of the two procedures, let's see how they behave with various trees. It's worth noting that the "Sets as binary trees" section is actually using binary search trees to represent sets, not just binary trees. As a result we'll test the procedures using valid binary search trees:
> (tree->list-1 '())
'()
> (tree->list-2 '())
'()
> (tree->list-1 (make-tree 5 '() '()))
'(5)
> (tree->list-2 (make-tree 5 '() '()))
'(5)
> (tree->list-1 (make-tree 3 (make-tree 2 (make-tree 1 '() '()) '()) '()))
'(1 2 3)
> (tree->list-2 (make-tree 3 (make-tree 2 (make-tree 1 '() '()) '()) '()))
'(1 2 3)
> (tree->list-1 (make-tree 1 '() (make-tree 2 '() (make-tree 3 '() '()))))
'(1 2 3)
> (tree->list-2 (make-tree 1 '() (make-tree 2 '() (make-tree 3 '() '()))))
'(1 2 3)
> (tree->list-1 (make-tree 5
                           (make-tree 3
                                      (make-tree 1
                                                 '()
                                                 (make-tree 2 '() '()))
                                      (make-tree 4 '() '()))
                           (make-tree 6
                                      '()
                                      (make-tree 7 '() '()))))
'(1 2 3 4 5 6 7)
> (tree->list-2 (make-tree 5
                           (make-tree 3
                                      (make-tree 1
                                                 '()
                                                 (make-tree 2 '() '()))
                                      (make-tree 4 '() '()))
                           (make-tree 6
                                      '()
                                      (make-tree 7 '() '()))))
'(1 2 3 4 5 6 7)
As the exercise also asks what the two procedures produce for the trees in figure 2.16, let's try those out too:
> (tree->list-1 (make-tree 7
                           (make-tree 3
                                      (make-tree 1 '() '())
                                      (make-tree 5 '() '()))
                           (make-tree 9
                                      '()
                                      (make-tree 11 '() '()))))
'(1 3 5 7 9 11)
> (tree->list-2 (make-tree 7
                           (make-tree 3
                                      (make-tree 1 '() '())
                                      (make-tree 5 '() '()))
                           (make-tree 9
                                      '()
                                      (make-tree 11 '() '()))))
'(1 3 5 7 9 11)
> (tree->list-1 (make-tree 3
                           (make-tree 1 '() '())
                           (make-tree 7
                                      (make-tree 5 '() '())
                                      (make-tree 9
                                                 '()
                                                 (make-tree 11 '() '())))))
'(1 3 5 7 9 11)
> (tree->list-2 (make-tree 3
                           (make-tree 1 '() '())
                           (make-tree 7
                                      (make-tree 5 '() '())
                                      (make-tree 9
                                                 '()
                                                 (make-tree 11 '() '())))))
'(1 3 5 7 9 11)
> (tree->list-1 (make-tree 5
                           (make-tree 3
                                      (make-tree 1 '() '())
                                      '())
                           (make-tree 9
                                      (make-tree 7 '() '())
                                      (make-tree 11 '() '()))))
'(1 3 5 7 9 11)
> (tree->list-2 (make-tree 5
                           (make-tree 3
                                      (make-tree 1 '() '())
                                      '())
                           (make-tree 9
                                      (make-tree 7 '() '())
                                      (make-tree 11 '() '()))))
'(1 3 5 7 9 11)
Okay, so it looks from the results as if both procedures are performing the equivalent of in-order (or symmetric) tree traversals. Given that these are binary search trees containing numeric values, with the left sub-tree of a node containing only values numerically less than the current node's value and the right-sub-tree of a node containing only values numerically greater than the current node's value, this means that we're expecting both procedures to produce a list containing all of the node values from the tree in numerically ascending order. Let's now look at what each of the two procedures do in turn in order to confirm this.

The first procedure, tree->list-1, processes the child nodes of each node recursively, until it reaches the null children. For any non-null node it appends the tree->list-1 for the left node onto the result of consing the current node's value onto the tree->list-1 for the right node. Assuming that we're dealing with a valid binary search tree then, for any node, this will produce a list containing the current node's value at some index, i such that all the elements in the list before index i are numerically less than the current node's value, while all the elements in the list that are after index i are numerically greater than the current node's value. Given that this is done recursively, we can say that tree->list-1 will indeed produce a list containing all of the node values from the tree in numerically ascending order.

Now let's look at tree->list-1. This procedure (or at least the inner procedure, copy-to-list) is more iterative in nature, building up a result list as it goes along, and stopping once it's run out of tree to process. For any non-null node it recursively calls copy-to-list with the left node as the tree to process, and the result of consing the current node's value onto the list returned by recursively calling copy-to-list with the right node as the tree and the result list as built so far. With the same assumption as before this means that: firstly the head of the list produced by the cons is less than all of the elements in the tail of the list, and secondly all of the elements to be consed onto the head of this list by the call to copy-to-list with the left node as the tree are less than the head of the result list so far. As this is performed recursively, we can also say that tree->list-2 will produce a list containing all of the node values from the tree in numerically ascending order.

How Do the Procedures Grow?
Well, first of all both procedures visit each node once in their traversal of the tree. This means that there's going to be a factor of n in both orders of growth. However, they differ in how they build up the results list...

The first procedure, tree->list-1, uses append to join the sublist produced for the left child node onto the list produced by consing the current node's value onto the sublist produced for the right child node. Now append is an Θ(n) operation, as it must iterate through, in this application, the entirety of the sublist produced for the left child node, and use cons (which we're assuming to be an Θ(1) operation) to append each element in reverse order onto the sublist produced for the right child node. As append is invoked once for each of the n nodes in the tree, tree->list-1 is an Θ(n2) operation.

On the other hand, in each recursive call the second procedure, tree->list-2, uses cons to put the current node's value onto the result list. As noted above we're assuming this to be an Θ(1) operation and, as it's performed once per node, this is performed a total of n times. As a result, tree->list-2 is an Θ(n) operation, and so grows more slowly than tree->list-1.

A Weight off My Shoulders

SICP is quite heavy going... Or at least it's quite heavy going when you're trying to work through it in large chunks on a regular basis and keep a (more than) full-time job going as a software engineer. Not to mention having a life outside work as well...

You may have noticed I've been somewhat lax in keeping up with SICP. I'm not the only one. Several of my colleagues have not been able to keep up with the exercises and, when we reached the end of chapter 2, we took the decision to stop work through the book as a group and instead we'll work through the Berkeley SICP video lectures together. Working through the book is still encouraged, but has been left to individuals to drive their own progress.

So I've been taking a little break from the book. Hey, it's been somewhat busy both at work and home of late, and I've not found enough time to do any of my huge list of personal projects. However, I'm back! I might not manage to maintain the same velocity as I've done previously, but I still intend on working my way through the whole book and posting my solutions to the exercises.

Anyway, I've dropped the progress chart, as that no longer makes sense. And, going forward, the SICP exercise index won't have any week dates on it for the exercises, since they don't make sense either.

SICP Exercise 2.62: Unioning Sets with Ordered Representations

Give a Θ(n) implementation of union-set for sets represented as ordered lists.

Just like in exercise 2.59, the implementation of union-set is similar in many respects to intersection-set. We'll iterate through the two list representations in parallel, comparing the head items of each list at each iteration. However, unlike intersection-set, which only keeps an item if it is present in the heads of both lists, we want to keep all of the items from both lists - but we need to remove duplicates and ensure the resulting set is ordered.

So what are the cases we're going to deal with? Well:
  • If set1 is null then the union is just set2.
  • Conversely, if set2 is null then the union is just set1.
  • If the head items of both lists are the same then we can produce the union by consing that item onto the result of generating the union of the cdrs of both lists. As with intersection-set it doesn't really matter whether we use the head if set1 or set2 for the cons, so we'll arbitrarily use the head of set1.
  • If the head item of set1 precedes the head item of set2 then we can produce the union by consing the head item of set1 onto the result of generating the union of the cdr of set1 and set2.
  • Conversely, if the head item of set2 precedes the head item of set1 then we can produce the union by consing the head item of set2 onto the result of generating the union of set1 and the cdr of set2.
Here's the procedure:
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        (else (let ((x1 (car set1))
                    (x2 (car set2)))
                (cond ((= x1 x2)
                       (cons x1
                             (union-set (cdr set1)
                                        (cdr set2))))
                      ((< x1 x2)
                       (cons x1
                             (union-set (cdr set1)
                                        set2)))
                      ((< x2 x1)
                       (cons x2
                             (union-set set1
                                        (cdr set2)))))))))
Let's see it in action:
> (union-set '() '())
'()
> (union-set '(1 2 3) '())
'(1 2 3)
> (union-set '() '(4 5 6))
'(4 5 6)
> (union-set '(1 2 3) '(4 5 6))
'(1 2 3 4 5 6)
> (union-set '(4 5 6) '(1 2 3))
'(1 2 3 4 5 6)
> (union-set '(1 3 5) '(2 4 6))
'(1 2 3 4 5 6)
> (union-set '(2 4 6) '(1 3 5))
'(1 2 3 4 5 6)
> (union-set '(1 2 3) '(2 3 4))
'(1 2 3 4)

SICP Exercise 2.61: Adjoining to Orderered Representations

Give an implementation of adjoin-set using the ordered representation. By analogy with element-of-set? show how to take advantage of the ordering to produce a procedure that requires on the average about half as many steps as with the unordered representation.

The implementation of element-of-set? given for using the ordered representation iterates through the list until it either finds the item being checked or until it finds an element greater than the item being checked. Our implementation of adjoin-set has to produce a representation that maintains the ordering and can do so using a similar approach. We're going to iterate through the list until we find the appropriate location for the new item to be inserted, and we still have the same four cases to consider. However, as we need to build up a result list as we go along, the actions taken for each case differ:
  • If we're adjoining to a null list, then the result is simply a list containing only the item being adjoined.
  • If the head of the list is the same as the item being adjoined then the item is already present in the list and so we don't need to modify the list and can just use the current list as the result.
  • If the head of the list is greater than the item being adjoined then the item needs to go at the head of the list, and so we need to cons the item onto the list.
  • Otherwise (i.e. if the head of the list is less then the item being adjoined) then the item needs to be inserted later in the list. We can do this by consing the head of the list onto the result of adjoining the item to the cdr of the list.
An appropriate implementation of the procedure is therefore:
(define (adjoin-set x set)
  (cond ((null? set) (list x))
        ((= x (car set)) set)
        (( < x (car set)) (cons x set))
        (else (cons (car set)
                    (adjoin-set x (cdr set))))))
Here it is in action:
> (adjoin-set 4 '())
'(4)
> (adjoin-set 3 '(4))
'(3 4)
> (adjoin-set 10 '(3 4))
'(3 4 10)
> (adjoin-set 6 '(3 4 10))
'(3 4 6 10)
> (adjoin-set 7 '(3 4 6 10))
'(3 4 6 7 10)
> (adjoin-set 5 '(3 4 6 7 10))
'(3 4 5 6 7 10)
Similar to the element-of-set? implementation, assuming that the items we adjoin are of many different sizes and are randomly ordered, then we'll sometimes be able to insert items near (and so stop iterating near) the start of the list, while for other items we'll have to insert them later in the list, and so on average we'll need Θ(n/2) steps to adjoin items.

It's worth noting that the ordering of the items processed is important for adjoin-set in making this assertion, whereas it wasn't for element-of-set?. For example, if you take any set and then check each of it's members is present in the set using element-of-set? then this will require, on average, n/2 recursive calls to element-of-set? to perform the check, regardless of the ordering we use. However, if we start with an empty list and try to build a set by inserting items in numerically ascending order (i.e. (adjoin-set 3 (adjoin-set 2 (adjoin-set 1 '())))) then each adjoin will require n recursive calls to adjoin-set - the first will require 1 call, the second will require 2 calls, and so on, as for each new item it will need to iterate to the end of the current list to find the appropriate location for the item. On the other hand, if we build the set by inserting items in numerically descending order (i.e. (adjoin-set 1 (adjoin-set 2 (adjoin-set 3 '())))) then each adjoin will only require 1 call to adjoin-set, as each item will go at the head of the current list.

2011-12-16

SICP Exercise 2.60: Duplicate Sets

We specified that a set would be represented as a list with no duplicates. Now suppose we allow duplicates. For instance, the set {1,2,3} could be represented as the list (2 3 2 1 3 2 2). Design procedures element-of-set?, adjoin-set, union-set, and intersection-set that operate on this representation. How does the efficiency of each compare with the corresponding procedure for the non-duplicate representation? Are there applications for which you would use this representation in preference to the non-duplicate one?

We're still using a list as our representation here. We've just removed the restriction on duplicates. This means that those set operations which create a new set by taking an existing set and potentially adding new items to it no longer need to check if those items are present before adding them. This means adjoin-set and union-set no longer need to perform those checks. We'll look at those set operations in detail in a minute, but let's first look at the other two set operations: element-of-set and intersection-set.

The implementation of element-of-set given in the book simply iterates through the list representation of the set and returns true as soon as it encounters a matching element, or false if there is no such matching element. Allowing duplicates in the list representation doesn't change this approach. We still need to scan through the list to see if the element is present. It's just that, where duplicates exist, we may compare against the same value multiple times. So element-of-set can be used as is - it may run slower though, as the presence of duplicates means that the lists we're scanning may be longer than their equivalents when duplicates are not allowed.

We can see that the same holds true for intersection-set. The implementation in the book iterates through the list representation of set1 and, for each element, adds it to the result set iff it's also present in set2. If there are duplicates in either set1 or set2 this doesn't really change this approach. We still need to examine each item in set1 and only include it in the result set if it's also in set2, so we can reuse the existing implementation of intersection-set. Note that the result set could contain duplicates: any element that appears in both sets will appear as many times as it is present in set1. As with element-of-set this means that it may also run slower and may generate larger sets in comparison with using equivalent sets where duplicates are not allowed.

Now onto adjoin-set... In the "no duplicates" implementation provided in the book, adjoin-set checks to see whether the element is already in the representation before appending it. As we're allowed duplicates in our representation we don't need this test - we can just cons it onto the head. This gives us a very simple implementation:
(define (adjoin-set x set)
  (cons x set))
...or, to put it more succinctly...
(define adjoin-set cons)
As this simply puts the item on the head of the list this makes it an Θ(1) operation, and so is much more efficient than the "no duplicates" implementation (which is Θ(n)). Note that if we wanted to be slightly smart about it, still allow duplicates generally, and still retain the Θ(1) efficiency, we could simply check the head of the list to ensure that we're not putting another identical element onto the head of the list:
(define (adjoin-set x set)
  (if (or (null? set) (not (equal? x (car set))))
      (cons x set)
      set))
However, given that the example representation of the set {1,2,3} ((2 3 2 1 3 2 2)) includes two '2' values adjacent to each other, this isn't a necessary optimization.

The last operation, union-set needs to produce the set of all values in both set1 and set2. As we no longer need to worry about duplicates we can simply append the two sets together to produce the result we need:
(define (union-set set1 set2)
  (append set1 set2))
...or, more succinctly...
(define union-set append)
Okay, so let's build some sets:
> (define evens
    (adjoin-set 0 (adjoin-set 2 (adjoin-set 4 (adjoin-set 6 (adjoin-set 8 '()))))))
> (define odds
    (adjoin-set 1 (adjoin-set 3 (adjoin-set 5 (adjoin-set 7 (adjoin-set 9 '()))))))
> evens
'(0 2 4 6 8)
> odds
'(1 3 5 7 9)
> (adjoin-set 2 evens)
'(2 0 2 4 6 8)
> (adjoin-set 2 odds)
'(2 1 3 5 7 9)
> (intersection-set evens odds)
'()
> (intersection-set evens evens)
'(0 2 4 6 8)
> (union-set evens odds)
'(0 2 4 6 8 1 3 5 7 9)
> (union-set evens evens)
'(0 2 4 6 8 0 2 4 6 8)
Let's compare efficiencies of implementations:

OperationNo DuplicatesAllow Duplicates
element-of-set?Θ(n)Θ(n)
adjoin-setΘ(n)Θ(1)
intersection-setΘ(n2)Θ(n2)
union-setΘ(n2)Θ(n)

Of course we need to be slightly careful in the comparison here... While element-of-set? is Θ(n) regardless of whether or not we allow duplicates, the n here is the number of elements in the list representation of the set, not the number of distinct elements in the set. As a result, n could be much larger in the "allow duplicates" case than in the "no duplicates" case, and so the operation could be much slower. A similar issue arises for intersection-set too (except we're dealing with Θ(n2) here, so the effect can be much more exacerbated).

We also need to be aware of this issue when comparing the two union-set efficiencies. While the "allow duplicates" case is definitely more efficient (at Θ(n) as opposed to the no duplicates efficiency of Θ(n2)), the value of n in the "allow duplicates" case could potentially be much higher. For example, consider the set {1, 2, 3}. In the "no duplicates" case the size of the underlying list we have to process (and so the n we are dealing with) will always be 3. However, in the "allow duplicates" case all we know for sure is that it will be at least 3 - but (theoretically) there's no upper bound on the size of the underlying list we have to process, so with certain list representations we may find that the Θ(n) operation in the "allow duplicates" case may still perform worse than the Θ(n2) operation in the "no duplicates case.

With adjoin-set there is no such issue. The "allow duplicates" case will take constant time regardless of the size of the underlying representation (or at least we're assuming that this is how cons works). As a result it doesn't matter that there may be duplicates in the set - this has no effect on the efficiency of the operation, and so the "allow duplicates" case will generally be quicker than the "no duplicates" case.

So when would we use this representation in preference to the non-duplicate one? Well, in applications where we're going to use adjoin-set much more frequently than any of the other operations (and where memory is not a concern) it may be preferable to use the implementation that allows duplicates.