2013-04-27

A Temporary Interruption to Service

Honest, I'm working on SICP exercise 2.97. In fact I've finished it and I'm in the process of writing it up. However, I'm in Seattle for a couple of weeks of work, which is keeping me hectically busy. I've also just started Andrew Ng's Machine Learning course on Coursera. Incidentally, he gave a keynote speech at a conference I was attending the other day. A real smart guy! Add to that my work being busy as ever and the minor additional factor of my beautiful almost-5-month old daughter and I've got no free time at all at the moment!
Normal service will be resumed shortly!

2013-03-12

SICP Exercise 2.96: Pseudo-Remainder Terms

  1. Implement the procedure pseudoremainder-terms, which is just like remainder-terms except that it multiplies the dividend by the integerizing factor described above before calling div-terms. Modify gcd-terms to use pseudoremainder-terms, and verify that greatest-common-divisor now produces an answer with integer coefficients on the example in exercise 2.95.
  2. The GCD now has integer coefficients, but they are larger than those of P1. Modify gcd-terms so that it removes common factors from the coefficients of the answer by dividing all the coefficients by their (integer) greatest common divisor.

Part A: Introducing the Integerizing Factor

The calculation of the integerizing factor and the implementation of pseudoremainder-terms are described in the book as follows:
[…] if P and Q are polynomials, let O1 be the order of P (i.e., the order of the largest term of P) and let O2 be the order of Q. Let c be the leading coefficient of Q. Then it can be shown that, if we multiply P by the integerizing factor c1 + O1 - O2, the resulting polynomial can be divided by Q by using the div-terms algorithm without introducing any fractions. The operation of multiplying the dividend by this constant and then dividing is sometimes called the pseudodivision of P by Q. The remainder of the division is called the pseudoremainder.
Note that, in order to calculated the integerizing factor, we need to be able to calculate exponents. Note also that, within our system, the c in c1 + O1 - O2 is a tagged type. This means that we'll need to add support for calculating exponentiation to our system. On the other hand, O1 and O2 are primitive integer values, so we'll need to convert the results of 1 + O1 - O2 to a tagged integer value so that it can be used as the exponent.
Let's begin with implementing support for exponentiation.
The book states that the modified GCD algorithm "really works only in the case of polynomials with integer coefficients". This means that our coefficient will always be an integer. As a result we can be pragmatic here and define the exponent operation for the integer package only. While defining it for the rational and real packages would be straightforward, this allows us to avoid having to define a generic logarithm operation in order to support complex numbers, as well as avoiding the complexities of calculating exponents for polynomial types.
We could define the integer implementation using the fast-expt implementation given in section 1.2.4 of the book. However, I chose simply to delegate the calculation to Scheme's built-in expt procedure:
(put 'exponent '(integer integer)
  (lambda (x y) (tag (expt x y))))
With that in place we can define a generic exponent operation in the usual manner:
(define (exponent b e)
  (apply-generic 'exponent b e))
We can now implement pseudoremainder-terms. This will still call through to div-terms and return the cadr of the results as before. However, we need to apply the integerizing factor to the first term-list and then apply div-terms to divide the resulting term-list by second term-list. As noted above, the result of the calculation 1 + O1 - O2 needs converted to a tagged integer type using make-integer in order that it can be used as the exponent when applying the generic exponent operation. In the implementation below we then convert this integerizing factor to a zero-order term in order that we can use our existing mul-term-by-all-terms procedure to apply the factor to the first term-list. Here's the code:
(define (pseudoremainder-terms L1 L2)
  (let* ((t1 (first-term L1))
         (t2 (first-term L2))
         (factor (exponent (coeff t2)
                           (make-integer (+ 1 (order t1) (- (order t2))))))
         (pseudoL1 (mul-term-by-all-terms (make-term 0 factor) L1)))
    (cadr (div-terms pseudoL1 L2))))
All that remains in part A is to modify gcd-terms to use pseudoremainder-terms instead of remainder-terms and we're done:
 (define (gcd-terms L1 L2)
  (if (empty-termlist? L2)
      L1
      (gcd-terms L2 (pseudoremainder-terms L1 L2))))
Let's see this in action. If you recall, in exercise 2.95 we defined two polynoials, q1 and q2, and calculating their GCD gave the following result:
> (greatest-common-divisor q1 q2)
(polynomial x dense-terms (rational (integer . 18954) integer . 2197)
                          (rational (integer . -37908) integer . 2197)
                          (rational (integer . 1458) integer . 169))
When we repeat this calculation with pseudoremainder-terms we now get the following result:
> (greatest-common-divisor q1 q2)
(polynomial x dense-terms (integer . 1458) (integer . -2916) (integer . 1458))
As you can see, just like in exercise 2.95, the coefficients hold the same relative ratios as the coefficients of P1, i.e. 1:-2:1. However, the resulting polynomial now has (large) integer coefficients throughout.

Part B: Removing Common Factors

We now want to remove the common factors from the coefficients of the result of gcd-terms so that we simplify the calculated GCD as much as possible. If we assume we have a procedure, remove-common-factors, which does this then our gcd-terms implementation becomes:
(define (gcd-terms L1 L2)
  (if (empty-termlist? L2)
      (remove-common-factors L1)
      (gcd-terms L2 (pseudoremainder-terms L1 L2))))
So what does remove-common-factors need to do? Well firstly it will need to determine the highest common factor (a.k.a. the GCD) of all of the coefficients and then it will need produce a new term-list by dividing each of the coefficients in the term-list by this common factor.
In order to find the GCD of the coefficients, first observe that the GCD of any two coefficients must itself have the GCD of all of the coefficients as a factor. So in order to determine the GCD of all of the coefficients we can simply start with the first coefficient as our initial GCD candidate and then repeatedly find the GCD of our current candidate and the next coefficient until we've processed the entire term-list. Programmatically this can be expressed as:
(define (find-gcd c L)
  (if (empty-termlist? L)
      c
      (find-gcd (greatest-common-divisor c (coeff (first-term L)))
                (rest-terms L))))
Obviously we can't use this to find the GCD of an entirely empty term-list, as there will be no first term to start with. However, this is just a helper procedure to support the implementation of remove-common-factors. As an empty term-list with its common factors removed is just an empty term-list we won't even need to invoke find-gcd in such a case. This provides a basic skeleton for our implementation of remove-common-factors: we determine whether we're dealing with an empty term-list, returning an empty term-list if we are and actually performing the removal if we're not. Here's the skeleton:
(define (remove-common-factors L)
  (if (empty-termlist? L)
      L
      <RemoveCommonFactors>))
All that remains is to use find-gcd to determine the GCD of the coefficients and then divide all of the coefficients by this GCD. One way of achieving this division would be to write a procedure that iterates through the term-list, applying the division to each coefficient and building a new term-list with the results. However, we already have a method of dividing a term-list by another value: div-terms, so it makes sense to use that instead. Unfortunately div-terms requires a term-list as the divisor, and find-gcd produces a tagged integer value, not a term-list. This means we'll need to first convert our tagged integer value into a zero-order term-list. We can do this by creating an empty termlist (with the-empty-termlist), creating a zero-order term with the GCD of the coefficients as its coefficient, and then adding this term to the empty term-list (with adjoin-term). Note that the division is guaranteed to produce no remainder, so we can simply return the calculated quotient as the simplified GCD.
Here's the full implementation of remove-common-factors:
  (define (remove-common-factors L)
    (if (empty-termlist? L)
        L
        (let* ((gcd-coeff (find-gcd (coeff (first-term L)) (rest-terms L)))
               (divisor (adjoin-term (make-term 0 gcd-coeff) (the-empty-termlist))))
          (car (div-terms L divisor)))))
With that in place we can now retry our calculation of the GCD of q1 and q2 again:
> (greatest-common-divisor q1 q2)
(polynomial x dense-terms (integer . 1) (integer . -2) (integer . 1))
This time we get the expected result. We can now move on to the last exercise in the chapter, where we'll use gcd-poly to reduce rational functions to their simplest terms.

2013-02-28

SICP Exercise 2.95: GCD and Non-Integer Values

Define P1, P2, and P3 to be the polynomials
P1: x2 - 2x + 1
P2: 11x2 + 7
P3: 13x + 5
Now define Q1 to be the product of P1 and P2 and Q2 to be the product of P1 and P3, and use greatest-common-divisor (exercise 2.94) to compute the GCD of Q1 and Q2. Note that the answer is not the same as P1. This example introduces noninteger operations into the computation, causing difficulties with the GCD algorithm. To understand what is happening, try tracing gcd-terms while computing the GCD or try performing the division by hand.
First let's define P1, P2 and P3 as p1, p2 and p3 respectively:
> (define p1 (make-polynomial-from-coeffs 'x
                                          (list (make-integer 1)
                                                (make-integer -2)
                                                (make-integer 1))))
> (define p2 (make-polynomial-from-coeffs 'x
                                          (list (make-integer 11)
                                                zero
                                                (make-integer 7))))
> (define p3 (make-polynomial-from-coeffs 'x
                                          (list (make-integer 13)
                                                (make-integer 5))))
We then use these to define Q1 and Q2 as q1 and q2 respectively:
> (define q1 (mul p1 p2))
> (define q2 (mul p1 p3))
> q1
(polynomial x dense-terms (integer . 11)
                          (integer . -22)
                          (integer . 18)
                          (integer . -14)
                          (integer . 7))
> q2
(polynomial x dense-terms (integer . 13)
                          (integer . -21)
                          (integer . 3)
                          (integer . 5))
Finally we calculate the GCD of Q1 and Q2:
> (greatest-common-divisor q1 q2)
(polynomial x dense-terms (rational (integer . 18954) integer . 2197)
                          (rational (integer . -37908) integer . 2197)
                          (rational (integer . 1458) integer . 169))
As the authors note in the exercise statement, the answer is not the same as P1, it does introduce non-integer operations into the computation and these then cause difficulties with the GCD algorithm. However, if you look closely at the numbers you might notice a correlation with P1. Note that 2197 / 13 = 169. We can use this to first express all the coefficients as multiples of 1/169:
  (18954/2197)x2 - (37908/2197)x + 1458/169
= ((18954/13)/(2197/13))x2 - ((37908/13)/(2197/13))x + 1458/169
= (1458/169)x2 - (2916/169)x + 1458/169
Now let's normalize this by multiplying all the coefficients by 169/1458:
  ((1458/169)x2 - (2916/169)x + 1458/169) * (169/1458)
= ((1458*169)/(169*1458))x2 - ((2916*169)/(169*1458))x + (1458*169)/(169*1458)
= ((1458*169)/(169*1458))x2 - ((2916*169)/(169*1458))x + (1458*169)/(169*1458)
= (246402/246402)x2 - (492804/246402)x + 246402/246402
= x2 - 2x + 1
= P1
As you can see, the calculated GCD's term coefficients have the same relative ratios as the coefficients of P1, i.e. 1:-2:1.
Let's trace the calculation. The Scheme interpreter I'm using has a trace procedure, which takes a procedure as an argument and turns on tracing of all calls to that passed procedure. However, I found this to be limited in use: when a trace output line is "too long" it arbitrarily collapses each argument passed to the procedure to "#". This is obviously no use for what we'd like to achieve here. As a result, I quickly hacked a trace directly into the procedures gcd-terms and div-terms so that I could get a better picture of what's going on.
With this in place my traced call produced the following output, with the point where non-integer values enter the calculation highlighted in yellow:
> (greatest-common-divisor q1 q2)
gcd-terms:
+-> (dense-terms (integer . 11)
|                (integer . -22)
|                (integer . 18)
|                (integer . -14)
|                (integer . 7))
+-> (dense-terms (integer . 13)
|                (integer . -21)
|                (integer . 3)
|                (integer . 5))
| div-terms:
| +-> (dense-terms (integer . 11)
| |                (integer . -22)
| |                (integer . 18)
| |                (integer . -14)
| |                (integer . 7))
| +-> (dense-terms (integer . 13)
| |                (integer . -21)
| |                (integer . 3)
| |                (integer . 5))
| | div-terms:
| | +-> (dense-terms (rational (integer . -55) integer . 13)
| | |                (rational (integer . 201) integer . 13)
| | |                (rational (integer . -237) integer . 13)
| | |                (integer . 7))
| | +-> (dense-terms (integer . 13)
| | |                (integer . -21)
| | |                (integer . 3)
| | |                (integer . 5))
| | | div-terms:
| | | +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | | |                 (term 1 (rational (integer . -37908) integer . 2197))
| | | |                 (term 0 (rational (integer . 1458) integer . 169)))
| | | +-> (dense-terms (integer . 13)
| | | |                (integer . -21)
| | | |                (integer . 3)
| | | |                (integer . 5))
| | | +-= ((sparse-terms)
| | |      (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | |                    (term 1 (rational (integer . -37908) integer . 2197))
| | |                    (term 0 (rational (integer . 1458) integer . 169))))
| | +-= ((sparse-terms (term 0 (rational (integer . -55) integer . 169)))
| |      (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| |                    (term 1 (rational (integer . -37908) integer . 2197))
| |                    (term 0 (rational (integer . 1458) integer . 169))))
| +-= ((sparse-terms (term 1 (rational (integer . 11) integer . 13))
|                    (term 0 (rational (integer . -55) integer . 169)))
|      (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
|                    (term 1 (rational (integer . -37908) integer . 2197))
|                    (term 0 (rational (integer . 1458) integer . 169))))
| gcd-terms:
| +-> (dense-terms (integer . 13)
| |                (integer . -21)
| |                (integer . 3)
| |                (integer . 5))
| +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| |                 (term 1 (rational (integer . -37908) integer . 2197))
| |                 (term 0 (rational (integer . 1458) integer . 169)))
| | div-terms:
| | +-> (dense-terms (integer . 13)
| | |                (integer . -21)
| | |                (integer . 3)
| | |                (integer . 5))
| | +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | |                 (term 1 (rational (integer . -37908) integer . 2197))
| | |                 (term 0 (rational (integer . 1458) integer . 169)))
| | | div-terms:
| | | +-> (dense-terms (rational (integer . 208209690) integer . 41641938)
| | | |                (rational (integer . -32032260) integer . 3203226)
| | | |                (integer . 5))
| | | +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | | |                 (term 1 (rational (integer . -37908) integer . 2197))
| | | |                 (term 0 (rational (integer . 1458) integer . 169)))
| | | | div-terms:
| | | | +-> (sparse-terms)
| | | | +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | | | |                 (term 1 (rational (integer . -37908) integer . 2197))
| | | | |                 (term 0 (rational (integer . 1458) integer . 169)))
| | | | +-= ((sparse-terms)
| | | |      (sparse-terms))
| | | +-= ((sparse-terms (term 0 (rational (integer . 457436688930)
| | |                                       integer . 789281292852)))
| | |      (sparse-terms))
| | +-= ((sparse-terms (term 1 (rational (integer . 28561) integer . 18954))
| |                    (term 0 (rational (integer . 457436688930)
| |                                       integer . 789281292852)))
| |      (sparse-terms))
| | gcd-terms:
| | +-> (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| | |                 (term 1 (rational (integer . -37908) integer . 2197))
| | |                 (term 0 (rational (integer . 1458) integer . 169)))
| | +-> (sparse-terms)
| | +-= (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
| |                   (term 1 (rational (integer . -37908) integer . 2197))
| |                   (term 0 (rational (integer . 1458) integer . 169)))
| +-= (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
|                   (term 1 (rational (integer . -37908) integer . 2197))
|                   (term 0 (rational (integer . 1458) integer . 169)))
+-= (sparse-terms (term 2 (rational (integer . 18954) integer . 2197))
                  (term 1 (rational (integer . -37908) integer . 2197))
                  (term 0 (rational (integer . 1458) integer . 169)))
(polynomial x dense-terms (rational (integer . 18954) integer . 2197)
                          (rational (integer . -37908) integer . 2197)
                          (rational (integer . 1458) integer . 169))
As you can see the non-integer values enter the calculation when the first invocation of div-terms by gcd-terms (via remainder-terms) makes a recursive call to itself. The fraction first enters the calculations when div-terms calculates the coefficient to use for the first term of the result. This is calculated by dividing the coefficient of the highest-order term from the first term-list by the coefficient of the highest-order term from the second term-list. Provided the latter is a factor of the former we will avoid non-integer values entering the computation. However in this case the coefficients are 11 and 13 respectively which, being prime numbers, are relatively prime to each other and so share no factors. Not only does this mean the coefficient of the first term of the result will be non-integer, but it also affects the calculations of both the coefficients for the other terms in the result and the remainder.
You may also have noticed in the trace above that the denominators and numerators of the rational numbers that appear in our calculations get exponentially larger as the algorithm progresses. This is a known limitation of Euclidean division, which is the algorithm we're using to implement div-terms. In the next exercise we'll address this by first implementing a procedure which calculates pseudo-remainders, which guarantees that no non-integer coefficents will result from the calculation, and then incorporating this into our gcd-terms implementation.