JoT's Jottings: 2011-09

2011-09-28

SICP Exercise 2.28: On the Fringe

Write a procedure fringe that takes as argument a tree (represented as a list) and returns a list whose elements are all the leaves of the tree arranged in left-to-right order. For example,

(define x (list (list 1 2) (list 3 4)))

(fringe x)
(1 2 3 4)

(fringe (list x x))
(1 2 3 4 1 2 3 4)

What we want to do here is to iterate through the list we're given and, for each element determine whether or not it's a list itself. If it isn't then we can just cons the current item onto the head of the list we're going to produce by applying fringe to the rest of the list. If it is a list then we want to apply fringe to the element itself to get a list of all the leaf node values under that element. However, we can't just cons this onto the head of the list we're going to produce by applying fringe to the rest of the list, as that will give a nested list. Instead we need to append the lists together so that we get a single non-nested list.

Here's my implementation:

(define (fringe l)
  (cond ((null? l) nil)
        ((pair? (car l)) (append (fringe (car l)) (fringe (cdr l))))
        (else (cons (car l) (fringe (cdr l))))))

...and here it is in action:

> (define x (list (list 1 2) (list 3 4)))
> (define y (list 1 (list 2 (list 3 4) 5) (list 6 7) 8))
> (fringe x)
(1 2 3 4)
> (fringe (list x x))
(1 2 3 4 1 2 3 4)
> (fringe y)
(1 2 3 4 5 6 7 8)
> (fringe (list y y))
(1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8)

SICP Exercise 2.27: Deep Reverse

Modify your reverse procedure of exercise 2.18 to produce a deep-reverse procedure that takes a list as argument and returns as its value the list with its elements reversed and with all sublists deep-reversed as well. For example,

> (define x (list (list 1 2) (list 3 4)))

> x
((1 2) (3 4))

> (reverse x)
((3 4) (1 2))

> (deep-reverse x)
((4 3) (2 1))

Here's the reverse procedure we produced for exercise 2.18:

(define (reverse l)
  (define (iter remaining result)
    (if (null? remaining)
        result
        (iter (cdr remaining) (cons (car remaining) result))))
  (iter l nil))

...and we can indeed confirm its behavior matches that given above:

> (define x (list (list 1 2) (list 3 4)))
> (reverse x)
((3 4) (1 2))

reverse is designed to reverse the ordering of elements in the top-level list. It does not touch the contents of the elements themselves.

To implement deep-reverse we need to reverse the list as we do in reverse, but we also need to identify those elements of the list that are pairs themselves and apply the reversal process to those as well. This means at each iteration we now have three cases instead of two to consider:

If the remaining list is empty then return the result.
If the head of remaining is not a pair then put it onto the head of the result list and make the iterative call as before.
If the head of remaining is a pair then put the result of deep reversing that onto the head of the result list and make the iterative call.

We can translate this into a procedure as follows:

(define (deep-reverse l)
  (define (iter remaining result)
    (cond ((null? remaining) result)
          ((pair? (car remaining))
            (iter (cdr remaining) (cons (deep-reverse (car remaining)) result)))
          (else (iter (cdr remaining) (cons (car remaining) result)))))
  (iter l nil))

...and check it works as expected:

> (deep-reverse x)
((4 3) (2 1))
> (deep-reverse (list (list (list 1 2) (list 3 4) 5) (list 6 (list 7 8) 9) (list 10)))
((10) (9 (8 7) 6) (5 (4 3) (2 1)))

SICP Exercise 2.26: Combining Lists

Suppose we define x and y to be two lists:

(define x (list 1 2 3))
(define y (list 4 5 6))

What result is printed by the interpreter in response to evaluating each of the following expressions:

(append x y)

(cons x y)

(list x y)

Let's fire up our interpreter and find out...

> (define x (list 1 2 3))
> (define y (list 4 5 6))
> (append x y)
(1 2 3 4 5 6)
> (cons x y)
((1 2 3) 4 5 6)
> (list x y)
((1 2 3) (4 5 6))

In the first case we're using append which takes two lists and produces a new list containing all of the elements from both lists in order.

In the second case we're using cons, which creates a pair using its two arguments. As the second argument is a list the resulting structure will also be a list... and as the first argument is a list the first element of the list will be that list.

In the final case we're using list, which creates a list containing all of its arguments. We have two arguments, both of which are lists, so the resulting structure is a two-element list, each element of which is a list.

SICP Exercise 2.25: Nested `car`s and `cdr`s

Give combinations of cars and cdrs that will pick 7 from each of the following lists:

(1 3 (5 7) 9)

((7))

(1 (2 (3 (4 (5 (6 7))))))

We can take this exercise quite literally and just use car and cdr. Let's do that to start with. Then we'll look at using abbreviations for the nested combinations as outlined in footnote 9 in this section.

Let's start at the top then. (list 1 3 (list 5 7) 9) defines a list with 4 elements, the third of which is a list of two elements, the last element of which is 7. First of all we need to get to the third element in the outer list. We know that, when processing lists, car gives us head of the current list and cdr gives us the tail of the current list. Taking the tail of the outer list will give us the three tail elements, of which the second is the one we want, so we need to take the tail of this again to give us the tail two elements, and then take the head to give us the inner list. The inner list has 7 as its second element, so we need to take the tail of this list, which is a single element list (i.e. a pair with a value, 7 in this case, as the first element and nil as the second). So we finally take the head of this list to get to 7.

In practice this is:

> (car (cdr (car (cdr (cdr (list 1 3 (list 5 7) 9))))))
7

The second list, (list (list 7)), is a list with a single element which is itself a list with a single element, 7. So to get to 7 we can take the head of the outer list, giving us the inner list, and then take the head of the inner list, to give us 7. In other words:

> (car (car (list (list 7))))
7

In the case of the third list we have a repeated structure running through it. Starting at the outer list we have a list of two elements: the first is a numeric value and the second is itself a list of the same structure. This nesting of lists through the second element repeats all the way down to the sixth list, where the second element is the value 7 instead of another list. Now to get the second value of a list we need to take the head of the tail of the list (i.e. (car (cdr list))). As the value 7 is the second element of the sixth list in this repeated structure, we can simply repeat this action six times to pick it out:

> (car
    (cdr
      (car
        (cdr
          (car
            (cdr
              (car
                (cdr
                  (car
                    (cdr
                      (car
                        (cdr
                          (list 1
                                (list 2
                                      (list 3
                                            (list 4
                                                  (list 5
                                                        (list 6
                                                              7))))))))))))))))))
7

All these nested cars and cdrs are pretty ugly though and, as noted in the footnote referenced above, Scheme provides abbreviations for such nesting. The footnote states that "The names of all such procedures start with c and end with r. Each a between them stands for a car operation and each d for a cdr operation, to be applied in the same order in which they appear in the name." However, we can't just change e.g. the last one to (cadadadadadadr <list>). The Scheme standard only requires that interpreters support combinations up to 4 deep. We can still nest those though, and so reduce the levels of nesting greatly:

> (car (cdaddr (list 1 3 (list 5 7) 9)))
7
> (caar (list (list 7)))
7
> (cadadr (cadadr (cadadr (list 1 (list 2 (list 3 (list 4 (list 5 (list 6 7)))))))))
7

SICP Exercise 2.24: Lists, Boxes, Pointers and Trees

Suppose we evaluate the expression (list 1 (list 2 (list 3 4))). Give the result printed by the interpreter, the corresponding box-and-pointer structure, and the interpretation of this as a tree (as in figure 2.6).

Okay, interpreter first:

> (list 1 (list 2 (list 3 4)))
(1 (2 (3 4)))

Now to produce the corresponding box-and-pointer representation we need to remember that a list is a sequence of pairs where the first element of the n^th pair is the n^th value in the list, and the second element of the n^th pair is either another pair (the (n+1)^th pair) or is nil if the list has n elements. So, for example, at the top level we have a list containing two elements, the first of which is the value 1 and the second is the list produced by (list 2 (list 3 4)). This will be represented by pair with the first element being the value 1 and the second element being another pair. This second pair will have its first element being the list produced by (list 2 (list 3 4)) and its second element being nil.

Applying this reasoning to the whole structure we can produce the following box-and-pointer representation:

When representing lists of lists as trees we can ignore the underlying pair-based representation and simply concentrate on the members of each level of list and their ordering. Now in this case every list has two members: a numeric value as the first member and a sublist as the second member (apart from the last list, where the second member is also a numeric value). This can be represented as a tree as follows:

2011-09-27

SICP Exercise 2.23: For Each

The procedure for-each is similar to map. It takes as arguments a procedure and a list of elements. However, rather than forming a list of the results, for-each just applies the procedure to each of the elements in turn, from left to right. The values returned by applying the procedure to the elements are not used at all -- for-each is used with procedures that perform an action, such as printing. For example,

> (for-each (lambda (x) (newline) (display x))
            (list 57 321 88))
57
321
88

The value returned by the call to for-each (not illustrated above) can be something arbitrary, such as true. Give an implementation of for-each.

Well, the definition of map given in the book is:

(define (map proc items)
  (if (null? items)
      nil
      (cons (proc (car items))
            (map proc (cdr items)))))

The procedure for-each differs from map in only a couple of points:

It doesn't need to cons the results of applying proc to (car items) - it just needs to apply the procedure.
It can just return #t instead of a list.

We could just use map to implement this. After all, map applies proc to each element in turn. It happens to produce a list as a result, but we could throw this away and return #t:

(define (for-each proc items)
  (map proc items)
  #t)

This works as expected:

> (for-each (lambda (x) (newline) (display x))
            (list 57 321 88))
57
321
88 #t

However, this has the drawback that we're generating a potentially very large intermediate list and just throwing it away, so let's try to be a bit more space efficient.

When we try this we'll discover that, while we can replace nil with #t okay, we can't just drop the cons from around the application of proc to the head of the list and the recursion of the tail of the list. While we now want to evaluate a couple of procedures separately in the alternative of if, if doesn't support this. It takes three parameters only: the predicate, consequent and alternative. However, we can get around this by replacing the if with a cond as follows:

(define (for-each proc items)
  (cond ((null? items) #t)
        (else (proc (car items))
              (for-each proc (cdr items)))))

Again, this gives the expected results

> (for-each (lambda (x) (newline) (display x))
            (list 57 321 88))
57
321
88 #t
> (for-each (lambda (x) (newline) (display x))
            (list 1 2 3 4 5))
1
2
3
4
5 #t

SICP Exercise 2.22: Reversed Squares

Louis Reasoner tries to rewrite the first square-list procedure of exercise 2.21 so that it evolves an iterative process:

(define (square-list items)
  (define (iter things answer)
    (if (null? things)
        answer
        (iter (cdr things) 
              (cons (square (car things))
                    answer))))
  (iter items nil))

Unfortunately, defining square-list this way produces the answer list in the reverse order of the one desired. Why?

Louis then tries to fix his bug by interchanging the arguments to cons:

(define (square-list items)
  (define (iter things answer)
    (if (null? things)
        answer
        (iter (cdr things)
              (cons answer
                    (square (car things))))))
  (iter items nil))

This doesn't work either. Explain.

Back in exercise 2.18 we defined the following procedure for reversing a list:

(define (reverse l)
  (define (iter remaining result)
    (if (null? remaining)
        result
        (iter (cdr remaining) (cons (car remaining) result))))
  (iter l nil))

Now looking carefully at Louis' first iterative implementation of square-list we can see that it is practically identical to our implementation of reverse. Ignoring the different names and formatting, it differs only in the first argument passed to cons when building up the result and equivalent answer lists. In the case of reverse it simply uses (car remaining). However, for Louis' first iterative implementation of square-list it uses (square (car things)). I.e. it squares the car of the remaining elements to process, rather than just using the car of the remaining list directly.

Based on this we can say that what Louis has done is to simply extend the reverse procedure so that it squares each value during the reversal procedure. He's starting with an empty list as the result and then going through the list to square starting with the head, squaring the current value and putting the result onto the head of the result list. So the square of the first element becomes the last item in the list, the square of the second element becomes the second-last item in the list and so on.

Louis' second attempt at producing an iterative implementation of square-list exchanges the two arguments to cons. While this corrects the ordering of the squares it no longer produces a list as the result.

What we find instead is that answer, the result list build up so far, is used as the first argument, while (square (car things)), the square of the current element, is used as the second argument. Now cons only builds a pair - it does not guarantee to produce a validly constructed list. To produce a valid list, although not necessarily a flattened list, from cons the second argument must be itself a valid list. Instead what Louis' procedure does is to return a pair, the second element of which is the square of the last item in the list, while the first element in the pair is itself a pair, with the second element being the square of the second-last item in the list and the first element being a pair... And this nesting of pairs in the first element continues until we reach a pair in which the first element is the empty list and the second element is the square of the first element in the list.

Of course a picture's worth a thousand words, so let's see it in action:

> (square-list (list 1 2 3 4 5 6 7 8 9 10))
((((((((((() . 1) . 4) . 9) . 16) . 25) . 36) . 49) . 64) . 81) . 100)

If you look at this structure closely what you'll see that the pairs that are constructed are a sort of "reverse list structure". A list is constructed of pairs in which the second element of each pair in the list is itself a pair, except for the last pair where it is nil. The values held in the list are the first elements of each pair. What Louis' implementation has constructed is a data structure composed of pairs in which the first element of each pair is itself a pair, except for the first pair where it is nil. In Louis' structure the values are the second elements of each pair.

SICP Exercise 2.21: Square Maps

The procedure square-list takes a list of numbers as argument and returns a list of the squares of those numbers.

> (square-list (list 1 2 3 4))
(1 4 9 16)

Here are two different definitions of square-list. Complete both of them by filling in the missing expressions:

(define (square-list items)
  (if (null? items)
      nil
      (cons <??> <??>)))
(define (square-list items)
  (map <??> <??>))

We're given two skeleton procedures here. The first procedure is similar to the recursive definition of scale-list in the Mapping over lists section of the book and is obviously the basis for a recursive procedure, while the second is analogous to the map-based definition of scale-list. The difference between scale-list and square-list is simply that the former scales each element of the list by some factor while the latter squares each element of the list. So our square-list implementations will bear an uncanny resemblance to scale-list, with just the scaling replaced with squaring.

Assuming we've got square defined as before:

(define (square x) (* x x))

...then we can produce the recursive implementation as follows:

(define (square-list items)
  (if (null? items)
      nil
      (cons (square (car items)) (square-list (cdr items)))))

...which produces the results:

> (square-list (list 1 2 3 4 5 6 7 8 9 10))
(1 4 9 16 25 36 49 64 81 100)

When it comes to the map-based version, we don't need to define a lambda-procedure as was necessary for scale-list, as we already have a procedure defined that provides the functionality we require: square. So we can define the map-based version as follows:

(define (square-list items)
  (map square items))

Of course this gives exactly the same results:

> (square-list (list 1 2 3 4 5 6 7 8 9 10))
(1 4 9 16 25 36 49 64 81 100)

SICP Exercise 2.20: Parity Lists

The procedures +, *, and list take arbitrary numbers of arguments. One way to define such procedures is to use define with dotted-tail notation. In a procedure definition, a parameter list that has a dot before the last parameter name indicates that, when the procedure is called, the initial parameters (if any) will have as values the initial arguments, as usual, but the final parameter's value will be a list of any remaining arguments. For instance, given the definition

(define (f x y . z) <body>)

the procedure f can be called with two or more arguments. If we evaluate

(f 1 2 3 4 5 6)

then in the body of f, x will be 1, y will be 2, and z will be the list (3 4 5 6). Given the definition

(define (g . w) <body>)

the procedure g can be called with zero or more arguments. If we evaluate

(g 1 2 3 4 5 6)

then in the body of g, w will be the list (1 2 3 4 5 6).

Use this notation to write a procedure same-parity that takes one or more integers and returns a list of all the arguments that have the same even-odd parity as the first argument. For example,

> (same-parity 1 2 3 4 5 6 7)
(1 3 5 7)
> (same-parity 2 3 4 5 6 7)
(2 4 6)

Okay, so if we our definition of same-parity takes the following form:

(define (same-parity i . l) <body>)

...then this allows us to implement a procedure in the following manner:

Figure out whether i is even or odd.
Recurse through l and filter it to those elements with the same parity as i.
Finally, use cons to prepend i onto the head of the filtered list.

The second step here is the most complex. We can implement this as an internal recursive procedure that takes an indication of the parity to filter for and the remainder of the list to filter and has three cases:

If the list is empty then return an empty list.
If the head of the list has the parity we're looking for then we cons the head onto the results of filtering the tail of the list via a recursive call and return that.
Otherwise we want to filter the head of the list, so we just return the results of filtering the tail of the list via a recursive call and return that.

Here's my implementation:

(define (same-parity i . l)
  (define (filter-list filter-to-even ll)
    (cond ((null? ll) nil)
          ((equal? filter-to-even (even? (car ll)))
              (cons (car ll) (filter-list filter-to-even (cdr ll))))
          (else (filter-list filter-to-even (cdr ll)))))
  (cons i (filter-list (even? i) l)))

...and here's the results of running it:

> (same-parity 1 2 3 4 5 6 7)
(1 3 5 7)
> (same-parity 2 3 4 5 6 7)
(2 4 6)

SICP Exercise 2.19: Counting UK Change

Consider the change-counting program of section 1.2.2. It would be nice to be able to easily change the currency used by the program, so that we could compute the number of ways to change a British pound, for example. As the program is written, the knowledge of the currency is distributed partly into the procedure first-denomination and partly into the procedure count-change (which knows that there are five kinds of U.S. coins). It would be nicer to be able to supply a list of coins to be used for making change.

We want to rewrite the procedure cc so that its second argument is a list of the values of the coins to use rather than an integer specifying which coins to use. We could then have lists that defined each kind of currency:

(define us-coins (list 50 25 10 5 1))
(define uk-coins (list 100 50 20 10 5 2 1 0.5))

We could then call cc as follows:

> (cc 100 us-coins)
292

To do this will require changing the program cc somewhat. It will still have the same form, but it will access its second argument differently, as follows:

(define (cc amount coin-values)
  (cond ((= amount 0) 1)
        ((or (< amount 0) (no-more? coin-values)) 0)
        (else
         (+ (cc amount
                (except-first-denomination coin-values))
            (cc (- amount
                   (first-denomination coin-values))
                coin-values)))))

Define the procedures first-denomination, except-first-denomination, and no-more? in terms of primitive operations on list structures. Does the order of the list coin-values affect the answer produced by cc? Why or why not?

The exercise already tells us the representation we're going to be using for denominations: lists of the denomination values. As a result, the procedures first-denomination, except-first-denomination, and no-more? are straightforward to define. first-denomination just needs to return the car of the list, except-first-denomination just needs to return the cdr of the list, and no-more? just needs to check to see if the list is empty:

(define (first-denomination coin-values)
  (car coin-values))

(define (except-first-denomination coin-values)
  (cdr coin-values))

(define (no-more? coin-values)
  (null? coin-values))

We can check this works as expected:

> (cc 100 us-coins)
292
> (cc 100 uk-coins)
104561

As for whether or not the order of the list coin-values affects the answer produced by cc, we can easily test this:

> (cc 100 (list 1 5 10 25 50))
292
> (cc 100 (list 5 25 50 10 1))
292

So no, it doesn't appear as if it does affect ordering. So why not? Well, looking at the algorithm we can see that each non-terminal call to cc sums the results of two (also potentially non-terminal) recursive calls:

One of these finds the number of ways we can change amount using denominations other than the first one in the list of denominations. As a result, this counts all of the combinations for changing amount that do not use the first denomination.
The other deducts the value of the first denomination in the list of denominations from amount and finds how many ways we can change that using any and all of the denominations in the list. As a result, this counts all of the combinations for changing amount that do use the first denomination.

In other words, these two recursive calls split the counting to cover two distinct sets of coin combinations which, when merged together, produces a set containing all valid combinations of denominations for changing amount. As a result the order of the denominations does not matter.

SICP Exercise 2.18: Reversing a List

Note: Neither of the interpreters I've been using support nil, but will quite happily accept () as an equivalent expression. If you're in the same situation you can get around this by using:

(define nil ())

Define a procedure reverse that takes a list as argument and returns a list of the same elements in reverse order:

> (reverse (list 1 4 9 16 25))
(25 16 9 4 1)

The logical first thought here is to just recurse through the list, take each item in turn and add it to the end of the list produced by recursing through the remainder of the list. This however doesn't work. Here's an implementation of this broken version:

(define (broken-reverse l)
  (if (null? l)
      nil
      (cons (broken-reverse (cdr l)) (car l))))

And here's what happens when we run it:

> (broken-reverse (list 1 2 3 4))
((((() . 4) . 3) . 2) . 1)

As you can see what we're actually doing here is to build up a pair where the first element of the pair is the "paired-reversal" of the remainder of the list and the second element is the head of the list. Part of the problem here is that cons produces pairs, not lists. When we perform (cons (broken-reverse (cdr l)) (car l)) we're creating a pair in which the second item is a value, not a pair as it would be in a list. Changing cons to list doesn't help either:

(define (broken-reverse-2 l)
  (if (null? l)
      nil
      (list (broken-reverse-2 (cdr l)) (car l))))

...gives:

> (broken-reverse-2 (list 1 2 3 4))
((((() 4) 3) 2) 1)

We've got a list alright, but it's a two-element list in which the first element of the list is a recursive reversal of the remainder of the list (and so has a list for its first element) and the second element is the head of the list. To use this we'd need to flatten the list on completion.

An alternative approach would be to use append to produce the flattened list as we go:

(define (reverse l)
  (if (null? l)
      nil
      (append (reverse (cdr l)) (list (car l)))))

This produces the effect we want, but is less than efficient. For each element in the list we have to append the results of reversing the remainder of the list to the head of the list containing the current element. append is an Θ(n) operation, so this implementation of reverse will be Θ(n²).

No, we can do better than that, and we can do it with an iterative procedure. Iteratively we can state the problem of reversing the list as starting with two lists: the list we want to reverse and an empty list. We iterate through the first list and for each item in the first list we cons it onto the head of the second list. So for the first item in the first list we cons it onto an empty list, for the second item we cons it onto a list containing just the first item and so on. Here's my implementation:

(define (reverse l)
  (define (iter remaining result)
    (if (null? remaining)
        result
        (iter (cdr remaining) (cons (car remaining) result))))
  (iter l nil))

And here's a few tests:

> (reverse (list 1 2 3 4))
(4 3 2 1)
> (reverse (list 4 3 2 1))
(1 2 3 4)
> (reverse (list 1 4 9 16 25))
(25 16 9 4 1)

SICP Exercise 2.17: Last Pair

Define a procedure last-pair that returns the list that contains only the last element of a given (nonempty) list:

> (last-pair (list 23 72 149 34))
(34)

We know that we can identify the last pair in a list by the cdr of the pair being nil, and returning this pair will give us the result we need. All we need to do is to walk through the list recursively, in a similar manner to the list operations defined in the book, and stop when we reach such a pair.

Here's my implementation:

(define (last-pair l)
  (let ((tail (cdr l)))
     (if (null? tail)
         l
         (last-pair tail))))

And here it is in action:

> (last-pair (list 23 72 149 34))
(34)
> (last-pair (list 1 2 3 4))
(4)
> (last-pair (list 4 3 2 1))
(1)

2011-09-26

SICP Exercise 2.16: IEEE Interval Standard – P1788

Explain, in general, why equivalent algebraic expressions may lead to different answers. Can you devise an interval-arithmetic package that does not have this shortcoming, or is this task impossible? (Warning: This problem is very difficult.)

In exercise 2.14, the different arithmetic operations have different effects upon the center values and tolerances of the produced intervals. We also saw that our system as implemented has no concept of identity. I.e. A/A ≠ 1 ± 0%. These issues mean that we cannot apply standard algebraic transformations to a formula and expect it to produce an interval with the same bounds afterwards. This was demonstrated quite clearly in exercise 2.15, where two algebraically equivalent formulae were shown to produce intervals with different bounds.

In general we can state that the issue is that the more interval-arithmetic operations we perform during a calculation, the wider the resulting interval will be.

As to devising an interval-arithmetic package that does not have this shortcoming... As the authors state, "This problem is very difficult". So much so that the IEEE started working on a standard for interval arithmetic back in 2008... and they're still going as I write this. So, yes, it's very difficult - and certainly beyond the scope of our current exposure to Scheme at this point in the book!

SICP Exercise 2.15: Better Bounds

Eva Lu Ator, another user, has also noticed the different intervals computed by different but algebraically equivalent expressions. She says that a formula to compute with intervals using Alyssa's system will produce tighter error bounds if it can be written in such a form that no variable that represents an uncertain number is repeated. Thus, she says, par2 is a "better" program for parallel resistances than par1. Is she right? Why?

Yes, Eva is right, and we can show this to be the case by substitution. Firstly, let's assume we have two intervals, x and y that have the lower and upper bounds [l_x, u_x] and [l_y, u_y] respectively. Also, let's assume that x and y are non-negative intervals as we defined in exercise 2.11. We then can greatly simplify mul-interval to:

(define (mul-interval x y)
  (make-interval (* (lower-bound x) (lower-bound y))
                 (* (upper-bound x) (upper-bound y))))

We can then work through the evaluation of each program in turn.

Let's start with par1:

  (par1 x y)
= (div-interval (mul-interval x y) (add-interval x y))
= (div-interval (make-interval (* (lower-bound x) (lower-bound y))
                               (* (upper-bound x) (upper-bound y)))
                (add-interval x y))
= (div-interval (make-interval (* l_x l_y) (* u_x u_y))
                (add-interval x y))
= (div-interval (cons (* l_x l_y) (* u_x u_y))
                (add-interval x y))
= (div-interval (cons (* l_x l_y) (* u_x u_y))
                (make-interval (+ (lower-bound x) (lower-bound y))
                               (+ (upper-bound x) (upper-bound y))))
= (div-interval (cons (* l_x l_y) (* u_x u_y)) (make-interval (+ l_x l_y) (+ u_x u_y)))
= (div-interval (cons (* l_x l_y) (* u_x u_y)) (cons (+ l_x l_y) (+ u_x u_y)))
= (mul-interval (cons (* l_x l_y) (* u_x u_y))
                (make-interval (/ 1.0 (upper-bound (cons (+ l_x l_y) (+ u_x u_y))))
                               (/ 1.0 (lower-bound (cons (+ l_x l_y) (+ u_x u_y))))))
= (mul-interval (cons (* l_x l_y) (* u_x u_y))
                (make-interval (/ 1.0 (+ u_x u_y)) (/ 1.0 (+ l_x l_y))))
= (mul-interval (cons (* l_x l_y) (* u_x u_y))
                (cons (/ 1.0 (+ u_x u_y)) (/ 1.0 (+ l_x l_y))))
= (make-interval (* (lower-bound (cons (* l_x l_y) (* u_x u_y)))
                    (lower-bound (cons (/ 1.0 (+ u_x u_y)) (/ 1.0 (+ l_x l_y)))))
                 (* (upper-bound (cons (* l_x l_y) (* u_x u_y)))
                    (upper-bound (cons (/ 1.0 (+ u_x u_y)) (/ 1.0 (+ l_x l_y)))))))
= (make-interval (* (* l_x l_y) (/ 1.0 (+ u_x u_y))) (* (* u_x u_y) (/ 1.0 (+ l_x l_y))))
= (cons (* (* l_x l_y) (/ 1.0 (+ u_x u_y))) (* (* u_x u_y) (/ 1.0 (+ l_x l_y))))
≡ (cons (/ (* l_x l_y) (+ u_x u_y)) (/ (* u_x u_y) (+ l_x l_y)))

We now have the bounds for intervals produced by par1, so let's move on to par2:

  (par2 x y)
= (let ((one (make-interval 1 1))) 
    (div-interval one
                  (add-interval (div-interval one x)
                                (div-interval one y))))
= (let ((one (cons 1 1))) 
    (div-interval one
                  (add-interval (div-interval one x)
                                (div-interval one y))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (div-interval (cons 1 1) y))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (mul-interval (cons 1 1) 
                                            (make-interval (/ 1.0 (upper-bound y))
                                                           (/ 1.0 (lower-bound y))))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (mul-interval (cons 1 1) 
                                            (make-interval (/ 1.0 u_y) (/ 1.0 l_y)))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (mul-interval (cons 1 1) 
                                            (cons (/ 1.0 u_y) (/ 1.0 l_y)))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (make-interval (* (lower-bound (cons 1 1))
                                                (lower-bound (cons (/ 1.0 u_y)
                                                                   (/ 1.0 l_y))))
                                             (* (upper-bound (cons 1 1))
                                                (upper-bound (cons (/ 1.0 u_y)
                                                                   (/ 1.0 l_y)))))))
= (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (make-interval (* 1 (/ 1.0 u_y))
                                             (* 1 (/ 1.0 l_y)))))
≡ (div-interval (cons 1 1)
                (add-interval (div-interval (cons 1 1) x)
                              (make-interval (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (add-interval (mul-interval (cons 1 1) 
                                            (make-interval (/ 1.0 u_x) (/ 1.0 l_x)))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (add-interval (mul-interval (cons 1 1) 
                                            (cons (/ 1.0 u_x) (/ 1.0 l_x)))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (add-interval (make-interval (* (lower-bound (cons 1 1))
                                                (lower-bound (cons (/ 1.0 u_x)
                                                                   (/ 1.0 l_x))))
                                             (* (upper-bound (cons 1 1))
                                                (upper-bound (cons (/ 1.0 u_x)
                                                                   (/ 1.0 l_x))))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (add-interval (make-interval (* 1 (/ 1.0 u_x))
                                             (* 1 (/ 1.0 l_x)))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (add-interval (cons (* 1 (/ 1.0 u_x)) (* 1 (/ 1.0 l_x)))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
≡ (div-interval (cons 1 1)
                (add-interval (cons (/ 1.0 u_x) (/ 1.0 l_x))
                              (cons (/ 1.0 u_y) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (make-interval (+ (lower-bound (cons (/ 1.0 u_x) (/ 1.0 l_x)))
                                  (lower-bound (cons (/ 1.0 u_y) (/ 1.0 l_y))))
                               (+ (upper-bound (cons (/ 1.0 u_x) (/ 1.0 l_x)))
                                  (upper-bound (cons (/ 1.0 u_y) (/ 1.0 l_y))))))
= (div-interval (cons 1 1)
                (make-interval (+ (/ 1.0 u_x) (/ 1.0 u_y))
                               (+ (/ 1.0 l_x) (/ 1.0 l_y))))
= (div-interval (cons 1 1)
                (cons (+ (/ 1.0 u_x) (/ 1.0 u_y)) (+ (/ 1.0 l_x) (/ 1.0 l_y))))
= (mul-interval (cons 1 1)
                (make-interval (/ 1.0 (upper-bound (cons (+ (/ 1.0 u_x) (/ 1.0 u_y))
                                                         (+ (/ 1.0 l_x) (/ 1.0 l_y)))))
                               (/ 1.0 (lower-bound (cons (+ (/ 1.0 u_x) (/ 1.0 u_y))
                                                         (+ (/ 1.0 l_x) (/ 1.0 l_y))))))))
= (mul-interval (cons 1 1)
                (make-interval (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y)))
                               (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y)))))
= (mul-interval (cons 1 1)
                (cons (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y)))
                      (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y)))))
= (make-interval (* (lower-bound (cons 1 1))
                    (lower-bound (cons (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y)))
                                       (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y))))))
                 (* (upper-bound (cons 1 1))
                    (upper-bound (cons (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y)))
                                       (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y))))))))
= (make-interval (* 1 (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y))))
                 (* 1 (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y)))))
= (cons (* 1 (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y))))
        (* 1 (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y)))))
≡ (cons (/ 1.0 (+ (/ 1.0 l_x) (/ 1.0 l_y))) (/ 1.0 (+ (/ 1.0 u_x) (/ 1.0 u_y))))
≡ (cons (/ 1.0 (/ (+ l_x l_y) (* l_x l_y))) (/ 1.0 (/ (+ u_x u_y) (* u_x u_y))))
≡ (cons (/ (* l_x l_y) (+ l_x l_y)) (/ (* u_x u_y) (+ u_x u_y)))

And now we have the bounds for intervals produced by par2. Let's bring them together and have a look:

(par1 x y) = (cons (/ (* l_x l_y) (+ u_x u_y)) (/ (* u_x u_y) (+ l_x l_y)))
(par2 x y) = (cons (/ (* l_x l_y) (+ l_x l_y)) (/ (* u_x u_y) (+ u_x u_y)))

As you can see they differ only in the denominators of the bounds. (par1 x y) uses the sum of the upper bounds of x and y as the denominator for the lower bound, and the sum of the upper bounds of x and y as the denominator for the lower bound. (par2 x y) swaps these around. What effect does this have? Well, as we're dealing with non-negative intervals we know summing the upper bounds produces a larger non-negative value than summing the lower bounds. Therefore, dividing a non-negative value by the former will produce a smaller value than if we divide it by the latter. This means that par1 will produce a smaller lower bound than par2 and a larger upper bound than par2. Or, to put it in other words, par1 will produce a larger interval than par2, and so par2 will provide a more accurate calculation of parallel resistances.

SICP Exercise 2.14: Broken Intervals

After considerable work, Alyssa P. Hacker delivers her finished system. Several years later, after she has forgotten all about it, she gets a frenzied call from an irate user, Lem E. Tweakit. It seems that Lem has noticed that the formula for parallel resistors can be written in two algebraically equivalent ways:

  R₁R₂ 
R₁ + R₂

and

     1     
1/R₁ + 1/R₂

He has written the following two programs, each of which computes the parallel-resistors formula differently:

(define (par1 r1 r2)
  (div-interval (mul-interval r1 r2)
                (add-interval r1 r2)))
(define (par2 r1 r2)
  (let ((one (make-interval 1 1))) 
    (div-interval one
                  (add-interval (div-interval one r1)
                                (div-interval one r2)))))

Lem complains that Alyssa's program gives different answers for the two ways of computing. This is a serious complaint.

Demonstrate that Lem is right. Investigate the behavior of the system on a variety of arithmetic expressions. Make some intervals A and B, and use them in computing the expressions A/A and A/B. You will get the most insight by using intervals whose width is a small percentage of the center value. Examine the results of the computation in center-percent form (see exercise 2.12).

Before we start building intervals and trying things out, let's first define a procedure for displaying intervals in center-percent form. The implementation I defined uses the center and percent selectors from exercise 2.12:

(define (print-center-percent i)
  (newline)
  (display (center i))
  (display " ± ")
  (display (percent i))
  (display "%"))

Now we can define a couple of intervals, a and b, and check this works okay:

> (define a (make-center-percent 100 1))
> (define b (make-center-percent 150 2))
> a
'(99 . 101)
> b
'(147 . 153)
> (print-center-percent a)
100 ± 1%
> (print-center-percent b)
150 ± 2%

All seems to be in order here, so let's try out par1 and par2:

> (print-center-percent (par1 a a))
50.02000200020002 ± 2.999200239928031%
> (print-center-percent (par2 a a))
50.0 ± 1.000000000000007%
> (print-center-percent (par1 a b))
60.05617438064145 ± 4.597193908142474%
> (print-center-percent (par2 a b))
59.99855963126561 ± 1.400072020165649%
> (print-center-percent (par1 b b))
75.12004801920769 ± 5.993607670795042%
> (print-center-percent (par2 b b))
75.0 ± 2.0%

So Lem is right - two equivalent formulae produce different results!

Note that if we change the tolerances to 0% for both values (i.e. we just use the center values for calculating the resistance for the parallel circuits), we find that R_aa = 50Ω, R_ab = 60Ω and R_bb = 75Ω. So not only do they produce different results, but par2 both produces results with center values closer to the values we might expect and produces tighter tolerances.

Now onto A/A and A/B. Let's also throw B/A and B/B into the mix as well:

> (print-center-percent (div-interval a a))
1.0002000200020003 ± 1.9998000199980077%
> (print-center-percent (div-interval a b))
0.6670668267306923 ± 2.999400119976001%
> (print-center-percent (div-interval b a))
1.5004500450045004 ± 2.9994001199760123%
> (print-center-percent (div-interval b b))
1.0008003201280513 ± 3.9984006397441028%

The first point of interest to note is that our current system has no idea of identity. If a represents a particular instance of an interval then, while we may be unsure of its exact value, it does have an exact value and so (div-interval a a) should evaluate to 1 ± 0%. However it doesn't treat them as identical instances, but instead as two separate intervals that happen to have the same center value and tolerance, and so this doesn't happen.

The second point of interest is, with small tolerances at least, dividing one interval by another produces a new interval with a center point that is approximately equal to (actually slightly higher than) the value produced by dividing the center value of the numerator interval by the center value of the denominator interval and with a tolerance approximately equal to (actually slightly lower than) the sum of the tolerances of the two original intervals.

Assuming we use c_i to represent the center value of an interval and t_i to represent the percentage tolerance we can express this more succinctly as:

c_x/y ≈ c_x/c_y ∧ c_x/y ≥ c_x/c_y
t_x/y ≈ t_x/t_y ∧ t_x/y ≤ t_x + t_y

What about some other operations:

> (print-center-percent (mul-interval a a))
10001 ± 20000/10001%
> (print-center-percent (mul-interval a b))
15003 ± 5000/1667%
> (print-center-percent (mul-interval b a))
15003 ± 5000/1667%
> (print-center-percent (mul-interval b b))
22509 ± 10000/2501%
>
> (print-center-percent (add-interval a a))
200 ± 1%
> (print-center-percent (add-interval a b))
250 ± 8/5%
> (print-center-percent (add-interval b a))
250 ± 8/5%
> (print-center-percent (add-interval b b))
300 ± 2%
>
> (print-center-percent (sub-interval a a))
0 ± 1/0%
> (print-center-percent (sub-interval a b))
-50 ± -8%
> (print-center-percent (sub-interval b a))
50 ± 8%
> (print-center-percent (sub-interval b b))
0 ± 1/0%

I'll translate some of those percentages...

20000/10001% ≈ 1.9998%
5000/1667% ≈ 2.9994%
10000/2501% ≈ 3.9984%
8/5% = 1.6%

So we can see some further relationships here:

c_xy ≈ c_xc_y ∧ c_xy ≥ c_xc_y
t_xy ≈ t_xt_y ∧ t_xy ≤ t_x + t_y

c_x+y = c_x + c_y
t_x+y = (t_x + t_y) / 2

c_x-y = c_x - c_y
t_x-y = 100(w_x + w_y)/(c_x - c_y)

Okay, so I "cheated" with t_x-y and actually worked out the formula from the definitions of centers, widths tolerance and sub-interval. However hopefully all of the above gives us some indication as to what's going on with Lem's programs. While the two programs are algebraically equivalent, the calculations they perform are different. As a result the center values and tolerances of the intervals they produce will also be different. In fact this is perhaps hinting at the answer to the next exercise...

2011-09-25

SICP Exercise 2.13: Tolerance is Simple

Show that under the assumption of small percentage tolerances there is a simple formula for the approximate percentage tolerance of the product of two intervals in terms of the tolerances of the factors. You may simplify the problem by assuming that all numbers are positive.

Okay, so let's assume we have two intervals, a and b, with centers c_a and c_b and percentage tolerances t_a and t_b. We can then say that the widths of a and b are given by:

w_a = c_at_a / 100
w_b = c_bt_b / 100

...and so we can express the bounds of our intervals, a and b, as:

a = [l_a, u_a] = [c_a - w_a, c_a + w_a]
b = [l_b, u_b] = [c_b - w_b, c_b + w_b]

If we further assume that a and b are non-negative integers (as we defined in the exercise 2.11) then we know that the bounds of the product of the two intervals, ab, are:

  ab
= [l_al_b, u_au_b]
= [(c_a - w_a)(c_b - w_b), (c_a + w_a)(c_b + w_b)]
= [c_ac_b - c_aw_b - w_ac_b + w_aw_b, c_ac_b + c_aw_b + w_ac_b + w_aw_b]

Now as t_a and t_b are small, we know that w_aw_b will also be small. Let's assume that w_aw_b is close to zero, or at least negligible with respect to the other terms in the lower and upper bounds, and so discard it to give:

  ab
= [c_ac_b - c_aw_b - w_ac_b + w_aw_b, c_ac_b + c_aw_b + w_ac_b + w_aw_b]
≈ [c_ac_b - c_aw_b - w_ac_b, c_ac_b + c_aw_b + w_ac_b]
= [c_ac_b - (c_aw_b + w_ac_b), c_ac_b + (c_aw_b + w_ac_b)]

Note that these bounds take the same form as that of the bounds of an interval expressed in terms of its center point and widths. So we can say that:

c_ab ≈ c_ac_b
w_ab ≈ c_aw_b + w_ac_b

...and so we can express the percentage tolerance as:

  t_ab
= 100w_a/c_a
≈ 100(c_aw_b + w_ac_b)/c_ac_b
= 100(c_aw_b)/c_ac_b + 100(w_ac_b)/c_ac_b
= 100w_b/c_b + 100w_a/c_a
= t_b + t_a

SICP Exercise 2.12: Tolerating Engineers

After debugging her program, Alyssa shows it to a potential user, who complains that her program solves the wrong problem. He wants a program that can deal with numbers represented as a center value and an additive tolerance; for example, he wants to work with intervals such as 3.5± 0.15 rather than [3.35, 3.65]. Alyssa returns to her desk and fixes this problem by supplying an alternate constructor and alternate selectors:

(define (make-center-width c w)
  (make-interval (- c w) (+ c w)))
(define (center i)
  (/ (+ (lower-bound i) (upper-bound i)) 2))
(define (width i)
  (/ (- (upper-bound i) (lower-bound i)) 2))

Unfortunately, most of Alyssa's users are engineers. Real engineering situations usually involve measurements with only a small uncertainty, measured as the ratio of the width of the interval to the midpoint of the interval. Engineers usually specify percentage tolerances on the parameters of devices, as in the resistor specifications given earlier.

Define a constructor make-center-percent that takes a center and a percentage tolerance and produces the desired interval. You must also define a selector percent that produces the percentage tolerance for a given interval. The center selector is the same as the one shown above.

The percentage tolerance is another way of specifying the width of an interval, so our constructor will share some commonality with make-center-width. It will take two parameters, the first of which is the center value of the interval. It will also call make-interval with two bounds calculated by respectively subtracting and adding the width of the interval from the center value. Where it will differ will be in the second operand and what we do with it. The second parameter will express the width of the interval as a percentage of the center value, rather than as an absolute value. We will then have to use this to calculate the width of the interval so that we can use that to calculate the lower and upper bounds of the interval. This is simply a case of multiplying the center value by the percentage value and then dividing by 100.

Here's our constructor:

(define (make-center-percent c p)
  (let ((w (/ (* c p) 100)))
    (make-interval (- c w) (+ c w))))

To define the selector percent we need to find the width of the interval and then determine what percentage of the center value the width is. I.e. reverse the calculation we used above to produce the width from the percentage tolerance. Given that we have the center and width selectors Alyssa defined above we can define our selector simply as:

(define (percent i)
  (/ (* (width i) 100) (center i)))

Alternatively, if we express this as an algebraic expression it's easy to see how we could reduce the operations in the calculation. If we express the lower bound of the interval as l_i and the upper bound as u_i then the calculation of (percent i) is equivalent to:

  100 × (u_i - l_i)/2
     (l_i + u_i)/2
= 100 × (u_i - l_i)/2
        (l_i + u_i)/2
= 100 × u_i - l_i
        l_i + u_i

So we could also define percent as:

(define (percent i)
  (* 100
     (/ (- (upper-bound i) (lower-bound i))
        (+ (upper-bound i) (lower-bound i)))))

This removes the divide-by-two that occurs in the calculations of both the width and the center values, only to have the division of width by center values to cancel out the divide-by-twos.

Let's test it out to confirm it works:

> (define cp1 (make-center-percent 100 10))
> (define cp2 (make-center-percent 50 25))
> (center cp1)
100
> (center cp2)
50
> (percent cp1)
10
> (percent cp2)
25

SICP Exercise 2.11: Cryptic Multiplication

In passing, Ben also cryptically comments: "By testing the signs of the endpoints of the intervals, it is possible to break mul-interval into nine cases, only one of which requires more than two multiplications."' Rewrite this procedure using Ben's suggestion.

Okay, so here Ben is suggesting that we don't need to calculate all of {p1, p2, p3, p4} in order to calculate mul-interval. Instead by looking at the signs of the lower and upper bounds of the two intervals it is possible, in the majority of cases, to know which two of {p1, p2, p3, p4} to use as the lower and upper bounds and so only need to calculate those. However, before we can produce such a procedure we need to determine the mapping from signs of bounds of intervals to pairs.

Before we get into that though, in order to simplify the discussion here I'm going to introduce a shorthand for refering to lower and upper bounds of intervals: for any interval x I'm going to use l_x to represent the lower bound of the interval and u_x to represent the upper bound. I'll also note that the set of values {p1, p2, p3, p4} corresponds to the set of products {l_xl_y, l_xu_y, u_xl_y, u_xu_y}. I'm going to use the latter in this discussion as we will no longer calculate the former set up front in the final procedure.

Now, back to determining the mapping from signs of bounds of intervals to pairs... In order to do this we must first consider not just the signs of the bounds but also the relationship between their absolute values. We can easily show this to be important if we consider multiplying a pair of intervals where the bounds are all non-negative (non-negative intervals) and then compare this to multiplying a pair of intervals where the bounds are all negative (negative intervals):

In the former case we know that l_x and l_y are the two smallest values from the two intervals and u_x and u_y are the two largest values from the two intervals. As a result we can deduce that l_xl_y will be the minimum of the four values in the set, while u_xu_y will be the maximum.
Looking at the latter case we can first observe that all four values in the set will be positive (as each value is the result of multiplying two negative numbers). However, looking at the absolute values of the bounds we can see that l_x and l_y have the two largest absolute values and u_x and u_y have the two smallest absolute values. As a result we can deduce that u_xu_y will be the minimum of the four values in the set, while l_xl_y will be the maximum.

We can also use the absolute values to deduce what happens when one interval, let's say x, is a non-negative interval and the other interval, y, is negative. In this case all of the values in the resulting set will be negative as each value is the product of a (non-negative) bound of x and a (negative) bound of y. As all the values in the set are negative, the lower bound of the resulting interval will be the value with the largest absolute value, while the upper bound will be the value with the smallest absolute value. So to produce the lower bound we need to determine the bounds which have the largest absolute values and multiply those together. Similarly, to produce the upper bound we need to determine the bounds which have the smallest absolute values and multiply those together. As x is a non-negative interval, the bound with the largest absolute value is u_x, while the bound with the smallest is l_x. However, as y is a negative interval the converse is true: the bound with the largest absolute value is l_y, while the bound with the smallest is u_y. Therefore the lower bound for the interval produced by multiplying x and y will be u_xl_y and the upper bound will be l_xu_y.

When we have a mixture of types of intervals things become slightly more complex, but there's an additional technique we can add to considering the signs and absolute values. Note that it is always possible to split up the set {l_xl_y, l_xu_y, u_xl_y, u_xu_y} into two subsets: one containing all the negative values and the other containing all the non-negative values. When there are a mixture of interval types (e.g. one is a negative interval, the other is a non-negative interval, or one is a non-negative interval and the other is a zero-spanning interval (where the lower bound is negative and the upper bound is non-negative)) we are guaranteed that both subsets will be non-empty. In such cases the lower bound will then be the value with the largest absolute value from the set of negative values, while the upper bound will be the value with the largest absolute value from the set of positive values.

For example, let's consider the case where the interval x spans zero and interval y is non-negative:

In this case we can see that any value in the set that is a multuple of l_x will be negative, while any value in the set that is a multuple of u_x will be positive. This gives us our two subsets:
- {l_xl_y, l_xu_y} contains the negative values.
- {u_xl_y, u_xu_y} contains the non-negative values.
To determine the lower bound we need to find the value with the largest absolute value from the set of negative values. The only difference between the two values is the y bound, so we just need to determine which of the two bounds of y has the largest absolute value. Now as y is a non-negative interval this will be its upper bound, u_y. As a result, the lower bound for the interval produced by multiplying x and y will be l_xu_y.
To determine the upper bound we need to find the value with the largest absolute value from the set of positive values. Again the only difference between the two values is the y bound, so we just need to determine which of the two bounds of y has the largest absolute value, and we already know this to be u_y. As a result, the upper bound for the interval produced by multiplying x and y will be u_xu_y.

We can continue this for all of the different combinations of negative, non-negative and zero-spanning intervals.

However, as Ben hinted, there's one case that we can't determine in this manner. It's the case when we have two zero-spanning intervals. In this case our two subsets are:

{l_xu_y, u_xl_y} for the negative values.
{l_xl_y, u_xu_y} for the non-negative values.

Unfortunately it's not possible to determine which member of each subset has the largest absolute value without examining the values. Why is this? Well, as both intervals are zero-spanning it's not possible to say anything about the relationship between the absolute values of their bounds. I.e. for a given zero-spanning interval i, we don't know whether it's the case that |l_i|≤|u_i| or it's the case that |l_i|≥|u_i|. As a result, when we multiply one bound from x with one bound from y and multiply the other bound from x with the other bound from y we won't be able to tell which will provide the larger or smaller value without actually comparing the values. So in this case we'll need to calculate all four values and then perform the appropriate comparisons to determine our bounds.

Okay, so let's work through each of the valid combinations of signs for the lower and upper bounds of x and y and determine how to calculate the lower and upper values for the result of multiplying x and y. I say "valid" as we need to remember that the lower bound is always less than or equal to the upper bound, so we'll never encounter an interval where the upper bound is negative, while the lower bound is non-negative. N.B. In the following table I'm using -ve to indicate a negative value and +ve to represent a non-negative value.

l_x	u_x	l_y	u_y	l_xl_y	l_xu_y	u_xl_y	u_xu_y	min	max
+ve	+ve	+ve	+ve	+ve	+ve	+ve	+ve	l_xl_y	u_xu_y
+ve	+ve	-ve	+ve	-ve	+ve	-ve	+ve	u_xl_y	u_xu_y
+ve	+ve	-ve	-ve	-ve	-ve	-ve	-ve	u_xl_y	l_xu_y
-ve	+ve	+ve	+ve	-ve	-ve	+ve	+ve	l_xu_y	u_xu_y
-ve	+ve	-ve	+ve	+ve	-ve	-ve	+ve	min(l_xu_y,u_xl_y)	max(l_xl_y,u_xu_y)
-ve	+ve	-ve	-ve	+ve	+ve	-ve	-ve	u_xl_y	l_xl_y
-ve	-ve	+ve	+ve	-ve	-ve	-ve	-ve	l_xu_y	u_xl_y
-ve	-ve	-ve	+ve	+ve	-ve	+ve	-ve	l_xu_y	l_xl_y
-ve	-ve	-ve	-ve	+ve	+ve	+ve	+ve	u_xu_y	l_xl_y

Now we now know how to calculate the lower and upper bounds, here's my procedure for doing so:

(define (mul-interval x y)
  (define not-negative? (lambda (x) (not (negative? x))))
  (let ((lx (lower-bound x))
        (ux (upper-bound x))
        (ly (lower-bound y))
        (uy (upper-bound y)))
    (cond ((and (not-negative? lx)
                (not-negative? ux)
                (not-negative? ly)
                (not-negative? uy))
             (make-interval (* lx ly) (* ux uy)))
          ((and (not-negative? lx)
                (not-negative? ux)
                (negative? ly)
                (not-negative? uy))
             (make-interval (* ux ly) (* ux uy)))
          ((and (not-negative? lx)
                (not-negative? ux)
                (negative? ly)
                (negative? uy))
             (make-interval (* ux ly) (* lx uy)))
          ((and (negative? lx)
                (not-negative? ux)
                (not-negative? ly)
                (not-negative? uy))
             (make-interval (* lx uy) (* ux uy)))
          ((and (negative? lx)
                (not-negative? ux)
                (negative? ly)
                (not-negative? uy))
             (make-interval (min (* lx uy) (* ux ly)) (max (* lx ly) (* ux uy))))
          ((and (negative? lx)
                (not-negative? ux)
                (negative? ly)
                (negative? uy))
             (make-interval (* ux ly) (* lx ly)))
          ((and (negative? lx)
                (negative? ux)
                (not-negative? ly)
                (not-negative? uy))
             (make-interval (* lx uy) (* ux ly)))
          ((and (negative? lx)
                (negative? ux)
                (negative? ly)
                (not-negative? uy))
             (make-interval (* lx uy) (* lx ly)))
          ((and (negative? lx)
                (negative? ux)
                (negative? ly)
                (negative? uy))
             (make-interval (* ux uy) (* lx ly)))
          (else (error "Invalid Combination!")))))

Finally, we can see this in action:

> (mul-interval (make-interval 1 2) (make-interval 2 3))
'(2 . 6)
> (mul-interval (make-interval 1 2) (make-interval -2 3))
'(-4 . 6)
> (mul-interval (make-interval 1 2) (make-interval -2 -1))
'(-4 . -1)
> (mul-interval (make-interval -1 2) (make-interval 2 3))
'(-3 . 6)
> (mul-interval (make-interval -1 2) (make-interval -2 3))
'(-4 . 6)
> (mul-interval (make-interval -1 2) (make-interval -2 -1))
'(-4 . 2)
> (mul-interval (make-interval -2 -1) (make-interval 2 3))
'(-6 . -2)
> (mul-interval (make-interval -2 -1) (make-interval -2 3))
'(-6 . 4)
> (mul-interval (make-interval -2 -1) (make-interval -2 -1))
'(1 . 4)

2011-09-24

SICP Exercise 2.10: Divide by Zero

Ben Bitdiddle, an expert systems programmer, looks over Alyssa's shoulder and comments that it is not clear what it means to divide by an interval that spans zero. Modify Alyssa's code to check for this condition and to signal an error if it occurs.

I must confess that, being the old AI graduate that I am, I read this and thought "expert systems? what on earth have they got to do with this exercise?" Nothing, of course - and this nicely illustrates why natural language understanding is difficult. Anyway...

How can we tell if an interval spans zero? Well, the lower bound of the number will be zero or less, while the upper bound of the number will be zero or more. Here's my modified procedure:

(define (div-interval x y)
  (if (and (<= (lower-bound y) 0) (>= (upper-bound y) 0))
      (error "Divisor spans zero")
      (mul-interval x
                    (make-interval (/ 1.0 (upper-bound y))
                                   (/ 1.0 (lower-bound y))))))

And here it is in action:

> (div-interval (make-interval 3 9) (make-interval 1 5))
'(0.6000000000000001 . 9.0)
> (div-interval (make-interval 3 9) (make-interval -4 -1))
'(-9.0 . -0.75)
> (div-interval (make-interval 3 9) (make-interval -4 0))
. . Divisor spans zero
> (div-interval (make-interval 3 9) (make-interval 0 5))
. . Divisor spans zero
> (div-interval (make-interval 3 9) (make-interval -4 5))
. . Divisor spans zero

2011-09-23

SICP Exercise 2.9: Interval Widths

The width of an interval is half of the difference between its upper and lower bounds. The width is a measure of the uncertainty of the number specified by the interval. For some arithmetic operations the width of the result of combining two intervals is a function only of the widths of the argument intervals, whereas for others the width of the combination is not a function of the widths of the argument intervals. Show that the width of the sum (or difference) of two intervals is a function only of the widths of the intervals being added (or subtracted). Give examples to show that this is not true for multiplication or division.

Let's begin by defining a procedure for calculating an interval's width:

(define (interval-width i)
  (/ (- (upper-bound i) (lower-bound i)) 2))

Given this definition we can show by substitution that the width of any interval produced by evaluating add-interval or sub-interval with any two argument intervals is a function of the widths of the argument intervals. To do this, let's first assume that we have four values, l1, l2, u1 and u2 such that l1<u1 and l2<u2. Then we define two intervals, i1 and i2, as follows:

(define i1 (make-interval l1 u1))
(define i2 (make-interval l2 u2))

We can first of all produce expressions for their widths by substitution as follows:

  (interval-width i1)
= (/ (- (upper-bound i1) (lower-bound i1)) 2)
= (/ (- (max (car i1) (cdr i1)) (min (car i1) (cdr i1))) 2)
= (/ (- (max l1 u1) (min l1 u1)) 2)
= (/ (- u1 l1) 2)

  (interval-width i2)
= (/ (- (upper-bound i2) (lower-bound i2)) 2)
= (/ (- (max (car i2) (cdr i2)) (min (car i2) (cdr i2))) 2)
= (/ (- (max l2 u2) (min l2 u2)) 2)
= (/ (- u2 l2) 2)

Now let's see what happens when we try to calculate the width of the interval produced by evaluating the expression (add-interval i1 i2):

  (interval-width (add-interval i1 i2))
= (interval-width (make-interval (+ (lower-bound i1) (lower-bound i2))
                                 (+ (upper-bound i1) (upper-bound i2))))
= (interval-width (make-interval (+ (min (car i1) (cdr i1)) (min (car i2) (cdr i2)))
                                 (+ (max (car i1) (cdr i1)) (max (car i2) (cdr i2)))))
= (interval-width (make-interval (+ (min l1 u1) (min l2 u2))
                                 (+ (max l1 u1) (max l2 u2))))
= (interval-width (make-interval (+ (min l1 u1) (min l2 u2))
                                 (+ (max l1 u1) (max l2 u2))))
= (interval-width (make-interval (+ l1 l2) (+ u1 u2)))
= (interval-width (cons (+ l1 l2) (+ u1 u2)))
= (/ (- (upper-bound (cons (+ l1 l2) (+ u1 u2)))
        (lower-bound (cons (+ l1 l2) (+ u1 u2)))) 2)
= (/ (- (+ u1 u2) (+ l1 l2)) 2)
≡ (/ (+ (- u1 l1) (- u2 l2)) 2)
≡ (+ (/ (- u1 l1) 2) (/ (- u2 l2) 2))
≡ (+ (interval-width i1) (interval-width i2))

So we can see that the width of the interval produced by evaluating the expression (add-interval i1 i2) is the sum of the widths of intervals i1 and i2.

We can try to calculate the width of the interval produced by evaluating the expression (sub-interval i1 i2) in the same way:

  (interval-width (sub-interval i1 i2))
= (interval-width (make-interval (- (lower-bound i1) (upper-bound i2))
                                 (- (upper-bound i1) (lower-bound i2))))
= (interval-width (make-interval (- (min (car i1) (cdr i1)) (max (car i2) (cdr i2)))
                                 (- (max (car i1) (cdr i1)) (min (car i2) (cdr i2)))))
= (interval-width (make-interval (- (min l1 u1) (max l2 u2))
                                 (- (max l1 u1) (min l2 u2))))
= (interval-width (make-interval (- l1 u2) (- u1 l2)))
= (interval-width (cons (- l1 u2) (- u1 l2)))
= (/ (- (upper-bound (cons (- l1 u2) (- u1 l2)))
        (lower-bound (cons (- l1 u2) (- u1 l2)))) 2)
= (/ (- (- u1 l2) (- l1 u2)) 2)
≡ (/ (+ (- u1 l1) (- u2 l2)) 2)
≡ (+ (/ (- u1 l1) 2) (/ (- u2 l2) 2))
≡ (+ (interval-width i1) (interval-width i2))

Similarly we can see that the width of the interval produced by evaluating the expression (sub-interval i1 i2) is also the sum of the widths of intervals i1 and i2.

Let's check these both seem to hold:

> (define i1 (make-interval 3 9))
> (define i2 (make-interval 4 14))
> (interval-width i1)
3
> (interval-width i2)
5
> (interval-width (add-interval i1 i2))
8
> (interval-width (sub-interval i1 i2))
8

For the last part of this exercise we're asked to show that this isn't true for multiplication or division by providing examples. Now we could just stick our intervals from above into the interpreter and claim that this shows that there's no sensible function here:

> (interval-width (mul-interval i1 i2))
57
> (interval-width (div-interval i1 i2))
1.0178571428571428

But this doesn't prove anything. For all we know it may be the case that:

  (interval-width (mul-interval i1 i2))
≡ (+ (* (interval-width i1) 2) (* (interval-width i2) 10) 1)

and:

  (interval-width (div-interval i1 i2))
≡ (/ (+ (* (interval-width i1) 2) (* (interval-width i2) 10) 1.0)
     (+ (* (interval-width i1) 2) (* (interval-width i2) 10)))

However, while these expressions give the same results for the intervals, i1 and i2, we defined above, they're not actually equivalent expressions, and wouldn't work for other intervals. If you're not convinced, try just swapping the intervals around in the expressions.

Let's instead prove it by substitution again. Note that the interval produced by mul-interval can be different combinations of the bounds of the two intervals passed to it depending upon which of the set of values {p1, p2, p3, p4} are the maximum and minimum. Let's assume that l1 and l2 are integer values >1 (which implies that u1 and u2 are too). This means that p1 (the value of the expression (* l1 l2)) will be the minimum and p4 (the value of the expression (* u1 u2)) will be the maximum. Given that assumption, here's what we get when we try to calculate the width of the interval produced by evaluating the expression (mul-interval i1 i2):

  (interval-width (mul-interval i1 i2))
= (interval-width
    (let ((p1 (* (lower-bound i1) (lower-bound i2)))
          (p2 (* (lower-bound i1) (upper-bound i2)))
          (p3 (* (upper-bound i1) (lower-bound i2)))
          (p4 (* (upper-bound i1) (upper-bound i2))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (let ((p1 (* (min (car i1) (cdr i1)) (min (car i2) (cdr i2))))
          (p2 (* (min (car i1) (cdr i1)) (max (car i2) (cdr i2))))
          (p3 (* (max (car i1) (cdr i1)) (min (car i2) (cdr i2))))
          (p4 (* (max (car i1) (cdr i1)) (max (car i2) (cdr i2)))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (let ((p1 (* (min l1 u1) (min l2 u2)))
          (p2 (* (min l1 u1) (max l2 u2)))
          (p3 (* (max l1 u1) (min l2 u2)))
          (p4 (* (max l1 u1) (max l2 u2))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (let ((p1 (* l1 l2))
          (p2 (* l1 u2))
          (p3 (* u1 l2))
          (p4 (* u1 u2)))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (make-interval (min (* l1 l2) (* l1 u2) (* u1 l2) (* u1 u2))
                   (max (* l1 l2) (* l1 u2) (* u1 l2) (* u1 u2))))
= (interval-width (make-interval (* l1 l2) (* u1 u2)))
= (interval-width (cons (* l1 l2) (* u1 u2)))
= (/ (- (upper-bound (cons (* l1 l2) (* u1 u2)))
        (lower-bound (cons (* l1 l2) (* u1 u2))))
     2)
= (/ (- (* u1 u2) (* l1 l2)) 2)

Now it is not possible to convert this expression to an equivalent expression that consists of terms of (interval-width i1), (interval-width i2) and constants alone. As a result, it is not possible to express the width of multiplying two arbitrary intervals in terms of the widths of those intervals.

Let's do division now... Using the same assumption as above, here's what we get when we try to calculate the width of the interval produced by evaluating the expression (div-interval i1 i2):

  (interval-width (div-interval i1 i2))
= (interval-width
    (mul-interval i1
                  (make-interval (/ 1.0 (upper-bound i2))
                                 (/ 1.0 (lower-bound i2)))))
= (interval-width (mul-interval i1 (make-interval (/ 1.0 u2) (/ 1.0 l2))))
= (interval-width (mul-interval i1 (cons (/ 1.0 u2) (/ 1.0 l2))))
= (interval-width
    (let ((p1 (* (lower-bound i1) (lower-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p2 (* (lower-bound i1) (upper-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p3 (* (upper-bound i1) (lower-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p4 (* (upper-bound i1) (upper-bound (cons (/ 1.0 u2) (/ 1.0 l2))))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))

At this point we should note that, as l2 is the lower bound of interval i2, and u2 the upper bound, constructing an interval with the reciprocals of these two bounds with mean that (/ 1.0 l2) is the upper bound and (/ 1.0 u2) is the lower bound. So:

  (interval-width
    (let ((p1 (* (lower-bound i1) (lower-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p2 (* (lower-bound i1) (upper-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p3 (* (upper-bound i1) (lower-bound (cons (/ 1.0 u2) (/ 1.0 l2)))))
          (p4 (* (upper-bound i1) (upper-bound (cons (/ 1.0 u2) (/ 1.0 l2))))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (let ((p1 (* l1 (/ 1.0 u2)))
          (p2 (* l1 (/ 1.0 l2)))
          (p3 (* u1 (/ 1.0 u2)))
          (p4 (* u1 (/ 1.0 l2))))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
≡ (interval-width
    (let ((p1 (/ l1 u2))
          (p2 (/ l1 l2))
          (p3 (/ u1 u2))
          (p4 (/ u1 l2)))
      (make-interval (min p1 p2 p3 p4)
                     (max p1 p2 p3 p4)))
= (interval-width
    (make-interval (min (/ l1 u2) (/ l1 l2) (/ u1 u2) (/ u1 l2))
                   (max (/ l1 u2) (/ l1 l2) (/ u1 u2) (/ u1 l2))))
= (interval-width (make-interval (/ l1 u2) (/ u1 l2)))
= (interval-width (cons (/ l1 u2) (/ u1 l2)))
= (/ (- (upper-bound  (cons (/ l1 u2) (/ u1 l2)))
        (lower-bound (cons (/ l1 u2) (/ u1 l2))))
     2)
= (/ (- (/ u1 l2) (/ l1 u2)) 2)

Again it is not possible to convert this expression to an equivalent expression that consists of terms of (interval-width i1), (interval-width i2) and constants alone, and so it is also not possible to express the width of dividing two arbitrary intervals in terms of the widths of those intervals.

2011-09-21

SICP Exercise 2.8: Interval Subtraction

Using reasoning analogous to Alyssa's, describe how the difference of two intervals may be computed. Define a corresponding subtraction procedure, called sub-interval.

Well, let's check Alyssa's reasoning for her procedure add-interval: "the minimum value the sum could be is the sum of the two lower bounds and the maximum value it could be is the sum of the two upper bounds". Let's see if we can provide a bit more in-depth analysis for subtracting intervals.

The result of subtracting one interval from another should be the interval consisting of the minimum and maximum values out of the four values produced by subtracting each of the bounds of the second interval from each of the bounds of the first interval. To produce the minimum value we want to start with the lowest value possible and subtract the largest value possible from it. I.e. subtract the upper bound of the second interval from the lower bound of the first interval. To produce the maximum value we do the converse of this - we want to start with the largest value possible and subtract the least from it. I.e. subtract the lower bound of the second interval from the upper bound of the first interval.

We can show this via example. Suppose we have an interval with the bounds [5,7] and we want to subtract the interval [2,4] from it. What happens when we subtract each of the bounds of the second interval from each of the bounds of the first one:

First Interval Bound	Second Interval Bound	Subtracted Value
5	2	3
5	4	1
7	2	5
7	4	3

As you can see the minimum value is produced by subtracting the upper bound of the second interval from the lower bound of the first interval and the maximum value is produced by subtracting the lower bound of the second interval from the upper bound of the first interval.

Here's an implementation that encapsulates this functionality:

(define (sub-interval x y)
  (make-interval (- (lower-bound x) (upper-bound y))
                 (- (upper-bound x) (lower-bound y))))

Let's try it out:

> (sub-interval (make-interval 5 7) (make-interval 2 4))
'(1 . 5)
>  (sub-interval (make-interval 4 6) (make-interval 3 10))
'(-6 . 3)

Pages