2011-09-07

SICP Exercise 1.24: Faster Prime Time

Modify the timed-prime-test procedure of exercise 1.22 to use fast-prime? (the Fermat method), and test each of the 12 primes you found in that exercise. Since the Fermat test has Θ(log n) growth, how would you expect the time to test primes near 1,000,000 to compare with the time needed to test primes near 1000? Do your data bear this out? Can you explain any discrepancy you find?

Okay, for this one I've gone back to the original procedure definitions of exercise 1.22. I'll run timed-prime-test directly instead of searching for primes.

Now the implementation of fast-prime? executes fermat-test the number of times indicated by its second operand. This is useful for us, as we've already seen that we need to test each number a large number of times in order to get useful timing data. After a wee bit of experimentation I opted for 250,000 times. Similar to exercise 1.23, the exercise here says to "Modify the timed-prime-test procedure of exercise 1.22 to use fast-prime?", but it's actually start-prime-test that was making the call to prime? in the original implementation, and so that is the procedure we actually modify. Here's my updated start-prime-test procedure:
(define (start-prime-test n start-time)
  (if (fast-prime? n 250000)
      (report-prime (- (runtime) start-time))
      (display "")))
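For reference, these tests rely on the fast-prime?, fermat-test and expmod procedures from section 1.2.6 of the book (lightly adapted here for plain Racket, with #t/#f in place of the book's true/false, plus the square helper so the snippet runs standalone):

```scheme
#lang racket
(define (square x) (* x x))

;; Computes base^exp mod m using successive squaring - Θ(log exp) steps.
(define (expmod base exp m)
  (cond ((= exp 0) 1)
        ((even? exp)
         (remainder (square (expmod base (/ exp 2) m)) m))
        (else
         (remainder (* base (expmod base (- exp 1) m)) m))))

;; Fermat test: if n is prime, then a^n mod n = a for any a < n.
(define (fermat-test n)
  (define (try-it a)
    (= (expmod a n n) a))
  (try-it (+ 1 (random (- n 1)))))

;; Run the Fermat test the given number of times with random values of a.
(define (fast-prime? n times)
  (cond ((= times 0) #t)
        ((fermat-test n) (fast-prime? n (- times 1)))
        (else #f)))
```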
And here's what happens when I run the tests:
> (timed-prime-test 1009)
1009 *** 1588.0
> (timed-prime-test 1013)
1013 *** 1663.0
> (timed-prime-test 1019)
1019 *** 1689.0
> (timed-prime-test 10007)
10007 *** 2064.0
> (timed-prime-test 10009)
10009 *** 1974.0
> (timed-prime-test 10037)
10037 *** 2048.0
> (timed-prime-test 100003)
100003 *** 6108.0
> (timed-prime-test 100019)
100019 *** 5984.0
> (timed-prime-test 100043)
100043 *** 5986.0
> (timed-prime-test 1000003)
1000003 *** 9727.0
> (timed-prime-test 1000033)
1000033 *** 8330.0
> (timed-prime-test 1000037)
1000037 *** 8772.0
Now the Fermat test has Θ(log n) growth, so we can work out the expected ratios between the times taken for the first three primes larger than 1,000 and those for the subsequent orders of magnitude: testing primes near 10^k should take log(10^k)/log(10^3) = k/3 times as long as testing primes near 1,000. We can also take the average times for each of these tuples of primes and work out the actual ratios:

Start     | 1st Prime Time (ms) | 2nd Prime Time (ms) | 3rd Prime Time (ms) | Average Time (ms) | Expected Ratio | Actual Ratio
1,000     | 1588.0              | 1663.0              | 1689.0              | 1646.7            | -              | -
10,000    | 2064.0              | 1974.0              | 2048.0              | 2028.7            | 1.333          | 1.232
100,000   | 6108.0              | 5984.0              | 5986.0              | 6026.0            | 1.667          | 3.659
1,000,000 | 9727.0              | 8330.0              | 8772.0              | 8943.0            | 2.000          | 5.431
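The ratios in the table are easy to sanity-check in Racket. This snippet recomputes both columns (the averages are just the figures from the table above):

```scheme
#lang racket
;; For a Θ(log n) test, time near 10^k versus time near 10^3 should
;; scale as log(10^k) / log(10^3) = k/3.
(define (expected-ratio k)
  (/ (log (expt 10.0 k)) (log (expt 10.0 3))))

;; Actual ratios: average time at each order of magnitude over the
;; average time for the primes just above 1,000.
(define averages '(1646.7 2028.7 6026.0 8943.0))
(define (actual-ratio avg) (/ avg (car averages)))

(for ([k (in-list '(4 5 6))]
      [avg (in-list (cdr averages))])
  (printf "10^~a: expected ~a, actual ~a\n"
          k
          (real->decimal-string (expected-ratio k) 3)
          (real->decimal-string (actual-ratio avg) 3)))
```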

For the first three primes above 10,000 the ratio seems to be about right. But for 100,000 and 1,000,000 the ratios are much higher than expected. Why would this be?

A hint at an answer is perhaps given in the footnote to section 1.2.3:

"...if we count process steps as "machine operations" we are making the assumption that the number of machine operations needed to perform, say, a multiplication is independent of the size of the numbers to be multiplied, which is false if the numbers are sufficiently large."

I've noted before that I'm using DrRacket for most of my SICP work. It so happens that I'm using a 32-bit version of DrRacket. The user manual indicates that it uses 30 bits plus one sign bit to represent fixnums; if a value is too large to represent as a fixnum then it is represented as a scheme_bignum_type, an arbitrarily large integer value. Now 2^30 = 1,073,741,824, so provided the calculations in expmod remain within this range DrRacket can do them all using fixnums and so should be nice and efficient. However, if we look at the procedure expmod we see that it contains a line where it squares the result of a recursive call to expmod. If the result returned by this recursive call exceeds √(2^30) = 2^15 = 32,768 then DrRacket has to represent its square as a scheme_bignum_type, which is likely to have inferior performance to fixnums for calculations. So when can this happen?

Well, the result of this recursive call will be in the range 0 to n−1, where n is the number whose primality we are testing. So it can't happen for the first three primes larger than either 1,000 or 10,000, but it can happen for the first three primes larger than 100,000 and 1,000,000. Whether it does or not depends upon the random number chosen for the test. However, as we are running the Fermat test 250,000 times with random numbers in the range 1 to n−1, chances are we're going to hit quite a few that cause a recursive call to expmod to return such a value. As soon as one does, the interpreter has to use the less efficient representation and so the performance degrades.
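We can illustrate the threshold directly. Note that where fixnums end depends on the Racket build (a 64-bit Racket has a far larger fixnum range than the 30-bits-plus-sign described above), so the comments below about the 2^30 crossover are specific to the 32-bit build:

```scheme
#lang racket
(require racket/fixnum)

;; Small integers are immediate fixnums on any build; huge ones are
;; always bignums.
(fixnum? 32768)         ; #t on any build
(fixnum? (expt 2 100))  ; #f - far too large, stored as a bignum

;; On the 32-bit build described above the crossover is 2^30, so
;; squaring any expmod result above 2^15 = 32768 forces a bignum
;; intermediate.
(define (needs-bignum-square? r threshold)
  (> (* r r) threshold))

(needs-bignum-square? 32769 (expt 2 30)) ; #t: 32769^2 > 2^30
```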

2011-09-04

SICP Exercise 1.23: Halving the Search Space

The smallest-divisor procedure shown at the start of this section does lots of needless testing: After it checks to see if the number is divisible by 2 there is no point in checking to see if it is divisible by any larger even numbers. This suggests that the values used for test-divisor should not be 2, 3, 4, 5, 6, ..., but rather 2, 3, 5, 7, 9, .... To implement this change, define a procedure next that returns 3 if its input is equal to 2 and otherwise returns its input plus 2. Modify the smallest-divisor procedure to use (next test-divisor) instead of (+ test-divisor 1). With timed-prime-test incorporating this modified version of smallest-divisor, run the test for each of the 12 primes found in exercise 1.22. Since this modification halves the number of test steps, you should expect it to run about twice as fast. Is this expectation confirmed? If not, what is the observed ratio of the speeds of the two algorithms, and how do you explain the fact that it is different from 2?

The definition of next is nice and straightforward:
(define (next n)
  (if (= n 2)
      3
      (+ n 2)))
The exercise then says to "Modify the smallest-divisor procedure to use (next test-divisor) instead of (+ test-divisor 1)." The smallest-divisor procedure doesn't actually call (+ test-divisor 1); it simply calls through to find-divisor which does the call. So we modify the find-divisor procedure as follows:
(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (next test-divisor)))))
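For completeness, here is the full set of definitions from the book (with find-divisor and next repeated from above) so that the modified procedure runs standalone:

```scheme
#lang racket
(define (square x) (* x x))

(define (divides? a b)
  (= (remainder b a) 0))

;; Skip even candidates above 2: after 2, step by 2 through 3, 5, 7, ...
(define (next n)
  (if (= n 2)
      3
      (+ n 2)))

(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (next test-divisor)))))

(define (smallest-divisor n)
  (find-divisor n 2))

;; n is prime iff its smallest divisor (other than 1) is itself.
(define (prime? n)
  (= n (smallest-divisor n)))
```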
I reused my implementation of search-for-primes from exercise 1.22 to test the 12 primes we found there:
> (search-for-primes 1000 3)
1001
1003
1005
1007
1009 *** 561.0
1011
1013 *** 562.0
1015
1017
1019 *** 546.0...done
> (search-for-primes 10000 3)
10001
10003
10005
10007 *** 1669.0
10009 *** 1654.0
10011
10013
10015
10017
10019
10021
10023
10025
10027
10029
10031
10033
10035
10037 *** 1669.0...done
> (search-for-primes 100000 3)
100001
100003 *** 5148.0
100005
100007
100009
100011
100013
100015
100017
100019 *** 5242.0
100021
100023
100025
100027
100029
100031
100033
100035
100037
100039
100041
100043 *** 5226.0...done
> (search-for-primes 1000000 3)
1000001
1000003 *** 16474.0
1000005
1000007
1000009
1000011
1000013
1000015
1000017
1000019
1000021
1000023
1000025
1000027
1000029
1000031
1000033 *** 16458.0
1000035
1000037 *** 16536.0...done
Right, let's calculate the averages, compare them with the results from exercise 1.22 and find the ratios:
Start     | 1st Prime Time (ms) | 2nd Prime Time (ms) | 3rd Prime Time (ms) | Average Time (ms) | Ex. 1.22 Avg Time (ms) | Ratio
1,000     | 561.0               | 562.0               | 546.0               | 556.3             | 780.0                  | 1.40
10,000    | 1669.0              | 1654.0              | 1669.0              | 1664.0            | 2532.3                 | 1.52
100,000   | 5148.0              | 5242.0              | 5226.0              | 5205.3            | 7987.7                 | 1.53
1,000,000 | 16474.0             | 16458.0             | 16536.0             | 16489.3           | 25558.0                | 1.55

Well, it's not quite the expected two-times improvement in speed... But it does appear to be a fairly constant improvement, at around 1.5-times faster. So why isn't it the two-times improvement? Let's have a closer look at the change we made: we replaced (+ test-divisor 1) with (next test-divisor). There are a couple of things of note here that could account for the difference:
  1. We've replaced a built-in primitive operator (+) with a user-defined one (next). Built-in operators, particularly primitive ones such as basic numerical operations, can have various optimizations applied to them by the interpreter's authors and so may run faster than user-defined procedures. For example, the interpreter may be able to avoid the overhead of storing another stack frame on the call stack.
  2. To evaluate the procedure next, the interpreter must evaluate the predicate of the if operator in order to determine whether to evaluate the consequent or the alternative as the result of the procedure. This adds additional overhead to determining the value of test-divisor to pass in the recursive call.
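A rough way to get a feel for this overhead is to time both forms directly. This is only a sketch (the time-ms helper and the loop counts are my own, and a clever compiler may optimize the dead expressions away entirely, so treat any numbers as indicative at best):

```scheme
#lang racket
;; Time a thunk, returning elapsed milliseconds.
(define (time-ms thunk)
  (let ([start (current-inexact-milliseconds)])
    (thunk)
    (- (current-inexact-milliseconds) start)))

(define (next n)
  (if (= n 2) 3 (+ n 2)))

;; Spin through the same number of iterations, once calling the
;; user-defined next and once using the bare primitive addition.
(define (spin-next count)
  (when (> count 0)
    (next 11)
    (spin-next (- count 1))))

(define (spin-inline count)
  (when (> count 0)
    (+ 11 2)
    (spin-inline (- count 1))))

(printf "next:   ~ams\n" (time-ms (lambda () (spin-next 10000000))))
(printf "inline: ~ams\n" (time-ms (lambda () (spin-inline 10000000))))
```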

2011-09-03

SICP Exercise 1.22: Prime Time

Most Lisp implementations include a primitive called runtime that returns an integer that specifies the amount of time the system has been running (measured, for example, in microseconds). The following timed-prime-test procedure, when called with an integer n, prints n and checks to see if n is prime. If n is prime, the procedure prints three asterisks followed by the amount of time used in performing the test.
(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (runtime)))
(define (start-prime-test n start-time)
  (if (prime? n)
      (report-prime (- (runtime) start-time))))
(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time))
Using this procedure, write a procedure search-for-primes that checks the primality of consecutive odd integers in a specified range. Use your procedure to find the three smallest primes larger than 1000; larger than 10,000; larger than 100,000; larger than 1,000,000. Note the time needed to test each prime. Since the testing algorithm has order of growth of Θ(√n), you should expect that testing for primes around 10,000 should take about √10 times as long as testing for primes around 1000. Do your timing data bear this out? How well do the data for 100,000 and 1,000,000 support the √n prediction? Is your result compatible with the notion that programs on your machine run in time proportional to the number of steps required for the computation?

Like I mentioned way back at the start, I'm mostly using Racket as my interpreter. Racket's environment doesn't include a runtime primitive. However, it does include current-inexact-milliseconds which returns the number of milliseconds elapsed since midnight, January 1st 1970 UTC (a.k.a. Unix time), and can also include a fractional part (i.e. it can include microseconds as well). Using this we can provide our own implementation of runtime:
(define runtime current-inexact-milliseconds)
I also had a slight problem with the definition of start-prime-test. The interpreter didn't like the fact that the if operator was missing an alternative, so I changed it to return 0 if the number wasn't prime. Why not #f? All will be revealed shortly! Anyway, here's the updated start-prime-test:
(define (start-prime-test n start-time)
  (if (prime? n)
      (report-prime (- (runtime) start-time))
      0))
Now the definition of timed-prime-test we are provided with does not indicate whether or not it has found a prime, but we're asked to produce search-for-primes and to use it to find the next three primes larger than various values. I chose to produce an implementation of search-for-primes that searched a range of numbers between n and m:
(define (search-for-primes n m)
  (define (iter i)
    (timed-prime-test i)
    (if (> (+ i 2) m) (display "") (iter (+ i 2))))
  (iter (if (odd? n) n (+ n 1))))
We can use this to search for the primes but, if your machine is anything like mine, you'll encounter a slight problem with the exercise:
> (search-for-primes 1000 1020)
1001
1003
1005
1007
1009 *** 0.0
1011
1013 *** 0.0
1015
1017
1019 *** 0.0
> (search-for-primes 10000 10040)
10001
10003
10005
10007 *** 0.0
10009 *** 0.0
10011
10013
10015
10017
10019
10021
10023
10025
10027
10029
10031
10033
10035
10037 *** 0.0
10039 *** 0.0
> (search-for-primes 100000 100050)
100001
100003 *** 0.0
100005
100007
100009
100011
100013
100015
100017
100019 *** 0.0
100021
100023
100025
100027
100029
100031
100033
100035
100037
100039
100041
100043 *** 0.0
100045
100047
100049 *** 0.0
> (search-for-primes 1000000 1000040)
1000001
1000003 *** 0.0
1000005
1000007
1000009
1000011
1000013
1000015
1000017
1000019
1000021
1000023
1000025
1000027
1000029
1000031
1000033 *** 0.0
1000035
1000037 *** 0.0
1000039 *** 0.0
Bear in mind that SICP was originally published in 1985, when the fastest PC CPUs were tearing along at 16MHz. The average PC CPU nowadays runs coming on for 200 times faster than this and has a whole raft of architectural improvements, refinements and extensions, not to mention the many semiconductor manufacturing improvements. It should come as no surprise, therefore, that our CPUs can find these primes quicker than the resolution of the clock can measure.

What can we do to work around this? Well, we can modify start-prime-test so that it executes prime? multiple times before finally deciding whether or not to report it as a prime. While we're at it, if we modify the procedures so that start-prime-test returns 1 if the number's prime (and it's already returning 0 if it's not) then we can additionally modify search-for-primes so that it can keep a running total of the sum of the results of calling timed-prime-test and then stop when it's found the number of primes we're looking for. Here's my updated version:
(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time)
  1)

(define (start-prime-test n start-time)
  (define (iter i)
    (if (prime? n)
        (if (= i 100000)
            (report-prime (- (runtime) start-time))
            (iter (+ i 1)))
        0))
  (iter 0))

(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (runtime)))

(define (search-for-primes start n)
  (define (iter found next)
    (if (= found n)
        (display "...done")
        (iter (+ found (timed-prime-test next))
              (+ next 2))))
  (iter 0
        (if (odd? start)
            start
            (+ start 1))))
Here's what happens when we run it:
> (search-for-primes 1000 3)
1001
1003
1005
1007
1009 *** 780.0
1011
1013 *** 780.0
1015
1017
1019 *** 780.0...done
> (search-for-primes 10000 3)
10001
10003
10005
10007 *** 2527.0
10009 *** 2527.0
10011
10013
10015
10017
10019
10021
10023
10025
10027
10029
10031
10033
10035
10037 *** 2543.0...done
> (search-for-primes 100000 3)
100001
100003 *** 7988.0
100005
100007
100009
100011
100013
100015
100017
100019 *** 7987.0
100021
100023
100025
100027
100029
100031
100033
100035
100037
100039
100041
100043 *** 7988.0...done
> (search-for-primes 1000000 3)
1000001
1000003 *** 25303.0
1000005
1000007
1000009
1000011
1000013
1000015
1000017
1000019
1000021
1000023
1000025
1000027
1000029
1000031
1000033 *** 25600.0
1000035
1000037 *** 25771.0...done
We now need to assess the timing data and see whether it supports the Θ(√n) order of growth in the number of steps. I.e. do "programs on [my] machine run in time proportional to the number of steps required for the computation?" Here are the times from above:
Start     | 1st Prime Time (ms) | 2nd Prime Time (ms) | 3rd Prime Time (ms) | Average Time (ms)
1,000     | 780.0               | 780.0               | 780.0               | 780.0
10,000    | 2527.0              | 2527.0              | 2543.0              | 2532.3
100,000   | 7988.0              | 7987.0              | 7988.0              | 7987.7
1,000,000 | 25303.0             | 25600.0             | 25771.0             | 25558.0
A cursory glance suggests that the timing data does indeed support the Θ(√n) order of growth in the number of steps. The average time when starting at 100,000 is approximately 10 times the average time when starting at 1,000. We also have an approximately 10-times increase when moving from starting at 10,000 to starting at 1,000,000. In both these cases the number being tested increases by a factor of 100, so the number of steps should grow by a factor of √100 = 10, and we should therefore expect a 10-times increase in time.

Let's actually work out the sums...
  • Finding the first three primes larger than 1,000 takes an average of 780.0ms, so we should expect finding the first three primes larger than 10,000 to take 780×√10 = 2466.6ms. We get the average of 2532.3ms, 2.7% longer than expected.
  • Finding the first three primes larger than 10,000 takes an average of 2532.3ms, so we should expect finding the first three primes larger than 100,000 to take 2532.3×√10 = 8007.8ms. We get the average of 7987.7ms, 0.3% less than expected.
  • Finding the first three primes larger than 100,000 takes an average of 7987.7ms, so we should expect finding the first three primes larger than 1,000,000 to take 7987.7×√10 = 25259.3ms. We get the average of 25558.0ms, 1.2% longer than expected.
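The arithmetic in the bullets above can be checked mechanically (the averages here are just the figures from the table):

```scheme
#lang racket
;; For Θ(√n) growth, each 10x jump in n should multiply the time by √10.
(define (predict prev-avg) (* prev-avg (sqrt 10)))

;; Percentage difference of the measured average from the prediction.
(define (pct-diff actual expected)
  (* 100 (/ (- actual expected) expected)))

;; Average times from the table above (ms), for 1,000 up to 1,000,000.
(define averages '(780.0 2532.3 7987.7 25558.0))

(for ([prev (in-list averages)]
      [meas (in-list (cdr averages))])
  (printf "predicted ~a, measured ~a (~a% off)\n"
          (real->decimal-string (predict prev) 1)
          meas
          (real->decimal-string (pct-diff meas (predict prev)) 1)))
```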
So, yes, the timing data does seem to support the Θ(√n) order of growth in the number of steps and the result is compatible with the notion that programs on my machine run in time proportional to the number of steps required for the computation.