2011-09-03

SICP Exercise 1.22: Prime Time

Most Lisp implementations include a primitive called runtime that returns an integer that specifies the amount of time the system has been running (measured, for example, in microseconds). The following timed-prime-test procedure, when called with an integer n, prints n and checks to see if n is prime. If n is prime, the procedure prints three asterisks followed by the amount of time used in performing the test.
(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (runtime)))
(define (start-prime-test n start-time)
  (if (prime? n)
      (report-prime (- (runtime) start-time))))
(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time))
Using this procedure, write a procedure search-for-primes that checks the primality of consecutive odd integers in a specified range. Use your procedure to find the three smallest primes larger than 1000; larger than 10,000; larger than 100,000; larger than 1,000,000. Note the time needed to test each prime. Since the testing algorithm has order of growth of Θ(√n), you should expect that testing for primes around 10,000 should take about √10 times as long as testing for primes around 1000. Do your timing data bear this out? How well do the data for 100,000 and 1,000,000 support the √n prediction? Is your result compatible with the notion that programs on your machine run in time proportional to the number of steps required for the computation?

As I mentioned way back at the start, I'm mostly using Racket as my interpreter. Racket's environment doesn't include a runtime primitive. However, it does include current-inexact-milliseconds, which returns the number of milliseconds elapsed since midnight UTC on January 1st, 1970 (a.k.a. Unix time), and can include a fractional part (i.e. it offers sub-millisecond resolution). Using this we can provide our own implementation of runtime:
(define runtime current-inexact-milliseconds)
I also had a slight problem with the definition of start-prime-test. The interpreter didn't like the fact that the if expression was missing an alternative, so I changed it to return 0 if the number wasn't prime. Why not #f? All will be revealed shortly! Anyway, here's the updated start-prime-test:
(define (start-prime-test n start-time)
  (if (prime? n)
      (report-prime (- (runtime) start-time))
      0))
Now, the definition of timed-prime-test we're given doesn't indicate whether or not it has found a prime, but we're asked to write search-for-primes and use it to find the three smallest primes larger than various values. To start with, I chose an implementation of search-for-primes that tests the odd numbers in a range between n and m:
(define (search-for-primes n m)
  (define (iter i)
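    ;; Time-test i, then recurse on the next odd number unless that would take
    ;; us past m; (display "") is just a do-nothing alternative for the if.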
    (timed-prime-test i)
    (if (> (+ i 2) m) (display "") (iter (+ i 2))))
  (iter (if (odd? n) n (+ n 1))))
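Note that all of this relies on the prime? predicate from section 1.2.6 of the book (the Θ(√n) trial-division test the exercise refers to). For reference, here it is, along with a square helper in case your interpreter doesn't provide one:
(define (square x) (* x x))
(define (smallest-divisor n) (find-divisor n 2))
(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (+ test-divisor 1)))))
(define (divides? a b) (= (remainder b a) 0))
(define (prime? n) (= n (smallest-divisor n)))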
We can use this to search for the primes, but if your machine is anything like mine, you'll encounter a slight problem with the exercise:
> (search-for-primes 1000 1020)
1001
1003
1005
1007
1009 *** 0.0
1011
1013 *** 0.0
1015
1017
1019 *** 0.0
> (search-for-primes 10000 10040)
10001
10003
10005
10007 *** 0.0
10009 *** 0.0
10011
10013
10015
10017
10019
10021
10023
10025
10027
10029
10031
10033
10035
10037 *** 0.0
10039 *** 0.0
> (search-for-primes 100000 100050)
100001
100003 *** 0.0
100005
100007
100009
100011
100013
100015
100017
100019 *** 0.0
100021
100023
100025
100027
100029
100031
100033
100035
100037
100039
100041
100043 *** 0.0
100045
100047
100049 *** 0.0
> (search-for-primes 1000000 1000040)
1000001
1000003 *** 0.0
1000005
1000007
1000009
1000011
1000013
1000015
1000017
1000019
1000021
1000023
1000025
1000027
1000029
1000031
1000033 *** 0.0
1000035
1000037 *** 0.0
1000039 *** 0.0
Bear in mind that SICP was originally published in 1985, when the fastest PC CPUs were tearing along at 16MHz. The average PC CPU nowadays runs close to 200 times faster than this, and has a whole raft of architectural improvements, refinements and extensions, not to mention the many semiconductor manufacturing improvements. It should come as no surprise, therefore, that our CPUs can find these primes more quickly than the clock can resolve.

What can we do to work around this? Well, we can modify start-prime-test so that it runs prime? many times over before finally deciding whether or not to report the number as prime. While we're at it, if we make start-prime-test return 1 when the number is prime (it already returns 0 when it's not), then we can also modify search-for-primes to keep a running count of the primes reported by timed-prime-test and stop when it's found the number of primes we're looking for. Here's my updated version:
(define (report-prime elapsed-time)
  (display " *** ")
  (display elapsed-time)
  1)

(define (start-prime-test n start-time)
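  ;; Repeat the primality test (until i reaches 100000) so that the total
  ;; elapsed time is large enough for the clock to measure.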
  (define (iter i)
    (if (prime? n)
        (if (= i 100000)
            (report-prime (- (runtime) start-time))
            (iter (+ i 1)))
        0))
  (iter 0))

(define (timed-prime-test n)
  (newline)
  (display n)
  (start-prime-test n (runtime)))

(define (search-for-primes start n)
  (define (iter found next)
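    ;; found counts the primes reported so far (timed-prime-test returns 1 for
    ;; a prime and 0 otherwise); stop once n primes have been found.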
    (if (= found n)
        (display "...done")
        (iter (+ found (timed-prime-test next))
              (+ next 2))))
  (iter 0
        (if (odd? start)
            start
            (+ start 1))))
Here's what happens when we run it:
> (search-for-primes 1000 3)
1001
1003
1005
1007
1009 *** 780.0
1011
1013 *** 780.0
1015
1017
1019 *** 780.0...done
> (search-for-primes 10000 3)
10001
10003
10005
10007 *** 2527.0
10009 *** 2527.0
10011
10013
10015
10017
10019
10021
10023
10025
10027
10029
10031
10033
10035
10037 *** 2543.0...done
> (search-for-primes 100000 3)
100001
100003 *** 7988.0
100005
100007
100009
100011
100013
100015
100017
100019 *** 7987.0
100021
100023
100025
100027
100029
100031
100033
100035
100037
100039
100041
100043 *** 7988.0...done
> (search-for-primes 1000000 3)
1000001
1000003 *** 25303.0
1000005
1000007
1000009
1000011
1000013
1000015
1000017
1000019
1000021
1000023
1000025
1000027
1000029
1000031
1000033 *** 25600.0
1000035
1000037 *** 25771.0...done
We now need to assess the timing data and see whether it supports the Θ(√n) order of growth in the number of steps. That is, do "programs on [my] machine run in time proportional to the number of steps required for the computation"? Here are the times from above:
Start        1st Prime Time (ms)   2nd Prime Time (ms)   3rd Prime Time (ms)   Average Time (ms)
1,000        780.0                 780.0                 780.0                 780.0
10,000       2527.0                2527.0                2543.0                2532.3
100,000      7988.0                7987.0                7988.0                7987.7
1,000,000    25303.0               25600.0               25771.0               25558.0
A cursory glance suggests that the timing data does indeed support the Θ(√n) order of growth in the number of steps. The average time when starting at 100,000 is approximately 10 times the average time when starting at 1,000, and we see another approximately 10-times increase when moving from starting at 10,000 to starting at 1,000,000. In both these cases n increases by a factor of 100, so we should expect a √100 = 10 times increase in the number of steps, and hence in the running time.

Let's actually work out the sums...
  • Finding the first three primes larger than 1,000 takes an average of 780.0ms, so we should expect finding the first three primes larger than 10,000 to take 780.0×√10 = 2466.6ms. We get an average of 2532.3ms, 2.7% longer than expected.
  • Finding the first three primes larger than 10,000 takes an average of 2532.3ms, so we should expect finding the first three primes larger than 100,000 to take 2532.3×√10 = 8007.8ms. We get an average of 7987.7ms, 0.3% less than expected.
  • Finding the first three primes larger than 100,000 takes an average of 7987.7ms, so we should expect finding the first three primes larger than 1,000,000 to take 7987.7×√10 = 25259.3ms. We get an average of 25558.0ms, 1.2% longer than expected.
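If you want to double-check that arithmetic, here's a little throwaway snippet (the averages are hard-coded from the table above; a positive percentage means longer than expected, a negative one means shorter):
(define (check-prediction prev next)
  (let* ((expected (* prev (sqrt 10)))                      ; predicted average time
         (difference (* 100 (/ (- next expected) expected)))) ; % deviation from prediction
    (printf "expected ~a, measured ~a, ~a% difference\n"
            expected next difference)))
(check-prediction 780.0 2532.3)    ; 1,000   -> 10,000
(check-prediction 2532.3 7987.7)   ; 10,000  -> 100,000
(check-prediction 7987.7 25558.0)  ; 100,000 -> 1,000,000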
So, yes, the timing data does seem to support the Θ(√n) order of growth in the number of steps and the result is compatible with the notion that programs on my machine run in time proportional to the number of steps required for the computation.
