Know your implementation
How well do you know your language implementation? Below you'll see four different implementations of the same function.
Too easy? Well, then try not only to pick the fastest, but also to rank them according to speed.
You are welcome to leave a comment, if you try the snippets.
The task is simple: we want to append two linked lists. That is, we'll reimplement the builtin function append, although we'll restrict our-selves to two arguments. The four versions are of course not allowed to use the builtin append.
First version: Building the list in reverse
Second version: Naïve recursion
A naïve, non-tail-recursive solution.
Third version: Continuation passing style
Using side-effects
Let's avoid any unnecessary consing.
Benchmarking
And now for the bencmarking. Before each timing, I invoke the garbage collector. In this way garbage produced by test one, doesn't affect the timings of test two.
Can you predict which version is fastest?
Too easy? Well, then try not only to pick the fastest, but also to rank them according to speed.
You are welcome to leave a comment, if you try the snippets.
The task is simple: we want to append two linked lists. That is, we'll reimplement the builtin function append, although we'll restrict our-selves to two arguments. The four versions are of course not allowed to use the builtin append.
First version: Building the list in reverse
(define (append1 xs ys)
(append-rev (reverse xs) ys))
(define (append-rev rev-xs ys)
(cond
[(null? rev-xs) ys]
[else (append-rev (cdr rev-xs)
(cons (car rev-xs) ys))]))
Second version: Naïve recursion
A naïve, non-tail-recursive solution.
(define (append2 xs ys)
(cond
[(null? xs) ys]
[else (let ([result (append2 (cdr xs) ys)])
(cons (car xs) result))]))
Third version: Continuation passing style
(define (append3 xs ys)
(define (append3-k xs ys k)
(cond
[(null? xs) (k ys)]
[else (append3-k (cdr xs) ys
(lambda (tail) (k (cons (car xs) tail))))]))
(append3-k xs ys (lambda (x) x)))
Using side-effects
Let's avoid any unnecessary consing.
(define (for-each-reverse! f xs)
(cond
[(null? xs) 'done]
[else (for-each-reverse! f (cdr xs))
(f (car xs))]))
(define (append4 xs ys)
(let ([result ys])
(for-each-reverse!
(lambda (x)
(set! result (cons x result)))
xs)
result))
Benchmarking
And now for the bencmarking. Before each timing, I invoke the garbage collector. In this way garbage produced by test one, doesn't affect the timings of test two.
(define (test n)
(let ([xs (vector->list (make-vector n))])
(collect-garbage)
(time (begin (append1 xs xs) 'done))
(collect-garbage)
(time (begin (append2 xs xs) 'done))
(collect-garbage)
(time (begin (append3 xs xs) 'done))
(collect-garbage)
(time (begin (append4 xs xs) 'done))))
(test 100)
(test 1000)
(test 10000)
(test 100000)
(test 1000000)
(test 2000000)
7 Comments:
(define (reverse-append! xs ys)
(if (null? xs)
ys
(begin
(let ((xs-tail (cdr xs)))
(set-cdr! xs ys)
(reverse-append! xs-tail xs)))))
(define (append5 xs ys)
(reverse-append! (reverse xs) ys))
Nice. Your version is the fastest on my implementation of choice.
This comment has been removed by the author.
Here's my version. Be gentle, this is my first scheme program. I'm a C++/Haskell/Prolog programmer.
The trick is updating the cons cell destructively to regain tail recursiveness. In prolog this is free and pure.
I had failed miserably in predicting the speed of the posted versions, but was confident that I could beat Lucier's version, because mine has better cache coherence. However, Lucier's version was initially faster. Once I changed his version to use a homebrewed reverse, I took the lead.
I don't like the way most code is repeated in my version. How can it be cleaned up?
Here's the code:
(define (append6-impl node xs ys)
(if (null? xs)
'done
(begin
(let ((new-node (cons (car xs) ys)))
(set-cdr! node new-node)
(append6-impl
new-node
(cdr xs)
ys)))))
(define (append6 xs ys)
(if (null? xs)
ys
(begin
(let ((new-node (cons (car xs) ys)))
(append6-impl
new-node
(cdr xs)
ys)
new-node))))
Hi Jens,
As I just completed my flatten-benchmark results, I thought I would see what happens to the ranking of the four versions you show in your article when exposed to another interpreter. The first results are for the final run of your benchmark on mzscheme, followed by the exact same coded (except you have to call (collect 4) in place of (collect-garbage). Notice algorithms 3 and 4 have swapped rank. Also note worthy is that MzScheme has the better run times overall.
It didn't happen within your benchmark, but the rankings of functions when run in DrScheme v.s. MzScheme changed in my own benchmark of list flattening functions. I couldn't agree with you more, as regards the title of your article.
Cheers,
--kyle
Welcome to MzScheme v370.3 [3m], Copyright (c) 2004-2007 PLT Scheme Inc.
;final run
cpu time: 296 real time: 296 gc time: 226
cpu time: 797 real time: 797 gc time: 350
cpu time: 313 real time: 313 gc time: 258
cpu time: 881 real time: 879 gc time: 354
Petite Chez Scheme Version 7.3
Copyright (c) 1985-2006 Cadence Research Systems
;final run
(time (begin (append1 xs ...) ...))
29 collections
315 ms elapsed cpu time, including 195 ms collecting
315 ms elapsed real time, including 196 ms collecting
32006728 bytes allocated, including 1144408 bytes reclaimed
(time (begin (append2 xs ...) ...))
97 collections
961 ms elapsed cpu time, including 805 ms collecting
960 ms elapsed real time, including 802 ms collecting
105047792 bytes allocated, including 3044880 bytes reclaimed
(time (begin (append3 xs ...) ...))
75 collections
1047 ms elapsed cpu time, including 875 ms collecting
1047 ms elapsed real time, including 871 ms collecting
80017480 bytes allocated, including 17072592 bytes reclaimed
(time (begin (append4 xs ...) ...))
60 collections
474 ms elapsed cpu time, including 309 ms collecting
473 ms elapsed real time, including 311 ms collecting
64874040 bytes allocated, including 9260560 bytes reclaimed
Running from inside DrScheme (PLT v370):
> (test 1000000)
cpu time: 547 real time: 547 gc time: 0
cpu time: 1625 real time: 1766 gc time: 516
cpu time: 875 real time: 1062 gc time: 328
cpu time: 1844 real time: 3063 gc time: 563
----------
Curious that the consing version is the fastest.
I visited this site and found it informative u also visit this site and surely u will find this visit productive it certification !
Post a Comment
<< Home