Description of Benchmarks
Note: Our educated guesses concerning the reasons for
a particular system's performance appear within [square brackets].
Microbenchmarks: floating point arithmetic
Sums the integers from 0 to 10000 using inexact (floating point) numbers,
10000 iterations.
A test of floating point arithmetic.
Uses essentially the same code as the sum
benchmark.
[In Larceny, all floating point calculations are done using generic
arithmetic.]
Calculation of the 25th Fibonacci number using inexact numbers,
50 iterations.
A test of floating point arithmetic.
Uses essentially the same code as the fib
benchmark.
Microbenchmarks: calls, closures, continuations
Sums the integers from 0 to 10000, 10000 iterations.
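For concreteness, the timed computation is essentially a tail-recursive
accumulation like the following sketch (the benchmark's actual driver and
names may differ):
(define (sum-to n)
  (let loop ((i 0) (sum 0))
    (if (> i n)
        sum
        (loop (+ i 1) (+ sum i)))))
; (sum-to 10000) => 50005000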
Doubly recursive computation of the 25th Fibonacci number
(75025), using (< n 2) to terminate the recursion.
[Register windows should perform well on this
benchmark, but gcc may squander part of this advantage by allocating
a register window even for the base case.]
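The recursion has the following shape (a minimal sketch; the benchmark's
driver and exact names may differ):
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))
; (fib 25) => 75025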
A triply recursive integer function related to the Takeuchi function,
a Gabriel benchmark, 1000 iterations.
A test of non-tail calls and arithmetic.
Historical note:
The Symbolics 3600 performed 1 iteration of this benchmark in 0.43 seconds
using generic arithmetic, which is about 65 times as slow as a Sun Ultra 1.
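The function itself is essentially the following sketch (the benchmark's
driver is omitted), typically invoked as (tak 18 12 6):
(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))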
The tak benchmark in continuation-passing style, 300 iterations.
A test of closure creation.
[Improperly tail recursive implementations convert this benchmark's
tail recursion into deep recursion, which may exceed limits on the
size of a control stack.]
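The continuation-passing transformation threads an explicit continuation
through every call, so nearly every recursive step allocates a closure;
a sketch of that shape (not necessarily the benchmark's exact code):
(define (cpstak x y z)
  (define (tak x y z k)
    (if (not (< y x))
        (k z)
        (tak (- x 1) y z
             (lambda (v1)
               (tak (- y 1) z x
                    (lambda (v2)
                      (tak (- z 1) x y
                           (lambda (v3)
                             (tak v1 v2 v3 k)))))))))
  (tak x y z (lambda (a) a)))
Note that every call to the inner tak is a tail call, which is the tail
recursion referred to in the note above.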
The tak benchmark in continuation-capturing style, 30 iterations.
A test of call-with-current-continuation.
[Larceny's code for call-with-current-continuation is now written in C,
and most of its time on this benchmark is spent crossing
the Scheme/C barrier.]
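In continuation-capturing style each recursive result is returned by
invoking a captured continuation, so call-with-current-continuation is
exercised on every call; a hedged sketch (the benchmark's exact code may
differ in detail):
(define (ctak x y z)
  (call-with-current-continuation
   (lambda (k) (ctak-aux k x y z))))
(define (ctak-aux k x y z)
  (if (not (< y x))
      (k z)
      (ctak-aux
       k
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- x 1) y z)))
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- y 1) z x)))
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- z 1) x y))))))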
Microbenchmarks: lists
The tak benchmark using lists to represent integers,
a Gabriel benchmark, 200 iterations.
This benchmark contains a peculiar boolean expression for which some
compilers generate very poor code:
(define (shorterp x y)
  (and (not (null? y))
       (or (null? x)
           (shorterp (cdr x)
                     (cdr y)))))
Compare with the ntakl benchmark.
A version of the takl benchmark in which the peculiar boolean expression
has been rewritten to make it less confusing for most people and for
some compilers:
(define (shorterp x y)
  (cond ((null? y) #f)
        ((null? x) #t)
        (else
         (shorterp (cdr x) (cdr y)))))
The rewritten expression is logically equivalent to the original.
[The C versions of the takl and ntakl benchmarks make no attempt
to reclaim heap storage, which gives C an advantage over
garbage-collected languages.
This advantage disappears in benchmarks that allocate large amounts
of heap storage, such as diviter, divrec, gcbench, and perm9.]
Table-driven symbolic differentiation, a Gabriel benchmark, 800000 iterations.
Uses association lists instead of the original benchmark's property lists.
Symbolic differentiation, a Gabriel benchmark, 800000 iterations.
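To illustrate the difference between the two differentiators, here is a
hedged sketch of the table-driven approach, with the dispatch table held
in an association list; the rule set shown (just + and *) is illustrative
and much smaller than in the actual benchmarks:
(define (d expr)
  (cond ((not (pair? expr))
         (if (eq? expr 'x) 1 0))
        ((assq (car expr) rules)
         => (lambda (entry) ((cdr entry) expr)))
        (else 'error)))
(define rules
  (list (cons '+ (lambda (e) (cons '+ (map d (cdr e)))))
        (cons '* (lambda (e)
                   (let ((u (cadr e)) (v (caddr e)))
                     (list '+
                           (list '* (d u) v)
                           (list '* u (d v))))))))
; (d '(+ (* x x) x)) => (+ (+ (* 1 x) (* x 1)) 1)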
Destructive list operations, a Gabriel benchmark, 300 iterations.
Divides 200 by 2 using lists as a unary notation for integers,
a Gabriel benchmark, 400000 iterations.
This benchmark tests null?, cons, car, cdr, and little else.
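The unary representation and the iterative division look roughly like
this sketch, modelled on the Gabriel div2 code (details may differ):
(define (create-n n)
  (do ((n n (- n 1))
       (a '() (cons '() a)))
      ((= n 0) a)))
(define (iterative-div2 l)
  (do ((l l (cddr l))
       (a '() (cons (car l) a)))
      ((null? l) a)))
; (length (iterative-div2 (create-n 200))) => 100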
[This benchmark allocates 320 megabytes of heap storage, so
the C version can't get by without deallocating storage.
The C code for this benchmark performs node-at-a-time deallocation
using either free or a custom deallocator.
On our test machine, the standard library version of free
is very slow for this benchmark, and even the custom deallocator can't
reclaim storage quite as fast as a generational garbage collector.
Compare this benchmark with perm9, for which a custom node-at-a-time
deallocator outperforms generational garbage collection.]
This benchmark is the same as diviter except that it uses deep recursion
instead of iteration.
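The recursive version replaces the iteration with deep non-tail recursion,
again sketched after the Gabriel code:
(define (recursive-div2 l)
  (cond ((null? l) '())
        (else (cons (car l)
                    (recursive-div2 (cddr l))))))
; (length (recursive-div2 (create-n 200))) => 100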
[Implementations that use the SPARC's register windows
to implement recursion will perform poorly on this benchmark.
That is why gcc performs so poorly even with the custom storage allocator.]
Browsing a data base, a Gabriel benchmark, 600 iterations.
[May be a test of string->symbol and/or symbol->string.]
Larger benchmarks: floating point
Fast Fourier Transform, a Gabriel benchmark, 2000 iterations.
A test of floating point arithmetic.
(I haven't yet checked to see whether it has the same bugs as the
original Gabriel benchmark.)
Generation of a Mandelbrot set, 30 iterations.
A test of floating point arithmetic.
Determination of a nucleic acid's spatial structure, 10 iterations.
A test of floating point arithmetic, and a real program.
Testing to see whether a point is contained within a 2-dimensional
polygon, 10000 iterations.
A test of floating point arithmetic.
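The test presumably uses the standard even-odd (crossing number) method;
a hedged sketch of that method, not necessarily the benchmark's exact
code, where xs and ys are vectors of vertex coordinates:
(define (point-in-polygon? xs ys px py)
  (let ((n (vector-length xs)))
    (let loop ((i 0) (j (- n 1)) (inside #f))
      (if (= i n)
          inside
          (let ((xi (vector-ref xs i)) (yi (vector-ref ys i))
                (xj (vector-ref xs j)) (yj (vector-ref ys j)))
            (loop (+ i 1)
                  i
                  ; toggle when a horizontal ray from the point crosses edge j-i
                  (if (and (not (eq? (> yi py) (> yj py)))
                           (< px (+ xi (/ (* (- xj xi) (- py yi))
                                          (- yj yi)))))
                      (not inside)
                      inside)))))))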
Ray tracing a simple scene, 10 iterations.
A test of floating point arithmetic.
This program is translated from the Common Lisp code in
Example 9.8 of Paul Graham's book on ANSI Common Lisp.
Simplex algorithm, 60000 iterations.
A test of floating point arithmetic, and a real program.
Larger benchmarks: puzzles
Combinatorial search of a state space, a Gabriel benchmark, 100 iterations.
A test of arrays and classical compiler optimizations.
This benchmark was originally written in Pascal by Forrest Baskett.
Another combinatorial search similar to puzzle,
a Gabriel benchmark, 10 iterations.
Larger benchmarks: miscellaneous
A type checker written by Jim Miller, 20 iterations.
Dynamic type inference, self-applied, 10 iterations.
Written by Fritz Henglein.
A real program.
Earley's parsing algorithm, parsing a 9-symbol input according to one
of the simplest ambiguous grammars, 150 iterations.
A real program, applied to toy data.
This program was provided by Andrew Wright, but we don't know much
about it, and would appreciate more information.
This higher-order program creates closures almost as often as it
performs non-tail procedure calls.
Another program that was provided by Andrew Wright.
This program may have been written by Jim Miller.
It enumerates the order-preserving maps between finite lattices.
Constructs a maze on a hexagonal grid, 2500 iterations.
Written by Olin Shivers.
Constructs a maze on a rectangular grid using purely functional style,
100 iterations.
Written by Olin Shivers.
Partial evaluation of Scheme code, 100 iterations.
Written by Marc Feeley.
A Scheme interpreter evaluating a merge sort, 200 iterations.
Written by Marc Feeley.
Scheme to LaTeX processor, 20 iterations.
A test of file i/o and probably much else.
Part of a real program written by Dorai Sitaram.
Larger benchmarks: variations on boyer
Bob Boyer's theorem proving benchmark, a Gabriel benchmark, 10 iterations.
This benchmark uses association lists in place of the original benchmark's
property lists. The nboyer benchmark is an updated version of this benchmark.
Translation into Scheme of a translation into Standard ML of a translation
into Scheme of the Common Lisp code for the original boyer benchmark,
10 iterations.
Thanks to Standard ML, the data structures are somewhat more convoluted
than in the boyer benchmark, but aren't much better.
An updated and exponentially scalable version of the boyer benchmark,
1 iteration on a problem of size 3.
(The boyer and smlboyer benchmarks solve a problem of size 0.)
This benchmark's data structures are considerably more appropriate than
the data structures used in the boyer and smlboyer benchmarks.
A test of lists, vectors, and garbage collection.
A version of nboyer that has been tuned (by Henry
Baker) to reduce storage allocation, making it less of a garbage collection
benchmark and more of a compiler benchmark. Only 4 lines of code were
changed, and another 7 lines of code were added.
Synthetic benchmarks: garbage collection
This program was written to mimic the phase structure that has been
conjectured for a class of application programs for which garbage
collection may represent a significant fraction of the execution time.
This benchmark warms up by allocating and then dropping a binary tree
of height 18.
Then it allocates a permanent tree of height 16 and a permanent array
of 500000 floating point numbers.
Then it allocates about 350 megabytes of tree storage in seven phases,
allocating about 50 megabytes per phase.
The first phase allocates 67648 trees of height 4, dropping each tree
before allocating the next.
The second phase allocates and drops 16512 trees of height 6,
and so on to the last phase, which allocates 8 trees of height 16.
Each phase is divided into two subphases. The first subphase allocates
trees top-down using side effects, while the second subphase allocates
trees bottom-up without using side effects.
This benchmark was written in Java by John Ellis and Pete Kovac,
modified by Hans Boehm, and translated into Scheme, Standard ML,
C++, and C by William Clinger.
One glaring difference between this benchmark and typical application
code is that none of the trees are traversed after they are constructed.
[This puts C and C++ at a disadvantage, because languages that rely on
explicit deallocation must traverse the trees to deallocate their storage.]
[Generational garbage collectors perform extremely well on the early phases
of this benchmark, but are likely to perform worse than non-generational
collectors on the last phase or two.]
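The two subphases build complete binary trees in two different ways,
roughly as in the following sketch (a hypothetical node representation
using pairs; the actual benchmark's node type differs):
(define (make-empty-node) (cons '() '()))
(define (populate! depth node)          ; top-down, with side effects
  (if (> depth 0)
      (begin (set-car! node (make-empty-node))
             (set-cdr! node (make-empty-node))
             (populate! (- depth 1) (car node))
             (populate! (- depth 1) (cdr node)))))
(define (make-tree depth)               ; bottom-up, purely functional
  (if (<= depth 0)
      (make-empty-node)
      (cons (make-tree (- depth 1))
            (make-tree (- depth 1)))))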
Creates a list containing all 362880 permutations of a
list of 9 integers, in Gray code order, using Zaks's
algorithm, 5 iterations.
A test of storage allocation and garbage collection.
At the end of each iteration, the oldest half of the
live storage becomes garbage.
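The generator is sketched below from memory of a Zaks-style prefix-reversal
algorithm; the benchmark's actual code may differ in detail. Each new
permutation reverses a prefix of the previous one onto its unchanged tail,
which is why the resulting lists share so much substructure:
(define (permutations x0)
  (let ((x x0)
        (perms (list x0)))
    (define (f n)                       ; reverse the first n elements of x
      (set! x (revloop x n (list-tail x n)))
      (set! perms (cons x perms)))
    (define (revloop x n y)
      (if (zero? n)
          y
          (revloop (cdr x) (- n 1) (cons (car x) y))))
    (define (p n)                       ; n-1 reversals of length n, with all
      (if (> n 1)                       ; shorter permutations in between
          (do ((j (- n 1) (- j 1)))
              ((zero? j) (p (- n 1)))
            (p (- n 1))
            (f n))))
    (p (length x))
    perms))
; (length (permutations '(1 2 3 4))) => 24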
[This benchmark is particularly difficult for generational garbage
collectors, since it violates their assumption that young objects
have a shorter future life expectancy than older objects.
Nevertheless the generational collectors used in Larceny and in
Chez Scheme perform reasonably well.]
[This benchmark is also difficult for C, because the permutation lists
share much of their substructure. To avoid freeing nodes more than
once, the C version of this benchmark performs an extra pass over the
lists to compute (biased negative) reference counts for each node:
for ( p = perms; p != 0; p = p->cdr ) {      /* for each permutation        */
  for ( x = p->car; x != 0; x = x->cdr ) {   /* for each node in its list   */
    if ( x->num > 0 )
      x->num = 0;                            /* first visit: clear          */
    else
      --(x->num);                            /* later visits: count sharing */
  }
}
This extra pass can be omitted when using a storage allocator written
especially for this benchmark instead of using malloc and free.
With this custom allocator, the code generated by
gcc runs almost 10 times as fast. The code shown above does not
account for very much of this improvement; most of the improvement
must be blamed on the library code for free.]
Last updated 6 July 1999.