Description of Benchmarks

The benchmarks described below are portable R7RS programs. None define any libraries. Each description of a benchmark begins with a link to its source code, omitting the part that is shared by all of these benchmarks.

All of these benchmarks, their inputs, and the Unix script used to run them, are online and can be downloaded using git or svn.

The timings report execution time only, as calculated using current-jiffy.

geometricMean

This pseudo-benchmark is an aggregate statistic that shows the geometric mean for all benchmarks. Where other benchmarks display timings in seconds, the numerical scores for the geometric mean show the (geometric) average ratio of the system's time to the fastest system's time. An average ratio of 1.0 is the lowest possible, and can be achieved only by a system that is fastest on every benchmark.

The R7RS (small) standard does not require implementations to provide all of the R7RS standard libraries, and some implementations are unable to run some of the benchmarks for other reasons as well; furthermore, our benchmarking script gives up on any benchmark that takes more than an hour to run. The geometric means for each system were calculated using only the benchmarks that returned correct results in less than an hour. When implementations of the R7RS are more mature, the geometric means will be recalculated to impose a penalty for each benchmark an implementation is unable to run.
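
For concreteness, the scoring rule just described can be sketched in a few lines of Scheme. This is only an illustration; the procedure names geometric-mean and score are invented here, and the actual benchmarking script may compute the statistic differently.

    (import (scheme base) (scheme write))

    ;; Geometric mean of a list of ratios (illustrative sketch only).
    (define (geometric-mean ratios)
      (expt (apply * ratios) (/ 1 (length ratios))))

    ;; times and fastest-times are parallel lists, one entry per benchmark.
    (define (score times fastest-times)
      (geometric-mean (map / times fastest-times)))

    ;; A system that ties the fastest time on one benchmark (ratio 1.0)
    ;; and is twice as slow on another (ratio 2.0) scores sqrt(2).
    (display (score '(2.0 6.0) '(2.0 3.0)))   ; prints roughly 1.414
    (newline)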

Gabriel Benchmarks

browse
Browsing a data base, a Gabriel benchmark, 2000 iterations. [May be a test of string->symbol and/or symbol->string.]
deriv
Symbolic differentiation, a Gabriel benchmark, ten million iterations.
destruc
Destructive list operations, a Gabriel benchmark, 4000 iterations of a 600x50 problem.
diviter
Divides 1000 by 2 using lists as a unary notation for integers, a Gabriel benchmark, one million iterations. This benchmark tests null?, cons, car, cdr, and little else; a sketch of the unary representation appears at the end of this section.
divrec
This benchmark is the same as diviter except it uses deep recursion instead of iteration.
puzzle
Combinatorial search of a state space, a Gabriel benchmark, 1000 iterations. A test of arrays and classical compiler optimizations. This benchmark was originally written in Pascal by Forrest Baskett.
triangl
Another combinatorial search similar to puzzle, a Gabriel benchmark, 50 iterations.
tak
A triply recursive integer function related to the Takeuchi function, a Gabriel benchmark. 1 iteration of (tak 40 20 11). A test of non-tail calls and arithmetic; the function itself is sketched at the end of this section. [Historical note: The Symbolics 3600 performed 1 iteration of (tak 18 12 6) in 0.43 seconds using generic arithmetic. On our test machine, Larceny runs that benchmark in 0.00016 seconds. That's roughly 2700 times as fast.]
takl
Calculates (tak 40 20 12), which is faster than calculating (tak 40 20 11), using the same recursive algorithm as for the tak:32:16:8 benchmark but using lists to represent integers. This too was a Gabriel benchmark (with different arguments). 1 iteration.
ntakl
The takl benchmark contains a peculiar boolean expression; the ntakl benchmark rewrites that expression into a more readable idiom, which allows some compilers to generate better code.
cpstak
The tak:40:20:11 benchmark in continuation-passing style, 1 iteration. A test of closure creation.
ctak
The tak:32:16:8 benchmark in continuation-capturing style, 1 iteration. A test of call-with-current-continuation.
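
The unary list representation used by the diviter and divrec benchmarks can be sketched as follows. This is an illustrative paraphrase, not the benchmarks' exact code.

    (import (scheme base) (scheme write))

    ;; The integer n represented as a list of n empty lists
    ;; (illustrative paraphrase of the representation used by diviter/divrec).
    (define (create-n n)
      (if (= n 0) '() (cons '() (create-n (- n 1)))))

    ;; Divide by 2 by keeping every other element.
    (define (iterative-div2 l)
      (let loop ((l l) (a '()))
        (if (null? l)
            a
            (loop (cddr l) (cons (car l) a)))))

    (display (length (iterative-div2 (create-n 1000))))   ; prints 500
    (newline)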
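
The function at the heart of tak (and, in transformed form, of takl, cpstak, and ctak) is conventionally defined as shown below. This sketch assumes the usual Takeuchi-style formulation; the benchmark's own code should be equivalent.

    (import (scheme base) (scheme write))

    ;; The usual tak formulation: triply recursive, heavy on non-tail calls.
    (define (tak x y z)
      (if (not (< y x))
          z
          (tak (tak (- x 1) y z)
               (tak (- y 1) z x)
               (tak (- z 1) x y))))

    (display (tak 18 12 6))   ; prints 7, the small instance cited in the historical note above
    (newline)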

Numerical Benchmarks

fib
Doubly recursive computation of the 40th Fibonacci number (102334155), using (< n 2) to terminate the recursion; 5 iterations. The definition is sketched at the end of this section.
fibc
A version of fib that uses first-class continuations; written by Kent Dybvig. Calculates the 30th Fibonacci number (832040) 10 times.
fibfp
Calculation of the 35th Fibonacci number using inexact numbers; 10 iterations. A test of floating point arithmetic. Uses essentially the same code as the fib benchmark.
sum
Sums the integers from 0 to 10000, 200000 iterations.
sumfp
Sums the integers from 0 to 1e6, 500 iterations. A test of floating point arithmetic. Uses essentially the same code as the sum benchmark.
fft
Fast Fourier Transform on 65536 real-valued points, 100 iterations. A test of floating point arithmetic.
mbrot
Generation of a Mandelbrot set, 1000 iterations on a problem of size 75. A test of floating point arithmetic on reals.
mbrotZ
Same as the mbrot benchmark, but using complex instead of real arithmetic.
nucleic
Determination of a nucleic acid's spatial structure, 50 iterations. A test of floating point arithmetic, and a real program.
pi
A bignum-intensive benchmark that calculates digits of pi.
pnpoly
Testing to see whether a point is contained within a 2-dimensional polygon, 1000000 iterations (with 12 tests per iteration). A test of floating point arithmetic.
ray
Ray tracing a simple scene, 50 iterations. A test of floating point arithmetic. This program is translated from the Common Lisp code in Example 9.8 of Paul Graham's book on ANSI Common Lisp.
simplex
Simplex algorithm, one million iterations. A test of floating point arithmetic, and a real program.
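
The doubly recursive definition described for fib (and, with inexact numbers, for fibfp) looks roughly like this. A minimal sketch assuming the standard formulation; the benchmark's own code should be equivalent.

    (import (scheme base) (scheme write))

    ;; Doubly recursive Fibonacci, terminating on (< n 2).
    (define (fib n)
      (if (< n 2)
          n
          (+ (fib (- n 1))
             (fib (- n 2)))))

    (display (fib 30))   ; prints 832040, the value computed by fibc
    (newline)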

Kernighan and Van Wyk Benchmarks

Brian W. Kernighan and Christopher J. Van Wyk wrote a set of small benchmarks to compare the performance of several scripting and conventional languages, including C and Scheme. Marc Feeley and I modified some of these benchmarks to correct bugs and to increase the number of iterations. When I translated them into R6RS Scheme, I rewrote most of them into slightly more idiomatic Scheme.

ack
A version of the Ackermann function, with arguments 3 and 12. Two iterations. A sketch of the usual two-argument recursion appears at the end of this section.
array1
This benchmark allocates, initializes, and copies some fairly large one-dimensional arrays. 500 iterations on a problem size of one million.
string
This tests string-append and substring, and very little else. 25 iterations on a problem size of 500000.
sum1
This benchmark reads and sums 100,000 floating point numbers 25 times. It is primarily a test of floating point input.
cat
This file-copying benchmark is a simple test of character i/o. It copies the King James Bible 50 times.
tail
This benchmark performs considerable character i/o. It prints the King James Bible verse by verse, in reverse order of the verses, 25 times.
wc
Another character i/o benchmark. It counts the number of words in the King James Bible 50 times.
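
The ack benchmark is assumed here to use the usual two-argument Ackermann-Péter recursion, sketched below with much smaller arguments than the benchmark's (3, 12).

    (import (scheme base) (scheme write))

    ;; Two-argument Ackermann function (assumed formulation; the benchmark
    ;; may differ in detail).
    (define (ack m n)
      (cond ((= m 0) (+ n 1))
            ((= n 0) (ack (- m 1) 1))
            (else (ack (- m 1) (ack m (- n 1))))))

    (display (ack 3 3))   ; prints 61; (ack 3 12) is 32765 but takes far longer
    (newline)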

More Input/Output Benchmarks

read1
Reads nboyer.scm 2500 times.

Other Benchmarks

compiler
A compiler kernel that looks as though it were written by Marc Feeley. 2000 iterations on a 47-line input.
conform
A type checker written by Jim Miller, 500 iterations.
dynamic
Dynamic type inference, self-applied, 500 iterations. Written by Fritz Henglein. A real program.
earley
Earley's parsing algorithm, parsing a 15-symbol input according to one of the simplest ambiguous grammars, 1 iteration. A real program, applied to toy data whose exponential behavior leads to a peak heap size of half a gigabyte or more.
graphs
This program was provided by Andrew Wright, but we don't know much about it and would appreciate more information. This higher-order program creates closures almost as often as it performs non-tail procedure calls. Three iterations on a problem of size 7.
lattice
Another program that was provided by Andrew Wright, though it may have been written by Jim Miller. It enumerates the order-preserving maps between finite lattices. 10 iterations.
matrix
Another program that was provided by Andrew Wright. Computes maximal matrices; similar to some puzzle programs. 2500 iterations on a problem of size 5.
maze
Constructs a maze on a hexagonal grid, 10000 iterations. Written by Olin Shivers.
mazefun
Constructs a maze on a rectangular grid using purely functional style, 10000 iterations on a problem of size 11. Written by Marc Feeley.
nqueens
Computes the number of solutions to the 13-queens problem, 10 times.
paraffins
Computes the number of paraffins that have 23 carbon atoms, 10 times.
parsing
Parses the nboyer benchmark 2500 times using a scanner and parser generated using Will Clinger's LexGen and ParseGen.
peval
Partial evaluation of Scheme code, 2000 iterations. Written by Marc Feeley.
primes
Computes the primes less than 1000, 10000 times, using a list-based Sieve of Eratosthenes. Written by Eric Mohr. A sketch of one such sieve appears at the end of this section.
quicksort
This is a quicksort benchmark. (That isn't as obvious as it sounds. The quicksort benchmark distributed with Gambit is a bignum benchmark, not a quicksort benchmark. See the comments in the code.) Sorts a vector of 10000 random integers 2500 times. Written by Lars Hansen, and restored to its original glory by Will Clinger.
scheme
A Scheme interpreter evaluating a merge sort of 30 strings, 100000 iterations. Written by Marc Feeley.
slatex
Scheme to LaTeX processor, 500 iterations. A test of file i/o and probably much else. Part of a real program written by Dorai Sitaram.
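
One common list-based formulation of the Sieve of Eratosthenes, as used by the primes benchmark, is sketched below. The benchmark's actual code (by Eric Mohr) may differ in detail.

    (import (scheme base) (scheme write))

    ;; Illustrative list-based sieve; not the benchmark's exact code.
    (define (interval-list lo hi)
      (if (> lo hi) '() (cons lo (interval-list (+ lo 1) hi))))

    (define (remove-multiples n lst)
      (cond ((null? lst) '())
            ((= 0 (remainder (car lst) n)) (remove-multiples n (cdr lst)))
            (else (cons (car lst) (remove-multiples n (cdr lst))))))

    (define (sieve lst)
      (if (null? lst)
          '()
          (cons (car lst)
                (sieve (remove-multiples (car lst) (cdr lst))))))

    (display (length (sieve (interval-list 2 999))))   ; prints 168, the number of primes below 1000
    (newline)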

Garbage Collection Benchmarks

nboyer
An updated and exponentially scalable version of the boyer benchmark. The nboyer benchmark's data structures are considerably more appropriate than the data structures used in the original boyer benchmarks. These timings are for 1 iteration on a problem of size 5. A test of lists, vectors, and garbage collection.
sboyer
A version of nboyer that has been tuned (by Henry Baker) to reduce storage allocation, making it less of a garbage collection benchmark and more of a compiler benchmark. Only 4 lines of code were changed, and another 7 lines of code were added. These timings are for 1 iteration on a problem of size 5.
gcbench
This program was written to mimic the phase structure that has been conjectured for a class of application programs for which garbage collection may represent a significant fraction of the execution time. This benchmark warms up by allocating and then dropping a large binary tree. Then it allocates a large permanent tree and a permanent array of floating point numbers. Then it allocates considerable tree storage in seven phases, increasing the tree size in each phase but keeping the total storage allocation approximately the same for each phase. Each phase is divided into two subphases. The first subphase allocates trees top-down using side effects, while the second subphase allocates trees bottom-up without using side effects. This benchmark was written in Java by John Ellis and Pete Kovac, modified by Hans Boehm, and translated into Scheme, Standard ML, C++, and C by William Clinger; it has had too much influence on implementors of production garbage collectors. The timings shown are for 1 iteration on problem size 20.
mperm
The mperm20:10:2:1 benchmark is a severe test of storage allocation and garbage collection. At the end of each of the 20 iterations, the oldest half of the live storage becomes garbage. This benchmark is particularly difficult for generational garbage collectors, since it violates their assumption that young objects have a shorter future life expectancy than older objects. The perm9 benchmark distributed with Gambit does not have that property.

Synthetic Benchmarks for R7RS

The R6RS and R7RS added several new features that had not been tested by older benchmarks because they were not a standard part of Scheme. Most of the following synthetic benchmarks were derived from Larceny's test suites for these features.

equal
This benchmark tests the R7RS equal? predicate on some fairly large structures of various shapes, including circular structures.
bv2string
This benchmark tests conversions between bytevectors and Unicode.
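
The operations exercised by the equal and bv2string benchmarks can be illustrated briefly with standard (scheme base) procedures. This is a minimal sketch, not the benchmarks' own code.

    (import (scheme base) (scheme write))

    ;; R7RS requires equal? to terminate even on circular structures.
    (define a (list 1 2 3))
    (define b (list 1 2 3))
    (set-cdr! (cddr a) a)                  ; make a circular
    (set-cdr! (cddr b) b)                  ; make b circular, with the same shape
    (display (equal? a b))                 ; prints #t
    (newline)

    ;; Conversions between bytevectors and Unicode strings.
    (define bv (string->utf8 "λx.x"))
    (display (utf8->string bv))            ; prints λx.x
    (newline)
    (display (bytevector-length bv))       ; prints 5 (λ occupies two bytes in UTF-8)
    (newline)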