Customizing Pass 2

Most of the optimizations in Pass 2 are independent of the target architecture. Those that are sensitive to the code generator or target are controlled by parameters, customizable policies, and compiler switches:


The register set

The number of general registers that Twobit allocates for arguments and temporaries is a parameter. The assembler maps Twobit's general registers onto some combination of hardware registers and memory locations.

The front end of Twobit currently has no knowledge of specialized representations or registers. In particular, Twobit boxes all numbers and uses subroutine calls for all floating point calculations.


Inlining of primitive operations

The IEEE standard for Scheme allows any procedure to be redefined. Since most operations are procedures in Scheme, this prevents the compiler from generating inline code for most operations.

A compiler switch allows the programmer to promise that none of the usual primitive procedures will be redefined, which allows Twobit to compile them specially. When this switch is true:

A table of primitive operations is supplied as a parameter. For each primitive operation, this table specifies:

Except for the assembler, which is target-dependent, Twobit has no further knowledge of primitive data types and operations.


Inlining of local procedures


Incremental lambda lifting

It is always possible to lift all known local procedures to the outermost lambda expression. This is not always desirable, however, because the formal parameters that must be added represent extra registers that must be saved across a non-tail-recursive call. This is a target-dependent tradeoff, which should be evaluated by the interprocedural register allocator.

At each stage of incremental lambda lifting, after the flow equations have been solved but before the source code is transformed, Twobit calls POLICY:LIFT?, a predicate that determines whether lifting will be performed. POLICY:LIFT? is given three arguments to examine:

  1. The lambda expression that contains the internal definitions to be lifted.
  2. The enclosing lambda expression to which the internal definitions will be lifted.
  3. The solutions to the flow equations, which list the formal parameters that will be added if lifting is performed.
Twobit lifts all or none, because all can share the same closure.

Many factors could be taken into account when deciding whether to lift a group of known local procedures. Here are just a few:

For example, Twobit currently transforms

  (define (sieve n)
    (define (new-generator next-prime prime)
      (let ((psquare (* prime prime)))
        (lambda ()
          (define (loop t)
            (if (or (< t psquare) (not (zero? (remainder t prime))))
                t
                (loop (next-prime))))
          (loop (next-prime)))))
    ...)
into
  ((lambda ()
     (define .sieve_3
       (lambda (.n_5)
         ...))
     (define .new-generator_6
       (lambda (.next-prime_10 .prime_10)
         ((lambda (.psquare_11)
            (define .loop_13
              (lambda (.t_15)
                (if
                 ((lambda (.T93_16)
                    (if .T93_16
                        .T93_16
                        (not (zero? (remainder .t_15 .prime_10)))))
                  (< .t_15 .psquare_11))
                 .t_15
                 (.loop_13 (.next-prime_10)))))
            (lambda () (.loop_13 (.next-prime_10))))
          (* .prime_10 .prime_10))))
     ...))
Here the known local procedure .loop_13 has been lifted out of only one lambda expression. Lifting to the enclosing lambda expression would add .psquare_11 as another argument, and there would be no benefit at all: A closure must be created anyway for the body of the lambda expression that defines .loop_13, and .loop_13 can share that closure at no cost.

The policy used for incremental lambda lifting does not appear to be critical in practice. It is usually advantageous to lift all known local procedures to the outermost level, as is done by conventional lambda lifting.


Interprocedural register targeting

Since the compiler can by definition locate all calls to a known local procedure, it can reorder the procedure's formal parameters and rewrite each call to match the new order. In Twobit, the formal parameters are just names for the registers that are live at entry to the procedure, so reordering the parameters amounts to interprocedural register targeting.

Register targeting is easier than interprocedural register allocation, because register allocation involves decisions concerning which values should be kept in registers and how long they should be kept there. Twobit simply assumes that the arguments to known procedures will be kept in registers for the duration of each call, so all there is to do is to select a particular register for each argument. The goal of register targeting is to minimize register shuffling when one procedure calls another.

Twobit performs interprocedural register targeting by reordering the formal parameters that it adds to known local procedures during lambda lifting. The ordering is chosen by by a customizable policy procedure.

If the set A of formal parameters to be added to f is equal to the set of formal parameters for the lambda expression L that defines f, then Twobit adds the new parameters to the beginning of the parameter list for f, after sorting them so that, for a call to f from the body of L, these parameters will already reside in the correct registers.

If A is a proper subset of the formal parameters of L, and this is the first lifting of f, then the parameters in A are allocated to the positions they occupy in the formal parameter list of L, and the original formal parameters of f are filled in around them. If a formal parameter x of L is not an element of A but occurs as an actual parameter in some call to f, then the corresponding formal parameter of f should be placed in the position left vacant by x. These heuristics reduce register shuffling for calls between mutually recursive procedures, including loop nests.


Compiler switches

Several switches affect Twobit's optimizations:

These switches are true by default.

The first two switches are taken from MacScheme. They allow a programmer to assert that preserving the precise semantics of Scheme when certain top-level procedures are dynamically redefined is less important that generating efficient code.

The benefits of benchmark-mode are explained by

Andrew W Appel. Loop headers in lambda-calculus or CPS. Princeton University CS-TR-460-94, June 1994.