Twobit Pass 4: parallel assignment optimization

Twobit Pass 4: parallel assignment optimization.

Scheme does not specify the order in which the arguments of a procedure call are evaluated. Most Scheme compilers, including Twobit, use parallel assignment optimization to find an efficient order of evaluation for each call.

Consider, for example, the internal definition

  (define .loop_3
    (lambda (.f_2 .l_5 .x_5)
      (if (pair? .l_5)
          (.loop_3 .f_2
                   (cdr .l_5)
                   (cons (.f_2 (car .l_5))
                         .x_5))
          .x_5)))

Variables .f_2, .l_5, and .x_5 are symbolic names for registers REG1, REG2, and REG3, which are the first three registers used to pass arguments. To evaluate the arguments for the tail-recursive call to .loop_3, the compiler must therefore achieve the effect of performing the following assignment statements in parallel:

     REG1  :=  REG1
     REG2  :=  (cdr REG2)
     REG3  :=  (cons (REG1 (car REG2)) REG3)

On a sequential machine, the effect of this parallel assignment can be achieved most efficiently by performing the third assignment first, followed by the second assignment. The first assignment does not need to be performed at all.

Twobit performs parallel assignment optimization by constructing the dependency graph for the parallel assignments. Using <= to mean "depends on", the dependency graph for the example above is

     REG1 <= REG1
     REG2 <= REG2
     REG3 <= REG1, REG2, REG3

Twobit then attempts a topological sort of the dependency graph. If a topological sort can be found, then it represents an order of evaluation that will evaluate each argument expression directly into its target register.

If no topological sort can be found, then at least one temporary will be required to hold an argument until its target register becomes free. In the worst case every register depends on every other register, in which case all of the arguments must be kept in temporaries until all have been evaluated, but this seldom happens.

Parallel assignment optimization reduces the size of the Scheme standard library by about 6%.