##
Twobit Pass 4: parallel assignment optimization.

Scheme does not specify the order in which the arguments
of a procedure call are evaluated.
Most Scheme compilers, including Twobit, use parallel
assignment optimization to find an
efficient order of evaluation for each call.

Consider, for example, the internal definition

(define .loop_3
(lambda (.f_2 .l_5 .x_5)
(if (pair? .l_5)
(.loop_3 .f_2
(cdr .l_5)
(cons (.f_2 (car .l_5))
.x_5))
.x_5)))

Variables `.f_2`

, `.l_5`

, and `.x_5`

are symbolic names for registers
`REG1`

, `REG2`

, and `REG3`

,
which are the first three registers used to pass arguments.
To evaluate the arguments for the tail-recursive call to
`.loop_3`

, the compiler must therefore
achieve the effect of performing the following assignment
statements in parallel:

REG1 := REG1
REG2 := (cdr REG2)
REG3 := (cons (REG1 (car REG2)) REG3)

On a sequential machine, the effect of this parallel
assignment can be achieved most efficiently
by performing the third assignment first, followed by
the second assignment. The first assignment does not
need to be performed at all.

Twobit performs parallel assignment optimization by
constructing the dependency graph for the parallel
assignments. Using `<=`

to mean "depends on",
the dependency graph for the example above is

REG1 <= REG1
REG2 <= REG2
REG3 <= REG1, REG2, REG3

Twobit then attempts a topological sort of the dependency
graph.
If a topological sort can be found, then it
represents an order of evaluation that will evaluate each
argument expression directly into its target register.

If no topological sort
can be found, then at least one
temporary will be required to hold an argument until its
target register becomes free.
In the worst case every register depends on every other
register, in which case all of the arguments must be kept
in temporaries until all have been evaluated, but this
seldom happens.

Parallel assignment optimization reduces the size of the
Scheme standard library by about 6%.