Larceny Note #13: MacScheme Machine Instruction Set

Lars T Hansen / 21 January 1999
Copyright 1991, 1992 Lightship Software,  Incorporated.


The Machine

The MacScheme machine is an accumulator plus general-register machine.
(It was designed to overcome the shortcomings of a similar intermediate
language used in MacScheme.) Its instructions were designed for easy
peephole optimization when generating code for real RISC machines.

Registers.

    RESULT    Accumulator and first millicode argument register
    SECOND    Second millicode argument register
    THIRD     Third millicode argument register
    REG0      Procedure register / environment / static link
    REGn      General registers 1, ..., r for machine-dependent r
    CONT      Continuation register / dynamic link

Stack cache.

The machine uses a stack cache to hold the topmost stack frames.  The
CONT register points to the topmost frame in the cache.

Environments.

REG0 always holds the currently executing procedure, providing access to
its constants and free variables.


Assembly Language Syntax

An <int> is an exact integer.
A <symbol> is a Scheme symbol.
A <doc> is an arbitrary value.

<segment> ::= (<instruction> ...)

<instruction> ::= <pseudo>
                | (<op1> <prim1>)
                | (<op2> <prim2> <register>)
                | (<op3> <prim3> <register>)
                | (<op2imm> <primimm> <imm>)
                | (<const> <object>)
                | (<global> <symbol>)
                | (<setglbl> <symbol>)
                | (<lexical> <int> <int>)
                | (<lexical> <int> <int> <id>)
                | (<setlex> <int> <int>)
                | (<setlex> <int> <int> <id>)
                | (<stack> <int>)
                | (<setstk> <int>)
                | (<load> <register> <int>)
                | (<store> <register> <int>)
                | (<reg> <register>)
                | (<reg> <register> <id>)
                | (<setreg> <register> <doc>)
                | (<movereg> <register> <register>)
                | (<lambda> <program> <int> <ribdoc>)
                | (<lexes> <int> <ribdoc>)
                | (<args=> <int>)
                | (<args>=> <int>)
                | (<invoke> <int>)
                | (<save> <int>)
                | (<setrtn> <label>)
                | (<restore> <int>)
                | (<pop> <int>)
                | (<return>)
                | (<mvrtn>)
                | (<apply> <register> <register>)
                | (<nop>)
                | (<jump> <int> <label>)
                | (<jump> <int> <label> <live>)
                | (<skip> <label>)
                | (<skip> <label> <live>)
                | (<branch> <label>)
                | (<branch> <label> <live>)
                | (<branchf> <label>)
                | (<branchf> <label> <live>)

<pseudo> ::= (<.label> <label>)
           | (<.proc>)
           | (<.cont>)
           | (<.align> <int>)

<register> ::= <int>
<label> ::= <int>
<prim1> ::= <primitives of 0 or 1 arguments, not specified here>
<prim2> ::= <primitives of 2 arguments, not specified here>
<prim3> ::= <primitives of 3 arguments, not specified here>
<primimm> ::= <primitives of 2 arguments with immediate argument #2, not specified here>

A <ribdoc> is the constant #f or a vector of 0 or more elements of 
specific types; the format is defined elsewhere.  

The nonterminals <.label>, <.proc>, <.cont>, <.align>, 
<op1>, etc., denote integer opcodes that are not specified here.

An <object> is any Scheme object that can be read and written
using Scheme's READ and WRITE procedures.

An <imm> is a target-dependent immediate value; see the section
Implementation Notes, below.


MacScheme machine instructions.

In the descriptions of these instructions, the following ranges and
uses are implied.

    doc is any Scheme object used as documentation.
    x is any Scheme object.
    z is an immediate operand, MINIMM <= z < MAXIMM.
    m is a rib number, 0 <= m < MAXRIB.
    n is an argument count, rib slot, or stack frame slot,
       0 <= n < MAXSLOT.
    k is a register, 0 <= k < R.
    o is an offset from the beginning of a code segment,
       0 <= o < MAXOFFSET.
    L is an offset from the beginning of the next instruction,
       MINOFFSET <= L < MAXOFFSET.


Implementation-specific parameters.

The highest numbered general register is REGr, where r = R-1.

See the end of this file for implementation notes, notably the 
values of R, MINOFFSET, MAXOFFSET, MINIMM, MAXIMM, MAXSLOT, and MAXRIB.


Summary of instructions.

Data operations.

op1     prim
op2     prim,k
op3     prim,k2,k3
op2imm  prim,z

Constants.

const   x

Variable references.

global  x
setglbl x
lexical m,n
setlex  m,n
stack   n
setstk  n
load    k,n
store   k,n
reg     k
setreg  k
movereg k1,k2

Lambda expressions.

lambda  x,n,doc
lexes   n,doc

Procedure calls.

args=   n
args>=  n
invoke  n
save    n
setrtn  L
restore n
pop     n
return
mvrtn
apply   k1,k2
nop
jump    m,o

Branches.

branch  L
branchf L
skip    L


Description of instructions.

op1     prim
op2     prim,k1                 (k1 > 0)
op3     prim,k1,k2              (k1, k2 > 0)

        Loads RESULT with the result of the given primitive operation
        with arguments taken from RESULT, REGk1, and REGk2.  May fault,
        in which case the special registers SECOND and THIRD are loaded
        with the values of REGk1 and REGk2 before the fault is taken.

op2imm  prim,z

        Loads RESULT with the result of the given primitive operation
        with the first argument taken from RESULT and a second argument
        z (which is always an immediate value).  May fault, in which case
        the special register SECOND is loaded with z before the fault
        is taken.

lambda  x,n,doc

        If n < r, loads RESULT with the procedure formed from the 
        code x and the values of registers REG0-REGn.  If n >= r,
        loads RESULT with the procedure formed from the values of
        registers REG0-REG{r-1} and the first n-r+1 values taken
        from the list in REGr.  The code x consists of a bytevector
        of pure code and a vector of constants.  The documentation
        is used only for debugging.

lexes   n,doc

        If n < r, loads RESULT with the procedure formed from the
        currently executing code and the values of registers REG0-REGn.
        If n >= r, loads RESULT with the procedure formed from the 
        values of registers REG0-REG{r-1} and the first n-r+1 values
        taken from the list in REGr.  The documentation is used only
        for debugging.

const   x

        Loads RESULT with x.  Unless x is an immediate, it is fetched
        from the current procedure's constant vector.

global  x

        Loads RESULT with the value contained in the value cell for the
        global variable x.  May fault if the value is undefined.  The 
        value cell is fetched from the current procedure's constant vector.

setglbl x

        Stores RESULT in the value cell for the global variable x.
        The value cell is fetched from the current procedure's constant
        vector.  Destroys RESULT.

lexical m,n

        Loads RESULT with the value of slot n of rib m in the current
        environment (i.e. current procedure).  May fault if the value
        is undefined.

setlex  m,n

        Stores RESULT in slot n of rib m in the current environment
        (i.e. current procedure).  Destroys RESULT.

stack   n

        Loads RESULT with the value of slot n in the topmost frame of the
        stack cache.

setstk  n

        Stores RESULT in slot n in the topmost frame of the stack cache.
        Destroys RESULT.

load    k,n                     (n >= 0)

        Loads REGk with the value of slot n in the topmost frame of the
        stack cache.

store   k,n                     (n >= 0)

        Stores REGk in slot n of the topmost frame of the stack cache.

reg     k

        Loads RESULT with the value of REGk.

setreg  k

        Stores RESULT in REGk.  Destroys RESULT.

movereg k1,k2

        Stores REGk1 in REGk2.

args=   n

        Faults if the value of RESULT is not n.

args>=  n                       (n <= R-2)

        Assumes that the value of RESULT is a fixnum j.  Faults unless
        j >= n.

        Case 0: n < R-2, j < r.  Loads REGn+1 with a newly allocated list
            formed from REGn+1 through REGj.
        Case 1: n < R-2, j >= r.  Loads REGn+1 with
            (APPEND (LIST REGn+1 ... REG{r-1})
                    (LIST-COPY REGr))
        Case 2: n = R-2.  Loads REGr with (LIST (LIST-COPY REGr)).
        Case 3: r <= n <= j.  Loads REGr with
            (LET ((REGr (LIST-COPY REGr)))
              (SET-CDR! (LIST-TAIL REGr (- n r))
                        (LIST (LIST-TAIL REGr (- n (- r 1)))))
              REGr)

        Cases 1 through 3 must copy the list in REGr to avoid some
        extremely obscure bugs with call-with-current-continuation.

invoke  n

        Faults if the value of TIMER is zero.  Faults if the value of
        RESULT is not a procedure.  Decrements TIMER, loads SECOND with
        the value of REG0, loads REG0 with the value of RESULT, stores n
        in RESULT, and jumps to the entry point for the procedure in REG0.

save    n

        Pushes a new continuation frame onto the stack cache with n+1
        data slots, and initializes slot 0 with the value of REG0.  
        Faults on stack cache overflow.

setrtn  L

        Stores the return address L in the return address slot in the
        topmost frame of the stack cache.

restore k

        Restores registers REG0-REGk from the topmost frame of the stack
        cache.  This frame must have been created by "save n", where
        k <= n.

pop     n

        Pops the topmost frame from the stack cache.  This frame must have
        been created by "save n".  This instruction may empty the stack
        cache.

return

        Jumps to the return address contained in the topmost frame of
        the stack cache.  If the stack cache is empty, one frame is copied
        into the stack cache prior to the return.

mvrtn

        Let n be the value in RESULT.  Returns n values.  If n < r, then
        the return values are in REG1 through REGn.  If n >= r, then the
        first R-2 return values are in REG1 through REG{R-2}, and REGr holds
        a list of the remaining return values.

apply   k1,k2

        Faults if the value of TIMER is zero.  RESULT must contain a
        PROCEDURE.  REGk1 must contain a list.  REGk2 must contain the
        length of the list in REGk1.  Let n be the value of REGk2.  If n < r,
        loads registers REG1-REGn with the elements of the list in REGk1.
        If n >= r, loads registers REG1-REG{R-2} with the first R-2 elements
        of the list in REGk1, and stores the tail of that list in REGr.  
        Decrements TIMER, loads SECOND with the value of REG0, loads REG0 
        with the value of RESULT, stores n in RESULT, and jumps to the 
        entry point for the procedure in REG0.

nop

        Does nothing whatsoever.

jump    m,o

        Faults if the value of TIMER is zero.  Decrements timer, loads
        REG0 with the procedure obtained by following m links of the 
        static chain, and jumps to the code of that procedure at offset o.

branch  L

        If this is a backward branch, decrements TIMER and faults if
        the new value is negative.  Jumps to L.  Destroys RESULT.

branchf L

        If RESULT contains a false value, then behaves as "branch L".
        Destroys RESULT in any case.

skip    L

        Jumps to L.  Destroys RESULT.


Implementation notes.

Minimal parameter values.

    R                  ?    (Certainly at least 3, or op3 won't work)
    MINIMM             0
    MAXIMM           256
    MAXOFFSET      32767
    MINOFFSET     -32768
    MAXSLOT          256
    MAXRIB           256

SPARC implementation.

    R                 32
    MAXIMM          1023
    MINIMM         -1024
    MAXOFFSET   16777215
    MINOFFSET  -16777216
    MAXSLOT         1020
    MAXRIB     unlimited

    Additional immediate values can also be immediates in instructions. 
    These include #t, #f, and the empty list, and in some instances,
    characters.  See Compiler/sparc.imp.sch for details.

    INVOKE and APPLY: the old value of REG0 is not preserved in
    SECOND across the call.

    BRANCHF: if RESULT contains a false value, then behaves as "skip L".

    ARGS>=: Twobit guarantees that all fixed arguments are in registers,
    so only cases 0 and 1 of this instruction are implemented.

    For obscure reasons, SECOND is called ARGREG2 and THIRD is called 
    ARGREG3.  (This needs to be fixed.)

Standard-C implementation.

    R                 32
    MINIMM             0 (ought to be smaller)
    MAXIMM           256 (ought to be larger)
    MINOFFSET  unlimited
    MAXOFFSET  unlimited
    MAXSLOT    unlimited
    MAXRIB     unlimited

    MINOFFSET, MAXOFFSET, MAXSLOT, and MAXRIB are restricted only by the
    C compiler; typically, they are restriced only by the virtual address 
    range of the processor.

    INVOKE and APPLY: the old value of REG0 is not preserved in
    SECOND across the call.

    ARGS>=: Twobit guarantees that all fixed arguments are in registers,
    so only cases 0 and 1 of this instruction are implemented.

    There is a fourth millicode argument register, FOURTH.


$Id: note13-malcode.html 469 1999-01-21 20:34:22Z lth $
larceny@ccs.neu.edu