Larceny Note #14: Petit Larceny Architecture

September 27, 1999 / Lars T Hansen

Petit Larceny

Contents

Introduction

Run-time Structure
  Overall compilation model
  Launching Petit Larceny
  User interface
  Building Petit Larceny
  Building variants of Petit Larceny

Petit Larceny Source Files
  Run-time system files
  Compiler files
  Assembler files

System internals
  Code generation
  Startup
  Runtime architecture
    Dispatch loop
    Generic arithmetic

Architecture Dependencies
  Word size
  Endianness

Status as of 4 June 1999
  Portability problems
    Nonportable code
    Copmpiler and operating system idiosyncrasies
  Operating systems support
    Unix
    MacOS
    Generic operating systems
    

Introduction

This document describes the architectural aspects of Petit Larceny, a portable Scheme system based on Larceny. Petit Larceny achieves reasonably good performance at the expense of some programmer convenience by compiling Scheme to C, which is then compiled to machine code by the local C compiler. Petit Larceny uses the same compiler (Twobit) and most of the same run-time system as Larceny, including all the garbage collectors.

Petit Larceny should run on most 32-bit architectures and is largely operating-system independent. In particular, it has been tested on several variants of Unix, on the Macintosh Operating System, and on Windows 95. Opertaing-system independece comes at a cost, however, since not all features provided by Larceny can be provided by operating-system independent means. Thus, while Petit Larceny can use the "generic" operating system support (where some features are absent), operating-system dependent units can be fashioned for each particular system Petit Larceny runs on.

We expect to remove word-size dependencies in Petit Larceny before long, allowing it to run on a broader class of systems.

Run-time Structure

Overall compilation model

Petit Larceny compiles Scheme code to native code by compiling Scheme code to C and then invoking the local C compiler to generate native code. Since the resulting native code cannot be dynamically loaded into a running systems on all platforms, Petit Larceny usually has less of an interactive feel than Larceny. In particular, source code that is loaded with LOAD cannot be compiled on-the-fly; instead it must be compiled separately and the resulting machine code and data must be loaded and linked with Petit Larceny before it can become available.

Petit Larceny's reasonably fast interpreter mitigates the compilation problem to some extent, but interpreted code is still significantly slower than compiled code in most cases.

The loading and linking process will differ somewhat from one operating system to another, and as Petit Larceny has portability across operating systems and architectures as one of its goals, a least-common-denominator approach has been adopted. Additional mechanisms may be provided on some operating systems.

Program structure

The least-common-denominator approach to program structure is this: your Scheme code can always be incorporated into Petit Larceny by creating a variant of Petit Larceny that includes the code and data for your application.

A Petit Larceny application consists of: an object file containing run-time system and library code (PETIT.LIB); a heap image containing data for the library code (PETIT.HEAP); a set of files containign data but no code for the application (*.FASL); an object file containing containing the application code (APPLICATION.LIB); and finally an application object file tying these together (APPLICATION.OBJ). This structure, though elaborate, is reasonably portable, does not constrain more advanced techniques on platforms that provide them, and allows applications to be built without requiring the heap image to be rebuilt from scratch.

We can distinguish three classes of C source files:

Run-time system: The files comprising the run-time system (in the Rts directory) are all hand-coded.
Library code: The files comprising the run-time libraries (in the Lib, Interpreter, and REPL directories) are generated by Twobit from the Scheme source files in the same directories. Also in this class is the bootstrap heap startup file. This file--HEAPDATA.c in Larceny's root directory--contains the heap bootstrap code and a table of pointers to environment-less thunks. The thunks are called to evaluate the top-level expressions in the compiled libraries.
Application code: Scheme code in your application is compiled to LOP files by Twobit. Petit Larceny's application builder is then invoked to create C and FASL files from the compiled code, to compile the C code, and to create an object module that contains all the compiled object files. In addition, the application builder creates the application main file, which is a C file that contains a single table of pointers to environment-less thunks. These thunks are called to evaluate the top-level expressions in compiled application code files.

Petit Larceny is an application built in this fashion. It contains not application code, because all compiled code is already in the libraries, thus the application main file contains an empty code table. The name of the application main file is "petit-larceny.c".

Launching Petit Larceny

On Unix systems, a Petit Larceny application consists of three files:
  petit-larceny          The executable: a stub
  petit.heap             The heap image: data for the compiled Scheme libraries
  petit-larceny.so       The shared library: All compiled code for the run-time 
                         system and the compiled Scheme libraries.

Petit Larceny is run by launching petit-larceny; the program dynamically links in petit-larceny.so and loads petit.heap into the run-time heap. There is some operating-system dependent cruft associated with how the executable finds the shared library; this must be resolved on a system-by-system basis.

It is possible to statically link petit-larceny against the library, and to use other heap images by giving the name of the heap image on the command line in standard fashion.

On other operating systems, the structure is similar.

User interface

FIXME
(The generic code assumes a command-line interface and C-like I/O model; can be adapted to operating systems on demand. Note the Macintosh console.)

Building Petit Larceny

The build process is straightforward and similar to the process of building Larceny. First, build the run-time system library:
  cd Rts
  make Rts/libpetit.so
Second, build the heap image and the application itself. You do this from Larceny's build system, which is started thusly:
  build -petit -host -endianness
Host is the name of the host Scheme system; the allowable hosts are listed at the head of the build program itself. Endianness specifies the endianness of the target platform; it is either little or big. See the section on architecture dependencies below.

Once the build system is up, build the heap image and the executable with the following command:

  (make-petit-heap)

If your operating system does not have a C compiler that can be invoked by another program, then you will need to do two things. First, before executing the preceding command, you must select the dummy C compiler in the build environment (FIXME: how to do this?). Second, after the build has completed, you must compile and link the generated C code. (FIXME: A list of the files that need to be compiled and linked needs to be distributed with Petit Larceny but this does not happen at present.)

FIXME: on systems that _do_ have CLI compiler it may still be necessary to do some hacking to get the interface right.

Building variants of Petit Larceny

FIXME.

Petit Larceny Source Files

Petit Larceny shares most of its sources with Larceny. Files private to Petit Larceny are extensions to the run-time system (millicode and syscalls), a primitive table for the compiler, and an assembler module that generates C and invokes the C compiler.

Run-time system files

The Rts/Standard-C directory contains run-time system files that are specific to Petit Larceny. These are:
  arithmetic.mac  Generic arithmetic macros.
  config.c        Definitions of system name, default heap file name,
                  and architecture name.
  millicode.h     Header file for portable millicode implementation.
  millicode.c     Portable millicode implementation, including
                  dispatch loop.
  multiply.c      Portable 32x32->64 multiplication code.
  petit-config.h  Configuration file for Petit Larceny
  petit-hacks.h   Preprocessor symbols and macros that are useful
                  and/or fix gross naming bugs in Larceny that have
                  not yet been fixed globally.  These fixes should
                  be folded into the system.
  syscall.c       Syscalls specific to Petit Larceny.
  twobit.h        Macros for all MAL instructions and primops.

Compiler files

The Compiler directory contains one additional compiler file that is specific to Petit Larceny. This is the table of primitive operations: standard-C.imp.sch.

Assembler files

The Asm/Standard-C directory contains the assembler back-end for Petit Larceny. It consists of the following files:
  pass5p2.sch          MAL code emitter procedures.
  dumpheap-extra.sch   Overriding procedures for heap dumper.
  asm-switches.sch     Assembler switches.

System internals

Code generation

Twobit is not affected by the change in back-ends, and produces LAP files like it used to. (Since the primitive tables depend on the target architecture, LAP files from SPARC Larceny are not portable to Petit Larceny.) The LAP files are assembled with the new assembler, which produces LOP files that are different from the SPARC LOP files.

The new LOP files contain segments that are represented as lists with three elements: code vector, constant vector, and list of C functions defined by the code vector. The code vector is a string that contains C code. The constant vector has the same tagged structure is in SPARC LOP files, except that code vectors again are strings containing C code. The list of C function names can be used by a postprocessor to produce prototypes for the functions defined in the code vector. Each entry is a list of length 3: function name, flag for whether the procedure is always defined (some procedures are only defined if a certain calling convention is used), and a flag for whether the procedure is the top-level thunk that's to be run when a bootstrap heap image is loaded.

Every C function in a file is declared "static" and has a name that is unique in the file. When a C file is produced from a LOP file for compilation, forward declarations are added at the beginning of the file, all code vectors are copied from the LOP segments to the file, and finally trampolines are added that can be called at startup time. Each trampoline knows about the initialization procedures in the file, and for each of them, in order, it creates a trivial thunk which it then calls.

When a group of LOP files are dumped into a heap image, temporary C source files are generated as outlined above and compiled individually; simultaneously, all the data from the LOP files are dumped into a heap image that's identical to the standard heap image except that every code vector slot contains #f. The C code always knows which code vector it needs and gets the code vector by using its name. Consequently, the startup-thunk list in the SPARC heap image is replaced by a C procedure that knows the name of all the startup thunks and can just call them.

When a dumped heap is linked with Larceny run-time support, Larceny only needs to know the name of the startup procedure.

The run-time system is available as an archive Rts/libpetit.so. The archive is created by executing "(cd Rts ; make libpetit.so)". In particular, the archive contains a file with a main() function. When a heap is dumped, the resulting C files are compiled and then linked with the archive to produce an executable (and a code-less heap image).

Startup

If the contents of SCHEME_STARTUP is a procedure with #f for the code vector, then the code for the procedure is in the C procedure twobit_start; this code is patched into the thunk and then the thunk is called. Otherwise it is a normal procedure, the code pointer is the address of some C procedure, and the procedure should be called.

Twobit_start is the result of compiling the startup procedure in Asm/Standard-C/dumpheap-extra.sch. The startup procedure is called with a single argument, the argument vector. Its job is to call all the init thunks in the heap and then call _go_ with the list of symbols to intern and the argument vector. In a native heap, the list of thunks is in a list in the bootstrap procedure's constant vector. In a C heap, the code pointers are not present in the initial heap, so a trick is necessary. The startup procedure calls a special primitive, mc_petit_patch_boot_code, which patches code vectors into the list of thunks. It gets the code pointers from an external array that is created specially for each heap. This array is called twobit_start_procedures and is created by the heap dumper. This is the only change necessary in the boot code, relative to the native boot code.

Heaps can be dumped iff code pointers don't move between program invocations.

Runtime architecture

Dispatch loop

The RTS calls the boot procedure petit_larceny_start with the globals vector as the only argument; any arguments to the boot procedure are encoded in the register file in the globals vector. A stack frame already exists for the boot procedure to use. The boot procedure sets the return address in the frame to be dispatch_loop_return (see below). Then, a jump buffer is created for longjmp, the first code vector is extracted from the procedure in REG0, and that code vector is jumped to. This transfers control to compiled Scheme code.

Whenever a code vector returns normally to the dispatch loop, it must return a pointer to its continuation (a procedure). The loop then immediately calls that continuation.

However, it's also possible to transfer control via longjump. The procedure call_scheme does this: it saves the current state, creates a new stack frame, sets REG0 to the procedure to call, and does a longjump to the dispatch loop, which grabs the code vector from REG0 and calls it, just like in the startup phase. Normally this is not the way to reach the dispatch loop: only Scheme-to-Scheme-thru-Millicode calls do this. The exit handler is reached if the initial Scheme procedure executes a RETURN instruction.

Generic arithmetic

General idea for generic arithmetic:
  add( a, b, k )
  {
    if (both_fixnums( a, b )) {
      r, oflo = ADD( a, b );
      if (!oflo) {
        RESULT = r;
        return k;
      }
    }
    /* oflo or not both fixnums */
    mc_generic_add( globals, k ); /* does its thing inline or longjmps */
  }

Architecture dependencies

Word size

Larceny and Petit Larceny currently depend on using a 32-bit word. The underlying hardware doesn't have to be 32-bit as such, but it needs to provide a 32-bit pointer and a 32-bit unsigned integer type.

Unfortunately, compiling Petit Larceny on a 16-bit platform that provides 32-bit representations may not be straightforward, since native integers on such a platform are most likely 16-bit. While the use of 32-bit data for the object representation is abstracted from the C compiler's representations (for example, the type name "word" is used everywhere for the type that holds a Scheme object pointer), other data used by the run-time system have been declared simply as "int" or "unsigned", and assumptions may have been made about these being at least 32 bits wide.

Thus, to compile Larceny to run under Windows 3.1, MS-DOS, or similar 16-bit operating systems, it may be necessary to use a a system like Win32s or a 32-bit DOS extender.

In contrast, compiling Petit Larceny in 32-bit mode on a DEC Alpha should not cause any problems.

Endianness

Definition: Big-endian systems store multi-byte data from most significant to least significant byte with the most significant byte at the lowest address.

Definition: Little-endian systems store multi-byte data from least significant to most significant byte with the least significant byte at the lowest address.

Endianness dependencies in the system are few and fairly well isolated. If your system has mixed endianness you will almost certainly need to adapt Petit Larceny. For example, systems that store IEEE doubles in big-endian word order but the bytes of each word in little-endian order are neither little-endian nor big-endian by the above definitions, and code that manipulates data layouts must know.

Few parts of the C run-time system have any endianness dependencies; search the code for the string ENDIAN_.

Some Scheme code that manipulates flonums and bignums is endianness-dependent. The dependencies are isolated in the files flonums-el.sch, flonums-be.sch, bignums-el.sch, and bignums-be.sch in Lib/Common.

The development system needs to know about endianness, so the build script must be modified, as must Lib/makefile.sch, Asm/Common/dumpheap.sch, and Util/nbuild.sch.

Status as of 4 June 1999

Portability issues

Although the goal of Larceny is to run on any system that has a Standard C compiler (modulo some architectural constraints), that goal has yet to be reached. Some of the outstanding problems will be rectified over time, but others will not and will have to be dealt with on a case-by-case basis by the person porting Petit Larceny.

Nonportable code

Petit Larceny, like Larceny, assumes that a pointer to void can be cast to a WORD and back without losing any bits (thus by extension that any pointer to a datum can so be cast). This is a fundamental architectural constraint and pervades the run-time system.

Petit Larceny currently casts code pointers to integers and back, and relies on two non-portable assumptions in doing so.

The first assumption is that there are integers that are wide enough to hold code pointers; this may not be the case and is not required by Standard C. This problem may or may not be addressed in the future.

The second assumption is that code pointers point to a WORD address. This too is not required by Standard C, and though most modern compilers generate code that is word-aligned when possible for performance reasons, some don't, and in general, many won't. (One compiler that doesn't generate aligned code is LCC-WIN32.) This problem will be fixed.

Some C code relies on compiler-specific semantics of the right shift operator: that a right shift of an unsigned quantity shifts in 0 bits at the high end, and that a right shift of a signed quantity shifts in a copies of the high bit. This problem will be fixed.

Some macros used by the garbage collector (in Rts/Sys/cheney.c) expand to lines much longer than the maximum length guaranteed by Standard C (509 bytes). Thus some Standard C compilers may reject the program. I have not yet encountered a compiler that has this problem, although I have seen standalone C preprocessors that have truncated the output to about 1K.

Some library Scheme code (notably the REPL and the heap dumper) assumes that the operating system supports reasonable file names, i.e., more than 8+3, and are in any case operating-system specific. Names used by these should be computed more carefully. This problem will be fixed.

Pointers to different "objects" (blocks of memory) may be compared using relational operators, possibly wreaking havoc on systems that represent pointers as segment+offset. This should be a minor problem thesedays and will be fixed if encountered.

Extensive use of 'int' assuming it's large enough...

Compiler and operating system idiosyncrasies

Occasionally I run into implementation limitations. This section documents some of them.

When Petit Larceny was ported to the DEC Alpha, I used the DEC C compiler's -taso and -xtaso_short switches to compile the program in 32-bit mode. However, the ARGV pointer passed to the MAIN function is still a 64-bit pointer, and cannot be cast to a 32-bit pointer! Thus it was necessary to copy ARGV before passing it to the argument-parsing structures. Similarly, a LONG on this architecture is 64 bits, and one cast to LONG in the run-time system therefore had an unintended effect. Other, similar nonportable code may still exist in the run-time system.

FIXME
- code is not loaded at the same address every time (MacOS, at least)
- compiler requires unique filenames across subdirectories (CodeWarrior)
- very large number of function pointers causes problems (TOC on CodeWarrior;
  what about MIPS conventions?)
- operating system has no subdirectories (CP/M)
- operating system has short filenames (MS-DOS)
- object size limited to 64K or so (malloc and friends on 16-bit systems) 
  requires use of system-specific allocation procedures.  One may have to
  replace calls to malloc() by halloc()

Operating system interface

Operating systems support for non-Unix systems is still evolving. As of June 1999, Petit Larceny is self-hosting on Macintosh, and runs on Windows 95 compiled with GCC and the Cygwin Unix compatibility package.

Unix variants

Petit Larceny uses the same run-time libraries as SPARC Larceny on Unix, and all features of Larceny are supported. Petit Larceny has been tested on SunOS 4.1.4 and SunOS 5.6 on Sun SPARC machines with GCC and LCC, DEC OSF/1 on a DEC Alpha (in 32-bit mode) with the DEC C compiler, and on RedHat Linux 5.1 on an Intel 486 with GCC.

The SPARC foreign function interface ought to work with Petit Larceny on SPARC systems; this has yet to be investigated.

Macintosh

Petit Larceny has partial operating-systems support for the Macintosh Operating System when compiled with the CodeWarrior compiler only, because the operating systems support depends on the Unix compatibility libraries provided with CodeWarrior. Less compiler-dependent support is in the works, but is not a priority since CodeWarrior is really without competition on MacOS. Additionally, a graphical user interface written by Doug Currie has been adapted to Petit Larceny, so Petit Larceny has a reasonably complete GUI with text-editing capabilities sensitive to Scheme syntax and indentation. Petit Larceny can also be built with the CodeWarrior "SIOUX" package, which emulates a standard Unix-like console.

Generic operating systems support

A package for generic operating systems support based on Standard C is in the works. This package cannot support some facilities used by the self-hosting systems (notably FILE-MODIFICATION-TIME), but it should still be possible to create a self-hosting system that recompiles everything every time.

There is a little bit of hair involved here to make sure that Scheme and the operating systems support communicate about the value of the newline character; this appears to be a matter of programming.

Future structure for applications

Currently there are two tables: twobit_start_procedures and twobit_load_table. Both are in HEAPDATA.c which is linked into the petit-larceny file with all the other object files -- what I propose to place in petit-heap.so. The latter would contain twobit_start_procedures only; the application would supply twobit_load_table. In petit-larceny this would be empty; in an application built for other purposes (to load the compiler, say) it would be nonempty.

The split would speed up the build process because no heap dumping would need to take place for applications -- only compiling and linking the compiled code to an executable and a bunch of FASL files. The FASL files can be loaded, and a complete heap dumped. It's a very localized change. It would work on all architectures that have a notion of libraries, as linking can happen statically as well as dynamically.