Foreign-Function Interface to C
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Larceny provides a general foreign-function interface (FFI) substrate on which other FFIs can be built; see link:LarcenyNotes/note7-ffi.html[Larceny Note #7]. The FFI described in this manual section is a simple example of a derived FFI. It is not yet fully evolved, but it is useful.

[WARNING]
================================================================
This section has undergone significant revision, but not all of the material has been properly vetted. Some of the information in this section may be out of date.
================================================================

[NOTE]
================================================================
Some of the text below is adapted from the 2008 Scheme Workshop paper, ``The Layers of Larceny's Foreign Function Interface,'' by Felix S Klock II. That paper may provide additional insight for those searching for implementation details and motivations.
================================================================

[[FfiOverview, Introducing the FFI]]

==== Introducing the FFI

There are several ways to use the FFI. One client may want to develop code in C and load it into Larceny. Another client may want to load native libraries provided by the host operating system, enabling invocation of foreign code from Scheme expressions without developing any C code or even running a C compiler. Larceny's FFI can be used for both of these cases, but many of its facilities target a third client in between the two extremes: a client who has a C compiler and the header files and object code for the foreign libraries, but who wishes to avoid writing glue code in C to interface with the libraries.

There are four main steps to interacting with foreign code:

1. identifying the space of values manipulated by the foreign code that will also be manipulated in Scheme,
2. describing how to marshal values between foreign and Scheme code,
3. loading library file(s) holding foreign object code, and
4. linking procedures from the loaded library.

Step 1 is conceptual, while steps 2 through 4 yield artifacts in Scheme source code.

[[FfiForeignValues, The space of foreign values]]

==== The space of foreign values

At the machine code level, foreign values are uninterpreted sequences of bits. Often foreign object code is oriented around manipulating word-sized bit-sequences ('words') or arrays and tuples of words.

Many libraries are written with a particular interpretation of such values. In C code, explicit types are often used as hints to guide such interpretation; for example, a +0+ of type +bool+ is usually interpreted as 'false', while a +1+ (or other non-zero value) of type +bool+ is usually interpreted as 'true'.

Another example is C enumerations (or 'enums'). An enum declaration defines a set of named integral constants. After the C declaration:

-------------------------------------------------------------------------------
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
              JUL, AUG, SEP, OCT, NOV, DEC };
-------------------------------------------------------------------------------

a +JAN+ in C code now denotes +1+, +FEB+ is +2+, and so on. Furthermore, tools like debuggers may render a variable +x+ dynamically assigned the value +2+ (and of static type +enum months+) as +FEB+. Thus the enum declaration introduces a new interpretation for a finite set of integers.

This leads to questions for a client of an FFI; we explore some below.

- Should foreign words be passed over to the Scheme world as uninterpreted numbers (and thus be converted into Scheme integers, usually fixnums), or should they be marshaled into interpreted values, such as +#f+ and +#t+ for the +bool+ type, or the Scheme symbols {+JAN+, +FEB+, +MAR+, +APR+, +MAY+, +JUN+, +JUL+, +AUG+, +SEP+, +OCT+, +NOV+, +DEC+} for the +enum months+ type?
- Similarly, how should Scheme values be marshaled into foreign words?
- A foreign library might leave the mapping of names like +FEB+ to words like +2+ 'unspecified' in the library interface. That is, while the C compiler will know +FEB+ maps to +2+ according to a particular version of the library's header file, the library designer may intend to change this mapping in the future, and clients writing C code should 'only' use the names to refer to an +enum months+ value, and 'not' integer expressions.
  * How should this constraint be handled in the FFI; should the library client revise their code in reaction to such changes to the mapping?
  * Or should the system derive the mapping from the header files, in the same manner that the C compiler does?
- Foreign libraries often manipulate mutable entities, like arrays of words where modifications can be observed (often by design).
  * How should such values be marshaled?
  * Is it sound to copy such values to the Scheme heap? If so, is a shallow copy sufficient?
- Will the foreign code hold references to heap-allocated objects? Heap-allocated objects that 'leak' out to foreign memory must be treated with care; garbage collection presents two main problems.
  * First, such objects must not move during a garbage collection; Larceny supports this via special-purpose allocation routines: +cons-nonrelocatable+, +make-nonrelocatable-bytevector+, and +make-nonrelocatable-vector+.
////////////////////////////////////////////////////////////////////
+syscall:make-nonrelocatable+
////////////////////////////////////////////////////////////////////
  * Second, the garbage collector must know to hold on to (i.e. trace) such values as long as they are needed by foreign code; otherwise the objects or their referents may be collected without the knowledge of the foreign code.

Answering these questions may require deep knowledge of the intended usage of the foreign library.
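
The enum questions can be made concrete with a small C sketch. It assumes the +enum months+ declaration shown earlier; the table +month_names+ and the helpers +month_name+ and +month_from_name+ are hypothetical names introduced only for illustration. They show the kind of name/integer mapping that an interface author has to obtain, either by hand or from the header files, before symbols such as +FEB+ can stand in for words such as +2+.

```c
#include <stddef.h>
#include <string.h>

/* The declaration from the text: JAN is 1, FEB is 2, and so on. */
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
              JUL, AUG, SEP, OCT, NOV, DEC };

/* Hypothetical table pairing each enumerator with its name.  The
   enumerators themselves are used as indices, so the table follows
   the header if a later version renumbers them. */
static const char *const month_names[] = {
    [JAN] = "JAN", [FEB] = "FEB", [MAR] = "MAR", [APR] = "APR",
    [MAY] = "MAY", [JUN] = "JUN", [JUL] = "JUL", [AUG] = "AUG",
    [SEP] = "SEP", [OCT] = "OCT", [NOV] = "NOV", [DEC] = "DEC"
};

#define MONTH_TABLE_LEN (sizeof month_names / sizeof month_names[0])

/* Interpreted view of a word: integer -> name, or NULL if the
   integer is not a valid month. */
const char *month_name(int word)
{
    if (word < 0 || (size_t)word >= MONTH_TABLE_LEN)
        return NULL;
    return month_names[word];   /* unused slots (index 0) are NULL */
}

/* Reverse direction: name -> integer, or -1 if the name is unknown. */
int month_from_name(const char *name)
{
    for (size_t i = 0; i < MONTH_TABLE_LEN; i++)
        if (month_names[i] != NULL && strcmp(month_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

Because the table is indexed by the enumerators rather than by literal integers, recompiling against a revised header updates the mapping automatically, which is the "derive the mapping from the header files" option raised above.
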
The Larceny FFI attempts to ease interfacing with foreign code in the presence of the above concerns, but the nature of the header files included with most foreign libraries means that the FFI cannot infer the answers unassisted.

[NOTE]
================================================================
Foreign C code developed to work in concert with Larceny could hypothetically be written to cope with holding handles for objects managed by the garbage collector, but there is currently no significant support for this use case.
================================================================

[NOTE]
================================================================
One class of foreign values is not addressed by the Larceny FFI: structures passed by value (as opposed to by reference, i.e. pointers to structures). There is no way to describe the interface to a foreign procedure that accepts or produces a C +struct+ (at least not properly or portably). This rarely matters in practice, since many C programmers eschew passing structures by value, but it can arise. If the foreign library of interest has procedures that accept or produce a C +struct+, we currently recommend either avoiding such procedures, or writing adapter code in C that marshals between values handled by the FFI and the C +struct+.
================================================================

The conclusion is: when designing an interface to a foreign library, you should analyze the values manipulated on the foreign side and identify their relationship with values on the Scheme side. After you have identified the domains of interest, you then describe how the values will be marshaled back and forth between the two domains.

[[FfiMarshalling, Marshalling via ffi-attributes]]

==== Marshalling via ffi-attributes

This section describes the marshalling protocol defined in +lib/Base/std-ffi.sch+.
Foreign functions automatically marshal their inputs and outputs according to type-descriptors attached to each foreign function. Type-descriptors are S-expressions formed according to the following grammar:

................................................................
TypeDesc  ::= CoreAttr | ArrowT | MaybeT | OneOfT
CoreAttr  ::= PrimAttr | VoidStar | ---
PrimAttr  ::= CurrentPrimAttr | DeprecatedPrimAttr
CurrentPrimAttr
          ::= int | uint | byte | short | ushort | char | uchar
            | long | ulong | longlong | ulonglong | size_t
            | float | double | bool | string | void
DeprecatedPrimAttr
          ::= unsigned | boxed
VoidStar  ::= void* | ---
ArrowT    ::= (-> (TypeDesc ...) TypeDesc)
MaybeT    ::= (maybe TypeDesc)
OneOfT    ::= (oneof (Any Fixnum) ... TypeDesc)
................................................................

where +---+ represents a user-extensible part of the grammar (see below), +Any+ represents any Scheme value, and +Fixnum+ represents any word-sized integer.

A central registry maps +CoreAttr+'s to a foreign representation and two conversion routines: one to convert a Scheme value to a foreign argument, and another to convert a foreign result back to a Scheme value. The denoted components are collectively referred to as a _type_ within the FFI documentation.

The registry is extensible; the +ffi-add-attribute-core-entry!+ procedure adds new +CoreAttr+'s to the registry, and one can alternatively add short-hands for type-descriptors via the +ffi-add-alias-of-attribute-entry!+ procedure. Finally, one can add new +VoidStar+ productions (subtypes of the +void*+ type-descriptor) via the +ffi-install-void*-subtype+ procedure (defined in the +lib/Standard/foreign-stdlib.sch+ library).
////////////////////////////////////////////////////////////////////
For example the symbol +'char+ denotes a type with procedures that convert Scheme characters to a corresponding signed byte representation and back again, while the symbol +'uchar+ denotes similar procedures that convert between Scheme characters and unsigned bytes.
////////////////////////////////////////////////////////////////////

===== Primitive Attribute Types

The following is a list of the accepted types and their conversions at the boundary between Scheme and foreign code:

+int+:: Exact integer values in the range [-2^31^,2^31^-1]. Scheme integers in that range are converted to and from C "+int+".
+uint+:: Exact integer values in the range [0,2^32^-1]. Scheme integers in that range are converted to and from C "+unsigned int+".
+byte+:: Synonymous with +int+ in the current implementation.
+short+:: Synonymous with +int+ in the current implementation.
+ushort+:: Synonymous with +unsigned+ in the current implementation.
+char+:: Scheme ASCII characters are converted to and from C "+char+".
+uchar+:: Scheme ASCII characters are converted to and from C "+unsigned char+".
+long+:: Synonymous with +int+ in the current implementation.
+ulong+:: Synonymous with +unsigned+ in the current implementation.
+longlong+:: Exact integer values in the range [-2^63^,2^63^-1]. Scheme integers in that range are converted to and from C "+long long+".
+ulonglong+:: Exact integer values in the range [0,2^64^-1]. Scheme integers in that range are converted to and from C "+unsigned long long+".
+size_t+:: Synonymous with +uint+ in the current implementation.
+float+:: Scheme flonums are converted to and from C "+float+". The conversion to +float+ is performed via a C +(float)+ cast from a C +double+.
+double+:: Scheme flonums are converted to and from C "+double+".
+bool+:: Scheme objects are converted to C "+int+"; +#f+ is converted to 0, and all other objects to 1.
In the reverse direction, 0 is converted to +#f+ and all other integers to +#t+.
+string+:: A Scheme string holding ASCII characters is _copied_ into a NUL-terminated bytevector, and a pointer to its first byte is passed to the foreign procedure; +#f+ is converted to a C "+(char*)0+" value. In the reverse direction, a pointer to a NUL-terminated sequence of bytes interpreted as ASCII characters is copied into a freshly allocated Scheme string; a NULL pointer is converted to +#f+.
+void+:: No return value. (Only used in return position for foreign functions; all Scheme procedures passed to the FFI are invoked in a context expecting one value.)
+unsigned+:: Synonymous with +uint+; deprecated.
+boxed+:: Any heap-allocated data structure (pair, bytevector-like, vector-like, procedure) is converted to a C "`void*`" pointing to the first element of the structure. The value `#f` is also acceptable; it is converted to a C "`(void*)0`" value. (Only used in argument position for foreign functions; foreign functions are not expected to return direct references to heap-allocated values.)

===== Extending the Core Attribute Registry

The public interface to many foreign libraries is written in terms of types defined within that foreign library. One can introduce new types to the Larceny FFI by extending the core attribute entry table.

proc:ffi-add-attribute-core-entry![args="entry-name rep-sym marshal unmarshal",result="unspecified"]

+ffi-add-attribute-core-entry!+ extends the internal registry with the new entry specified by its arguments.

- _entry-name_ is a symbol (the symbolic type name being introduced to the FFI).
- _rep-sym_ is a low-level type descriptor symbol, one of +signed32+, +unsigned32+, +signed64+, +unsigned64+ (representing varieties of fixed-width integers), +ieee32+ (representing ``floats''), +ieee64+ (representing ``doubles''), or +pointer+ (representing ``+(void*)+'' in C).
- _marshal_ is a marshaling function that accepts a Scheme object and a symbol (the name of the invoking procedure); it is responsible for checking the Scheme object's validity and then producing a corresponding instance of the low-level representation.
- _unmarshal_ is either +#f+ or an unmarshaling function that accepts an instance of the low-level representation and produces a corresponding Scheme object.

////////////////////////////////////////////////////////////////////
Perhaps document ffi-attribute-core-entry ?
////////////////////////////////////////////////////////////////////

===== Attribute Type Constructors

Core attributes suffice for linking to simple functions. Constructed FFI attributes express more complex marshaling protocols.

.Arrow Type Constructors
A structured FFI attribute of the form +(-> (_s_1_ ... _s_n_) _s_r_)+ (called an _arrow type_) allows passing functions from Scheme to C and back again. Each of the _s_1_, ..., _s_n_, _s_r_ is an FFI attribute. When an arrow type describes an input to a foreign function, it marshals a Scheme procedure to a C function pointer by generating glue code to hook the two together and marshal values as described by the FFI attributes within the arrow type. Likewise, when an arrow type describes an output from a foreign function, it marshals a C function pointer to a Scheme procedure, again by generating glue code.

These two mappings naturally generalize to arbitrary nesting of arrow types, so one can create callbacks that consume callouts, return callouts that consume callbacks, and so on.

[WARNING]
================================================================
The current implementation of arrow types introduces an unnecessary space leak, because none of Larceny's current garbage collectors attempts to reclaim some of the structure allocated (in particular, the so-called trampolines) when functions are marshaled via arrow types. The FFI could be revised to reduce the leak (e.g. it could keep a cache of generated trampolines and reuse them), but it currently does not do so.

Many foreign libraries have a structure where one only sets up a fixed set of callbacks, and then all further computation does not require arrow type marshaling. This is one reason why fixing this problem has been a low priority item for the Larceny development team.
================================================================

.Maybe Type Constructor
+(maybe _t_)+ captures the pattern of passing +NULL+ in C and +#f+ in Scheme to represent the absence of information. The FFI attribute _t_ within the maybe type describes the typical information passed; the constructed maybe type marshals +#f+ to the foreign null pointer or +0+ (as appropriate), and otherwise applies the marshaling of _t_. Likewise, it unmarshals the foreign null pointer and +0+ to +#f+, and otherwise applies the unmarshaling of _t_.

(There are a few other built-in type constructors, such as the +oneof+ type constructor, but they are not as fully developed as the two above, and are intended only for internal development for now.)

===== void* Type Hierarchies

Using the +void*+ attribute wraps foreign addresses up in a Larceny record, so that standard numeric operations cannot be directly applied by accident. The FFI uses two features of Larceny's record system: the record type descriptor is a first-class value with an inspectable name, and record types are extensible via single inheritance.

.Basic Operations on +void*+
The FFI provides +void*-rt+, a record type descriptor with a single field (a wrapped address). There is also a family of functions for dereferencing the pointer within a +void*-rt+ and manipulating the state it references.

proc:void*->address[args="x",result="number"]

Extracts the underlying address held in a +void*+.

proc:void*?[args="x",result="boolean"]

Distinguishes +void*+'s from other Scheme values.
proc:void*-byte-ref[args="x idx",result="number"]

Extracts byte at offset from address within 'x'.

proc:void*-byte-set![args="x idx val",result="unspecified"]

Modifies byte at offset from address within 'x'.

proc:void*-word-ref[args="x idx",result="number"]

Extracts word-sized integer at offset from address within 'x'.

proc:void*-word-set![args="x idx val",result="unspecified"]

Modifies word-sized integer at offset from address within 'x'.

proc:void*-void*-ref[args="x idx",result="void*"]

Extracts address (and wraps it in a +void*+) at offset from address within 'x'.

proc:void*-void*-set![args="x idx val",result="unspecified"]

Modifies address at offset from address within 'x'.

proc:void*-double-ref[args="x idx",result="number"]

Extracts 64-bit flonum at offset from address within 'x'.

proc:void*-double-set![args="x idx val",result="unspecified"]

Modifies 64-bit flonum at offset from address within 'x'.

.Type Hierarchies
Procedures for establishing type hierarchies are provided by the +lib/Standard/foreign-stdlib.sch+ library; see <> and <>.

[[FfiCompiliing, Creating loadable modules]]

==== Creating loadable modules

You must first compile your C code and create one or more loadable object modules. These object modules may then be loaded into Larceny, and Scheme foreign functions may link to specific functions in the loaded module. Defining foreign functions in Scheme is covered in a later section.

The method for creating a loadable object module varies from platform to platform. In the following, assume you have two C source files, file1.c and file2.c, that define functions that you want to make available as foreign functions in Larceny.

===== SunOS 4

Compile your source files and create a shared library. Using GCC, the command line might look like this:

    gcc -fPIC -shared file1.c file2.c -o my-library.so

The command creates my-library.so in the current directory. This library can now be loaded into Larceny using _foreign-file_.
Any other shared libraries used by your library files should also be loaded into Larceny using _foreign-file_ before any procedures are linked using _foreign-procedure_.

By default, /lib/libc.so is made available to the dynamic linker and to the foreign function interface, so there is no need for you to load that library explicitly.

===== SunOS 5

Compile your source files and create a shared library, linking with all the necessary libraries. Using GCC, the command line might look like this:

    gcc -fPIC -shared file1.c file2.c -lc -lm -lsocket -o my-library.so

Now you can use _foreign-file_ to load my-library.so into Larceny.

By default, /lib/libc.so is made available to the foreign function interface, so there is no need for you to load that library explicitly.

[[FfiInterface, Loading and linking foreign functions]]

==== The Interface

===== Procedures

proc:foreign-file[args="filename",result="unspecified"]

_foreign-file_ loads the named object file into Larceny and makes it available for dynamic linking.

Larceny performs dynamic linking with the dynamic linker provided by the operating system. The operation of the dynamic linker varies from platform to platform:

* On some versions of SunOS 4, if the linker is given a file that does not exist, it will terminate the process. (Most likely this is a bug.) This means you should never call _foreign-file_ with the name of a file that does not exist.
* On SunOS 5, if a foreign file is given to _foreign-file_ without a directory specification, then the dynamic linker will search its load path (the `LD_LIBRARY_PATH` environment variable) for the file. Hence, a foreign file in the current directory should be "./file.so", not "file.so".

proc:foreign-procedure[args="name (arg-type ...) return-type",result="unspecified"]

FIXME: The interface to this function has been extended to support hooking into Windows procedures that use the Pascal calling convention instead of the C one. The way to select which convention to use should be documented.
Returns a Scheme procedure _p_ that calls the foreign procedure whose name is _name_. When _p_ is called, it will convert its parameters to representations indicated by the __arg-type__s and invoke the foreign procedure, passing the converted values as parameters. When the foreign procedure returns, its return value is converted to a Scheme value according to _return-type_. Types are described below. The address of the foreign procedure is obtained by searching for _name_ in the symbol tables of the foreign files that have been loaded with _foreign-file_. proc:foreign-null-pointer[args="",result="integer"] Returns a foreign null pointer. proc:foreign-null-pointer?[args="integer",result="boolean"] Tests whether its argument is a foreign null pointer. [[FfiAccess, Foreign data access]] ==== Foreign Data Access ===== Raw memory access The two primitives _peek-bytes_ and _poke-bytes_ are provided for reading and writing memory at specific addresses. These procedures are typically used for copying data from foreign data structures into Scheme bytevectors for subsequent decoding. (The use of _peek-bytes_ and _poke-bytes_ can often be avoided by keeping foreign data in a Scheme bytevector and passing the bytevector to a call-out using the **boxed** parameter type. However, this technique is inappropriate if the foreign code retains a pointer to the Scheme datum, which may be moved by the garbage collector.) proc:peek-bytes[args="addr bytevector count",result="unspecified"] _Addr_ must be an exact nonnegative integer. _Count_ must be a fixnum. The bytes in the range from _addr_ through _addr+count-1_ are copied into _bytevector_, which must be long enough to hold that many bytes. If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults. 
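
In C terms, the copy described above resembles a +memcpy+ between a raw address and a buffer. The following sketch is only an analogy under that assumption, not Larceny's implementation; the function names +peek_bytes+ and +poke_bytes+ are hypothetical C stand-ins for the two Scheme primitives, and, as with the primitives, nothing checks that the address range is valid.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes at addr .. addr+count-1 into buf.
   buf must have room for at least count bytes. */
void peek_bytes(uintptr_t addr, unsigned char *buf, size_t count)
{
    memcpy(buf, (const void *)addr, count);
}

/* Copy the first count bytes of buf to addr .. addr+count-1.
   An invalid addr corrupts memory or faults. */
void poke_bytes(uintptr_t addr, const unsigned char *buf, size_t count)
{
    memcpy((void *)addr, buf, count);
}
```

The address is an unsigned integer here, mirroring the requirement that _addr_ be an exact nonnegative integer on the Scheme side.
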
proc:poke-bytes[args="addr bytevector count",result="unspecified"]

_Addr_ must be an exact nonnegative integer. _Count_ must be a fixnum. The first _count_ bytes from _bytevector_ are copied into memory in the range from _addr_ through _addr+count-1_.

If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults.

Also, it's possible to corrupt memory with _poke-bytes_. Don't do that.

===== Foreign data sizes

The following constants define the sizes of basic C data types:

* **sizeof:short** The size of a "short int".
* **sizeof:int** The size of an "int".
* **sizeof:long** The size of a "long int".
* **sizeof:pointer** The size of any pointer type.

===== Decoding foreign data

Foreign data is visible to a Scheme program either as an object pointed to by a memory address (which is itself represented as an integer), or as a bytevector that contains the bytes of the foreign datum.

A number of utility procedures for reading and writing data of common C primitive types have been written for both these kinds of foreign objects.

_Bytevector accessor procedures_

proctempl:%get16[args="bv i",result="integer"]
proctempl:%get16u[args="bv i",result="integer"]
proctempl:%get32[args="bv i",result="integer"]
proctempl:%get32u[args="bv i",result="integer"]
proctempl:%get-int[args="bv i",result="integer"]
proctempl:%get-unsigned[args="bv i",result="integer"]
proctempl:%get-short[args="bv i",result="integer"]
proctempl:%get-ushort[args="bv i",result="integer"]
proctempl:%get-long[args="bv i",result="integer"]
proctempl:%get-ulong[args="bv i",result="integer"]
proctempl:%get-pointer[args="bv i",result="integer"]

These procedures decode bytevectors that contain the bytes of foreign objects. In each case, _bv_ is a bytevector and _i_ is the offset of the first byte of a field in that bytevector.
The field is fetched and returned as an integer (signed or unsigned as appropriate).

_Bytevector updater procedures_

proctempl:%set16[args="bv i val",result="unspecified"]
proctempl:%set16u[args="bv i val",result="unspecified"]
proctempl:%set32[args="bv i val",result="unspecified"]
proctempl:%set32u[args="bv i val",result="unspecified"]
proctempl:%set-int[args="bv i val",result="unspecified"]
proctempl:%set-unsigned[args="bv i val",result="unspecified"]
proctempl:%set-short[args="bv i val",result="unspecified"]
proctempl:%set-ushort[args="bv i val",result="unspecified"]
proctempl:%set-long[args="bv i val",result="unspecified"]
proctempl:%set-ulong[args="bv i val",result="unspecified"]
proctempl:%set-pointer[args="bv i val",result="unspecified"]

These procedures update bytevectors that contain the bytes of foreign objects. In each case, _bv_ is a bytevector, _i_ is the offset of the first byte of a field in that bytevector, and _val_ is a value to be stored in that field. The values must be exact integers in a range implied by the data type.

_Foreign-pointer accessor procedures_

proctempl:%peek8[args="addr",result="integer"]
proctempl:%peek8u[args="addr",result="integer"]
proctempl:%peek16[args="addr",result="integer"]
proctempl:%peek16u[args="addr",result="integer"]
proctempl:%peek32[args="addr",result="integer"]
proctempl:%peek32u[args="addr",result="integer"]
proctempl:%peek-int[args="addr",result="integer"]
proctempl:%peek-long[args="addr",result="integer"]
proctempl:%peek-unsigned[args="addr",result="integer"]
proctempl:%peek-ulong[args="addr",result="integer"]
proctempl:%peek-short[args="addr",result="integer"]
proctempl:%peek-ushort[args="addr",result="integer"]
proctempl:%peek-pointer[args="addr",result="integer"]
proctempl:%peek-string[args="addr",result="string"]

These procedures read raw memory.
In each case, _addr_ is an address, and the value stored at that address (the size of which is indicated by the name of the procedure) is fetched and returned as an integer. _%Peek-string_ expects to find a NUL-terminated string of 8-bit bytes at the given address. It is returned as a Scheme string. _Foreign-pointer updater procedures_ proctempl:%poke8[args="addr val",result="unspecified"] proctempl:%poke8u[args="addr val",result="unspecified"] proctempl:%poke16[args="addr val",result="unspecified"] proctempl:%poke16u[args="addr val",result="unspecified"] proctempl:%poke32[args="addr val",result="unspecified"] proctempl:%poke32u[args="addr val",result="unspecified"] proctempl:%poke-int[args="addr val",result="unspecified"] proctempl:%poke-long[args="addr val",result="unspecified"] proctempl:%poke-unsigned[args="addr val",result="unspecified"] proctempl:%poke-ulong[args="addr val",result="unspecified"] proctempl:%poke-short[args="addr val",result="unspecified"] proctempl:%poke-ushort[args="addr val",result="unspecified"] proctempl:%poke-pointer[args="addr val",result="unspecified"] These procedures update raw memory. In each case, _addr_ is an address, and _val_ is a value to be stored at that address. [[FfiDumping, Heap dumping and the FFI]] ==== Heap dumping and the FFI If foreign functions are linked into Larceny using the FFI, and a Larceny heap image is subsequently dumped (with <> or <>), then the foreign functions are not saved as part of the heap image. When the heap image is subsequently loaded into Larceny at startup, the FFI will attempt to re-link all the foreign functions in the heap image. During the relinking phase, foreign files will again be loaded into Larceny, and Larceny's FFI will use the file names _as they were originally given to the FFI_ when it tries to load the files. In particular, if relative pathnames were used, Larceny will not have converted them to absolute pathnames. 
An error during relinking will result in Larceny aborting with an error message and returning to the operating system. This is considered a feature.

[[FfiExamples, Examples]]

==== Examples

===== Change directory

This procedure uses the chdir() system call to set the process's current working directory. The string parameter type is used to pass a Scheme string to the C procedure.

    (define cd
      (let ((chdir (foreign-procedure "chdir" '(string) 'int)))
        (lambda (newdir)
          (if (not (zero? (chdir newdir)))
              (error "cd: " newdir " is not a valid directory name."))
          (unspecified))))

===== Print Working Directory

This procedure uses the getcwd() (get current working directory) system call to retrieve the name of the process's current working directory. A bytevector is created and passed in as a buffer in which to store the return value, a 0-terminated ASCII string. Then the FFI utility function ffi/asciiz->string is called to convert the bytevector to a string.

    (define pwd
      (let ((getcwd (foreign-procedure "getcwd" '(boxed int) 'int)))
        (lambda ()
          (let ((s (make-bytevector 1024)))
            (getcwd s 1024)
            (ffi/asciiz->string s)))))

===== Quicksort

WARNING: this example is bogus. It is not safe to pass a collectable object into a C procedure when the callback invocation might cause a garbage collection, thus moving the object and invalidating the address stored in the C machine context.

This demonstrates how to use a callback such as the comparator argument to qsort. It is specified in the type signature using -> as a type constructor. (Note that one should probably use the built-in sort routines rather than call out like this; this example is for demonstrating callbacks, not how to sort.)

    (define qsort!
      (foreign-procedure "qsort"
                         '(boxed ushort ushort (-> (void* void*) int))
                         'void))

    ;; Sort a vector: 8 elements, 4 bytes each.
    (let ((bv (list->vector '(40 10 30 20 1 2 3 4))))
      (qsort! bv 8 4
              (lambda (x y)
                (let ((x (/ (void*-word-ref x 0) 4))
                      (y (/ (void*-word-ref y 0) 4)))
                  (- x y))))
      bv)

    ;; Sort a bytevector: 8 elements, 1 byte each.
    (let ((bv (list->bytevector '(40 10 30 20 1 2 3 4))))
      (qsort! bv 8 1
              (lambda (x y)
                (let ((x (void*-byte-ref x 0))
                      (y (void*-byte-ref y 0)))
                  (- x y))))
      bv)

===== Other examples

The Experimental directory contains several examples of use of the FFI. See in particular the files unix.sch (Unix system calls) and socket.sch (procedures for communicating over sockets).

==== Higher level layers

The general foreign-function interface functionality described above is powerful but awkward to use in practice. A user might be tempted to hard-code values of offsets or constants that are compiler dependent. Also, the FFI will marshal some low-level values such as strings or integers, but other values, such as enumerations that could be naturally mapped to sets of symbols, are not marshaled, since the host environment does not provide the necessary type information to the FFI.

This section documents a collection of libraries to mitigate these and other problems.

===== foreign-ctools

Foreign data access is performed by peeking at manually calculated addresses, but in practice one often needs to inspect fields of C structures, whose offsets are dependent on the application binary interface (ABI) of the host environment. Similarly, C programs often refer to values via constant macro definitions; since the values of such names are not provided by the object code and Scheme programs do not have a C preprocessor run on them prior to execution, it is difficult to refer to the same value without encoding "magic numbers" into the Scheme source code.

The foreign-ctools library is meant to mitigate problems like the two described above. It provides special forms for introducing global definitions of values typically available at compile-time for a C program. The library assumes the presence of a C compiler (such as _cc_ on Unix systems or _cl.exe_ on Windows systems).
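
Both problems come down to values that only a C compiler knows for the host ABI. As a rough, hand-written illustration (not the code that foreign-ctools itself generates), a small C probe can report a field offset, a field size, and the value of a macro constant. The +struct sample+ type, the +SAMPLE_LIMIT+ macro, and all function names below are hypothetical stand-ins for declarations that would normally come from a library header.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical declarations standing in for a library header. */
#define SAMPLE_LIMIT (4 * 256)

struct sample {
    char   tag;
    double weight;
    int    count;
};

/* Offset of the weight field; depends on the ABI's padding rules. */
size_t sample_weight_offset(void)
{
    return offsetof(struct sample, weight);
}

/* Size of the count field; sizeof does not evaluate its operand. */
size_t sample_count_size(void)
{
    return sizeof(((struct sample *)0)->count);
}

/* Value of the macro after preprocessing. */
long sample_limit(void)
{
    return SAMPLE_LIMIT;
}

/* Print the probed values in a form another program could read. */
void print_probe(FILE *out)
{
    fprintf(out, "offset weight %zu\n", sample_weight_offset());
    fprintf(out, "size count %zu\n", sample_count_size());
    fprintf(out, "const SAMPLE_LIMIT %ld\n", sample_limit());
}
```

Calling +print_probe(stdout)+ from a +main+ function would emit one line per value; the offset line in particular can differ between platforms, which is why hard-coding such numbers in Scheme source is fragile.
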
The special forms work by dynamically generating, compiling, and running C code at expansion time to determine the desired values of structure offsets or macro constants.

Here is a grammar for the +define-c-info+ form provided by the +foreign-ctools+ library.

................................................................
<exp>          ::= (define-c-info <c-decl> ... <c-defn> ...)
<c-decl>       ::= (compiler <cc-spec>)
                |  (path <include-path>)
                |  (include <header>)
                |  (include<> <header>)
<cc-spec>      ::= cc | cl
<c-defn>       ::= (const <id> <c-type> <c-expr>)
                |  (sizeof <id> <c-type-expr>)
                |  (struct <c-name> <field-clause> ...)
                |  (fields <c-name> <field-clause> ...)
                |  (ifdefconst <id> <c-type> <c-name>)
<c-type>       ::= int | uint | long | ulong
<include-path> ::= <string-literal>
<header>       ::= <string-literal>
<field-clause> ::= (<offset-id> <c-field>)
                |  (<offset-id> <c-field> <size-id>)
<c-expr>       ::= <string-literal>
<c-type-expr>  ::= <string-literal>
<c-name>       ::= <string-literal>
<c-field>      ::= <string-literal>
................................................................

_Syntax define-c-info_

++ (define-c-info <c-decl> ... <c-defn> ...)++

The +<c-decl>+ clauses of +define-c-info+ control how header files are processed. The +compiler+ clause selects between +cc+ (the default Unix system compiler) and +cl+ (the compiler included with Microsoft's Windows SDK). The +path+ clause adds a directory to search when looking for header files. The +include+ and +include<>+ clauses indicate header files to include when executing the +<c-defn>+ clauses; the two variants correspond to the quoted and bracketed forms of the C preprocessor's +#include+ directive.

//////////////////////////////////////////
All of the +<c-decl>+ variants are optional.
//////////////////////////////////////////

The +<c-defn>+ clauses bind identifiers. A +(const _x_ _t_ "_ae_")+ clause binds _x_ to the integer value of _ae_ according to the C language; _ae_ can be any C arithmetic expression that evaluates to a value of type _t_. (The expected usage is for _ae_ to be an expression that the C preprocessor expands to an arithmetic expression.) The remaining clauses provide similar functionality:

- +(sizeof _x_ "_te_")+ binds _x_ to the size occupied by values of type _te_, where _te_ is any C type expression.
- +(struct "_cn_" ... (_x_ "_cf_" _y_) ...)+ binds _x_ to the offset from the start of a structure of type +struct _cn_+ to its _cf_ field, and binds _y_, if present, to the field's size. A +fields+ clause is similar, but it applies to structures of type +_cn_+ rather than +struct _cn_+.
- +(ifdefconst _x_ _t_ "_cn_")+ binds _x_ to the value of +_cn_+ if +_cn_+ is defined; _x_ is otherwise bound to Larceny's unspecified value.

===== foreign-sugar

The +foreign-procedure+ function is sufficient to link in dynamically loaded C procedures, but it becomes tedious when many procedures must be defined and their names follow a regular pattern from which a mapping between Scheme identifiers and C function names could be inferred.
For example, some libraries follow a naming convention in which the words within a name are separated by underscores; such functions could be mapped directly to Scheme names in which the underscores have been replaced by dashes.

The foreign-sugar library provides a special form, ++define-foreign++, with which the user defines a foreign function by providing only the Scheme name, the argument types, and the return type. The ++define-foreign++ form then attempts to infer which C function the name refers to.

_Syntax define-foreign_

++ (define-foreign (name arg-type ...) result-type)++

NOTE: Other functionality allows the user to introduce new rules for inferring C function names, but it is undocumented because it will probably have to change when we switch to an R6RS macro expander.

===== foreign-stdlib

proc:stdlib/malloc[args="rtd",optarg="ctor",result="procedure"]

Given a record extension of _void*-rt_, returns an allocator that uses the C ++malloc++ procedure to allocate instances of such an object. Note that the client is responsible for eventually freeing such objects with +stdlib/free+.

proc:stdlib/free[args="void*-obj"]

Frees objects produced by allocators returned from +stdlib/malloc+.

proc:ffi-install-void*-subtype[var="ffi-install-void*-subtype"]
proctempl:ffi-install-void*-subtype[args="rtd",result="rtd"]
proctempl:ffi-install-void*-subtype[args="string",optarg="parent-rtd",result="rtd"]
proctempl:ffi-install-void*-subtype[args="symbol",optarg="parent-rtd",result="rtd"]

+ffi-install-void*-subtype+ extends the core attribute registry with a new primitive entry for _subtype_. The _parent-rtd_ argument should be a subtype of +void*-rt+ and defaults to +void*-rt+. In the case of the _symbol_ or _string_ inputs, the procedure constructs a new record type subtyping the _parent-rtd_ argument. In the case of the _rtd_ input, the _rtd_ record type must extend +void*-rt+. +ffi-install-void*-subtype+ returns the subtype record type.
The returned record type represents a tagged wrapped C pointer, allowing one to encode type hierarchies.

proc:establish-void*-subhierarchy![args="symbol-tree",result="unspecified"]

+establish-void*-subhierarchy!+ is a convenience function for constructing large object hierarchies. It descends the _symbol-tree_, creates a record type descriptor for each symbol (where the root of the tree has the parent +void*-rt+), and invokes +ffi-install-void*-subtype+ on all of the introduced types.

_Type char*_ extends _void*_

proc:string->char*[args="string",result="char*"]
proc:char*-strlen[args="char*",result="fixnum"]
proc:char*->string[args="char*",result="string"]
proctempl:char*->string[args="char* len",result="string"]
proc:CallWithCharStar[var="call-with-char*",args="string string-function",result="value"]

_Type char\*\*_ extends _void*_

proc:CallWithCharStarStar[var="call-with-char\*\*",args="string-vector function",result="value"]

_Type int*_ extends _void*_

proc:CallWithIntStar[var="call-with-int*",args="fixnum-vector function",result="value"]

_Type short*_ extends _void*_

proc:CallWithShortStar[var="call-with-short*",args="fixnum-vector function",result="value"]

_Type double*_ extends _void*_

proc:CallWithDoubleStar[var="call-with-double*",args="num-vector function",result="value"]

FIXME: (There are other functions, but I want to test and document the ones above first...)

===== foreign-cstructs

The +foreign-cstructs+ library provides a more direct interface to C structures. It provides the +define-c-struct+ special form. This form is layered on top of +define-c-info+; the latter provides the structure field offsets and sizes used to generate constructors (which produce appropriately sized bytevectors, not record instances). The +define-c-struct+ form combines these with marshaling and unmarshaling procedures to provide high-level access to a structure.

The grammar for the +define-c-struct+ form is presented below.

.....................................................................
<exp>          ::= (define-c-struct (<struct-type> <ctor-id> <c-decl> ...)
                     <field-clause> ...)
<field-clause> ::= (<c-field> <getter>)
                |  (<c-field> <getter> <setter>)
<getter>       ::= (<id>) | (<id> <unmarshal>)
<setter>       ::= (<id>) | (<id> <marshal>)
<marshal>      ::= <ffi-attr-symbol> | <marshal-proc-exp>
<unmarshal>    ::= <ffi-attr-symbol> | <unmarshal-proc-exp>
<struct-type>  ::= <string-literal>
.....................................................................

===== foreign-cenums

This library provides the special forms ++define-c-enum++ and ++define-c-enum-set++, which associate the identifiers of a C +enum+ type declaration with the integer values they denote. The +define-c-enum+ form describes enums encoding a discriminated sum; +define-c-enum-set+ describes bitmasks, mapping them to R^6^RS enum-sets in Scheme.

The +(define-c-enum _en_ (<c-decl> ...) (_x_ "_cn_") ...)+ form adds the +_en_+ FFI attribute. The attribute marshals each symbol +_x_+ to the integer value that +_cn_+ denotes in C; unmarshaling does the inverse translation.

The +(define-c-enum-set _ens_ (<c-decl> ...) (_x_ "_cn_") ...)+ form binds _ens_ to an R^6^RS enum-set constructor whose universe is the result of +(make-enumeration '(_x_ ...))+; it also adds the +_ens_+ FFI attribute. The attribute marshals an enum-set _s_ constructed by _ens_ to the corresponding bitmask in C (that is, the integer one would get by bitwise or'ing all _cn_ such that the corresponding _x_ is in _s_). Unmarshaling attempts to do the inverse translation.

//////////////////////////////////////////////////////////
The inverse uniquely exists when the high-to-low mapping is a bijection, which depends on the denotations of _cn_ ... assigned by the header files.
//////////////////////////////////////////////////////////

The grammar for the two forms is presented below.

................................................................
<exp>    ::= (define-c-enum <enum-id> (<c-decl> ...)
               (<id> <c-name>) ...)
<exp>    ::= (define-c-enum-set <enum-id> (<c-decl> ...)
               (<id> <c-name>) ...)
<c-name> ::= <string-literal>
................................................................
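
The bitmask translation described above for +define-c-enum-set+ can be sketched in C as follows. This is only an illustration of the arithmetic under simple assumptions, not Larceny's implementation: the +perm+ enum, the +perm_values+ table, and both functions are hypothetical, and set membership is modeled as an array of flags rather than an R^6^RS enum-set.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical C-side flag constants, as a header file might declare them. */
enum perm { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 };

/* One entry per symbol of the enum-set universe, in universe order. */
static const int perm_values[] = { PERM_READ, PERM_WRITE, PERM_EXEC };
#define PERM_COUNT (sizeof perm_values / sizeof perm_values[0])

/* Marshal: OR together the constants of every member of the set.
   members[i] is non-zero when the i-th symbol is in the set. */
int perm_set_to_mask(const int members[]) {
    int mask = 0;
    assert(members != NULL);
    for (size_t i = 0; i < PERM_COUNT; i++) {
        if (members[i])
            mask |= perm_values[i];
    }
    return mask;
}

/* Unmarshal: recover membership from a mask.  Returns 1 when every bit of
   the mask was accounted for by a known constant, and 0 otherwise. */
int perm_mask_to_set(int mask, int members[]) {
    int remaining = mask;
    assert(members != NULL);
    for (size_t i = 0; i < PERM_COUNT; i++) {
        members[i] = (mask & perm_values[i]) == perm_values[i];
        if (members[i])
            remaining &= ~perm_values[i];
    }
    return remaining == 0;
}
```

With distinct single-bit constants, as here, the two functions are exact inverses. If the header assigned overlapping or zero values to the constants, the unmarshaling direction would no longer be unique, which is why the text above says only that unmarshaling attempts the inverse translation.
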