The procedures described in this chapter are nonstandard.
Some are deprecated after being rendered obsolete by ERR5RS
or R6RS standard libraries.
Others still provide useful capabilities that the standard
libraries don't.
10.1. Strings
Larceny provides Unicode strings with R6RS semantics.
The string-downcase and string-upcase procedures
perform Unicode-compatible case folding, which can result
in a string whose length is different from that of the original.
Larceny may still provide string-downcase! and string-upcase!
procedures, but they are deprecated.
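The length change is easy to observe with the German sharp s, whose uppercase form is two characters. The following is a minimal sketch, assuming the R6RS string escape syntax is used to write U+00DF:

```scheme
;; "\xDF;" denotes U+00DF (LATIN SMALL LETTER SHARP S).
(define s "Stra\xDF;e")
(define up (string-upcase s))

(string-length s)    ; => 6
up                   ; => "STRASSE"
(string-length up)   ; => 7
```

Because the result can be longer than the original, an in-place string-upcase! cannot always store its result in the original string, which is one reason to prefer the functional versions.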
10.2. Bytevectors
A bytevector is a data structure that stores bytes — exact
8-bit unsigned integers. Bytevectors are useful in constructing
system interfaces and other low-level programming. In Larceny,
many bytevector-like structures — bignums, for example —
are implemented in terms of a
lower-level bytevector-like data type. The operations on
generic bytevector-like structures are particularly fast but
useful largely in code that manipulates Larceny's data
representations.
The (rnrs bytevectors) library now
provides a large set of procedures that, in Larceny, are
defined using the procedures described below.
Integrable procedure make-bytevector
(make-bytevector length) => bytevector
(make-bytevector length fill) => bytevector
Returns a bytevector of the desired length.
If no second argument is given, then the bytevector has not
been initialized and most likely contains garbage.
Operations on bytevector structures
(bytevector? obj) => boolean
(bytevector-length bytevector) => integer
(bytevector-ref bytevector offset) => byte
(bytevector-set! bytevector offset byte) => unspecified
(bytevector-equal? bytevector1 bytevector2) => boolean
(bytevector-fill! bytevector byte) => unspecified
(bytevector-copy bytevector) => bytevector
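These procedures behave as their names suggest: bytevector? tests
whether obj is a bytevector, bytevector-length returns the number of
bytes, bytevector-ref and bytevector-set! read and write the byte at a
zero-based offset, bytevector-equal? compares contents,
bytevector-fill! stores byte in every element, and bytevector-copy
returns a fresh copy.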
All are integrable, except bytevector-equal? and bytevector-copy.
The bytevector-equal? name is deprecated, since the
R6RS calls it bytevector=?.
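As an illustration of the operations listed above, here is a short sketch that uses only the procedures described in this section; the variable names are arbitrary:

```scheme
;; Allocate four bytes, all initialized to zero.
(define bv (make-bytevector 4 0))

(bytevector-set! bv 0 255)          ; store a byte at offset 0
(bytevector-ref bv 0)               ; => 255
(bytevector-length bv)              ; => 4

;; bytevector-copy returns a fresh bytevector, so later
;; changes to bv do not affect the copy.
(define saved (bytevector-copy bv))
(bytevector-fill! bv 7)
(bytevector-ref bv 0)               ; => 7
(bytevector-ref saved 0)            ; => 255
```

Passing the fill argument to make-bytevector avoids reading uninitialized contents.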
Operations on bytevector-like structures
(bytevector-like? obj) => boolean
(bytevector-like-length bytevector) => integer
(bytevector-like-ref bytevector offset) => byte
(bytevector-like-set! bytevector offset byte) => unspecified
(bytevector-like-equal? bytevector1 bytevector2) => boolean
(bytevector-like-copy bytevector) => bytevector
A bytevector-like structure is a low-level representation
for indexed arrays of uninterpreted bytes. Bytevector-like
structures are used to represent types such as bignums and
flonums.
There is no way to construct a "generic" bytevector-like
structure; use the constructors for specific bytevector-like
types.
The bytevector-like operations operate on all bytevector-like
structures. All are integrable, except bytevector-like-equal?
and bytevector-like-copy. All are deprecated because they
violate abstraction barriers and make your code
representation-dependent; they are useful mainly to
Larceny developers, who might otherwise be tempted to
write some low-level operations in C or assembly language.
10.3. Vectors
Procedure vector-copy
(vector-copy vector) => vector
Returns a shallow copy of its argument.
Operations on vector-like structures
(vector-like? object) => boolean
(vector-like-length vector-like) => fixnum
(vector-like-ref vector-like k) => object
(vector-like-set! vector-like k object) => unspecified
A vector-like structure is a low-level representation
for indexed arrays of Scheme objects. Vector-like
structures are used to represent types such as vectors,
records, symbols, and ports.
There is no way to construct a "generic" vector-like structure;
use the constructors for specific data types.
The vector-like operations operate on all vector-like structures.
All are integrable.
All are deprecated because they
violate abstraction barriers and make your code
representation-dependent; they are useful mainly to
Larceny developers, who might otherwise be tempted to
write some low-level operations in C or assembly language.
10.4. Procedures
Operations on procedures
(make-procedure length) => procedure
(procedure-length procedure) => fixnum
(procedure-ref procedure offset) => object
(procedure-set! procedure offset object) => unspecified
These procedures operate on the representations of procedures and
allow user programs to construct, inspect, and alter procedures.
Procedure procedure-copy
(procedure-copy procedure) => procedure
Returns a shallow copy of the procedure.
The procedures above are deprecated because they
violate abstraction barriers and make your code
representation-dependent; they are useful mainly to
Larceny developers, who might otherwise be tempted to
write some low-level operations in C or assembly language.
The rest of this section describes some procedures that
reach through abstraction barriers in a more controlled way
to extract heuristic information from procedures for debugging
purposes.
Note: The following text is copied from a straw proposal authored by
Will Clinger and sent to rrrs-authors on 09 May 1996. The text has been
edited lightly. See the end for notes about the Larceny implementation.
The procedures that extract heuristic information from procedures are
permitted to return any result whatsoever. If the type of a result is
not among those listed below, then the result represents an
implementation-dependent extension to this interface, which may safely
be interpreted as though no information were available from the
procedure. Otherwise the result is to be interpreted as described
below.
Procedure procedure-arity
(procedure-arity proc)
Returns information about the arity of proc. If the result is #f,
then no information is available. If the result is an exact
non-negative integer k, then proc requires exactly k
arguments. If the result is an inexact non-negative integer n, then
proc requires n or more arguments. If the result is a pair, then
it is a list of non-negative integers, each of which indicates a
number of arguments that will be accepted by proc; the list is not
necessarily exhaustive.
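The interpretation rules for an arity result can be sketched as a small helper. The name arity-accepts? is hypothetical and is not part of Larceny; it takes a value as returned by procedure-arity and an argument count n, and answers #t, #f, or the symbol unknown:

```scheme
;; Hypothetical helper: interpret a procedure-arity result.
(define (arity-accepts? arity n)
  (cond ((and (integer? arity) (exact? arity) (>= arity 0))
         (= n arity))                     ; exactly k arguments
        ((and (integer? arity) (inexact? arity) (>= arity 0))
         (>= n arity))                    ; n or more arguments
        ((pair? arity)
         ;; The list need not be exhaustive, so absence proves nothing.
         (if (memv n arity) #t 'unknown))
        (else 'unknown)))                 ; #f or an extension value

(arity-accepts? 2 2)        ; => #t
(arity-accepts? 2 3)        ; => #f
(arity-accepts? 1.0 5)      ; => #t
(arity-accepts? '(1 3) 2)   ; => unknown
(arity-accepts? #f 0)       ; => unknown
```

A typical use would be (arity-accepts? (procedure-arity car) 1). Treating every unrecognized result as unknown follows the rule that extension values may be read as "no information available".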
Procedure procedure-documentation-string
(procedure-documentation-string proc)
Returns general information about proc. If the result is #f, then no
information is available. If the result is a string, then it is to be
interpreted as a "documentation string" (see Common Lisp).
Procedure procedure-name
(procedure-name proc)
Returns information about the name of proc. If the result is #f,
then no information is available. If the result is a symbol or string,
then it represents a name. If the result is a pair, then it is a list
of symbols and/or strings representing a path of names; the first
element represents an outer name and the last element represents an
inner name.
Procedure procedure-source-file
(procedure-source-file proc)
Returns information about the name of a file that contains the source
code for proc. If the result is #f, then no information is
available. If the result is a string, then the string is the name of a
file.
Procedure procedure-source-position
(procedure-source-position proc)
Returns information about the position of the source code for proc
within the source file specified by procedure-source-file. If the
result is #f, then no information is available. If the result is an
exact integer k, then k characters precede the opening parenthesis
of the source code for proc within that source file.
Procedure procedure-expression
(procedure-expression proc)
Returns information about the source code for proc. If the result is
#f, then no information is available. If the result is a pair, then it
is a lambda expression in the traditional representation of a list.
Procedure procedure-environment
(procedure-environment proc)
Returns information about the environment of proc. If the result is
#f, then no information is available. In any case the result may be
passed to any of the environment inquiry functions.
Notes on the Larceny implementation
Twobit does not yet produce data for all of these functions, so some
of them always return #f.
10.5. Pairs and Lists
The (rnrs lists) library now
provides a set of procedures that may supersede some
of the procedures described below.
If one of Larceny's procedures duplicates the semantics of
an R6RS procedure whose name is different, then Larceny's
name is deprecated.
Procedure append!
(append! list1 list2 … obj) => object
append! destructively appends its arguments, which must be lists, and
returns the resulting list. The last argument can be any object. The
argument lists are appended by changing the cdr of the last pair of
each argument except the last to point to the next argument.
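The sharing that results from the destructive update can be seen in a short sketch. The lists are built with list rather than quoted, since literal constants should not be mutated:

```scheme
(define a (list 1 2))
(define b (list 3 4))

(define c (append! a b))

c                    ; => (1 2 3 4)
a                    ; => (1 2 3 4), the last pair of a now points to b
(eq? (cddr a) b)     ; => #t, b itself is the tail, not a copy
```

Because the arguments are modified, code that still needs the original value of a should use append instead.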
Procedure every?
(every? procedure list1 list2 …) => object
every? applies procedure to each element tuple of the lists in
first-to-last order, and returns #f as soon as procedure returns
#f. If procedure does not return #f for any element tuple of
the lists, then the value returned by procedure for the last element
tuple of the lists is returned.
Procedure last-pair
(last-pair list-structure) => pair
last-pair returns the last pair of the list structure, which must be
a sequence of pairs linked through the cdr fields.
Procedure list-copy
(list-copy list) => list
list-copy makes a shallow copy of the list and returns that copy.
Procedure remove
(remove key list) => list
Procedure remq
(remq key list) => list
Procedure remv
(remv key list) => list
Procedure remp
(remp pred? list) => list
Each of these procedures returns a new list which contains all the
elements of list in the original order, except that those elements of
the original list that were equal to key (or that satisfy pred?) are
not in the new list. Remove uses equal? as the equivalence predicate;
remq uses eq?, and remv uses eqv?.
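A brief sketch of how the four procedures differ only in the test they apply; the sample data is arbitrary:

```scheme
(define names (list "a" "b" "c"))

(remove "b" names)            ; => ("a" "c"), strings compared with equal?
(remq 'b '(a b c b))          ; => (a c)
(remv 2 '(1 2 3 2))           ; => (1 3)
(remp even? '(1 2 3 4))       ; => (1 3)

names                         ; => ("a" "b" "c"), the argument is unchanged
```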
Procedure remove!
(remove! key list) => list
Procedure remq!
(remq! key list) => list
Procedure remv!
(remv! key list) => list
Procedure remp!
(remp! pred? list) => list
These procedures are like remove, remq, remv, and remp,
except they modify list instead of returning a fresh list.
Procedure reverse!
(reverse! list) => list
reverse! destructively reverses its argument and returns the reversed
list.
Procedure some?
(some? procedure list1 list2 …) => object
some? applies procedure to each element tuple of the lists in
first-to-last order, and returns the first non-false value returned by
procedure. If procedure does not return a true value for any
element tuple of the lists, then some? returns #f.
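Both every? and some? return the value produced by the procedure rather than a plain boolean, which the following sketch illustrates with arbitrary sample data:

```scheme
;; every? returns the result for the last tuple when nothing fails.
(every? (lambda (x) (and (positive? x) (* x x))) '(1 2 3))   ; => 9
(every? positive? '(1 -2 3))                                 ; => #f

;; With several lists, the procedure receives one element from each.
(every? < '(1 2 3) '(2 3 4))                                 ; => #t

;; some? returns the first true result.
(some? (lambda (x) (and (even? x) x)) '(1 3 4 5))            ; => 4
(some? even? '(1 3 5))                                       ; => #f
```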
10.6. Sorting
The (rnrs sorting) library now
provides a small set of procedures that supersede most
of the procedures described below.
All of the procedures described below are therefore
deprecated.
Procedures sort and sort!
(sort list less?) => list
(sort vector less?) => vector
(sort! list less?) => list
(sort! vector less?) => vector
These procedures sort their argument (a list or a vector) according to
the predicate less?, which must implement a total order on the
elements in the data structures that are sorted.
sort returns a fresh data structure containing the sorted data;
sort! sorts the data structure in-place.
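The difference between the fresh and in-place variants can be sketched as follows. Note the argument order: these procedures take the data first, whereas the R6RS list-sort, vector-sort, and vector-sort! procedures take the predicate first.

```scheme
(define xs (list 3 1 2))
(sort xs <)                 ; => (1 2 3)
xs                          ; => (3 1 2), sort leaves its argument alone

(define v (vector 3 1 2))
(sort! v <)
v                           ; => #(1 2 3), sorted in place

;; For lists, use the value returned by sort! rather than the old
;; variable, which may no longer refer to the first pair.
(define ys (sort! (list "pear" "apple" "fig") string<?))
ys                          ; => ("apple" "fig" "pear")
```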
10.7. Records
Note: Larceny's records have been extended to implement all ERR5RS and
R6RS procedures from
(err5rs records procedural)
(err5rs records inspection)
(rnrs records procedural)
(rnrs records inspection)
We recommend that Larceny programmers use the ERR5RS APIs instead
of the R6RS APIs. This should entail no loss of portability, since
the standard reference implementation of ERR5RS records should run
efficiently in any implementation of the R6RS that permits new
libraries to be defined at all.
Larceny now has two kinds of records: old-style and ERR5RS/R6RS.
Old-style records cannot be created in R6RS-conforming mode, so
our extension of R6RS procedures to accept old-style records does
not affect R6RS conformance.
Note: The following specification describes Larceny's old-style record
API, which is now deprecated. It is based on a proposal posted by Pavel
Curtis to rrrs-authors on 10 Sep 1989, and later re-posted by Norman
Adams to comp.lang.scheme on 5 Feb 1992. The authorship and copyright
status of the original text are unknown to me.
This document differs from the original proposal in that its record
types are extensible, and that it specifies the type of record-type
descriptors.
10.7.1. Specification
Procedure make-record-type
(make-record-type type-name field-names)
(make-record-type type-name field-names parent-rtd)
Returns a "record-type descriptor", a value representing a new data
type, disjoint from all others. The type-name argument must be a
string, but is only used for debugging purposes (such as the printed
representation of a record of the new type). The field-names
argument is a list of symbols naming the "fields" of a record of the
new type. It is an error if the list contains any duplicates.
If the parent-rtd argument is provided, then the new type will be a
subtype of the type represented by parent-rtd, and the field names
of the new type will include all the field names of the parent
type. It is an error if the complete list of field names contains any
duplicates.
Record-type descriptors are themselves records. In particular,
record-type descriptors have a field printer that is either #f or a
procedure. If the value of the field is a procedure, then the
procedure will be called to print records of the type represented by
the record-type descriptor. The procedure must accept two arguments:
the record object to be printed and an output port.
Procedure record-constructor
(record-constructor rtd)
(record-constructor rtd field-names)
Returns a procedure for constructing new members of the type
represented by rtd. The returned procedure accepts exactly as many
arguments as there are symbols in the given list, field-names; these
are used, in order, as the initial values of those fields in a new
record, which is returned by the constructor procedure. The values of
any fields not named in that list are unspecified. The field-names
argument defaults to the list of field-names in the call to
make-record-type that created the type represented by rtd; if the
field-names argument is provided, it is an error if it contains any
duplicates or any symbols not in the default list.
Procedure record-predicate
(record-predicate rtd)
Returns a procedure for testing membership in the type represented by
rtd. The returned procedure accepts exactly one argument and returns
a true value if the argument is a member of the indicated record type
or one of its subtypes; it returns a false value otherwise.
Procedure record-accessor
(record-accessor rtd field-name)
Returns a procedure for reading the value of a particular field of a
member of the type represented by rtd. The returned procedure
accepts exactly one argument which must be a record of the appropriate
type; it returns the current value of the field named by the symbol
field-name in that record. The symbol field-name must be a member of
the list of field-names in the call to make-record-type that created
the type represented by rtd, or a member of the field-names of the
parent type of the type represented by rtd.
Procedure record-updater
(record-updater rtd field-name)
Returns a procedure for writing the value of a particular field of a
member of the type represented by rtd. The returned procedure
accepts exactly two arguments: first, a record of the appropriate
type, and second, an arbitrary Scheme value; it modifies the field
named by the symbol field-name in that record to contain the given
value. The returned value of the updater procedure is unspecified. The
symbol field-name must be a member of the list of field-names in the
call to make-record-type that created the type represented by rtd,
or a member of the field-names of the parent type of the type
represented by rtd.
Procedure record?
(record? obj)
Returns a true value if obj is a record of any type and a false value
otherwise. Note that record? may be true of any Scheme value; of
course, if it returns true for some particular value, then
record-type-descriptor is applicable to that value and returns an
appropriate descriptor.
Procedure record-type-descriptor
(record-type-descriptor record)
Returns a record-type descriptor representing the type of the given
record. That is, for example, if the returned descriptor were passed
to record-predicate, the resulting predicate would return a true value
when passed the given record. Note that it is not necessarily the case
that the returned descriptor is the one that was passed to
record-constructor in the call that created the constructor procedure
that created the given record.
Procedure record-type-name
(record-type-name rtd)
Returns the type-name associated with the type represented by rtd.
The returned value is eqv? to the type-name argument given in the call
to make-record-type that created the type represented by rtd.
Procedure record-type-field-names
(record-type-field-names rtd)
Returns a list of the symbols naming the fields in members of the type
represented by rtd.
Procedure record-type-parent
(record-type-parent rtd)
Returns a record-type descriptor for the parent type of the type
represented by rtd, if that type has a parent type, or a false value
otherwise. The type represented by rtd has a parent type if the call
to make-record-type that created rtd provided the parent-rtd
argument.
Procedure record-type-extends?
(record-type-extends? rtd1 rtd2)
Returns a true value if the type represented by rtd1 is a subtype of
the type represented by rtd2 and a false value otherwise. A type s
is a subtype of a type t if s=t or if the parent type of s, if
it exists, is a subtype of t.
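The specification above can be exercised with a short sketch. It assumes the old-style API is available, which (as noted earlier) is not the case in R6RS-conforming mode; the point and point3 names are made up for illustration, and the optional parent-rtd argument is assumed to follow field-names.

```scheme
;; A record type with two fields, plus its procedural interface.
(define point-rtd    (make-record-type "point" '(x y)))
(define make-point   (record-constructor point-rtd))
(define point?       (record-predicate point-rtd))
(define point-x      (record-accessor point-rtd 'x))
(define set-point-x! (record-updater point-rtd 'x))

(define p (make-point 1 2))
(point-x p)                                 ; => 1
(set-point-x! p 10)
(point-x p)                                 ; => 10
(point? p)                                  ; true
(point? 'not-a-point)                       ; false

;; Inspection procedures.
(record-type-name point-rtd)                ; => "point"
(record-type-field-names point-rtd)         ; => (x y)

;; A subtype that adds one field to point.
(define point3-rtd (make-record-type "point3" '(z) point-rtd))
(record-type-extends? point3-rtd point-rtd) ; true
(record-type-extends? point-rtd point3-rtd) ; false
```

Every type is a subtype of itself under the definition above, so (record-type-extends? point-rtd point-rtd) is also true.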
10.7.2. Implementation
The R6RS spouts some tendentious nonsense about procedural
records being slower than syntactic records, but this is not
true of Larceny's records, and is unlikely to be true of other
implementations either.
Larceny's procedural records are fairly efficient already,
and will become even more efficient in future versions as
interlibrary optimizations are added.
10.8. Input, Output, and Files
The (rnrs io ports) and (rnrs files) libraries now
provide a set of procedures that may supersede some
of the procedures described below.
If one of Larceny's procedures duplicates the semantics of
an R6RS procedure whose name is different, then Larceny's
name is deprecated.
Procedure close-open-files
(close-open-files ) => unspecified
Closes all open files.
Procedure console-input-port
(console-input-port ) => input-port
Returns a character input port such that no read from the port has
signalled an error or returned the end-of-file object.
Rationale: console-input-port and console-output-port are artifacts
of Unix interactive I/O conventions, where an interactive end-of-file
does not mean "quit" but rather "done here". Under these conventions
the console port should be reset following an end-of-file. Resetting
conflicts with the semantics of ports in Scheme, so console-input-port
and console-output-port return a new port if the current port is
already at end-of-file.
Since it is convenient to handle errors in the same manner as
end-of-file, these procedures also return a new port if an error has
been signalled during an I/O operation on the port.
Console-input-port and console-output-port simply call the port
generators installed in the parameters console-input-port-factory and
console-output-port-factory, which allow user programs to install
their own console port generators.
Procedure console-output-port
(console-output-port ) => output-port
Returns a character output port such that no write to the port has
signalled an error.
See console-input-port for a full explanation.
Parameter console-input-port-factory
The value of this parameter is a procedure that returns a character
input port such that no read from the port has signalled an error or
returned the end-of-file object.
See console-input-port for a full explanation.
Parameter console-output-port-factory
The value of this parameter is a procedure that returns a character
output port such that no write to the port has signalled an error.
See console-input-port for a full explanation.
Parameter current-input-port
The value of this parameter is a character input port.
Parameter current-output-port
The value of this parameter is a character output port.
Procedure delete-file
(delete-file filename) => unspecified
Deletes the named file. No error is signalled if the file does not
exist.
Procedure eof-object
(eof-object ) => end-of-file object
Eof-object returns an end-of-file object.
Procedure file-exists?
(file-exists? filename) => boolean
File-exists? returns #t if the named file exists at the time the
procedure is called.
Procedure file-modification-time
(file-modification-time filename) => vector or #f
File-modification-time returns the time of last modification of the
file as a vector, or #f if the file does not exist. The vector has six
elements: year, month, day, hour, minute, second, all of which are
exact nonnegative integers. The time returned is relative to the local
timezone.
(file-modification-time "larceny") => #(1997 2 6 12 51 13)
(file-modification-time "geekdom") => #f
Procedure flush-output-port
(flush-output-port ) => unspecified
(flush-output-port port) => unspecified
Write any buffered data in the port to the underlying output medium.
Procedure get-output-string
(get-output-string string-output-port) => string
Retrieve the output string from the given string output port.
Procedure open-input-string
(open-input-string string) => input-port
Creates an input port that reads from string. The string may be
shared with the caller. A string input port does not need to be
closed, although closing it will prevent further reads from it.
Procedure open-output-string
(open-output-string ) => output-port
Creates an output port where any output is written to a string. The
accumulated string can be retrieved with
get-output-string at any time.
Procedure port?
(port? object) => boolean
Tests whether its argument is a port.
Procedure port-name
(port-name port) => string
Returns the name associated with the port; for file ports, this is the file name.
Procedure port-position
(port-position port) => fixnum
Returns the number of characters that have been read from or written to the port.
Procedure rename-file
(rename-file from to) => unspecified
Renames the file from and gives it the name to. No error is
signalled if from does not exist or to exists.
Procedure reset-output-string
(reset-output-string port) => unspecified
Given a port created with open-output-string, deletes from the
port all the characters that have been output so far.
Procedure with-input-from-port
(with-input-from-port input-port thunk) => object
Calls thunk with current input bound to input-port in the dynamic
extent of thunk. Returns whatever value was returned from thunk.
Procedure with-output-to-port
(with-output-to-port output-port thunk) => object
Calls thunk with current output bound to output-port in the
dynamic extent of thunk. Returns whatever value was returned from
thunk.
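The string port procedures combine naturally with with-output-to-port and with-input-from-port to capture or supply text. A minimal sketch using only procedures described in this section:

```scheme
;; Capture everything written to the current output port.
(define out (open-output-string))
(with-output-to-port out
  (lambda ()
    (display "hello")
    (write 42)))
(get-output-string out)       ; => "hello42"

;; Discard the accumulated characters and reuse the port.
(reset-output-string out)
(get-output-string out)       ; => ""

;; Read a datum from a string through the current input port.
(define datum
  (with-input-from-port (open-input-string "(1 2) foo") read))
datum                         ; => (1 2)
```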
10.9. Operating System Interface
Procedure command-line-arguments
(command-line-arguments ) => vector
Returns a vector of strings: the arguments supplied to the program by
the user or the operating system.
Procedure dump-heap
(dump-heap filename procedure) => unspecified
Dump a heap image to the named file that will start up with the
supplied procedure. Before procedure is called, command line
arguments will be parsed and any init procedures registered with
add-init-procedure! will be called.
Note: Currently, heap dumping is only available with the
stop-and-copy collector (-stopcopy command line option), although the
heap image can be used with all the other collectors.
Procedure dump-interactive-heap
(dump-interactive-heap filename) => unspecified
Dump a heap image to the named file that will start up with the
standard read-eval-print loop. Before the read-eval-print loop is
called, command line arguments will be parsed and any init procedures
registered with add-init-procedure!
will be called.
Note: Currently, heap dumping is only available with the
stop-and-copy collector (-stopcopy command line option), although the
heap image can be used with all the other collectors.
Procedure getenv
(getenv key) => string or #f
Returns the operating system environment mapping for the string key,
or #f if there is no mapping for key.
Procedure system
(system command) => status
Send the command to the operating system's command processor and
return the command's exit status, if any. On Unix, command is a
string and status is an exact integer.
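A small sketch of how these procedures are typically combined in a script. The helper getenv-or and the variable name used with it are hypothetical; the command passed to system assumes a Unix shell.

```scheme
;; Hypothetical helper: environment lookup with a default value.
(define (getenv-or key default)
  (let ((value (getenv key)))
    (if value value default)))

(define editor (getenv-or "EDITOR" "vi"))

;; The arguments arrive as a vector of strings.
(define args (command-line-arguments))
(vector-length args)

;; Run a command and keep its exit status.
(define status (system "echo hello"))
```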
10.10. Fixnum primitives
Fixnums are small exact integers that are likely to be
represented without heap
allocation. Larceny never represents a number that can be
represented as a fixnum any other way, so programs that can use
fixnums will do so automatically. However, operations that work only
on fixnums can sometimes be substantially faster than generic
operations, and the following primitives are provided for use in those
programs that need especially good performance.
The (rnrs arithmetic fixnums) library now
provides a large set of procedures that, in Larceny, are
defined using the procedures described below.
If one of Larceny's procedures duplicates the semantics of
an R6RS procedure whose name is different, then Larceny's
name is deprecated.
All arguments to the following procedures must be fixnums.
Procedure fixnum?
(fixnum? obj) => boolean
Returns #t if its argument is a fixnum, and #f otherwise.
Procedure fx+
(fx+ fix1 fix2) => fixnum
Returns the fixnum sum of its arguments. If the result is not
representable as a fixnum, then an error is signalled (unless error
checking has been disabled).
Procedure fx-
(fx- fix1 fix2) => fixnum
Returns the fixnum difference of its arguments. If the result is not
representable as a fixnum, then an error is signalled.
Procedure fx--
(fx-- fix1) => fixnum
Returns the fixnum negative of its argument. If the result is not
representable as a fixnum, then an error is signalled.
Procedure fx*
(fx* fix1 fix2) => fixnum
Returns the fixnum product of its arguments. If the result is not
representable as a fixnum, then an error is signalled.
Procedure fx=
(fx= fix1 fix2) => boolean
Returns #t if its arguments are equal, and #f otherwise.
Procedure fx<
(fx< fix1 fix2) => boolean
Returns #t if fix1 is less than fix2, and #f otherwise.
Procedure fx<=
(fx<= fix1 fix2) => boolean
Returns #t if fix1 is less than or equal to fix2, and #f
otherwise.
Procedure fx>
(fx> fix1 fix2) => boolean
Returns #t if fix1 is greater than fix2, and #f otherwise.
Procedure fx>=
(fx>= fix1 fix2) => boolean
Returns #t if fix1 is greater than or equal to fix2, and #f
otherwise.
Procedure fxnegative?
(fxnegative? fix) => boolean
Returns #t if its argument is less than zero, and #f otherwise.
Procedure fxpositive?
(fxpositive? fix) => boolean
Returns #t if its argument is greater than zero, and #f otherwise.
Procedure fxzero?
(fxzero? fix) => boolean
Returns #t if its argument is zero, and #f otherwise.
Procedure fxlogand
(fxlogand fix1 fix2) => fixnum
Returns the bitwise and of its arguments.
Procedure fxlogior
(fxlogior fix1 fix2) => fixnum
Returns the bitwise inclusive or of its arguments.
Procedure fxlognot
(fxlognot fix) => fixnum
Returns the bitwise not of its argument.
Procedure fxlogxor
(fxlogxor fix1 fix2) => fixnum
Returns the bitwise exclusive or of its arguments.
Procedure fxlsh
(fxlsh fix1 fix2) => fixnum
Returns fix1 shifted left fix2 places, shifting in zero bits at
the low end. If the shift count exceeds the number of bits in the
machine's word size, then the results are machine-dependent.
Procedure most-positive-fixnum
(most-positive-fixnum ) => fixnum
Returns the largest representable positive fixnum.
Procedure most-negative-fixnum
(most-negative-fixnum ) => fixnum
Returns the most negative representable fixnum.
Procedure fxrsha
(fxrsha fix1 fix2) => fixnum
Returns fix1 shifted right fix2 places, shifting in a copy of the
sign bit at the left end. If the shift count exceeds the number of
bits in the machine's word size, then the results are
machine-dependent.
Procedure fxrshl
(fxrshl fix1 fix2) => fixnum
Returns fix1 shifted right fix2 places, shifting in zero bits at
the high end. If the shift count exceeds the number of bits in the
machine's word size, then the results are machine-dependent.
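The arithmetic, bitwise, and shift primitives above can be combined as in the following sketch. The helper fx-bit-field is hypothetical and is shown only to illustrate a typical use; all of its arguments are assumed to be small nonnegative fixnums.

```scheme
(fx+ 2 3)             ; => 5
(fxlogand 12 10)      ; => 8
(fxlogior 12 10)      ; => 14
(fxlogxor 12 10)      ; => 6
(fxlognot 0)          ; => -1
(fxlsh 1 4)           ; => 16
(fxrsha -16 2)        ; => -4, the sign bit is copied in

;; Hypothetical helper: extract WIDTH bits of N starting at bit START.
(define (fx-bit-field n start width)
  (fxlogand (fxrshl n start)
            (fx- (fxlsh 1 width) 1)))

(fx-bit-field 180 2 4)   ; => 13, since 180 is #b10110100
```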
10.11. Numbers
Larceny has six representations for numbers: fixnums are small,
exact integers; bignums are unlimited-precision exact integers;
ratnums are exact rationals; flonums are inexact rationals;
rectnums are exact complexes; and compnums are inexact complexes.
Number-representation predicates
(fixnum? obj) => boolean
(bignum? obj) => boolean
(ratnum? obj) => boolean
(flonum? obj) => boolean
(rectnum? obj) => boolean
(compnum? obj) => boolean
These predicates test whether an object is a number of a particular
representation and return #t if so, #f if not.
Procedure random
(random limit) => exact integer
Returns a pseudorandom nonnegative exact integer in the range 0
through limit-1.
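The six representations can be told apart with the predicates above. In this sketch the helper names representation and roll-die are hypothetical:

```scheme
;; Hypothetical helper: name the representation of a number.
(define (representation x)
  (cond ((fixnum? x)  'fixnum)
        ((bignum? x)  'bignum)
        ((ratnum? x)  'ratnum)
        ((flonum? x)  'flonum)
        ((rectnum? x) 'rectnum)
        ((compnum? x) 'compnum)
        (else 'not-a-number)))

(representation 42)                          ; => fixnum
(representation (expt 2 100))                ; => bignum
(representation 1/3)                         ; => ratnum
(representation 1.5)                         ; => flonum
(representation (make-rectangular 1 2))      ; => rectnum
(representation (make-rectangular 1.0 2.0))  ; => compnum

;; random returns an integer from 0 through limit-1, so adding 1
;; gives a value from 1 through 6.
(define (roll-die) (+ 1 (random 6)))
```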
10.12. Hashtables and hash functions
Hashtables represent finite mappings from keys to values.
If the hash function is a good one, then the value associated
with a key may be looked up in constant time (on the average).
Note: The R6RS hashtables library is a big improvement
over Larceny's traditional hash tables, and should be used
instead of the API described below.
Note: To resolve a clash of names and semantics with the
R6RS make-hashtable procedure, Larceny's traditional
make-hashtable procedure has been renamed to
make-oldstyle-hashtable.
10.12.1. Hash tables
Procedure make-oldstyle-hashtable
(make-oldstyle-hashtable hash-function bucket-searcher size) => hashtable
Returns a newly allocated mutable hash table that uses hash-function
as the hash function and bucket-searcher (e.g. assq, assv, or assoc)
to search a bucket. The table starts with size buckets and expands the
number of buckets as needed. The hash-function must accept a key and
return a non-negative exact integer.
(make-oldstyle-hashtable hash-function bucket-searcher) => hashtable
Equivalent to (make-oldstyle-hashtable hash-function bucket-searcher n) for
some value of n chosen by the implementation.
(make-oldstyle-hashtable hash-function) => hashtable
Equivalent to (make-oldstyle-hashtable hash-function assv).
(make-oldstyle-hashtable ) => hashtable
Equivalent to (make-oldstyle-hashtable object-hash assv).
Procedure hashtable-contains?
(hashtable-contains? hashtable key) => bool
Returns true iff the hashtable contains an entry for key.
Procedure hashtable-fetch
(hashtable-fetch hashtable key flag) => object
Returns the value associated with key in the hashtable if the
hashtable contains key; otherwise returns flag.
Procedure hashtable-get
(hashtable-get hashtable key) => object
Equivalent to (hashtable-fetch hashtable key #f).
Procedure hashtable-put!
(hashtable-put! hashtable key value) => unspecified
Changes the hashtable to associate key with value, replacing any
existing association for key.
Procedure hashtable-remove!
(hashtable-remove! hashtable key) => unspecified
Removes any association for key within the hashtable.
Procedure hashtable-clear!
(hashtable-clear! hashtable) => unspecified
Removes all associations from the hashtable.
Procedure hashtable-size
(hashtable-size hashtable) => integer
Returns the number of keys contained within the hashtable.
Procedure hashtable-for-each
(hashtable-for-each procedure hashtable) => unspecified
The procedure must accept two arguments, a key and the value
associated with that key. Calls the procedure once for each
key-value association in hashtable. The order of these calls is
indeterminate.
Procedure hashtable-map
(hashtable-map procedure hashtable)
The procedure must accept two arguments, a key and the value
associated with that key. Calls the procedure once for each
key-value association in hashtable, and returns a list of the
results. The order of the calls is indeterminate.
Procedure hashtable-copy
(hashtable-copy hashtable) => hashtable
Returns a copy of the hashtable.
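A short sketch of the old-style API with string keys. It pairs equal-hash with assoc so that keys are compared by content; the table name and sample data are arbitrary.

```scheme
(define ht (make-oldstyle-hashtable equal-hash assoc))

(hashtable-put! ht "one" 1)
(hashtable-put! ht "two" 2)

(hashtable-get ht "one")                ; => 1
(hashtable-get ht "three")              ; => #f
(hashtable-fetch ht "three" 'missing)   ; => missing
(hashtable-contains? ht "two")          ; true
(hashtable-size ht)                     ; => 2

;; The order of the results is indeterminate, so combine them
;; in an order-independent way.
(apply + (hashtable-map (lambda (key value) value) ht))   ; => 3

(hashtable-remove! ht "one")
(hashtable-size ht)                     ; => 1
```

hashtable-fetch is the safer lookup when #f is itself a possible value, since hashtable-get cannot distinguish a stored #f from a missing key.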
10.12.2. Hash functions
The hash values returned by these functions are nonnegative exact
integers suitable as hash values for the hashtable functions.
Procedure equal-hash
(equal-hash object) => integer
Returns a hash value for object based on its contents.
Procedure object-hash
(object-hash object) => integer
Returns a hash value for object based on its identity.
Warning
|
This hash function performs extremely poorly on pairs,
vectors, strings, and bytevectors, which are the objects
with which it is most likely to be used.
For efficient hashing on object identity, create the
hashtable with make-eq-hashtable or make-eqv-hashtable
of the (rnrs hashtables) library.
|
Procedure string-hash
(string-hash string) => fixnum
Returns a hash value for string based on its content.
Procedure symbol-hash
(symbol-hash symbol) => fixnum
Returns a hash value for symbol based on its print name.
The symbol-hash procedure
is very fast, because the hash code is cached in the symbol data
structure.
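The choice of hash function should match the bucket searcher passed to make-oldstyle-hashtable. A small sketch, with table names and data invented for the example:

```scheme
;; Interned symbols with the same name are the same object, so assq
;; (which compares with eq?) is enough to search the buckets.
(define colors (make-oldstyle-hashtable symbol-hash assq))
(hashtable-put! colors 'sky 'blue)
(hashtable-put! colors 'grass 'green)
(define sky-color (hashtable-get colors 'sky))

;; Strings must be compared by content, so pair string-hash with assoc.
(define ages (make-oldstyle-hashtable string-hash assoc))
(hashtable-put! ages "alice" 31)
;; A freshly allocated string with the same characters finds the entry.
(define alice-age (hashtable-get ages (string-append "al" "ice")))
```

Pairing a content-based hash function with an identity-based searcher such as assq would make lookups with freshly allocated strings fail, so the two arguments should always agree on what "the same key" means.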
10.13. Parameters
Parameters are procedures that serve as containers for values; parts
of the system that do not operate in the same namespace can still
share parameters and thereby read and write shared state.
A parameter takes zero or one arguments. If called with no arguments,
it returns the current value of the parameter. If called with one
argument, it sets the parameter's value to that of the argument and
returns the new value.
Procedure make-parameter
(make-parameter name value) => procedure
(make-parameter name value predicate) => procedure
Creates a parameter with name name, initial value value, and
optional setter predicate predicate. When the parameter is set, the
new value is first passed to predicate, and if predicate returns #f then
an error is signalled. Name can be a symbol or a string.
Syntax parameterize
(parameterize ((parameter0 value0) …) expr0 expr1 …)
Parameterize overrides the values of a set of parameters in a dynamic
scope — it is like fluid-let for parameters.
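The following sketch shows a parameter being created, set, and temporarily overridden. It follows the make-parameter signature as documented in this section (name, initial value, optional predicate); other Scheme systems, such as those following SRFI 39, use a different argument list, so treat the exact call as an assumption. The names verbosity and report are invented for the example.

```scheme
;; Name, initial value, and a predicate that new values must satisfy.
(define verbosity (make-parameter "verbosity" 1 integer?))

(verbosity)                  ; reads the current value
(verbosity 2)                ; sets the value to 2

;; A procedure that consults the parameter when it is called.
(define (report) (verbosity))

;; Inside parameterize, report sees the overriding value.
(define inside
  (parameterize ((verbosity 5))
    (report)))

;; After the dynamic scope ends, the earlier value is visible again.
(define outside (report))
```

Here inside is 5 and outside is 2: the override lasts only for the dynamic extent of the parameterize body.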
10.13.1. Larceny parameters
The following list of parameters does not yet include the reader or
compiler switches, which are also parameters.
Parameter break-handler
Parameter console-input-port-factory
Parameter console-output-port-factory
Parameter current-input-port
Parameter current-output-port
Parameter error-handler
Parameter evaluator
Parameter herald
Parameter interaction-environment
Parameter keyboard-interrupt-handler
Parameter load-evaluator
Parameter quit-handler
Parameter repl-level
Parameter repl-evaluator
Parameter repl-printer
Parameter reset-handler
Parameter standard-timeslice
Parameter structure-comparator
Parameter structure-printer
Parameter timer-interrupt-handler
10.14. Property Lists
The property list of a symbol is an association list that is
attached to that symbol. The association list maps properties, which
are themselves symbols, to arbitrary values.
Procedure putprop
(putprop symbol property obj) => unspecified
If an association exists for property on the property list of
symbol, then its value is replaced by the new value
obj. Otherwise, a new association is added to the property list of
symbol that associates property with obj.
Procedure getprop
(getprop symbol property) => obj
If an association exists for property on the property list of
symbol, then its value is returned. Otherwise, #f is returned.
Procedure remprop
(remprop symbol property) => unspecified
If an association exists for property on the property list of
symbol, then that association is removed. Otherwise, this is a
no-op.
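A short sketch of the three procedures working together; the symbol widget and the property color are arbitrary names chosen for the example.

```scheme
;; Attach a property to the symbol widget, then read it back.
(putprop 'widget 'color 'red)
(define first-color (getprop 'widget 'color))

;; A second putprop for the same property replaces the value.
(putprop 'widget 'color 'blue)
(define second-color (getprop 'widget 'color))

;; After remprop, the lookup falls back to #f.
(remprop 'widget 'color)
(define final-color (getprop 'widget 'color))
```

Since getprop returns #f when no association exists, a property whose value is #f is indistinguishable from a missing property.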
10.15. Symbols
Procedure gensym
(gensym string) => symbol
Gensym returns a new uninterned symbol, the name of which contains the
given string.
Procedure oblist
(oblist ) => list
Oblist returns the list of interned symbols.
Procedure oblist-set!
(oblist-set! list) => unspecified
(oblist-set! list table-size) => unspecified
oblist-set! sets the list of interned symbols to those in the given
list by clearing the symbol hash table and storing the symbols in
list in the hash table. If the optional table-size is given, it is
taken to be the desired size of the new symbol table.
See also: symbol-hash.
10.16. System Control and Performance Measurement
Procedure collect
(collect ) => unspecified
(collect generation) => unspecified
(collect generation method) => unspecified
Collect initiates a garbage collection. If the system has multiple
generations, then the optional arguments are interpreted as
follows. The generation is the generation to collect, where 0 is the
youngest generation. The method determines how the collection is
performed. If method is the symbol collect, then a full collection
is performed in that generation, whatever that means — in a normal
multi-generational copying collector, it means that all live objects
in the generation's current semispace and all live objects from all
younger generations are copied into the generation's other
semispace. If method is the symbol promote, then live objects are
promoted from younger generations into the target generation — in our
example collector, that means that the objects are copied into the
target generation's current semispace.
The default value for generation is 0, and the default value for
method is collect.
Note that the collector's internal policy settings may cause it to
perform a more major type of collection than the one requested; for
example, an attempt to collect generation 2 could cause the collector
to promote all live data into generation 3.
Procedure gc-counter
(gc-counter ) => fixnum
gc-counter returns the number of garbage collections performed since
startup. On a 32-bit system, the counter wraps around every
1,073,741,824 collections.
gc-counter is a primitive and compiles to a single load instruction
on the SPARC.
Procedure major-gc-counter
(major-gc-counter ) => fixnum
major-gc-counter returns the number of major garbage collections
performed since startup, where a major collection is defined as a
collection that may change the address of objects that have already
survived a previous collection.
On a 32-bit system, the counter wraps around every
1,073,741,824 collections.
major-gc-counter is a primitive and compiles to a single load
instruction on the SPARC. Its primary use is to implement efficient
hashtables that hash on object identity (make-eq-hashtable and
make-eqv-hashtable).
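A minimal sketch of how collect and the two counters can be used together. The variable names are made up for the example, the second call assumes that the running collector has a generation 1, and the sketch ignores counter wraparound.

```scheme
;; Remember the counters before forcing any collections.
(define gcs-before (gc-counter))
(define majors-before (major-gc-counter))

(collect)                ; defaults: generation 0, method collect
(collect 1 'promote)     ; promote live objects from younger generations
                         ; into generation 1

;; Number of collections (of each kind) performed in between.
(define gcs-during (- (gc-counter) gcs-before))
(define majors-during (- (major-gc-counter) majors-before))
```

The differences may be larger than the number of explicit requests, since ordinary allocation can trigger collections and the collector's policy may upgrade a request to a more major collection.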
Procedure gcctl
(gcctl heap-number operation operand) => unspecified
[GCCTL is largely obsolete in the new garbage collector but may be
resurrected in the future. It can still be used to control the
non-predictive collector.]
gcctl controls garbage collection policy on a heap-wise basis. The
heap-number is the heap to operate on, as with the command-line
switches: heap 1 is the youngest. If the given heap number does not
correspond to a heap, gcctl fails silently.
The operation is a symbol that selects the operation to perform, and
the operand is the operand to that operation, always a number. For
the non-predictive garbage collector, the following operator/operand
pairs are meaningful:
-
j-fixed, n: after a collection, the collector parameter j should be set to the value n, if possible. (Non-predictive heaps only.)
-
j-percent, n: after a collection, the collector parameter j should be set to be n percent of the number of free steps. (Non-predictive heaps only.)
-
incr-fixed, n: when growing the heap, the growing should be done in increments of n. In the non-predictive heap, n is the number of steps. In other heaps, n denotes kilobytes.
-
incr-percent, n: when growing the heap, the growing should be done in increments of n percent.
Example: if the non-predictive heap is heap number 2, then the expressions
(gcctl 2 'j-fixed 0)
(gcctl 2 'incr-fixed 1)
make the non-predictive collector simulate a normal stop-and-copy
collector (because j is always set to 0), and grows the heap only
one step at a time as necessary. This may be useful for certain kinds
of experiments.
Example: under the same assumption, the expressions
(gcctl 2 'j-percent 50)
(gcctl 2 'incr-percent 20)
select the default policy settings.
Note: The gcctl facility is experimental. A more developed
facility will allow controlling heap contraction policy, as well as
setting all the watermarks. Certainly one can envision other uses,
too. Finally, it needs to be possible to get current values.
Note: Currently the non-predictive heap (np-sc-heap.c) and the
standard stop-and-copy "old" heap (old-heap.c) are supported, but
not the standard "young" heap (young-heap.c), nor the stop-and-copy
collector (sc-heap.c).
Procedure sro
(sro pointer-tag type-tag limit) => vector
SRO ("standing room only") is a system primitive that traverses the
entire heap and returns a vector that contains all live objects in the
heap that satisfy the constraints imposed by its parameters:
-
If pointer-tag is -1, then object type is unconstrained;
otherwise, the object type is constrained to have a pointer tag
that matches pointer-tag. You can read all about pointer tags
here, but the short story is that 1=pair, 3=vector-like,
5=bytevector-like, and 7=procedure-like.
-
If type-tag is -1, then object type is unconstrained by
type-tag; otherwise, only objects with a matching type-tag are
selected (after selection by pointer tag). Pairs don't have
type-tags, but other objects do. You can read all about type-tags
here.
-
Limit constrains the selected objects by the number of
references. If limit is -1, then no constraints are imposed;
otherwise, only objects (selected by pointer-tag and type-tag)
with no more than limit references to them are selected.
For example, (sro -1 -1 -1) returns a vector that contains all live
objects (not including the vector), and (sro 5 2 3) returns a vector
containing all live flonums (bytevector-like, with typetag 2) that are
referred to in no more than 3 places.
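The following sketch uses sro to take a rough census of the heap. The tag values come from the description above; the variable names are invented for the example.

```scheme
;; All live pairs: pointer tag 1, no type-tag or reference-count constraint.
(define live-pairs (sro 1 -1 -1))

;; Live flonums (bytevector-like, type tag 2) referenced from at most
;; one place.
(define lonely-flonums (sro 5 2 1))

(define pair-count (vector-length live-pairs))
(define lonely-flonum-count (vector-length lonely-flonums))
```

Keep in mind that the returned vectors themselves hold references to the selected objects, so those objects stay live for as long as the vectors do.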
Procedure stats-dump-on
(stats-dump-on filename) => unspecified
Stats-dump-on turns on garbage collection statistics dumping. After
each collection, a complete RTS statistics dump is appended to the
file named by filename.
The file format and contents are documented in a banner written at the
top of the output file. In addition, accessor procedures for the
output structure are defined in the program Util/process-stats.sch.
Stats-dump-on does not perform an initial dump when the file is first
opened; only at the first collection is the first set of statistics
dumped. The user might therefore want to initiate a minor collection
just after turning on dumping in order to have a baseline set of data.
Procedure stats-dump-off
(stats-dump-off ) => unspecified
Stats-dump-off turns off garbage collection statistics dumping (which
was turned on with stats-dump-on). It does not dump a final set
of statistics before closing the file; therefore, the user may wish to
initiate a minor collection before calling this procedure.
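The advice about baseline and final dumps can be packaged in a small helper. This is only a sketch: run-with-gc-log is a hypothetical name, the file name is arbitrary, and the helper does not turn dumping off if thunk exits non-locally.

```scheme
;; Runs thunk with statistics dumping enabled, forcing a minor
;; collection before and after so the log has a first and a last entry.
(define (run-with-gc-log filename thunk)
  (stats-dump-on filename)
  (collect)                          ; baseline dump
  (let ((result (thunk)))
    (collect)                        ; final dump
    (stats-dump-off)
    result))

(define logged-result
  (run-with-gc-log "gc-stats.log"
                   (lambda () (vector-length (make-vector 100000 0)))))
```

The call to collect with no arguments collects generation 0, which is the minor collection the text recommends.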
Procedure system-features
(system-features ) => alist
System-features returns an association list of system features. Most
entries are self-explanatory. The following are more subtle:
-
The value of architecture-name is Larceny's notion of the architecture for which it was compiled, not the architecture the program is currently running on. For example, the value of this feature is "Standard-C" if you're running Petit Larceny.
-
The value of heap-area-info is a vector of vectors, one subvector for each heap area in the running system. The subvector has four entries: the generation number, the area type, the current size, and additional information.
Procedure display-memstats
(display-memstats vector) => unspecified
(display-memstats vector 'minimal) => unspecified
(display-memstats vector 'full) => unspecified
Display-memstats takes as its argument a vector as returned by
memstats and displays the contents of the vector in
human-readable form on the current output port. By default, not all of
the values in the vector are displayed.
If the symbol minimal is passed as the second argument, then only a
small number of statistics generally relevant to running benchmarks
are displayed.
If the symbol full is passed as the second argument, then all
statistics are displayed.
Procedure memstats
(memstats ) => vector
Memstats returns a freshly allocated vector containing run-time-system
resource usage statistics. Many of these will make no sense whatsoever
to you unless you also study the RTS sources. A listing of the
contents of the vector is available here.
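The two procedures are normally used together. A minimal sketch, with the variable name snapshot chosen for the example:

```scheme
;; Take a snapshot of the run-time system's statistics.
(define snapshot (memstats))

(display-memstats snapshot)            ; default selection of values
(display-memstats snapshot 'minimal)   ; benchmark-oriented subset
(display-memstats (memstats) 'full)    ; everything, from a fresh snapshot
```

Because memstats returns a freshly allocated vector, a snapshot taken before a computation can be kept and compared against one taken afterwards.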
Procedure run-with-stats
(run-with-stats thunk) => obj
Run-with-stats evaluates thunk, then prints a short summary of
run-time statistics, as with
(display-memstats ... 'minimal),
and then returns the result of evaluating thunk.
Procedure run-benchmark
(run-benchmark name k thunk ok?) => obj
Run-benchmark prints a short banner (including the identifying name)
to identify the benchmark, then runs thunk k times, and finally
tests the value returned from the last call to thunk by applying the
predicate ok? to it. If the predicate returns true, then
run-benchmark prints summary statistics, as with
(display-memstats ... 'minimal).
If the predicate returns false, an error is signalled.
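A sketch of both procedures applied to a deliberately naive Fibonacci function. The function fib and the benchmark name are made up for the example, and passing the name as a string is an assumption, since this section does not specify its type.

```scheme
;; Doubly recursive Fibonacci, used here only as a workload.
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

;; Prints a short statistics summary and returns the thunk's result.
(define fib-25 (run-with-stats (lambda () (fib 25))))

;; Runs the thunk three times and checks the last result.
(run-benchmark "fib-25"
               3
               (lambda () (fib 25))
               (lambda (result) (= result 75025)))
```

The predicate guards against a benchmark that runs quickly only because it computes the wrong answer.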
10.17. SRFI Support
The SRFIs (Scheme Requests for Implementation) are an Internet-based
collection of Scheme code designed and provided by Scheme
programmers. The SRFI effort is open to anyone, and is described at
http://srfi.schemers.org.
The fundamental SRFI is SRFI-0, "Feature-based conditional expansion
construct", which allows a program to query the underlying
implementation about the available SRFIs (and potentially about other
implementation features) at macro expansion time. The design documents
for this and other SRFIs are available at the web site shown above.
Larceny currently supports many SRFIs, but not as many as it should.
Some SRFIs are built into Larceny, but most must be loaded dynamically
using Larceny's require procedure:
Larceny provides the following nonstandard SRFI keys for use in
SRFI 0:
10.18. SLIB support
SLIB
is a large collection of useful libraries that have been
written or collected by Aubrey Jaffer.
Larceny supports SLIB via
SRFI 96,
but SLIB itself is not shipped with Larceny;
it must be downloaded separately and then installed.
For the most up-to-date information on installing and using
SLIB with Larceny, see doc/HOWTO-SLIB.
10.19. Foreign-Function Interface to C
Larceny provides a general foreign-function interface (FFI) substrate
on which other FFIs can be built; see
Larceny Note #7.
The FFI described in this manual section is a simple example of
a derived FFI. It is not yet fully evolved, but it is useful.
Warning
|
This section has undergone significant revision, but
not all of the material has been properly vetted.
Some of the information in this section may be out of date.
|
Note
|
Some of the text below is adapted from the 2008 Scheme Workshop
paper, “The Layers of Larceny's Foreign Function Interface,”
by Felix S Klock II. That paper may provide additional insight
for those searching for implementation details and motivations.
|
10.19.1. Introducing the FFI
There are a number of different potential ways to use the FFI.
One client may want to develop code in C and load it into Larceny.
Another client may want to load native libraries
provided by the host operating system, enabling invocation
of foreign code from Scheme expressions without developing
any C code or even running a C compiler.
Larceny's FFI can be used for both of these cases,
but many of its facilities target a third client
in between the two extremes: a client with a C compiler and
the header files and object code for the foreign libraries,
but who wishes to avoid writing glue code in C to interface
with the libraries.
There are four main steps to interacting with foreign code:
-
identifying the space of values manipulated by the
foreign code that will also be manipulated in Scheme,
-
describing how to marshal values between foreign and
Scheme code,
-
loading library file(s) holding foreign object code, and
-
linking procedures from the loaded library.
Step 1 is conceptual, while steps 2 through 4
yield artifacts in Scheme source code.
10.19.2. The space of foreign values
At the machine code level, foreign values are uninterpreted
sequences of bits. Often foreign object code is oriented
around manipulating word-sized bit-sequences (words)
or arrays and tuples of words.
Many libraries are written with a particular
interpretation of such values. In C code, explicit types are
often used as hints to guide such interpretation; for example,
a 0 of type bool is usually interpreted as false,
while a 1 (or other non-zero value) of type bool is
usually interpreted as true.
C enumerations (or enums) are another example.
An enum declaration defines a set of named
integral constants. After the C declaration:
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC };
a JAN in C code now denotes 1, FEB is 2, and so on.
Furthermore, tools like debuggers may render a variable x
dynamically assigned the value 2 (and of static type enum months)
as FEB. Thus the enum declaration
introduces a new interpretation for a finite set of integers.
This leads to questions for a client of an FFI;
we explore some below.
-
Should foreign words be passed over to
the Scheme world as uninterpreted numbers (and thus
be converted into Scheme integers, usually fixnums),
or should they be marshaled into interpreted values, such as
#f and #t for the bool type, or the Scheme symbols
{JAN, FEB, MAR, APR, MAY, JUN,
JUL, AUG, SEP, OCT, NOV, DEC}
for the enum months type?
-
Similarly, how should Scheme values be marshaled into
foreign words?
-
A foreign library might leave the mapping
of names like FEB to words like 2 unspecified
in the library interface.
That is, while the C compiler will know FEB maps to 2
according to a particular version of the library's header file,
the library designer may intend to change this mapping
in the future, and clients writing C code should only use
the names to refer to an enum months value, and not integer
expressions.
-
How should this constraint be handled in the FFI; should
the library client revise their code in reaction to
such changes to the mapping?
-
Or should the system derive
the mapping from the header files, in the same manner that
the C compiler does?
-
Foreign libraries often manipulate
mutable entities, like arrays of words where
modifications can be observed (often by design).
-
How should such values be marshaled?
-
Is it sound to copy such values to the Scheme heap?
If so, is a shallow copy sufficient?
-
Will the foreign code hold references to heap-allocated
objects? Heap-allocated objects that leak out to
foreign memory must be treated with care;
garbage collection presents two main problems.
-
First, such objects must not move during a garbage collection;
Larceny supports this via special-purpose allocation routines:
cons-nonrelocatable, make-nonrelocatable-bytevector,
and make-nonrelocatable-vector.
-
Second, the garbage collector must know to hold on to
(i.e. trace)
such values as long as they are needed by foreign code;
otherwise the objects or their referents may be
collected without the knowledge of the foreign code.
Answering these questions may require deep knowledge
of the intended usage of the foreign library.
The Larceny FFI attempts to ease interfacing with
foreign code in the presence of the above concerns,
but the nature of the header files included with
most foreign libraries means that the FFI cannot infer
the answers unassisted.
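To make the first two questions concrete, here is a hand-written mapping between the enum months words and Scheme symbols, in plain Scheme. The procedure names month->int and int->month are invented for the example and are not part of the FFI. The sketch hard-codes the mapping from one version of the header file, which is exactly the fragility discussed above.

```scheme
;; Month names in the order of the C declaration, which starts at JAN = 1.
(define month-names '(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC))

;; Scheme symbol -> foreign word.
(define (month->int sym)
  (let loop ((names month-names) (n 1))
    (cond ((null? names) (error "month->int: not a month name" sym))
          ((eq? (car names) sym) n)
          (else (loop (cdr names) (+ n 1))))))

;; Foreign word -> Scheme symbol.
(define (int->month n)
  (if (and (integer? n) (exact? n) (<= 1 n 12))
      (list-ref month-names (- n 1))
      (error "int->month: not a month number" n)))

(month->int 'FEB)    ; 2
(int->month 12)      ; DEC
```

If a later version of the library renumbered the enumeration, this code would silently produce wrong words, whereas a mapping derived from the header files would follow the change.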
Note
|
Foreign C code developed to work in concert with Larceny
could hypothetically be written to cope with holding
handles for objects managed by the garbage collector,
but there is currently no significant support
for this use-case.
|
Note
|
One class of foreign values is not addressed
by the Larceny FFI: structures passed by value (as
opposed to by reference, i.e., pointers to structures).
There is no way to describe the interface to a
foreign procedure that accepts or produces a
C struct (at least not properly nor portably).
This tends to not matter for many foreign libraries
(since many C programmers eschew passing structures
by value), but it can arise.
If the foreign library of interest has procedures that
accept or produce a C struct, we currently recommend
either avoiding such procedures, or writing
adapter code in C that marshals between values handled
by the FFI and the C struct.
|
The conclusion is: when designing an interface to a foreign
library, you should analyze the values manipulated on the
foreign side and identify their relationship with values
on the Scheme side.
After you have identified the domains of interest,
you then describe how the values will be marshaled
back and forth between the two domains.
10.19.3. Marshalling via ffi-attributes
This section describes the marshalling protocol defined in
lib/Base/std-ffi.sch.
Foreign functions automatically marshal their inputs and outputs
according to type-descriptors attached to each foreign
function.
Type-descriptors are S-expressions formed according to the following
grammar:
TypeDesc ::= CoreAttr | ArrowT | MaybeT | OneOfT
CoreAttr ::= PrimAttr | VoidStar | ---
PrimAttr ::= CurrentPrimAttr | DeprecatedPrimAttr
CurrentPrimAttr
::= int | uint | byte | short | ushort | char | uchar
| long | ulong | longlong | ulonglong
| size_t | float | double | bool | string | void
DeprecatedPrimAttr
::= unsigned | boxed
VoidStar ::= void* | ---
ArrowT ::= (-> (TypeDesc ...) TypeDesc)
MaybeT ::= (maybe TypeDesc)
OneOfT ::= (oneof (Any Fixnum) ... TypeDesc)
where --- represents a user-extensible part of the grammar
(see below),
Any represents any Scheme value, and Fixnum represents
any word-sized integer.
A central registry maps CoreAttr's to a foreign
representation and two conversion routines:
one to convert a Scheme value to a foreign argument, and
another to convert a foreign result back to a Scheme value.
The denoted components are collectively referred to as a type
within the FFI documentation.
The registry is extensible; the ffi-add-attribute-core-entry!
procedure adds new CoreAttr's to the registry, and
one can alternatively add short-hands for
type-descriptors via the ffi-add-alias-of-attribute-entry!
procedure.
Finally, one can add new VoidStar productions
(subtypes of the void* type-descriptor)
via the ffi-install-void*-subtype procedure
(defined in the lib/Standard/foreign-stdlib.sch library).
Primitive Attribute Types
The following is a list of the accepted types and their conversions
at the boundary between Scheme and foreign code:
-
int
-
Exact integer values in the range [-2^31, 2^31-1].
Scheme integers in that range are converted to and from C "int".
-
uint
-
Exact integer values in the range [0, 2^32-1].
Scheme integers in that range are converted to and from C "unsigned int".
-
byte
-
Synonymous with int in the current implementation.
-
short
-
Synonymous with int in the current implementation.
-
ushort
-
Synonymous with unsigned in the current implementation.
-
char
-
Scheme ASCII characters are converted to and from C "char".
-
uchar
-
Scheme ASCII characters are converted to and from C "unsigned char".
-
long
-
Synonymous with int in the current implementation.
-
ulong
-
Synonymous with unsigned in the current implementation.
-
longlong
-
Exact integer values in the range [-2^63, 2^63-1].
Scheme integers in that range are converted
to and from C "long long".
-
ulonglong
-
Exact integer values in the range [0, 2^64-1].
Scheme integers in that range are converted
to and from C "unsigned long long".
-
size_t
-
Synonymous with uint in the current implementation.
-
float
-
Scheme flonums are converted to and from C "float".
The conversion to float is performed via
a C (float) cast from a C double.
-
double
-
Scheme flonums are converted to and from C "double".
-
bool
-
Scheme objects are converted to C "int";
#f is converted to 0, and all other objects to 1.
In the reverse direction, 0 is converted to #f and
all other integers to #t.
-
string
-
A Scheme string holding ASCII characters
is copied into a NUL-terminated bytevector,
passing a pointer to its first byte to the foreign procedure;
#f is converted to a C "(char*)0" value.
In the reverse direction, a pointer to a NUL-terminated sequence
of bytes interpreted as ASCII characters is
copied into a freshly allocated Scheme string; a NULL pointer is
converted to #f.
-
void
-
No return value.
(Only used in return position for foreign functions;
all Scheme procedures passed to the FFI are invoked in a context
expecting one value.)
-
unsigned
-
Synonymous with uint; deprecated.
-
boxed
-
Any heap-allocated data structure (pair,
bytevector-like, vector-like, procedure) is converted to
a C "void*" to the first element of the structure. The
value #f is also acceptable. It is converted to a C "(void*)0"
value.
(Only used in argument position for foreign functions; foreign
functions are not expected to return direct references
to heap-allocated values.)
Extending the Core Attribute Registry
The public interface to many foreign libraries is written
in terms of types defined within that foreign library.
One can introduce new types to the Larceny FFI
by extending the core attribute entry table.
Procedure ffi-add-attribute-core-entry!
(ffi-add-attribute-core-entry! entry-name rep-sym marshal unmarshal) => unspecified
ffi-add-attribute-core-entry! extends the
internal registry with the new entry specified by its arguments.
-
entry-name is a symbol (the symbolic type name being
introduced to the FFI).
-
rep-sym is a low-level type descriptor symbol, one of
signed32, unsigned32, signed64, unsigned64
(representing varieties of fixed width integers),
ieee32 (representing “floats”),
ieee64 (representing “doubles”), or
pointer (representing “(void*)” in C).
-
marshal is a marshaling function that accepts a Scheme object and a symbol
(the name of the invoking procedure); it is responsible for checking
the Scheme object's validity and then producing a corresponding
instance of the low-level representation.
-
unmarshal is either #f or an unmarshalling function that
accepts an instance of the low-level representation
and produces a corresponding Scheme object.
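The following sketch registers a hypothetical percent attribute: an exact integer between 0 and 100, represented as a 32-bit signed integer on the foreign side. The attribute name and the helper procedures are invented for the example, and loading the FFI with (require 'std-ffi) is an assumption about your installation. The argument order follows the signature documented above.

```scheme
(require 'std-ffi)   ; assumption: makes the FFI procedures available

;; Marshaling: check the Scheme value, then hand back the integer that
;; will be passed to foreign code. who names the invoking procedure.
(define (percent->c x who)
  (if (and (integer? x) (exact? x) (<= 0 x 100))
      x
      (error "percent: expected an exact integer in [0,100]" x who)))

;; Unmarshaling: the foreign result is already a suitable integer.
(define (c->percent n) n)

(ffi-add-attribute-core-entry! 'percent 'signed32 percent->c c->percent)
```

After registration, percent could appear wherever a type-descriptor is expected, so that range checking happens at the boundary rather than in every caller.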
Attribute Type Constructors
Core attributes suffice for linking to simple
functions.
Constructed FFI attributes express more complex
marshaling protocols.
Arrow Type Constructors
A structured FFI attribute
of the form (-> (s_1 … s_n) s_r)
(called an arrow type)
allows passing functions from Scheme to C
and back again. Each of the s_1, …, s_n, s_r
is an FFI attribute.
When an arrow type describes an input to a foreign
function, it marshals a Scheme procedure to a
C function pointer by generating glue code to hook the two together
and marshal values as described by the FFI attributes
within the arrow type.
Likewise, when an arrow type describes an output from a
foreign function, it marshals a C function pointer
to a Scheme procedure, again by generating glue code.
These two mappings naturally generalize to arbitrary nesting
of arrow types, so one can create callbacks that consume
callouts, return callouts that consume callbacks, and so on.
Warning
|
The current implementation of arrow types introduces an
unnecessary space leak, because none of Larceny's current
garbage collectors attempt to reclaim some of the structure
allocated (in particular, the so-called trampolines)
when functions are marshaled via arrow types.
The FFI could be revised to reduce the leak
(e.g. it could keep a cache of generated trampolines and
reuse them, but it currently does not do so).
Many foreign libraries have a structure where one only
sets up a fixed set of callbacks, and then all further
computation does not require arrow type marshaling.
This is one reason why fixing this problem
has been a low priority item for the Larceny development
team.
|
Maybe Type Constructor
(maybe t) captures the
pattern of passing NULL in C and #f in Scheme
to represent the absence of information.
The FFI attribute t within the maybe type
describes the typical information passed;
the constructed maybe type
marshals #f to the foreign null pointer or 0 (as appropriate),
and otherwise applies the marshaling of t.
Likewise, it unmarshals the foreign
null pointer and 0 to #f, and otherwise applies the
unmarshaling of t.
(There are a few other built-in type constructors, such as
the oneof type constructor, but they
are not as fully-developed as the two above, and are intended
for use only for internal development for now.)
void* Type Hierarchies
Using the void* attribute
wraps foreign addresses up in a Larceny record,
so that standard numeric
operations cannot be directly applied by accident.
The FFI uses two features of Larceny's record system:
the record type descriptor is a first class
value with an inspectable name, and
record types are extensible via single-inheritance.
Basic Operations on void*
The FFI provides void*-rt, a record type
descriptor with a single field (a wrapped address).
There is also a family of functions for dereferencing the
pointer within a void*-rt and manipulating the
state it references.
Procedure void*->address
(void*->address x) => number
Extracts the underlying address held in a
void*.
Procedure void*?
(void*? x) => boolean
Distinguishes
void*'s from other Scheme values.
Procedure void*-byte-ref
(void*-byte-ref x idx) => number
Extracts the byte at offset idx from the address within
x.
Procedure void*-byte-set!
(void*-byte-set! x idx val) => unspecified
Sets the byte at offset idx from the address within
x to val.
Procedure void*-word-ref
(void*-word-ref x idx) => number
Extracts the word-sized integer at offset idx from the address within
x.
Procedure void*-word-set!
(void*-word-set! x idx val) => unspecified
Sets the word-sized integer at offset idx from the address within
x to val.
Procedure void*-void*-ref
(void*-void*-ref x idx) => void*
Extracts the address (and wraps it in a
void*) at offset idx from the address within
x.
Procedure void*-void*-set!
(void*-void*-set! x idx val) => unspecified
Sets the address stored at offset idx from the address within
x to val.
Procedure void*-double-ref
(void*-double-ref x idx) => number
Extracts the 64-bit flonum at offset idx from the address within
x.
Procedure void*-double-set!
(void*-double-set! x idx val) => unspecified
Sets the 64-bit flonum at offset idx from the address within
x to val.
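A sketch of the byte accessors applied to memory obtained from the C library. It assumes that the FFI is loaded with (require 'std-ffi), that malloc and free can be linked from the default C library, and that type-descriptors are passed to foreign-procedure as quoted data; the names c-malloc, c-free, and buf are invented for the example. Error handling, such as a failed allocation, is omitted.

```scheme
(require 'std-ffi)   ; assumption: makes the FFI procedures available

;; Link the C allocator; results come back wrapped as void* records.
(define c-malloc (foreign-procedure "malloc" '(uint) 'void*))
(define c-free   (foreign-procedure "free" '(void*) 'void))

(define buf (c-malloc 16))          ; 16 bytes of foreign memory

(define wrapped? (void*? buf))      ; the result is a void* record
(void*-byte-set! buf 0 65)          ; store the byte 65 at offset 0
(define first-byte (void*-byte-ref buf 0))

(c-free buf)                        ; buf must not be used after this
```

Because the address is wrapped in a record, an accidental (+ buf 1) signals an error instead of silently computing a bogus address.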
Type Hierarchies
Procedures for establishing type hierarchies are provided by the
lib/Standard/foreign-stdlib.sch library; see
ffi-install-void*-subtype and establish-void*-subhierarchy!.
10.19.4. Creating loadable modules
You must first compile your C code and create one or more loadable object modules. These object modules may then be loaded into Larceny, and Scheme foreign functions may link to specific functions in the loaded module. Defining foreign functions in Scheme is covered in a later section.
The method for creating a loadable object module varies from platform to platform. In the following, assume you have two C source files file1.c and file2.c that define functions that you want to make available as foreign functions in Larceny.
SunOS 4
Compile your source files and create a shared library. Using GCC, the command line might look like this:
gcc -fPIC -shared file1.c file2.c -o my-library.so
The command creates my-library.so in the current directory. This library can now be loaded into Larceny using foreign-file. Any other shared libraries used by your library files should also be loaded into Larceny using foreign-file before any procedures are linked using foreign-procedure.
By default, /lib/libc.so is made available to the dynamic linker and to the foreign function interface, so there is no need for you to load that library explicitly.
SunOS 5
Compile your source files and create a shared library, linking with all the necessary libraries. Using GCC, the command line might look like this:
gcc -fPIC -shared file1.c file2.c -lc -lm -lsocket -o my-library.so
Now you can use foreign-file to load my-library.so into Larceny.
By default, /lib/libc.so is made available to the foreign function interface, so there is no need for you to load that library explicitly.
10.19.5. The Interface
Procedures
Procedure foreign-file
(foreign-file filename) => unspecified
[foreign-file] loads the named object file into Larceny and makes it available for dynamic linking.
Larceny uses the dynamic linker provided by the operating system to do dynamic linking. The operation of the dynamic linker varies from platform to platform:
-
On some versions of SunOS 4, if the linker is given a file that does not exist, it will terminate the process. (Most likely this is a bug.) This means you should never call foreign-file with the name of a file that does not exist.
-
On SunOS 5, if a foreign file is given to foreign-file without a directory specification, then the dynamic linker will search its load path (the LD_LIBRARY_PATH environment variable) for the file. Hence, a foreign file in the current directory should be "./file.so", not "file.so".
Procedure foreign-procedure
(foreign-procedure name (arg-type …) return-type) => procedure
FIXME: The interface to this function has been extended to support
hooking into Windows procedures that use the Pascal calling convention
instead of the C one. The way to select which convention to use
should be documented.
Returns a Scheme procedure p that calls the foreign procedure whose
name is name. When p is called, it will convert its parameters to
representations indicated by the arg-types and invoke the foreign
procedure, passing the converted values as parameters. When the
foreign procedure returns, its return value is converted to a Scheme
value according to return-type.
Types are described below.
The address of the foreign procedure is obtained by searching for name in the symbol tables of the foreign files that have been loaded with foreign-file.
Procedure foreign-null-pointer
(foreign-null-pointer) => integer
Returns a foreign null pointer.
Procedure foreign-null-pointer?
(foreign-null-pointer? integer) => boolean
Tests whether its argument is a foreign null pointer.
10.19.6. Foreign Data Access
Raw memory access
The two primitives peek-bytes and poke-bytes are provided for reading and writing memory at specific addresses. These procedures are typically used for copying data from foreign data structures into Scheme bytevectors for subsequent decoding.
(The use of peek-bytes and poke-bytes can often be avoided by keeping foreign data in a Scheme bytevector and passing the bytevector to a call-out using the boxed parameter type. However, this technique is inappropriate if the foreign code retains a pointer to the Scheme datum, which may be moved by the garbage collector.)
Procedure peek-bytes
(peek-bytes addr bytevector count) => unspecified
Addr must be an exact nonnegative integer. Count must be a fixnum. The bytes in the range from addr through addr+count-1 are copied into bytevector, which must be long enough to hold that many bytes.
If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults.
Procedure poke-bytes
(poke-bytes addr bytevector count) => unspecified
Addr must be an exact nonnegative integer. Count must be a fixnum. The first count bytes of bytevector are copied into memory in the range from addr through addr+count-1.
If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults.
Also, it's possible to corrupt memory with poke-bytes. Don't do that.
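In C terms, the two primitives behave roughly like memcpy between a raw address and a buffer. The sketch below only illustrates the copying semantics described above. The function names are hypothetical, and this is not Larceny's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy count bytes from the address range [addr, addr+count-1] into buf.
   The address is carried as an integer, as it is on the Scheme side. */
void peek_bytes_sketch(uintptr_t addr, unsigned char *buf, size_t count)
{
    assert(buf != NULL);
    memcpy(buf, (const void *)addr, count);
}

/* Copy the first count bytes of buf to the range [addr, addr+count-1]. */
void poke_bytes_sketch(uintptr_t addr, const unsigned char *buf, size_t count)
{
    assert(buf != NULL);
    memcpy((void *)addr, buf, count);
}
```

As with the Scheme procedures, nothing here checks that the address range is valid or that the buffer is long enough.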
Foreign data sizes
The following constants define the sizes of basic C data types:
-
sizeof:short The size of a "short int".
-
sizeof:int The size of an "int".
-
sizeof:long The size of a "long int".
-
sizeof:pointer The size of any pointer type.
Decoding foreign data
Foreign data is visible to a Scheme program either as an object pointed to by a memory address (which is itself represented as an integer), or as a bytevector that contains the bytes of the foreign datum.
A number of utility procedures that simplify reading and writing data of common C primitive types have been written for both kinds of foreign objects.
Bytevector accessor procedures
(%get16 bv i) => integer
(%get16u bv i) => integer
(%get32 bv i) => integer
(%get32u bv i) => integer
(%get-int bv i) => integer
(%get-unsigned bv i) => integer
(%get-short bv i) => integer
(%get-ushort bv i) => integer
(%get-long bv i) => integer
(%get-ulong bv i) => integer
(%get-pointer bv i) => integer
These procedures decode bytevectors that contain the bytes of foreign objects. In each case, bv is a bytevector and i is the offset of the first byte of a field in that bytevector. The field is fetched and returned as an integer (signed or unsigned as appropriate).
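The signed and unsigned variants differ only in how the same bytes are interpreted. The following C sketch shows the idea for a 16-bit field. It assumes the field is stored in the host's byte order, which the text above does not state, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch the 16-bit field that starts at offset i of bv, as a signed value. */
int get16_sketch(const unsigned char *bv, size_t i)
{
    int16_t field;
    assert(bv != NULL);
    memcpy(&field, bv + i, sizeof field);   /* safe for unaligned offsets */
    return field;
}

/* Fetch the same bytes, interpreted as an unsigned value. */
unsigned get16u_sketch(const unsigned char *bv, size_t i)
{
    uint16_t field;
    assert(bv != NULL);
    memcpy(&field, bv + i, sizeof field);
    return field;
}
```

For a field whose two bytes are both 0xFF, the signed reader yields -1 and the unsigned reader yields 65535.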
Bytevector updater procedures
(%set16 bv i val) => unspecified
(%set16u bv i val) => unspecified
(%set32 bv i val) => unspecified
(%set32u bv i val) => unspecified
(%set-int bv i val) => unspecified
(%set-unsigned bv i val) => unspecified
(%set-short bv i val) => unspecified
(%set-ushort bv i val) => unspecified
(%set-long bv i val) => unspecified
(%set-ulong bv i val) => unspecified
(%set-pointer bv i val) => unspecified
These procedures update bytevectors that contain the bytes of foreign objects. In each case, bv is a bytevector, i is an offset of the first byte of a field in that bytevector, and val is a value to be stored in that field. The values must be exact integers in a range implied by the data type.
Foreign-pointer accessor procedures
(%peek8 addr) => integer
(%peek8u addr) => integer
(%peek16 addr) => integer
(%peek16u addr) => integer
(%peek32 addr) => integer
(%peek32u addr) => integer
(%peek-int addr) => integer
(%peek-long addr) => integer
(%peek-unsigned addr) => integer
(%peek-ulong addr) => integer
(%peek-short addr) => integer
(%peek-ushort addr) => integer
(%peek-pointer addr) => integer
(%peek-string addr) => string
These procedures read raw memory. In each case, addr is an address, and the value stored at that address (the size of which is indicated by the name of the procedure) is fetched and returned as an integer.
The exception is %peek-string, which expects to find a NUL-terminated string of 8-bit bytes at the given address and returns it as a Scheme string.
Foreign-pointer updater procedures
(%poke8 addr val) => unspecified
(%poke8u addr val) => unspecified
(%poke16 addr val) => unspecified
(%poke16u addr val) => unspecified
(%poke32 addr val) => unspecified
(%poke32u addr val) => unspecified
(%poke-int addr val) => unspecified
(%poke-long addr val) => unspecified
(%poke-unsigned addr val) => unspecified
(%poke-ulong addr val) => unspecified
(%poke-short addr val) => unspecified
(%poke-ushort addr val) => unspecified
(%poke-pointer addr val) => unspecified
These procedures update raw memory. In each case, addr is an address, and val is a value to be stored at that address.
10.19.7. Heap dumping and the FFI
If foreign functions are linked into Larceny using the FFI, and a
Larceny heap image is subsequently dumped (with
[dump-interactive-heap] or
[dump-heap]), then the foreign functions are not saved as
part of the heap image. When the heap image is subsequently loaded
into Larceny at startup, the FFI will attempt to re-link all the
foreign functions in the heap image.
During the relinking phase, foreign files will again be loaded into Larceny, and Larceny's FFI will use the file names as they were originally given to the FFI when it tries to load the files. In particular, if relative pathnames were used, Larceny will not have converted them to absolute pathnames.
An error during relinking will result in Larceny aborting with an error message and returning to the operating system. This is considered a feature.
10.19.8. Examples
Change directory
This procedure uses the chdir() system call to set the process's current working directory. The string parameter type is used to pass a Scheme string to the C procedure.
(define cd
  (let ((chdir (foreign-procedure "chdir" '(string) 'int)))
    (lambda (newdir)
      (if (not (zero? (chdir newdir)))
          (error "cd: " newdir " is not a valid directory name."))
      (unspecified))))
Print Working Directory
This procedure uses the getcwd() (get current working directory) system call to retrieve the name of the process's current working directory. A bytevector is created and passed in as a buffer in which to store the return value — a 0-terminated ASCII string. Then the FFI utility function ffi/asciiz->string is called to convert the bytevector to a string.
(define pwd
  (let ((getcwd (foreign-procedure "getcwd" '(boxed int) 'int)))
    (lambda ()
      (let ((s (make-bytevector 1024)))
        (getcwd s 1024)
        (ffi/asciiz->string s)))))
Quicksort
Warning
This example is bogus. It is not safe to pass a collectable
object into a C procedure when the callback invocation might cause a
garbage collection, thus moving the object and invalidating the
address stored in the C machine context.
This demonstrates how to use a callback such as the comparator argument to qsort.
It is specified in the type signature using -> as a type constructor.
(Note that one should probably use the built-in sort routines rather than call out
like this; this example is for demonstrating callbacks, not how to sort.)
(define qsort!
  (foreign-procedure "qsort" '(boxed ushort ushort (-> (void* void*) int)) 'void))

(let ((bv (list->vector '(40 10 30 20 1 2 3 4))))
  (qsort! bv 8 4
          (lambda (x y)
            (let ((x (/ (void*-word-ref x 0) 4))
                  (y (/ (void*-word-ref y 0) 4)))
              (- x y))))
  bv)

(let ((bv (list->bytevector '(40 10 30 20 1 2 3 4))))
  (qsort! bv 8 1
          (lambda (x y)
            (let ((x (void*-byte-ref x 0))
                  (y (void*-byte-ref y 0)))
              (- x y))))
  bv)
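For comparison, the callback type (-> (void* void*) int) corresponds to the comparator that qsort expects in C. The sketch below is plain C rather than FFI code, so the garbage-collection hazard noted above does not arise in it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparator with the signature qsort requires: it receives pointers
   to two elements and returns a negative, zero, or positive int. */
static int compare_bytes(const void *x, const void *y)
{
    unsigned char a = *(const unsigned char *)x;
    unsigned char b = *(const unsigned char *)y;
    return (a > b) - (a < b);
}

/* Sort count bytes in place, mirroring the bytevector example:
   count elements, each 1 byte wide. */
void sort_bytes(unsigned char *bytes, size_t count)
{
    assert(bytes != NULL);
    qsort(bytes, count, 1, compare_bytes);
}
```

The comparator receives the addresses of the elements, not the elements themselves, which is why the Scheme callbacks dereference their arguments with void*-byte-ref and void*-word-ref.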
Other examples
The Experimental directory contains several examples of use of the FFI. See in particular the files unix.sch (Unix system calls) and socket.sch (procedures for communicating over sockets).
10.19.9. Higher level layers
The general foreign-function interface functionality described above
is powerful but awkward to use in practice. A user might be tempted
to hard-code values of offsets or constants that are
compiler-dependent. Also, the FFI will marshal some low-level values
such as strings or integers, but other values, such as enumerations
that could be naturally mapped to sets of symbols, are not marshaled,
since the host environment does not provide the necessary type
information to the FFI.
This section documents a collection of libraries to mitigate these and
other problems.
foreign-ctools
Foreign data access is performed by peeking at manually calculated
addresses, but in practice one often needs to inspect fields of C
structures, whose offsets are dependent on the application binary
interface (ABI) of the host environment. Similarly, C programs often
refer to values via constant macro definitions; since the values
of such names are not provided by the object code and Scheme programs
do not have a C preprocessor run on them prior to execution, it is
difficult to refer to the same value without encoding "magic numbers"
into the Scheme source code.
The foreign-ctools library is meant to mitigate problems like the two
described above. It provides special forms for introducing global
definitions of values typically available at compile-time for a C
program. The library assumes the presence of a C compiler (such as
cc on Unix systems or cl.exe on Windows systems). The special
forms work by dynamically generating, compiling, and running C code at
expansion time to determine the desired values of structure offsets or
macro constants.
Here is a grammar for the define-c-info form provided by
the foreign-ctools library.
<exp>          ::= (define-c-info <c-decl> ... <c-defn> ...)
<c-decl>       ::= (compiler <cc-spec>)
                |  (path <include-path>)
                |  (include <header>)
                |  (include<> <header>)
<cc-spec>      ::= cc | cl
<c-defn>       ::= (const <id> <c-type> <c-expr>)
                |  (sizeof <id> <c-type-expr>)
                |  (struct <c-name> <field-clause> ...)
                |  (fields <c-name> <field-clause> ...)
                |  (ifdefconst <id> <c-type> <c-name>)
<c-type>       ::= int | uint | long | ulong
<include-path> ::= <string-literal>
<header>       ::= <string-literal>
<field-clause> ::= (<offset-id> <c-field>)
                |  (<offset-id> <c-field> <size-id>)
<c-expr>       ::= <string-literal>
<c-type-expr>  ::= <string-literal>
<c-name>       ::= <string-literal>
<c-field>      ::= <string-literal>
Syntax define-c-info
(define-c-info <c-decl> … <c-defn> …)
The <c-decl> clauses of define-c-info
control how header files are processed.
The compiler clause selects between cc
(the default UNIX system compiler) and cl
(the compiler included with Microsoft's Windows SDK).
The path clause adds a directory to search when
looking for header files.
The include and include<> clauses indicate
header files to include when executing the
<c-defn> clauses;
the two variants correspond to the quoted and bracketed
forms of the C preprocessor's #include directive.
The <c-defn> clauses bind identifiers.
A (const x t "ae") clause binds x to
the integer value of ae according to the C language;
ae can be any C arithmetic expression that evaluates
to a value of type t.
(The expected usage is for ae to be an
expression that the C preprocessor expands to an arithmetic expression.)
The remaining clauses provide similar functionality:
-
(sizeof x "te")
binds x to the size occupied by values
of type te, where te is any C type expression.
-
(struct "cn" … (x "cf" y) …)
binds x to the offset from the start of a
structure of type struct cn to its
cf field, and binds y, if present, to the field's size.
A fields clause is similar, but it applies
to structures of type cn rather than struct cn.
-
(ifdefconst x t "cn")
binds x to the value of cn if cn is defined;
x is otherwise bound to Larceny's unspecified value.
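To make the clause semantics concrete, here is a C sketch of the quantity each kind of clause denotes, using struct tm from <time.h> and CHAR_BIT from <limits.h> as stand-ins. It illustrates the values only and is not the code that foreign-ctools generates. The function names and the macro HYPOTHETICAL_LIMIT are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

/* (const x int "CHAR_BIT"): the integer value of a C expression. */
long const_value(void) { return CHAR_BIT; }

/* (sizeof x "struct tm"): the size of a C type. */
size_t sizeof_value(void) { return sizeof(struct tm); }

/* (struct "tm" (x "tm_year" y)): a field's offset (x) and size (y). */
size_t field_offset(void) { return offsetof(struct tm, tm_year); }
size_t field_size(void)   { return sizeof(((struct tm *)0)->tm_year); }

/* (ifdefconst x int "HYPOTHETICAL_LIMIT"): a value only if the macro
   is defined. Returns 1 and stores the value if so, 0 otherwise. */
int ifdef_value(long *out)
{
    assert(out != NULL);
#ifdef HYPOTHETICAL_LIMIT
    *out = HYPOTHETICAL_LIMIT;
    return 1;
#else
    return 0;
#endif
}
```

Every one of these values is fixed when the C code is compiled, which is why a C compiler is needed to obtain them.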
foreign-sugar
The [foreign-procedure] function is sufficient to link in
dynamically loaded C procedures, but it can be annoying to
use when there are many procedures to define that all follow
a regular pattern where one could infer a mapping between
Scheme identifiers and C function names.
For example, some libraries follow a naming convention where the words
within a name are separated by underscores; such functions could be
immediately mapped to Scheme names where the underscores have been
replaced by dashes.
The foreign-sugar library provides a special form, define-foreign,
which lets the user define a foreign function by providing
only the Scheme name, the argument types,
and the return type. The define-foreign form then attempts to
infer which C function the name was meant to refer to.
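The dash-for-underscore convention mentioned above can be sketched in a few lines of C. This shows only that one rule. The rules that define-foreign actually applies are not described here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes into out (capacity cap, including the terminating NUL) the
   C-style name for a Scheme-style identifier, replacing each '-'
   with '_'. Returns 0 on success, -1 if out is too small. */
int scheme_name_to_c(const char *scheme_name, char *out, size_t cap)
{
    size_t i;
    assert(scheme_name != NULL && out != NULL);
    for (i = 0; scheme_name[i] != '\0'; i++) {
        if (i + 1 >= cap)
            return -1;          /* no room for this char plus the NUL */
        out[i] = (scheme_name[i] == '-') ? '_' : scheme_name[i];
    }
    if (i >= cap)
        return -1;              /* empty input but zero capacity */
    out[i] = '\0';
    return 0;
}
```

Under this rule, a Scheme name such as gtk-window-new would be looked up as the C function gtk_window_new.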
Syntax define-foreign
(define-foreign (name arg-type …) result-type)
Note
Other functionality lets the user introduce new rules for
inferring C function names, but it is undocumented because
it will probably have to change when we switch
to an R6RS macro expander.
foreign-stdlib
Procedure stdlib/malloc
(stdlib/malloc rtd) => procedure
Given a record extension of void*-rt, returns an allocator that uses
the C malloc procedure to allocate instances of such an object.
Note that the client is responsible for eventually freeing such
objects with [stdlib/free].
Procedure stdlib/free
(stdlib/free void*-obj)
Frees objects produced by allocators returned from [stdlib/malloc].
Procedure ffi-install-void*-subtype
(ffi-install-void*-subtype rtd) => rtd
(ffi-install-void*-subtype string) => rtd
(ffi-install-void*-subtype symbol) => rtd
[ffi-install-void*-subtype]
extends the core attribute registry with a new primitive
entry for subtype.
The parent-rtd argument should be a subtype of void*-rt
and defaults to void*-rt.
In the case of the symbol or string inputs, the
procedure constructs a new record type subtyping the parent argument.
In the case of the rtd input, the rtd record type
must extend void*-rt.
[ffi-install-void*-subtype] returns the subtype record type.
The returned record type represents a tagged wrapped C pointer,
allowing one to encode type hierarchies.
Procedure establish-void*-subhierarchy!
(establish-void*-subhierarchy! symbol-tree) => unspecified
[establish-void*-subhierarchy!] is a convenience function
for constructing large object hierarchies.
It descends the symbol-tree,
creates a record type descriptor for each symbol
(where the root of the tree has the parent void*-rt),
and invokes [ffi-install-void*-subtype] on all
of the introduced types.
Type char* extends void*
Procedure string->char*
(string->char* string) => char*
Procedure char*-strlen
(char*-strlen char*) => fixnum
Procedure char*->string
(char*->string char*) => string
(char*->string char* len) => string
Procedure call-with-char*
(call-with-char* string string-function) => value
Type char** extends void*
Procedure call-with-char**
(call-with-char** string-vector function) => value
Type int* extends void*
Procedure call-with-int*
(call-with-int* fixnum-vector function) => value
Type short* extends void*
Procedure call-with-short*
(call-with-short* fixnum-vector function) => value
Type double* extends void*
Procedure call-with-double*
(call-with-double* num-vector function) => value
FIXME: (There are other functions, but I want to test and document the
ones above first…)
foreign-cstructs
The foreign-cstructs library provides a
more direct interface to C structures.
It provides the define-c-struct special form.
This form is layered on top of define-c-info;
the latter provides the structure field offsets
and sizes used to generate constructors
(which produce appropriately sized bytevectors,
not record instances).
The define-c-struct form combines these
with marshaling and unmarshaling procedures to
provide high-level access to a structure.
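The idea of treating a structure as a block of bytes with field access at known offsets can be sketched in C as follows, using struct tm as a stand-in. The function names are hypothetical, and the sketch copies an int field without conversion, whereas define-c-struct applies whatever marshaling procedures the field clauses name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* "Constructor": a zero-filled byte buffer the size of the structure.
   Returns NULL if allocation fails; the caller frees the buffer. */
unsigned char *make_tm_bytes(void)
{
    return calloc(1, sizeof(struct tm));
}

/* "Setter": store an int into the tm_year field of the buffer. */
void tm_bytes_set_year(unsigned char *bytes, int year)
{
    assert(bytes != NULL);
    memcpy(bytes + offsetof(struct tm, tm_year), &year, sizeof year);
}

/* "Getter": read the tm_year field back out of the buffer. */
int tm_bytes_year(const unsigned char *bytes)
{
    int year;
    assert(bytes != NULL);
    memcpy(&year, bytes + offsetof(struct tm, tm_year), sizeof year);
    return year;
}
```

The offset and size used here are the same two quantities that a struct clause of define-c-info binds.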
The grammar for the define-c-struct form is presented below.
<exp>          ::= (define-c-struct (<struct-type> <ctor-id> <c-decl> ...)
                     <field-clause> ...)
<field-clause> ::= (<c-field> <getter>) | (<c-field> <getter> <setter>)
<getter>       ::= (<id>) | (<id> <unmarshal>)
<setter>       ::= (<id>) | (<id> <marshal>)
<marshal>      ::= <ffi-attr-symbol> | <marshal-proc-exp>
<unmarshal>    ::= <ffi-attr-symbol> | <unmarshal-proc-exp>
<struct-type>  ::= <string-literal>
foreign-cenums
This library provides the special forms
define-c-enum and define-c-enum-set,
which associate the identifiers of
a C enum type declaration
with the integer values they denote.
The define-c-enum form describes enums
encoding a discriminated sum;
define-c-enum-set describes bitmasks,
mapping them to R6RS enum-sets in Scheme.
The (define-c-enum en (<c-decl> …) (x "cn") …)
form adds the en FFI attribute.
The attribute marshals each symbol x to
the integer value that cn denotes in C;
unmarshaling does the inverse translation.
The (define-c-enum-set ens (<c-decl> …) (x "cn") …)
form binds ens to an R6RS enum-set constructor
with universe resulting from
(make-enumeration '(x …)); it also adds the ens
FFI attribute. The attribute marshals an
enum-set s constructed by ens
to the corresponding bitmask in C (that is,
the integer one would get by logically or'ing
all cn such that the corresponding x is in s).
Unmarshaling attempts to do the inverse translation.
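The marshaling rule for define-c-enum-set amounts to a bitwise OR over the selected constants, and unmarshaling tests each bit in turn. A C sketch of that arithmetic follows. The flag names and values are hypothetical, and set membership is modeled as an array of 0/1 ints purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C bitmask constants (the "cn" values). */
enum { FLAG_READ = 1, FLAG_WRITE = 2, FLAG_EXEC = 4 };

static const int all_flags[] = { FLAG_READ, FLAG_WRITE, FLAG_EXEC };
#define NFLAGS (sizeof all_flags / sizeof all_flags[0])

/* Marshal: member[i] != 0 means the i-th symbol is in the set.
   The result is the OR of the constants of all members. */
int set_to_mask(const int member[])
{
    int mask = 0;
    size_t i;
    assert(member != NULL);
    for (i = 0; i < NFLAGS; i++) {
        if (member[i])
            mask |= all_flags[i];
    }
    return mask;
}

/* Unmarshal: recover membership by testing each constant's bit. */
void mask_to_set(int mask, int member[])
{
    size_t i;
    assert(member != NULL);
    for (i = 0; i < NFLAGS; i++)
        member[i] = (mask & all_flags[i]) != 0;
}
```

A mask that has bits set outside the known constants has no exact set representation. This sketch simply ignores such bits.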
The grammar for the two forms is presented below.
<exp>     ::= (define-c-enum <enum-id> (<c-decl> ...)
                (<id> <c-name>) ...)
<exp>     ::= (define-c-enum-set <enum-id> (<c-decl> ...)
                (<id> <c-name>) ...)
<enum-id> ::= <id>