MLud was implemented in December of 2002 by Derrick Coetzee, and this is
a brief overview of how it works. Most of the code discussed herein
can be found in src/objmodel.sml in the MLud package.
The MLud compiler is actually relatively simple, because it targets ML, and most of its features are easily implemented using ML features (an exception is return, which is implemented using continuations). However, even after doing this all it has is a string of ML. To turn this into actively executable ML fundamentally involves the SML/NJ system and the eval hack, which allows MLud to exploit the extensibility of SML/NJ's interactive environment to make itself runtime-extensible.
The eval hack, originally posted by Matthias Blume, is a way for code running in the SML/NJ interactive environment to take a string of ML, compile it, and execute it on-the-spot. Its main disadvantage is that you must know the exact type of the result ahead of time if you want a reference to the result. Also, it only works in the release version of SML/NJ; changes have been made to the programmatic interface to the compiler since then that need to abstracted away. A simple example is:
(* Used to grab return value of eval *)
val refHack = ref (fn (i : int) => 0)
fun eval code =
let
val code' = concat ["val _ = refHack := (", code, ")"]
val s = TextIO.openString code'
in
Compiler.Interact.useStream s;
TextIO.closeIn s;
!refHack
end
We can then use eval like this:
val oneAdder = eval "fn(i)=>i+1"; oneAdder(5); (* produces 6 *)
In my application, the restriction to one type of function
was not a problem, because MLud is a dynamically typed language;
all its datatypes are implemented as a single ML datatype called
object:
datatype object = VoidObject
| Object of { (* object stuff *) }
| BooleanObject of bool
| CharObject of char
| StringObject of string
| IntegerObject of IntInf.int
| RealObject of real
| SymbolObject of Symbol.symbol
| ArrayObject of object array
| ListObject of object CPCD.deque
| MapObject of (object,object) RedBlackMap.map
| ClosureObject of object -> object
Thus, a MLud function simply takes an object (an
ArrayObject of arguments) and returns an
object. The smlnj-lib red-black tree had to
reimplemented as a polymorphic type in order to create this datatype,
since otherwise we run head-first into SML's lack of support for
recursive modules. CPCD is a module implementing confluently persistent catenable deques, a
sophisticated persistent deque type supporting fast catenation.
Now we see how a compiled MLud function is added to the
environment, and how the built-in types are stored. It's also clear
how inline ML works; it's simply inserted into the compiled ML output
code. What's not yet clear is where the objects live, and how the
multiple dispatch mechanism is properly invoked. The first question is
answered by expanding out the Object branch from
above:
(* id is used for comparison of objects.
slots contains private data slots for objects.
methods contains methods with a specific number of parameters.
restMethods contains methods with a @ or "rest" parameter. *)
| Object of {id: int,
slots: slot DoubleIntHashTable.hash_table,
methods: method list DoubleIntHashTable.hash_table,
restMethods: method list IntHashTable.hash_table}
An object only stores slots after they are set on it; initially, it
retrieves slot values from the nearest ancestor which stores them.
The id field is used in several places, including the
default method of comparing two objects.
A DoubleIntHashTable, as seen above, has keys which
are pairs of ints. For slots, these are (definer
object id,symbol for slot name), while for methods these
are (symbol for method name,number of arguments). Each
value in methods' hash table is a list of methods defined
on this object with a particular name and number of arguments. The
restMethods are similar, except that each of them additionally
includes a "rest" parameter that absorbs all remaining arguments into
a new array.
Dispatch is handled mainly by the ObjectModel.call
function, which takes 5 parameters:
The main purposes of call are:
lookupMethod and execute the appropriate
method with the appropriate parameters.lookupMethod
calls can be avoided in most cases.The cache, as in Smalltalk and Self, is critical; the system was unusably slow until it was added. However, functions which add, modify, or remove methods must invalidate some entries in the cache, and functions which change the delegation graph utterly destroy the cache.
The dirty work of the multiple dispatch algorithm is done by the
lookupMethod function. The jist of how it works is:
@) parameter.These are the most important components of the MLud system. If you have any questions, or want to see another component of the MLud system explained, or if you find any of the above unclear and want an explanation, please don't hesitate to contact me.