MLud: Implementation Outline

MLud was implemented in December of 2002 by Derrick Coetzee, and this is a brief overview of how it works. Most of the code discussed herein can be found in src/objmodel.sml in the MLud package.

The MLud compiler is actually relatively simple, because it targets ML, and most of its features are easily implemented using ML features (an exception is return, which is implemented using continuations). However, even after doing this all it has is a string of ML. To turn this into actively executable ML fundamentally involves the SML/NJ system and the eval hack, which allows MLud to exploit the extensibility of SML/NJ's interactive environment to make itself runtime-extensible.

The eval hack

The eval hack, originally posted by Matthias Blume, is a way for code running in the SML/NJ interactive environment to take a string of ML, compile it, and execute it on-the-spot. Its main disadvantage is that you must know the exact type of the result ahead of time if you want a reference to the result. Also, it only works in the release version of SML/NJ; changes have been made to the programmatic interface to the compiler since then that need to abstracted away. A simple example is:

    (* Used to grab return value of eval *)
    val refHack = ref (fn (i : int) => 0)

    fun eval code =
	let
	    val code' = concat ["val _ = refHack := (", code, ")"]
	    val s = TextIO.openString code'
	in
	    Compiler.Interact.useStream s;
	    TextIO.closeIn s;
	    !refHack
	end

We can then use eval like this:

val oneAdder = eval "fn(i)=>i+1";
oneAdder(5); (* produces 6 *)

Definition of object

In my application, the restriction to one type of function was not a problem, because MLud is a dynamically typed language; all its datatypes are implemented as a single ML datatype called object:

    datatype object = VoidObject
		    | Object        of { (* object stuff *) }
		    | BooleanObject of bool
		    | CharObject    of char
		    | StringObject  of string
		    | IntegerObject of IntInf.int
		    | RealObject    of real
		    | SymbolObject  of Symbol.symbol
		    | ArrayObject   of object array
		    | ListObject    of object CPCD.deque
		    | MapObject     of (object,object) RedBlackMap.map
		    | ClosureObject of object -> object

Thus, a MLud function simply takes an object (an ArrayObject of arguments) and returns an object. The smlnj-lib red-black tree had to reimplemented as a polymorphic type in order to create this datatype, since otherwise we run head-first into SML's lack of support for recursive modules. CPCD is a module implementing confluently persistent catenable deques, a sophisticated persistent deque type supporting fast catenation.

Object storage

Now we see how a compiled MLud function is added to the environment, and how the built-in types are stored. It's also clear how inline ML works; it's simply inserted into the compiled ML output code. What's not yet clear is where the objects live, and how the multiple dispatch mechanism is properly invoked. The first question is answered by expanding out the Object branch from above:

      (* id is used for comparison of objects.
	 slots contains private data slots for objects.
	 methods contains methods with a specific number of parameters.
	 restMethods contains methods with a @ or "rest" parameter. *)
    | Object of {id: int,
		 slots: slot DoubleIntHashTable.hash_table,
		 methods: method list DoubleIntHashTable.hash_table,
		 restMethods: method list IntHashTable.hash_table}

An object only stores slots after they are set on it; initially, it retrieves slot values from the nearest ancestor which stores them. The id field is used in several places, including the default method of comparing two objects.

A DoubleIntHashTable, as seen above, has keys which are pairs of ints. For slots, these are (definer object id,symbol for slot name), while for methods these are (symbol for method name,number of arguments). Each value in methods' hash table is a list of methods defined on this object with a particular name and number of arguments. The restMethods are similar, except that each of them additionally includes a "rest" parameter that absorbs all remaining arguments into a new array.

Dispatch

Dispatch is handled mainly by the ObjectModel.call function, which takes 5 parameters:

The main purposes of call are:

The cache, as in Smalltalk and Self, is critical; the system was unusably slow until it was added. However, functions which add, modify, or remove methods must invalidate some entries in the cache, and functions which change the delegation graph utterly destroy the cache.

The dirty work of the multiple dispatch algorithm is done by the lookupMethod function. The jist of how it works is:

  1. It obtains a list of all methods with the correct name, the correct number of arguments, and signatures matching the arguments anywhere in the ancestors of the this object, including those permitted by a rest (@) parameter.
  2. If there is no match, an error is thrown.
  3. For any pair of methods where one is at least as specific in every argument, filter out the more general one. This step involves some work to do efficiently.
  4. Choose the first method from the remaining list.

Conclusion

These are the most important components of the MLud system. If you have any questions, or want to see another component of the MLud system explained, or if you find any of the above unclear and want an explanation, please don't hesitate to contact me.

Mlud home page

Other code from Moonflare

Moonflare home

All text and images, but not necessarily linked material, on this page ©1998-2006 Derrick Coetzee and Moonflare and may not be reproduced or used for any purpose without prior written permission except where otherwise indicated.