Saturday, December 01, 2007

Ikarus - The new Scheme on the Block

A new Scheme on the Block

Congratulations to Abdulaziz Ghuloum with the release of his new Scheme compiler Ikarus!

Getting traction for a new Scheme implementation have never been easy due to the astonishingly number of existing quality implementations. It would however be surprised not to see Ikarus become one of the usual suspects. The reason for this is not the rather large number of features Ikarus has for a first release, but rather that all the hard parts of a quality implementation already are present.

The work on the compiler has already produced three papers. The first "An Incremental Approach to Compiler Construction" describes in 24 small steps how to implement a compiler capable of running environment-passing interpreter for core Scheme. The idea is to a fully working compiler before and after each step. This implies that the one must write the backend first. At this years ICFP in Freiburg Aziz told me, that Ikarus in fact grew in this way.

The second is "Implicit phasing for R6RS libraries" by Abdulaziz Ghuloum, and R. Kent Dybvig describe the design and implementation of "The portable R6RS library and syntax-case system".

The third "Generation-Friendly Eq Hash Tables" also by Ghuloum and Dybvig describe an implemention of eq hash tables were only objects that actually move during a collection is rehashed.

Native Compilation
Ikarus is a native compiler, which means that it compiles directly to x86 machine code, more specifially the resulting code runs on Intel compatible processors supporting SSE2 extentions. In order words it will run on any reasonable new computer.

The native compilation niche Ikarus joins, includes Chez Scheme, Larceny, and MIT/GNU Scheme.

The drawback of native compilation have so far been that more than one backend were needed to support both PCs and Macs. After Apple switched to Intel processors that problem (for the time being at least) has gone. Ikarus supports the big three: Windows, OS X, and Linux.

Incremental Compilation

Ikarus is an incremental compiler. This means that programs at runtime can use the compiler to compile (an link and run) programs "on-the-fly" without using any external tools. I'll leave it to the curious, to find the intel assembler in the Ikarus source.

A pleasant side-effect of using an incremental compiler is that eval can use the compiler, which means that there are good chances that eval will be both efficient and give results consistent with normal compilation. For non-incremental compilers such as the compile-to-C variety, eval is often implementation via a compile-to-closures approach, which introduces an extra source of bugs.

Not that I ever use eval...

Optimizing Compiler
... there is a fine line between “optimization” and “not being stupid.”
-- R. Kent Dybvig
So what exactly does it mean that Ikarus is an optimizing compiler?

For one it means that the code produced in value-, boolean- and effect-contexts are different. See the paper "Destination-Driven Code Generation" by R. Kent Dybvig, Robert Hieb, Tom Butler.

Machine registers are used to hold function call arguments and local variables. Mapping variables to machine registers in a optimal way is unfortunately a NP-hard problem, but nevertheless efficient algorithms exists that work well in practise. Ikarus uses a version of Chaitin's graph-coloring algorithm.

Needless to say, Ikarus also does constant folding.

As far as I can tell, control flow analysis based optimizations are yet to come.

Modern generational garbage collector

The generational garbage collector was inspired by the paper: Don't Stop the BIBOP: Flexible and Efficient Storage Management for Dynamically-Typed Languages by R. Kent Dybvig, David Eby, and Carl Bruggeman.

R6RS Compliance

The immediate goal for Ikarus to become 100% R6RS compliant. Rapid progress is made to achieve this goal, and as far I can tell, there no potential show-stoppers ahead - it is "just" a matter of implementing "more of the same".

Opinions of R6RS are many, but R6RS compliance means at least two things: a decent module system and documentation for free.


Blogger Paulo Matos said...

Hi Jens,

Do you have any reference for:

"Mapping variables to machine registers in a optimal way is unfortunately a NP-hard problem."


Paulo Matos

Blogger Jens Axel Søgaard said...

Hi Paulo,

It is mentioned on the Wikipedia page on Register Allocation.

In particular they refer to:
"Register Allocation after Classical SSA Elimination is NP-complete" by Fernando Magno Quintão Pereira, Jens Palsberg.

/Jens Axel

Blogger Paulo Matos said...

OK, Thanks very much. I was missing the term to search for. :)

Blogger Shriram Krishnamurthi said...

What a bizarre Wikipedia citation. This result goes back to the 1970s.


Post a Comment

Links to this post:

Create a Link

<< Home