Higher Order Compilation Project Homepage


This page describes an ongoing project to use higher order abstract syntax and higher order logic programming for compiler implementation. The code and examples of an experimental compiler are contained here. The meta-language is the Teyjus implementation of Lambda Prolog. There is a corresponding research paper. Some additional notes are also available. These notes are informal and may change.


The purpose of this project is to demonstrate and study the use of a higher order logic programming language in compiler construction. Optimizing compilers take years to develop, and it's not my purpose to perfect a product. I want to develop a set of techniques, that combined with previous work by various researchers in higher order specification, can be used in realistic compilation. For this purpose I've chosen to compile a very typical, imperative style language. I've also decided to forgo abstract machines in favor of a real RISC architecture.

The source language being compiled is called "bobcat", because it syntactically resembles the "tiger" language described in Andrew Appel's compiler text. Bobcat however, has a rudimentary form of Java style object orientation (without inheritance). The target language is Sparc assembly code. Executables are created using cc (or gcc) in its role as an assembler. The get a sense of the bobcat language being compiled, look at test3.bob and test7.bob. The generated assembly codes are in test3.s and test7.s respectively. Type "gcc test3.s" or "gcc test7.s" on a Sparc machine to generate an executable a.out, then type "./a.out" to run the program. (Use "cc" if gcc is not available). Using the C linker has the beneficial effect of allowing C functions (printf) to be called from bobcat programs.

**
Some of the code presented here slightly varies from those presented in the paper "Compiler Construction in Higher Order Logic Programming." The compiler is meant to be experimental and will undergo refinements and even major overhauls over time.
**

The structure of the modules that make up the compiler are as follows:

  1. module "bobabsyn": sig, mod
    Defines the higher order abstract syntax for the bobcat language. The .mod file contains copy clauses for substitution. The file also has first-order alternatives (unused) for certain constructs.
  2. module "lambdayacc": sig, mod
    A deterministic, bottom-up parser generator. This module also contains a semi-customizable lexical scanner. This sub-project has its own homepage, and there's also an associated research paper.
  3. module "bobcatgram": sig, mod
    The specification of the bobcat grammar and semantic actions of the parser. The sig file contains the declarations of the grammar symbols used. The mod file contains the grammar, operator precedence and associativity declarations, and semantic action clauses to generate lambda-syntax trees.
  4. module "bobparser": sig, mod
    The parser that's generated automatically from bobcatgram by lambdayacc. bobparser produces a lambda syntax tree (type texp). To get a good idea of what kind of structure is produced, parse one of the simpler bobcat programs with:
    [bobparser] ?- parsefile "test4.bob" A.
  5. module "kitty": sig, mod
    Contains the definition of a continuation passing style intermediate representation (type kexp) and the transformation from the lambda syntax tree to CPS form. To see a sample cps representation of a program, do
    [kitty] ?- parsefile "test4.bob" (program A), formcps kret A B.
    Variable B will be instantiated with the cps form of the program.
    A function call f(a+b) is represented as (cps (opexp "+" a b) (kabs u\ (karg u 0 \ (kcall f CN 1 K))))
    where K is the overall continuation and CN is the class that f belongs. to. CN="." indicates a global function. (in this way type information is carried in my CPS representation; it's necessitated by object orientation).
  6. module "absparc": sig, mod
    Contains another, very low level intermediate language "Abstract Sparc," which is machine dependent, but represents Sparc machine instructions in abstract syntax. For example, moving the contents of register "o1" to register "l2" is represented by
    movop (reg "o" 1) (reg "l" 2)
    The mod file contains the code generation (gencode) clauses that transform the cps language to the Abstract Sparc language.
    The most important type of this module is "store". A store can be a register, a memory location ("indirect"), a constant, or a label. The most important predicates are gencode and genloadop. The constructor "assoces" associates expressions with stores containing them. genloadop (generate-load-operand) "loads" an expression into a register, emitting the necessary instructions to do so at the same time.
  7. module "realsparc": sig, mod
    Contains the straight-forward translation from the abstract Sparc language to real Sparc assembly language instructions as strings, which are then written to a file. This module also contains the toplevel predicate "tigcompile", which is used as in:
    [realsparc] ?- tigcompile "test3.bob" "test3.s".
  8. module "typecat": sig, mod
    Type checking module for the lambda-syntax tree. This module is currently NOT hooked into the rest of the compiler (doesn't deal with oop yet). In accumulates the bobparser module, and can be used independently as in
    [typecat] ?- parsefile "test3.bob" (program A), typeexp A B.


List of sample bobcat programs and their compiled assembly codes:

  1. test3.bob, test3.s: contains demonstration of basic imperative programming techniques, including recursive functions and loops.
  2. test4.bob, test4.s: demonstrates mutually recursive functions
  3. test5.bob, test5.s: demonstrates arrays (static arrays). Arrays need to be redone.
  4. test6.bob, test6.s: demonstrates static scoping with nested lets
  5. test7.bob, test7.s: demonstrates object orientation with two classes.

All relevant files are in bobcat.tar.gz


Please contact Chuck Liang at liang@cs.umn.edu for questions.