Additional and Informal Notes on the Theory and Implementation of "Lambdayacc."

Chuck Liang

I first became aware of the existence of bounded right context grammars while reading Donald Knuth's original paper on LR parsing. I was already basically convinced that there was no way to describe unique handles algebraically without regular expressions - which would amount to constructing the "viable prefix recognizing" deterministic finite state machine of LR parsers. It would be infinitely unfair to say that I'm trying to challenge Knuth by going back to BRC grammars. In his paper Knuth gave two examples of LR(0) grammars that were not BRC. The first of these is similar to the one in my paper. The second is basically the same as the following example:

  S -> (A) | c
  A -> (S) | c

Here, A -> c would be the correct handle after an odd number of ('s, while S -> c would be the handle after an even number of ('s. Clearly the correct handle cannot be determined without looking back at the entire contents of the stack, and thus the grammar is not BRC. The BRC(1,0) equivalent grammar would be:

  S -> DA) | c
  A -> ES) | c
  D -> (
  E -> (

The extra non-terminals D and E (for odd and even) capture the state information necessary for the handle to be uniquely determined by looking back only one symbol. This example illustrates well the essential difference between BRC and non-BRC LR grammars: for the latter class it is possible to *derive* a kind of state information from the grammar, while for the BRC class it is necessary to embed the state information *inside* the grammar. In Prolog-style logic programming we are often confronted with the same type of problem. As opposed to imperative languages, where variables can be set to different values at different times, logic programming must capture the state information necessary for computation *as part* of the logical specification.
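The one-symbol lookback can be made concrete with a small sketch. The following Python fragment is a hypothetical illustration (not part of Lambdayacc, which is written in Lambda Prolog): a shift-reduce parser for the BRC(1,0) grammar above, in which every reduction is decided from the handle plus at most one symbol of left context on the stack, with no LR state machine.

```python
# A sketch of shift-reduce parsing for the BRC(1,0) grammar
#   S -> DA) | c    A -> ES) | c    D -> (    E -> (
# Each reduction is chosen by inspecting the handle and at most ONE
# symbol of left context (the stack symbol beneath the handle).

def parse(tokens):
    stack = ['$']                      # '$' marks the bottom of the stack
    for tok in tokens:
        stack.append(tok)
        while True:                    # reduce as long as a handle is visible
            top, below = stack[-1], stack[-2]
            if top == '(':             # D when outermost or inside E; E inside D
                stack[-1] = 'D' if below in ('$', 'E') else 'E'
            elif top == 'c':           # A after an odd number of ('s, S after even
                stack[-1] = 'A' if below == 'D' else 'S'
            elif stack[-3:] == ['D', 'A', ')']:
                stack[-3:] = ['S']     # S -> DA)
            elif stack[-3:] == ['E', 'S', ')']:
                stack[-3:] = ['A']     # A -> ES)
            else:
                break
    return stack == ['$', 'S']         # accept iff exactly one S remains

print(parse('(c)'))     # -> True  (odd nesting: c reduced to A)
print(parse('((c))'))   # -> True  (even nesting: c reduced to S)
```

Note that the decision between D and E (and hence between A -> c and S -> c) never consults more than the single symbol beneath the top of the stack, which is precisely what makes the rewritten grammar BRC(1,0).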
This usually entails predicates having to carry a large number of parameters, as well as other techniques that lead to poor performance unless extra-logical features are used extensively. The resulting logic program will lose much of its "declarative" character. Logic programming is better used when it can avoid carrying state information and instead concentrate on what it does best: logical deduction. Thus what I have done can be seen as the application of Knuth's insight in reverse. By having the grammar encode the state information, the resulting logic-program parser need not be so burdened. That is, the tradeoff between BRC and general LR grammars is a positive one in the context of logic programming. Perhaps future developments in logic programming, such as those based on linear logic, will allow state information of the kind used in LR parsers to be specified succinctly. But currently, it is a lot easier to modify the grammar (consider the computation and representation of the LR(0) parsing DFA for even the simple grammar above). As a more practical example of this point, in the file "fndefgram.mod" you will find two productions which may appear redundant at first:

  tlparen ==> [lparen]
  tcomma ==> [comma]

These productions are needed precisely because we use BRC grammars (actually simple BRC grammars). The parser can parse most of Lambda Prolog. In particular, Lambda Prolog allows declarations of the following kinds:

  type f,g,h (i -> i) -> i.
  pred1 A B :- pred2 A, pred3 B.
  etc.

There are three sources of possible ambiguity that this syntax can cause for a BRC parser:

1. identifiers such as "i" and "pred2" can be parsed as either a "term expression" or a "type expression" - the same is true for an application (a b).
2. "," can mean conjunction in a formula, or serve as a simple separator between identifiers in a type declaration.
3. expressions enclosed in multiple parentheses may be term or type expressions.
Without the extra productions for tlparen and tcomma, reduce-reduce conflicts will result. In an LR parser, these conflicts are disambiguated by the "type" keyword. The tlparen and tcomma symbols propagate this information so that a simple BRC parser suffices. The user may complain that this method entails learning about the characteristics of a certain class of grammars before the parser generator can be used. But this applies to any parser generator. For example, the grammar

  S -> Aa | Bb
  A -> (A) | c
  B -> (B) | c

is perfectly unambiguous, but it is not an LR(k) grammar for any k, and unless one understands why, he or she will not be able to use any yacc-style tool effectively.

Implementation-Related Notes:

The current implementation of LambdaYacc has the following characteristics, which will be modified in the future:

No epsilon productions are allowed. This is only an implementation-level simplification, and does not reflect any defect in the theoretical foundations, which encompass epsilon rules. As a consequence, however, the lookahead set is allowed to contain non-terminal symbols, and the "first" relation is checked dynamically. The closure of the first relation is computed at generation time and added as atomic clauses to the parser (via the "gen_first" predicate).

Free or "logic" variables are used for attributes in the grammar. This, as explained in my paper, simplifies the implementation, but is a non-logical feature. As a consequence, the "freshcopy" clauses are used profusely to keep these variables from being bound.

... Please check back for more!
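The closure of the first relation mentioned above can be sketched as follows. This is a hypothetical reconstruction in Python, not the actual "gen_first" predicate, and the pair-list grammar representation is my own assumption: with no epsilon productions, first(X, Y) holds when some production X -> Y ... begins with Y, and the transitive closure tells us which symbols can begin a phrase derived from X - exactly what is needed when a lookahead set contains a non-terminal and must be checked against an input token.

```python
# A sketch (not Lambdayacc's gen_first): closure of the "first" relation
# for a grammar with no epsilon productions.  A grammar is assumed to be
# a list of (lhs, rhs-symbol-list) pairs with every rhs non-empty.

def first_closure(productions):
    # base case: X first Y whenever a production X -> Y ... exists
    first = {(lhs, rhs[0]) for lhs, rhs in productions}
    changed = True
    while changed:                     # naive transitive closure
        changed = False
        for (a, b) in list(first):
            for (c, d) in list(first):
                if b == c and (a, d) not in first:
                    first.add((a, d))
                    changed = True
    return first

# The BRC(1,0) grammar from the beginning of these notes:
grammar = [('S', ['D', 'A', ')']), ('S', ['c']),
           ('A', ['E', 'S', ')']), ('A', ['c']),
           ('D', ['(']), ('E', ['('])]
```

On this grammar the closure adds the pairs (S, '(') and (A, '(') to the base relation, so a non-terminal S appearing in a lookahead set correctly matches an input token '('. In Lambdayacc itself the closure pairs become atomic clauses consulted by the parser at run time.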