[antlr-interest] lexical rule overlapping

Thu Jun 1 05:22:18 PDT 2006

> Hi,
> 
> I am attempting to implement Clean 0.8.4. According to the 
> following grammar (extracted from page 80 of appendix.ps on 
> ftp://ftp.cs.kun.nl/pub/Clean/old/Clean08/doc/ps/), +1234 
> could be considered either a SymbolId or an IntDenot.
> 
> SymbolId = UpperCaseChar.{RestChar} | Class1Char.{RestChar} ;
> 
> RestChar = LowerCaseChar | Class1Char | Digit | UpperCaseChar
>          | CharDel | StringDel ;
> LowerCaseChar = 'a' | 'b' | ... | 'z' ;
> UpperCaseChar = 'A' | 'B' | ... | 'Z' ;
> 
> Class1Char = '@' | '#' | '$' | '%' | '^' | '&' | '?' | '*'
>            | '-' | '+' | '/' | '=' | '<' | '>' | '_'
>            | '.' | '`' | ''' ;
> 
> IntDenot = [Sign].[Digit]+ ;
> Sign = '+' | '-' ;
> Digit = '0' | '1' | ... | '9' ;
> 
> 
> I am using ANTLR to do the work. The rules IntDenot and 
> SymbolId overlap, which causes lexical nondeterminism 
> between the two rules in the lexer.
> 
> Could anyone give me an idea?

Combine the rules where they conflict. That is, handle all tokens that can
begin with UpperCaseChar or Class1Char in the same rule, and use $setType() to
set the final token type. KCSParse (and the Java grammar too, I think) does
the same for handling DOT and numeric literals; see the INT_LITERAL rule in
CSharpLexer.g.
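
To make the idea concrete, here is a rough sketch of the combined-rule approach as a small hand-written scanner in Python rather than as an ANTLR rule. It is only an illustration, not the KCSParse code. The names scan_token, SYMBOL_ID and INT_DENOT are made up for the example, and two things are assumptions on my part: that CharDel and StringDel are the single and double quote characters (the quoted grammar does not define them), and that a sign followed only by digits should be reported as IntDenot rather than SymbolId.

```python
import string

UPPER = set(string.ascii_uppercase)
LOWER = set(string.ascii_lowercase)
DIGIT = set(string.digits)
CLASS1 = set("@#$%^&?*-+/=<>_.`'")
SIGN = set("+-")
# Assumption: CharDel and StringDel are not defined in the quoted grammar;
# they are taken here to be the single and double quote characters.
REST = UPPER | LOWER | DIGIT | CLASS1 | set("'\"")

# Hypothetical token type names, chosen for this example only.
SYMBOL_ID = "SymbolId"
INT_DENOT = "IntDenot"


def scan_token(text, pos=0):
    """Scan one token starting at text[pos]; return (type, lexeme, next_pos)."""
    if pos >= len(text):
        raise ValueError("no input left")
    first = text[pos]
    end = pos + 1

    if first in DIGIT:
        # Unsigned IntDenot: no overlap, since a SymbolId cannot start with a digit.
        while end < len(text) and text[end] in DIGIT:
            end += 1
        return INT_DENOT, text[pos:end], end

    if first in UPPER or first in CLASS1:
        # Combined rule: first consume the longest run of RestChar ...
        while end < len(text) and text[end] in REST:
            end += 1
        lexeme = text[pos:end]
        # ... then decide the token type afterwards, which is the job
        # $setType() does inside a single ANTLR lexer rule.
        # Assumption: a sign followed only by digits is an IntDenot.
        if first in SIGN and len(lexeme) > 1 and all(c in DIGIT for c in lexeme[1:]):
            return INT_DENOT, lexeme, end
        return SYMBOL_ID, lexeme, end

    raise ValueError("no token can start with %r" % first)
```

With this sketch, scan_token("+1234") is reported as an IntDenot, while scan_token("+12ab") and scan_token("-") stay SymbolIds, because the type is only decided once the whole lexeme has been consumed.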

I'm curious. Are you implementing a parser/lexer for Clean or a full Clean
compiler/interpreter?

Micheal

-----------------------
The best way to contact me is via the list/forum. My time is very limited.