[antlr-interest] lexical rule overlapping
Micheal J
open.zone at virgin.net
Thu Jun 1 05:22:18 PDT 2006
> Hi,
>
> I am attempting to implement Clean 0.8.4. According to the
> following grammar (extracted from page 80 of appendix.ps on
> ftp://ftp.cs.kun.nl/pub/Clean/old/Clean08/doc/ps/), +1234
> could be considered either a SymbolId or an IntDenot.
>
> SymbolId = UpperCaseChar.{RestChar} | Class1Char.{RestChar} ;
>
> RestChar = LowerCaseChar | Class1Char | Digit | UpperCaseChar
>          | CharDel | StringDel ;
> LowerCaseChar = 'a' | 'b' | ... | 'z' ;
> UpperCaseChar = 'A' | 'B' | ... | 'Z' ;
>
> Class1Char = '@' | '#' | '$' | '%' | '^' | '&' | '?' | '*'
>            | '-' | '+' | '/' | '=' | '<' | '>' | '_'
>            | '.' | '`' | ''' ;
>
> IntDenot = [Sign].[Digit]+ ;
> Sign = '+' | '-' ;
> Digit = '0' | '1' | ... | '9' ;
>
>
> I am using Antlr to do the work. The rules IntDenot and
> SymbolId overlap, which causes lexical nondeterminism
> between the two rules in the lexer.
>
> Could anyone give me an idea?
Combine the rules where they conflict. That is, handle all tokens that can
begin with UpperCaseChar or Class1Char in the same rule, and use $setType()
to set the final token type. KCSParse (and, I think, the Java grammar too)
does the same to handle DOT and numeric literals - see the INT_LITERAL rule
in CSharpLexer.g.
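A minimal sketch of the combined-rule idea, written in Python rather than as an Antlr rule so it can be run on its own. One hypothetical function consumes the longest SymbolId-shaped lexeme and only afterwards picks the token type, which is where an Antlr 2 rule would call $setType(). Assumptions: CharDel and StringDel are not defined in the quoted excerpt and are taken here to be ' and "; the policy "a sign followed only by digits is an IntDenot" is one possible disambiguation choice, not something the Clean grammar states; the function and token names are made up for illustration.

```python
import string

# Character classes transcribed from the quoted Clean 0.8.4 grammar.
UPPER = set(string.ascii_uppercase)
LOWER = set(string.ascii_lowercase)
DIGIT = set(string.digits)
SIGN = set("+-")
CLASS1 = set("@#$%^&?*-+/=<>_.`'")
# Assumption: CharDel and StringDel are not defined in the excerpt,
# so they are taken to be the single and double quote characters.
CHAR_DEL = {"'"}
STRING_DEL = {'"'}
REST = LOWER | UPPER | DIGIT | CLASS1 | CHAR_DEL | STRING_DEL


def lex_symbol_or_int(text, pos):
    """Hypothetical combined lexer rule covering SymbolId and IntDenot.

    Returns (token_type, lexeme, next_pos), or None if neither rule
    can start at pos.
    """
    if pos >= len(text):
        return None
    start = pos
    ch = text[pos]
    if ch in DIGIT:
        # Unsigned IntDenot: only digits may follow.
        while pos < len(text) and text[pos] in DIGIT:
            pos += 1
        return ("INT_DENOT", text[start:pos], pos)
    if ch in UPPER or ch in CLASS1:
        # Shared prefix: consume the longest SymbolId (maximal munch).
        pos += 1
        while pos < len(text) and text[pos] in REST:
            pos += 1
        lexeme = text[start:pos]
        # The $setType() step: a sign followed only by digits becomes
        # an IntDenot (assumed disambiguation policy).
        if (lexeme[0] in SIGN and len(lexeme) > 1
                and all(c in DIGIT for c in lexeme[1:])):
            return ("INT_DENOT", lexeme, pos)
        return ("SYMBOL_ID", lexeme, pos)
    return None
```

For example, "+1234" comes back as INT_DENOT, while "+12a" and a lone "-" come back as SYMBOL_ID. Because the type is decided after the whole lexeme has been read, the lexer never needs to choose between the two rules up front.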
I'm curious. Are you implementing a parser/lexer for Clean or a full Clean
compiler/interpreter?
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited.