[antlr-interest] ANTLR grammar: Clarifications needed

Fri Apr 30 13:29:17 PDT 2004

Thanks to Mark and Anakreon for your comments. I clearly understand the
problem now. I hope to set it right today and give you feedback asap.

Thanks again.

-----Original Message-----
From: Mark Lentczner [mailto:markl at glyphic.com] 
Sent: Wednesday, April 28, 2004 5:31 PM
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] ANTLR grammar: Clarifications needed

> 1) If I make BOOLEAN "protected", I am unable to refer to it in the
> PARSER.
Correct.  Protected lexer rules are for use only from some other lexer 
rule.  This often removes non-determinism because the calling lexer 
rule supplies the context.

> 2) If I make BOOLEAN "protected" and use $setType command in another rule,
> ----------------
> Number_or_bit: ('0'|'1') {$setType(BOOLEAN); $setType(NUMBER);}
>              | ('2'..'9') {$setType(NUMBER);}
> ----------------
> It doesn't work.
Well, this was close to a reasonable approach, but "{$setType(BOOLEAN);
$setType(NUMBER);}" was an error: a token can have only one type (the
second $setType simply overrides the first), and you can't do this in a
parser rule.

> 3) If I say "boolean: "1"|"0";" in the parser, it doesn't work as I
> thought.
No, this adds "1" and "0" as literals to the literal table, which is
probably not what you want.  If literal testing is turned on in the
lexer (it is by default), then whenever NUMBER matches "1" or "0" the
lexer looks the text up in the literal table and retypes the token, so
these strings will never reach the parser as NUMBER.
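To make the effect of literal testing concrete, here is a minimal plain-Java simulation. It assumes the lexer first matches NUMBER and then looks the exact token text up in a literal table; the class name, the `LITERALS` map and the token type numbers are all hypothetical, invented for illustration (ANTLR's generated lexer does the equivalent lookup internally).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of a lexer with literal testing turned on.
// Class name and token type numbers are invented for illustration.
class LiteralTableDemo {
    static final int NUMBER = 4;
    static final int LITERAL_0 = 5; // type the parser rule's "0" would add
    static final int LITERAL_1 = 6; // type the parser rule's "1" would add

    // Literal table: exact token text -> token type.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("0", LITERAL_0);
        LITERALS.put("1", LITERAL_1);
    }

    /** Matches NUMBER: ('0'..'9')+ and then applies the literal test. */
    static int tokenType(String text) {
        if (text.isEmpty() || !text.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("not a NUMBER: " + text);
        }
        // The rule matched, so the type starts out as NUMBER...
        int type = NUMBER;
        // ...but an exact hit in the literal table replaces it.
        Integer literal = LITERALS.get(text);
        return literal != null ? literal : type;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "1", "10", "7"}) {
            System.out.println(s + " -> " + tokenType(s));
        }
    }
}
```

"10" and "7" still come out as NUMBER, but a lone "1" or "0" never does, so a parser rule expecting NUMBER will reject them.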

> How can I use BOOLEAN and NUMBER based on the context in which they
> appear without having non-determinisms?
Short answer: you can't do this in the lexer.
Long answer:

You must take one of two approaches:

1) have two token types

NON_BOOLEAN_NUMBER: '2'..'9' ( '0'..'9' )* ;
BOOLEAN: ( '0' | '1' )
         ( '0'..'9' { $setType(NON_BOOLEAN_NUMBER); } ( '0'..'9' )* )? ;
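The two lexer rules never compete, because the first character alone ({'2'..'9'} versus {'0','1'}) decides which rule starts. A hand-written Java sketch of the same classification follows; `TwoTypeLexer` and `classify` are hypothetical names, and the sketch assumes the input is already known to be a run of digits.

```java
// Hypothetical hand-written equivalent of the two lexer rules:
//   NON_BOOLEAN_NUMBER: '2'..'9' ( '0'..'9' )* ;
//   BOOLEAN: ( '0' | '1' ) ( '0'..'9' { $setType(NON_BOOLEAN_NUMBER); } ( '0'..'9' )* )? ;
class TwoTypeLexer {
    enum TokenType { BOOLEAN, NON_BOOLEAN_NUMBER }

    /** Classifies a run of digits the way the two rules would. */
    static TokenType classify(String digits) {
        if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("expected digits: " + digits);
        }
        if (digits.charAt(0) >= '2') {
            // First character 2..9: only NON_BOOLEAN_NUMBER can start here.
            return TokenType.NON_BOOLEAN_NUMBER;
        }
        // First character 0 or 1: BOOLEAN, unless more digits follow, in which
        // case the $setType action switches the type to NON_BOOLEAN_NUMBER.
        return digits.length() == 1 ? TokenType.BOOLEAN : TokenType.NON_BOOLEAN_NUMBER;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "1", "10", "00", "42"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Under these rules "00" and "01" come out as NON_BOOLEAN_NUMBER; whether that is correct depends on the language being parsed.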

then in the parser you need rules like:

number: NON_BOOLEAN_NUMBER | BOOLEAN ;
boolean: BOOLEAN ;
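On the parser side, `number` accepts either token type while `boolean` accepts only BOOLEAN. A minimal recursive-descent sketch of those two rules in Java follows, assuming the tokens arrive as a list of types. The class and method names are hypothetical; `booleanValue` stands in for `boolean`, which is a reserved word in Java.

```java
import java.util.List;

// Hypothetical recursive-descent equivalent of:
//   number: NON_BOOLEAN_NUMBER | BOOLEAN ;
//   boolean: BOOLEAN ;
class TwoTypeParser {
    enum TokenType { BOOLEAN, NON_BOOLEAN_NUMBER }

    private final List<TokenType> tokens;
    private int pos = 0;

    TwoTypeParser(List<TokenType> tokens) {
        this.tokens = tokens;
    }

    private TokenType lookahead() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos);
    }

    /** number: either token type is acceptable, since 0 and 1 are numbers too. */
    TokenType number() {
        TokenType t = lookahead();
        pos++;
        return t;
    }

    /** boolean: only a BOOLEAN token is acceptable. */
    TokenType booleanValue() {
        TokenType t = lookahead();
        if (t != TokenType.BOOLEAN) {
            throw new IllegalStateException("expected BOOLEAN but found " + t);
        }
        pos++;
        return t;
    }
}
```

With this split, a "1" can appear wherever a number is expected, while a "7" in a boolean position is rejected by the parser rather than causing a lexer non-determinism.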

2) have one token type

NUMBER: ('0'..'9')+ ;

then in the parser use rules like:

number: NUMBER ;
// Java action: compare text with equals() (!= compares references), and throw
boolean: n:NUMBER
    { if (!n.getText().equals("1") && !n.getText().equals("0")) {
          throw new SemanticException(...); } }
    ;
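The check in the `boolean` rule can be tried outside ANTLR. Below is a stand-alone Java sketch; `SingleTypeCheck` and `toBoolean` are hypothetical names, and `IllegalArgumentException` is only a stand-in for whatever exception the grammar action actually throws.

```java
// Hypothetical stand-alone version of the check done in the boolean rule
// when there is only a single NUMBER token type.
class SingleTypeCheck {
    /** Converts a NUMBER token's text to a boolean, rejecting anything but 0 or 1. */
    static boolean toBoolean(String numberText) {
        // equals() compares string contents; != would compare references.
        if (numberText.equals("1")) {
            return true;
        }
        if (numberText.equals("0")) {
            return false;
        }
        throw new IllegalArgumentException("expected 0 or 1 for a boolean, found " + numberText);
    }

    public static void main(String[] args) {
        System.out.println(toBoolean("1"));
        System.out.println(toBoolean("0"));
    }
}
```

Because the comparison is on the exact text, "00" is rejected here, which is one of the language-design questions that determines which approach fits.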

Exactly which form to use, and which direction to take, depends quite a
bit on the grammar of the language you're parsing.  How does that
language deal with the confusion between BOOLEAN and NUMBER?  Does the
parsing context always tell you which form you need?  For some
constructs, does how they are parsed depend on whether the user wrote a
boolean value or a number greater than 1?  Is this true in all cases or
only some?  Does the language consider "00" a boolean as well?

Without knowing more, it is hard to say what to do.  The problem isn't
really with your lexer rules; it is with the grammar you are trying to
parse: sometimes it treats "1" as a boolean, sometimes as a number.
The question is how to resolve that.  Once you know how the parser
rules should resolve the ambiguity, you can design the lexer rules to
support it.

In programming language design, things are always much more intertwined 
than one thinks...

   - Mark

Mark Lentczner
markl at wheatfarm.org
http://www.wheatfarm.org/
