[antlr-interest] ANTLR grammar: Clarifications needed
Bharath S
bharath at starthis.com
Fri Apr 30 13:29:17 PDT 2004
Thanks to Mark and Anakreon for your comments. I clearly understand the
problem now. I hope to set it right today and give you feedback asap.
Thanks again.
-----Original Message-----
From: Mark Lentczner [mailto:markl at glyphic.com]
Sent: Wednesday, April 28, 2004 5:31 PM
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] ANTLR grammar: Clarifications needed
> 1) If I make BOOLEAN "protected", I am unable to refer it in the
> PARSER.
Correct. Protected lexer rules are for use only from some other lexer
rule. This often removes non-determinism because the calling lexer
rule supplies the context.
> 2) If I make BOOLEAN "protected" and use $setType command in another
> rule,
> ----------------
> Number_or_bit: ('0'|'1') {$setType(BOOLEAN); $setType(NUMBER);} |
> ('2'..'9')
> {$setType(NUMBER);}
> ----------------
> It doesn't work.
Well, this was close to a reasonable approach, but "{$setType(BOOLEAN);
$setType(NUMBER);}" was an error: a token can have only one type, so the
second $setType simply overrides the first. Note also that $setType works
only in lexer rules; you can't use it in a parser rule.
> 3) If I say "boolean: "1"|"0";" in the parser, it doesn't work as I
> thought.
No, this sets "1" and "0" to be literals added to the literal table.
Probably not what you want. If you have literal testing turned on in
the lexer (it is by default), then the input "1" or "0" will always come
back as that literal's token and never as the token of any lexer rule,
including NUMBER.
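As a rough model of the literal check just described, here is a small Java sketch. It is not ANTLR's code: the class name, the token type numbers, and the finalType helper are all made up for illustration, and it assumes the check works by looking up the matched text in the literals table after a lexer rule has matched.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a literals-table check; not ANTLR source.
final class LiteralTableSketch {
    // Token type numbers are invented for this sketch.
    static final int NUMBER = 4;
    static final int LITERAL_1 = 5;
    static final int LITERAL_0 = 6;

    // Writing "1" | "0" in a parser rule is modeled as two table entries.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("1", LITERAL_1);
        LITERALS.put("0", LITERAL_0);
    }

    // Returns the type the parser would see for text matched by a lexer rule.
    static int finalType(String text, int ruleType, boolean testLiterals) {
        if (!testLiterals) {
            return ruleType;               // no lookup: the rule's type stands
        }
        Integer literalType = LITERALS.get(text);
        return literalType != null ? literalType : ruleType;
    }

    public static void main(String[] args) {
        System.out.println(finalType("1", NUMBER, true) == NUMBER);   // false
        System.out.println(finalType("42", NUMBER, true) == NUMBER);  // true
    }
}
```

Under this model a parser rule that asks for NUMBER can never see the input "1" while literal testing is on, which is the surprise reported in question 3.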
> How can I use BOOLEAN and NUMBER based on the context in which they
> appear
> without having non-determinisms?
Short answer: you can't do this in the lexer.
Long answer:
You must take one of two approaches:
1) have two token types
NON_BOOLEAN_NUMBER: '2'..'9' ( '0'..'9' )* ;
BOOLEAN: ( '0' | '1' )
( '0'..'9' { $setType(NON_BOOLEAN_NUMBER); } ( '0'..'9' )* )? ;
then in the parser you have to have rules like:
number: NON_BOOLEAN_NUMBER | BOOLEAN ;
boolean: BOOLEAN ;
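To make the effect of approach 1 concrete, here is a hand-written Java sketch of how digit runs end up typed. It is only a stand-in for the generated lexer: the class, the Tok type, and the helper names are hypothetical, and it assumes tokens are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two-token-type lexer; not ANTLR-generated code.
final class TwoTypeLexerSketch {
    enum Type { BOOLEAN, NON_BOOLEAN_NUMBER }

    static final class Tok {
        final Type type;
        final String text;
        Tok(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Types each whitespace-separated digit run the way the two lexer rules
    // would: only a lone "0" or "1" keeps the BOOLEAN type.
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        for (String run : input.trim().split("\\s+")) {
            if (run.isEmpty()) continue;
            if (!run.matches("[0-9]+")) {
                throw new IllegalArgumentException("not a digit run: " + run);
            }
            boolean loneZeroOrOne = run.equals("0") || run.equals("1");
            out.add(new Tok(loneZeroOrOne ? Type.BOOLEAN : Type.NON_BOOLEAN_NUMBER, run));
        }
        return out;
    }

    // Parser side: "number" accepts either type, "boolean" accepts only BOOLEAN.
    static boolean acceptedAsNumber(Tok t) {
        return t.type == Type.BOOLEAN || t.type == Type.NON_BOOLEAN_NUMBER;
    }

    static boolean acceptedAsBoolean(Tok t) {
        return t.type == Type.BOOLEAN;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1 0 7 10 00"));
    }
}
```

Note that "10" and "00" come out as NON_BOOLEAN_NUMBER here, because the optional second digit in the BOOLEAN rule switches the type.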
2) have one token type
NUMBER: ('0'..'9')+ ;
then in the parser use rules like:
number: NUMBER ;
boolean: n:NUMBER {
if (!n.getText().equals("1") && !n.getText().equals("0")) {
throw new SemanticException(...);
} } ;
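The check in approach 2 can be tried outside a grammar with a few lines of plain Java. This is only a sketch: the class, the parseBoolean helper, and the exception type are hypothetical and stand in for whatever the parser action would do. It also shows why the text has to be compared with equals() rather than == or !=, which compare object references in Java.

```java
// Hypothetical helper mirroring the parser-side check; not part of ANTLR.
final class BooleanCheckSketch {
    // Stand-in for the error the parser action would raise.
    static final class NotBooleanException extends RuntimeException {
        NotBooleanException(String text) {
            super("expected 0 or 1 but found: " + text);
        }
    }

    // Accepts the text of a NUMBER token only if it is exactly "0" or "1".
    static boolean parseBoolean(String numberText) {
        // equals() compares contents; != would compare references and could
        // reject a perfectly good "1" that is a distinct String object.
        if (!numberText.equals("1") && !numberText.equals("0")) {
            throw new NotBooleanException(numberText);
        }
        return numberText.equals("1");
    }

    public static void main(String[] args) {
        System.out.println(parseBoolean("1"));
        System.out.println(parseBoolean(new String("0")));
        try {
            parseBoolean("00");
        } catch (NotBooleanException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this form "00" is rejected as a boolean; whether that is right depends on the language being parsed.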
Exactly which form to use and which direction to take depends quite a bit
on the grammar of the language you're parsing. How does that language
deal with the confusion between BOOLEAN and NUMBER? Does the parsing
context always tell you which form you need? For some constructs, does
how to parse it depend on whether the user wrote a boolean value or a
number greater than 1? Is this true in all cases or only some? Does the
language consider "00" boolean as well?
Without knowing more, it is hard to say what to do. The problem isn't
really with your lexer rules, it is with the grammar you are trying to
parse: sometimes it thinks "1" is a boolean, sometimes it's a number.
The question is how to resolve that. Once you know how the parser
rules should resolve the ambiguity, then you can design the lexer rules
to support it.
In programming language design, things are always much more intertwined
than one thinks...
- Mark
Mark Lentczner
markl at wheatfarm.org
http://www.wheatfarm.org/