[antlr-interest] Question on ambiguity
Stuart Dootson
stuart.dootson at gmail.com
Wed Dec 27 01:22:08 PST 2006
On 12/27/06, James Mello <james.mello at intelligentdiscovery.com> wrote:
> I'm relatively new to this whole parser grammar thing and I'm trying to get
> up to speed as quickly as possible. This one is particularly troubling,
> as I can't quite grok why it doesn't work. Given this simple set of
> rules and productions:
>
> class MyParser extends Parser;
>
> options
> {
> buildAST = true;
> k = 2;
> }
>
> multipleIDs :
> (ID)+
> ;
>
> class MyLexer extends Lexer;
>
> options
> {
> k = 2;
> charVocabulary = '\u0000'..'\ufffe';
> caseSensitive = false;
> }
>
> ID :
> 'a'..'z'
> ;
>
> The expression for multipleIDs compiles cleanly...
>
> When you change the multipleIDs rule to the following
>
> multipleIDs :
> ID (multipleIDs)*
> ;
>
> You end up with a warning that says:
>
> Nondeterminism upon K==1:ID K==2:ID between alt 1 and exit branch of block
>
> I'm trying to figure out why this is the case, as it seems to me that a
> recursive rule like this should work. I looked in the FAQ for information
> on this and didn't find much.
>
> Finally, since this is NOT the way to write recursive rules, how does one go
> about doing this correctly?
>
>
> James Mello : Software Engineer - ATS, Inc.
> web: www.intelligentdiscovery.com | (p) 360.698.7100x236 | (f) 360.698.7200
Assuming you just want to say that 'multipleIDs' consists of 1
or more 'ID's, then your original, non-recursive rule already does that:
multipleIDs : (ID)+ ;
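As for why the recursive version draws the warning: in ID (multipleIDs)*,
once an ID has been matched and the next token is another ID, the parser
can either enter the loop again or leave it and let the enclosing
invocation's loop take that ID. Both choices start with ID at every
lookahead depth, so raising k does not help. One way to see the ambiguity
is to count the parse trees the recursive rule allows. What follows is a
small Java sketch of that count, not anything ANTLR generates;
ParseTreeCount and trees are names made up for illustration.

```java
// Hypothetical helper, not part of ANTLR: counts the distinct parse trees
// that the rule   multipleIDs : ID (multipleIDs)* ;   allows for n IDs.
final class ParseTreeCount {

    static long trees(int n) {
        if (n < 1) {
            return 0; // the rule needs at least one ID
        }
        long[] t = new long[n + 1]; // t[k]: trees for one multipleIDs covering k IDs
        long[] s = new long[n];     // s[k]: ways (multipleIDs)* can cover k IDs
        s[0] = 1;                   // the loop may match nothing
        for (int k = 1; k <= n; k++) {
            // One ID is consumed directly, the loop covers the other k-1.
            t[k] = s[k - 1];
            if (k < n) {
                // The first loop iteration covers j IDs, the rest of the loop k-j.
                for (int j = 1; j <= k; j++) {
                    s[k] += t[j] * s[k - j];
                }
            }
        }
        return t[n];
    }

    public static void main(String[] args) {
        // Prints 1, 1, 2, 5 for n = 1..4.
        for (int n = 1; n <= 4; n++) {
            System.out.println("n=" + n + " trees=" + trees(n));
        }
    }
}
```

With three or more IDs there is more than one tree, which is what the
warning is pointing at; (ID)+ gives exactly one.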
You'll also need a lexer rule that defines ID separators - for example
whitespace - with something like this:
WS :
    ( ' '
    | '\t'
    | '\r' '\n' { newline(); }
    | '\n'      { newline(); }
    )
    { $setType(Token.SKIP); }
    ;
This tells the lexer to use whitespace as token delimiters, but not to
pass it on to the parser.
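To make the effect of skipping concrete, here is a rough hand-written Java
sketch of what the lexer and parser end up doing together. It is not the
code ANTLR generates; IdListSketch, tokenize and parseMultipleIDs are
made-up names, and it assumes, as in the grammar above, that each letter is
one ID and that case is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not ANTLR-generated code.
final class IdListSketch {

    // Lexer side: every letter a-z becomes one ID token, whitespace is
    // consumed but never turned into a token (the Token.SKIP behaviour).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = Character.toLowerCase(input.charAt(i)); // caseSensitive = false
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                continue; // skipped: the parser never sees it
            }
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("unexpected char: " + c);
            }
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // Parser side: multipleIDs : (ID)+ ;
    // Returns the number of IDs matched and rejects empty input.
    static int parseMultipleIDs(List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("expected at least one ID");
        }
        int pos = 0;
        do {
            pos++; // match one ID
        } while (pos < tokens.size()); // loop while the lookahead is another ID
        return pos;
    }

    public static void main(String[] args) {
        // Prints 4: the blanks, tab and line break produce no tokens.
        System.out.println(parseMultipleIDs(tokenize("a b\tc\r\nd")));
    }
}
```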
HTH
Stuart Dootson