[antlr-interest] Grammar Perplexity in v3.0 (More)
John B. Brodie
jbb at acm.org
Sun Nov 12 15:04:33 PST 2006
>You can find the complete grammar file and a sample of valid input here:
>
>- <http://64.142.14.4/~rschulz/TSTP.g>
>- <http://64.142.14.4/~rschulz/MED001+0.ax>
>
>The grammar has changed a bit since I wrote the original message here,
>but the problems remain.
When I run org.antlr.Tool on the above TSTP.g file (from the command line;
ANTLRWorks reports something similar) I get this message:
ANTLR Parser Generator Early Access Version 3.0b4 (??, 2006) 1989-2006
TSTP.g:943:21: Decision can match input such as "':' {DistinctObject, LowerWord..SingleQuoted, '['}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
Sorry, I already deleted your previous e-mails, so I am not sure whether
this is your original error.
Notice that the error is complaining about an ambiguity on ':' when it is
followed by one of the set of 4 tokens in the {Di...['} part of the message.
So we need to look for uses of ':' to discover the source of the ambiguity.
And we find:
parentInfo : source ( ':' parentDetail )? ;
source : generalTerm ;
generalTerm : generalData ( ':' generalTerm )? | generalList ;
and we quickly realize that, when we are parsing a parentInfo and encounter
a ':', we are unable to determine whether we have finished the `generalTerm`
comprising a `source` and are moving on to parsing a `parentDetail`, or whether
we are still inside the `source`'s `generalTerm` and are about to recurse.
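To make the ambiguity concrete, here is a small Python sketch (not ANTLR
output, just a hand-written model) that counts the distinct ways the three
rules above can derive a token sequence. It assumes, purely for illustration,
that generalData is a single word, that generalList is just '[' ']', and that
parentDetail is itself a generalTerm; none of those definitions is taken from
TSTP.g, and the function names are made up.

```python
def general_term_ends(toks, i):
    """Return every index at which a generalTerm starting at i could end.

    generalTerm : generalData ( ':' generalTerm )? | generalList ;
    ASSUMPTION: generalData is one word, generalList is exactly '[' ']'.
    """
    ends = []
    if i >= len(toks):
        return ends
    if toks[i] == "[":
        # generalList alternative (simplified stand-in)
        if i + 1 < len(toks) and toks[i + 1] == "]":
            ends.append(i + 2)
    elif toks[i] not in (":", "]"):
        # generalData alone ...
        ends.append(i + 1)
        # ... or generalData ':' generalTerm (the recursive option)
        if i + 1 < len(toks) and toks[i + 1] == ":":
            ends.extend(general_term_ends(toks, i + 2))
    return ends


def count_parent_info(toks):
    """Count derivations of the whole input as a parentInfo.

    parentInfo : source ( ':' parentDetail )? ;   source : generalTerm ;
    ASSUMPTION: parentDetail is modelled as a generalTerm.
    """
    count = 0
    for end in general_term_ends(toks, 0):
        if end == len(toks):
            # source consumed everything, no parentDetail
            count += 1
        elif toks[end] == ":":
            # ':' starts a parentDetail that must consume the rest
            count += sum(1 for e in general_term_ends(toks, end + 1)
                         if e == len(toks))
    return count


print(count_parent_info(["a"]))            # 1
print(count_parent_info(["a", ":", "b"]))  # 2
```

Under those assumptions the input a : b has two derivations: one where the ':'
continues the `source`'s `generalTerm` and one where it starts the
`parentDetail`. That is the pair of alternatives the warning is pointing at.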
I also seem to recall that part of your original request for help was the
inability of the `plainTerm` rule to recognize the string "gt". I think you
listed the relevant rules in your original e-mail (which I deleted already),
so here they are again:
plainTerm : atomicWord ( '(' arguments ')' ) ? ;
atomicWord : LowerWord | SingleQuoted ;
SingleQuoted : '\'' ( ~( '\'' | '\\' ) | '\\' '\'' | '\\' '\\' )* '\'' ;
fragment LowerWord : LowerAlpha Alphanumeric * ;
^^^^^^^^
notice that you have specified LowerWord to be a fragment, a component of
other Tokens and *not* a Token in its own right. LowerWord will never be
emitted as a Token by your lexer (in its current form).
ANTLR should give you an error for this, saying that you have referenced a
lexical fragment from within a parser rule.
And since "gt" is a LowerWord, a token type the lexer never emits and so the
parser can never see, a syntax error is generated.
one way to fix this situation - where you need a lexer rule to act as both a
fragment and a token - is to use a trampoline:
LowerWord : LW ; // use this in parser rules
fragment LW : LowerAlpha Alphanumeric* ; // use this in lexer rules
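If it helps to see why the fragment matters, here is a toy lexer in Python
that mimics the one behaviour in question: rules flagged as fragments are
helpers only and are never emitted as tokens. It is a rough model, not how
ANTLR is implemented. The regular expressions for LowerAlpha and Alphanumeric
are my guesses ([a-z] and [A-Za-z0-9_]), since those rules are not quoted
here, and `tokenize`, `broken` and `fixed` are made-up names.

```python
import re

# ASSUMPTION: LowerAlpha is [a-z] and Alphanumeric is [A-Za-z0-9_].
LOWER_WORD = r"[a-z][A-Za-z0-9_]*"
SINGLE_QUOTED = r"'(?:[^'\\]|\\'|\\\\)*'"


def tokenize(text, rules):
    """Split text into (rule name, lexeme) pairs.

    rules is a list of (name, regex, is_fragment). Fragment rules are
    skipped when choosing a token, so they can never be emitted.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern, is_fragment in rules:
            if is_fragment:
                continue  # helper rule only, never a token
            match = re.compile(pattern).match(text, pos)
            if match and match.end() > pos:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"no token rule matches at offset {pos}")
    return tokens


# LowerWord declared as a fragment, as in the grammar under discussion
broken = [
    ("SingleQuoted", SINGLE_QUOTED, False),
    ("LowerWord", LOWER_WORD, True),
]

# Trampoline: LowerWord is a real token, LW is the fragment it delegates to
fixed = [
    ("SingleQuoted", SINGLE_QUOTED, False),
    ("LowerWord", LOWER_WORD, False),
    ("LW", LOWER_WORD, True),
]

print(tokenize("gt", fixed))  # [('LowerWord', 'gt')]
```

With the `broken` rule list, tokenize("gt", broken) raises a SyntaxError
because no emittable rule matches "gt", which is roughly the symptom you
described.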
hope this helps...
-jbb