[antlr-interest] Grammar Perplexity in v3.0 (More)

Sun Nov 12 15:04:33 PST 2006

>You can find the complete grammar file and a sample of valid input here:
>
>- <http://64.142.14.4/~rschulz/TSTP.g>
>- <http://64.142.14.4/~rschulz/MED001+0.ax>
>
>The grammar has changed a bit since I wrote the original message here, 
>but the problems remain.

When I run org.antlr.Tool on the above TSTP.g file, I get this message (from
the command line; ANTLRWorks reports something similar):

ANTLR Parser Generator   Early Access Version 3.0b4 (??, 2006)  1989-2006
TSTP.g:943:21: Decision can match input such as "':' {DistinctObject, LowerWord..SingleQuoted, '['}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

Sorry, I have already deleted your previous e-mails, so I am not sure whether
this is your original error.

Notice that the error is complaining about an ambiguity on ':' when it is
followed by one of the tokens listed in the {Di...['} part of the message.

So we need to look for uses of ':' to discover the source of the ambiguity.
And we find:

parentInfo : source ( ':' parentDetail )? ;
source : generalTerm ;
generalTerm : generalData ( ':' generalTerm )? | generalList ;

and we quickly realize that, when parsing a parentInfo and encountering a ':',
the parser is unable to determine whether it has finished the `generalTerm`
comprising a `source` and is moving on to parsing a `parentDetail`, or whether
it is still inside the `source`'s `generalTerm` and is about to recurse.
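
To make that concrete, here is a cut-down grammar with the same shape. It is
only a sketch: the bodies of generalData, generalList and parentDetail, the
single lexer rule, and the grammar name are my guesses, not copied from
TSTP.g, and ANTLR's exact complaint for this reduced version may differ from
the one above.

```antlr
grammar ColonAmbiguity;   // hypothetical name

// These three rules are quoted from the discussion above.
parentInfo   : source ( ':' parentDetail )? ;
source       : generalTerm ;
generalTerm  : generalData ( ':' generalTerm )? | generalList ;

// Everything below is an assumption made to keep the sketch self-contained.
generalData  : LowerWord ;
generalList  : '[' ( generalTerm ( ',' generalTerm )* )? ']' ;
parentDetail : generalList ;

LowerWord    : 'a'..'z' ( 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' )* ;
```

Under those assumptions, the input  a:[b]  has two readings. Either the ':'
belongs to generalTerm, so the whole input is the source and there is no
parentDetail, or the generalTerm ends after "a" and ':' introduces a
parentDetail of [b]. Nothing in the grammar tells the parser which one you
meant.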

I also seem to recall that part of your original request for help was that
the `plainTerm` rule could not recognize the string "gt". I think you listed
the relevant rules in your original e-mail (which I have already deleted), so
here they are again:

plainTerm : atomicWord ( '(' arguments ')' ) ? ;
atomicWord : LowerWord | SingleQuoted ;
SingleQuoted : '\'' ( ~( '\'' | '\\' ) | '\\' '\'' | '\\' '\\' )* '\'' ;
fragment LowerWord : LowerAlpha Alphanumeric * ;
^^^^^^^^

Notice that you have specified LowerWord to be a fragment, a component of
other Tokens and *not* a Token in its own right. LowerWord will never be
emitted as a Token by your lexer (in its current form).

ANTLR should give you an error for this, saying that you have referenced a
lexical fragment from within a parser rule.

And since "gt" is a LowerWord, which the parser can never see, a syntax error
is generated.

One way to fix this situation, where you need a lexer rule to serve as both a
fragment and a token, is to use a trampoline:

LowerWord : LW ; // token: use this in parser rules
fragment LW : LowerAlpha Alphanumeric* ; // fragment: use this in lexer rules
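
Filled out a little, the trampoline might look like the following. This is a
sketch, not a drop-in patch: DollarWord, the body of arguments, the character
ranges for LowerAlpha and Alphanumeric, and the grammar name are placeholders
of mine. LW is marked as a fragment here so that LowerWord is the only one of
the pair that is ever emitted as a token.

```antlr
grammar TrampolineSketch;   // hypothetical name

// Parser rules: these only ever refer to the real token, LowerWord.
plainTerm  : atomicWord ( '(' arguments ')' )? ;
atomicWord : LowerWord | SingleQuoted ;
arguments  : plainTerm ( ',' plainTerm )* ;   // placeholder body

// Token rules: emitted by the lexer and visible to the parser.
LowerWord    : LW ;        // the trampoline: a token that just delegates
DollarWord   : '$' LW ;    // hypothetical rule that reuses the same fragment
SingleQuoted : '\'' ( ~( '\'' | '\\' ) | '\\' '\'' | '\\' '\\' )* '\'' ;

// Fragment rules: building blocks only, never emitted as tokens.
fragment LW           : LowerAlpha Alphanumeric* ;
fragment LowerAlpha   : 'a'..'z' ;                               // assumed range
fragment Alphanumeric : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' ;   // assumed range
```

With this arrangement the string "gt" should come out of the lexer as a
LowerWord token, which atomicWord (and therefore plainTerm) can then match.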

hope this helps...
   -jbb