[antlr-interest] Same symbols, but two parsed terms

Gavin Lambert antlr at mirality.co.nz
Fri Aug 8 15:47:30 PDT 2008


At 02:10 9/08/2008, Сергій Карпенко wrote:
>For example, we have an expression "--". It
>must be parsed as NOT followed by a '-' symbol.
>
>And we have a grammar:
>
>   expr : NOT? WORD+;
>   NOT : '-';
>   WORD: '-'+;
>
>input string is "--"
>
>The result is a single "--" term.

This is similar to the recent discussion on 
getting "1..2" to be treated as "INT[1] RANGE[..] 
INT[2]" instead of "FLOAT[1.] FLOAT[.2]".

Basically, the problem here is that your tokens 
are left-ambiguous -- when seeing "--" as input, 
ANTLR needs to choose between multiple 
alternatives: "NOT[-] NOT[-]", "NOT[-] WORD[-]", 
or "WORD[--]".  The last of these will always win, since 
a single token always wins against multiple tokens.
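
To make that concrete, here is a rough hand-written model in Python (not
ANTLR-generated code, and not ANTLR's actual prediction algorithm).  It
assumes a plain longest-match rule, with ties going to the rule listed
first; the names RULES and tokenize are made up for the illustration.

```python
import re

# The two lexer rules from the quoted grammar, in declaration order.
RULES = [("NOT", re.compile(r"-")), ("WORD", re.compile(r"-+"))]

def tokenize(text):
    """Assumed model: at each position the rule matching the most text wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

print(tokenize("--"))  # [('WORD', '--')]
```

Under this model NOT never gets a chance on "--", because WORD can
always consume more of the input in one go.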

You can normally resolve this sort of thing by 
merging the rules and adding predicates to decide 
between the alternatives (thereby resolving the 
ambiguity by giving ANTLR more decision-making context).

However there's an added complication here in 
that you want to match any number of '-'s 
afterwards as a WORD.  That gets a bit tricky.

The first thing you need to do is to convert the 
NOT rule into a fragment (so that it still 
defines the token name but never tries to 
directly output it).  Then you need to modify the 
WORD rule to handle emitting a NOT sometimes.

One way to do it would be to use the modification 
discussed on the Wiki that permits you to emit 
multiple tokens from one rule.  Then, when you 
encounter a '-' as the first character in your 
WORD you can emit a NOT and then emit everything else as a WORD.
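
As a sketch of the idea only (plain Python, not the Wiki's actual lexer
modification), the multiple-emit approach amounts to keeping a queue of
pending tokens.  DashLexer, match_word and all_tokens are hypothetical
names, and leaving a lone "-" as a WORD is my assumption, made because
expr needs at least one WORD.

```python
from collections import deque

EOF = ("EOF", "")

class DashLexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pending = deque()  # tokens emitted but not yet handed out

    def emit(self, kind, value):
        self.pending.append((kind, value))

    def match_word(self):
        """Consume a run of '-' and emit one or two tokens for it."""
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] == "-":
            self.pos += 1
        run = self.text[start:self.pos]
        if not run:
            raise ValueError(f"unexpected character {self.text[self.pos]!r}")
        if len(run) > 1:
            self.emit("NOT", run[0])    # the leading '-' becomes NOT
            self.emit("WORD", run[1:])  # everything else is the WORD
        else:
            self.emit("WORD", run)      # assumption: a lone '-' stays a WORD

    def next_token(self):
        # Only run a rule when the queue is empty, then hand out one token.
        if not self.pending and self.pos < len(self.text):
            self.match_word()
        return self.pending.popleft() if self.pending else EOF

def all_tokens(text):
    lexer, out = DashLexer(text), []
    while True:
        tok = lexer.next_token()
        if tok == EOF:
            return out
        out.append(tok)

print(all_tokens("---"))  # [('NOT', '-'), ('WORD', '--')]
```

The point is that one call to the rule can queue two tokens, and the
parser still receives them one at a time.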

Another option is to use a semantic predicate to 
do the same sort of thing -- but this time, if 
you detect that you're processing the first 
character of the WORD (either by checking a flag 
or by checking input.LA(-1) [previous character]) 
then emit a NOT and exit, letting ANTLR re-enter 
the WORD rule and generate a WORD the next 
time.  This requires a bit more care (since you 
want to avoid "---" coming out as "NOT NOT NOT"), but it's doable.
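
Again as a Python model rather than real ANTLR predicates, the
re-entry approach might look something like this.  Each pass through
the loop emits exactly one token, and the previous-character check is
what stops "---" from turning into three NOTs.  The function name is
hypothetical, and treating a lone "-" as a WORD is my assumption.

```python
def tokenize_with_lookbehind(text):
    """One token per pass; NOT is only allowed at the start of a run."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] != "-":
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        prev_is_dash = pos > 0 and text[pos - 1] == "-"
        next_is_dash = pos + 1 < len(text) and text[pos + 1] == "-"
        if not prev_is_dash and next_is_dash:
            # First character of a longer run: emit NOT and exit the "rule".
            tokens.append(("NOT", "-"))
            pos += 1
        else:
            # Re-entered after a NOT (or a lone '-'): take the rest as WORD.
            end = pos
            while end < len(text) and text[end] == "-":
                end += 1
            tokens.append(("WORD", text[pos:end]))
            pos = end
    return tokens

print(tokenize_with_lookbehind("---"))  # [('NOT', '-'), ('WORD', '--')]
```

Compared with queueing two tokens at once, this keeps the one-token-per-
rule contract, at the cost of the rule having to know where it is in
the run.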


