[antlr-interest] Same symbols, but two parsed terms
antlr at mirality.co.nz
Fri Aug 8 15:47:30 PDT 2008
At 02:10 9/08/2008, Сергій Карпенко wrote:
>For example, we have the expression "--". It
>must be parsed as NOT followed by a '-' symbol.
>And we have a grammar:
> expr : NOT? WORD+;
> NOT : '-';
> WORD: '-'+;
>The input string is "--".
>The result is a single "--" term.
This is similar to the recent discussion on
getting "1..2" to be treated as "INT RANGE[..]
INT" instead of "FLOAT[1.] FLOAT[.2]".
Basically, the problem here is that your tokens
are left-ambiguous -- when seeing "--" as input,
ANTLR needs to choose between multiple
alternatives: "NOT[-] NOT[-]", "NOT[-] WORD[-]",
or "WORD[--]". The last of these will always win, since
a single token always wins against multiple tokens.
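To make that concrete, here is a rough hand-written Java sketch of a longest-match tokenizer for the two rules above. It is not ANTLR-generated code and not how ANTLR implements its lexer; the class name LongestMatchSketch and the rule-order tie-break for a lone '-' are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a naive longest-match tokenizer.
final class LongestMatchSketch {
    // NOT : '-' ;  returns the match length at pos (0 if no match).
    static int matchNot(String s, int pos) {
        return (pos < s.length() && s.charAt(pos) == '-') ? 1 : 0;
    }

    // WORD : '-'+ ;  returns the match length at pos (0 if no match).
    static int matchWord(String s, int pos) {
        int end = pos;
        while (end < s.length() && s.charAt(end) == '-') {
            end++;
        }
        return end - pos;
    }

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int n = matchNot(s, pos);
            int w = matchWord(s, pos);
            if (n == 0 && w == 0) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            if (w > n) {
                // The longer match wins, so any run of two or more '-' is one WORD.
                out.add("WORD[" + s.substring(pos, pos + w) + "]");
                pos += w;
            } else {
                // Tie (a lone '-'): assume the rule listed first, NOT, is chosen.
                out.add("NOT[" + s.substring(pos, pos + n) + "]");
                pos += n;
            }
        }
        return out;
    }
}
```

With this sketch, tokenize("--") yields the single token WORD[--], which is the result the original poster is seeing.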
You can normally resolve this sort of thing by
merging the rules and adding predicates to decide
between the alternatives (thereby resolving the
ambiguity by giving ANTLR more decision-making context).
However, there's an added complication here:
you want to match any number of '-'s
afterwards as a WORD. That gets a bit tricky.
The first thing you need to do is to convert the
NOT rule into a fragment (so that it still
defines the token name but never tries to
directly output it). Then you need to modify the
WORD rule to handle emitting a NOT sometimes.
One way to do it would be to use the modification
discussed on the Wiki that permits you to emit
multiple tokens from one rule. Then, when you
encounter a '-' as the first character in your
WORD you can emit a NOT and then emit everything else as a WORD.
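The idea of emitting two tokens from one rule can be sketched in plain Java as follows. This is a hand-rolled stand-in, not the Wiki's ANTLR code: the class MultiEmitSketch, its pending queue, and the string token format are all hypothetical names chosen for illustration. The point is only that the rule pushes several tokens onto a queue and nextToken() drains that queue before running the rule again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only: one "rule" invocation may queue several tokens.
final class MultiEmitSketch {
    private final String input;
    private int pos = 0;
    // Tokens emitted by the rule but not yet handed to the caller.
    private final Deque<String> pending = new ArrayDeque<>();

    MultiEmitSketch(String input) {
        this.input = input;
    }

    /** Returns the next token, or null at end of input. */
    String nextToken() {
        if (pending.isEmpty() && pos < input.length()) {
            wordRule();
        }
        return pending.poll();
    }

    // Stand-in for the merged WORD rule: consume the whole run of '-',
    // emit the first character as NOT and the remainder (if any) as WORD.
    private void wordRule() {
        int start = pos;
        if (input.charAt(pos) != '-') {
            throw new IllegalArgumentException("unexpected character at " + pos);
        }
        while (pos < input.length() && input.charAt(pos) == '-') {
            pos++;
        }
        pending.add("NOT[-]");
        if (pos - start > 1) {
            pending.add("WORD[" + input.substring(start + 1, pos) + "]");
        }
    }

    static List<String> tokenize(String s) {
        MultiEmitSketch lexer = new MultiEmitSketch(s);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }
}
```

Here tokenize("--") gives NOT[-] followed by WORD[-]. Note that a lone "-" comes out as just NOT in this sketch, which "NOT? WORD+" would reject, so a real lexer may want to special-case that input.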
Another option is to use a semantic predicate to
do the same sort of thing -- but this time, if
you detect that you're processing the first
character of the WORD (either by checking a flag
or by checking input.LA(-1) [previous character])
then emit a NOT and exit, letting ANTLR re-enter
the WORD rule and generate a WORD the next
time. This requires a bit more care (since you
want to avoid "---" coming out as "NOT NOT NOT"), but it's doable.
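A minimal Java sketch of this second approach, again hand-written rather than generated by ANTLR: each call returns exactly one token, and a look-behind at the previous character decides whether the current '-' starts a run (emit NOT) or continues one (emit the rest as WORD). The class PredicateSketch and its previousChar() helper are hypothetical stand-ins for the predicate and the look-behind call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: one token per call, decided by look-behind.
final class PredicateSketch {
    private final String input;
    private int pos = 0;

    PredicateSketch(String input) {
        this.input = input;
    }

    // Stand-in for looking at the previous character; -1 at start of input.
    private int previousChar() {
        return pos == 0 ? -1 : input.charAt(pos - 1);
    }

    /** Returns the next token, or null at end of input. */
    String nextToken() {
        if (pos >= input.length()) {
            return null;
        }
        if (input.charAt(pos) != '-') {
            throw new IllegalArgumentException("unexpected character at " + pos);
        }
        if (previousChar() != '-') {
            // First '-' of a run: emit NOT and exit after one character.
            pos++;
            return "NOT[-]";
        }
        // Re-entered after a NOT: take every remaining '-' as a single WORD,
        // which is what keeps "---" from becoming NOT NOT NOT.
        int start = pos;
        while (pos < input.length() && input.charAt(pos) == '-') {
            pos++;
        }
        return "WORD[" + input.substring(start, pos) + "]";
    }

    static List<String> tokenize(String s) {
        PredicateSketch lexer = new PredicateSketch(s);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }
}
```

In this sketch tokenize("---") produces NOT[-] and then WORD[--], because the second call sees a '-' behind it and consumes the whole remainder.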