[antlr-interest] Lexer problem

Thomas Brandon tbrandonau at gmail.com
Tue Mar 11 02:20:59 PDT 2008


On Tue, Mar 11, 2008 at 4:29 PM, Richard Clark <rdclark at gmail.com> wrote:
>
> On Mon, Mar 10, 2008 at 10:15 PM, Brent Yates <brent.yates at gmail.com> wrote:
> > That being the case, how do I get alts1 and 2 to match when the ID='aaa'
> > or ID='bbb' and to not fall into alt3 if they don't match completely?
>
>
> How about post-processing in the Lexer instead of predicates?
>
>
> TEST
> @init { $type = ID; }
>     :   POUND WS?
>         ID (WS DECIMAL_DIGIT
>             { $channel=HIDDEN; if ($ID.text.equals("aaa")) $type = DECIMAL_DIGIT; }
>         )?
>     ;
>
I think he wants '# aaa' to be an error rather than an ID: after 'aaa'
or 'bbb', the WS and decimal digit must match. His original code behaves
the same as the code above; the predicates were redundant.
Maybe try:
TEST:
     POUND WS?
    (   (    'aaa' { /* aaa action */ }
        |    'bbb' { /* bbb action */ }
        )
        (    WS DECIMAL_DIGIT
        |    { /* error action */ }
        )
        )
    |   ID
    )
    ;
Either log the error or you could throw a recognition exception.
Or you could use gated semantic predicates like:
TEST:
     POUND WS?
    (   (    'aaa' { $type = AAA; }
        |    'bbb' { $type = BBB; }
        |    ID
        )
        (    { $type == AAA || $type == BBB }?=> WS DECIMAL_DIGIT
        |    // Epsilon
        )
    )
    ;
You might also need the negation of the predicate as a semantic
predicate on the epsilon alternative; I'm not sure whether the gated
predicate alone will force ANTLR to take the WS DECIMAL_DIGIT alternative
when it holds. You could of course replace the token type tests with a
boolean flag if you don't need to change the type.
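
To pin down the behaviour both grammars above are aiming for, here is a
small hand-written Java model of the rule. It is only a sketch and does not
use ANTLR at all: the class name TestRuleSketch, the method classify, and the
token definitions (POUND is '#', WS is spaces or tabs, DECIMAL_DIGIT is a
single digit, ID is a letter followed by letters or digits) are assumptions
made for illustration, not taken from the original grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the intended TEST rule; not generated by ANTLR.
final class TestRuleSketch {
    // POUND WS? ID  (group 1 captures the ID text)
    private static final Pattern HEAD =
        Pattern.compile("#[ \\t]*([A-Za-z][A-Za-z0-9]*)");
    // WS DECIMAL_DIGIT
    private static final Pattern TAIL = Pattern.compile("[ \\t]+[0-9]");

    // Returns "AAA", "BBB", "ID" or "ERROR" for the token starting at index 0.
    static String classify(String input) {
        Matcher head = HEAD.matcher(input);
        if (!head.lookingAt()) {
            return "ERROR"; // does not even start with POUND WS? ID
        }
        String id = head.group(1);
        boolean isAaa = id.equals("aaa");
        boolean isBbb = id.equals("bbb");
        if (!isAaa && !isBbb) {
            return "ID"; // plain ID: the rule stops here, the rest is left alone
        }
        // For 'aaa' and 'bbb' the WS DECIMAL_DIGIT tail is mandatory.
        Matcher tail = TAIL.matcher(input);
        tail.region(head.end(), input.length());
        if (!tail.lookingAt()) {
            return "ERROR"; // this is the "error action" case
        }
        return isAaa ? "AAA" : "BBB";
    }
}
```

With this model, classify("# aaa 1") gives "AAA", classify("# aaa") gives
"ERROR", and classify("# foo") gives "ID". Those are the three cases the
lexer rule has to tell apart.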

Tom.
>
> Writing complex lexer rules gets tricky (e.g. when trying to write a filter)
> because 1) the lexer doesn't backtrack, and 2) it matches non-fragment rules
> in top-down order so you have to be careful with your ordering. I've had to
> use a whole lot of trial and error.
>
> Good luck :)
>
> ...Richard
>
>

