[antlr-interest] Common left prefix for Antlr tokens...

Seref Arikan serefarikan at kurumsalteknoloji.com
Mon Jan 16 10:52:58 PST 2012


Hi Stuart,
I came across a similar (probably almost identical) problem just an hour
before I saw your message.
In my case, I have a token called special chars, and it includes ']'.
In a grammar rule, the text [blabla[at0003]] is supposed to be
open_bracket myexpr_rule close_bracket. The problem is that, if there is no
white space, the ]] at the end is recognized as part of myexpr.

The reason it can be recognized that way is that the token definition has
one-or-more cardinality, that is: ('|' |'(' | ')' |'\\' | '^' | '{' |  '}' | '[' |
']')+
In my case, I've changed that to '|' |'(' | ')' |'\\' | '^' | '{' |  '}' |
'[' | ']' so that each of these characters ends up as a separate token. I now
handle the number of occurrences of these characters in the parser rule.
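
To make the difference concrete, here is a rough sketch in plain Python
(not ANTLR) of what a longest-match lexer does with the two token
definitions. The tokenize function, the WORD token and the SPECIAL name
are made up for this illustration; only the character set and the sample
text come from my grammar.

```python
import re

# Character set taken from the token definition above; the names are hypothetical.
SPECIAL = r"[|()\\^{}\[\]]"

def tokenize(text, special_pattern):
    """Split text into (kind, lexeme) pairs, taking the longest match at each position."""
    spec = re.compile(r"(?P<SPECIAL>%s)|(?P<WORD>[A-Za-z0-9]+)" % special_pattern)
    tokens = []
    pos = 0
    while pos < len(text):
        m = spec.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# One-or-more cardinality: adjacent special characters fuse into one token.
greedy = tokenize("[blabla[at0003]]", "(?:" + SPECIAL + ")+")

# Exactly one character per token: the parser sees each bracket separately.
single = tokenize("[blabla[at0003]]", SPECIAL)

print(greedy[-1])    # the trailing brackets as one token
print(single[-2:])   # the trailing brackets as two tokens
```

With the '+' version the closing brackets arrive as a single ']]' token, so
no parser rule can match just the inner ']'. With one character per token
the parser gets two ']' tokens and can assign them to different rules.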

Based on my (very) limited experience, I've found that these types of
issues pop up if you have a token type which is meant to serve one particular
purpose in the grammar, but then keeps creeping into other rules. The more
token types I have, the harder it becomes to handle things in parser rules.
With fewer token types and heavier use of parser rules, I seem to have fewer
issues.

I am not sure if this is the right approach, but for me, handling things in
the lexer is not working, because there are contexts that only arise in the
parser, and by the time you reach that point, the lexer is already done.
So my humble suggestion is to try shifting your solution a little more
towards the parser; that is working better for me at the moment.
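
As a rough illustration of that idea applied to your array example, here is
a small sketch in plain Python (again, not ANTLR and not a drop-in fix). The
lexer only ever emits single '[' tokens, and the decision between an array
and the '[[[___' marker is made by the parsing code. All function names and
the US token for '_' are hypothetical.

```python
import re

# One token per bracket; there is deliberately no multi-character '[[[___' token.
TOKEN = re.compile(
    r"\s*(?:(?P<INT>[0-9]+)|(?P<LSQ>\[)|(?P<RSQ>\])|(?P<COMMA>,)|(?P<US>_))"
)

def lex(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("mismatched character at %d" % pos)
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def expect(tokens, i, kind):
    """Check that token i has the given kind and return the next index."""
    if i >= len(tokens) or tokens[i][0] != kind:
        raise ValueError("expected %s at token %d" % (kind, i))
    return i + 1

def parse_value(tokens, i):
    """value : INT | array"""
    if i < len(tokens) and tokens[i][0] == "INT":
        return int(tokens[i][1]), i + 1
    return parse_array(tokens, i)

def parse_array(tokens, i):
    """array : LSQ value (',' value)* RSQ"""
    i = expect(tokens, i, "LSQ")
    item, i = parse_value(tokens, i)
    items = [item]
    while i < len(tokens) and tokens[i][0] == "COMMA":
        item, i = parse_value(tokens, i + 1)
        items.append(item)
    i = expect(tokens, i, "RSQ")
    return items, i

def parse_stat(tokens):
    """stat : array | extended property marker, decided here rather than in the lexer."""
    kinds = [kind for kind, _ in tokens]
    if kinds == ["LSQ"] * 3 + ["US"] * 3:
        return "EXTENDED_PROP"
    value, end = parse_array(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after array")
    return value

print(parse_stat(lex("[[1], 2]")))
print(parse_stat(lex("[[[___")))
```

One caveat with this sketch: because whitespace is skipped between tokens,
it would also accept "[ [ [ _ _ _" as the marker, which may or may not be
acceptable for your language.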

Regards
Seref


On Mon, Jan 16, 2012 at 3:08 PM, Stuart Dootson <stuart.dootson at gmail.com> wrote:

> Hello
>
> One of my colleagues has been using Antlr 3 to create a lexer/parser
> for the L5K language (used to program Allen-Bradley PLCs). This has
> proceeded generally well, until coming across a little problem.
>
> The problem is with the array literal start token ('[') and an
> 'extended property' indicator ('[[[___'). More specifically, nested
> arrays with no whitespace between the outer and inner array start, for
> example "[[1], 2]", are interpreted by Antlr as an extended property
> introduction, causing a "mismatched character" exception.
>
> I have come up with a workaround, by overriding the 'emit' and
> 'nextToken' methods of the lexer, to allow the strings "[[" and "[[["
> to be converted to multiple "[" tokens through calling 'emit' in
> actions, but was wondering if this use-case can be implemented without
> requiring this extra code, through use of one or more options on the
> grammar/rules?
>
> A minimal Antlr grammar is appended...
>
> Stuart Dootson
>
> grammar arrays;
>
> stat
>        :       array
>        |       EXTENDED_PROP
>        ;
>
> array
>        :        LSQ value ( ',' value)* RSQ
>        ;
>
> value
>        :       INT
>        |       array
>        ;
>
> INT     :       ('0' .. '9')+
>        ;
>
>
> EXTENDED_PROP
>        : '[[[___'
>        ;
>
> LSQ     :       '['
>        ;
>
> RSQ     :       ']'
>        ;
>
> WS      : (' '|'\n'|'\r')+ {$channel=HIDDEN;}
>        ;
>

