[antlr-interest] The ~ operator in lexer rules
Jim Idle
jimi at temporal-wave.com
Wed Sep 17 22:17:29 PDT 2008
On Wed, 2008-09-17 at 21:51 -0700, Matthieu Riou wrote:
> Hi,
>
> I'm trying to write a grammar where 2.3 would be a float style number
> but where the same group of characters in foo.2.3 would be recognized
> as two distinct numbers (like foo[2][3]). I changed my number lexer
> rule to be:
>
> NUM : ~'.' DIGIT+ ('.' DIGIT+)? ;
>
> However this doesn't seem to work, strangely expressions like [2,3,4]
> get rejected with a no viable alternative at input ',3'. Does someone
> have a clue why and how I should tweak my rules to do what I want?
You are going to have to keep state in the lexer, something like this:
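@lexer::members
{
    // True while lexing a dotted index chain such as foo.2.3
    boolean returnInt = false;
}

ID  : ('a'..'z')+
      (
          ('.' ('0'..'9'))=> { returnInt = true; }   // an index chain follows
        |
      )
    ;

DOT : '.';

// Integer mode only: the ordinary float case (returnInt false) still needs
// its own alternative.
NUM : ('0'..'9')+
      (
          {returnInt}?=>
          (
              ('.' ('0'..'9'))=> { returnInt = true; }   // another index follows
            | { returnInt = false; }                     // end of the chain
          )
      )
    ;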
And so on, but it will get a lot more complicated. If this is your
language, then don't do this; if it is not, then shoot the designer ;-)
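To make the state-keeping idea easier to follow outside of ANTLR, here is a
rough hand-written sketch in plain Java. It is not generated lexer code; the
class name DottedIndexLexer, the helper dotDigitAt, and the string token
format are all made up for illustration. The returnInt flag plays the same
role as in the grammar above, and the float branch is my assumption about
what the "and so on" part would have to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer; only the returnInt flag mirrors the grammar.
class DottedIndexLexer {
    // True while we are inside a chain such as foo.2.3
    private boolean returnInt = false;

    List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                int start = i;
                while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') i++;
                out.add("ID(" + s.substring(start, i) + ")");
                // An identifier followed by ".<digit>" starts an index chain.
                returnInt = dotDigitAt(s, i);
            } else if (isDigit(c)) {
                int start = i;
                while (i < s.length() && isDigit(s.charAt(i))) i++;
                if (returnInt) {
                    // Inside a chain: stay in integer mode only while
                    // another ".<digit>" follows.
                    returnInt = dotDigitAt(s, i);
                } else if (dotDigitAt(s, i)) {
                    // Outside a chain: absorb the fraction, so 2.3 is one float.
                    i++;
                    while (i < s.length() && isDigit(s.charAt(i))) i++;
                }
                out.add("NUM(" + s.substring(start, i) + ")");
            } else if (c == '.') {
                out.add("DOT");
                i++;
            } else {
                // Anything else becomes a single-character token.
                out.add("CHAR(" + c + ")");
                i++;
            }
        }
        return out;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // True if position i holds a '.' immediately followed by a digit.
    private static boolean dotDigitAt(String s, int i) {
        return i + 1 < s.length() && s.charAt(i) == '.' && isDigit(s.charAt(i + 1));
    }
}
```

With this sketch, tokenize("foo.2.3") gives ID(foo), DOT, NUM(2), DOT, NUM(3),
while tokenize("2.3") gives the single token NUM(2.3).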
Other approaches might be this kind of thing:
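@lexer::members
{
    int elementCount = 0;   // '.N' elements seen by ID and not yet lexed as NUM
    int chainMarker;        // stream position just after the identifier
}
ID  : ('a'..'z')+ { chainMarker = input.mark(); }
      ( ('.' ('0'..'9'))=> '.' ('0'..'9')+ { elementCount++; } )*
      { input.rewind(chainMarker); }   // back up so DOT and NUM lex the chain
    ;
DOT : '.' ;
NUM : {elementCount > 0}?=> ('0'..'9')+ { elementCount--; };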
Or you could override the emit() method and return more than one token
per lexer rule invocation.
Take a look at the wiki FAQ on parsing numerics, and the article on emitting
more than one token from a lexer rule.
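As a rough illustration of the multiple-token idea, the following plain Java
sketch queues tokens instead of holding a single one. It does not use the
ANTLR runtime: the class MultiEmitLexer, its string tokens, and the helper
names are hypothetical, and its emit() and nextToken() are only analogues of
the lexer methods you would override.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-alone lexer showing the "queue of pending tokens" idea.
class MultiEmitLexer {
    private final String input;
    private int pos = 0;
    // Tokens produced by the last rule but not yet handed to the caller.
    private final Deque<String> pending = new ArrayDeque<>();

    MultiEmitLexer(String input) { this.input = input; }

    // Analogue of an overridden emit(): queue the token.
    private void emit(String token) { pending.add(token); }

    // Analogue of an overridden nextToken(): drain the queue before lexing more.
    String nextToken() {
        if (pending.isEmpty()) lexOne();
        return pending.isEmpty() ? "EOF" : pending.poll();
    }

    // Convenience helper: collect every token up to EOF.
    List<String> all() {
        List<String> out = new ArrayList<>();
        for (String t = nextToken(); !t.equals("EOF"); t = nextToken()) out.add(t);
        return out;
    }

    // One "rule invocation"; it may emit several tokens.
    private void lexOne() {
        if (pos >= input.length()) return;
        char c = input.charAt(pos);
        if (c >= 'a' && c <= 'z') {
            int start = pos;
            while (pos < input.length() && input.charAt(pos) >= 'a' && input.charAt(pos) <= 'z') pos++;
            emit("ID(" + input.substring(start, pos) + ")");
            // Emit the whole .N.N chain from the same invocation.
            while (dotDigitAhead()) {
                pos++;
                emit("DOT");
                emit("NUM(" + digits() + ")");
            }
        } else if (isDigit(c)) {
            // Not after an identifier: 2.3 is a single float.
            String text = digits();
            if (dotDigitAhead()) {
                pos++;
                text += "." + digits();
            }
            emit("NUM(" + text + ")");
        } else {
            pos++;
            emit("CHAR(" + c + ")");
        }
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private boolean dotDigitAhead() {
        return pos + 1 < input.length() && input.charAt(pos) == '.' && isDigit(input.charAt(pos + 1));
    }

    // Consume a run of digits and return its text.
    private String digits() {
        int start = pos;
        while (pos < input.length() && isDigit(input.charAt(pos))) pos++;
        return input.substring(start, pos);
    }
}
```

Here new MultiEmitLexer("foo.2.3").all() yields ID(foo), DOT, NUM(2), DOT,
NUM(3), all queued by the one identifier rule, so no flag has to survive
between rule invocations.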
Jim
>
> Thanks,
> Matthieu