[antlr-interest] distinguish "3 + 4" from "3 +4"

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Wed Oct 5 13:04:39 PDT 2011


Leaving out the question of whether it's a good idea or not (it really
depends on your application, and whether you're implementing a language
that's already defined or not).

Anyway, what I'm talking about is done in the lexer, not the parser, so
it would not need to access the hidden channel - just the lookahead. An 
example doing a much more complex version of what I'm saying is 
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs 
- you can see there's a single rule that matches several things (in your
case it would be the plus and minus arithmetic operators and integers) and
sets the token type appropriately depending on which path it takes through
the match.
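
To make the idea concrete, here is a rough sketch of the decision logic
in plain Python rather than ANTLR grammar syntax. It is not the code from
the wiki page. The function name tokenize and the helper is_digit are
made up for illustration. One branch handles '+', '-' and signed numbers
and picks the token type from the neighbouring characters. The check on
the preceding character is an added assumption, so that "3+4" still comes
out as NUMBER PLUS NUMBER; treating the start of the input like
whitespace is also an assumption.

```python
def is_digit(ch):
    """ASCII digit check (hypothetical helper for this sketch)."""
    return "0" <= ch <= "9"


def tokenize(text):
    """Return (type, lexeme) pairs; whitespace is skipped, not emitted."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Whitespace never becomes a token, similar in spirit to
            # sending it to a hidden channel.
            i += 1
        elif is_digit(ch):
            j = i
            while j < n and is_digit(text[j]):
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        elif ch in "+-":
            # Assumption: a sign belongs to the number only when it follows
            # whitespace (or starts the input) and a digit comes right after.
            after_gap = i == 0 or text[i - 1].isspace()
            before_digit = i + 1 < n and is_digit(text[i + 1])
            if after_gap and before_digit:
                j = i + 1
                while j < n and is_digit(text[j]):
                    j += 1
                tokens.append(("NUMBER", text[i:j]))
                i = j
            else:
                tokens.append(("PLUS" if ch == "+" else "MINUS", ch))
                i += 1
        else:
            raise ValueError("unexpected character %r at offset %d" % (ch, i))
    return tokens


if __name__ == "__main__":
    for sample in ("3 + 4", "3 +4", "3+4"):
        print(sample, "->", [kind for kind, _ in tokenize(sample)])
```

The point is only that the choice between operator and signed number is
made while scanning characters, so the parser never has to look at the
skipped whitespace.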

Getting back to whether it's a good idea or not, it's very unusual to 
have the operator be part of the number token, though conceptually I 
suppose it's still a unary operator in that case... but determining 
whether the operator is unary or binary purely by whitespace is 
definitely unusual.

On 05/10/2011 12:40, Andreas Liebig wrote:
> Thank you for the reply, Sam. Is this the int/float wiki article you
> mentioned:
> http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497 ("How
> can I emit more than a single token per lexer rule?")? Unfortunately
> I cannot see how it helps in my case. The wiki article uses the
> difference in non-whitespace characters, whereas I have to make use
> of the hidden whitespace characters. I read somewhere that it is
> actually possible to check the hidden channel if necessary, but I
> cannot find any details.
>
> One more example for my situation: "3+4" should be parsed as NUMBER
> PLUS NUMBER, the same as "3 + 4".
>
> Still looking forward to more suggestions. Andreas
>
>
>
> ----- Original Message -----
> From: Sam Barnett-Cormack <s.barnett-cormack at lancaster.ac.uk>
> To: Andreas Liebig <liebigandreas at yahoo.com>
> Cc: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> Sent: Wednesday, October 5, 2011 1:20 PM
> Subject: Re: [antlr-interest] distinguish "3 + 4" from "3 +4"
>
> On 05/10/2011 12:14, Andreas Liebig wrote:
>> Hello, I am not very experienced with ANTLR, and I would like to
>> ask for some ideas how to solve this task:
>>
>> I have to distinguish input streams like "3 + 4" (parsed as three
>> tokens NUMBER PLUS NUMBER) from "3 +4" (parsed as NUMBER NUMBER,
>> because the + is part of the number +4).
>>
>> I would like to ignore whitespace in general using the
>> "$channel=HIDDEN;" syntax. But only in this situation whitespace
>> does matter. Can you guide me to a good explanation of a possible
>> solution?
>
> Don't really know what docs are where, but off the top of my head...
> you need to have your NUMBER lexer rule start with an optional +, or
> presumably actually + or -, so a human-readable version of the
> grammar would have something like
>
> NUMBER : ('+'|'-')? DIGIT+;
> PLUS   : '+';
> MINUS  : '-';
>
> Of course, ANTLR won't like that, because it's ambiguous. There are a
> few ways to resolve the ambiguity; you'll see one if you look on the
> wiki at how to differentiate between ints and floats in the lexer.
> I can't remember the syntax off the top of my head.
>
> Sam


