[antlr-interest] ANTLR 3 Lexical States

Fri Jan 25 10:11:44 PST 2008

Ah, well you didn’t say that you were controlling the state from the
parser ;-) ANTLR3 isn’t really designed to do that, and anyway there are
lots of potential problems with lookahead. Are you sure that you are not
trying to do too much in the lexer (I know, it worked in ANTLR2)?

Could you use STRING when the tokens can be the same, then inspect the
string in the parser and use gated predicates to select particular alts?
If your parser already knows that the next pattern 'a'..'z' should be
interpreted as a SPECIAL_STRING rather than STRING, then I am not sure
why you need to control the state of the lexer from the parser, but
perhaps the lexer would sometimes return two or more different tokens
rather than a single token.

Another option may be to derive your own class from CommonTokenStream
and have the LA (and possibly LT and LB) methods look at the state. When
it sees a STRING token (as in just get the lexer to match the base
patterns) and the parser says it is in state xyz, then change the token
type before it goes to the parser. Of course, you run into the issue of
lookahead and so on. It also would not work if the lexer would have
sometimes returned two tokens rather than one ‘bigger’ one.
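
A minimal sketch of that token-stream idea, in plain Java. To keep it
self-contained it does not use the ANTLR 3 runtime: Tok,
StateAwareTokenStream, State and setState are hypothetical stand-ins,
and only the LT/LA names mirror the methods mentioned above. The lexer
is assumed to emit only base STRING tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token; this is NOT an ANTLR 3 runtime class.
final class Tok {
    static final int EOF = -1;
    static final int STRING = 1;
    static final int SPECIAL_STRING = 2;

    final int type;
    final String text;

    Tok(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Plays the role of a class derived from CommonTokenStream (assumption):
// tokens are retyped on their way to the parser, based on a state the
// parser sets.
class StateAwareTokenStream {
    enum State { NORMAL, SPECIAL }

    private final List<Tok> tokens;
    private int p = 0;                  // index of the next unconsumed token
    private State state = State.NORMAL;

    StateAwareTokenStream(List<Tok> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    // Called from a parser action when it enters or leaves the special context.
    void setState(State s) {
        state = s;
    }

    // k-th lookahead token (1-based), typed according to the current state.
    Tok LT(int k) {
        int i = p + k - 1;
        if (i >= tokens.size()) {
            return new Tok(Tok.EOF, "<EOF>");
        }
        Tok t = tokens.get(i);
        if (t.type == Tok.STRING && state == State.SPECIAL) {
            // Return a retyped copy; the buffered token itself stays STRING.
            return new Tok(Tok.SPECIAL_STRING, t.text);
        }
        return t;
    }

    int LA(int k) {
        return LT(k).type;
    }

    void consume() {
        if (p < tokens.size()) {
            p++;
        }
    }
}
```

The type is recomputed on every LT call, so the lookahead caveat still
applies: any prediction the parser made before calling setState was
based on the old token types.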

Yet another possibility is embedded parsers/island grammars.

However, you should also check to see if your issue isn’t that you want
to match something like:

“ll” AND “hh”

And for some reason want the same token set to match as different tokens
when it is really only semantically rather than syntactically different.
I assume that this is not your issue though. What is it you are trying
to parse?

Jim

From: Bertalan Fodor (LilyPondTool) [mailto:lilypondtool at organum.hu] 
Sent: Friday, January 25, 2008 8:08 AM
To: Jim Idle
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] ANTLR 3 Lexical States

Yes, that's a good idea, but that doesn't solve the problem that the
state change must be done in the parser. So in the switch(state)
statement the value of state is always NORMAL, because the lexing will
be done first. 
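
The ordering problem described above can be reproduced with a toy
model in plain Java. ToyLexer and EagerVsLazy are hypothetical names,
not ANTLR classes; the sketch assumes a lexer that consults a mutable
state field for each token, and a "parser" that tries to switch that
state after seeing the first token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer (not ANTLR): one token per input character,
// typed according to the state at the moment the token is produced.
class ToyLexer {
    enum State { NORMAL, SPECIAL }

    State state = State.NORMAL;
    private final String input;
    private int pos = 0;

    ToyLexer(String input) {
        this.input = input;
    }

    // Returns the next token as "TYPE:char", or null at end of input.
    String nextToken() {
        if (pos >= input.length()) {
            return null;
        }
        char c = input.charAt(pos++);
        switch (state) {
            case SPECIAL:
                return "SPECIAL_STRING:" + c;
            default:
                return "STRING:" + c;
        }
    }
}

class EagerVsLazy {
    // Buffers every token before the parser runs, so the state change
    // made by the parser arrives after all tokens are already typed.
    static List<String> eager(String input) {
        ToyLexer lexer = new ToyLexer(input);
        List<String> buffered = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            buffered.add(t);
        }
        lexer.state = ToyLexer.State.SPECIAL; // parser action: too late
        return buffered;
    }

    // Pulls tokens one at a time, so a state change made after the
    // first token affects how the remaining input is tokenized.
    static List<String> lazy(String input) {
        ToyLexer lexer = new ToyLexer(input);
        List<String> seen = new ArrayList<>();
        String t = lexer.nextToken();
        if (t != null) {
            seen.add(t);
            lexer.state = ToyLexer.State.SPECIAL; // parser action: in time
            for (t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
                seen.add(t);
            }
        }
        return seen;
    }
}
```

For the input "ab", eager yields two STRING tokens, while lazy yields
STRING for 'a' and SPECIAL_STRING for 'b'. The lazy variant only works
here because the toy parser needs no lookahead past the current token.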
Now I'm thinking of the following possibilities:
- Harald Müller's lexing parser - as I see currently it doesn't work
with overlapping Lexer rules, like if in the example below STRING is
'a'..'z' and SPECIAL_STRING is '<'|'a'
- David Holroyd's lazy token stream - with which I see the problem that
it lazily loads the tokens from the source, but not from the source, so
I may not be able to change the token type according to lexical state
- handling all lexer-state-pushing situations as recursively embedded
island-grammars - the problem is that these islands actually can be
infinitely embedded in each other.
- going back to ANTLR 2
- writing the lexer with JFlex 
