[antlr-interest] Re: lexer "modes" for XML parsing etc...
Terence Parr
parrt at cs.usfca.edu
Sun Nov 20 11:42:32 PST 2005
On Nov 20, 2005, at 10:54 AM, Martin Probst wrote:
> Now I'm working with a manually written Lexer that follows (1), e.g.
> state switching is exclusively done by the Lexer. This works nicely,
> except that a handwritten Lexer for a lexically complex (23 states, 200
> different Token types) language is also a real pain. Slightly better as
> there are no bugs in the interop between the lexer and the parser, as
> it's only calling nextToken(), but still. This is why I'm trying to prod
> Terence into providing better support for stateful lexers ;-)
Your wish is my command. ;) Do we need something like

    lexer grammar L;

    ID : ... ;
    SQLSTART : "sql(" {pushContext(SQL);} ;
    WS : ... ;

    context SQL {
        ID : ... ;
        ACTION : ... ;
        STRING : ... ;
        ENDSQL : ')' {popContext();} ;
    }

    context island2 {
        ...
    }

[note the push/pop rather than a simple set; very useful for nested contexts]
Then the lexer would simply generate a Tokens-like rule for each
context? You would see a different lexer entry rule for each
context. How do you switch? We'd need an int constant per context (as
we have no function pointers in Java--a pox on their family) that
would jump to the right starting method.
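
A rough sketch of that dispatch in plain Java, to make the idea concrete. This is not ANTLR output; the class name, the token names as strings, the SQL.WS token, and the hand-coded matching are all assumptions for illustration. Only pushContext/popContext and the int-constant switch come from the proposal above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical hand-written stand-in for what a generated lexer might do.
class ContextLexer {
    static final int DEFAULT = 0, SQL = 1;   // one int constant per context

    private final String in;
    private int p = 0;                        // index of next char to read
    private final Deque<Integer> contexts = new ArrayDeque<>();

    ContextLexer(String in) { this.in = in; contexts.push(DEFAULT); }

    void pushContext(int c) { contexts.push(c); }
    void popContext() { contexts.pop(); }

    // Single entry point: switch on the current context's constant
    // and jump to that context's starting method.
    String nextToken() {
        if (p >= in.length()) return "EOF";
        switch (contexts.peek()) {
            case SQL: return nextTokenSQL();
            default:  return nextTokenDefault();
        }
    }

    private String nextTokenDefault() {
        if (in.startsWith("sql(", p)) { p += 4; pushContext(SQL); return "SQLSTART"; }
        if (Character.isWhitespace(in.charAt(p))) { skipWhile(true); return "WS"; }
        if (Character.isLetter(in.charAt(p))) { skipWhile(false); return "ID"; }
        p++;
        return "OTHER";
    }

    private String nextTokenSQL() {
        char c = in.charAt(p);
        if (c == ')') { p++; popContext(); return "ENDSQL"; }
        if (c == '\'') {                      // quoted string up to the closing quote
            p++;
            while (p < in.length() && in.charAt(p) != '\'') p++;
            if (p < in.length()) p++;
            return "SQL.STRING";
        }
        if (Character.isWhitespace(c)) { skipWhile(true); return "SQL.WS"; }
        if (Character.isLetter(c)) { skipWhile(false); return "SQL.ID"; }
        p++;
        return "SQL.OTHER";
    }

    // Consume a run of whitespace (true) or letters (false).
    private void skipWhile(boolean whitespace) {
        while (p < in.length() && (whitespace
                ? Character.isWhitespace(in.charAt(p))
                : Character.isLetter(in.charAt(p)))) p++;
    }

    // Convenience: all token names up to, but not including, EOF.
    static List<String> tokenize(String s) {
        ContextLexer lexer = new ContextLexer(s);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); !t.equals("EOF"); t = lexer.nextToken()) out.add(t);
        return out;
    }
}
```

The parser only ever calls nextToken(); the same characters lex as ID outside sql(...) and as SQL.ID inside it, and because the contexts sit on a stack, an island that starts another island returns to the right place on pop.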
Sounds easy. Is this what we want? It is proper for island grammars
that feed off the same input stream. Multiple input streams like
include files need to be handled with a multiplexing input buffer.
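
For the include-file case, a minimal sketch of what a multiplexing input buffer could look like, assuming the inputs are already in memory as strings. The class and method names here are made up for illustration and are not an existing ANTLR API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical character source that stacks input streams: pushing a new
// text (say, an included file) suspends the current one until it runs out.
class MultiplexedInput {
    private final Deque<String> texts = new ArrayDeque<>();
    private final Deque<Integer> positions = new ArrayDeque<>();

    void push(String text) {
        texts.push(text);
        positions.push(0);
    }

    // Returns the next character, or -1 when every stacked input is exhausted.
    int read() {
        while (!texts.isEmpty()) {
            String t = texts.peek();
            int pos = positions.pop();
            if (pos < t.length()) {
                positions.push(pos + 1);
                return t.charAt(pos);
            }
            texts.pop();                      // this input is done; resume the enclosing one
        }
        return -1;
    }
}
```

The lexer reads from one object the whole time and never needs to know that an include happened, which keeps this concern separate from the context switching above.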
> Solving (2) would probably include identifying the sections where
> different tokens are possible depending on the lookahead decision,
> marking the character(!) stream and re-lexing the token(s) in the case
> of mismatches. That is IMHO complete overkill. It should be possible to
> pull down the rules about states etc. into the Lexer with any sane
> language.
Agreed. That is really hard.
ter