[antlr-interest] Switching between different lexers from within the parser

Wed Jan 23 09:01:45 PST 2002

> Is there a possibility to switch in a clean and deterministic way
> between different lexers from within the *parser*?

Not really, since by the time the parser can make that decision it has
already pulled at least k tokens of lookahead out of the current lexer.
Switching from the parser would require some sort of rewind mechanism on your
input stream, plus some way to resynchronize the parser.  The real problem is
that where you are in the parser depends on what was in the lookahead, and
switching lexers changes that lookahead out from under it.  Doing this
within, say, a set of alternatives could really confuse the parser.
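
To make the timing problem concrete, here is a small self-contained Java
simulation. It is not ANTLR code: `TokenSource`, `TaggingLexer`,
`LookaheadBuffer`, and the `C`/`SQL` tags are hypothetical stand-ins. With
k = 2, the "parser" decides to switch right after consuming `=`, but its
lookahead buffer already holds `SELECT` and `a`, both produced by the C lexer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class LookaheadSwitchDemo {
    // Hypothetical minimal token source; returns null at end of input.
    interface TokenSource { String next(); }

    // Toy lexer: emits whitespace-separated words tagged with the lexer's name.
    // Both lexers share one cursor, as two lexers sharing one input state would.
    static final class TaggingLexer implements TokenSource {
        private final String[] words;
        private final int[] pos;
        private final String tag;
        TaggingLexer(String[] words, int[] pos, String tag) {
            this.words = words; this.pos = pos; this.tag = tag;
        }
        public String next() {
            return pos[0] < words.length ? tag + ":" + words[pos[0]++] : null;
        }
    }

    // The parser's view: a buffer that always tries to hold k tokens of
    // lookahead, refilled from whichever lexer is current at refill time.
    static final class LookaheadBuffer {
        private final Deque<String> buf = new ArrayDeque<>();
        private final int k;
        TokenSource current;
        LookaheadBuffer(TokenSource start, int k) { this.current = start; this.k = k; fill(); }
        private void fill() {
            while (buf.size() < k) {
                String t = current.next();
                if (t == null) return;
                buf.addLast(t);
            }
        }
        String la1() { return buf.peekFirst(); }
        String consume() { String t = buf.pollFirst(); fill(); return t; }
    }

    // Consume everything; after `switchAfter` tokens the "parser" decides SQL
    // follows and switches lexers. Returns the tokens in the order consumed.
    static List<String> simulate(String input, int k, int switchAfter) {
        String[] words = input.trim().split("\\s+");
        int[] pos = {0};
        TokenSource c = new TaggingLexer(words, pos, "C");
        TokenSource sql = new TaggingLexer(words, pos, "SQL");
        LookaheadBuffer la = new LookaheadBuffer(c, k);
        List<String> consumed = new ArrayList<>();
        while (la.la1() != null) {
            consumed.add(la.consume());
            if (consumed.size() == switchAfter) la.current = sql;
        }
        return consumed;
    }

    public static void main(String[] args) {
        // With k = 2, SELECT and a were already lexed by the C lexer before the switch.
        System.out.println(simulate("x = SELECT a FROM b", 2, 2));
        // prints [C:x, C:=, C:SELECT, C:a, SQL:FROM, SQL:b]
    }
}
```

Exactly k tokens past the decision point end up lexed by the wrong lexer,
which is why the rewind-and-resynchronize machinery would be needed.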

> In contrast to the Javadoc example, where switching between different
> lexers is done from within the lexers, I have to deal with a language 
> where this is only possible from within the parser.
> (Think e.g. of C with embedded SQL, but without the leading "EXEC SQL"
> which usually introduces each statement of embedded SQL.)
> 
> In my language, the parser is the only instance which is able to decide
> which lexer has to be chosen next. On the other hand, due to the look-ahead
> and the loose coupling between parser and lexers through the token stream,
> the wrong lexer might already have scanned and consumed characters which
> were in fact intended for the other lexer.
> 
> Any ideas?

Post a couple of worst-case examples so we have something to chew on.  If it
is mostly a problem of different sets of literals (keywords), then it may be
easy to solve.  How different are the lexers?  If the tokens always break at
the same boundaries in both languages, then there may be a way to use a
single lexer and have the parser check the possible token types explicitly
with semantic predicates.
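
A rough sketch of the single-lexer idea, in plain Java rather than ANTLR
grammar syntax: the lexer emits every word the same way, and the parser picks
the C or SQL alternative with a predicate on the lookahead text (in an ANTLR
grammar this check would sit in a `{...}?` semantic predicate guarding the SQL
alternative). The verb list, the toy statement loop, and the rule that an SQL
verb followed by `(` is really a C function call are all assumptions for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PredicateDispatchDemo {
    // One shared lexer: words, numbers, and single punctuation characters.
    // It makes no C-versus-SQL decision at all.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\\S");

    // Hypothetical list of words that may begin an embedded SQL statement.
    private static final Set<String> SQL_VERBS =
        new HashSet<>(Arrays.asList("SELECT", "INSERT", "UPDATE", "DELETE"));

    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) out.add(m.group());
        return out;
    }

    // Predicate on LT(1)/LT(2): an SQL verb counts as the start of SQL unless
    // it is immediately followed by "(", which would make it a C function call.
    static boolean sqlStatementAhead(List<String> toks, int i) {
        if (i >= toks.size()) return false;
        if (!SQL_VERBS.contains(toks.get(i).toUpperCase(Locale.ROOT))) return false;
        return i + 1 >= toks.size() || !toks.get(i + 1).equals("(");
    }

    // Toy "parser": at each statement start, pick the alternative with the
    // predicate, then skip through the terminating ';'.
    static List<String> classifyStatements(String input) {
        List<String> toks = lex(input);
        List<String> kinds = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            kinds.add(sqlStatementAhead(toks, i) ? "SQL" : "C");
            while (i < toks.size() && !toks.get(i).equals(";")) i++;
            i++; // step past the ';'
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classifyStatements(
            "select(n, &r, 0, 0, &t); select a from b; x = 1; delete from t where id = 3;"));
        // prints [C, SQL, C, SQL]
    }
}
```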

Also remember that the lexer can use arbitrary (unbounded) lookahead via
syntactic predicates.

I've successfully parsed ambiguous languages where keywords could also be
identifiers: "if if = true then then = false", etc.
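
A toy illustration of the keyword/identifier case, written as a hand-coded
Java lexer rather than an ANTLR grammar: when it scans a keyword, it peeks
ahead over any amount of whitespace (mark, scan, rewind, roughly what
ANTLR-generated code does while guessing for a syntactic predicate) and
treats the word as an identifier if `=` comes next. The keyword set and the
`=` rule are assumptions sized to the example sentence, not a general
solution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class KeywordOrIdentDemo {
    // Assumed keyword set, just big enough for the example sentence.
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("if", "then", "else"));

    private final String in;
    private int pos = 0;

    KeywordOrIdentDemo(String in) { this.in = in; }

    // Returns the tokens as TYPE:text strings (KW, ID, or OP).
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        while (true) {
            skipWhitespace();
            if (pos >= in.length()) return out;
            char c = in.charAt(pos);
            if (Character.isLetter(c)) {
                int start = pos;
                while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) pos++;
                String word = in.substring(start, pos);
                // A keyword directly followed by '=' must be a variable name,
                // whether it is being assigned or compared.
                boolean keyword = KEYWORDS.contains(word) && !equalsFollows();
                out.add((keyword ? "KW:" : "ID:") + word);
            } else {
                out.add("OP:" + c); // everything else: one-character token
                pos++;
            }
        }
    }

    private void skipWhitespace() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    // Guess ahead like a syntactic predicate: mark, scan over any amount of
    // whitespace, test for '=', then rewind so the guess consumes nothing.
    private boolean equalsFollows() {
        int mark = pos;
        skipWhitespace();
        boolean result = pos < in.length() && in.charAt(pos) == '=';
        pos = mark;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new KeywordOrIdentDemo("if if = true then then = false").tokens());
        // prints [KW:if, ID:if, OP:=, ID:true, KW:then, ID:then, OP:=, ID:false]
    }
}
```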

You may be able to put a TokenStream between the lexer and the parser that
does some "fuzzy parsing" to determine whether a statement is C or SQL and
then changes the token types before the parser sees them.  Think of it as a
pre-parsing step.
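
One way such a filter could look, sketched in Java with hypothetical
`Token`/`TokenStream` types (ANTLR 2 has its own `antlr.TokenStream`
interface with a `nextToken()` method, but this sketch does not depend on
ANTLR). The filter reads one whole statement ahead, applies a fuzzy rule
(starts with an SQL verb and mentions a clause word such as `FROM` or `INTO`;
both word lists are assumptions), and retypes the statement's words to
`SQL_WORD` before handing them on, so the grammar itself never has to guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RetypingFilterDemo {
    static final int EOF = 0, WORD = 1, PUNCT = 2, SQL_WORD = 3;

    // Hypothetical token and stream types (not antlr.Token / antlr.TokenStream).
    static final class Token {
        int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
        @Override public String toString() {
            return (type == SQL_WORD ? "SQL:" : type == WORD ? "C:" : "P:") + text;
        }
    }

    interface TokenStream { Token nextToken(); }

    // Toy lexer: identifiers and numbers become WORD, anything else one-char PUNCT.
    static final class WordLexer implements TokenStream {
        private final Matcher m;
        private boolean done = false;
        WordLexer(String input) {
            m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\\S").matcher(input);
        }
        public Token nextToken() {
            if (done || !m.find()) { done = true; return new Token(EOF, "<EOF>"); }
            String t = m.group();
            char c = t.charAt(0);
            return new Token(Character.isLetterOrDigit(c) || c == '_' ? WORD : PUNCT, t);
        }
    }

    // Assumed fuzzy rule: an SQL verb first, plus some SQL clause word later.
    private static final Set<String> SQL_VERBS =
        new HashSet<>(Arrays.asList("SELECT", "INSERT", "UPDATE", "DELETE"));
    private static final Set<String> SQL_CLAUSES =
        new HashSet<>(Arrays.asList("FROM", "INTO", "SET", "VALUES"));

    // Filter between lexer and parser: reads one whole statement ahead,
    // decides C or SQL, retypes the words, then releases the tokens in order.
    static final class SqlRetypingFilter implements TokenStream {
        private final TokenStream in;
        private final List<Token> pending = new ArrayList<>();
        SqlRetypingFilter(TokenStream in) { this.in = in; }

        public Token nextToken() {
            if (pending.isEmpty()) readStatement();
            return pending.remove(0);
        }

        private void readStatement() {
            Token t;
            do {
                t = in.nextToken();
                pending.add(t);
            } while (t.type != EOF && !t.text.equals(";"));
            if (looksLikeSql(pending)) {
                for (Token p : pending) if (p.type == WORD) p.type = SQL_WORD;
            }
        }

        private static boolean looksLikeSql(List<Token> stmt) {
            Token first = stmt.get(0);
            if (first.type != WORD || !SQL_VERBS.contains(first.text.toUpperCase(Locale.ROOT))) return false;
            for (Token p : stmt) {
                if (p.type == WORD && SQL_CLAUSES.contains(p.text.toUpperCase(Locale.ROOT))) return true;
            }
            return false;
        }
    }

    static List<String> run(String input) {
        TokenStream ts = new SqlRetypingFilter(new WordLexer(input));
        List<String> out = new ArrayList<>();
        for (Token t = ts.nextToken(); t.type != EOF; t = ts.nextToken()) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("select(fd); select a from b;"));
        // prints [C:select, P:(, C:fd, P:), P:;, SQL:select, SQL:a, SQL:from, SQL:b, P:;]
    }
}
```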

For instance, I wrote a BASIC pre-parser that looks for anything following a
GOTO, GOSUB, or RETURN TO and forces it to be a label, so the parser has no
ambiguity about numbers following a GOTO.  But you can put an arbitrarily
complex parser in the middle; you just have to make sure every token it
consumes gets handled properly: either discarded or queued, possibly after
being transformed and/or coalesced.
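
A minimal Java sketch of that kind of pre-parser, again with hypothetical
`Token`/`TokenStream` types and a whitespace-splitting toy lexer instead of a
real ANTLR lexer. It forces the token after `GOTO`, `GOSUB`, or `RETURN TO`
to a `LABEL` type, coalesces `RETURN TO` into a single token (a design choice
made here, not something the original BASIC pre-parser is known to do), and
queues the read-ahead token when `RETURN` turns out not to be followed by
`TO`, so nothing the filter consumed is lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

final class BasicPreParserDemo {
    static final int EOF = 0, WORD = 1, NUMBER = 2, OTHER = 3, LABEL = 4, RETURN_TO = 5;
    private static final String[] NAMES = {"EOF", "WORD", "NUMBER", "OTHER", "LABEL", "RETURN_TO"};

    // Hypothetical token and stream types, not ANTLR's.
    static final class Token {
        int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return NAMES[type] + ":" + text; }
    }

    interface TokenStream { Token nextToken(); }

    // Toy lexer: whitespace-separated words; all-digit words are NUMBERs.
    static final class ToyLexer implements TokenStream {
        private final String[] parts;
        private int i = 0;
        ToyLexer(String src) {
            String s = src.trim();
            parts = s.isEmpty() ? new String[0] : s.split("\\s+");
        }
        public Token nextToken() {
            if (i >= parts.length) return new Token(EOF, "<EOF>");
            String p = parts[i++];
            if (p.matches("\\d+")) return new Token(NUMBER, p);
            if (p.matches("[A-Za-z]\\w*")) return new Token(WORD, p);
            return new Token(OTHER, p);
        }
    }

    static final class PreParser implements TokenStream {
        private final TokenStream in;
        private final Deque<Token> queue = new ArrayDeque<>(); // read ahead, not yet delivered
        private boolean expectLabel = false;
        PreParser(TokenStream in) { this.in = in; }

        private Token read() { return queue.isEmpty() ? in.nextToken() : queue.pollFirst(); }

        public Token nextToken() {
            Token t = read();
            if (t.type == EOF) return t;
            if (expectLabel) { // whatever follows GOTO/GOSUB/RETURN TO is a label
                expectLabel = false;
                t.type = LABEL;
                return t;
            }
            String w = t.text.toUpperCase(Locale.ROOT);
            if (t.type == WORD && (w.equals("GOTO") || w.equals("GOSUB"))) {
                expectLabel = true;
            } else if (t.type == WORD && w.equals("RETURN")) {
                Token next = read();
                if (next.type == WORD && next.text.equalsIgnoreCase("TO")) {
                    expectLabel = true; // coalesce RETURN TO into one token
                    return new Token(RETURN_TO, t.text + " " + next.text);
                }
                queue.addFirst(next); // not RETURN TO: hand the token back unchanged
            }
            return t;
        }
    }

    static List<String> run(String src) {
        TokenStream ts = new PreParser(new ToyLexer(src));
        List<String> out = new ArrayList<>();
        for (Token t = ts.nextToken(); t.type != EOF; t = ts.nextToken()) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("10 GOTO 100 20 GOSUB 200 30 RETURN TO 40 50 RETURN"));
    }
}
```

For the input in `main`, the numbers after `GOTO`, `GOSUB`, and `RETURN TO`
come out as `LABEL` tokens, while the line numbers stay `NUMBER` tokens and
the final bare `RETURN` passes through untouched.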

Monty
