[antlr-interest] running a second lexer on unbounded input

Tue Dec 2 13:49:20 PST 2008

On Tue, 2008-12-02 at 14:52 -0500, Ernest Pasour wrote:

> I have been working with a grammar for ActionScript, which is a rather freeform language.  One expression type that is allowed in the source code is raw xml.  For instance, the following code is legal:
> 
> var x:XML=<a>
>    </a>;
> 
> Or
> 
> //note the lack of semicolon
> var x:XML=<a>
>    </a>
> var i:int;
> 
> What is the best ANTLR strategy for running a second lexer? I'm not an ANTLR expert, but I think I want a second lexer that processes the same input stream the main lexer started with (as opposed to hand-coding an XML lexer to consume text inside an action, or creating a separate input stream from the rest of the input document). Will a sub-lexer work if the input has "extra" input on the end? Is there a standard strategy for this type of problem in ANTLR?

A second lexer is the best strategy; I used this approach for VB.Net,
for instance. Essentially, the lexer does all the work of finding the
end of the XML string and returns the XML as a single token; you then
hand off XML parsing to something else. Be careful about encoding
error-detection strategies into the lexer rules, though, or error
recovery becomes tough. Getting the trigger points for the second lexer
right is also a little tricky, as you need to do it with minimal
context. But since XML can only follow certain things, such as '=', the
sequence =< is a trigger point, and so on.
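
To make the idea concrete, here is a minimal sketch in plain Python
rather than ANTLR. It is not the VB.Net lexer mentioned above, just a
hand-rolled stand-in under loud assumptions: the names `TAG`,
`scan_xml_literal` and `tokenize` are made up for illustration, the only
trigger point handled is '<' directly after an '=' token, and the XML
scan only balances open and close tags (no comments, CDATA, processing
instructions, or '>' inside attribute values).

```python
import re

# Hypothetical helper pattern: one opening, closing, or self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")
WORD = re.compile(r"\w+")


def scan_xml_literal(text, start):
    """Return the index just past the XML literal starting at text[start].

    Only tag nesting depth is tracked; the XML is not validated here,
    which keeps error detection out of the lexing step.
    """
    depth = 0
    pos = start
    while True:
        m = TAG.search(text, pos)
        if m is None:
            raise ValueError("unterminated XML literal")
        closing, _name, self_closing = m.groups()
        if closing:
            depth -= 1
        elif not self_closing:
            depth += 1
        pos = m.end()
        if depth <= 0:
            return pos  # anything after this point belongs to the main lexer


def tokenize(text):
    """Toy main lexer: emits (kind, text) pairs, with XML as one token."""
    tokens = []
    pos = 0
    prev = None  # text of the last token emitted (the "minimal context")
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        if text.startswith("//", pos):  # skip line comments
            end = text.find("\n", pos)
            pos = len(text) if end == -1 else end
            continue
        # Trigger point: '<' right after '=' that really starts a tag.
        if ch == "<" and prev == "=" and TAG.match(text, pos):
            end = scan_xml_literal(text, pos)
            tokens.append(("XML", text[pos:end]))
            pos = end
        else:
            m = WORD.match(text, pos)
            if m:
                tokens.append(("ID", m.group()))
                pos = m.end()
            else:
                tokens.append(("PUNCT", ch))
                pos += 1
        prev = tokens[-1][1]
    return tokens


source = "//note the lack of semicolon\nvar x:XML=<a>\n   </a>\nvar i:int;"
for kind, value in tokenize(source):
    print(kind, repr(value))
```

Because the scan stops as soon as the tag depth returns to zero, the
"extra" input after the XML (here the second var declaration) is simply
left for the main lexer to carry on with. The XML token text can then be
handed to a real XML parser.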

Jim

> 
> Thanks,
> Ernest
