[antlr-interest] ANTLR3 Nested parser

Thomas Brandon tbrandonau at gmail.com
Tue Jan 22 16:45:38 PST 2008


Lexing and parsing are completely independent and cannot influence each
other. In fact, in the current implementation of CommonTokenStream the
entire file is lexed when the first token is requested. The trouble with
using the nested parser to detect the extents is that, by the time it runs,
the nested lexer will already have processed the entire input stream,
producing errors you would have to ignore. You can try Jim's suggestion;
however, it seems likely to complicate error handling, with a real chance
of tokens being processed by the wrong lexer.
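
To make the buffering point concrete, here is a toy sketch in plain Java. It
does not use the ANTLR runtime: Chars, WordLexer and EagerTokenStream are
hypothetical stand-ins for a shared character stream, a lexer and a token
stream that fills its whole buffer on first access. It only shows that asking
for a single token leaves the shared cursor at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of eager token buffering. None of these classes are ANTLR API. */
final class EagerBufferDemo {

    /** Hypothetical stand-in for a character stream shared by two lexers. */
    static final class Chars {
        final String text;
        int pos = 0;
        Chars(String text) { this.text = text; }
    }

    /** Toy lexer: one token per space-separated word, null at end of input. */
    static final class WordLexer {
        private final Chars in;
        WordLexer(Chars in) { this.in = in; }

        String nextToken() {
            while (in.pos < in.text.length() && in.text.charAt(in.pos) == ' ') in.pos++;
            if (in.pos >= in.text.length()) return null;
            int start = in.pos;
            while (in.pos < in.text.length() && in.text.charAt(in.pos) != ' ') in.pos++;
            return in.text.substring(start, in.pos);
        }
    }

    /** Toy token stream that drains the lexer the first time a token is requested. */
    static final class EagerTokenStream {
        private final WordLexer lexer;
        private List<String> buffer;
        EagerTokenStream(WordLexer lexer) { this.lexer = lexer; }

        String get(int i) {
            if (buffer == null) {
                buffer = new ArrayList<>();
                for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
                    buffer.add(t);
                }
            }
            return i < buffer.size() ? buffer.get(i) : null;
        }
    }

    /** Requests only the first token and reports where the shared cursor ended up. */
    static int cursorAfterFirstToken(String text) {
        Chars chars = new Chars(text);
        EagerTokenStream tokens = new EagerTokenStream(new WordLexer(chars));
        tokens.get(0);
        return chars.pos;
    }

    public static void main(String[] args) {
        String text = "doc */ class Foo";
        System.out.println("cursor = " + cursorAfterFirstToken(text)
                + ", input length = " + text.length());
    }
}
```

In this toy model the cursor ends up at the end of the text, not just after
"*/", so an outer lexer sharing the same Chars would have nothing left to read.
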

It may be easier to write fragment lexer rules that do a basic parse of
your nested language. You should be able to take any parser rules, convert
all except the root rule to fragments and use them as is in a lexer
(assuming you don't use parameters, return values or scopes, which would need
to be replaced with global variables). Then you can simplify them as much as
possible, along the lines of the ANTLR grammar, which parses actions for just
a few elements (strings, character literals, curly-delimited blocks, etc.) to
determine the end delimiter.
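
As a rough illustration of that kind of simplified scan, here is a sketch in
plain Java rather than as ANTLR fragment rules. NestedExtentScanner and its
methods are hypothetical names, and it assumes the nested block is delimited
by curly braces and that strings and character literals use backslash
escapes. It recognizes only those few elements, just enough to find where
the block ends.

```java
/** Hand-written stand-in for a simplified lexer rule. Not ANTLR API. */
final class NestedExtentScanner {

    /**
     * Finds the end of a curly-delimited block.
     *
     * @param src  the full input text
     * @param open index of the opening '{'
     * @return index just past the matching '}', or -1 if the block never closes
     */
    static int findBlockEnd(String src, int open) {
        if (open < 0 || open >= src.length() || src.charAt(open) != '{') {
            throw new IllegalArgumentException("open must point at '{'");
        }
        int depth = 0;
        int i = open;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Braces inside string or character literals must not count.
                i = skipQuoted(src, i);
                if (i < 0) return -1;
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) return i + 1;
            }
            i++;
        }
        return -1;
    }

    /** Skips a quoted literal starting at start; returns the index after the closing quote, or -1. */
    private static int skipQuoted(String src, int start) {
        char quote = src.charAt(start);
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
                continue;
            }
            if (c == quote) return i + 1;
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "a { x = \"}\"; { y = '}'; } } b";
        int open = src.indexOf('{');
        int end = findBlockEnd(src, open);
        System.out.println(src.substring(open, end));
    }
}
```

The outer lexer could then hand only the text between open and the returned
index to the nested lexer and parser, and carry on from that index itself.
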

Tom.

On Jan 23, 2008 9:40 AM, Bertalan Fodor <lilypondtool at organum.hu> wrote:

>  Now I could try this. The problem is that I would have to emit the
> EOF_TOKEN from the parser, because the lexer has no way of knowing whether
> the nested language input has ended. But the token buffer will surely
> contain tokens that are not in the nested language, so I have to get out
> of the nested lexer and roll back the consumed tokens somehow.
>
> Do you have an idea for a solution?
>
> Thank you,
>
> Bert
>
> Thomas Brandon wrote:
>
> Check out the island-grammar example. It shows lexer based nesting. The
> pertinent code in the main lexer is:
> JAVADOC : '/**'
>           {
>             // create a new javadoc lexer/parser duo that feeds
>             // off the current input stream
>             System.out.println("enter javadoc");
>             JavadocLexer j = new JavadocLexer(input);
>             CommonTokenStream tokens = new CommonTokenStream(j);
>             tokens.discardTokenType(JavadocLexer.WS);
>             JavadocParser p = new JavadocParser(tokens);
>             p.comment();
>             // returns a JAVADOC token to the java parser but on a
>             // different channel than the normal token stream so it
>             // doesn't get in the way.
>             $channel = JAVADOC_CHANNEL;
>           }
>         ;
> And in the nested lexer:
> /** When the javadoc parser sees end-of-comment it just says 'I'm done',
>  *  which consumes the tokens and forces this javadoc parser (feeding
>  *  off the input stream currently) to exit.  It returns from
>  *  method comment(), which was called from JAVADOC action in the
>  *  Simple parser's lexer.
>  */
> END     : '*/' {token = Token.EOF_TOKEN;}
>           {System.out.println("exit javadoc");}
>         ;
>
> Your code in the outer lexer looks OK; just passing the same input stream
> should keep it in sync. Check your code for exiting the inner lexer.
>
> Tom.
>
> On Jan 22, 2008 8:00 PM, Bertalan Fodor (LilyPondTool) <lilypondtool at organum.hu> wrote:
>
> > Thank you for your answer.
> > If I understand your suggestion correctly, the problem with it is that I
> > can't lex the symbol: it is actually an embedded language, so only the
> > nested lexer and parser can find the end of the embedded part. What I
> > need is the following: feed the nested lexer/parser all the input
> > beginning from the '/**' symbol and let it parse that. Then, when the
> > parsing is over, I'd like to set the non-nested lexer/parser to the end
> > position of the nested parse. So the problem is how to set the input
> > stream position to the end of the embedded part, either by rewinding or
> > by winding forward. Maybe I can use the return value of the nested
> > rootRule() to find the position. However, I have not yet found a way to
> > do all of this.
> >
> > Anyway, if you have an example of nested parsing, it would
> > probably help me a great deal.
> >
> > Thanks,
> >
> > Bert
> >
> > Harald M. Müller wrote:
> >
> > I wouldn't do it like this.
> > If you really want to do this in the (non-nested) lexer: "lex" the symbol,
> > and then start a NEW StringReader on the symbol's getText(), from which you
> > feed your (nested) lexer and parser.
> > Maybe you want to do this in the (non-nested) parser instead: that is easy
> > if the symbol turns up there, and a little more work if you pushed the
> > symbol onto the HIDDEN channel (or some other channel) in the (non-nested)
> > lexer, so that you have to "undig" it somehow in the (non-nested) parser.
> > Hope this very short explanation helps.
> > Regards
> > Harald
> >
> >
> >
> >  -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bertalan Fodor
> > Sent: Tuesday, January 22, 2008 12:45 AM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] ANTLR3 Nested parser
> >
> > Hi,
> >
> > I'm creating a parser with a nested parser. To keep it simple, I'll
> > describe my problem as if I were doing Javadoc parsing inside
> > a Java parser.
> > So I have this in my lexer:
> > JAVADOC: '/**' {
> >     JavadocParser javadocParser = new JavadocParser(
> >         new CommonTokenStream(new JavadocLexer(this.input)));
> >     javadocParser.rootRule();
> > } ;
> >
> > The problem is that while this code correctly switches to the
> > Javadoc lexer, and parses the Javadoc parts correctly, upon
> > returning from the Javadoc parsing, the character stream is
> > not correctly positioned.
> >
> > Can you help me achieve nested parsing in ANTLR 3?
> >
> > Thank you very much,
> >
> > Bertalan Fodor
> >
> > P.S. I've already asked this on this list but got no answer,
> > so I tried to make my question simpler.
> >
> >
> >
> >
> >
> >  --
> > LilyPondTool is the editor for LilyPond files.
> > See http://lilypondtool.organum.hu
> >
> >
>
>