[antlr-interest] accepting nested code blocks

Fri Oct 16 19:08:17 PDT 2009

Yes, you can do it. You will probably need to keep state flags and either trigger lexer rules based on them or, perhaps better, hand off to an external lexer. The main problem is error recovery: what does your lexer do if the JavaScript does not have perfectly matched '{' and '}', so that the lexer rule drops out?

However, if all you need to do is consume the JS and say "this is a blob of JS", then I would write a small method that knows how to consume a JavaScript function. That is probably easier than writing it all out as recursive lexer rules.
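
A minimal sketch of such a method, written against a plain String rather than the ANTLR character stream so that it stands alone. The class name JsBlobScanner and the method endOfBlock are made up for illustration; wiring it into a lexer action is left out. It counts braces, skips string literals and comments so that a '}' inside them is not counted, and returns -1 when the braces never balance, which is the error-recovery case mentioned above. It does not understand JavaScript regex literals, so treat it as a starting point only.

```java
public class JsBlobScanner {
    /**
     * Returns the index just past the '}' that matches the '{' at
     * position start, or -1 if start is not a '{' or the braces
     * never balance.
     */
    public static int endOfBlock(String src, int start) {
        int n = src.length();
        if (start < 0 || start >= n || src.charAt(start) != '{') {
            return -1;
        }
        int depth = 0;
        int i = start;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip a string literal, honouring backslash escapes.
                char quote = c;
                i++;
                while (i < n && src.charAt(i) != quote) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Skip a line comment up to the end of the line.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Skip a block comment; an unterminated one is an error.
                int close = src.indexOf("*/", i + 2);
                if (close < 0) {
                    return -1;
                }
                i = close + 2;
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                    if (depth == 0) {
                        return i + 1; // just past the matching brace
                    }
                }
                i++;
            }
        }
        return -1; // ran out of input with braces still open
    }
}
```

Given the offset of the opening brace, src.substring(start, endOfBlock(src, start)) is then the blob to hand to the JS engine, provided the result is not -1.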

I have used this recursive technique for embedded XML and similar.

However, another thought: if you have any control over the language, you should change it so that it does not just embed arbitrary JavaScript but delimits it in some reasonable way, say with << javascript >> or some other delimiter. Then your lexer can just pick that out and strip off the delimiters.
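
As a rough illustration of the delimiter idea, again on a plain String and with made-up names (DelimitedBlob, extract), picking out the blob and stripping the delimiters can be as simple as two searches. One caveat: '>>' is also the JavaScript right-shift operator, so in practice the closing delimiter should be something that cannot occur in the embedded code.

```java
public class DelimitedBlob {
    // Hypothetical delimiters; choose ones that cannot appear in the JS.
    static final String OPEN = "<<";
    static final String CLOSE = ">>";

    /**
     * Returns the text between the first OPEN at or after 'from' and the
     * next CLOSE, with both delimiters stripped, or null if either
     * delimiter is missing.
     */
    public static String extract(String src, int from) {
        int open = src.indexOf(OPEN, from);
        if (open < 0) {
            return null;
        }
        int bodyStart = open + OPEN.length();
        int close = src.indexOf(CLOSE, bodyStart);
        if (close < 0) {
            return null;
        }
        return src.substring(bodyStart, close);
    }
}
```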

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
> Sent: Tuesday, October 13, 2009 11:11 PM
> To: Indhu Bharathi
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] accepting nested code blocks
> 
> Hi,
> 
> I understand your point of view, but the book states explicitly the
> following:
> 
> "ANTLR generates recursive-descent recognizers
> for lexers just as it does for parsers and tree parsers. Consequently,
> ANTLR supports recursive lexer rules, unlike other tools such as lex."
> 
> Using recursion, it should be possible to create such a lexer rule. If
> not, what can it be used for?
> 
> My original problem is that the input files contain a JavaScript
> function definition. The other parts of the input are covered by the
> grammar. However, I do not need to check the validity of the JS
> function, just extract it as is, and pass to the JS engine later. So
> if it is not necessary, I do not want to parse it.
> 
> Is it possible somehow? Or should I denote the beginning and the end
> of the JS function by some special token to allow catching it by a
> lexer rule?
> 
> Cheers,
> Miklos
> 
> 2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
> > Balanced parentheses cannot be expressed with a regular expression, which
> > means you cannot recognize them with a lexer. You need a pushdown automaton,
> > which means you need a parser to recognize them. Try doing it with parser
> > rules.
> >
> > Cheers, Indhu
> >
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
> > Sent: Tuesday, October 13, 2009 10:04 PM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] accepting nested code blocks
> >
> >
> >
> > Hi,
> >
> > I want to create a lexer rule accepting nested code blocks.
> >
> > I tried out the example from The Definitive ANTLR Reference (Section 4.3), but
> > it does not work.
> > It accepts only inputs that contain no characters other than curly braces.
> > Moreover, one closing brace is enough.
> >
> > The error is the following:
> > MismatchedTokenException: line 1:1 mismatched input UNKNOW expecting 125
> >
> > The original code of the book:
> >
> > fragment
> > CODE[boolean stripCurlies]:
> >   '{' ( CODE[stripCurlies] | ~('{' |'}' ) )* '}'
> >   {
> >     if ( stripCurlies ) {
> >       setText(getText().substring(1, getText().length()));
> >     }
> >   }
> >   ;
> >
> > The simplified version of the rule gives the same result:
> > fragment
> > Block: '{' ( Block | ~('{'|'}') )* '}';
> >
> > I use ANTLR 3.2.
> >
> > Does anybody have an idea how to get around this?
> >
> > Thanks,
> >
> > Miklos
> 