[antlr-interest] accepting nested code blocks

Espák Miklós espakm at gmail.com
Tue Oct 13 10:41:25 PDT 2009


Hi,

I understand your point of view, but the book states explicitly the following:

"ANTLR generates recursive-descent recognizers
for lexers just as it does for parsers and tree parsers. Consequently,
ANTLR supports recursive lexer rules, unlike other tools such as lex."

Using recursion, it should be possible to create such a lexer rule. If
not, what can recursive lexer rules be used for?
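
To illustrate what I mean by recursion here, below is a rough hand-written
Java sketch of what a recursive-descent recognizer for the simplified Block
rule amounts to. This is not the code ANTLR generates; the names
BlockMatcher and matchBlock are made up for illustration only.

```java
// Hand-written analogue of: Block : '{' ( Block | ~('{'|'}') )* '}' ;
// (illustrative sketch, not ANTLR-generated code)
final class BlockMatcher {
    /**
     * Tries to match one balanced block starting at input[start].
     * Returns the index just past the closing '}', or -1 on failure.
     */
    static int matchBlock(String input, int start) {
        if (start >= input.length() || input.charAt(start) != '{') {
            return -1;                      // the rule must begin with '{'
        }
        int pos = start + 1;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '{') {                 // alternative 1: nested Block
                pos = matchBlock(input, pos);
                if (pos < 0) {
                    return -1;              // nested block never closed
                }
            } else if (c == '}') {          // leaves the ( ... )* loop
                return pos + 1;
            } else {                        // alternative 2: ~('{'|'}')
                pos++;
            }
        }
        return -1;                          // input ended before '}'
    }

    public static void main(String[] args) {
        String input = "{ if (a) { b(); } } trailing";
        int end = matchBlock(input, 0);
        System.out.println(input.substring(0, end));
    }
}
```

The recursive call plays the role of the stack, which is why I would expect
a recursive lexer rule to be able to cope with nesting.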

My original problem is that the input files contain a JavaScript
function definition. The other parts of the input are covered by the
grammar. However, I do not need to check the validity of the JS
function; I just need to extract it as is and pass it to the JS engine
later. So unless it is necessary, I do not want to parse it.

Is that possible somehow? Or should I mark the beginning and the end
of the JS function with special delimiters so that a lexer rule can
catch it?
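
For comparison, the kind of extraction I have in mind could be sketched
outside the grammar roughly as follows, using a plain depth counter instead
of recursion. JsBodyExtractor and extractBody are hypothetical names, and
the sketch assumes the function body has no unbalanced braces inside string
literals, regex literals or comments.

```java
// Illustrative sketch: pull out a brace-delimited body without parsing it.
final class JsBodyExtractor {
    /**
     * Returns the text from the first '{' at or after 'from' through its
     * matching '}', or null if there is no '{' or the braces never balance.
     * Caveat: braces inside JS strings, regex literals or comments are
     * counted as well, so this is only an approximation.
     */
    static String extractBody(String input, int from) {
        int open = input.indexOf('{', from);
        if (open < 0) {
            return null;                    // no opening brace at all
        }
        int depth = 0;
        for (int i = open; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;                    // entering a nested block
            } else if (c == '}' && --depth == 0) {
                return input.substring(open, i + 1);
            }
        }
        return null;                        // unbalanced input
    }
}
```

Calling extractBody(text, 0) on the function definition would hand back the
body, braces included, ready to be passed on unchanged.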

Cheers,
Miklos

2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
> Balanced parentheses cannot be expressed using a regular expression, which
> means you cannot recognize them using a lexer. You need a pushdown automaton,
> which means you need a parser to recognize them. Try doing it using parser
> rules.
>
> Cheers, Indhu
>
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
> Sent: Tuesday, October 13, 2009 10:04 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] accepting nested code blocks
>
>
>
> Hi,
>
> I want to create a lexer rule accepting nested code blocks.
>
> I tried the example from The Definitive ANTLR Reference (Section 4.3), but
> it does not work.
> It accepts only inputs that contain no characters other than curly braces.
> Moreover, a single closing brace is enough.
>
> The error is the following:
> MismatchedTokenException: line 1:1 mismatched input UNKNOW expecting 125
>
> The original code of the book:
>
> fragment
> CODE[boolean stripCurlies]:
>   '{' ( CODE[stripCurlies] | ~('{' |'}' ) )* '}'
>   {
>     if ( stripCurlies ) {
>       setText(getText().substring(1, getText().length()));
>     }
>   }
>   ;
>
> A simplified version of the rule gives the same result:
> fragment
> Block: '{' ( Block | ~('{'|'}') )* '}';
>
> I use ANTLR 3.2.
>
> Does anybody have an idea how to get around this?
>
> Thanks,
>
> Miklos

