[antlr-interest] accepting nested code blocks

Tue Oct 13 12:16:32 PDT 2009

This is exactly what I want, but it does not work for me. Incredible!
Do you use 3.2, too?

I just used 'fragment' because it was there in the book's example
(although it was used there for another reason). I also tried it
without 'fragment'.

Cheers,
Miklos
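
For readers following the thread, the recursion in the BLOCK rule quoted
below can be pictured with a small hand-written matcher. This is only a
sketch in plain Java, not the code ANTLR generates; the names
NestedBlockMatcher, matchBlock and isBlock are invented here for
illustration.

```java
// Hypothetical helper, not part of ANTLR: a hand-written recursive-descent
// matcher for the rule  BLOCK : '{' ( BLOCK | ~('{'|'}') )* '}' ;
final class NestedBlockMatcher {

    /**
     * Tries to match one BLOCK starting at index pos.
     * Returns the index just past the matching '}', or -1 if no block
     * starts at pos or the braces are not balanced.
     */
    static int matchBlock(String input, int pos) {
        if (pos >= input.length() || input.charAt(pos) != '{') {
            return -1;                      // a block must start with '{'
        }
        int i = pos + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '{') {
                i = matchBlock(input, i);   // nested BLOCK: recurse
                if (i < 0) {
                    return -1;              // nested block never closed
                }
            } else if (c == '}') {
                return i + 1;               // closing brace of this block
            } else {
                i++;                        // ~('{'|'}'): any other character
            }
        }
        return -1;                          // input ended before the closing '}'
    }

    /** True if the whole input is exactly one balanced block. */
    static boolean isBlock(String input) {
        return matchBlock(input, 0) == input.length();
    }
}
```

Under the same assumptions, input.substring(start,
NestedBlockMatcher.matchBlock(input, start)) would return the block text
verbatim once matchBlock reports a match (a result other than -1), which
is the "extract it as is" behaviour asked about further down. Like the
grammar rule, the sketch counts every brace, so a '{' or '}' inside a
JavaScript string literal or comment would throw it off.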

2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
> I tried the following
>
> r       :       BLOCK
>        ;
>
> BLOCK
>        : '{' ( BLOCK | ~('{'|'}') )* '}'
>        ;
>
> And it works fine for inputs like "{{}}",
> "{hsdgjsahdj{hasdjhsahdj}sdjhjsd}", etc.
>
> What is the input you are trying to match? Why are you using 'fragment'?
>
> Cheers, Indhu
>
>
> -----Original Message-----
> From: Espák Miklós [mailto:espakm at gmail.com]
> Sent: Tuesday, October 13, 2009 11:11 PM
> To: Indhu Bharathi
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] accepting nested code blocks
>
> Hi,
>
> I understand your point of view, but the book explicitly states the
> following:
>
> "ANTLR generates recursive-descent recognizers
> for lexers just as it does for parsers and tree parsers. Consequently,
> ANTLR supports recursive lexer rules, unlike other tools such as lex."
>
> Using recursion, it should be possible to create such a lexer rule. If
> not, what can it be used for?
>
> My original problem is that the input files contain a JavaScript
> function definition. The other parts of the input are covered by the
> grammar. However, I do not need to check the validity of the JS
> function, just extract it as is, and pass it to the JS engine later. So
> if it is not necessary, I do not want to parse it.
>
> Is it possible somehow? Or should I denote the beginning and the end
> of the JS function by some special token to allow catching it by a
> lexer rule?
>
> Cheers,
> Miklos
>
> 2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
>> Balanced parentheses cannot be expressed with a regular expression, which
>> means you cannot recognize them with a lexer. You need a pushdown automaton,
>> which means you need a parser to recognize them. Try doing it with parser
>> rules.
>>
>> Cheers, Indhu
>>
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
>> Sent: Tuesday, October 13, 2009 10:04 PM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] accepting nested code blocks
>>
>> Hi,
>>
>> I want to create a lexer rule accepting nested code blocks.
>>
>> I tried out the example from The Definitive ANTLR Reference (Section 4.3),
>> but it does not work.
>> It accepts only inputs that contain no characters other than curly
>> braces. Moreover, a single closing brace is enough.
>>
>> The error is the following:
>> MismatchedTokenException: line 1:1 mismatched input UNKNOW expecting 125
>>
>> The original code of the book:
>>
>> fragment
>> CODE[boolean stripCurlies]:
>>   '{' ( CODE[stripCurlies] | ~('{' |'}' ) )* '}'
>>   {
>>     if ( stripCurlies ) {
>>       setText(getText().substring(1, getText().length() - 1));
>>     }
>>   }
>>   ;
>>
>> A simplified version of the rule gives the same result:
>> fragment
>> Block: '{' ( Block | ~('{'|'}') )* '}';
>>
>> I use ANTLR 3.2.
>>
>> Does anybody have an idea how to get around this?
>>
>> Thanks,
>>
>> Miklos
>
>