[antlr-interest] accepting nested code blocks

Tue Oct 13 13:31:46 PDT 2009

Thank you very much, I will try this code.

The problem was that I tested my grammar in interpreted mode, and it
failed because of a bug in the interpreted mode.
After compiling and running, the same grammar behaves well.

However, your code is much safer. I need the JavaScript function
only to specify a computation, so I do not have to deal with character
and string literals.

Thank you again,
Miklos

On Tue, Oct 13, 2009 at 10:06 PM, Gerald Rosenberg <gerald at certiv.net> wrote:
> The book and Indhu are both correct.  How to proceed is more a matter of
> what you know about the content between the delimiter pair, its complexity,
> and what you want out of it.  There are modal lexers, island grammars, and
> parser-evaluation alternatives.
>
> The recursive lexer rule definition does not handle quoted strings, which in
> the case of JavaScript will prove fatal.  I believe there is a way to avoid
> this problem, only using pure lexer rules, but it will be complex and the
> margin is too small to contain the proof.
>
> However, if all you want is a single token containing the clob between a
> balanced set of delimiters, there is this simple approach:
>
> @lexer::members {
>   public boolean pairMatch(int limit) {
>     return Helper.pairMatch(input, limit);
>   }
> }
>
> BRACE_BLOCK
>   :  '{' { pairMatch(500) }?
>   ;
>
> and include the attached helper in your build.  This version recognizes
> nested delimiters subject to line comments and both single and double quoted
> strings.
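
The helper itself is not reproduced in this message. As a minimal sketch of the kind of scan described above (nested braces, with line comments and single- and double-quoted strings skipped), something along these lines might work. The names `BraceScanner`, `matchEnd`, and `skipString` are hypothetical, and treating `limit` as a maximum number of characters to scan is an assumption; the real `Helper.pairMatch` operates on the ANTLR input stream and may behave differently.

```java
/** Hypothetical stand-in for the helper; not the code attached to the original mail. */
final class BraceScanner {
    private BraceScanner() {}

    /**
     * Returns the index just past the '}' that balances the '{' at start,
     * or -1 if there is no '{' at start or no balancing '}' is found within
     * limit characters (assumed meaning of the limit argument).
     */
    static int matchEnd(CharSequence text, int start, int limit) {
        if (start < 0 || start >= text.length() || text.charAt(start) != '{') {
            return -1;
        }
        int end = (int) Math.min((long) text.length(), (long) start + limit);
        int depth = 0;
        int i = start;
        while (i < end) {
            char c = text.charAt(i);
            if (c == '/' && i + 1 < end && text.charAt(i + 1) == '/') {
                // Line comment: braces and quotes up to the newline are ignored.
                while (i < end && text.charAt(i) != '\n') {
                    i++;
                }
                continue;
            }
            if (c == '"' || c == '\'') {
                // Quoted string: braces inside it must not affect the depth.
                i = skipString(text, i, end);
                if (i < 0) {
                    return -1; // unterminated string
                }
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
            i++;
        }
        return -1;
    }

    /** Returns the index just past the closing quote, or -1 if unterminated. */
    private static int skipString(CharSequence text, int open, int end) {
        char quote = text.charAt(open);
        for (int i = open + 1; i < end; i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == quote) {
                return i + 1;
            } else if (c == '\n') {
                return -1; // string literal may not span lines
            }
        }
        return -1;
    }
}
```

For example, `BraceScanner.matchEnd("{ s = \"}\"; }", 0, 500)` skips the brace inside the string literal and returns the index just past the final brace. Block comments and JavaScript regular-expression literals are not handled in this sketch.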
>
>
>
> At 10:41 AM 10/13/2009, Espák Miklós wrote:
>
> Hi,
>
> I understand your point of view, but the book states explicitly the
> following:
>
> "ANTLR generates recursive-descent recognizers
> for lexers just as it does for parsers and tree parsers. Consequently,
> ANTLR supports recursive lexer rules, unlike other tools such as lex."
>
> Using recursion it should be possible to create such a lexer rule. If
> not, what can it be used for?
>
> My original problem is that the input files contain a JavaScript
> function definition. The other parts of the input are covered by the
> grammar. However, I do not need to check the validity of the JS
> function, just extract it as is, and pass it to the JS engine later. So
> if it is not necessary, I do not want to parse it.
>
> Is it possible somehow? Or should I denote the beginning and the end
> of the JS function by some special token to allow catching it by a
> lexer rule?
>
> Cheers,
> Miklos
>
> 2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
>> Balanced parentheses cannot be expressed using a regular expression, which
>> means you cannot recognize them using a lexer. You need a pushdown automaton,
>> which means you need a parser to recognize them. Try doing it using parser
>> rules.
>>
>> Cheers, Indhu
>>
>> From: antlr-interest-bounces at antlr.org
>> [ mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
>> Sent: Tuesday, October 13, 2009 10:04 PM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] accepting nested code blocks
>>
>>
>>
>> Hi,
>>
>> I want to create a lexer rule accepting nested code blocks.
>>
>> I tried out the example from The Definitive ANTLR Reference (Section 4.3),
>> but it does not work.
>> It accepts only inputs that contain no characters other than curly
>> braces. Moreover, one closing brace is enough.
>>
>> The error is the following:
>> MismatchedTokenException: line 1:1 mismatched input UNKNOW expecting 125
>>
>> The original code of the book:
>>
>> fragment
>> CODE[boolean stripCurlies]:
>>   '{' ( CODE[stripCurlies] | ~('{' |'}' ) )* '}'
>>   {
>>     if ( stripCurlies ) {
>>       setText(getText().substring(1, getText().length()));
>>     }
>>   }
>>   ;
>>
>> The simplified version of the rule gives the same result:
>> fragment
>> Block: '{' ( Block | ~('{'|'}') )* '}';
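
To make the behaviour of the simplified rule concrete, here is a hand-written recursive-descent matcher that mirrors it alternative by alternative. This is only a sketch of what a recursive lexer rule amounts to, not the code ANTLR generates; the class `BlockMatcher` and its methods are hypothetical names introduced for illustration.

```java
/** Hypothetical hand-written equivalent of: Block : '{' ( Block | ~('{'|'}') )* '}' ; */
final class BlockMatcher {
    private final String input;
    private int pos;

    private BlockMatcher(String input, int pos) {
        this.input = input;
        this.pos = pos;
    }

    /** Matches one Block at pos; on success pos is just past the closing '}'. */
    private boolean block() {
        if (pos >= input.length() || input.charAt(pos) != '{') {
            return false;
        }
        pos++; // consume '{'
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '{') {
                if (!block()) {      // nested Block: the rule calls itself
                    return false;
                }
            } else if (c == '}') {
                pos++;               // closing '}'
                return true;
            } else {
                pos++;               // ~('{'|'}'): any other character
            }
        }
        return false; // input ended before the closing '}'
    }

    /** Returns the index just past the matched block, or -1 if nothing matches. */
    static int match(String input, int start) {
        if (start < 0) {
            return -1;
        }
        BlockMatcher m = new BlockMatcher(input, start);
        return m.block() ? m.pos : -1;
    }
}
```

With this sketch, `BlockMatcher.match("{a{b}c}", 0)` consumes the whole nested block. Because the rule knows nothing about string literals, `BlockMatcher.match("{ s = \"}\"; }", 0)` stops at the brace inside the quotes, which is the weakness with quoted strings raised elsewhere in this thread.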
>>
>> I use ANTLR 3.2.
>>
>> Does anybody have an idea how to get around this?
>>
>> Thanks,
>>
>> Miklos
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address