[antlr-interest] Context-sensitive lexer

Jonas jonas.hagmar at gmail.com
Fri Jun 17 06:29:28 PDT 2011


Hi John!

I had assumed that the semantic predicate would prevent ANTLR from
trying to match TITLE_TEXT unless lexerState indicates that we are in
the title section. In any case, changing the TEXT fragment to
(~('\r' | '\n'))+ does not prevent the infinite loop. Keep the good
ideas coming!

Best Regards,
Jonas

On Fri, Jun 17, 2011 at 3:06 PM, John B. Brodie <jbb at acm.org> wrote:
> Greetings!
>
> Your TEXT fragment (and therefore your TITLE_TEXT token) can be empty!
>
> Thus, I think your lexer is trying to recognize infinitely many
> TITLE_TEXT tokens.
>
> Hope this helps...
>   -jbb
>
> On Fri, 2011-06-17 at 14:15 +0200, Jonas wrote:
>> Hi,
>>
>> I'm developing a parser for a file format where context is very
>> important. I'm looking to
>> 1) understand why my ANTLR parser gets into infinite loops
>> 2) find out if there is any better way to implement context
>> sensitivity than what I am doing with semantic predicates.
>>
>> A typical beginning of a file looks like this:
>> TITLE
>> some title text
>>
>> SECTION1
>>  a=b*c
>> END
>>
>> SECTION2
>> ...
>>
>> SECTION3
>> ...
>>
>> The syntax differs from section to section; the 'TITLE' section is
>> terminated by the newline after the title text line, while other
>> sections can e.g. use single quote string literals and be terminated
>> by a keyword like 'END'. Here is a sample grammar that gets into an
>> infinite loop:
>>
>> grammar test;
>>
>> options {
>>   output=AST;
>> }
>>
>> @lexer::members {
>>   static final int STATE_AT_BEGINNING = 0;
>>   static final int STATE_IN_TITLE = 1;
>>   static final int STATE_AFTER_TITLE = 2;
>>   int lexerState = STATE_AT_BEGINNING;
>> }
>>
>> file  :       title;
>>
>> title :       BEGIN_TITLE TITLE_TEXT END_TITLE;
>>
>> BEGIN_TITLE
>>       : {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL
>>         {lexerState=STATE_IN_TITLE;}
>>       ;
>>
>> TITLE_TEXT
>>       : {lexerState == STATE_IN_TITLE}? TEXT
>>       ;
>>
>> END_TITLE
>>       : {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
>>       ;
>>
>> BLANK_ROW
>>       : {!(lexerState == STATE_IN_TITLE)}? WS_NL
>>       ;
>>
>> REMARK        : {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
>>       ;
>>
>> fragment
>> WS_NL :       (' ' | '\t')* NL;
>>
>> fragment
>> NL    :       '\r'? '\n';
>>
>> fragment
>> TEXT  :       (~('\r' | '\n'))*;
>>
>
>
>
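John's point about a token rule that can match the empty string can be illustrated with a small sketch in plain Java. This is not ANTLR-generated code and does not model semantic predicates; the class EmptyTokenDemo and the methods matchText and tokenize are hypothetical names invented for the illustration, and the maxTokens cap only stands in for "would loop forever". Note that Jonas reports above that switching TEXT to the + form did not cure his loop, so this shows only the hazard John describes, not necessarily the cause in the posted grammar.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenDemo {
    /** Length of the run of non-newline characters starting at pos (may be 0). */
    static int matchText(String input, int pos) {
        int end = pos;
        while (end < input.length()
                && input.charAt(end) != '\r'
                && input.charAt(end) != '\n') {
            end++;
        }
        return end - pos;
    }

    /**
     * Toy token loop that only knows the TEXT rule.
     * allowEmpty = true  models TEXT : (~('\r' | '\n'))* ;
     * allowEmpty = false models TEXT : (~('\r' | '\n'))+ ;
     * Stops at end of input, when the + form cannot match,
     * or after maxTokens tokens (a stand-in for an endless loop).
     */
    static List<String> tokenize(String input, boolean allowEmpty, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int len = matchText(input, pos);
            if (len == 0 && !allowEmpty) {
                break; // the + form fails here; some other rule would have to match
            }
            tokens.add(input.substring(pos, pos + len));
            pos += len; // with len == 0 the position never advances
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "some title text\n";
        // * form: one real token, then empty tokens until the cap is hit
        System.out.println(tokenize(input, true, 5).size());  // prints 5
        // + form: one real token, then the rule stops matching at the newline
        System.out.println(tokenize(input, false, 5).size()); // prints 1
    }
}
```

With the * form the loop keeps producing zero-length tokens at the newline because nothing consumes it. With the + form the rule simply fails there, which in a real lexer leaves the newline for another rule such as END_TITLE.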

