[antlr-interest] Context-sensitive lexer
Jonas
jonas.hagmar at gmail.com
Fri Jun 17 06:29:28 PDT 2011
Hi John!
I believed that the semantic predicate would prevent ANTLR from trying
to match TITLE_TEXT except when lexerState indicates that we are in the
title section. In any case, changing the TEXT fragment to
(~('\r' | '\n'))+ does not prevent the infinite loop. Keep the good
ideas coming!
Best Regards,
Jonas
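The following is a minimal, hypothetical Java sketch of the mechanism John describes below. It is not ANTLR's generated lexer, and the class and method names are made up for illustration. The loop tries a TEXT-like rule before an NL rule and lets the first match win, which simplifies how ANTLR actually predicts tokens. When the TEXT rule may match zero characters, the input position stops advancing at the newline and the same empty TITLE_TEXT is produced over and over. A `maxTokens` cap makes the runaway loop observable instead of hanging. Because switching TEXT to `+` did not fix the real grammar, this sketch illustrates only the empty-token hazard and does not diagnose the loop.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenLoop {
    // Returns the end index of a TEXT-like match starting at pos: zero or more
    // (allowEmpty=true) or one or more (allowEmpty=false) characters other
    // than '\r' and '\n'. Returns -1 when there is no match.
    static int matchText(String in, int pos, boolean allowEmpty) {
        int end = pos;
        while (end < in.length() && in.charAt(end) != '\r' && in.charAt(end) != '\n') {
            end++;
        }
        if (end == pos && !allowEmpty) {
            return -1;
        }
        return end;
    }

    // Returns the end index of an NL match ('\r'? '\n') at pos, or -1.
    static int matchNl(String in, int pos) {
        int p = pos;
        if (p < in.length() && in.charAt(p) == '\r') p++;
        if (p < in.length() && in.charAt(p) == '\n') return p + 1;
        return -1;
    }

    // Tokenizes a title body, trying TITLE_TEXT before END_TITLE.
    // maxTokens stops a runaway loop so that the problem can be observed.
    static List<String> lexTitle(String in, boolean allowEmpty, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length() && tokens.size() < maxTokens) {
            int end = matchText(in, pos, allowEmpty);
            if (end >= 0) {
                tokens.add("TITLE_TEXT");
            } else {
                end = matchNl(in, pos);
                if (end < 0) throw new IllegalStateException("no rule matches at " + pos);
                tokens.add("END_TITLE");
            }
            pos = end; // a zero-length match leaves pos unchanged
        }
        return tokens;
    }

    public static void main(String[] args) {
        String title = "some title text\n";
        System.out.println(lexTitle(title, false, 10)); // [TITLE_TEXT, END_TITLE]
        System.out.println(lexTitle(title, true, 10).size()); // 10: stuck on empty TITLE_TEXT
    }
}
```

The general rule this illustrates is that every token the lexer emits should consume at least one character. A rule that can match nothing is only safe as a fragment used inside rules that do consume input.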
On Fri, Jun 17, 2011 at 3:06 PM, John B. Brodie <jbb at acm.org> wrote:
> Greetings!
>
> Your TEXT fragment (and therefore your TITLE_TEXT token) can be empty!
>
> Since an empty TITLE_TEXT consumes no input, I think your lexer is
> trying to recognize infinitely many TITLE_TEXT tokens at the same spot.
>
> Hope this helps...
> -jbb
>
> On Fri, 2011-06-17 at 14:15 +0200, Jonas wrote:
>> Hi,
>>
>> I'm developing a parser for a file format where context is very
>> important. I would like to
>> 1) understand why my ANTLR parser gets into infinite loops, and
>> 2) find out whether there is a better way to implement context
>>    sensitivity than the semantic predicates I am using.
>>
>> A file typically begins like this:
>> TITLE
>> some title text
>>
>> SECTION1
>> a=b*c
>> END
>>
>> SECTION2
>> ...
>>
>> SECTION3
>> ...
>>
>> The syntax differs from section to section. The 'TITLE' section is
>> terminated by the newline after the title text line, while other
>> sections may, for example, use single-quoted string literals and be
>> terminated by a keyword such as 'END'. Here is a sample grammar that
>> gets into an infinite loop:
>>
>> grammar test;
>>
>> options {
>> output=AST;
>> }
>>
>> @lexer::members {
>>     static final int STATE_AT_BEGINNING = 0;
>>     static final int STATE_IN_TITLE = 1;
>>     static final int STATE_AFTER_TITLE = 2;
>>     int lexerState = STATE_AT_BEGINNING;
>> }
>>
>> file : title;
>>
>> title : BEGIN_TITLE TITLE_TEXT END_TITLE;
>>
>> BEGIN_TITLE
>> : {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL
>> {lexerState=STATE_IN_TITLE;}
>> ;
>>
>> TITLE_TEXT
>> : {lexerState == STATE_IN_TITLE}? TEXT
>> ;
>>
>> END_TITLE
>> : {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
>> ;
>>
>> BLANK_ROW
>> : {!(lexerState == STATE_IN_TITLE)}? WS_NL
>> ;
>>
>> REMARK : {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
>> ;
>>
>> fragment
>> WS_NL : (' ' | '\t')* NL;
>>
>> fragment
>> NL : '\r'? '\n';
>>
>> fragment
>> TEXT : (~('\r' | '\n'))*;
>>
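On question 2, it can help to write out by hand the state machine that the lexerState predicates encode. The Java sketch below is hypothetical. It is not an ANTLR API, and it does not claim to be the recommended ANTLR technique. It reuses the STATE_* constants from the @lexer::members block, covers only BEGIN_TITLE, TITLE_TEXT, END_TITLE and BLANK_ROW (no REMARK, SECTION or 'END' handling), and changes state at the same points as the grammar's actions. In each state only that state's rules are tried, and every branch either consumes at least one character or throws, so the loop cannot stall.

```java
import java.util.ArrayList;
import java.util.List;

public class TitleLexer {
    // Same state constants as the grammar's @lexer::members block.
    static final int STATE_AT_BEGINNING = 0;
    static final int STATE_IN_TITLE = 1;
    static final int STATE_AFTER_TITLE = 2;

    private final String in;
    private int pos = 0;
    private int lexerState = STATE_AT_BEGINNING;

    TitleLexer(String in) { this.in = in; }

    // Length of NL ('\r'? '\n') at p, or 0 if there is none.
    private int nlLength(int p) {
        if (in.startsWith("\r\n", p)) return 2;
        if (in.startsWith("\n", p)) return 1;
        return 0;
    }

    // Length of WS_NL ((' ' | '\t')* NL) at p, or 0 if there is none.
    private int wsNlLength(int p) {
        int q = p;
        while (q < in.length() && (in.charAt(q) == ' ' || in.charAt(q) == '\t')) q++;
        int nl = nlLength(q);
        return nl == 0 ? 0 : q - p + nl;
    }

    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < in.length()) {
            int len;
            switch (lexerState) {
                case STATE_AT_BEGINNING: // BEGIN_TITLE : 'TITLE' WS_NL
                    len = in.startsWith("TITLE", pos) ? wsNlLength(pos + 5) : 0;
                    if (len == 0) throw error();
                    pos += 5 + len;
                    tokens.add("BEGIN_TITLE");
                    lexerState = STATE_IN_TITLE;
                    break;
                case STATE_IN_TITLE: // END_TITLE : NL, otherwise TITLE_TEXT
                    len = nlLength(pos);
                    if (len > 0) {
                        pos += len;
                        tokens.add("END_TITLE");
                        lexerState = STATE_AFTER_TITLE;
                    } else {
                        int end = pos;
                        while (end < in.length() && in.charAt(end) != '\r' && in.charAt(end) != '\n') end++;
                        if (end == pos) throw error(); // e.g. a lone '\r': never emit an empty token
                        pos = end;
                        tokens.add("TITLE_TEXT");
                    }
                    break;
                default: // STATE_AFTER_TITLE: only blank rows are handled in this sketch
                    len = wsNlLength(pos);
                    if (len == 0) throw error();
                    pos += len;
                    tokens.add("BLANK_ROW");
            }
        }
        return tokens;
    }

    private IllegalStateException error() {
        return new IllegalStateException("unexpected input at offset " + pos);
    }
}
```

For example, `new TitleLexer("TITLE\nsome title text\n\n").tokenize()` returns `[BEGIN_TITLE, TITLE_TEXT, END_TITLE, BLANK_ROW]`. Input such as `SECTION1` after the blank row raises an exception here, because those sections are outside the sketch.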