[antlr-interest] Context-sensitive lexer
John B. Brodie
jbb at acm.org
Fri Jun 17 06:06:06 PDT 2011
Greetings!
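Your TEXT fragment (and therefore your TITLE_TEXT token) can be empty,
because (~('\r' | '\n'))* also matches zero characters!
An empty token consumes no input, so I think your lexer is trying to
recognize infinitely many TITLE_TEXT tokens at the same position.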
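To make the point concrete, here is a minimal Java sketch. It is a
hypothetical hand-written tokenizer loop, not the code ANTLR generates,
and the names EmptyTokenDemo, matchText, matchNewline and tokenize are
made up for illustration. It assumes the loop tries TITLE_TEXT before
the newline rule. With allowEmpty=true (modelling the * in TEXT) the
position never advances past the newline, so only the maxTokens cap
stops it. With allowEmpty=false (modelling a + instead, which assumes a
title line is never empty) the loop terminates normally.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyTokenDemo {

    // Models the TEXT fragment: length of the run of characters at pos
    // that are neither '\r' nor '\n'. Returns -1 for "no match" when an
    // empty match is not allowed (the '+' variant).
    static int matchText(String input, int pos, boolean allowEmpty) {
        int end = pos;
        while (end < input.length()
                && input.charAt(end) != '\r'
                && input.charAt(end) != '\n') {
            end++;
        }
        int len = end - pos;
        return (len == 0 && !allowEmpty) ? -1 : len;
    }

    // Models the NL fragment: '\r'? '\n'. Returns the matched length or -1.
    static int matchNewline(String input, int pos) {
        int p = pos;
        if (p < input.length() && input.charAt(p) == '\r') {
            p++;
        }
        if (p < input.length() && input.charAt(p) == '\n') {
            return p + 1 - pos;
        }
        return -1;
    }

    // Toy token loop. maxTokens is a safety cap that exists only so the
    // sketch terminates even when a rule matches the empty string.
    static List<String> tokenize(String input, boolean allowEmpty, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int len = matchText(input, pos, allowEmpty);
            if (len >= 0) {
                tokens.add("TITLE_TEXT[" + input.substring(pos, pos + len) + "]");
                pos += len; // len may be 0: no progress is made
                continue;
            }
            len = matchNewline(input, pos);
            if (len < 0) {
                pos++; // stray '\r': skip it so the sketch keeps moving
                continue;
            }
            tokens.add("END_TITLE");
            pos += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Non-empty variant: two tokens, then the input is exhausted.
        System.out.println(tokenize("some title text\n", false, 10));
        // Empty-allowed variant: stuck before '\n' until the cap is hit.
        System.out.println(tokenize("some title text\n", true, 10).size());
    }
}
```

In the empty-allowed run, every token after the first is a zero-length
TITLE_TEXT produced at the same input position, which is the runaway
behaviour described above.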
Hope this helps...
-jbb
On Fri, 2011-06-17 at 14:15 +0200, Jonas wrote:
> Hi,
>
> I'm developing a parser for a file format where context is very
> important. I'm looking to
> 1) understand why my ANTLR parser gets into infinite loops
> 2) find out if there is any better way to implement context
> sensitivity than what I am doing with semantic predicates.
>
> A typical beginning of a file looks like this:
> TITLE
> some title text
>
> SECTION1
> a=b*c
> END
>
> SECTION2
> ...
>
> SECTION3
> ...
>
> The syntax differs from section to section; the 'TITLE' section is
> terminated by the newline after the title text line, while other
> sections can e.g. use single quote string literals and be terminated
> by a keyword like 'END'. Here is a sample grammar that gets into an
> infinite loop:
>
> grammar test;
>
> options {
>     output=AST;
> }
>
> @lexer::members {
>     static final int STATE_AT_BEGINNING = 0;
>     static final int STATE_IN_TITLE = 1;
>     static final int STATE_AFTER_TITLE = 2;
>     int lexerState = STATE_AT_BEGINNING;
> }
>
> file : title;
>
> title : BEGIN_TITLE TITLE_TEXT END_TITLE;
>
> BEGIN_TITLE
>     : {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL
>       {lexerState=STATE_IN_TITLE;}
>     ;
>
> TITLE_TEXT
>     : {lexerState == STATE_IN_TITLE}? TEXT
>     ;
>
> END_TITLE
>     : {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
>     ;
>
> BLANK_ROW
>     : {!(lexerState == STATE_IN_TITLE)}? WS_NL
>     ;
>
> REMARK : {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
>     ;
>
> fragment
> WS_NL : (' ' | '\t')* NL;
>
> fragment
> NL : '\r'? '\n';
>
> fragment
> TEXT : (~('\r' | '\n'))*;
>