[antlr-interest] Context-sensitive lexer

Fri Jun 17 14:28:00 PDT 2011

On Fri, 2011-06-17 at 15:29 +0200, Jonas wrote:
> Hi John!
> 
> I believed that using the semantic predicate would hinder ANTLR from
> trying to match TITLE_TEXT in other situations than when lexerState
> indicates that we are in the title section. Anyway, changing the TEXT
> fragment to (~('\r' | '\n'))+ does not prevent the infinite loop. Keep
> the good ideas coming!

When I run your example from the command line I get this message printed
to the console continuously...

line 4:0 rule TITLE_TEXT failed predicate: {lexerState==STATE_IN_TITLE}?

Perhaps predicates in the lexer do not behave the way you expect?
(Look at the generated lexer code to see where the predicate is tested.)
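
To illustrate what I suspect is going on, here is a toy model in plain
Java. It is not the generated ANTLR code. It assumes two things that you
should verify against your generated lexer: that the predicate is tested
inside the rule method only after the rule has already been chosen, and
that the error handler reports the failed predicate without consuming a
character. The names PredicateStallDemo, FailedPredicate, titleText and
run are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a lexer loop; NOT the code ANTLR generates.
class PredicateStallDemo {
    static final int STATE_AT_BEGINNING = 0;
    static final int STATE_IN_TITLE = 1;

    // Hypothetical stand-in for the exception raised by a failed predicate.
    static class FailedPredicate extends Exception {
        FailedPredicate(String msg) { super(msg); }
    }

    final String input;
    int pos = 0;                              // index of the next unread character
    int lexerState = STATE_AT_BEGINNING;
    final List<String> errors = new ArrayList<>();

    PredicateStallDemo(String input) { this.input = input; }

    // Models TITLE_TEXT: the predicate is checked after the rule was picked
    // (assumption), so a failure throws before any character is consumed.
    void titleText() throws FailedPredicate {
        if (lexerState != STATE_IN_TITLE) {
            throw new FailedPredicate("rule TITLE_TEXT failed predicate");
        }
        while (pos < input.length()
                && input.charAt(pos) != '\r'
                && input.charAt(pos) != '\n') {
            pos++;                            // consume one character of title text
        }
    }

    // Tries to lex at most maxAttempts tokens. Returns the number of
    // attempts that left the input position unchanged.
    int run(int maxAttempts) {
        int stalled = 0;
        for (int i = 0; i < maxAttempts && pos < input.length(); i++) {
            int start = pos;
            try {
                titleText();
            } catch (FailedPredicate e) {
                errors.add(e.getMessage());   // report, but do not skip a character
            }
            if (pos == start) {
                stalled++;
            }
        }
        return stalled;
    }
}
```

With lexerState left at STATE_AT_BEGINNING, every attempt fails at the
same position, which would match the endlessly repeated message. If the
generated code really looks like this, a gated semantic predicate
({...}?=>) may be worth trying, since it takes part in choosing the rule
rather than only validating it afterwards.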

> 
> Best Regards,
> Jonas
> 
> On Fri, Jun 17, 2011 at 3:06 PM, John B. Brodie <jbb at acm.org> wrote:
> > Greetings!
> >
> > Your TEXT fragment (and therefore your TITLE_TEXT token) can be empty!
> >
> > Thus, I think your lexer is trying to recognize infinitely many
> > TITLE_TEXT tokens.
> >
> > Hope this helps...
> >   -jbb
> >
> > On Fri, 2011-06-17 at 14:15 +0200, Jonas wrote:
> >> Hi,
> >>
> >> I'm developing a parser for a file format where context is very
> >> important. I'm looking to
> >> 1) understand why my ANTLR parser gets into infinite loops
> >> 2) find out if there is any better way to implement context
> >> sensitivity than what I am doing with semantic predicates.
> >>
> >> A typical beginning of a file looks like this:
> >> TITLE
> >> some title text
> >>
> >> SECTION1
> >>  a=b*c
> >> END
> >>
> >> SECTION2
> >> ...
> >>
> >> SECTION3
> >> ...
> >>
> >> The syntax differs from section to section; the 'TITLE' section is
> >> terminated by the newline after the title text line, while other
> >> sections can e.g. use single quote string literals and be terminated
> >> by a keyword like 'END'. Here is a sample grammar, that gets into an
> >> infinite loop:
> >>
> >> grammar test;
> >>
> >> options {
> >>   output=AST;
> >> }
> >>
> >> @lexer::members {
> >>   static final int STATE_AT_BEGINNING = 0;
> >>   static final int STATE_IN_TITLE = 1;
> >>   static final int STATE_AFTER_TITLE = 2;
> >>   int lexerState = STATE_AT_BEGINNING;
> >> }
> >>
> >> file  :       title;
> >>
> >> title :       BEGIN_TITLE TITLE_TEXT END_TITLE;
> >>
> >> BEGIN_TITLE
> >>       : {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL
> >> {lexerState=STATE_IN_TITLE;}
> >>       ;
> >>
> >> TITLE_TEXT
> >>       : {lexerState == STATE_IN_TITLE}? TEXT
> >>       ;
> >>
> >> END_TITLE
> >>       : {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
> >>       ;
> >>
> >> BLANK_ROW
> >>       : {!(lexerState == STATE_IN_TITLE)}? WS_NL
> >>       ;
> >>
> >> REMARK        : {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
> >>       ;
> >>
> >> fragment
> >> WS_NL :       (' ' | '\t')* NL;
> >>
> >> fragment
> >> NL    :       '\r'? '\n';
> >>
> >> fragment
> >> TEXT  :       (~('\r' | '\n'))*;
> >>
> >
> >
> >