[antlr-interest] Context-sensitive lexer

Fri Jun 17 05:15:30 PDT 2011

Hi,

I'm developing a parser for a file format where context is very
important. I'm looking to
1) understand why my ANTLR parser gets into infinite loops
2) find out if there is any better way to implement context
sensitivity than what I am doing with semantic predicates.

A typical beginning of a file looks like this:
TITLE
some title text

SECTION1
 a=b*c
END

SECTION2
...

SECTION3
...

The syntax differs from section to section. The 'TITLE' section is
terminated by the newline after the title text line, while other
sections may, for example, use single-quoted string literals and be
terminated by a keyword like 'END'. Here is a sample grammar that gets
into an infinite loop:

grammar test;

options {
  output=AST;
}

@lexer::members {
  static final int STATE_AT_BEGINNING = 0;
  static final int STATE_IN_TITLE = 1;
  static final int STATE_AFTER_TITLE = 2;
  int lexerState = STATE_AT_BEGINNING;
}

file 	:	title;

title	:	BEGIN_TITLE TITLE_TEXT END_TITLE;

BEGIN_TITLE
	: {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL {lexerState=STATE_IN_TITLE;}
	;

TITLE_TEXT
	: {lexerState == STATE_IN_TITLE}? TEXT
	;

END_TITLE
	: {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
	;

BLANK_ROW
	: {!(lexerState == STATE_IN_TITLE)}? WS_NL
	;

REMARK	: {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
	;

fragment
WS_NL	:	(' ' | '\t')* NL;

fragment
NL	:	'\r'? '\n';

fragment
TEXT	:	(~('\r' | '\n'))*;
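
To make the intent clearer, the behaviour I am trying to get from the
predicates can be sketched as a small hand-written Java state machine.
This is only an illustration under my own assumptions, not
ANTLR-generated code: the class name TitleLexerSketch, the tokenize
method, the string token labels and the OTHER placeholder are all made
up for this example, and it works line by line instead of character by
character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the token stream the grammar above is meant to produce.
public class TitleLexerSketch {
    // Mirrors the three lexerState constants in @lexer::members.
    enum State { AT_BEGINNING, IN_TITLE, AFTER_TITLE }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.AT_BEGINNING;
        int pos = 0;
        while (pos < input.length()) {
            // Cut out one line; accept both "\n" and "\r\n" like the NL fragment.
            int eol = input.indexOf('\n', pos);
            int lineEnd = (eol >= 0) ? eol : input.length();
            int next = (eol >= 0) ? eol + 1 : input.length();
            String line = input.substring(pos, lineEnd);
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1);
            }

            switch (state) {
                case IN_TITLE:
                    // Inside the title, the whole line is text and its newline ends the title.
                    tokens.add("TITLE_TEXT(" + line + ")");
                    tokens.add("END_TITLE");
                    state = State.AFTER_TITLE;
                    break;
                case AT_BEGINNING:
                    if (line.trim().isEmpty()) {
                        tokens.add("BLANK_ROW");
                    } else if (line.startsWith("TITLE") && line.substring(5).trim().isEmpty()) {
                        tokens.add("BEGIN_TITLE");
                        state = State.IN_TITLE;
                    } else if (line.startsWith("REMA")) {
                        tokens.add("REMARK");
                    } else {
                        throw new IllegalStateException("unexpected line before TITLE: " + line);
                    }
                    break;
                default: // AFTER_TITLE
                    if (line.trim().isEmpty()) {
                        tokens.add("BLANK_ROW");
                    } else if (line.startsWith("REMA")) {
                        tokens.add("REMARK");
                    } else {
                        // Placeholder: the section rules are not part of the sample grammar.
                        tokens.add("OTHER(" + line + ")");
                    }
                    break;
            }
            // Every iteration consumes at least one character, so the loop always ends.
            pos = next;
        }
        return tokens;
    }
}
```

For the input "TITLE\nsome title text\n\n" I would expect this sketch to
give BEGIN_TITLE, TITLE_TEXT(some title text), END_TITLE, BLANK_ROW,
which is the token sequence I am hoping to get out of the ANTLR lexer.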

Best Regards,
Jonas