[antlr-interest] Context-sensitive lexer
Jonas
jonas.hagmar at gmail.com
Fri Jun 17 05:15:30 PDT 2011
Hi,
I'm developing a parser for a file format where context matters a lot:
how the input should be tokenized depends on which section is being read.
I'm trying to
1) understand why my ANTLR parser gets into infinite loops, and
2) find out whether there is a better way to implement context
sensitivity than the semantic predicates I am using now.
A typical beginning of a file looks like this:
TITLE
some title text
SECTION1
a=b*c
END
SECTION2
...
SECTION3
...
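To make the intended state machine concrete, here is a minimal hand-written Java sketch of it. It is not ANTLR-generated code. It tokenizes only the title part of a file like the one above, using the same three states as the lexerState variable in the grammar. Several details are assumptions for illustration only: the class name TitleLexer, the REST token that swallows everything after the title, and the order in which the branches are tried. Every branch consumes at least one character, so the loop always makes progress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer (not ANTLR output) mirroring the grammar's lexerState values.
class TitleLexer {
    enum State { AT_BEGINNING, IN_TITLE, AFTER_TITLE }

    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    private final String input;
    private int pos = 0;
    private State state = State.AT_BEGINNING;

    TitleLexer(String input) { this.input = input; }

    // Length of NL : '\r'? '\n' at index i, or 0 if there is no newline there.
    private int newlineAt(int i) {
        if (input.startsWith("\r\n", i)) return 2;
        return input.startsWith("\n", i) ? 1 : 0;
    }

    // Length of WS_NL : (' ' | '\t')* NL at index i, or 0 if it does not match.
    private int wsNewlineAt(int i) {
        int j = i;
        while (j < input.length() && (input.charAt(j) == ' ' || input.charAt(j) == '\t')) j++;
        int nl = newlineAt(j);
        return nl == 0 ? 0 : (j - i) + nl;
    }

    List<Token> tokenize() {
        List<Token> out = new ArrayList<>();
        while (pos < input.length()) {
            int start = pos;
            String type;
            if (state == State.AT_BEGINNING && input.startsWith("TITLE", pos)
                    && wsNewlineAt(pos + 5) > 0) {
                pos += 5 + wsNewlineAt(pos + 5);
                type = "BEGIN_TITLE";
                state = State.IN_TITLE;
            } else if (state == State.IN_TITLE && newlineAt(pos) > 0) {
                pos += newlineAt(pos);
                type = "END_TITLE";
                state = State.AFTER_TITLE;
            } else if (state == State.IN_TITLE) {
                // Not at a newline here, so at least one character is consumed.
                while (pos < input.length() && newlineAt(pos) == 0) pos++;
                type = "TITLE_TEXT";
            } else if (wsNewlineAt(pos) > 0) {
                pos += wsNewlineAt(pos);
                type = "BLANK_ROW";
            } else {
                // Outside the title: leave the rest to some other (hypothetical) section lexer.
                pos = input.length();
                type = "REST";
            }
            out.add(new Token(type, input.substring(start, pos)));
        }
        return out;
    }
}
```

For the file start shown above, this yields BEGIN_TITLE, TITLE_TEXT ("some title text"), END_TITLE, and one REST token holding everything from SECTION1 on. An empty title line produces END_TITLE directly rather than an empty TITLE_TEXT.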
The syntax differs from section to section: the 'TITLE' section is
terminated by the newline after the title text line, while other
sections may, for example, use single-quoted string literals and be
terminated by a keyword such as 'END'. Here is a sample grammar that
gets into an infinite loop:
grammar test;

options {
    output=AST;
}

@lexer::members {
    static final int STATE_AT_BEGINNING = 0;
    static final int STATE_IN_TITLE = 1;
    static final int STATE_AFTER_TITLE = 2;
    int lexerState = STATE_AT_BEGINNING;
}

file : title;

title : BEGIN_TITLE TITLE_TEXT END_TITLE;

BEGIN_TITLE
    : {(lexerState == STATE_AT_BEGINNING)}? 'TITLE' WS_NL
      {lexerState=STATE_IN_TITLE;}
    ;

TITLE_TEXT
    : {lexerState == STATE_IN_TITLE}? TEXT
    ;

END_TITLE
    : {lexerState == STATE_IN_TITLE}? NL {lexerState=STATE_AFTER_TITLE;}
    ;

BLANK_ROW
    : {!(lexerState == STATE_IN_TITLE)}? WS_NL
    ;

REMARK
    : {!(lexerState == STATE_IN_TITLE)}? 'REMA' .* NL
    ;

fragment
WS_NL : (' ' | '\t')* NL;

fragment
NL : '\r'? '\n';

fragment
TEXT : (~('\r' | '\n'))*;
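The following is offered as one plausible contributor to the loop, based on reading the grammar rather than on a confirmed diagnosis. The fragment TEXT is (~('\r' | '\n'))*, so TITLE_TEXT can succeed without consuming any characters, for instance when the input sits at the newline that ought to become END_TITLE. A lexer that accepts a zero-length token stays at the same position and can keep producing that token indefinitely.

The Java sketch below is a hypothetical toy model, not ANTLR 3's generated lexer. Rules report how many characters they match. The driver tries the rules in list order, which is a simplification, since ANTLR 3 predicts the rule with a generated DFA. When a match is empty, the driver throws instead of looping. ProgressGuard, TEXT_STAR and TEXT_PLUS are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not ANTLR's generated code) of a lexer driver that refuses zero-length tokens.
class ProgressGuard {
    // A rule returns the number of chars it matches at i: 0 for an empty match, -1 for no match.
    interface Rule { int match(String s, int i); }

    static int notNewlineRun(String s, int i) {
        int j = i;
        while (j < s.length() && s.charAt(j) != '\r' && s.charAt(j) != '\n') j++;
        return j - i;
    }

    // TEXT : (~('\r' | '\n'))* ;  can succeed without consuming anything
    static final Rule TEXT_STAR = (s, i) -> notNewlineRun(s, i);
    // TEXT : (~('\r' | '\n'))+ ;  must consume at least one character
    static final Rule TEXT_PLUS = (s, i) -> {
        int n = notNewlineRun(s, i);
        return n == 0 ? -1 : n;
    };
    // NL : '\r'? '\n' ;
    static final Rule NL = (s, i) -> s.startsWith("\r\n", i) ? 2 : s.startsWith("\n", i) ? 1 : -1;

    // Tries rules in list order (a simplification of ANTLR's DFA-based prediction).
    static List<String> tokenize(String s, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int len = -1;
            for (Rule r : rules) {
                len = r.match(s, pos);
                if (len >= 0) break;
            }
            if (len < 0) throw new IllegalStateException("no rule matches at " + pos);
            // Without this check the loop would emit "" at the same position forever.
            if (len == 0) throw new IllegalStateException("zero-length token at " + pos);
            tokens.add(s.substring(pos, pos + len));
            pos += len;
        }
        return tokens;
    }
}
```

With TEXT_PLUS, the input "some title text\n" splits into the title text and the newline. With TEXT_STAR, the driver reports the empty match at offset 15 instead of spinning.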
Best Regards,
Jonas