[antlr-interest] Backtracking question
Eric Jungkurth
ejungkurth at yahoo.com
Fri May 30 15:54:31 PDT 2008
I have a grammar that can match multiple distinct phrases such as
"SECTION 1. ORS 123.456 is amended to read", but unfortunately my input is littered with
a lot of other stuff I don't care about. Most of that stuff gets ignored just as I'd like
it to. However, if the stuff I don't care about contains tokens such as "is" or "and",
which I do care about in certain contexts, then the parser throws a NoViableAltException
and quits.

If I turn backtracking on, I can no longer match anything, even when the entire input can
be matched with backtracking turned off. I've tried using a syntactic predicate, but I can
never match more than "SECTION 1." before I get a FailedPredicateException.

What I'd really like to do is something like this:
phrase
: ors (IS | ARE) orsAction
| orcp (IS | ARE) orcpAction
| orl (IS | ARE) orlAction
| // I didn't match 'ORS' or 'ORCP' or 'SECTION' so go to next token
;
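
Spelled out in ANTLR 3 syntax, that last alternative might be a catch-all that consumes one
token rather than matching nothing. This is an untested sketch only: it assumes the
set-complement operator (~) is accepted in a parser rule here, and the excluded set is just
my guess at the tokens that can start a phrase or a new section.

phrase
    : ors (IS | ARE) orsAction
    | orcp (IS | ARE) orcpAction
    | orl (IS | ARE) orlAction
    | ~(ORS | ORCP | SECTION | SECTION_INDEX) // assumed catch-all: consume one token I don't care about
    ;
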
Any help? I especially don't understand why a grammar that works for certain inputs
won't match anything with backtracking on. That seems counterintuitive.
Thanks,
Eric

grammar ar;
measure : section+ EOF
;
section : SECTION_INDEX phrase+
;
phrase
: ors (IS | ARE) orsAction
| orcp (IS | ARE) orcpAction
| orl (IS | ARE) orlAction
;
orsAction
: FURTHER? AMENDED TO READ
| REPEALED
| ADDED TO AND MADE PART OF
;
orcpAction
: FURTHER? AMENDED TO READ
| REPEALED
| AMENDED BY ADDING A? NEW SECTION
;
orlAction
: FURTHER? AMENDED TO READ
| ADDED TO AND MADE PART OF orlSections
| REPEALED
;
ors : orsRange (COMMA orsRange)*
;
orsRange
: ORS ORS_BASE_SECTION (TO ORS_BASE_SECTION)?
;
ORS_BASE_SECTION
: BASE_SECTION PERIOD DIGIT+
;
orcp : ORCP orcpRange
;
orcpRange
: BASE_SECTION ((COMMA BASE_SECTION)* AND BASE_SECTION)?
;
BASE_SECTION
: DIGIT+ WS UPPERCASE_LETTER?
;
orl : orlSections
((COMMA AS AMENDED BY orlSections)
|(OF THIS DIGIT+ ACT))
;
orlSections
: SECTION orlBaseRange
((COMMA orlBaseRange)* AND orlBaseRange)?
;
orlBaseRange
: orlSection (TO orlSection)?
;
orlSection
: ORL_BASE_SECTION COMMA
(CHAPTER DIGIT+ COMMA)?
(OREGON LAWS DIGIT+)?
(LPAREN ENROLLED measureNumber RPAREN)?
;
ORL_BASE_SECTION
: DIGIT+ LOWERCASE_LETTER?
;
measureNumber
: measurePrefix DIGIT+
;
measurePrefix
: (HOUSE | SENATE) (BILL | (JOINT RESOLUTION))
;
A : 'a';
ACT : 'Act';
ADDED : 'added';
ADDING : 'adding';
AMENDED : 'amended';
AND : 'and';
ARE : 'are';
AS : 'as';
BILL : 'Bill';
BY : 'by';
CHAPTER : 'chapter';
ENROLLED: 'Enrolled';
FURTHER : 'further';
HOUSE : 'House';
IS : 'is';
JOINT : 'Joint';
LAWS : 'Laws';
LPAREN : '(';
MADE : 'made';
NEW : 'new';
OF : 'of';
ORCP : 'ORCP';
OREGON : 'Oregon';
ORS : 'ORS';
PART : 'part';
READ : 'read';
REPEALED: 'repealed';
RESOLUTION
: 'Resolution';
RPAREN : ')';
SECTION : ('S'|'s')('E'|'e')('C'|'c')('T'|'t')('I'|'i')('O'|'o')('N'|'n')'s'?;
SENATE : 'Senate';
TO : 'to';
THIS : 'this';
SECTION_INDEX
: SECTION WS DIGIT+ PERIOD;
COMMA : ',';
PERIOD : '.';
fragment
DIGIT : '0'..'9';
WS : (' ' | '\t' | '\n')* { skip(); };
fragment
UPPERCASE_LETTER
: 'A'..'Z';
fragment
LOWERCASE_LETTER
: 'a'..'z';