[antlr-interest] Backtracking question

Eric Jungkurth ejungkurth at yahoo.com
Fri May 30 15:54:31 PDT 2008


I have a grammar that can match multiple distinct phrases such as
"SECTION 1. ORS 123.456 is amended to read". Unfortunately, my input is littered with a
lot of other stuff I don't care about. Most of that stuff gets ignored, just as I'd like.
However, if it contains tokens such as "is" or "and", which I do care about in certain
contexts, the parser throws a NoViableAltException and quits.

If I turn backtracking on, I can no longer match anything, even when the entire input can
be matched with backtracking turned off. I've tried using a syntactic predicate, but I can
never match more than "SECTION 1." before I get a FailedPredicateException.

What I'd really like to do is something like this:
phrase
 : ors (IS | ARE) orsAction 
 | orcp (IS | ARE) orcpAction
 | orl (IS | ARE) orlAction
 | // I didn't match 'ORS' or 'ORCP' or 'SECTION', so go to the next token
 ;
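
In ANTLR 3 notation, that last alternative might be spelled as a negated token set. The
following is an untested sketch, not part of the grammar further down: the rule name
`skipped` is hypothetical, and it assumes the lexer can turn every piece of unwanted input
into some token. ANTLR may also report ambiguity warnings wherever an optional tail such
as (COMMA ...)* could be consumed either by the phrase or by the catch-all.

```
phrase
 : ors (IS | ARE) orsAction
 | orcp (IS | ARE) orcpAction
 | orl (IS | ARE) orlAction
 | skipped   // hypothetical: consume one token I don't care about
 ;

// Hypothetical catch-all rule: any single token that cannot start one of
// the phrases above and does not begin the next section.
skipped
 : ~(ORS | ORCP | SECTION | SECTION_INDEX)
 ;
```

This only covers junk that appears between phrases. Input that starts like a phrase and
then goes wrong (say, ORS followed by something other than "is" or "are") would still
fail inside the first alternative.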

Any help? I especially don't understand why a grammar that works for certain inputs
won't match anything with backtracking on. That seems counterintuitive. The full grammar
follows my signature.
Thanks,
Eric

grammar ar;
measure : section+ EOF
 ;
section : SECTION_INDEX phrase+
 ;
phrase
 : ors (IS | ARE) orsAction 
 | orcp (IS | ARE) orcpAction
 | orl (IS | ARE) orlAction
 ;
orsAction
 : FURTHER? AMENDED TO READ
 | REPEALED
 | ADDED TO AND MADE PART OF
 ;
 
orcpAction
 : FURTHER? AMENDED TO READ
 | REPEALED
 | AMENDED BY ADDING A? NEW SECTION
 ;
orlAction
 : FURTHER? AMENDED TO READ
 | ADDED TO AND MADE PART OF orlSections
 | REPEALED
 ;
 
ors : orsRange (COMMA orsRange)*
 ;
  
orsRange
 : ORS ORS_BASE_SECTION (TO ORS_BASE_SECTION)? 
 ;
 
ORS_BASE_SECTION
 : BASE_SECTION PERIOD DIGIT+
 ;
orcp : ORCP orcpRange
 ;
  
orcpRange
 : BASE_SECTION ((COMMA BASE_SECTION)* AND BASE_SECTION)? 
 ;
 
BASE_SECTION
 : DIGIT+ WS UPPERCASE_LETTER?
 ;
  
orl : orlSections 
  ((COMMA AS AMENDED BY orlSections)
  |(OF THIS DIGIT+ ACT)) 
 ;
orlSections
 : SECTION orlBaseRange
  ((COMMA orlBaseRange)* AND orlBaseRange)?
 ;
 
orlBaseRange
 : orlSection (TO orlSection)?
 ;
 
orlSection
 : ORL_BASE_SECTION COMMA
  (CHAPTER DIGIT+ COMMA)?
  (OREGON LAWS DIGIT+)?
  (LPAREN ENROLLED measureNumber RPAREN)?
 ;
 
ORL_BASE_SECTION
 : DIGIT+ LOWERCASE_LETTER?
 ;
measureNumber
 : measurePrefix DIGIT+
 ;
  
measurePrefix
 : (HOUSE | SENATE) (BILL | (JOINT RESOLUTION))
 ;
 
A : 'a';
ACT : 'Act';
ADDED : 'added';
 
ADDING : 'adding';
AMENDED : 'amended';
AND : 'and';
ARE : 'are';
AS : 'as';
BILL : 'Bill';
BY : 'by';
CHAPTER : 'chapter';
ENROLLED: 'Enrolled';
FURTHER : 'further';
HOUSE : 'House';
IS : 'is';
JOINT : 'Joint';
LAWS : 'Laws';
LPAREN : '(';
MADE : 'made';
NEW : 'new';
OF : 'of';
ORCP : 'ORCP';
OREGON : 'Oregon';
ORS : 'ORS';
PART : 'part';
READ : 'read';
REPEALED: 'repealed';
RESOLUTION
 : 'Resolution';
 
RPAREN : ')';
SECTION : ('S'|'s')('E'|'e')('C'|'c')('T'|'t')('I'|'i')('O'|'o')('N'|'n')'s'?;
SENATE : 'Senate';
TO : 'to';
 
THIS : 'this';
SECTION_INDEX
 : SECTION WS DIGIT+ PERIOD;
COMMA : ',';
PERIOD : '.';
fragment
DIGIT : '0'..'9';
WS : (' ' | '\t' | '\n')* { skip(); };
fragment
UPPERCASE_LETTER
 : 'A'..'Z';
fragment
LOWERCASE_LETTER
 : 'a'..'z';


      

