[antlr-interest] Backtracking question

Terence Parr parrt at cs.usfca.edu
Fri May 30 16:04:28 PDT 2008


Sounds to me like you need filter=true in a lexer. Check out the fuzzy
parser in the examples-v3 tarball.
Ter
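
For readers unfamiliar with the option, here is a minimal sketch of what a filter=true lexer might look like for this kind of input. It is not the fuzzy parser from the examples-v3 tarball, and it is not a drop-in replacement for the grammar quoted below. The grammar name FuzzyAr and the rule names ORS_REF and SECTION_HEADING are hypothetical, and the section-number patterns are simplified. With filter=true the lexer tries the token rules in the order they are listed and, when none of them matches at the current position, consumes one character and tries again, so stray words such as "is" or "and" in the uninteresting text never reach a parser.

```
// Hypothetical sketch only; names and patterns are illustrative assumptions.
lexer grammar FuzzyAr;

options { filter=true; }  // rules are tried in order; unmatched input is skipped

// One token for a whole reference such as "ORS 123.456".
ORS_REF
    : 'ORS' WS DIGIT+ '.' DIGIT+
    ;

// One token for a heading such as "SECTION 1."
SECTION_HEADING
    : 'SECTION' WS DIGIT+ '.'
    ;

// Fragment rules are only helpers; they never produce tokens themselves.
fragment DIGIT : '0'..'9' ;
fragment WS    : (' ' | '\t' | '\n')+ ;
```

Since each phrase of interest is matched inside a single lexer rule, everything between those phrases is skipped instead of being tokenized, which may be enough on its own if the goal is only to pull out the references.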
On May 30, 2008, at 3:54 PM, Eric Jungkurth wrote:

> I have a grammar.
> I can match multiple distinct phrases such as "SECTION 1. ORS 123.456 is
> amended to read" but unfortunately my input is littered with a lot of
> other stuff I don't care about.
> Most of that stuff gets ignored just as I'd like it to. However, if the
> stuff I don't care about contains tokens such as "is" or "and", which I
> do care about in certain contexts, then the parser throws a
> NoViableAltException and quits.
> If I turn backtracking on I can no longer match anything, even if the
> entire input can be matched with backtracking turned off. I've tried
> using a syntactic predicate but can never match more than "SECTION 1."
> before I get a FailedPredicate.
> What I'd really like to do is something like this:
> phrase
>  : ors (IS | ARE) orsAction
>  | orcp (IS | ARE) orcpAction
>  | orl (IS | ARE) orlAction
>  | // I didn't match 'ORS' or 'ORCP' or 'SECTION' so go to next token
>  ;
>
> Any help? I'm especially not understanding why a grammar that works for
> certain inputs won't match anything with backtracking on. That seems
> counterintuitive.
> Thanks,
> Eric
>
> grammar ar;
> measure : section+ EOF
>  ;
> section : SECTION_INDEX phrase+
>  ;
> phrase
>  : ors (IS | ARE) orsAction
>  | orcp (IS | ARE) orcpAction
>  | orl (IS | ARE) orlAction
>  ;
> orsAction
>  : FURTHER? AMENDED TO READ
>  | REPEALED
>  | ADDED TO AND MADE PART OF
>  ;
>
> orcpAction
>  : FURTHER? AMENDED TO READ
>  | REPEALED
>  | AMENDED BY ADDING A? NEW SECTION
>  ;
> orlAction
>  : FURTHER? AMENDED TO READ
>  | ADDED TO AND MADE PART OF orlSections
>  | REPEALED
>  ;
>
> ors : orsRange (COMMA orsRange)*
>  ;
>
> orsRange
>  : ORS ORS_BASE_SECTION (TO ORS_BASE_SECTION)?
>  ;
>
> ORS_BASE_SECTION
>  : BASE_SECTION PERIOD DIGIT+
>  ;
> orcp : ORCP orcpRange
>  ;
>
> orcpRange
>  : BASE_SECTION ((COMMA BASE_SECTION)* AND BASE_SECTION)?
>  ;
>
> BASE_SECTION
>  : DIGIT+ WS UPPERCASE_LETTER?
>  ;
>
> orl : orlSections
>   ((COMMA AS AMENDED BY orlSections)
>   |(OF THIS DIGIT+ ACT))
>  ;
> orlSections
>  : SECTION orlBaseRange
>   ((COMMA orlBaseRange)* AND orlBaseRange)?
>  ;
>
> orlBaseRange
>  : orlSection (TO orlSection)?
>  ;
>
> orlSection
>  : ORL_BASE_SECTION COMMA
>   (CHAPTER DIGIT+ COMMA)?
>   (OREGON LAWS DIGIT+)?
>   (LPAREN ENROLLED measureNumber RPAREN)?
>  ;
>
> ORL_BASE_SECTION
>  : DIGIT+ LOWERCASE_LETTER?
>  ;
> measureNumber
>  : measurePrefix DIGIT+
>  ;
>
> measurePrefix
>  : (HOUSE | SENATE) (BILL | (JOINT RESOLUTION))
>  ;
>
> A : 'a';
> ACT : 'Act';
> ADDED : 'added';
>
> ADDING : 'adding';
> AMENDED : 'amended';
> AND : 'and';
> ARE : 'are';
> AS : 'as';
> BILL : 'Bill';
> BY : 'by';
> CHAPTER : 'chapter';
> ENROLLED: 'Enrolled';
> FURTHER : 'further';
> HOUSE : 'House';
> IS : 'is';
> JOINT : 'Joint';
> LAWS : 'Laws';
> LPAREN : '(';
> MADE : 'made';
> NEW : 'new';
> OF : 'of';
> ORCP : 'ORCP';
> OREGON : 'Oregon';
> ORS : 'ORS';
> PART : 'part';
> READ : 'read';
> REPEALED: 'repealed';
> RESOLUTION
>  : 'Resolution';
>
> RPAREN : ')';
> SECTION : ('S'|'s')('E'|'e')('C'|'c')('T'|'t')('I'|'i')('O'|'o')('N'|'n')'s'?;
> SENATE : 'Senate';
> TO : 'to';
>
> THIS : 'this';
> SECTION_INDEX
>  : SECTION WS DIGIT+ PERIOD;
> COMMA : ',';
> PERIOD : '.';
> fragment
> DIGIT : '0'..'9';
> WS : (' ' | '\t' | '\n')* { skip(); };
> fragment
> UPPERCASE_LETTER
>  : 'A'..'Z';
> fragment
> LOWERCASE_LETTER
>  : 'a'..'z';
>
>
>


