[antlr-interest] Backtracking question
Terence Parr
parrt at cs.usfca.edu
Fri May 30 16:04:28 PDT 2008
Sounds to me like you need filter=true in a lexer. Check out the fuzzy
parser in the examples-v3 tarball.
Ter
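
For readers unfamiliar with the option: a filtering lexer tries its rules in order at the current input position and, when none of them matches, drops a single character and tries again, so text nobody cares about never reaches the parser. The Python sketch below only illustrates that scanning idea; it is not ANTLR's implementation. The rule names, the regular expressions, and the `filter_scan` helper are all assumptions made up for this example, loosely modeled on the phrases in the question.

```python
import re

# Hypothetical rules for this sketch, tried in the order listed
# (a filtering lexer likewise gives earlier rules priority).
RULES = [
    ("SECTION_INDEX", re.compile(r"(?i:sections?)\s+\d+\.")),
    ("ORS_AMENDED", re.compile(
        r"ORS\s+\d+\.\d+\s+is\s+(?:further\s+)?amended\s+to\s+read")),
    ("ORS_REPEALED", re.compile(r"ORS\s+\d+\.\d+\s+is\s+repealed")),
]


def filter_scan(text):
    """Yield (rule_name, matched_text) pairs, skipping unmatched input."""
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                yield name, m.group(0)
                pos = m.end()  # continue right after the matched phrase
                break
        else:
            # No rule matched at this position: drop one character and retry.
            pos += 1


sample = ("Whereas it is and was so. SECTION 1. ORS 123.456 is amended "
          "to read: blah and is. ORS 7.1 is repealed.")
for name, phrase in filter_scan(sample):
    print(name, "->", phrase)
```

Stray words such as "is" and "and" in the surrounding text are simply skipped here, because they only count when they appear inside one of the whole-phrase patterns. In ANTLR terms that would mean recognizing the interesting phrases in lexer rules rather than emitting "is" and "and" as standalone tokens.
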
On May 30, 2008, at 3:54 PM, Eric Jungkurth wrote:
> I have a grammar.
> I can match multiple distinct phrases such as "SECTION 1. ORS 123.456
> is amended to read" but unfortunately my input is littered with a lot
> of other stuff I don't care about. Most of that stuff gets ignored just
> as I'd like it to. However, if the stuff I don't care about contains
> tokens such as "is" or "and", which I do care about in certain
> contexts, then the parser throws a NoViableAltException and quits. If
> I turn backtracking on I can no longer match anything, even if the
> entire input can be matched with backtracking turned off. I've tried
> using a syntactic predicate but can never match more than "SECTION 1."
> before I get a FailedPredicate.
> What I'd really like to do is something like this:
> phrase
> : ors (IS | ARE) orsAction
> | orcp (IS | ARE) orcpAction
> | orl (IS | ARE) orlAction
> | // I didn't match 'ORS' or 'ORCP' or 'SECTION' so go to next token
> ;
>
> Any help? I'm especially not understanding why a grammar that works
> for certain inputs won't match anything with backtracking on. That
> seems counterintuitive.
> Thanks,
> Eric
>
> grammar ar;
> measure : section+ EOF
> ;
> section : SECTION_INDEX phrase+
> ;
> phrase
> : ors (IS | ARE) orsAction
> | orcp (IS | ARE) orcpAction
> | orl (IS | ARE) orlAction
> ;
> orsAction
> : FURTHER? AMENDED TO READ
> | REPEALED
> | ADDED TO AND MADE PART OF
> ;
>
> orcpAction
> : FURTHER? AMENDED TO READ
> | REPEALED
> | AMENDED BY ADDING A? NEW SECTION
> ;
> orlAction
> : FURTHER? AMENDED TO READ
> | ADDED TO AND MADE PART OF orlSections
> | REPEALED
> ;
>
> ors : orsRange (COMMA orsRange)*
> ;
>
> orsRange
> : ORS ORS_BASE_SECTION (TO ORS_BASE_SECTION)?
> ;
>
> ORS_BASE_SECTION
> : BASE_SECTION PERIOD DIGIT+
> ;
> orcp : ORCP orcpRange
> ;
>
> orcpRange
> : BASE_SECTION ((COMMA BASE_SECTION)* AND BASE_SECTION)?
> ;
>
> BASE_SECTION
> : DIGIT+ WS UPPERCASE_LETTER?
> ;
>
> orl : orlSections
> ((COMMA AS AMENDED BY orlSections)
> |(OF THIS DIGIT+ ACT))
> ;
> orlSections
> : SECTION orlBaseRange
> ((COMMA orlBaseRange)* AND orlBaseRange)?
> ;
>
> orlBaseRange
> : orlSection (TO orlSection)?
> ;
>
> orlSection
> : ORL_BASE_SECTION COMMA
> (CHAPTER DIGIT+ COMMA)?
> (OREGON LAWS DIGIT+)?
> (LPAREN ENROLLED measureNumber RPAREN)?
> ;
>
> ORL_BASE_SECTION
> : DIGIT+ LOWERCASE_LETTER?
> ;
> measureNumber
> : measurePrefix DIGIT+
> ;
>
> measurePrefix
> : (HOUSE | SENATE) (BILL | (JOINT RESOLUTION))
> ;
>
> A : 'a';
> ACT : 'Act';
> ADDED : 'added';
>
> ADDING : 'adding';
> AMENDED : 'amended';
> AND : 'and';
> ARE : 'are';
> AS : 'as';
> BILL : 'Bill';
> BY : 'by';
> CHAPTER : 'chapter';
> ENROLLED: 'Enrolled';
> FURTHER : 'further';
> HOUSE : 'House';
> IS : 'is';
> JOINT : 'Joint';
> LAWS : 'Laws';
> LPAREN : '(';
> MADE : 'made';
> NEW : 'new';
> OF : 'of';
> ORCP : 'ORCP';
> OREGON : 'Oregon';
> ORS : 'ORS';
> PART : 'part';
> READ : 'read';
> REPEALED: 'repealed';
> RESOLUTION
> : 'Resolution';
>
> RPAREN : ')';
> SECTION : ('S'|'s')('E'|'e')('C'|'c')('T'|'t')('I'|'i')('O'|'o')
> ('N'|'n')'s'?;
> SENATE : 'Senate';
> TO : 'to';
>
> THIS : 'this';
> SECTION_INDEX
> : SECTION WS DIGIT+ PERIOD;
> COMMA : ',';
> PERIOD : '.';
> fragment
> DIGIT : '0'..'9';
> WS : (' ' | '\t' | '\n')* { skip(); };
> fragment
> UPPERCASE_LETTER
> : 'A'..'Z';
> fragment
> LOWERCASE_LETTER
> : 'a'..'z';
>
>
>