[antlr-interest] Newbie Question about Syntactic Predicates
Jim O'Connor
Jim.OConnor at microfocus.com
Fri Nov 7 12:37:38 PST 2003
Hi Mike,
First, to point out an oversight:
rule1 : "Standalone";
rule2 : "Surface";
Grammar : rule1 rule2; // translates to "Standalone Surface"
Input: "Surface Standalone"
Grammar doesn't match Input.
Comments:
1. If you are not familiar with the "testLiterals" option, see the
documentation. I say this because you seem to want to do a lot of work
instead of making your life simple.
See if this makes sense:
(I'm not paying attention to syntax!!!!!)
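Lexer: for this small app, consider making '.' part of IDENTIFIER.
class MyParser extends Parser;          // class names are placeholders
options { exportVocab = Sample; }       // parser and lexer share one vocabulary
tokens
{
    SURFACE="NUMBER.OF.SURFACE";
    STANDALONE="NUMBER.OF.STANDALONE";
    // ... others
}
start : (entry)* EOF ;                  // do as many generic entries as possible
entry : stand
      | surf
      // ... others
      ;
stand : STANDALONE COLON NUMBER ;
surf  : SURFACE COLON NUMBER ;

class MyLexer extends Lexer;
options { exportVocab = Sample; }
// IDENTIFIER text is looked up in the literals table, so
// "NUMBER.OF.SURFACE" comes back as the SURFACE token.
IDENTIFIER
options { testLiterals = true; }
    : LETTER (LETTER | '.')*
    ;
NUMBER : (DIGIT)+ ;
COLON  : ':' ;
WS     : (' ' | '\t' | '\r' | '\n' { newline(); })+ { $setType(Token.SKIP); } ;
protected LETTER : 'A'..'Z' | 'a'..'z' ;
protected DIGIT  : '0'..'9' ;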
Benefits: predicates should not be an issue, and large tokens are
handled without needing a large lexer k.
Another mechanism is to use filters (see the examples that come with
antlr.jar): you recognize only the pieces of the file you are interested in.
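With ANTLR 2's lexer `filter=true` option, roughly speaking, input that does not start any token rule is skipped instead of raising an error. The sketch below is only an analogue of that "recognize just the pieces you care about" idea, written in plain Java with `java.util.regex` rather than ANTLR; `FilterSketch` and its pattern are assumptions for illustration. The pattern also accepts longer names such as NUMBER.OF.SURFACE.TO.AIR.THREAT.CLASSES.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex analogue of a filtering lexer: pick out "NUMBER.OF.<NAME>: <digits>"
// wherever it appears and silently ignore every other character.
class FilterSketch {
    private static final Pattern ENTRY =
            Pattern.compile("NUMBER\\.OF\\.([A-Z]+(?:\\.[A-Z]+)*)\\s*:\\s*(\\d+)");

    static Map<String, Integer> extract(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            counts.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "header junk\nNUMBER.OF.SURFACE: 3\nother stuff\nNUMBER.OF.STANDALONE: 5\n";
        System.out.println(extract(input)); // {SURFACE=3, STANDALONE=5}
    }
}
```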
Jim
> -----Original Message-----
> From: hawkwall [mailto:hawkwall at yahoo.com]
> Sent: Friday, November 07, 2003 3:09 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Newbie Question about Syntactic Predicates
>
> Hello,
>
> I need to match the following data
>
> NUMBER.OF.SURFACE: 3
> NUMBER.OF.STANDALONE: 5
>
> All I am really concerned about is that the surface has the number
> 3 and the standalone has the number 5. I put the following in my lexer:
>
> DIGITS : ('0'..'9')+ ;
>
> DOT : '.' ;
>
> COLON : ':' ;
>
> SURFACE : "NUMBER" DOT "OF" DOT "SURFACE" COLON ;
>
> STANDALONE : "NUMBER" DOT "OF" DOT "STANDALONE" COLON;
>
> WS : ( ' '
> | '\t'
> | '\f'
> | ( options {generateAmbigWarnings=false;}
> : "\r\n" // DOS
> | '\r' // Macintosh
> | '\n' // Unix
> )
> {newline();}
> )+
>
> // now the overall whitespace action -- skip it!
> { $setType(Token.SKIP); }
> ;
>
> And my Parser looks like:
>
> start : rule1 rule2;
>
> rule1 : SURFACE DIGITS ;
>
> rule2 : STANDALONE DIGITS ;
>
> with some actions to print out the number it finds. If k<12 in the
> lexer, I get a nondeterminism error, and can see the problem in the
> generated lexer. But k=12 takes a while to generate the .java files.
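The k=12 figure can be checked by hand: both spellings share the 11-character prefix NUMBER.OF.S, so the 12th character (U versus T) is the first one that decides between them. The small helper below (`LookaheadSketch` is a made-up name) computes that number; the same column appears in the error message later in this message, "line 2:12: expecting 'U', found 'T'".

```java
// Computes how many characters of lookahead are needed to tell two
// fixed strings apart: the length of their common prefix, plus one.
// (If one string is a prefix of the other, the result is one past the
// shorter length.)
class LookaheadSketch {
    static int lookaheadNeeded(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i + 1;
    }

    public static void main(String[] args) {
        System.out.println(lookaheadNeeded("NUMBER.OF.SURFACE:", "NUMBER.OF.STANDALONE:")); // 12
    }
}
```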
> I can break this up into smaller tokens that are put together in the
> parser like:
>
> Lexer:
> NUMBER : "NUMBER" ;
> OF : "OF" ;
> DOT : '.' ;
> STANDALONE : "STANDALONE" ;
> SURFACE : "SURFACE" ;
> etc.
>
> and then in Parser:
> start : rule1
> rule2 ;
>
> rule1 : NUMBER DOT OF DOT STANDALONE COLON DIGITS ;
> rule2 : NUMBER DOT OF DOT SURFACE COLON DIGITS ;
>
> and it works with a smaller k value in the lexer, but it doesn't seem
> like the best option and makes the parser harder to construct. I have
> read everything I can find about syntactic predicates, and they seem to
> be what I need, but I can't get them to work. I added the following to the
> lexer from above:
>
> SURFACE_OR_STANDOFF
> : ("NUMBER" DOT "OF" DOT "SURFACE" ) =>
> "NUMBER" DOT "OF" DOT "SURFACE" DOT "TO" DOT "AIR" DOT
> "THREAT" DOT
> "CLASSES" COLON
> {$setType(SURFACE); }
> | ("NUMBER" DOT "OF" DOT "STANDOFF" DOT "RANGE" DOT "AIRCRAFT"
> DOT
> "CLASSES" COLON ) =>
> "NUMBER" DOT "OF" DOT "STANDOFF"
> {$setType(STANDOFF);}
> ;
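For reference, a syntactic predicate `(alpha) => beta` roughly makes the generated code mark the input position, try to match `alpha` in guessing mode, rewind, and only commit to `beta` if the guess succeeded. The Java sketch below hand-codes that guess/rewind pattern for the two header spellings; it illustrates the mechanism only, is not the code ANTLR generates, and `PredicateSketch`, `guess`, `match` and `nextHeader` are hypothetical names.

```java
// Hand-coded illustration of the guess/rewind behaviour behind "(alpha) => beta".
class PredicateSketch {
    private final String input;
    private int pos = 0;

    PredicateSketch(String input) { this.input = input; }

    // Consume text at the current position, or fail like a lexer mismatch would.
    private void match(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (pos >= input.length() || input.charAt(pos) != text.charAt(i)) {
                throw new IllegalStateException(
                        "expecting '" + text.charAt(i) + "' at column " + (pos + 1));
            }
            pos++;
        }
    }

    // Speculative match: remember the position ("mark"), try, and always
    // restore it ("rewind"), so a guess never consumes input.
    private boolean guess(String text) {
        int mark = pos;
        try {
            match(text);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = mark;
        }
    }

    // A rule with two predicated alternatives; returns the chosen token type.
    String nextHeader() {
        if (guess("NUMBER.OF.SURFACE:")) {
            match("NUMBER.OF.SURFACE:");
            return "SURFACE";
        } else if (guess("NUMBER.OF.STANDALONE:")) {
            match("NUMBER.OF.STANDALONE:");
            return "STANDALONE";
        }
        throw new IllegalStateException("no alternative matches at column " + (pos + 1));
    }

    public static void main(String[] args) {
        System.out.println(new PredicateSketch("NUMBER.OF.SURFACE: 3").nextHeader());
        System.out.println(new PredicateSketch("NUMBER.OF.STANDALONE: 5").nextHeader());
    }
}
```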
>
> What am I missing? Is there a better way to match large tokens? I
> can't tell that the parser is backtracking at all. The error message I
> get is:
>
> parser exception: line 2:12: expecting 'U', found 'T'
> line 2:12: expecting 'U', found 'T'
>
> Which says to me the parser is still trying to match the SURFACE
> token. I tried to define the parser to match SURFACE once and then
> STANDALONE, but my head hurts from all the banging. Thanks for the
> help.
>
> Mike Wall
>
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/