[antlr-interest] Newbie Question about Syntactic Predicates

Jim O'Connor Jim.OConnor at microfocus.com
Fri Nov 7 12:37:38 PST 2003


Hi Mike,
  First, to point out an oversight:

rule1 : "Standalone";
rule2 : "Surface";

Grammar : rule1 rule2; // translates to "Standalone Surface"

Input: "Surface Standalone" 

Grammar doesn't match Input.
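
To make the ordering point concrete, here is a small Java sketch (Java being
the language ANTLR generates here). It is not ANTLR output; the class and
method names are made up for illustration. It only shows that a fixed
sequence rejects the reversed input while a looped alternative accepts it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT code generated by ANTLR.
class RuleOrderSketch {

    // Models "Grammar : rule1 rule2;": exactly "Standalone" followed by "Surface".
    static boolean matchesFixedOrder(List<String> words) {
        return words.equals(Arrays.asList("Standalone", "Surface"));
    }

    // Models "(rule1 | rule2)*": any number of either word, in any order.
    static boolean matchesAnyOrder(List<String> words) {
        for (String w : words) {
            if (!w.equals("Standalone") && !w.equals("Surface")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Surface", "Standalone");
        System.out.println(matchesFixedOrder(input)); // prints "false"
        System.out.println(matchesAnyOrder(input));   // prints "true"
    }
}
```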

Comments: 
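1. If you are not familiar with the "testLiterals" option, see the
documentation.  With it set, the lexer checks the text of each matched
token against the literals table and changes the token type on a match.
I mention it because you seem to be doing a lot of work when a simpler
approach is available.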

See if this makes sense,

(I'm not paying attention to syntax!!!!!)

Lexer - for this small app consider making '.' part of IDENTIFIER

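tokens
{
SURFACE="NUMBER.OF.SURFACE";
STANDALONE="NUMBER.OF.STANDALONE";
... others
}

// testLiterals: check the matched text against the literals above
IDENTIFIER
options { testLiterals = true; }
    : (LETTER | '.')+ ;
NUMBER : (DIGIT)+ ;

COLON : ':';
protected DIGIT : '0'..'9' ;
protected LETTER : 'A'..'Z' ;  // assumed: the keys use upper-case letters only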



PARSER

start : (rule)*; // match as many of the generic rules as the input contains

rule : stand
     | surf
   ... others
      ;


stand : STANDALONE COLON NUMBER;


surf  : SURFACE COLON NUMBER;
 


Benefits: predicates should no longer be an issue, and the large tokens
are handled.
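
For anyone who wants to see the idea behind the literals lookup outside of
ANTLR, here is a rough hand-written Java sketch. It is not what ANTLR
generates, and every name in it (LiteralsTableSketch, LITERALS, tokenize) is
made up for illustration. It matches a generic identifier made of letters and
dots first, and only then looks the text up in a table to decide whether it
is really SURFACE or STANDALONE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT code generated by ANTLR.
class LiteralsTableSketch {

    // Stand-in for the tokens section: literal text -> token type name.
    static final Map<String, String> LITERALS = new HashMap<>();
    static {
        LITERALS.put("NUMBER.OF.SURFACE", "SURFACE");
        LITERALS.put("NUMBER.OF.STANDALONE", "STANDALONE");
    }

    static boolean isIdChar(char c) {
        return Character.isLetter(c) || c == '.';
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace
            } else if (c == ':') {
                tokens.add("COLON");
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (isIdChar(c)) {
                int start = i;
                while (i < input.length() && isIdChar(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                // The literals test: retype the identifier if its text is a known literal.
                String type = LITERALS.get(text);
                tokens.add(type != null ? type : "IDENTIFIER(" + text + ")");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints "[STANDALONE, COLON, NUMBER(5), SURFACE, COLON, NUMBER(3)]"
        System.out.println(tokenize("NUMBER.OF.STANDALONE: 5\nNUMBER.OF.SURFACE: 3"));
    }
}
```

Because the identifier is scanned one character at a time and classified
afterwards, the scanner never has to look ahead to tell the two long keys
apart.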

Another mechanism is to use filters (see the examples that come with
antlr.jar).  With a filter you only recognize the pieces of the file you
are interested in and skip everything else.
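
As a rough picture of the filtering idea, here is a plain Java sketch. It
does not use ANTLR's filter mechanism at all; it substitutes java.util.regex
to show the same principle, and the names (FilterSketch, INTERESTING,
extract) and the sample input are made up for illustration. It pulls out only
the two keys from the question and ignores the rest of the input.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; a regex stand-in, not an ANTLR filter lexer.
class FilterSketch {

    // Only these two keys are "interesting"; everything else is skipped.
    private static final Pattern INTERESTING =
            Pattern.compile("NUMBER\\.OF\\.(SURFACE|STANDALONE)\\s*:\\s*(\\d+)");

    // Returns key -> number for each interesting entry, in input order.
    static Map<String, Integer> extract(String input) {
        Map<String, Integer> found = new LinkedHashMap<>();
        Matcher m = INTERESTING.matcher(input);
        while (m.find()) {
            found.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "HEADER: ignored\n"
                + "NUMBER.OF.SURFACE: 3\n"
                + "some other text\n"
                + "NUMBER.OF.STANDALONE: 5\n";
        System.out.println(extract(input)); // prints "{SURFACE=3, STANDALONE=5}"
    }
}
```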

Jim





> -----Original Message-----
> From: hawkwall [mailto:hawkwall at yahoo.com]
> Sent: Friday, November 07, 2003 3:09 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Newbie Question about Syntactic Predicates
> 
> Hello,
> 
> I need to match the following data
> 
> NUMBER.OF.SURFACE: 3
> NUMBER.OF.STANDALONE: 5
> 
> Where all I am really concerned about is that the surface has a number
> 3, and the standalone has a number 5.  I put the following in my Lexer
> 
> DIGITS : (0..9)+ ;
> 
> DOT : '.' ;
> 
> COLON : ':' ;
> 
> SURFACE : "NUMBER" DOT "OF" DOT "SURFACE" COLON ;
> 
> STANDALONE : "NUMBER" DOT "OF" DOT "STANDALONE" COLON;
> 
> WS	:	(	' '
> 		|	'\t'
> 		|	'\f'
> 		|	(	options {generateAmbigWarnings=false;}
> 			:	"\r\n"  // DOS
> 			|	'\r'    // Macintosh
> 			|	'\n'	// Unix
> 			)
> 			{newline();}
> 		)+
> 
>     // now the overall whitespace action -- skip it!
>     { $setType(Token.SKIP); }
>     ;
> 
> And my Parser looks like:
> 
> start :  rule1 rule2;
> 
> rule1 : SURFACE DIGITS ;
> 
> rule2 : STANDALONE DIGITS ;
> 
> with some actions to print out the number it finds.  If k<12 in the
> lexer, I get a nondeterminism error, and can see the problem in the
> generated Lexer.  But k=12 takes a while to generate the .java files.
> I can break this up into smaller tokens that are put together in the
> parser like:
> 
> Lexer:
> NUMBER : "NUMBER" ;
> OF : "OF" ;
> DOT : '.' ;
> STANDALONE : "STANDALONE" ;
> SURFACE : "SURFACE" ;
> etc.
> 
> and then in Parser:
> start : rule1
>         rule2
> 
> rule1 : NUMBER DOT OF DOT STANDALONE COLON DIGITS ;
> rule2 : NUMBER DOT OF DOT SURFACE COLON DIGITS ;
> 
> and it works with a smaller k value in the lexer, but it doesn't seem
> like the best option and makes the parser harder to construct.  I have
> read everything I can find about syntactic predicates, and they seem to
> be what I need, but I can't get it to work.  I added the following to the
> Lexer from above
> 
> SURFACE_OR_STANDOFF
> 	: 	("NUMBER" DOT "OF" DOT "SURFACE"  ) =>
> 		"NUMBER" DOT "OF" DOT "SURFACE" DOT "TO" DOT "AIR" DOT "THREAT" DOT "CLASSES" COLON
> 		{$setType(SURFACE); }
> 	|	("NUMBER" DOT "OF" DOT "STANDOFF" DOT "RANGE" DOT "AIRCRAFT" DOT "CLASSES" COLON ) =>
> 		"NUMBER" DOT "OF" DOT "STANDOFF"
> 		{$setType(STANDOFF);}
> 	;
> 
> What am I missing?  Is there a better way to match large tokens?  I
> can't tell that the Parser is backtracking at all.  The error message I
> get is:
> 
> parser exception: line 2:12: expecting 'U', found 'T'
> line 2:12: expecting 'U', found 'T'
> 
> Which says to me the parser is still trying to match the SURFACE
> token.  I tried to define the parser to match SURFACE once and then
> STANDALONE, but my head hurts from all the banging.  Thanks for the
> help.
> 
> Mike Wall
> 
> 
> 
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
> 
> 
> 

 
