[antlr-interest] Newbie Question about Syntactic Predicates

Fri Nov 7 12:09:09 PST 2003

Hello,

I need to match the following data:

NUMBER.OF.SURFACE: 3
NUMBER.OF.STANDALONE: 5

All I am really concerned about is that the surface has the number 3
and the standalone has the number 5.  I put the following in my Lexer:

DIGITS : ('0'..'9')+ ;

DOT : '.' ;

COLON : ':' ;

SURFACE : "NUMBER" DOT "OF" DOT "SURFACE" COLON ;

STANDALONE : "NUMBER" DOT "OF" DOT "STANDALONE" COLON;

WS	:	(	' '
		|	'\t'
		|	'\f' 
		|	(	options {generateAmbigWarnings=false;}
			:	"\r\n"  // DOS
			|	'\r'    // Macintosh
			|	'\n'	// Unix
			)
			{newline();}
		)+

    // now the overall whitespace action -- skip it!
    { $setType(Token.SKIP); }
    ;

And my Parser looks like:

start :  rule1 rule2;

rule1 : SURFACE DIGITS ; 

rule2 : STANDALONE DIGITS ;

with some actions to print out the number it finds.  If k<12 in the
lexer, I get a nondeterminism error, and can see the problem in the
generated Lexer.  But k=12 takes a while to generate the .java files.
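
The figure of 12 seems to follow from the shared prefix of the two
keywords: NUMBER.OF.S is 11 characters, so the first character that
differs ('U' versus 'T') is the 12th.  A small Java sketch to check
that count (the class and method names are made up for illustration
and are not part of ANTLR):

```java
class PrefixCheck {
    // Length of the longest common prefix of two strings.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String surface = "NUMBER.OF.SURFACE:";
        String standalone = "NUMBER.OF.STANDALONE:";
        int shared = commonPrefixLength(surface, standalone);
        // The first differing character sits at 1-based position shared + 1,
        // which is how far the lexer has to look to tell the two apart.
        System.out.println("shared prefix length: " + shared);
        System.out.println("lookahead needed:     " + (shared + 1));
    }
}
```
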
I can break this up into smaller tokens that are put together in the
parser like:

Lexer:
NUMBER : "NUMBER" ;
OF : "OF" ; 
DOT : '.' ;
STANDALONE : "STANDALONE" ;
SURFACE : "SURFACE" ;
etc.

and then in Parser:
start : rule1
        rule2 ;

rule1 : NUMBER DOT OF DOT SURFACE COLON DIGITS ; 
rule2 : NUMBER DOT OF DOT STANDALONE COLON DIGITS ;

and it works with a smaller k value in the lexer, but it doesn't seem
like the best option and makes the parser harder to construct.  I have
read everything I can find about syntactic predicates, and they seem to
be what I need, but I can't get them to work.  I added the following to
the Lexer from above:

SURFACE_OR_STANDOFF
	: 	("NUMBER" DOT "OF" DOT "SURFACE"  ) => 
		"NUMBER" DOT "OF" DOT "SURFACE" DOT "TO" DOT "AIR" DOT "THREAT" DOT "CLASSES" COLON 
		{$setType(SURFACE); }
	|	("NUMBER" DOT "OF" DOT "STANDOFF" DOT "RANGE" DOT "AIRCRAFT" DOT "CLASSES" COLON ) => 
		"NUMBER" DOT "OF" DOT "STANDOFF"  
		{$setType(STANDOFF);}
	;
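
My mental model of what the predicates should do, written as plain
Java rather than ANTLR, is roughly the following: look ahead without
consuming anything, then commit to one alternative.  This is only a
sketch of my understanding, not what ANTLR actually generates.  The
class and method names are invented for illustration, and it uses the
short keywords from the top of this message rather than the long ones
in the rule above.

```java
class PredicateSketch {
    static final String SURFACE = "NUMBER.OF.SURFACE:";
    static final String STANDALONE = "NUMBER.OF.STANDALONE:";

    // Returns the token type name for the keyword starting at pos,
    // or null if neither keyword starts there.
    static String classify(String input, int pos) {
        // startsWith only inspects characters and consumes nothing,
        // which is the role I expect a ( ... ) => predicate to play.
        if (input.startsWith(SURFACE, pos)) {
            return "SURFACE";
        }
        if (input.startsWith(STANDALONE, pos)) {
            return "STANDALONE";
        }
        return null;
    }

    public static void main(String[] args) {
        String data = "NUMBER.OF.SURFACE: 3\nNUMBER.OF.STANDALONE: 5\n";
        int secondLine = data.indexOf('\n') + 1;
        System.out.println(classify(data, 0));
        System.out.println(classify(data, secondLine));
    }
}
```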

What am I missing?  Is there a better way to match large tokens?  I
can't tell whether the Parser is backtracking at all.  The error
message I get is:

parser exception: line 2:12: expecting 'U', found 'T'
line 2:12: expecting 'U', found 'T'

Which says to me the parser is still trying to match the SURFACE
token.  I tried to define the parser to match SURFACE once and then
STANDALONE, but my head hurts from all the banging.  Thanks for the
help.

Mike Wall
