[antlr-interest] token precedence (and an ANTLRworks question)

Mon Nov 17 00:54:11 PST 2008

I am having what looks like a problem with rule precedence.

My lexer rules look as follows:

        IDENTIFIER
        	: CHAR (CHAR | DIGIT)*
        	;

        TOKEN
        	: ~(NEWLINE|','|'>')+
        	;

        NEWLINE
        	: '\n'          // Line feed
        	| '\r'          // Carriage return
        	| '\u2028'      // Line separator
        	| '\u2029'      // Paragraph separator
        	;

        fragment
        CHAR
        	: 'A' .. 'Z'
        	| 'a' .. 'z'
        	;

        fragment
        DIGIT
        	: '0' .. '9'
        	;

I'm trying to use these rules to tokenize the following text ('**' and
'/' appear in the parser rules):

	LINE,1500,4,60,60
	**INPUT/NOSICHECK

I expect this token stream:

        |LINE|,|1500|,|4|,|60|,|60|
        |**|INPUT|/|NOSICHECK|

Instead, I end up with:

        |LINE|,|1500|,|4|,|60|,|60|
        |**INPUT/NOSICHECK|

This suggests that I am wrong to assume that the rule listed first
(IDENTIFIER) takes precedence over TOKEN. I can't find much discussion
of lexer rule precedence in the ANTLR book.
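
The observed result can be reproduced outside ANTLR. The Python sketch
below is only an illustration, not ANTLR's actual prediction algorithm.
It assumes the lexer tries IDENTIFIER before TOKEN at each position and
lets the chosen rule consume as much input as it can. The regular
expressions and the tokenize function are hypothetical stand-ins for the
rules above, and the OTHER type for ',' and newline characters is made
up for the sketch.

```python
import re

# Hypothetical stand-ins for the two lexer rules above (not ANTLR output).
# IDENTIFIER: a letter followed by any number of letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# TOKEN: one or more characters that are not a newline, ',' or '>'.
TOKEN = re.compile(r"[^\n\r\u2028\u2029,>]+")


def tokenize(text):
    """Try IDENTIFIER first, then TOKEN, at each input position."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, rule in (("IDENTIFIER", IDENTIFIER), ("TOKEN", TOKEN)):
            match = rule.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # Neither rule matched (',', '>' or a newline character):
            # pass the character through with an assumed OTHER type.
            tokens.append(("OTHER", text[pos]))
            pos += 1
    return tokens


for line in ("LINE,1500,4,60,60", "**INPUT/NOSICHECK"):
    print("|" + "|".join(value for _, value in tokenize(line)) + "|")
```

Under those assumptions, rule order never comes into play for the second
line: '*' cannot start an IDENTIFIER, so TOKEN is the only candidate, and
because TOKEN accepts letters and '/', it runs to the end of the line.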

Also, the ANTLRworks debugger shows the token stream with little red
boxes around each token, but I can't work out how to find the token type
of a given token. Is there something I'm missing here?

Thanks in advance,
--davyd

-- 
Davyd Madeley        Software Engineer
Fugro Seismic Imaging, Perth Australia