[antlr-interest] Token order dependence in V3 grammars

Montebello asmith at moncons.co.uk
Wed Aug 16 14:14:48 PDT 2006


Hi all,

I am a newbie with ANTLR and doubtless should not be allowed anywhere
near V3. However, I have found an effect that I did not expect, i.e.
that the order of the token rules in a grammar can change what is
recognised. If the following grammar is run starting at the
unsignedInteger rule, it fails to recognise single-digit integers but
accepts integers of two or more digits. That is, "5" in a file on its
own causes a Mismatched Token Exception, but "55" is recognised correctly.

The grammar passes the ANTLRWorks Grammar/Check test and shows no errors
on code generation or compilation. Here is the (rather foolish) grammar:

// This is a cut down grammar to demonstrate a problem
// in ANTLR v3 with the recognition of single digit
// integers. It is intended only as a debugging aid.

grammar UnsInt;
options {k=2; backtrack=true; memoize=true;output=AST;}

//----------------------------------------------------------------------------
// The parser
//----------------------------------------------------------------------------


unsignedInteger
    : NUM_INT
    ;


//----------------------------------------------------------------------------
// The lexer
//----------------------------------------------------------------------------

//----------------------------------------------------------------------------
// Tokens
//----------------------------------------------------------------------------

// any single character other than a backslash or a double quote

CHARACTER
	: ~('\\'|'"')
	;

// a numeric literal.  Form is digits

NUM_INT : ('0'..'9')+ ;

If the order of the CHARACTER and NUM_INT rules in the grammar is
reversed, or if CHARACTER is deleted entirely, then everything behaves
as expected and both inputs are accepted without error.
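
One way to picture the effect is the toy model below, a minimal Python
sketch and not ANTLR's actual lexer algorithm. It assumes that the lexer
takes the longest match at the current position and, when two rules match
the same length, picks whichever rule is listed first; that tie-break is
an assumption made purely for illustration. The names first_token,
original and reversed_order are hypothetical.

```python
import re

# Toy model only: assumes the lexer takes the longest match and, on a tie,
# picks whichever rule is listed first. Not ANTLR's real prediction algorithm.
CHARACTER = ("CHARACTER", re.compile(r'[^\\"]'))  # mirrors ~('\\'|'"')
NUM_INT = ("NUM_INT", re.compile(r'[0-9]+'))      # mirrors ('0'..'9')+


def first_token(rules, text):
    """Return the name of the rule that wins at the start of text."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text)
        # Strictly greater: an earlier rule keeps the win on equal length.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


original = [CHARACTER, NUM_INT]        # rule order as in the grammar above
reversed_order = [NUM_INT, CHARACTER]  # the two rules swapped

for label, rules in (("original", original), ("reversed", reversed_order)):
    for text in ("5", "55"):
        print(label, repr(text), "->", first_token(rules, text))
```

Under those assumptions the model reports CHARACTER for "5" and NUM_INT
for "55" with the original order, and NUM_INT for both inputs once the
rules are swapped, which matches the behaviour described above.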

All the parts have been stripped out of other working grammars to show
the point. A similar grammar in V2.7.6 did not show the effect.

Is this a bug, or merely something that needs better syntax or semantic
error checking? It certainly cost me about half a day finding the needle
in a haystack of 150+ rules.

Thanks,
Andrew Smith

