[antlr-interest] Common left prefix for Antlr tokens...

Stuart Dootson stuart.dootson at gmail.com
Mon Jan 16 07:08:48 PST 2012


Hello

One of my colleagues has been using ANTLR 3 to create a lexer/parser
for the L5K language (used to program Allen-Bradley PLCs). This has
generally gone well, but we have now run into a small problem.

The problem is with the array literal start token ('[') and an
'extended property' indicator ('[[[___'). More specifically, nested
arrays with no whitespace between the outer and inner array start, for
example "[[1], 2]", are interpreted by Antlr as an extended property
introduction, causing a "mismatched character" exception.
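
For illustration, here is a toy model of the failure in plain Python (not
ANTLR code). It assumes the lexer chooses between the two rules from the
first two characters alone and then commits to that choice; that is a guess
at the mechanism, not something confirmed from the generated lexer. The
names match_bracket_token and MismatchedCharacter are made up for this
sketch.

```python
class MismatchedCharacter(Exception):
    """Stand-in for the lexer's 'mismatched character' error."""


EXTENDED_PROP = "[[[___"


def match_bracket_token(text, pos):
    """Return (token_kind, new_pos) for a token starting with '[' at pos."""
    # Assumed prediction step: two characters of lookahead pick the rule.
    if text[pos:pos + 2] == "[[":
        # Committed to EXTENDED_PROP: every remaining character must match.
        for offset, want in enumerate(EXTENDED_PROP):
            got = text[pos + offset:pos + offset + 1]
            if got != want:
                raise MismatchedCharacter(
                    f"expected {want!r} at index {pos + offset}, got {got!r}"
                )
        return "EXTENDED_PROP", pos + len(EXTENDED_PROP)
    # A single '[' is an ordinary array start.
    return "LSQ", pos + 1
```

Under that assumption, "[1]" and "[[[___" lex as expected, while
"[[1], 2]" raises the error at the third character.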

I have come up with a workaround: overriding the lexer's 'emit' and
'nextToken' methods so that actions can call 'emit' several times,
which lets the strings "[[" and "[[[" be converted into multiple "["
tokens. However, I was wondering whether this use case can be handled
without the extra code, through one or more options on the
grammar/rules?
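
The idea behind the workaround can be sketched roughly as follows. This is
a hand-written stand-in in plain Python, not the actual lexer code and not
the ANTLR runtime API: QueueingLexer, next_token and tokenize are
hypothetical names, and the token kinds simply mirror the grammar appended
to this message (COMMA stands for the ',' literal). The point is only that
'emit' appends to a queue and 'next_token' drains it, so one match can
yield several tokens.

```python
from collections import deque

EXTENDED_PROP = "[[[___"
DIGITS = "0123456789"


class QueueingLexer:
    """Toy lexer in which a single match may emit several tokens."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.pending = deque()  # tokens emitted but not yet handed out

    def emit(self, kind, value):
        # Queue the token rather than keeping only the most recent one.
        self.pending.append((kind, value))

    def next_token(self):
        # Keep matching until something is queued, then return the oldest.
        while not self.pending:
            self._match_one()
        return self.pending.popleft()

    def _match_one(self):
        text, pos = self.text, self.pos
        if pos >= len(text):
            self.emit("EOF", "")
        elif text.startswith(EXTENDED_PROP, pos):
            self.emit("EXTENDED_PROP", EXTENDED_PROP)
            self.pos = pos + len(EXTENDED_PROP)
        elif text[pos] == "[":
            # A run of '[' that is not an extended property:
            # emit one LSQ per bracket from this single match.
            end = pos
            while (end < len(text) and text[end] == "["
                   and not text.startswith(EXTENDED_PROP, end)):
                self.emit("LSQ", "[")
                end += 1
            self.pos = end
        elif text[pos] == "]":
            self.emit("RSQ", "]")
            self.pos = pos + 1
        elif text[pos] == ",":
            self.emit("COMMA", ",")
            self.pos = pos + 1
        elif text[pos] in DIGITS:
            end = pos
            while end < len(text) and text[end] in DIGITS:
                end += 1
            self.emit("INT", text[pos:end])
            self.pos = end
        elif text[pos] in " \n\r":
            self.pos = pos + 1  # whitespace is skipped, nothing emitted
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")


def tokenize(text):
    """Collect token kinds up to (not including) EOF."""
    lexer = QueueingLexer(text)
    kinds = []
    while True:
        kind, _ = lexer.next_token()
        if kind == "EOF":
            return kinds
        kinds.append(kind)
```

With this sketch, tokenize("[[1], 2]") yields two LSQ tokens followed by
INT, RSQ, COMMA, INT, RSQ, while "[[[___" is still recognised as a single
EXTENDED_PROP.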

A minimal Antlr grammar is appended...

Stuart Dootson

grammar arrays;

stat
	:	array
	|	EXTENDED_PROP
	;

array
	:	 LSQ value ( ',' value)* RSQ
	;

value
	:	INT
	|	array
	;
	
INT	:	('0' .. '9')+
	;


EXTENDED_PROP
	: '[[[___'
	;
	
LSQ	:	'['
	;

RSQ	:	']'
	;

WS	: (' '|'\n'|'\r')+ {$channel=HIDDEN;}
	;

