[antlr-interest] Order of token matching

Jenny Balfer ai06087 at Lehre.BA-Stuttgart.De
Wed Sep 3 09:00:53 PDT 2008


Hello guys,

I think I do not fully understand how my lexer works. I thought that
rules specified first are matched first, but in my grammar this is not
the case.
What I am trying to do is first skip all comments in my source files,
and then skip everything between curly braces:

MLCOM	:	'/*'
	;
SLCOM	:	'//'
	;
RCOM	:	'*/'
	;
NL	:	'\r'			{skip();}
	|	'\n'			{skip();}
	;
WS	:	' '			{$channel=HIDDEN;}
	|	'\t'			{skip();}
	;

COMMENT	:	SLCOM (options{greedy=false;}: .)* NL		{skip();}
	|	MLCOM (options{greedy=false;}: .)* RCOM		{skip();}
	;
IMPL	:	'{' (IMPL|~'}')* '}'	{skip();}
	;

Rule IMPL matches everything between curly braces and keeps track of
nested braces by recursively calling itself.
The problem appears when there are braces inside comments:

someFunction = function(a,b) {
   // this is one brace too much: {
}

My lexer now sees the opening brace in the comment and searches for the
matching closing one until the end of the file, which results in:
mismatched character '<EOF>' expecting '}'

What I want my lexer to do is first discard all comments, and only then
discard everything between curly braces. Is there a predicate that could
achieve this?

Thanks!


