[antlr-interest] Order of token matching

Jenny Balfer ai06087 at Lehre.BA-Stuttgart.De
Wed Sep 3 09:34:10 PDT 2008


No, the full grammar is too long to post, but I reproduced the error in a short one.
An example input that triggers the error is the following:

isWorking = function(param1,param2) {
	some implementation;
	some expressions;
}

function throwsError(param1, param2) {
	// this is a nasty comment {
	something else
}

function isIgnored() {
	// lexer is still searching for a closing brace
}
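
The behavior wanted for this input can be sketched outside ANTLR. The Python below is only an illustration under stated assumptions, not part of the grammar: the function name strip_comments_and_bodies is made up here, and string literals that contain braces or comment markers are not handled. At every position it checks for a comment lead-in before it looks at braces, so a '{' inside a comment is never counted.

```python
def strip_comments_and_bodies(src: str) -> str:
    """Drop // and /* */ comments, then drop every balanced {...} block.

    Hypothetical helper for illustration; it ignores string literals.
    """
    out = []
    depth = 0  # current nesting level of '{'
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):
            # Single-line comment: skip up to (not including) the newline.
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif src.startswith("/*", i):
            # Multi-line comment: skip through the closing */.
            j = src.find("*/", i + 2)
            i = n if j == -1 else j + 2
        elif src[i] == "{":
            depth += 1
            i += 1
        elif src[i] == "}":
            if depth == 0:
                raise ValueError("unexpected '}'")
            depth -= 1
            i += 1
        else:
            # Keep text only when we are outside every brace block.
            if depth == 0:
                out.append(src[i])
            i += 1
    if depth != 0:
        raise ValueError("unbalanced braces")
    return "".join(out)


sample = (
    "function throwsError(param1, param2) {\n"
    "\t// this is a nasty comment {\n"
    "\tsomething else\n"
    "}\n"
    "\n"
    "function isIgnored() {\n"
    "\t// lexer is still searching for a closing brace\n"
    "}\n"
)
print(strip_comments_and_bodies(sample))
```

With comments handled first, both function headers survive and neither body does, which is the result the grammar is supposed to produce.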

***********
* GRAMMAR *
***********

program	:	statement*
	;
statement
	:	'(' statement ')'
	|	declaration
	;	
declaration
	:	ID '=' FUNCTION '(' paramList ')'
	|	FUNCTION ID '(' paramList ')' 
	;	
paramList
	:	ID (',' ID)*
	;					

fragment LDOC
	:	'/**'
	;
fragment MLCOM	
	:	'/*'
	;
fragment SLCOM	
	:	'//'
	;
fragment RCOM
	:	'*/'
	;
FUNCTION:	'function'
	;
COMMENT
	:	SLCOM (options{greedy=false;}: .)* NL		{skip();}
	|	MLCOM (options{greedy=false;}: .)* RCOM		{skip();}
	;
IMPL
	:	'{' (IMPL|~'}')* '}'	{skip();}
	;	
NL
	:	'\r'			{skip();}
	|	'\n'			{skip();}
	;
WS
	:	' '			{$channel=HIDDEN;}
	|	'\t'			{skip();}
	;
ID	:	( LETTER | '$' | '_' )	( LETTER | '$' | '_' | DIGIT )*
	;
fragment LETTER
	:	'A'..'Z'
	|	'a'..'z'
	;
fragment DIGIT
	:	'0'..'9'
	;			
DEFAULT	:	.
	;		
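
One way to picture why this grammar fails on the input above: the loop inside IMPL only knows IMPL and ~'}', so once the lexer is inside a brace block it never tries COMMENT, and the '{' in the comment opens a nested IMPL. A rough Python model of that counting is sketched below. The helper name naive_impl_end is hypothetical, and the code approximates the lexer's behavior rather than reproducing generated ANTLR code.

```python
def naive_impl_end(src: str, start: int) -> int:
    """Return the index just past the '}' that closes the '{' at src[start].

    Every brace is counted, including braces inside comments, which is
    roughly what a rule like IMPL does when it has no comment alternative.
    """
    if src[start] != "{":
        raise ValueError("start must point at '{'")
    depth = 0
    for i in range(start, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    # Ran off the end of the input while still inside a block.
    raise ValueError("mismatched character '<EOF>' expecting '}'")


sample = (
    "function throwsError(param1, param2) {\n"
    "\t// this is a nasty comment {\n"
    "\tsomething else\n"
    "}\n"
    "\n"
    "function isIgnored() {\n"
    "\t// lexer is still searching for a closing brace\n"
    "}\n"
)

try:
    naive_impl_end(sample, sample.index("{"))
except ValueError as err:
    print(err)
```

The extra '{' in the comment leaves the count one too high, so the block that starts in throwsError swallows isIgnored as well and only ends at end of file.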


On Wed, 03 Sep 2008 09:21:00 -0700, Jim Idle <jimi at temporal-wave.com>
wrote:
> On Wed, 2008-09-03 at 18:14 +0200, Jenny Balfer wrote:
> 
>> Thanks for that, but unfortunately this does not solve the problem. I
>> declared MLCOM etc. as fragment, but COMMENT and IMPL must not be fragments
>> in order to skip them.
> 
> 
> Have you shown all of your grammar here?
> 
> Jim
> 
>>
>> On Wed, 03 Sep 2008 09:05:48 -0700, Jim Idle <jimi at temporal-wave.com>
>> wrote:
>> > On Wed, 2008-09-03 at 18:00 +0200, Jenny Balfer wrote:
>> >
>> >> Hello guys,
>> >>
>> >> I think I have too little understanding of how my lexer works. I thought
>> >> the rules that are specified first are matched first, but in my grammar
>> >> this is not the case.
>> >> What I am trying to do is first skip all comments in my source files,
>> >> and then skip everything between curly braces:
>> >>
>> >
>> >
>> > Make sure that any token that you don't want returned to the parser is a
>> > fragment:
>> >
>> > fragment
>> > MLCOM : '/*' ;
>> >
>> > etc. Then you should have more luck. Your comment lead-ins are matching
>> > the MLCOM and SLCOM rules and then likely throwing recognition errors
>> > for the rest up until the '{'.
>> >
>> > Jim
>> >
>> >
>> >> MLCOM	:	'/*'
>> >> 	;
>> >> SLCOM	:	'//'
>> >> 	;
>> >> RCOM	:	'*/'
>> >> 	;
>> >> NL	:	'\r'			{skip();}
>> >> 	|	'\n'			{skip();}
>> >> 	;
>> >> WS	:	' '			{$channel=HIDDEN;}
>> >> 	|	'\t'			{skip();}
>> >> 	;
>> >>
>> >> COMMENT	:	SLCOM (options{greedy=false;}: .)* NL		{skip();}
>> >> 	|	MLCOM (options{greedy=false;}: .)* RCOM		{skip();}
>> >> 	;
>> >> IMPL	:	'{' (IMPL|'}')* '}'	{skip();}
>> >> 	;
>> >>
>> >> Rule IMPL matches everything between curly braces, but in between counts
>> >> them (by recursively calling itself).
>> >> Now the problem appears if there are braces in comments:
>> >>
>> >> someFunction = function(a,b) {
>> >>    // this is one brace too much: {
>> >> }
>> >>
>> >> My lexer now sees the opening brace in the comment and searches for the
>> >> closing one until the end of file, which results in:
>> >> mismatched character '<EOF>' expecting '}'
>> >>
>> >> What I want my lexer to do is first sort out all comments, and second sort
>> >> out everything between curly braces. Are there any predicates that could
>> >> cause this?
>> >>
>> >> Thanks!