[antlr-interest] Order of token matching

Jim Idle jimi at temporal-wave.com
Wed Sep 3 09:21:00 PDT 2008


On Wed, 2008-09-03 at 18:14 +0200, Jenny Balfer wrote:

> Thanks for that, but unfortunately this does not solve the problem. I
> declared MLCOM etc. as fragment, but COMMENT and IMPL must not be fragments
> in order to skip them. 


Have you shown all of your grammar here?

Jim

> 
> On Wed, 03 Sep 2008 09:05:48 -0700, Jim Idle <jimi at temporal-wave.com>
> wrote:
> > On Wed, 2008-09-03 at 18:00 +0200, Jenny Balfer wrote:
> > 
> >> Hello guys,
> >>
> >> I think I understand too little about how my lexer works. I thought
> >> the rules that are specified first are matched first, but in my grammar
> >> this is not the case.
> >> What I am trying to do is first skip all comments in my source files,
> >> and then skip everything between curly braces:
> >>
> > 
> > 
> > Make sure that any token that you don't want returned to the parser is a
> > fragment:
> > 
> > fragment
> > MLCOM : '/*' ;
> > 
> > etc. Then you should have more luck. Your comment lead-ins are matching
> > the MLCOM and SLCOM rules and then likely throwing recognition errors
> > for the rest, up until the '{'.
> > 
> > Jim
> > 
> > 
> >> MLCOM	:	'/*'
> >> 	;
> >> SLCOM	:	'//'
> >> 	;
> >> RCOM	:	'*/'
> >> 	;
> >> NL	:	'\r'			{skip();}
> >> 	|	'\n'			{skip();}
> >> 	;
> >> WS	:	' '			{$channel=HIDDEN;}
> >> 	|	'\t'			{skip();}
> >> 	;
> >>
> >> COMMENT	:	SLCOM (options{greedy=false;}: .)* NL		{skip();}
> >> 	|	MLCOM (options{greedy=false;}: .)* RCOM		{skip();}
> >> 	;
> >> IMPL	:	'{' (IMPL|'}')* '}'	{skip();}
> >> 	;
> >>
> >> Rule IMPL is meant to match everything between curly braces, counting
> >> nested braces along the way (by recursively calling itself).
> >> Now the problem appears if there are braces in comments:
> >>
> >> someFunction = function(a,b) {
> >>    // this is one brace too much: {
> >> }
> >>
> >> My lexer now sees the opening brace in the comment and searches for the
> >> closing one until the end of file, which results in:
> >> mismatched character '<EOF>' expecting '}'
> >>
> >> What I want my lexer to do is first sort out all comments, and second
> >> sort out everything between curly braces. Are there any predicates that
> >> could achieve this?
> >>
> >> Thanks!
> 
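
For readers following the thread, the behaviour asked for in the original
question (ignore braces that appear inside comments while counting nested
braces) can be sketched outside ANTLR. The Python below is only an
illustration of that scanning logic, not the grammar fix itself; the names
skip_comment and skip_block are made up for this sketch and do not come from
the thread or from ANTLR.

```python
def skip_comment(text, i):
    """Return the index just past a comment starting at i, or i if none starts there."""
    if text.startswith("//", i):
        end = text.find("\n", i)
        return len(text) if end == -1 else end + 1
    if text.startswith("/*", i):
        end = text.find("*/", i + 2)
        if end == -1:
            raise ValueError("unterminated /* comment")
        return end + 2
    return i


def skip_block(text, i):
    """Return the index just past the '}' that closes the '{' at text[i]."""
    if text[i] != "{":
        raise ValueError("expected '{'")
    depth = 0
    while i < len(text):
        # Comments are consumed as a unit, so braces inside them are never counted.
        j = skip_comment(text, i)
        if j != i:
            i = j
            continue
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("mismatched character '<EOF>' expecting '}'")


# The example from the original question: a stray '{' inside a comment.
source = (
    "someFunction = function(a,b) {\n"
    "   // this is one brace too much: {\n"
    "}\n"
)
start = source.index("{")
end = skip_block(source, start)
remaining = source[:start] + source[end:]
```

The point of the sketch is that the comment check happens inside the brace
loop, before any brace is counted. In grammar terms this suggests that the
loop inside IMPL would also need to recognise comments, rather than relying
on the order in which the lexer rules are listed; that is an assumption about
one possible fix, not something confirmed in this thread.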