[antlr-interest] Strange lexing behavior ?

Steven Obua obua at me.com
Sun Aug 2 15:19:38 PDT 2009


Hi,

I am using ANTLR v3 to define the syntax of my own programming
language and have run into some strange behavior. While working out
how best to define C-style comments (both single-line and multi-line),
I arrived at the following rules (these are the first rule definitions
in my combined lexer/parser grammar file):

------------------------------------------------------------------------------------------------------
COMMENT1:	'/*' (options {greedy=false;} : .)* '*/' {$channel=HIDDEN;};


fragment
Newline	:	('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');

COMMENT2:	'//' (~Newline)* Newline* {$channel=HIDDEN;};
---------------------------------------------------------------------------------------------------------

But when I parse a file that starts with three single-line comments,
the lexer reports an error on the very first line (and again on the
third line)! After some experimenting, I now use the following:

-------------------------------------------------------------------------------------------------------
COMMENT1:	'/*' (options {greedy=false;} : .)* '*/' {$channel=HIDDEN;};


fragment
Newline	:	('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');
	
fragment	
NotNewline
	:	~('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');

COMMENT2:	'//' NotNewline* Newline* {$channel=HIDDEN;};
----------------------------------------------------------------------------------------------------------

This works! Note that the only difference is that I replaced
(~Newline)* with NotNewline*, where NotNewline negates the same six
characters directly instead of negating the Newline fragment.

Now I can parse the file without any errors. Is this a bug in ANTLR, or
am I missing some finer point of how ANTLR's lexical analysis works?
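
To make the intent concrete, here is a minimal Python sketch (not ANTLR,
and not part of my grammar) of the matching behavior I expect from
COMMENT2 in both versions above. The regular expression, the helper name
lex_comments and the sample input are illustrative assumptions only;
they say nothing about how ANTLR itself interprets ~Newline.

```python
import re

# The same six code points as the Newline fragment in the grammar.
NEWLINE_CHARS = "\u000A\u000D\u0085\u000C\u2028\u2029"

# Intended meaning of:  '//' NotNewline* Newline*
COMMENT2 = re.compile("//[^" + NEWLINE_CHARS + "]*[" + NEWLINE_CHARS + "]*")


def lex_comments(text):
    """Split text into consecutive COMMENT2 matches.

    Returns the list of matched comments, or None if some position
    cannot be matched (the analogue of a lexing error).
    """
    pos = 0
    tokens = []
    while pos < len(text):
        match = COMMENT2.match(text, pos)
        if match is None:
            return None
        tokens.append(match.group())
        pos = match.end()
    return tokens


# Hypothetical input: three single-line comments, as in my test file.
sample = "// first\n// second\r\n// third\n"
print(lex_comments(sample))
```

With this reading, an input consisting only of single-line comments is
consumed completely, which is what I expected from the first grammar
as well.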

Cheers,

Steven Obua




