[antlr-interest] Strange lexing behavior ?
Steven Obua
obua at me.com
Sun Aug 2 15:19:38 PDT 2009
Hi,
I am using ANTLR v3 to define the syntax of my own programming
language and have stumbled across some strange behavior. While working
out how best to define C-style comments (both single-line and
multi-line), I arrived at the following (these are the first rule
definitions in my combined lexer/parser grammar file):
------------------------------------------------------------------------------------------------------
COMMENT1: '/*' (options {greedy=false;} : .)* '*/' {$channel=HIDDEN;};
fragment
Newline : ('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');
COMMENT2: '//' (~Newline)* Newline* {$channel=HIDDEN;};
---------------------------------------------------------------------------------------------------------
But when I parse a file that starts with three single-line comments, the
lexer reports an error on the very first line (and again on the
third line). After some experimenting, I now use the following:
-------------------------------------------------------------------------------------------------------
COMMENT1: '/*' (options {greedy=false;} : .)* '*/' {$channel=HIDDEN;};
fragment
Newline : ('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');
fragment
NotNewline
: ~('\u000A' | '\u000D' | '\u0085' | '\u000C' | '\u2028' | '\u2029');
COMMENT2: '//' NotNewline* Newline* {$channel=HIDDEN;};
----------------------------------------------------------------------------------------------------------
This works! Note that the only difference is that I replaced
(~Newline)* with NotNewline*, where NotNewline negates the character
set directly instead of negating a reference to the Newline fragment.
Now I can parse the file without any errors. Is this a bug in ANTLR, or
am I missing some finer point of how ANTLR's lexical analysis works?
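For reference, here is a rough sketch, outside ANTLR, of what both comment rules are meant to match. It uses Python regular expressions, not ANTLR's lexer, so it only illustrates the intended behavior and says nothing about why the first grammar fails. The names NEWLINE_CHARS, COMMENT1, COMMENT2 and leading_comments are made up for this illustration.

```python
import re

# Assumption: the same six code points as the Newline fragment in the grammar.
NEWLINE_CHARS = "\u000A\u000D\u0085\u000C\u2028\u2029"

# Multi-line comment: '/*', then as little as possible, then '*/'
# (the non-greedy '.*?' plays the role of greedy=false).
COMMENT1 = re.compile(r"/\*.*?\*/", re.DOTALL)

# Single-line comment: '//', then any non-newline characters,
# then zero or more newline characters.
COMMENT2 = re.compile(f"//[^{NEWLINE_CHARS}]*[{NEWLINE_CHARS}]*")


def leading_comments(source):
    """Collect the comments found back to back at the start of source.

    Returns the list of matched comments and the offset where
    matching stopped.
    """
    pos, found = 0, []
    while pos < len(source):
        m = COMMENT1.match(source, pos) or COMMENT2.match(source, pos)
        if m is None:
            break
        found.append(m.group())
        pos = m.end()
    return found, pos


if __name__ == "__main__":
    text = "// one\n// two\r\n// three\nx = 1\n"
    comments, end = leading_comments(text)
    print(len(comments), repr(text[end:]))
```

With an input that starts with three single-line comments, all three should be consumed before the first real token, which is the behavior I expected from both versions of the grammar.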
Cheers,
Steven Obua