[antlr-interest] trouble with lexer prediction DFA
Daniel Killebrew
killebrew.daniel at gmail.com
Tue Dec 22 18:32:49 PST 2009
Hello,
I am encountering some trouble with my lexer. What I am trying to do is
make a lexer that handles source text that is given to the lexer one
line at a time (how Visual Studio works with language services, also
what Sam Harwell has been doing). There are multiple types of tokens
that can be split across lines, among them the C-style comment:
/*foo*/. My main lexer finds the start of a /* comment, and then I
switch to another lexer to identify the continuation or end of it. I
tried using gated semantic predicates to turn on parts of my grammar
when inside a multiline comment; that did not work well either, but
that's another story. I am using ANTLR version 3.2 from Sep 23rd.
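To make the line-at-a-time setup concrete, here is a rough sketch in plain
Java of the state that has to be carried from one line to the next. It is
not ANTLR-generated code and not the Visual Studio language service API;
the names LineStateSketch, State, and scanLine are made up for
illustration, and it ignores string literals and every other token type.

```java
public final class LineStateSketch {
    private LineStateSketch() {}

    /** Hypothetical per-line lexer state: inside or outside a multiline comment. */
    public enum State { DEFAULT, IN_COMMENT }

    /**
     * Scans one line starting in the given state and returns the state
     * that the next line should start in.
     */
    public static State scanLine(String line, State state) {
        int pos = 0;
        while (pos < line.length()) {
            if (state == State.IN_COMMENT) {
                // Inside a comment: look for the terminator on this line.
                int end = line.indexOf("*/", pos);
                if (end < 0) {
                    return State.IN_COMMENT; // comment continues on the next line
                }
                pos = end + 2;
                state = State.DEFAULT;
            } else {
                // Outside a comment: look for the next comment opener.
                int start = line.indexOf("/*", pos);
                if (start < 0) {
                    return State.DEFAULT;
                }
                pos = start + 2;
                state = State.IN_COMMENT;
            }
        }
        return state;
    }
}
```

The caller would store the returned state alongside each line and pass it
in when lexing the following line, which is roughly the role the switch
between my main lexer and the comment lexer plays.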
The following grammar produces an mTokens() prediction DFA that loops
forever when given the test input '*/'. I assume this is a bug and
unintended behavior. Or is my understanding of ANTLR lacking (in which
case an explanation would be appreciated)?
I tested 3 different targets (Java, CSharp2, and Sam's CSharp3); they
all loop forever. Turning greedy and backtracking on or off doesn't seem
to help; I still get a bad mTokens() rule. If I access the rules
individually, through calls to mENDMULTILINECOMMENT() or
mCONTINUEMULTILINECOMMENT(), they seem to work as expected.
In English, what I want the grammar to do, and what I think it should be
doing:

ENDMULTILINECOMMENT: match zero or more of (a '*' not followed by '/',
or any character that is not '*', end of line, or end of file),
followed by '*/'.

CONTINUEMULTILINECOMMENT: match zero or more of (a '*' not followed by
'/', or any character that is not '*', end of line, or end of file),
followed by end of line.
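The intended behavior of the two rules can be sketched by hand in plain
Java. This is only an illustration of the matching described above, not
what ANTLR generates; the class and method names (CommentRuleSketch,
skipCommentBody, matchEnd, matchContinue) are hypothetical. Each match
method returns the index just past the match, or -1 if the rule does not
match at the given start position.

```java
public final class CommentRuleSketch {
    private CommentRuleSketch() {}

    /** Mirrors ENDOFLINEFRAGMENT: line feed, U+2028, or U+2029. */
    public static boolean isEndOfLine(char c) {
        return c == '\n' || c == 0x2028 || c == 0x2029;
    }

    /** Mirrors ENDOFFILEFRAGMENT: NUL or Ctrl-Z. */
    public static boolean isEndOfFile(char c) {
        return c == 0x0000 || c == 0x001A;
    }

    /**
     * Consumes the loop shared by both rules and returns the index of the
     * first character the loop cannot consume.
     */
    public static int skipCommentBody(String s, int pos) {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '*') {
                // A '*' is part of the body only if a following character
                // exists and is not '/'.
                if (pos + 1 < s.length() && s.charAt(pos + 1) != '/') {
                    pos++;
                    continue;
                }
                break;
            }
            if (isEndOfLine(c) || isEndOfFile(c)) {
                break;
            }
            pos++;
        }
        return pos;
    }

    /** Intended ENDMULTILINECOMMENT: comment body followed by the terminator. */
    public static int matchEnd(String s, int start) {
        int pos = skipCommentBody(s, start);
        return s.startsWith("*/", pos) ? pos + 2 : -1;
    }

    /** Intended CONTINUEMULTILINECOMMENT: comment body followed by end of line. */
    public static int matchContinue(String s, int start) {
        int pos = skipCommentBody(s, start);
        return (pos < s.length() && isEndOfLine(s.charAt(pos))) ? pos + 1 : -1;
    }
}
```

Under this reading, the test input '*/' is matched by matchEnd with an
empty body and rejected by matchContinue, which is the result I expected
from mTokens().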
Regardless, ANTLR is really cool and the rest of my lexer works well.
Thanks to Terence and the rest who have created it.
Thanks in advance,
Daniel
lexer grammar CommentLexer;

options {
    language=Java;
}

// Rest of a multiline comment, up to and including the closing delimiter.
ENDMULTILINECOMMENT
    :   ( options{greedy=false;}:
            ('*' ~'/')=> '*'
        |   ~('*' | ENDOFLINEFRAGMENT | ENDOFFILEFRAGMENT)
        )*
        '*/'
    ;

// A line that lies entirely inside a multiline comment, ending at end of line.
CONTINUEMULTILINECOMMENT
    :   ( options{greedy=false;}:
            ('*' ~'/')=> '*'
        |   ~('*' | ENDOFLINEFRAGMENT | ENDOFFILEFRAGMENT)
        )*
        ENDOFLINEFRAGMENT
    ;

fragment
ENDOFLINEFRAGMENT
    :   '\n' | '\u2029' | '\u2028'
    ;

fragment
ENDOFFILEFRAGMENT
    :   ('\u0000' | '\u001A')
    ;