[antlr-interest] Generated lexer is affected by parser rules?! A bug?
Haralambi Haralambiev
hharalambiev at gmail.com
Sat May 17 01:44:47 PDT 2008
Hello,
A colleague of mine is working on a grammar, and I was bemused when
she told me that the string literal '50' was throwing an error while '00'
was not.
The exception said "mismatched character '5' expecting set null".
So, I started investigating... the lexer rule for string literal is the
following:
-----------
fragment
Apos : '\'';
StringLiteral: Apos ~Apos* Apos;
-----------
Everything seemed fine, except that in the generated Java code, the
mStringLiteral method contained the following:
-----------
mApos();
// ...NewTest.g:84:9: (~ Apos )*
loop2:
do {
int alt2=2;
int LA2_0 = input.LA(1);
if ( ((LA2_0>='\u0000' && LA2_0<='&')||(LA2_0>='(' && LA2_0<='\uFFFE')) ) {
alt2=1;
}
switch (alt2) {
case 1 :
// ...NewTest.g:197:9: ~ Apos
{
if ( (input.LA(1)>='\u0000' && input.LA(1)<='4')||(input.LA(1)>='6' &&
input.LA(1)<='\uFFFE') ) {
input.consume();
}
-----------
This was totally unexpected (the check rejects the character '5' rather
than the apostrophe), so I ran the following experiment:
- I removed all the parser rules.
- I changed the grammar to a lexer grammar.
When I regenerated the lexer, the suspicious if statement shown above
changed to the following:
-----------
switch (alt2) {
case 1 :
// ...NewTest.g:84:9: ~ Apos
{
if ( (input.LA(1)>='\u0000' &&
input.LA(1)<='\u0014')||(input.LA(1)>='\u0016' && input.LA(1)<='\uFFFE') ) {
input.consume();
}
-----------
So now the situation has changed and the string '50' is accepted, but the
check is still clearly wrong: it excludes '\u0015' instead of the apostrophe.
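To make the difference between the checks concrete, here is a minimal sketch that copies the three range tests from the generated code into a standalone Java class. The class and method names are hypothetical and only for illustration; the range expressions themselves are taken from the excerpts above.

```java
public class ConsumeCheckDemo {
    // Decision check of the (...)* loop: accepts everything except the apostrophe.
    static boolean decisionAccepts(int c) {
        return (c >= '\u0000' && c <= '&') || (c >= '(' && c <= '\uFFFE');
    }

    // Check before consume() in the combined grammar: accepts everything except '5'.
    static boolean combinedConsumeAccepts(int c) {
        return (c >= '\u0000' && c <= '4') || (c >= '6' && c <= '\uFFFE');
    }

    // Check before consume() in the lexer-only grammar: accepts everything except U+0015.
    static boolean lexerOnlyConsumeAccepts(int c) {
        return (c >= '\u0000' && c <= '\u0014') || (c >= '\u0016' && c <= '\uFFFE');
    }

    public static void main(String[] args) {
        // Walk through the bodies of the two string literals from the report.
        for (String body : new String[] {"50", "00"}) {
            for (char c : body.toCharArray()) {
                System.out.println("'" + c + "'"
                        + " decision=" + decisionAccepts(c)
                        + " combined=" + combinedConsumeAccepts(c)
                        + " lexerOnly=" + lexerOnlyConsumeAccepts(c));
            }
        }
    }
}
```

For '5' the decision check and the consume check of the combined grammar disagree, which matches the reported "mismatched character '5'" error; for '0' all three checks agree.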
I tested a simple grammar with the Apos and StringLiteral lexer rules only:
-----------
lexer grammar testStringLiteral;
StringLiteral : Apos ~Apos* Apos;
Apos : '\'';
-----------
It generates the following if statement, which I again consider wrong:
-----------
if ( (input.LA(1)>='\u0000' &&
input.LA(1)<='\u0003')||(input.LA(1)>='\u0005' && input.LA(1)<='\uFFFE') ) {
input.consume();
}
-----------
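As a rough way to compare the variants side by side, the sketch below computes which single code point each pair of ranges leaves out. The helper name and class name are hypothetical; the range bounds are copied from the generated checks quoted in this message.

```java
public class ExcludedCodePoints {
    // Returns the one code point missing between [0, firstHigh] and [secondLow, 0xFFFE],
    // or -1 if the gap between the two ranges is not exactly one code point wide.
    static int excluded(int firstHigh, int secondLow) {
        return (secondLow - firstHigh == 2) ? firstHigh + 1 : -1;
    }

    public static void main(String[] args) {
        // Bounds taken from the four range checks quoted in the post.
        System.out.println("decision check:      " + excluded('&', '('));
        System.out.println("combined grammar:    " + excluded('4', '6'));
        System.out.println("lexer-only grammar:  " + excluded('\u0014', '\u0016'));
        System.out.println("two-rule test lexer: " + excluded('\u0003', '\u0005'));
    }
}
```

Only the decision check leaves out the apostrophe (code point 39). The other three leave out 53, 21 and 4 respectively, so the excluded value changes with the contents of the grammar even though the StringLiteral rule does not.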
Taking the above into account,
I have two questions:
- Why do the parser rules affect the lexer class?
- Why is the if clause before the consume() call different from the if
clause that decides the alternative?
Of course, I assume that I could have made some stupid mistake, so please
excuse me if I have done so.
Best regards,
Hari