[antlr-interest] Generated lexer is affected by parser rules?! A bug?

Haralambi Haralambiev hharalambiev at gmail.com
Sat May 17 01:44:47 PDT 2008


Hello,

A colleague of mine is working on a grammar, and I was bemused when
she told me that the string literal '50' was throwing an error while '00'
was not.

The exception said "mismatched character '5' expecting set null".

So, I started investigating... the lexer rule for string literal is the
following:
-----------
fragment
Apos : '\'';

StringLiteral: Apos ~Apos* Apos;
-----------

Everything seemed fine, except that in the generated Java code, the
mStringLiteral method contained the following:

-----------
mApos();
// ...NewTest.g:84:9: (~ Apos )*
loop2:
do {
int alt2=2;
int LA2_0 = input.LA(1);

if ( ((LA2_0>='\u0000' && LA2_0<='&')||(LA2_0>='(' && LA2_0<='\uFFFE')) ) {
alt2=1;
}

switch (alt2) {
case 1 :
// ...NewTest.g:197:9: ~ Apos
{
if ( (input.LA(1)>='\u0000' && input.LA(1)<='4')||(input.LA(1)>='6' && input.LA(1)<='\uFFFE') ) {
input.consume();

}
-----------
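
To make the mismatch concrete, here is a minimal standalone sketch. The class and method names are made up for illustration; only the two range conditions are copied from the generated code above. It is not ANTLR output, just a way to see that the condition choosing the loop alternative and the condition guarding consume() disagree on the character '5'.

```java
// Standalone sketch (hypothetical class, not ANTLR output): the two range
// checks are copied from the generated mStringLiteral method quoted above.
final class StringLiteralCheckDemo {

    // Condition that decides whether to enter the (~Apos)* loop:
    // accepts every character except the apostrophe (0x27).
    static boolean loopDecisionAccepts(int c) {
        return (c >= '\u0000' && c <= '&') || (c >= '(' && c <= '\uFFFE');
    }

    // Condition guarding input.consume() inside the loop:
    // accepts every character except '5'.
    static boolean matchCheckAccepts(int c) {
        return (c >= '\u0000' && c <= '4') || (c >= '6' && c <= '\uFFFE');
    }

    // Returns the index of the first character after the opening apostrophe
    // that the loop decision accepts but the match check rejects, or -1.
    static int firstConflict(String literal) {
        for (int i = 1; i < literal.length(); i++) {
            int c = literal.charAt(i);
            if (!loopDecisionAccepts(c)) {
                break; // closing apostrophe: the loop exits normally
            }
            if (!matchCheckAccepts(c)) {
                return i; // this is where the mismatched-character error arises
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"'50'", "'00'"}) {
            System.out.println(literal + " -> conflict at index " + firstConflict(literal));
        }
    }
}
```

Running it prints a conflict at index 1 for '50' and -1 (no conflict) for '00', which matches the behaviour my colleague saw.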

The inner check was totally unexpected: it tests whether the character is
different from '5' rather than from the apostrophe. So I did the following
experiment:

   - I removed all the parser rules.
   - I changed the grammar to a lexer grammar.

When I regenerated the lexer, the faulty if statement mentioned above
changed to the following:

-----------
switch (alt2) {
case 1 :
// ...NewTest.g:84:9: ~ Apos
{
if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0014')||(input.LA(1)>='\u0016' && input.LA(1)<='\uFFFE') ) {
input.consume();

}
-----------

So now the situation has changed: the string '50' is accepted, but the
check is still obviously wrong, since it excludes '\u0015' rather than the
apostrophe.

I tested a simple grammar with the Apos and StringLiteral lexer rules only:
-----------
lexer grammar testStringLiteral;

StringLiteral : Apos ~Apos* Apos;
Apos : '\'';
-----------

It generates the following if, which I again consider wrong:
-----------
if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0003')||(input.LA(1)>='\u0005' && input.LA(1)<='\uFFFE') ) {
input.consume();

}
-----------
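
To compare the three generated checks side by side, here is a small sketch that works out which single character each one rejects. The class and the excluded() helper are hypothetical, written only for this comparison; the range bounds are taken from the three if statements quoted above.

```java
// Hypothetical helper, not part of ANTLR: each generated check has the form
// (c >= 0 && c <= lo) || (c >= hi && c <= 0xFFFE), so the single rejected
// value is lo + 1 (which equals hi - 1).
final class ExcludedCharDemo {

    static int excluded(int lo, int hi) {
        if (hi - lo != 2) {
            throw new IllegalArgumentException("ranges do not leave exactly one gap");
        }
        return lo + 1;
    }

    public static void main(String[] args) {
        // Code point that ~Apos should exclude.
        System.out.println("apostrophe                     = " + (int) '\'');
        // Bounds from the combined parser/lexer grammar.
        System.out.println("combined grammar excludes      " + excluded('4', '6'));
        // Bounds from the same grammar converted to a lexer grammar.
        System.out.println("lexer grammar excludes         " + excluded(0x0014, 0x0016));
        // Bounds from the two-rule testStringLiteral grammar.
        System.out.println("testStringLiteral excludes     " + excluded(0x0003, 0x0005));
    }
}
```

The apostrophe is code point 39, while the three checks exclude 53, 21 and 4 respectively. None of them is the apostrophe, and the excluded value changes with the grammar.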

Given all of the above, I have two questions:

   - Why do the parser rules affect the lexer class?
   - Why is the if clause guarding the consume() call different from the if
   clause that decides the alternative?

Of course, I assume that I could have made some stupid mistake, so please
excuse me if I have done so.

Best regards,
Hari