[antlr-interest] Generated lexer is affected by parser rules?! A bug?

Terence Parr parrt at cs.usfca.edu
Tue May 20 10:11:35 PDT 2008


Both of these worked perfectly with 3.1b1. In one you are calling a
rule, which is the only difference I see.
Ter
On May 17, 2008, at 4:35 AM, Haralambi Haralambiev wrote:

> Just revised the very simple grammar.
>
> Could someone point out what is the difference between the following  
> two grammars:
> -----------
> lexer grammar testStringLiteral1;
>
> StringLiteral : Apos ~Apos* Apos;
>
> fragment
> Apos : '\'';
> -----------
>
> and
>
> -----------
> lexer grammar testStringLiteral2;
>
> StringLiteral : '\'' ~'\''* '\'';
> -----------
>
> When generated to Java, the two lexers differ, while I expected them
> to be identical!
>
> -Hari
>
> On 5/17/08, Haralambi Haralambiev <hharalambiev at gmail.com> wrote:  
> Hello,
>
> A colleague of mine is working on a grammar, and I was bemused
> when she told me that the string literal '50' was throwing an error,
> while '00' was not.
>
> The exception said "mismatched character '5' expecting set null".
>
> So, I started investigating... the lexer rule for string literal is  
> the following:
> -----------
> fragment
> Apos	:	'\'';
>
> StringLiteral:	Apos ~Apos* Apos;
> -----------
>
> Everything seemed fine, except that in the generated Java code the
> mStringLiteral method contained the following lines:
>
> -----------
> mApos();
> // ...NewTest.g:84:9: (~ Apos )*
> loop2:
> do {
> int alt2=2;
> int LA2_0 = input.LA(1);
>
> if ( ((LA2_0>='\u0000' && LA2_0<='&')||(LA2_0>='(' &&  
> LA2_0<='\uFFFE')) ) {
> alt2=1;
> }
>
> switch (alt2) {
> case 1 :
> // ...NewTest.g:197:9: ~ Apos
> {
> if ( (input.LA(1)>='\u0000' && input.LA(1)<='4')||(input.LA(1)>='6'  
> && input.LA(1)<='\uFFFE') ) {
> input.consume();
>
> }
> -----------
>
> This was totally unexpected (checking if the character is different  
> than '5'), so I did the following experiment:
> 	• I removed all the parser rules.
> 	• I changed the grammar to a lexer grammar.
> When I generated the lexer, the corrupt if statement mentioned above  
> was changed to the following:
>
> -----------
> switch (alt2) {
> case 1 :
> // ...NewTest.g:84:9: ~ Apos
> {
> if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0014')|| 
> (input.LA(1)>='\u0016' && input.LA(1)<='\uFFFE') ) {
> input.consume();
>
> }
> -----------
>
> So now the situation has changed and the string '50' is accepted,
> but the check is still obviously wrong: it excludes '\u0015' rather
> than the apostrophe.
>
> I tested a simple grammar with the Apos and StringLiteral lexer  
> rules only:
> -----------
> lexer grammar testStringLiteral;
>
> StringLiteral	:	Apos ~Apos* Apos;	
> Apos	 :	'\'';
> -----------
>
> It generates the following if, which again looks wrong to me:
> -----------
> if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0003')|| 
> (input.LA(1)>='\u0005' && input.LA(1)<='\uFFFE') ) {
> input.consume();
>
> }
> -----------
>
> Taking into account the things said above,
> I have two questions:
> 	• Why do the parser rules affect the lexer class?
> 	• Why is this if clause before the consume() method different than  
> the if clause that is deciding the alternative?
> Of course, I assume that I could have made some stupid mistake, so  
> please excuse me if I have done so.
>
> Best regards,
> Hari
>
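
For anyone following the thread, the range checks quoted above can be
compared directly. The sketch below is not ANTLR output: the class name,
field names and the scanBody helper are made up for illustration, and only
the four range expressions are copied from the quoted generated code. It
lists which character each check rejects, then replays the '50' and '00'
inputs against a simplified stand-in for the generated loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class SetCheckDemo {
    // Loop decision test quoted from the combined grammar's mStringLiteral.
    static final IntPredicate DECISION =
        c -> (c >= 0x0000 && c <= '&') || (c >= '(' && c <= 0xFFFE);

    // Match test from the same method (the one guarding input.consume()).
    static final IntPredicate MATCH_COMBINED =
        c -> (c >= 0x0000 && c <= '4') || (c >= '6' && c <= 0xFFFE);

    // Match test quoted after the parser rules were removed.
    static final IntPredicate MATCH_LEXER_ONLY =
        c -> (c >= 0x0000 && c <= 0x0014) || (c >= 0x0016 && c <= 0xFFFE);

    // Match test quoted for the two-rule testStringLiteral grammar.
    static final IntPredicate MATCH_TWO_RULES =
        c -> (c >= 0x0000 && c <= 0x0003) || (c >= 0x0005 && c <= 0xFFFE);

    // Lists every value in 0x0000..0xFFFE that the given test rejects.
    static List<Integer> rejected(IntPredicate test) {
        List<Integer> out = new ArrayList<>();
        for (int c = 0x0000; c <= 0xFFFE; c++) {
            if (!test.test(c)) {
                out.add(c);
            }
        }
        return out;
    }

    // Hypothetical stand-in for the generated loop: the decision test picks
    // the alternative, then the match test must also accept the character.
    static String scanBody(String body, IntPredicate match) {
        for (int i = 0; i < body.length(); i++) {
            int c = body.charAt(i);
            if (!DECISION.test(c)) {
                break; // loop exits; the closing quote would be matched next
            }
            if (!match.test(c)) {
                return "mismatched character '" + (char) c + "'";
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(rejected(DECISION));         // [39], the apostrophe
        System.out.println(rejected(MATCH_COMBINED));   // [53], the digit '5'
        System.out.println(rejected(MATCH_LEXER_ONLY)); // [21]
        System.out.println(rejected(MATCH_TWO_RULES));  // [4]
        System.out.println(scanBody("50", MATCH_COMBINED)); // mismatched character '5'
        System.out.println(scanBody("00", MATCH_COMBINED)); // ok
    }
}
```

Only the decision test rejects the apostrophe (39). The three match tests
reject 53, 21 and 4, and the value changes with the surrounding grammar.
That would fit the excluded value being some grammar-dependent number (a
token type, perhaps) rather than the character code of '\'', but that is a
guess and nothing in this thread confirms it.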


