[antlr-interest] Generated lexer is affected by parser rules?! A bug?
Terence Parr
parrt at cs.usfca.edu
Tue May 20 10:16:40 PDT 2008
Uh, wait. Sorry. The generated code has a bug, as you say; I was looking
at the NFA and DFA representations. Adding a bug:
http://www.antlr.org:8888/browse/ANTLR-268
Ter
On May 20, 2008, at 10:11 AM, Terence Parr wrote:
> These both worked perfectly with 3.1b1. In one you are calling a
> rule, which is the only difference I see.
> Ter
> On May 17, 2008, at 4:35 AM, Haralambi Haralambiev wrote:
>
>> Just revised the very simple grammar.
>>
>> Could someone point out what is the difference between the
>> following two grammars:
>> -----------
>> lexer grammar testStringLiteral1;
>>
>> StringLiteral : Apos ~Apos* Apos;
>>
>> fragment
>> Apos : '\'';
>> -----------
>>
>> and
>>
>> -----------
>> lexer grammar testStringLiteral2;
>>
>> StringLiteral : '\'' ~'\''* '\'';
>> -----------
>>
>> When generated to Java files, they differ, although I expected them not to!
>>
>> -Hari
>>
>> On 5/17/08, Haralambi Haralambiev <hharalambiev at gmail.com> wrote:
>> Hello,
>>
>> A colleague of mine is working on a grammar and I was bemused
>> when she told me that the string literal '50' was throwing an error,
>> while '00' was not.
>>
>> The exception said "mismatched character '5' expecting set null".
>>
>> So, I started investigating... the lexer rule for string literal is
>> the following:
>> -----------
>> fragment
>> Apos : '\'';
>>
>> StringLiteral: Apos ~Apos* Apos;
>> -----------
>>
>> Everything seemed fine, except that in the generated Java code, the
>> mStringLiteral method had the following lines:
>>
>> -----------
>> mApos();
>> // ...NewTest.g:84:9: (~ Apos )*
>> loop2:
>> do {
>> int alt2=2;
>> int LA2_0 = input.LA(1);
>>
>> if ( ((LA2_0>='\u0000' && LA2_0<='&')||(LA2_0>='(' &&
>> LA2_0<='\uFFFE')) ) {
>> alt2=1;
>> }
>>
>> switch (alt2) {
>> case 1 :
>> // ...NewTest.g:197:9: ~ Apos
>> {
>> if ( (input.LA(1)>='\u0000' && input.LA(1)<='4')||(input.LA(1)>='6'
>> && input.LA(1)<='\uFFFE') ) {
>> input.consume();
>>
>> }
>> -----------
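The excerpt above contains two different character tests: the loop decision on LA2_0 skips only the apostrophe, while the test guarding input.consume() skips only '5'. A minimal Java sketch of that interaction reproduces the reported symptom. The names StringLiteralCheckDemo, loopPredicts, matchAccepts and lex are hypothetical, a plain String stands in for the ANTLR input stream, and the returned messages are illustrative rather than ANTLR's own:

```java
// Hypothetical stand-in for the two checks seen in the generated mStringLiteral().
class StringLiteralCheckDemo {
    // Loop decision, ranges copied from the excerpt: any character except the apostrophe.
    static boolean loopPredicts(int c) {
        return (c >= '\u0000' && c <= '&') || (c >= '(' && c <= '\uFFFE');
    }

    // Match check, ranges copied from the excerpt: any character except '5'.
    static boolean matchAccepts(int c) {
        return (c >= '\u0000' && c <= '4') || (c >= '6' && c <= '\uFFFE');
    }

    // Returns "ok" if the input lexes as a string literal, otherwise an
    // illustrative error message (not the text ANTLR itself would print).
    static String lex(String input) {
        int p = 0;
        if (p >= input.length() || input.charAt(p) != '\'') {
            return "missing opening apostrophe";
        }
        p++;
        // The loop is entered for every non-apostrophe character...
        while (p < input.length() && loopPredicts(input.charAt(p))) {
            char c = input.charAt(p);
            // ...but the match inside the loop rejects '5'.
            if (!matchAccepts(c)) {
                return "mismatched character '" + c + "'";
            }
            p++;
        }
        if (p >= input.length() || input.charAt(p) != '\'') {
            return "missing closing apostrophe";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(lex("'00'")); // ok
        System.out.println(lex("'50'")); // mismatched character '5'
    }
}
```

Because the loop decision and the match test disagree only on '5', every literal without that character still lexes, which is why '00' appeared to work.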
>>
>> This was totally unexpected (the check rejects the character '5'
>> instead of the apostrophe), so I did the following experiment:
>> • I removed all the parser rules.
>> • I changed the grammar to a lexer grammar.
>> When I regenerated the lexer, the faulty if statement mentioned
>> above changed to the following:
>>
>> -----------
>> switch (alt2) {
>> case 1 :
>> // ...NewTest.g:84:9: ~ Apos
>> {
>> if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0014')||
>> (input.LA(1)>='\u0016' && input.LA(1)<='\uFFFE') ) {
>> input.consume();
>>
>> }
>> -----------
>>
>> So now the situation has changed and the string '50' mentioned above
>> is OK, but the check is still clearly wrong: it now excludes '\u0015'
>> instead of the apostrophe.
>>
>> I tested a simple grammar with the Apos and StringLiteral lexer
>> rules only:
>> -----------
>> lexer grammar testStringLiteral;
>>
>> StringLiteral : Apos ~Apos* Apos;
>> Apos : '\'';
>> -----------
>>
>> It generates the following if statement, which I again consider wrong:
>> -----------
>> if ( (input.LA(1)>='\u0000' && input.LA(1)<='\u0003')||
>> (input.LA(1)>='\u0005' && input.LA(1)<='\uFFFE') ) {
>> input.consume();
>>
>> }
>> -----------
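The three generated match sets can be compared mechanically against the loop decision. The sketch below lists, for each pair of ranges copied from the excerpts, the code points in 0..0xFFFE that the set leaves out. ExcludedCharDemo, excluded and describe are hypothetical helper names, not part of ANTLR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds which characters a generated range check leaves out.
class ExcludedCharDemo {
    // Returns every code point in [0, 0xFFFE] not covered by the inclusive ranges.
    static List<Integer> excluded(int[][] ranges) {
        List<Integer> missing = new ArrayList<>();
        for (int c = 0; c <= 0xFFFE; c++) {
            boolean covered = false;
            for (int[] r : ranges) {
                if (c >= r[0] && c <= r[1]) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                missing.add(c);
            }
        }
        return missing;
    }

    // Formats the excluded code points as U+XXXX for printing.
    static String describe(int[][] ranges) {
        StringBuilder sb = new StringBuilder();
        for (int c : excluded(ranges)) {
            sb.append(String.format("U+%04X ", c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Ranges copied from the excerpts in this message.
        int[][] loopDecision = { {0x0000, '&'}, {'(', 0xFFFE} };
        int[][] combinedGrammar = { {0x0000, '4'}, {'6', 0xFFFE} };
        int[][] lexerGrammar = { {0x0000, 0x0014}, {0x0016, 0xFFFE} };
        int[][] minimalGrammar = { {0x0000, 0x0003}, {0x0005, 0xFFFE} };

        System.out.println("loop decision:    " + describe(loopDecision));
        System.out.println("combined grammar: " + describe(combinedGrammar));
        System.out.println("lexer grammar:    " + describe(lexerGrammar));
        System.out.println("minimal grammar:  " + describe(minimalGrammar));
    }
}
```

Only the loop decision excludes U+0027, the apostrophe. The three match sets exclude U+0035 ('5'), U+0015 and U+0004 respectively, so the excluded value changes with the grammar and never matches the character that Apos stands for.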
>>
>> Taking into account the above, I have two questions:
>> • Why do the parser rules affect the lexer class?
>> • Why is the if clause before the consume() call different from
>> the if clause that decides the alternative?
>> Of course, I may have made some stupid mistake, so
>> please excuse me if I have.
>>
>> Best regards,
>> Hari
>>
>