[antlr-interest] Problem when parsing numerics
Thomas Woelfle
thomas.woelfle at interactive-objects.com
Wed Mar 4 02:15:25 PST 2009
Hi Jim,
thanks for the reply.
I am still running into the same problem.
The grammar now is:
lexer grammar Simple;

options
{
    language = Java;
}

@header
{
    package test;
}

fragment DOT_PROG: ;
fragment DOT_SL: ;
fragment DOT_PRINT: ;
fragment DOT_ADD: ;
fragment DOT_SPP: ;

DOT: '.'
    (
          ('PROG')=>'PROG' {$type=DOT_PROG;}
        | ('SL')=>'SL' {$type=DOT_SL;}
        | ('PRT')=>'PRT' {$type=DOT_PRINT;}
//      | ('ADD')=>'ADD' {$type=DOT_ADD;}
        | ('SPP')=>'SPP' {$type=DOT_SPP;}
    )?
    ;

WORD: ('A'..'Z')+;
Given the input ".S", the lexer produces a DOT token followed by a WORD
token. But as soon as the fourth alternative (the 'ADD' one) is
uncommented, the same input produces the error "no viable alternative at
character '<EOF>'".
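To make the expected result concrete, here is a minimal plain-Java sketch of the tokenization I would expect from this grammar with all five alternatives enabled. It is not ANTLR-generated code and does not use the ANTLR runtime; the class name ExpectedLexing and the method tokenize are made up for illustration, and token types are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reference model of the tokenization expected from the Simple grammar. */
public class ExpectedLexing {
    // Keyword texts that may directly follow a '.', paired with the token type they select.
    private static final String[][] KEYWORDS = {
        {"PROG", "DOT_PROG"}, {"SL", "DOT_SL"}, {"PRT", "DOT_PRINT"},
        {"ADD", "DOT_ADD"}, {"SPP", "DOT_SPP"}
    };

    /** Returns the token types for the input, in order. */
    public static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '.') {
                // Default: a bare DOT. A keyword directly after the dot changes the type.
                String type = "DOT";
                int length = 1;
                for (String[] kw : KEYWORDS) {
                    if (input.startsWith(kw[0], pos + 1)) {
                        type = kw[1];
                        length = 1 + kw[0].length();
                        break;
                    }
                }
                types.add(type);
                pos += length;
            } else if (c >= 'A' && c <= 'Z') {
                // WORD: one or more upper-case letters.
                while (pos < input.length()
                        && input.charAt(pos) >= 'A' && input.charAt(pos) <= 'Z') {
                    pos++;
                }
                types.add("WORD");
            } else {
                throw new IllegalArgumentException(
                    "no viable alternative at character '" + c + "'");
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(".S"));    // prints [DOT, WORD]
        System.out.println(tokenize(".SLX"));  // prints [DOT_SL, WORD]
    }
}
```

The point of the sketch is only that ".S" should fall back to DOT followed by WORD when no keyword follows the dot, regardless of how many keyword alternatives exist.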
I've read through some of the generated lexer code. The major difference
between the version that works and the version that fails seems to be
that the working version makes no "dfa.predict" call. I have no idea why
the ANTLR generator produces code that uses the DFA in one case and code
that doesn't in the other. All in all, this behaviour looks to me like a
serious bug in ANTLR. I've tried the same lexer grammar in JavaCC without
any problems.
Is there any way to work around this bug without having to write a lexer
of my own?
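One possible direction, in the spirit of Jim's first suggestion quoted below, would be to keep the lexer down to DOT and WORD and recognise the dot-keywords afterwards. The following is only a rough plain-Java sketch of that idea as a pass over a token list. It does not use the ANTLR runtime: the Tok class, the merge method and the string token types are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical post-pass that folds DOT + keyword WORD into one keyword token. */
public class DotKeywordMerger {
    /** Stand-in for a lexer token: type, text and start offset in the input. */
    public static final class Tok {
        public final String type;
        public final String text;
        public final int start;

        public Tok(String type, String text, int start) {
            this.type = type;
            this.text = text;
            this.start = start;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    // Keyword text after the dot, mapped to the resulting token type.
    private static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("PROG", "DOT_PROG");
        KEYWORDS.put("SL", "DOT_SL");
        KEYWORDS.put("PRT", "DOT_PRINT");
        KEYWORDS.put("ADD", "DOT_ADD");
        KEYWORDS.put("SPP", "DOT_SPP");
    }

    public static List<Tok> merge(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            Tok t = in.get(i);
            if (t.type.equals("DOT") && i + 1 < in.size()) {
                Tok next = in.get(i + 1);
                // Only merge when the WORD starts right after the dot and is a keyword.
                boolean adjacent = next.start == t.start + t.text.length();
                String kwType = KEYWORDS.get(next.text);
                if (next.type.equals("WORD") && adjacent && kwType != null) {
                    out.add(new Tok(kwType, t.text + next.text, t.start));
                    i += 2;
                    continue;
                }
            }
            out.add(t);
            i++;
        }
        return out;
    }
}
```

Note that this is not identical to the grammar above: an input such as ".SLX" stays DOT followed by WORD here, whereas the grammar would split it into DOT_SL and WORD.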
Regards,
Thomas
> Thomas Woelfle wrote:
>> Hi,
>>
>> I've run into a very similar problem again.
>>
>> The language to be parsed defines some keywords that begin with a '.'.
>> Besides that, certain names are allowed, and '.' on its own is a valid
>> token too.
>>
>> The reduced lexer grammar that produces the problem is:
>>
>> DOT: '.';
>>
>> ARG: ('.ARG')=> '.ARG';
>>
>> ATT: ('.ATT')=> '.ATT';
>>
>> NAME
>> :
>> ('A'..'Z')*;
>>
>>
>>
> This token allows a match of an empty string and is going to cause all
> sorts of problems. You want:
>
> NAME : ('A'..'Z')+;
>
> Then if you still have problems, either do:
>
> DOT : '.';
> ARG: 'ARG';
> ATT : 'ATT';
>
> ident : ID
> | DOT (ARG|ATT)
> ;
>
> Or:
>
> fragment ARG : ; // Define token number and document
> fragment ATT : ; // Define token number and document
> DOT : '.'
>     ( ('ARG')=>'ARG' { $type = ARG; }
>     | ('ATT')=>'ATT' { $type = ATT; }
>     )
>     ;
>
>
> Jim
Attachments (scrubbed by the list archive):
SimpleWorks.java: http://www.antlr.org/pipermail/antlr-interest/attachments/20090304/317ecef8/attachment.pl
SimpleFails.java: http://www.antlr.org/pipermail/antlr-interest/attachments/20090304/317ecef8/attachment-0001.pl
Main.java: http://www.antlr.org/pipermail/antlr-interest/attachments/20090304/317ecef8/attachment-0002.pl
Simple.g: http://www.antlr.org/pipermail/antlr-interest/attachments/20090304/317ecef8/attachment-0003.pl