[antlr-interest] why it fails without predicates - Lexer issue

Johannes Luber jaluber at gmx.de
Wed Oct 1 10:49:49 PDT 2008


Manikandan Subramanian wrote:
> Hi,
>  
> I need to identify integer, decimal and EOD tokens (End of Description,
> like we have in Copybook).
>  
> The following is the grammar:
>  
> grammar TestDecimal;
> 
> options {
>     language = Java;
> }
> 
> @lexer::members {
>     private boolean isNotEOD() {
>         return (((char)input.LA(2) != '\f') && ((char)input.LA(2) != '\t')
>             && ((char)input.LA(2) != '\r') && ((char)input.LA(2) != '\n')
>             && ((char)input.LA(2) != ' ') && (input.LA(2) != -1));
>     }
> }
> 
> @members {
>     public static void main(String args[]) throws Exception {
>         ANTLRStringStream input = new ANTLRStringStream(args[0]);
>         Lexer lexer = new TestDecimalLexer(input);
>         CommonTokenStream tokens = new CommonTokenStream(lexer);
>         for (Object obj : tokens.getTokens())
>             System.out.println(obj);
>         TestDecimalParser parser = new TestDecimalParser(tokens);
>         parser.document();
>     }
> }
> 
> document : INT WS? INT WS? EOD;
> 
> INT : DIGIT+ ;
> 
> Decimal_ : INT DOT INT;
> 
> EOD : '.' (SS+ | EOF);
> 
> fragment DOT : {isNotEOD()}? => '.';
> 
> fragment DIGIT : ('0'..'9');
> 
> fragment SS : (' ' | '\t' | '\f' | '\r' | '\n');
> 
> WS : SS+;
> 
> With the input "01 00. " I get the following error:
> 
> line 1:5 rule DOT failed predicate: {isNotEOD()}?
> 
> [@0,0:1='01',<4>,1:0]
> 
> [@1,2:2=' ',<5>,1:2]
> 
> [@2,5:6='. ',<6>,1:5]
> 
> line 1:5 missing INT at '. '
> 
> If I replace the Decimal_ token definition with syntactic predicates
> like below, it works fine.
> 
> Decimal_ : (INT) => INT {_type=INT;} | (INT DOT) => INT DOT INT;
> 
> Why is it not able to identify the input correctly without syntactic
> predicates?
> 
> Why does it fail to identify "00. " as INT EOD?
> 
> Is there any way to resolve this without syntactic/semantic predicates? I
> would like to resolve this issue with just production rules.
>  
> Thanks in advance.

The cause is probably that ANTLR lexers don't switch the token type once
they have chosen one: having decided on Decimal_ for "00.", the lexer
does not fall back to INT when DOT fails. There are plenty of cases where
the current behaviour falls short. Possibly 3.2 will contain a fix for
this issue; until then, I don't know of a solution that doesn't depend on
predicates.
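
To make the "doesn't switch once chosen" point concrete, here is a toy
hand-written scanner in Java. It is only a sketch of the behaviour
described above, not ANTLR's generated code: the class CommitDemo and the
method lex are hypothetical names, and it assumes the lexer commits to
Decimal_ as soon as it sees digits followed by '.'. With fallback
disabled, the digits are lost when the fraction is missing; with fallback
enabled, the scanner re-emits them as INT.

```java
import java.util.ArrayList;
import java.util.List;

class CommitDemo {
    static boolean isDigit(int c) { return c >= '0' && c <= '9'; }

    static boolean isSpace(int c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
    }

    // Character at index i, or -1 at end of input (like LA() returning EOF).
    static int la(String s, int i) { return i < s.length() ? s.charAt(i) : -1; }

    // Toy scanner. fallback=false models a lexer that commits to Decimal_;
    // fallback=true models one that may retry the digits as INT.
    static List<String> lex(String s, boolean fallback) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int c = la(s, i);
            if (isDigit(c)) {
                int j = i;
                while (isDigit(la(s, j))) j++;
                if (la(s, j) == '.') {
                    // Digits followed by '.': try Decimal_ : INT DOT INT
                    int k = j + 1;
                    if (isDigit(la(s, k))) {
                        while (isDigit(la(s, k))) k++;
                        out.add("Decimal_(" + s.substring(i, k) + ")");
                        i = k;
                    } else if (fallback) {
                        // Decimal_ failed: re-emit the digits as INT
                        out.add("INT(" + s.substring(i, j) + ")");
                        i = j;
                    } else {
                        // Committed to Decimal_: the digits are dropped
                        out.add("ERROR(" + s.substring(i, j) + ")");
                        i = j;
                    }
                } else {
                    out.add("INT(" + s.substring(i, j) + ")");
                    i = j;
                }
            } else if (c == '.') {
                // EOD : '.' (SS+ | EOF)
                int j = i + 1;
                while (isSpace(la(s, j))) j++;
                if (j > i + 1 || j == s.length()) {
                    out.add("EOD");
                    i = j;
                } else {
                    out.add("ERROR(.)");
                    i++;
                }
            } else if (isSpace(c)) {
                int j = i;
                while (isSpace(la(s, j))) j++;
                out.add("WS");
                i = j;
            } else {
                out.add("ERROR(" + (char) c + ")");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("01 00. ", false)); // committing scanner
        System.out.println(lex("01 00. ", true));  // scanner with fallback
    }
}
```

For "01 00. " the committing variant yields an INT, a WS and an EOD with
the "00" reported as an error, which mirrors the token dump in the
question; the fallback variant yields INT WS INT EOD.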

Johannes
>  
> Regards,
> Mani
> 
> 