[antlr-interest] How can I avoid "mismatched input" error?

Tue Mar 24 03:06:38 PDT 2009

2009/3/24 Gabriel Petrovay <gabriel.petrovay at 28msec.com>:
> Hi Indhu,
>
> I was trying to simplify the example such that I still get the error and the
> example is simple enough for everybody to understand the problem.
>
> Here is the corrected grammar:
>
> //========================================
> grammar k;
> options {
> output=AST;
> }
>
> rule : KEYWORD1 (KEYWORD2 Name)? ';' ;
>
> KEYWORD1 : 'keywordA';
> KEYWORD2 : 'keywordB';
>
> Name : ('a'..'z' | 'A'..'Z')+ ;
> S : ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
> //========================================
>
> With this the problems you mentioned are eliminated.
>
> As I can see your proposed solution is not scalable if I have the keywords:
> keywordA, keywordB,...,keywordZ, and the Name rules: Name1, Name2,...,
> NameN. Or is it?
>
> Any solution for this?
>
I think the fundamental problem is that lexing in ANTLR is not
sensitive to parser context. Lexing occurs entirely separately from
parsing, so you can't have multiple lexer rules covering the same
input: once the lexer has a keyword rule for a piece of text, that text
always arrives as the keyword token, never as Name, whatever the parser
expects at that point.
You could have e.g. separate rules for uppercase names and lowercase
names. In this case either each keyword would fall under a single name
rule or, if keywords were case insensitive, you could have something
like:
keyKEYWORD1
     :    {input.LT(1).getText().toLowerCase().equals("keyword1")}? (UCName|LCName);
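
As a rough sketch only, this is how the predicate approach might look
when applied to the corrected grammar quoted above. It assumes ANTLR 3
with the Java target. The KEYWORD1/KEYWORD2 lexer rules are dropped, so
every word reaches the parser as a Name token. For brevity it uses a
single Name rule instead of UCName/LCName, and equalsIgnoreCase is just
one way to get case-insensitive keywords:

//========================================
grammar k;
options {
output=AST;
}

rule : keyKEYWORD1 (keyKEYWORD2 Name)? ';' ;

// No keyword tokens exist in the lexer. Each rule below accepts a Name
// token only if its text is the expected keyword.
keyKEYWORD1
     :    {input.LT(1).getText().equalsIgnoreCase("keywordA")}? Name ;

keyKEYWORD2
     :    {input.LT(1).getText().equalsIgnoreCase("keywordB")}? Name ;

Name : ('a'..'z' | 'A'..'Z')+ ;
S : ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
//========================================

With this, the third word in "keywordA keywordB keywordA ;" is an
ordinary Name token, so it should be accepted where a Name is expected.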

An alternative solution is to keep the keywords in the lexer as you
have them and add a parser rule like:
name: Name | KEYWORD1 | KEYWORD2;
Then use name wherever the parser currently expects Name. This will
perform slightly better but requires that you add each keyword in two
places.
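
Again only as a sketch, assuming ANTLR 3 with the Java target, the
corrected grammar quoted above might end up looking like this with that
change. The lexer is untouched; only the parser rule "name" is new:

//========================================
grammar k;
options {
output=AST;
}

rule : KEYWORD1 (KEYWORD2 name)? ';' ;

// Anywhere a name is allowed, also accept the keyword tokens.
// Every new keyword has to be listed here as well as in the lexer.
name : Name | KEYWORD1 | KEYWORD2 ;

KEYWORD1 : 'keywordA';
KEYWORD2 : 'keywordB';

Name : ('a'..'z' | 'A'..'Z')+ ;
S : ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
//========================================

The lexer still turns "keywordA keywordB keywordA ;" into
KEYWORD1 KEYWORD2 KEYWORD1 ';', but the parser now accepts the second
KEYWORD1 through the name rule.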

Both methods scale to any number of keyword and name rules.

Tom.

>
> Regards,
> Gabriel
>
>
> On Tue, Mar 24, 2009 at 9:29 AM, Indhu Bharathi <indhu.b at s7software.com>
> wrote:
>>
>> Looks like you are trying to use keyword as identifier. AFAIK, this cannot
>> be resolved in the lexer. You have to use predicates in the parser rule.
>> Something like this:
>>
>> rule : keyKEYWORD1 (keyKEYWORD2 enc=Name)? ';' ;
>>
>> keyKEYWORD1
>>     :    {input.LT(1).getText().equals("keyword1")}? Name ;
>>
>> keyKEYWORD2
>>     :    {input.LT(1).getText().equals("keyword2")}? Name ;
>>
>>
>> One more problem I see is the production "Name : Letter* ;". Lexer
>> production cannot define a zero length string.
>>
>> Another problem is you are expecting 'keyword1' to be parsed as Name but
>> production for Name doesn't allow numbers.
>>
>> - Indhu
>>
>> Gabriel Petrovay wrote:
>>
>> Hi all,
>>
>> I have the following grammar file:
>>
>> //========================================
>> grammar k;
>> options {
>> output=AST;
>> }
>>
>> rule : KEYWORD1 (KEYWORD2 enc=Name)? ';' ;
>>
>> KEYWORD1 : 'keyword1';
>> KEYWORD2 : 'keyword2';
>>
>> Name : Letter* ;
>> fragment Letter : 'a'..'z' | 'A'..'Z' ;
>>
>> S            :    ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
>> //========================================
>>
>>
>> The following text is not a valid one.
>>
>> INPUT:
>> =====
>> keyword1 keyword2 keyword1 ;
>>
>> OUTPUT:
>> =======
>> line 1:18 mismatched input 'keyword1' expecting Name
>> <mismatched token: [@4,18:25='keyword1',<4>,1:18], resync=keyword1
>> keyword2 keyword1 ;>
>>
>>
>> How can I make a parser to recognize this input? I want to be able to
>> allow the keywords in the places where any char combination is allowed. How
>> can I make this?
>>
>> Regards,
>> Gabriel
>>
>> ________________________________
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>
>
>
> --
> MSc Gabriel Petrovay
> MCSA, MCDBA, MCAD
> Mobile: +41(0)787978034
>