[antlr-interest] How can I avoid "mismatched input" error?

Tue Mar 24 04:16:28 PDT 2009

Hi Tom,

So in your solution I would have to do something like this:

name1: Name1 | KEYWORD1 | KEYWORD2 | ... KEYWORDM;
name2: Name2 | KEYWORD1 | KEYWORD2 | ... KEYWORDM;
...
nameN: NameN | KEYWORD1 | KEYWORD2 | ... KEYWORDM;

This is what I meant by scalable. With 50 keywords, I think the production
for each name rule simply becomes too large. Doesn't this also make parsing
very expensive?
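
To make the question concrete, here is a rough hand-written Java sketch of
what such a name rule amounts to. It is not ANTLR-generated code, and every
name in it (NameRuleSketch, Type, Token, NAME_LIKE, lex, parseRule) is made up
for illustration. It assumes the keywords are lexed as their own token types
and that a name rule accepts Name or any keyword type. In this sketch the
keyword list lives in one shared set, and the per-token check is a single set
lookup rather than a walk through 50 alternatives; whether ANTLR's generated
parser behaves the same way is something to verify against its output.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not ANTLR output: all names here are illustrative.
class NameRuleSketch {
    // Token types mirroring the corrected grammar in this thread.
    enum Type { KEYWORD1, KEYWORD2, NAME, SEMI }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Keyword table used by the lexer; each keyword is listed once.
    static final Map<String, Type> KEYWORDS =
            Map.of("keywordA", Type.KEYWORD1, "keywordB", Type.KEYWORD2);

    // Token types accepted wherever a name is expected: Name plus every keyword.
    static final Set<Type> NAME_LIKE = EnumSet.of(Type.NAME, Type.KEYWORD1, Type.KEYWORD2);

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // The lexer never looks at parser context: a whole word is scanned first,
    // then classified as a keyword or a plain Name.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == ';') { tokens.add(new Token(Type.SEMI, ";")); i++; continue; }
            if (!isLetter(c)) throw new IllegalArgumentException("unexpected character at " + i);
            int start = i;
            while (i < input.length() && isLetter(input.charAt(i))) i++;
            String word = input.substring(start, i);
            tokens.add(new Token(KEYWORDS.getOrDefault(word, Type.NAME), word));
        }
        return tokens;
    }

    static int expect(List<Token> tokens, int pos, Type type) {
        if (pos >= tokens.size() || tokens.get(pos).type != type) {
            throw new IllegalStateException("mismatched input at token " + pos + ", expecting " + type);
        }
        return pos + 1;
    }

    // rule : KEYWORD1 (KEYWORD2 name)? ';' ;
    // Returns the text matched by name, or null if the optional part is absent.
    static String parseRule(List<Token> tokens) {
        int pos = expect(tokens, 0, Type.KEYWORD1);
        String name = null;
        if (pos < tokens.size() && tokens.get(pos).type == Type.KEYWORD2) {
            pos++;
            // name : Name | KEYWORD1 | KEYWORD2 ; done as one set-membership test.
            if (pos >= tokens.size() || !NAME_LIKE.contains(tokens.get(pos).type)) {
                throw new IllegalStateException("mismatched input at token " + pos + ", expecting a name");
            }
            name = tokens.get(pos).text;
            pos++;
        }
        pos = expect(tokens, pos, Type.SEMI);
        if (pos != tokens.size()) throw new IllegalStateException("trailing input after ';'");
        return name;
    }

    public static void main(String[] args) {
        // Prints: keywordA
        System.out.println(parseRule(lex("keywordA keywordB keywordA ;")));
    }
}
```

With the input from the original post, the third token is accepted as a name
even though the lexer tagged it KEYWORD1.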

Regards,
Gabriel

On Tue, Mar 24, 2009 at 11:06 AM, Thomas Brandon <tbrandonau at gmail.com> wrote:

> 2009/3/24 Gabriel Petrovay <gabriel.petrovay at 28msec.com>:
> > Hi Indhu,
> >
> > I was trying to simplify the example so that it still produces the error
> > but is simple enough for everybody to understand the problem.
> >
> > Here is the corrected grammar:
> >
> > //========================================
> > grammar k;
> > options {
> > output=AST;
> > }
> >
> > rule : KEYWORD1 (KEYWORD2 Name)? ';' ;
> >
> > KEYWORD1 : 'keywordA';
> > KEYWORD2 : 'keywordB';
> >
> > Name : ('a'..'z' | 'A'..'Z')+ ;
> > S : ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
> > //========================================
> >
> > With this the problems you mentioned are eliminated.
> >
> > As far as I can see, your proposed solution is not scalable if I have the
> > keywords keywordA, keywordB, ..., keywordZ and the Name rules Name1,
> > Name2, ..., NameN. Or is it?
> >
> > Any solution for this?
> >
> I think your fundamental problem is that you have not realised that
> lexing is not sensitive to parser context in ANTLR. Lexing occurs
> entirely separately from parsing, so you can't have multiple lexer rules
> covering the same input.
> You could have, e.g., separate rules for uppercase names and lowercase
> names. In this case either each keyword would fall under a single name
> rule or, if keywords were case insensitive, you could have something
> like:
> keyKEYWORD1
>     :    {input.LT(1).getText().toLowerCase().equals("keyword1")}? (UCName|LCName) ;
>
> An alternative solution is to specify the keywords in the lexer as you
> have and then have a parser rule like:
> name: Name | KEYWORD1 | KEYWORD2;
> This will perform slightly better but requires that you add keywords
> in two places.
>
> Both methods are perfectly scalable to any number of keyword/name rules.
>
> Tom.
>
> >
> > Regards,
> > Gabriel
> >
> >
> > On Tue, Mar 24, 2009 at 9:29 AM, Indhu Bharathi <indhu.b at s7software.com>
> > wrote:
> >>
> >> Looks like you are trying to use a keyword as an identifier. AFAIK, this
> >> cannot be resolved in the lexer. You have to use predicates in the parser
> >> rule. Something like this:
> >>
> >> rule : keyKEYWORD1 (keyKEYWORD2 enc=Name)? ';' ;
> >>
> >> keyKEYWORD1
> >>     :    {input.LT(1).getText().equals("keyword1")}? Name ;
> >>
> >> keyKEYWORD2
> >>     :    {input.LT(1).getText().equals("keyword2")}? Name ;
> >>
> >>
> >> One more problem I see is the production "Name : Letter* ;". A lexer
> >> production cannot match a zero-length string.
> >>
> >> Another problem is that you are expecting 'keyword1' to be parsed as Name,
> >> but the production for Name doesn't allow digits.
> >>
> >> - Indhu
> >>
> >> Gabriel Petrovay wrote:
> >>
> >> Hi all,
> >>
> >> I have the following grammar file:
> >>
> >> //========================================
> >> grammar k;
> >> options {
> >> output=AST;
> >> }
> >>
> >> rule : KEYWORD1 (KEYWORD2 enc=Name)? ';' ;
> >>
> >> KEYWORD1 : 'keyword1';
> >> KEYWORD2 : 'keyword2';
> >>
> >> Name : Letter* ;
> >> fragment Letter : 'a'..'z' | 'A'..'Z' ;
> >>
> >> S            :    ('\t' | ' ' | '\n' | '\r')+  { $channel = HIDDEN; } ;
> >> //========================================
> >>
> >>
> >> The following input is not accepted as valid.
> >>
> >> INPUT:
> >> =====
> >> keyword1 keyword2 keyword1 ;
> >>
> >> OUTPUT:
> >> =======
> >> line 1:18 mismatched input 'keyword1' expecting Name
> >> <mismatched token: [@4,18:25='keyword1',<4>,1:18], resync=keyword1
> >> keyword2 keyword1 ;>
> >>
> >>
> >> How can I make the parser recognize this input? I want to allow the
> >> keywords in the places where any char combination is allowed. How can I
> >> do this?
> >>
> >> Regards,
> >> Gabriel
> >>
> >> ________________________________
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>
> >
> >
> >
> > --
> > MSc Gabriel Petrovay
> > MCSA, MCDBA, MCAD
> > Mobile: +41(0)787978034
>
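
The predicate-based variant quoted above can be sketched in plain Java. This
is only an illustration under assumptions: the lexer produces nothing but
Name tokens and ';', keywords are recognised by a case-insensitive text check
in the parser (standing in for the {...}? predicate), and the class and
method names (PredicateKeywordSketch, isKeyword, lex, parseRule) are
hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the predicate approach; not ANTLR-generated code.
class PredicateKeywordSketch {
    // Simplified lexer: every word is a Name token and ';' is its own token.
    static List<String> lex(String input) {
        return Arrays.asList(input.replace(";", " ; ").trim().split("\\s+"));
    }

    // Stand-in for {input.LT(1).getText().toLowerCase().equals(...)}?
    static boolean isKeyword(List<String> tokens, int pos, String lowerCaseKeyword) {
        return pos < tokens.size()
                && tokens.get(pos).toLowerCase(Locale.ROOT).equals(lowerCaseKeyword);
    }

    static boolean isName(List<String> tokens, int pos) {
        return pos < tokens.size() && tokens.get(pos).matches("[a-zA-Z]+");
    }

    // rule : keyKEYWORD1 (keyKEYWORD2 Name)? ';' ;
    // Returns the text of the Name, or null if the optional part is absent.
    static String parseRule(List<String> tokens) {
        int pos = 0;
        if (!isKeyword(tokens, pos, "keyworda")) {
            throw new IllegalStateException("expecting keywordA at token " + pos);
        }
        pos++;
        String name = null;
        if (isKeyword(tokens, pos, "keywordb")) {
            pos++;
            // Any Name is fine here, including one whose text is a keyword.
            if (!isName(tokens, pos)) {
                throw new IllegalStateException("expecting Name at token " + pos);
            }
            name = tokens.get(pos);
            pos++;
        }
        if (pos >= tokens.size() || !tokens.get(pos).equals(";")) {
            throw new IllegalStateException("expecting ';' at token " + pos);
        }
        if (pos + 1 != tokens.size()) {
            throw new IllegalStateException("trailing input after ';'");
        }
        return name;
    }

    public static void main(String[] args) {
        // Prints: keywordA
        System.out.println(parseRule(lex("KeywordA keywordB keywordA ;")));
    }
}
```

Here the keyword list appears only in the parser, at the cost of a string
comparison each time a predicate is evaluated.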

-- 
MSc Gabriel Petrovay
MCSA, MCDBA, MCAD
Mobile: +41(0)787978034