[antlr-interest] Ambiguous lexing task

Cliff Hudson cliff.s.hudson at gmail.com
Fri Apr 2 16:37:06 PDT 2010


No, there is no operator '>', so there aren't any additional ambiguities
here.  Thanks.
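
With no standalone '>' operator, the predicate quoted below boils down to
one rule: a '-' belongs to the ID unless the very next character is '>'.
For anyone who wants to see that rule outside of ANTLR, here is a minimal
hand-written sketch in Java. It is not ANTLR-generated code, and the class
and method names (ArrowLexerSketch, isNameChar, tokenize) are made up
purely for illustration; it assumes the input contains only IDs and '->'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from ANTLR.
final class ArrowLexerSketch {

    // LETTER : 'a'..'z' | 'A'..'Z'
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // DIGIT : '0'..'9'
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Mirrors NAMECHAR with the gated predicate {input.LA(2) != '>'}?=> '-'
    static boolean isNameChar(String s, int i) {
        char c = s.charAt(i);
        if (isLetter(c) || isDigit(c) || c == '_') {
            return true;
        }
        // A '-' is part of the name only when it does not start "->".
        boolean arrowFollows = i + 1 < s.length() && s.charAt(i + 1) == '>';
        return c == '-' && !arrowFollows;
    }

    // Splits the input into ID(...) and OP_TRANSFORM tokens.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isLetter(c) || c == '_') {
                int start = i++;
                while (i < s.length() && isNameChar(s, i)) {
                    i++;
                }
                tokens.add("ID(" + s.substring(start, i) + ")");
            } else if (s.startsWith("->", i)) {
                tokens.add("OP_TRANSFORM");
                i += 2;
            } else {
                throw new IllegalArgumentException(
                        "unexpected '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID(my-identifier), OP_TRANSFORM, ID(foo)]
        System.out.println(tokenize("my-identifier->foo"));
    }
}
```

A trailing '-' with nothing after it still ends up inside the ID in this
sketch, which is what the predicate does as well, since end of input is
not '>'.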

On Fri, Apr 2, 2010 at 2:56 PM, Daniels, Troy (US SSA) <
troy.daniels at baesystems.com> wrote:

>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Cliff Hudson
> > Sent: Friday, April 02, 2010 4:59 PM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Ambiguous lexing task
> >
> > I've played around with it a bit, and I modified NAMECHAR to be:
> >
> > fragment NAMECHAR
> >     : LETTER
> >     | DIGIT
> >     | '_'
> >     | {input.LA(2) != '>'}?=> '-'
> >     ;
> >
> > This seems to do the trick.  However, I'm concerned this is
> > not a best practice for this kind of situation.  Could I get
> > a suggestion as to the "correct" way to go about this?
> >
>
> Is it ever possible that that text should be interpreted as
>
> my-identifier-  >  foo
>
> (That is, my-identifier- "greater than" foo?) If it is, then the language
> is ambiguous to the lexer and you will have a lot of complications to deal
> with.  If this is not a valid interpretation, then that is a reasonable way
> to handle it.
>
> Troy
>
>
> > On Fri, Apr 2, 2010 at 1:48 PM, Cliff Hudson
> > <cliff.s.hudson at gmail.com> wrote:
> >
> > > > I have a string which I need to parse for IDs and operators.  This is
> > > > normally pretty easy, but there is one case where a character in the
> > > > ID can also match one character in the operator.  The tokens are:
> > >
> > > > OP_TRANSFORM : '->' ;
> > > >
> > > > ID : (LETTER | '_') (options { greedy=true; } : NAMECHAR)* ;
> > >
> > > fragment NAMECHAR : LETTER | DIGIT | '_' | '-' ;
> > >
> > > LETTER : 'a'..'z' | 'A'..'Z' ;
> > > > DIGIT : '0'..'9' ;
> > >
> > >
> > > The issue is in parsing the following string:
> > >
> > > my-identifier->foo
> > >
> > > > The ID token of course matches 'my-identifier-', and then I am left
> > > > with an extraneous '>'.  Is there a way to construct a set of lexing
> > > > rules, possibly with actions, that would correctly separate out the ->
> > > > from the ID?  In this case, I want the '-' in OP_TRANSFORM to be the
> > > > preferred path and to match '->' even in the above case.
> > >
> > > Thanks.
> > >
> >

