[antlr-interest] How does INTEGER+ '.' INTEGER+ match "2."?

John B. Brodie jbb at acm.org
Sun Aug 8 18:21:46 PDT 2010


On Sun, 2010-08-08 at 20:50 -0400, Kevin J. Cummings wrote:
> On 08/08/2010 08:35 PM, Ken Klose wrote:
> > Thanks for replying.
> > 
> > "2." is not a valid PRICE.  A PRICE should have at least 1 digit following
> > the '.'.  In the context of the string that I am trying to match, "2."
> > doesn't have any particular significance; it is neither an INTEGER nor a
> > PRICE.  It is simply an INTEGER followed by a SYMBOL token.  What I don't
> > understand is why ANTLR is getting hung up trying to match it as a PRICE
> > when it doesn't conform to the PRICE specification.  PRICE specifies
> > another INTEGER following the '.', which this input doesn't have.
> 
> Ken,
> 	What you are missing is that PRICE is a token.  Tokens
> get matched based on the longest possible match.  Once the lexer sees that
> it has an INTEGER followed by a '.', its path is chosen.  It's either a
> PRICE or it's an error (which you are seeing).  If that is not your
> intent, then you need to fix your lexer so that it knows better.
> 
> Gerald poses a possible solution.  But, perhaps he doesn't go far
> enough.  Would the following work for you?
> 
> INTEGER: DIGIT+ ( '.' DIGIT+ { $type=PRICE; } )?
>        ;
> 
> Now, if the lexer sees an INTEGER followed by a '.', the '.' *must* be
> followed by DIGITs; otherwise, it will just lex an INTEGER, and then try
> to deal with the '.' character....
> 
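The "path is chosen" behavior Kevin describes can be modeled with a small hand-written lexer for the original two rules (INTEGER: DIGIT+; PRICE: INTEGER '.' INTEGER;). This is a minimal Python sketch, not the code ANTLR actually generates: it assumes the lexer commits to PRICE as soon as it sees '.' after the digits, and it approximates the SYMBOL rule (which is not shown in the thread) as "any other single non-space character".

```python
class LexError(Exception):
    """Raised when a committed rule cannot be completed."""


def is_digit(c):
    # ASCII digits only, matching a DIGIT : '0'..'9' style rule
    return "0" <= c <= "9"


def lex_commit_on_dot(text):
    """Simplified model: after DIGIT+, seeing '.' commits the lexer to PRICE."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if is_digit(c):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            if i < n and text[i] == ".":
                # Committed to PRICE: INTEGER '.' INTEGER, no way back to INTEGER
                i += 1
                if i >= n or not is_digit(text[i]):
                    raise LexError(f"PRICE needs a DIGIT after '.' at index {i}")
                while i < n and is_digit(text[i]):
                    i += 1
                tokens.append(("PRICE", text[start:i]))
            else:
                tokens.append(("INTEGER", text[start:i]))
        elif c.isspace():
            i += 1  # skip whitespace
        else:
            # Hypothetical stand-in for Ken's SYMBOL rule
            tokens.append(("SYMBOL", c))
            i += 1
    return tokens
```

With this model "12.50" lexes as a single PRICE, while "2. apples" raises LexError: once the '.' has been consumed there is no alternative left, which is the error Ken is seeing.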

this is (i think) one of the very rare instances where a Syntactic
Predicate is appropriate -- because the implicit look-ahead involved is
clearly bounded: exactly two characters, the '.' and one DIGIT. generally
you should avoid predicates and back-tracking because of the potential
for unbounded look-ahead, but that is not an issue in this instance.

so try:

INTEGER: DIGIT+ ( ('.' DIGIT)=> '.' DIGIT+ {$type=PRICE;} )? ;

where PRICE is an imaginary token defined in a tokens{} block before any
rule in your grammar.
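The effect of the ('.' DIGIT)=> predicate can be sketched the same way. This is a hypothetical hand-written Python model of the rule above, not ANTLR's generated lexer; SYMBOL is again approximated as "any other single non-space character". The only change from a committing lexer is that it peeks at two characters before taking the optional fraction, and otherwise leaves the '.' for the next token.

```python
def is_digit(c):
    # ASCII digits only, matching a DIGIT : '0'..'9' style rule
    return "0" <= c <= "9"


def lex_with_predicate(text):
    """Model of INTEGER: DIGIT+ ( ('.' DIGIT)=> '.' DIGIT+ {$type=PRICE;} )? ;"""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if is_digit(c):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            ttype = "INTEGER"
            # Syntactic predicate ('.' DIGIT)=>: bounded two-character look-ahead
            if i + 1 < n and text[i] == "." and is_digit(text[i + 1]):
                i += 1  # consume '.'
                while i < n and is_digit(text[i]):
                    i += 1
                ttype = "PRICE"  # corresponds to {$type=PRICE;}
            tokens.append((ttype, text[start:i]))
        elif c.isspace():
            i += 1  # skip whitespace
        else:
            # Hypothetical stand-in for Ken's SYMBOL rule
            tokens.append(("SYMBOL", c))
            i += 1
    return tokens


# lex_with_predicate("2.")    -> [('INTEGER', '2'), ('SYMBOL', '.')]
# lex_with_predicate("12.50") -> [('PRICE', '12.50')]
```

Because the predicate never looks further than '.' plus one DIGIT, the cost is fixed no matter how long the input is, which is why it is safe here.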

also, as an aside, ... i would be *VERY* worried by your SYMBOL lexer
rule --- use of the negation meta-syntax has always given me more
problems than solutions. please be sure to unit-test the heck out of
that puppy ;-) YMMV

> > On Sun, Aug 8, 2010 at 7:28 PM, Gerald Rosenberg <gerald at certiv.net> wrote:
> > 
> >>
> > >> ------ Original Message (Sunday, August 08, 2010 6:42:55 PM) From: Ken Klose ------
> >> Subject: [antlr-interest] How does INTEGER+ '.' INTEGER+ match "2."?
> >>
> > >>> INTEGER: DIGIT+;
> >>> PRICE: INTEGER '.' INTEGER;
> >>>
> > >> Integer and price are ambiguous; if "2." is a valid price, you need to
> > >> make the decimal field optional.
> >>
> >> Try:
> >>
> > >> INTEGER : DIGIT+
> > >>           (  '.' (DIGIT+)? { $type=PRICE; }  // define PRICE in the token block
> > >>           |  // just an integer
> > >>           )
> > >>         ;
> >>
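Gerald's variant can be modeled in the same style to show what it does with "2.". This is a hypothetical hand-written Python sketch, not ANTLR output, with SYMBOL approximated as "any other single non-space character". Since the digits after '.' are optional, "2." becomes a PRICE, which only fits if "2." is meant to be a valid price (Ken says it is not).

```python
def is_digit(c):
    # ASCII digits only, matching a DIGIT : '0'..'9' style rule
    return "0" <= c <= "9"


def lex_optional_fraction(text):
    """Model of INTEGER : DIGIT+ ( '.' (DIGIT+)? {$type=PRICE;} | ) ;"""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if is_digit(c):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            ttype = "INTEGER"  # second alternative: just an integer
            if i < n and text[i] == ".":
                i += 1  # '.' is always taken when present
                while i < n and is_digit(text[i]):  # (DIGIT+)? may match nothing
                    i += 1
                ttype = "PRICE"
            tokens.append((ttype, text[start:i]))
        elif c.isspace():
            i += 1  # skip whitespace
        else:
            # Hypothetical stand-in for Ken's SYMBOL rule
            tokens.append(("SYMBOL", c))
            i += 1
    return tokens


# lex_optional_fraction("2.") -> [('PRICE', '2.')]
```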
