[antlr-interest] Location dependent token?

Oliver Zeigermann oliver.zeigermann at gmail.com
Wed Dec 31 09:36:33 PST 2008


Well, looking at this now, this is far too complicated. Just do:

expression : NUMBER MONETARY_SYMBOL (MEASUREMENT_SYMBOL)? ;

MONETARY_SYMBOL
  : SYMBOL
  ;

MEASUREMENT_SYMBOL
  : '/' SYMBOL { /* a little bit of Java code that strips the leading '/' */ }
  ;

fragment SYMBOL : 'A'..'Z' 'A'..'Z' 'A'..'Z' ;
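
For what it's worth, here is a rough plain-Java sketch of what that action and the expression rule amount to, written without ANTLR so it can be run on its own. The class name UnitExpression, its fields and the helper stripLeadingSlash are made up for illustration, and unlike the lexer rule above it tolerates blanks around the '/'. Inside a real ANTLR 3 lexer action (Java target) the same effect would probably be something like setText(getText().substring(1)), but treat that as an untested assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for: expression : NUMBER MONETARY_SYMBOL (MEASUREMENT_SYMBOL)? ;
class UnitExpression {
    // Three uppercase letters directly after the number are the monetary symbol;
    // three uppercase letters after a '/' are the measurement symbol.
    private static final Pattern EXPRESSION =
        Pattern.compile("\\s*(\\d+)\\s*([A-Z]{3})\\s*(/\\s*[A-Z]{3})?\\s*");

    final String number;
    final String monetary;
    final String measurement; // null when the optional part is absent

    UnitExpression(String number, String monetary, String measurement) {
        this.number = number;
        this.monetary = monetary;
        this.measurement = measurement;
    }

    // What the lexer action is meant to do: drop the leading '/'
    // (and, as an assumption of this sketch, any blanks that follow it).
    static String stripLeadingSlash(String tokenText) {
        return tokenText.substring(1).trim();
    }

    static UnitExpression parse(String input) {
        Matcher m = EXPRESSION.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a unit expression: " + input);
        }
        String rawMeasurement = m.group(3); // e.g. "/ TNE", or null
        String measurement =
            rawMeasurement == null ? null : stripLeadingSlash(rawMeasurement);
        return new UnitExpression(m.group(1), m.group(2), measurement);
    }
}
```

UnitExpression.parse("10 EUR / TNE") yields EUR as the monetary symbol and TNE as the measurement symbol; for "10 EUR" the measurement field stays null.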

- Oliver

2008/12/31 Oliver Zeigermann <oliver.zeigermann at gmail.com>:
> Hi Mats!
>
> If you really need to distinguish monetary from measurement units in
> the _lexer_ (which I doubt, for the same reasons as the others who
> answered), you could add a semantic predicate.
>
> Modifying Jared's grammar might lead to this:
>
> @lexer::members {
>  protected boolean isMonetarySymbol = true;
> }
>
>
> expression : NUMBER MONETARY_SYMBOL (SLASH MEASUREMENT_SYMBOL)? ;
>
> SLASH : '/' { isMonetarySymbol = false; };
>
> MONETARY_SYMBOL
>   : {isMonetarySymbol}? SYMBOL
>   ;
>
> MEASUREMENT_SYMBOL
>   : {!isMonetarySymbol}? SYMBOL { isMonetarySymbol = true; }
>   ;
>
> fragment SYMBOL : 'A'..'Z' 'A'..'Z' 'A'..'Z' ;
>
>
> Be careful to set the predicate flag in the lexer, though!
>
> Oliver
>
> 2008/12/29 Jared Bunting <jared.bunting at peachjean.com>:
>> If the three-letter words can be anything, can you just define one token
>> that matches 3 uppercase letters?  Your parser should be able to tell
>> what's what based on context.
>>
>> maybe something like this?
>>
>> expression : NUMBER SYMBOL ('/' SYMBOL)? ;
>>
>> SYMBOL : 'A'..'Z' 'A'..'Z' 'A'..'Z' ;
>>
>> -Jared
>>
>> Mats Ekberg wrote:
>>> Ok, maybe I was a bit unclear.
>>> Monetary units are expressed as three-letter words; EUR, GBP and so on.
>>> Measurement units are also expressed with three letters; TNE, KGM and
>>> so on.
>>>
>>> The only way to know which is which is where the three letters are
>>> located. In one location it's a monetary unit and in another it's a
>>> measurement unit.
>>>
>>> ok?
>>>
>>> regards
>>> mats
>>>
>>> On Mon 2008-12-29 at 08:10 -0600, Gary R. Van Sickle wrote:
>>>> > From: Mats Ekberg
>>>> >
>>>> > Let's say a three-letter word in uppercase can mean one of two
>>>> > things, like:
>>>> >
>>>> >   10  EUR
>>>> > where EUR means a monetary unit
>>>> >
>>>> >   10 EUR / TNE
>>>> > where EUR still means a monetary unit but the three letters
>>>> > TNE now mean a measurement unit.
>>>> >
>>>> > How can that be expressed in a grammar??
>>>> >
>>>> > /mats
>>>>
>>>> Your question must be missing some information, because what you're asking
>>>> is the most basic of lexing/parsing issues:
>>>>
>>>>
>>>> Lexer does something like this:
>>>>
>>>> NUMBER : ('0'..'9')+ ;
>>>>
>>>> EUR : 'EUR' ;
>>>>
>>>> TNE : 'TNE' ;
>>>>
>>>>
>>>> Parser does something like this:
>>>>
>>>> num_with_monetary_unit_and_optional_per_unit
>>>>     : NUMBER monetary_unit ('/' measurement_unit)?
>>>>     ;
>>>>
>>>> monetary_unit
>>>>     : EUR
>>>>     | <<whatever other monies you support>>
>>>>     ;
>>>>
>>>> measurement_unit
>>>>     : TNE
>>>>     | <<whatever other measurement units you support>>
>>>>     ;
>>>>
>>>>
>>>> But was that really your question?
>>>>
>>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>
>>
>

