[antlr-interest] [Antlr3 grammar] how to specify alpha token, numeric token and mix of both

Hieu Phung phungngochieu at gmail.com
Thu Oct 22 20:45:38 PDT 2009


Hi David,

Thank you for your suggestion. However, a MIX can also start with a digit,
'-' or '.'  :((

I am actually trying to write a grammar for the CIMP (Cargo-IMP) message
format in Antlr. Reference:
http://www.parse2.com/example-cargoimp-FFA4.shtml

Alpha   = %x41-5A;
Numeric = %x30-39;
Decimal = %x30-39 / ".";
Mixed   = Alpha / Numeric;
Text    = %x41-5A / %x30-39 / "." / "-" / " ";   <--- this is my MIX token
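Each ABNF rule above describes a single character, and the classes overlap: every Alpha, Numeric or Decimal character is also a Text character. A minimal Java sketch (the class and method names are illustrative, not from the CIMP reference) maps each rule to a regex character class and checks whole fields against it:

```java
import java.util.regex.Pattern;

// Illustrative only: each ABNF rule above matches one character,
// so each maps onto one Java regex character class.
public class CimpCharClasses {
    static final Pattern ALPHA   = Pattern.compile("[A-Z]");       // %x41-5A (upper case only)
    static final Pattern NUMERIC = Pattern.compile("[0-9]");       // %x30-39
    static final Pattern DECIMAL = Pattern.compile("[0-9.]");      // %x30-39 / "."
    static final Pattern MIXED   = Pattern.compile("[A-Z0-9]");    // Alpha / Numeric
    static final Pattern TEXT    = Pattern.compile("[A-Z0-9. -]"); // Alpha / Numeric / "." / "-" / " "

    // True if every character of the field belongs to the given class
    // (the class repeated one or more times must match the whole field).
    static boolean allOf(Pattern charClass, String field) {
        return Pattern.matches(charClass.pattern() + "+", field);
    }

    public static void main(String[] args) {
        // Fields taken from the example input "1/VEC305/03MAR/PTY".
        for (String field : new String[] {"1", "VEC305", "03MAR", "PTY"}) {
            System.out.printf("%-6s alpha=%b numeric=%b mixed=%b text=%b%n",
                    field, allOf(ALPHA, field), allOf(NUMERIC, field),
                    allOf(MIXED, field), allOf(TEXT, field));
        }
    }
}
```

For 03MAR the sketch reports mixed=true and text=true but alpha=false and numeric=false: the characters alone do not say whether it is one Mixed field or a Numeric field followed by an Alpha field. Only the field's position in the message decides that.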

This format is easy to express in ABNF, but in Antlr, once I introduce the
MIX token, any run that mixes digits and letters is returned as a single MIX.
Currently I have to use Java code in actions to split the MIX string. Is there
a better way to define the tokens? My grammar is now full of Java code :(!
For example:

manifestHeader
    :((n=NUMBER) SLANT (r1=field) SLANT (r2=field) SLANT (r3=ALPHA) (SLANT (r4=field)?)? )
    {
    ffm.setAttribute("MessageSequenceNumber", $n.text);
    ffm.setAttribute("CarrierCode", $r1.value.substring(0,2));
    ffm.setAttribute("FlightNumber", $r1.value.substring(2));
    ffm.setAttribute("Day", $r2.value.substring(0,2));
    ffm.setAttribute("Month", $r2.value.substring(2));
    ffm.setAttribute("AirportCode", $r3.text);
    if ($r4.value != null) ffm.setAttribute("AircraftIdentification", $r4.text);
    }
    ;
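One way to keep the grammar lighter, without changing how it tokenizes, is to move the splitting into a plain Java helper that the action calls once. A minimal sketch follows. The class ManifestHeaderFields and its methods are hypothetical; the two-character offsets are copied from the substring(0,2)/substring(2) calls in the action above, and the sketch assumes ffm.setAttribute accepts two Strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original grammar: collects the
// substring logic from the manifestHeader action in one place.
public final class ManifestHeaderFields {
    private ManifestHeaderFields() {}

    // Splits a field at a fixed offset, failing loudly if it is too short
    // instead of letting substring() throw deep inside the parser.
    static String[] splitAt(String field, int offset) {
        if (field == null || field.length() < offset) {
            throw new IllegalArgumentException("field too short: " + field);
        }
        return new String[] { field.substring(0, offset), field.substring(offset) };
    }

    // Builds the same attribute names the action sets, in the same order.
    static Map<String, String> toAttributes(String seq, String flight, String date,
                                            String airport, String aircraft) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("MessageSequenceNumber", seq);
        String[] carrierAndFlight = splitAt(flight, 2);
        attrs.put("CarrierCode", carrierAndFlight[0]);
        attrs.put("FlightNumber", carrierAndFlight[1]);
        String[] dayAndMonth = splitAt(date, 2);
        attrs.put("Day", dayAndMonth[0]);
        attrs.put("Month", dayAndMonth[1]);
        attrs.put("AirportCode", airport);
        if (aircraft != null) {
            attrs.put("AircraftIdentification", aircraft); // optional field
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Field values from the example input "1/VEC305/03MAR/PTY".
        // Prints {MessageSequenceNumber=1, CarrierCode=VE, FlightNumber=C305, Day=03, Month=MAR, AirportCode=PTY}
        System.out.println(toAttributes("1", "VEC305", "03MAR", "PTY", null));
    }
}
```

The action would then shrink to one call with $n.text, $r1.value, $r2.value, $r3.text and the r4 value (when present), followed by a loop over the returned map's entrySet() that calls ffm.setAttribute for each entry.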

Regards,
Helen


> Date: Thu, 22 Oct 2009 03:20:47 +0100
> From: David-Sarah Hopwood <david-sarah at jacaranda.org>
> Subject: Re: [antlr-interest] [Antlr3 grammar] how to specify alpha
>        token, numeric token and mix of both
>
> Hieu Phung wrote:
> > Hi all,
> >
> > My grammar has 3 kinds of tokens:
> > 1) number: contains only digits
> > 2) alpha: contains only letters
> > 3) mix: a mixture of digits, letters, hyphens, full stops and spaces
> >
> > For example:
> > 1/VEC305/03MAR/PTY
> > => in the above input data, 03MAR should be interpreted as a number of
> > length 2 followed by an alpha of length 3, but VEC305 is a mix of length 6.
> >
> > If I define grammar like below:
> >
> > NUMBER    : ('0'..'9')+ ;
> > ALPHA    : ('a'..'z'|'A'..'Z')+;
> > MIX    : (NUMBER | ALPHA | OTHER)+;
> > fragment OTHER    : (' ' | '-' | '.')+;
> > SLANT    :    '/';
> >
> > Antlr returns VEC305 and 03MAR as two MIX tokens. Is there any way to
> > define the tokens so that Antlr returns number, slant, mix, slant,
> > number, alpha, slant, alpha for the input "1/VEC305/03MAR/PTY"?
>
> Since you don't want "03MAR" to be interpreted as a MIX, presumably you
> mean that a MIX cannot start with a NUMBER. In that case, try:
>
>  fragment DIGIT  : '0'..'9' ;
>  fragment LETTER : 'a'..'z' | 'A'..'Z' ;
>  fragment SYMBOL : ' ' | '-' | '.' ;
>
>  NUMBER : DIGIT+ ;
>  ALPHA  : LETTER+ ;
>  MIX    : LETTER+ (DIGIT | SYMBOL) (DIGIT | LETTER | SYMBOL)*
>         | SYMBOL (DIGIT | LETTER | SYMBOL)*
>         ;
>  SLANT  : '/';
>
> --
> David-Sarah Hopwood  http://davidsarah.livejournal.com
>
>
>
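The difference between the original rules and the suggested ones can be checked with a small Java model of the lexer. This is a sketch under an assumption: it treats the lexer as longest match, with ties going to the rule listed first, which approximates (but is not) ANTLR 3's LL(*) token prediction for these four rules. The class LexerSketch and the regex translations of the grammar rules are illustrative, not generated by ANTLR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerSketch {
    // Rules in grammar order; LinkedHashMap keeps that order for tie-breaking.
    static final Map<String, Pattern> ORIGINAL = rules(
            "NUMBER", "[0-9]+",
            "ALPHA",  "[A-Za-z]+",
            "MIX",    "[0-9A-Za-z .-]+",   // (NUMBER | ALPHA | OTHER)+
            "SLANT",  "/");

    static final Map<String, Pattern> SUGGESTED = rules(
            "NUMBER", "[0-9]+",
            "ALPHA",  "[A-Za-z]+",
            // LETTER+ (DIGIT | SYMBOL) (DIGIT | LETTER | SYMBOL)*
            //   | SYMBOL (DIGIT | LETTER | SYMBOL)*
            "MIX",    "[A-Za-z]+[0-9 .-][0-9A-Za-z .-]*|[ .-][0-9A-Za-z .-]*",
            "SLANT",  "/");

    static Map<String, Pattern> rules(String... nameThenRegex) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        for (int i = 0; i < nameThenRegex.length; i += 2) {
            rules.put(nameThenRegex[i], Pattern.compile(nameThenRegex[i + 1]));
        }
        return rules;
    }

    // At each position, take the rule with the longest match; on a tie the
    // rule listed first wins because only a strictly longer match replaces it.
    static List<String> tokenize(Map<String, Pattern> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(best + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "1/VEC305/03MAR/PTY";
        // [NUMBER(1), SLANT(/), MIX(VEC305), SLANT(/), MIX(03MAR), SLANT(/), ALPHA(PTY)]
        System.out.println(tokenize(ORIGINAL, input));
        // [NUMBER(1), SLANT(/), MIX(VEC305), SLANT(/), NUMBER(03), ALPHA(MAR), SLANT(/), ALPHA(PTY)]
        System.out.println(tokenize(SUGGESTED, input));
    }
}
```

Under the same model, allowing MIX to start with a digit as well (as described at the top of this message) brings the problem back: a digit-initial MIX would match all five characters of 03MAR, while NUMBER matches only two, so the longest match is MIX again. That suggests splitting such fields where the expected field is known, that is, in the parser or its actions, rather than in the lexer.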