[antlr-interest] Basic predicate question

Kunal Sawlani kunalsawlani at gmail.com
Thu Jul 1 11:33:19 PDT 2010


Hi Larry,
The second test fails to parse because the ( 'A' | 'B' ) in the rule

    test    :   'TEST' COMMA INT COMMA FLOAT ( 'A' | 'B' )
                COMMA HEX_DIGIT HEX_DIGIT ;

makes ANTLR create two implicit token definitions for the literals 'A' and
'B', with types 9 and 10 respectively. So the lexer returns the final B in
the second test as a token of type 10 rather than as a token of type
HEX_DIGIT, and the parser reports a mismatch.

As for the third test, your INT rule matches one or more digits. So the
lexer returns the 78 as a single INT token rather than as two HEX_DIGIT
tokens, because ANTLR matches characters against the longest possible lexer
rule unless told otherwise.
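
To make both effects concrete, here is a rough Python sketch of the lexing
behaviour described above. It is only a simplified model, not ANTLR's actual
algorithm or its generated code. It assumes that the longest match wins, that
ties go to the rule listed first, and that the literals from the parser rule
are listed ahead of the named lexer rules. The RULES table and the tokenize
function are made up for this illustration.

```python
import re

# Hypothetical model of the sample grammar's lexer rules, in assumed priority
# order: implicit literal tokens first, then the named rules as written.
RULES = [
    ("'TEST'", re.compile(r"TEST")),
    ("'A'", re.compile(r"A")),
    ("'B'", re.compile(r"B")),
    ("HEX_DIGIT", re.compile(r"[0-9A-Fa-f]")),
    ("INT", re.compile(r"[0-9]+")),
    ("FLOAT", re.compile(r"[0-9]+(\.[0-9]*)?")),
    ("COMMA", re.compile(r",")),
]


def tokenize(text):
    """Split text into (rule name, matched text) pairs, longest match first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer only, so the earlier rule wins a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for line in ("TEST,123,5.6A,2D", "TEST,321,4.20A,3B", "TEST,45,5.68B,78"):
        print(line, "->", [name for name, _ in tokenize(line)])
```

Under these assumptions only the first line ends in HEX_DIGIT HEX_DIGIT. The
second ends in HEX_DIGIT followed by the 'B' literal, and the third ends in a
single INT, which matches the three results you are seeing.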

Take a look at the token definitions in the generated target file to confirm this.
Hope this helps.



On Thu, Jul 1, 2010 at 2:03 PM, Zeafla, Larry <zeaflal at aai.textron.com> wrote:

> I am new to Antlr, which I am trying to use to parse simple existing
> messages.  The message structure is exceptionally simple and
> straightforward.  Message fields include integer and floating-point
> numbers, single letter codes, and field separator characters.  Each
> individual message type has a narrowly defined structure, needs no look
> ahead, and typically has at most 2 possible tokens for any location in
> the message.
>
>
>
> My problem is that one of the fields is a 2-digit (in ASCII)
> representation of a hex number.  This is known purely from context.  It
> seems there should be a simple technique (probably a predicate), to
> force this behavior.  I just can't seem to find it.
>
>
>
> Here is a short sample grammar to illustrate:
>
>          grammar sample;
>          prog   :   test+ ;
>          test    :   'TEST' COMMA INT COMMA FLOAT ( 'A' | 'B' )
>                      COMMA HEX_DIGIT HEX_DIGIT ;
>
>          HEX_DIGIT   :  '0'..'9' | 'A'..'F' | 'a'..'f'  ;
>          INT         :  '0'..'9'+ ;
>          FLOAT       :  '0'..'9'+ ('.' '0'..'9'*)? ;
>          COMMA       :  ',' ;
>
> The associated test input is:
>
>          TEST,123,5.6A,2D
>
>          TEST,321,4.20A,3B
>
>          TEST,45,5.68B,78
>
>
>
> For this example, the hex digits are the last 2 characters on each line.
> For the first test statement, parsing is successful.  For the second, I
> get a MismatchedTokenException (0!=0) on the B (the last character).
> For the third, I get a MismatchedTokenException(0!=0)  on the 7 (the
> next to last character).  I am definitely confused.
>
>
>
> Thanks,
>
>
>
>    Larry



-- 
Kunal

