[antlr-interest] Basic predicate question

Fri Jul 2 12:09:18 PDT 2010

It is true that Antlr is gross overkill for this example.  This example
is just a simplified version of the grammar of a subset of the overall
problem.  The example was selected as a starting point as it appeared to
be a relatively simple grammar to use for learning Antlr.  The overall
problem is much more complex and hopefully a more appropriate use of
Antlr.

          Larry

-----Original Message-----
From: John B. Brodie [mailto:jbb at acm.org] 
Sent: Thursday, July 01, 2010 4:23 PM
To: Zeafla, Larry
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Basic predicate question

Greetings!
On Thu, 2010-07-01 at 14:03 -0400, Zeafla, Larry wrote:
> I am new to Antlr, which I am trying to use to parse simple existing
> messages.  The message structure is exceptionally simple and
> straightforward.  Message fields include integer and floating-point
> numbers, single letter codes, and field separator characters.  Each
> individual message type has a narrowly defined structure, needs no
> lookahead, and typically has at most 2 possible tokens for any location in
> the message.
> 
Welcome!

Respectfully, in my opinion, using ANTLR for this task seems to be
overkill. Why not just read each message into a String, use the split()
method on the comma to get the fields, and then analyze the array
returned by split(",")? (Or maybe use regular expressions?)
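
A minimal sketch of the split() idea, assuming the four-field layout of
the TEST messages quoted below (keyword, integer, float followed by an
'A' or 'B' code, two hex digits). The class name, the TestMessage holder
and its field names are hypothetical and only serve the illustration;
they are not part of any attached grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageSplitSketch {
    // Hypothetical holder for the decoded fields; names are illustrative only.
    static final class TestMessage {
        final int count;
        final double value;
        final char code;
        final int hex;

        TestMessage(int count, double value, char code, int hex) {
            this.count = count;
            this.value = value;
            this.code = code;
            this.hex = hex;
        }
    }

    // Third field: digits, optional fraction, then the single-letter code.
    private static final Pattern FLOAT_AND_CODE =
            Pattern.compile("(\\d+(?:\\.\\d*)?)([AB])");
    // Fourth field: exactly two hex digits, known to be hex from its position.
    private static final Pattern TWO_HEX = Pattern.compile("[0-9A-Fa-f]{2}");

    static TestMessage parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 4 || !f[0].equals("TEST")) {
            throw new IllegalArgumentException("not a TEST message: " + line);
        }
        Matcher m = FLOAT_AND_CODE.matcher(f[2]);
        if (!m.matches()) {
            throw new IllegalArgumentException("bad float/code field: " + f[2]);
        }
        if (!TWO_HEX.matcher(f[3]).matches()) {
            throw new IllegalArgumentException("bad hex field: " + f[3]);
        }
        return new TestMessage(
                Integer.parseInt(f[1]),
                Double.parseDouble(m.group(1)),
                m.group(2).charAt(0),
                Integer.parseInt(f[3], 16)); // radix 16: the field position says "hex"
    }

    public static void main(String[] args) {
        String[] samples = {"TEST,123,5.6A,2D", "TEST,321,4.20A,3B", "TEST,45,5.68B,78"};
        for (String line : samples) {
            TestMessage t = parse(line);
            System.out.println(t.count + " " + t.value + " " + t.code + " " + t.hex);
        }
    }
}
```

Because the last field is decoded by position, "78" is read as hex 0x78
(decimal 120) rather than as the integer 78, and the 'B' in "3B" is never
mistaken for the letter code.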

> My problem is that one of the fields is a 2-digit (in ASCII)
> representation of a hex number.  This is known purely from context.  It
> seems there should be a simple technique (probably a predicate) to
> force this behavior.  I just can't seem to find it.
> 
>  
> 
> Here is a short sample grammar to illustrate:
> 
>           grammar sample;
>           prog   :   test+ ;
>           test    :   'TEST' COMMA INT COMMA FLOAT ( 'A' | 'B' )
>                               COMMA HEX_DIGIT  HEX_DIGIT    ;
> 
>           HEX_DIGIT   :  '0'..'9' | 'A'..'F' | 'a'..'f'  ;
>           INT         :  '0'..'9'+ ;
>           FLOAT       :  '0'..'9'+ ('.' '0'..'9'*)? ; 
>           COMMA       :  ',' ;
> 
> The associated test input is:
> 
>           TEST,123,5.6A,2D
> 
>           TEST,321,4.20A,3B
> 
>           TEST,45,5.68B,78            
> 
> 
> 
> For this example, the hex digits are the last 2 characters on each line.
> For the first test statement, parsing is successful.  For the second, I
> get a MismatchedTokenException (0!=0) on the B (the last character).
> For the third, I get a MismatchedTokenException (0!=0) on the 7 (the
> next to last character).  I am definitely confused.

as pointed out in another message in this thread, you have specified
that 'A' and 'B' are keywords in your language and yet you also want
them to be HEX_DIGITs. the lexer cannot work out this ambiguity (i
believe), because it tokenizes without knowing what the parser expects
next. same problem with '0'..'9': are they a HEX_DIGIT or are they a
single digit INT?
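
To make the ambiguity concrete, here is a toy Java model of a lexer that
picks token types without any knowledge of parser context. It is NOT
ANTLR's actual algorithm. The tie-breaking rules in it (the literals
'TEST', 'A', 'B' win over HEX_DIGIT; a lone digit becomes HEX_DIGIT; a
longer digit run becomes INT, or FLOAT if a '.' follows) are assumptions
chosen because they are consistent with the errors reported above. The
class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextFreeLexerSketch {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isHexLetter(char c) {
        return (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Returns the token TYPES chosen for one line; the choices never
    // depend on where in the message the characters appear.
    static List<String> tokenTypes(String line) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (line.startsWith("TEST", i)) {
                out.add("'TEST'");
                i += 4;
            } else if (c == ',') {
                out.add("COMMA");
                i++;
            } else if (c == 'A' || c == 'B') {
                // Assumed: the keyword literal wins over HEX_DIGIT.
                out.add("'" + c + "'");
                i++;
            } else if (isHexLetter(c)) {
                out.add("HEX_DIGIT");
                i++;
            } else if (isDigit(c)) {
                int j = i;
                while (j < n && isDigit(line.charAt(j))) j++;
                if (j < n && line.charAt(j) == '.') {
                    j++;
                    while (j < n && isDigit(line.charAt(j))) j++;
                    out.add("FLOAT");
                } else if (j - i == 1) {
                    // Assumed: a lone digit goes to the first-listed rule.
                    out.add("HEX_DIGIT");
                } else {
                    out.add("INT");
                }
                i = j;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("TEST,123,5.6A,2D"));
        System.out.println(tokenTypes("TEST,321,4.20A,3B"));
        System.out.println(tokenTypes("TEST,45,5.68B,78"));
    }
}
```

Under these assumptions only the first line ends in HEX_DIGIT HEX_DIGIT.
The second ends in HEX_DIGIT 'B' (mismatch on the B) and the third ends
in a single INT for "78" (mismatch on the 7), which lines up with the
reported exceptions.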

if you really really want to do this task using ANTLR (see above rant
regarding split() and regexes) I think you will have to do all of the
work in the parser.

usually manipulating individual characters in parser rules quickly leads
to parsing ambiguities. but your problem as stated seems to be simple
enough that it will not be a problem (unless you are gonna add more
stuff).

attached please find an alternative grammar for your sample that
illustrates this approach. it has been tested with just your 3 sample
inputs.

Hope this helps...
   -jbb