[antlr-interest] How to Parse a datastream of tokens and values

Mon Oct 31 14:20:51 PDT 2011

Sure. Using a gated semantic predicate, you can check that there is no "PR"
ahead when matching the VALUE token.

Something like this:

grammar T;

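@lexer::members {
  // Returns true if the upcoming characters in the input match `text`.
  private boolean ahead(String text) {
    for(int i = 0; i < text.length(); i++) {
      // LA(1) is the next character that has not been consumed yet.
      if(text.charAt(i) != input.LA(i + 1)) {
        return false;
      }
    }
    return true;
  }
}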

message
  :  productionReceipt EOF
  ;

productionReceipt
  :  PR VALUE
  ;

PR : 'PR';

// Gated predicate: VALUE is only a candidate when "PR" is not ahead.
VALUE
  :  {!ahead("PR")}?=> ('a'..'z'|'A'..'Z')+
  ;
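
If it helps to see the idea outside of ANTLR, here is a rough plain-Java sketch of the same lookahead check. The class name LookaheadSketch, the tokenize method, and the "VALUE:" string format are made up for illustration; this is not the code ANTLR generates, just a hand-written stand-in that assumes the two token types from the grammar above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lexer above; not ANTLR-generated code.
public class LookaheadSketch {

    // Same idea as ahead(...) in the grammar: do the next characters equal `text`?
    public static boolean ahead(String input, int pos, String text) {
        return input.startsWith(text, pos);
    }

    // Mirrors the character set of the VALUE rule: 'a'..'z' | 'A'..'Z'.
    public static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (ahead(input, pos, "PR")) {
                // "PR" is ahead, so VALUE is not allowed to start here.
                tokens.add("PR");
                pos += 2;
            } else if (isLetter(input.charAt(pos))) {
                // No "PR" ahead: consume letters greedily as one VALUE.
                int start = pos;
                while (pos < input.length() && isLetter(input.charAt(pos))) {
                    pos++;
                }
                tokens.add("VALUE:" + input.substring(start, pos));
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("PRCLINTON")); // prints [PR, VALUE:CLINTON]
    }
}
```

Note that in this sketch the "PR" check is only made where a token starts, so a "PR" in the middle of a value (as in "PRSPRING") stays part of the value.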

Regards,

Bart.

On Mon, Oct 31, 2011 at 10:01 PM,
Weiler-Thiessen,David,SASKATOON,Engineering <
David.Weiler-Thiessen at purina.nestle.com> wrote:

> Hi
>
> Yes, I can see how that is happening.
>
> So, in my case, because I have token/value pairs and the values are not
> terminated by anything deterministic, I can’t use ANTLR to lex the input
> stream. Is that correct?
>
> It turns out that the input stream is in a fixed-length format, so it can be
> parsed in other ways. I was just thinking that this might be a problem space
> that ANTLR could also address.
>
> David Weiler-Thiessen
> Nestlé Purina PetCare
> phone: 306-933-0232
> cell: 306-291-9770
>
> This e-mail, its electronic document attachments, and the contents of
> its website linkages may contain confidential information. This information
> is intended solely for use by the individual or entity to whom it is
> addressed. If you have received this information in error, please notify
> the sender immediately and promptly destroy the material and any
> accompanying attachments from your system.
>
> From: Bart Kiers [mailto:bkiers at gmail.com]
> Sent: Monday, October 31, 2011 12:09 PM
> To: Weiler-Thiessen,David,SASKATOON,Engineering
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] How to Parse a datastream of tokens and
> values
>
> Hi David,
>
> ANTLR's lexer matches characters greedily: the input "PRCLINTON" is being
> tokenized as a single VALUE token, not as a PR token followed by a VALUE
> token.
>
> Regards,
>
> Bart.
>
> On Mon, Oct 31, 2011 at 6:24 PM, Weiler-Thiessen, David, SASKATOON,
> Engineering <David.Weiler-Thiessen at purina.nestle.com> wrote:
>
> Hi
>
> I am trying to parse a string that is a collection of tokens and values.
> For example:
>
> PRCLINTON
>
> Here PR is my token, and CLINTON is the value for that token.
>
> I have started a simple grammar (see below), but it won't parse the sample
> above.
>
> message
>   :  productionReceipt
>   ;
>
> productionReceipt
>   :  PR VALUE
>   ;
>
> PR
>   :  'PR'
>   ;
>
> VALUE
>   :  ('a'..'z'|'A'..'Z')+
>   ;
>
> What am I doing wrong? I get a MismatchedTokenException in ANTLRWorks.
>