[antlr-interest] Re: Lexer - length/position as token delimiter?

Sat May 1 05:41:04 PDT 2004

Hello Mark,

The format of the fields making up each tag is defined in the grammar, so I am following 
your suggestion and having some of the 'lexical analysis' performed by the parser.

I implemented a subset of the grammar in order to parse one message as a proof of concept, 
and I am pretty happy with the results. However, because the parser is doing much of the 
work that would ideally be done by the lexical analyzer, we are concerned about the 
performance overhead.

I will complete the grammar for our sample message type and run a batch of messages 
through it to get an idea of the performance.
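A rough way to take that batch measurement is a plain wall-clock loop over many messages. The sketch below is in Python for brevity, whereas ANTLR 2 generates Java. `time_batch` and `dummy_parse` are hypothetical names, and `dummy_parse` is only a placeholder that should be swapped for a call into the real generated parser.

```python
import time
from typing import Callable, Iterable


def time_batch(parse: Callable[[str], object], messages: Iterable[str]) -> tuple[int, float]:
    """Run every message through `parse`; return (message count, seconds elapsed)."""
    count = 0
    start = time.perf_counter()
    for msg in messages:
        parse(msg)
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


def dummy_parse(msg: str) -> list[str]:
    # Hypothetical stand-in for the generated parser: it only splits lines.
    return msg.splitlines()


if __name__ == "__main__":
    sample = ":23B:CRED\n:32A:000612USD5443,99\n:33B:USD5443,99\n"
    n, secs = time_batch(dummy_parse, [sample] * 10_000)
    # max() guards against a zero reading on a very coarse clock
    print(f"{n} messages in {secs:.3f}s ({n / max(secs, 1e-9):.0f} msg/s)")
```

Timing the lexer on its own (just pulling tokens) and then lexer plus parser would show how much of the cost really comes from moving work into the parser.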

Thanks for your help,

Norman

--- In antlr-interest at yahoogroups.com, Mark Lentczner <markl at g...> wrote:
> As is often the case, the problems are with your grammar, not the 
> ability to lex or parse it.
> 
> > :23B:CRED
> > :32A:000612USD5443,99
> > :33B:USD5443,99
> 
> Does the grammar know from the tag what the format of the tag_body 
> should be?  Or can any tag have any tag_body format?  If the latter is 
> the case, then the grammar is almost certainly inherently ambiguous and 
> you won't be able to get far.  (Unless the tag_body formats are far 
> more restricted than I'm guessing from your example.)
> 
> Here's an example:
> 
> :33X:12040678,99
> 
> Unless the grammar says something about tag "33X", there is no way to 
> know whether this should be parsed as:
>      1) a date, "120406" and an amount "78,99"
> or  2) an amount "12040678,99"
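The ambiguity can be checked mechanically. This Python sketch assumes the two candidate body formats can be written as regular expressions: a date is six digits, and an amount is digits with an optional comma-decimal part. The currency code is omitted because the example has none. `interpretations` is a hypothetical helper that lists every format matching the whole body.

```python
import re

# Assumed body formats from the example, as regular expressions.
DATED_AMOUNT = re.compile(r"([0-9]{6})([0-9]+(?:,[0-9]+)?)")  # date, then amount
AMOUNT = re.compile(r"[0-9]+(?:,[0-9]+)?")                    # amount only


def interpretations(body: str) -> list[str]:
    """Return every body format that matches the entire tag body."""
    found = []
    m = DATED_AMOUNT.fullmatch(body)
    if m:
        found.append(f"date={m.group(1)} amount={m.group(2)}")
    if AMOUNT.fullmatch(body):
        found.append(f"amount={body}")
    return found


print(interpretations("12040678,99"))
# ['date=120406 amount=78,99', 'amount=12040678,99']
```

Both formats match, so without knowing what tag "33X" is supposed to carry, nothing in the characters themselves can decide between them.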
> 
> Assuming there is a way to know from the tag what to expect from the 
> tag_body, then I'd approach this by putting most of the work in the 
> parser, not the lexer.
> 
> In the lexer I'd have:
> 
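> class ScriptLexer extends Lexer;
>      options { testLiterals = false; }
> 
> TAG options{testLiterals=true;}: ':' DIGIT DIGIT LETTER ':';
> DIGIT: '0'..'9';
> COMMA: ',';
> LETTER: 'A'..'Z';
> // skip the line breaks that separate one tag line from the next
> NEWLINE: ('\r')? '\n' { newline(); $setType(Token.SKIP); };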
> 
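The lexer's behaviour, including the literals test on TAG, can be approximated in a few lines of Python. This is a hand-written imitation, not ANTLR output. `Token`, `LITERALS` and `tokenize` are hypothetical names, and the token type names such as `TAG_23B` are made up for illustration. Skipping line breaks is an assumption, because the message lines have to be separated somehow.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str
    text: str


# Hypothetical literals table: the tag strings the parser uses as literals.
LITERALS = {":23B:": "TAG_23B", ":32A:": "TAG_32A", ":33B:": "TAG_33B"}

# One named alternative per lexer rule; line breaks are matched and dropped.
TOKEN_RE = re.compile(
    r"(?P<TAG>:[0-9]{2}[A-Z]:)"
    r"|(?P<DIGIT>[0-9])"
    r"|(?P<COMMA>,)"
    r"|(?P<LETTER>[A-Z])"
    r"|(?P<NEWLINE>\r?\n)"
)


def tokenize(text: str) -> Iterator[Token]:
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "NEWLINE":
            continue
        if kind == "TAG":
            # Literal test: a known tag becomes its own token type,
            # while an unknown tag stays a plain TAG.
            kind = LITERALS.get(lexeme, "TAG")
        yield Token(kind, lexeme)
```

For example, `[t.type for t in tokenize(":23B:CRED")]` gives `['TAG_23B', 'LETTER', 'LETTER', 'LETTER', 'LETTER']`. The lexer stays tiny, and all knowledge of field layout is left to the parser.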
> In the parser I'd define rules for each tag_body format:
> 
> transaction: (LETTER)+;
> date: DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT;
> currency: LETTER LETTER LETTER;
> value: (DIGIT)+ (COMMA (DIGIT)+)?;
> amount: currency value;
> dated_amount: date amount;
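The structure of these parser rules can be shown outside ANTLR as a minimal hand-written recursive-descent sketch in Python, with one function per rule. It assumes every DIGIT, LETTER and COMMA token is a single character, so a character cursor can stand in for the token stream. `Cursor`, `one_or_more`, `is_digit` and `is_letter` are hypothetical helpers, not part of ANTLR.

```python
from typing import Callable


def is_digit(ch: str) -> bool:
    return len(ch) == 1 and "0" <= ch <= "9"


def is_letter(ch: str) -> bool:
    return len(ch) == 1 and "A" <= ch <= "Z"


class Cursor:
    """Walks a tag body one character at a time; each character plays the
    role of one DIGIT, LETTER or COMMA token from the lexer."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, pred: Callable[[str], bool], what: str) -> str:
        ch = self.peek()
        if not pred(ch):
            raise SyntaxError(f"expected {what} at offset {self.pos}, got {ch!r}")
        self.pos += 1
        return ch


def one_or_more(c: Cursor, pred: Callable[[str], bool], what: str) -> str:
    # (X)+ : one required match, then continue while the next char fits
    s = c.expect(pred, what)
    while pred(c.peek()):
        s += c.expect(pred, what)
    return s


def transaction(c: Cursor) -> str:      # (LETTER)+
    return one_or_more(c, is_letter, "LETTER")


def date(c: Cursor) -> str:             # six DIGITs
    return "".join(c.expect(is_digit, "DIGIT") for _ in range(6))


def currency(c: Cursor) -> str:         # three LETTERs
    return "".join(c.expect(is_letter, "LETTER") for _ in range(3))


def value(c: Cursor) -> str:            # (DIGIT)+ (COMMA (DIGIT)+)?
    s = one_or_more(c, is_digit, "DIGIT")
    if c.peek() == ",":
        c.expect(lambda ch: ch == ",", "COMMA")
        s += "," + one_or_more(c, is_digit, "DIGIT")
    return s


def amount(c: Cursor) -> tuple:         # currency value
    return (currency(c), value(c))


def dated_amount(c: Cursor) -> tuple:   # date amount
    return (date(c),) + amount(c)
```

For example, `dated_amount(Cursor("000612USD5443,99"))` returns `('000612', 'USD', '5443,99')`. Each rule consumes a fixed pattern of single-character tokens, which is why this work can sit in the parser at all.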
> 
> Then I'd write the rest of the parser like this:
> 
> message : headers line+ trailer;
> line : (
>        ":23B:" transaction
>      | ":32A:" dated_amount
>      | ":33B:" amount
>      );
> 
> Notice the trick of allowing the literal test in the TAG rule, and then 
> using all the tag names as literals in the parser.
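The effect of the literal trick is that the tag alone selects the body format, as in the `line` rule. This Python sketch imitates that with a dispatch table and regular expressions in place of a lexer and parser, which is a deliberate simplification. `BODY_FORMATS`, `TAG_RE` and `parse_line` are hypothetical names, and only the three tags from the example are covered.

```python
import re

VALUE = r"(?P<value>[0-9]+(?:,[0-9]+)?)"

# Hypothetical dispatch table playing the role of the `line` rule:
# each known tag maps to the name and pattern of its body format.
BODY_FORMATS = {
    ":23B:": ("transaction", re.compile(r"(?P<transaction>[A-Z]+)")),
    ":32A:": ("dated_amount", re.compile(r"(?P<date>[0-9]{6})(?P<currency>[A-Z]{3})" + VALUE)),
    ":33B:": ("amount", re.compile(r"(?P<currency>[A-Z]{3})" + VALUE)),
}
TAG_RE = re.compile(r":[0-9]{2}[A-Z]:")


def parse_line(line: str) -> tuple[str, dict[str, str]]:
    """Split off the tag, then parse the body with the format that tag implies."""
    m = TAG_RE.match(line)
    if m is None:
        raise SyntaxError(f"line does not start with a tag: {line!r}")
    tag, body = m.group(), line[m.end():]
    if tag not in BODY_FORMATS:
        # Like an unknown TAG token that matches no alternative of `line`.
        raise SyntaxError(f"no body format known for tag {tag}")
    name, pattern = BODY_FORMATS[tag]
    fields = pattern.fullmatch(body)
    if fields is None:
        raise SyntaxError(f"{tag} body {body!r} is not a valid {name}")
    return name, fields.groupdict()
```

With the table in place, `:33X:12040678,99` is rejected outright instead of being parsed two ways, which is the point of making the grammar know each tag's body format.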
> 
> 	- Mark
> 
> Mark Lentczner
> markl at w...
> http://www.wheatfarm.org/
