[antlr-interest] Re: Lexer - length/position as token delimiter?
angrymongoose
angrymongoose at yahoo.com
Sat May 1 05:41:04 PDT 2004
Hello Mark,
The fields making up a tag are defined in the grammar, so I am following your suggestion
and having some of the "lexical analysis" performed by the parser.
I implemented a subset of the grammar in order to parse one message as proof of concept
and I am pretty happy with the results. However, because the parser is doing a lot of the
work, which ideally would be done by the lexical analyzer, we are concerned about
performance overhead.
I will complete the grammar for our sample message type and run a batch of messages
through it to get an idea of the performance.
Thanks for your help,
Norman
--- In antlr-interest at yahoogroups.com, Mark Lentczner <markl at g...> wrote:
> As is often the case, the problems are with your grammar, not the
> ability to lex or parse it.
>
> > :23B:CRED
> > :32A:000612USD5443,99
> > :33B:USD5443,99
>
> Does the grammar know from the tag what the format of the tag body
> should be? Or can any tag have any tag_body format? If the latter is
> the case, then the grammar is almost certainly inherently ambiguous and
> you won't be able to get far. (Unless the tag_body formats are far
> more restricted than I'm guessing from your example.)
>
> Here's an example:
>
> :33X:12040678,99
>
> Unless the grammar says something about tag "33X", there is no way to
> know if this should be parsed as:
> 1) a date, "120406" and an amount "78,99"
> or 2) an amount "12040678,99"
>
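
The ambiguity in the example above can be reproduced outside ANTLR. This is
a minimal Python sketch, not part of the grammar under discussion: the
function names and regular expressions are assumptions made for
illustration, with a date taken to be six digits and an amount taken to be
digits with an optional comma-separated fraction.

```python
import re

# Assumed field shapes, mirroring the example: a 6-digit date and a
# value such as "78,99" (digits, optionally a comma and more digits).
DATE = r"(?P<date>\d{6})"
VALUE = r"(?P<value>\d+(?:,\d+)?)"

def parse_as_dated_value(body):
    """Read the body as a date followed by an amount, or return None."""
    m = re.fullmatch(DATE + VALUE, body)
    return (m.group("date"), m.group("value")) if m else None

def parse_as_value(body):
    """Read the whole body as a single amount, or return None."""
    m = re.fullmatch(VALUE, body)
    return m.group("value") if m else None

body = "12040678,99"
readings = {
    "date + amount": parse_as_dated_value(body),
    "amount": parse_as_value(body),
}
# Both readings succeed, so the body alone cannot decide between them.
print(readings)
```

Because both functions accept the same input, something outside the body
(here, the tag) has to select the format.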
> Assuming there is a way to know from the tag what to expect from the
> tag_body, then I'd approach this by putting most of the work in the
> parser, not the lexer.
>
> In the lexer I'd have:
>
> class ScriptLexer extends Lexer;
> options { testLiterals = false; }
>
> TAG options{testLiterals=true;}: ':' DIGIT DIGIT LETTER ':';
> DIGIT: '0'..'9';
> COMMA: ',';
> LETTER: 'A'..'Z';
>
> In the parser I'd define rules for each tag_body format:
>
> transaction: (LETTER)+;
> date: DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT;
> currency: LETTER LETTER LETTER;
> value: (DIGIT)+ (COMMA (DIGIT)+)?;
> amount: currency value;
> dated_amount: date amount;
>
> Then I'd run the rest of the parser like:
>
> message : headers (line)+ trailer;
> line : (
> ":23B:" transaction
> | ":32A:" dated_amount
> | ":33B:" amount
> );
>
> Notice the trick of allowing the literal test in the TAG rule, and then
> using all the tag names as literals in the parser.
>
> - Mark
>
> Mark Lentczner
> markl at w...
> http://www.wheatfarm.org/
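
The tag-driven approach quoted above, where the tag decides which body
format to expect, can be sketched in plain Python to check the idea against
the three sample lines. This is only an illustration under stated
assumptions, not the ANTLR grammar itself: the names FORMATS and parse_line
are hypothetical, and each pattern is a regular-expression rendering of the
corresponding parser rule (transaction, amount, dated_amount).

```python
import re

# Building blocks, following the quoted parser rules.
DATE = r"(?P<date>\d{6})"                 # date: six DIGITs
CURRENCY = r"(?P<currency>[A-Z]{3})"      # currency: three LETTERs
VALUE = r"(?P<value>\d+(?:,\d+)?)"        # value: (DIGIT)+ (COMMA (DIGIT)+)?
TRANSACTION = r"(?P<transaction>[A-Z]+)"  # transaction: (LETTER)+

# Hypothetical table: the tag selects the body format.
FORMATS = {
    "23B": re.compile(TRANSACTION),
    "32A": re.compile(DATE + CURRENCY + VALUE),  # dated_amount
    "33B": re.compile(CURRENCY + VALUE),         # amount
}

# TAG: ':' DIGIT DIGIT LETTER ':' followed by the tag body.
LINE = re.compile(r":(\d\d[A-Z]):(.*)")

def parse_line(line):
    """Return (tag, fields) for one line, or raise ValueError."""
    m = LINE.fullmatch(line)
    if m is None:
        raise ValueError("not a tagged line: %r" % line)
    tag, body = m.groups()
    fmt = FORMATS.get(tag)
    if fmt is None:
        # Without a known tag the body format is undetermined.
        raise ValueError("unknown tag: %r" % tag)
    fields = fmt.fullmatch(body)
    if fields is None:
        raise ValueError("body %r does not fit tag %r" % (body, tag))
    return tag, fields.groupdict()

for sample in (":23B:CRED", ":32A:000612USD5443,99", ":33B:USD5443,99"):
    print(parse_line(sample))
```

A line such as ":33X:12040678,99" is rejected here because no format is
registered for tag 33X, which matches the point made in the quoted message.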