[antlr-interest] Customizing token separators without recompiling
Dukie Banderjee
dukie_banderjee at hotmail.com
Sun Jun 7 14:30:05 PDT 2009
Hi,
Sorry, I'm not following you. How would that work? For example, a new customer comes along with a format that uses '_' (or whatever): how do I get the lexer/parser to recognize their file format without re-generating/re-compiling the lexer/parser? What would Perl operate on? The grammar? Wouldn't that require re-generating/re-compiling the lexer?
Rob
Date: Sun, 7 Jun 2009 12:48:50 -0700
From: jsrs701 at yahoo.com
Subject: Re: [antlr-interest] Customizing token separators without recompiling
To: antlr-interest at antlr.org; dukie_banderjee at hotmail.com
Howdy,
I'm guessing there's more to the problem than just supporting arbitrary field separators, because if that's all there is, you could just use something like Perl and store the separator(s) in a config file?
--S
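Here is a minimal sketch of the config-file idea in Java rather than Perl. It reads one customer's separator from a properties file. The key name `field.separator`, the default of `*`, and the class name `SeparatorConfig` are hypothetical choices for illustration. ANTLR does not define any of them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SeparatorConfig {
    // Reads the field separator from per-customer properties text.
    // "field.separator" is a hypothetical key name; '*' is the fallback when it is absent.
    static char loadSeparator(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot really happen with a StringReader
        }
        String sep = props.getProperty("field.separator", "*");
        if (sep.length() != 1) {
            throw new IllegalArgumentException("separator must be exactly one character: '" + sep + "'");
        }
        return sep.charAt(0);
    }

    public static void main(String[] args) {
        // In practice this text would be read from a file such as customer-acme.properties (hypothetical).
        char sep = loadSeparator("field.separator=~\n");
        System.out.println("separator = " + sep);
    }
}
```

The properties format treats characters such as `\`, `#` and `!` specially. A customer whose separator is one of those would need it escaped in the file.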
--- On Sun, 6/7/09, Dukie Banderjee <dukie_banderjee at hotmail.com> wrote:
From: Dukie Banderjee <dukie_banderjee at hotmail.com>
Subject: [antlr-interest] Customizing token separators without recompiling
To: antlr-interest at antlr.org
Date: Sunday, June 7, 2009, 8:25 AM
Hi everyone,
I'm new to the list and new to ANTLR. I have a specific problem I need to solve and I hope ANTLR can help.
Our client has several end-customers who all have slightly different document formats used for data interchange.
All the documents are basically 'standard' EDI documents, meaning they have the same basic syntax. However, some customers will use a '+' to separate values, some will use '*', others will use '~', etc. (I'm reminded of the old saying, "The great thing about standards is that there are so many to choose from!")
So the following inputs are all essentially the same, except for the character used to separate tokens:
FST*4290*D*W*20070607
FST+4290+D+W+20070607
FST~4290~D~W~20070607
The thing is, we don't know ahead of time which separator characters might be used in the future, and we need to be able to tweak each end-customer's file format without re-compiling the lexer/parser. For example, a year from now there might be a customer who decides to use '_' or '$' or whatever, and we need to provide our client with a simple way (e.g. a per-customer configuration file) to customize the lexer/parser for such situations, without re-generating/re-compiling.
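One way to avoid regenerating anything is to write the grammar once against a single canonical separator. Each customer's separator is then translated into that canonical character before lexing. This is offered as an assumption, not an ANTLR feature. The sketch assumes the canonical character is ASCII 0x1F (unit separator) and that it never occurs inside field data. `SeparatorNormalizer` and `normalize` are hypothetical names.

```java
public class SeparatorNormalizer {
    // Hypothetical canonical separator that the fixed grammar would be written against.
    // ASCII 0x1F (unit separator) is assumed never to appear inside field values.
    static final char CANONICAL = (char) 0x1F;

    // Replaces every occurrence of the customer's separator with the canonical one,
    // so a single compiled lexer can read every customer's files.
    static String normalize(String input, char customerSeparator) {
        return input.replace(customerSeparator, CANONICAL);
    }

    public static void main(String[] args) {
        String a = normalize("FST*4290*D*W*20070607", '*');
        String b = normalize("FST_4290_D_W_20070607", '_');
        System.out.println(a.equals(b)); // true
    }
}
```

The normalized string would then be passed to the generated lexer as usual. With the ANTLR 3 Java runtime, it would typically be wrapped in an `ANTLRStringStream`. This is a plain character substitution, so it assumes the customer's separator never appears escaped or quoted inside a value.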
So, is this possible with ANTLR? How would I do this? Would it require a custom Lexer subclass with constructor parameters (e.g. new CustomLexer('_')) or something? How would this mesh with the generated lexer code from ANTLR?
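On the `new CustomLexer('_')` idea: in ANTLR 3, the separator could plausibly be stored in a lexer member variable and checked with a semantic predicate, but that approach is not shown or verified here. The sketch below instead hand-writes the equivalent logic in plain Java, so it runs without the ANTLR runtime. It is not ANTLR-generated code, and `CustomLexer`, `Type` and `Token` are hypothetical names. The lexer takes the separator as a constructor parameter and emits FIELD and SEP tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer whose separator is chosen at runtime.
public class CustomLexer {
    enum Type { FIELD, SEP }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    private final char separator;

    // The separator is supplied at construction time, e.g. from a per-customer config file.
    CustomLexer(char separator) { this.separator = separator; }

    // Emits a FIELD token for the text between separators (possibly empty)
    // and a SEP token for each separator character.
    List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == separator) {
                tokens.add(new Token(Type.FIELD, field.toString()));
                tokens.add(new Token(Type.SEP, String.valueOf(c)));
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        tokens.add(new Token(Type.FIELD, field.toString())); // the final field
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(new CustomLexer('_').tokenize("FST_4290_D_W_20070607"));
    }
}
```

Because the separator is just a field of the lexer, a new customer needs only a new configuration value, not a new build.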
I'm quite new to tools such as ANTLR (and parsers in general), so any help would be much appreciated. I really don't know where to start with this problem. For a hand-coded parser it's fairly simple, but I don't know enough about the workings of ANTLR to see where I would need to tweak it.
Thanks,
Rob