[antlr-interest] Customizing token separators without recompiling

Dukie Banderjee dukie_banderjee at hotmail.com
Sun Jun 7 16:45:49 PDT 2009


"If you simply want to break apart a line of text based on an arbitrary
delimiter, it would be much easier to write a program in Perl, Python,
Java, etc. that split the text based on a configuration setting."

That's basically what I'm doing right now (in C#, by hand). Are you saying that ANTLR can't work at all with this?

At some level it becomes a parsing issue. Each line has a different meaning, and should perform a different action and/or gather different information.

It seems to me that these files would lend themselves very well to an intermediate AST form. For example, the style of document I showed you earlier was the ANSI 830 format. There is another format, UN/EDIFACT, which looks like this:
DTM+2:20080523:102'
QTY+1:1500:EA'
SCC+1++D:ZZZ'

Although this looks totally different, it is logically the same information as the previous example I showed (FST*...).
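To make the structure of those three lines concrete, here is a minimal sketch in Python (the language is an arbitrary choice for illustration; my real code is C#). It assumes the usual EDIFACT defaults of "+" between data elements, ":" between components, and "'" as the segment terminator, and it ignores the release (escape) character entirely. The function name split_segment is made up for this example.

```python
def split_segment(segment, element_sep="+", component_sep=":", terminator="'"):
    """Split one EDIFACT-style segment into (tag, elements).

    Each element is returned as a list of its components.
    Assumption: no release/escape character handling is done.
    """
    body = segment.strip()
    if body.endswith(terminator):
        body = body[: -len(terminator)]
    tag, *elements = body.split(element_sep)
    return tag, [element.split(component_sep) for element in elements]


sample = ["DTM+2:20080523:102'", "QTY+1:1500:EA'", "SCC+1++D:ZZZ'"]
parsed = [split_segment(line) for line in sample]

for tag, elements in parsed:
    print(tag, elements)
```

Because the separators are ordinary parameters, they can come from a configuration setting instead of being compiled in.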

I was hoping to use ANTLR with two different grammars to translate the raw text into tokens, which could then be translated into a generic command tree (basically to add records into a DB) that would be functionally equivalent whether it originally came from ANSI 830 or UN/EDIFACT.

It seems to me that ANTLR would have been a good tool to use to do this translation. I'd rather not be forced to do the entire thing by hand just because of this token separator issue.

Is there a way I could perform the token splitting manually (as you suggest), but then feed the resulting tokens into an ANTLR-generated parser to do the rest of the work?
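A rough sketch of what I mean by splitting manually, again in Python for illustration only. The separator characters are looked up at run time, and the output is a stream of typed tokens, so whatever consumes them never sees the actual delimiter characters. The Token type, the token-type names, and the "*", ">", "~" separator set are assumptions made up for this example, and the second input is simply the same content re-delimited, not a real ANSI segment. Whether such tokens can be handed to an ANTLR-generated parser (for instance through a custom token source) is exactly the open question, not something this sketch demonstrates.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str  # "TEXT", "ELEM_SEP", "COMP_SEP", or "SEG_END"
    text: str


def tokenize(data: str, separators: dict) -> Iterator[Token]:
    """Yield typed tokens; separator characters are supplied at run time."""
    kinds = {char: kind for kind, char in separators.items()}
    buffer = []
    for ch in data:
        if ch in kinds:
            if buffer:
                yield Token("TEXT", "".join(buffer))
                buffer.clear()
            yield Token(kinds[ch], ch)
        elif ch in "\r\n" and not buffer:
            continue  # skip line breaks between segments
        else:
            buffer.append(ch)
    if buffer:
        yield Token("TEXT", "".join(buffer))


# Separator sets that would normally be read from a configuration file.
EDIFACT = {"ELEM_SEP": "+", "COMP_SEP": ":", "SEG_END": "'"}
OTHER = {"ELEM_SEP": "*", "COMP_SEP": ">", "SEG_END": "~"}  # assumed values

edifact_types = [t.type for t in tokenize("QTY+1:1500:EA'", EDIFACT)]
other_types = [t.type for t in tokenize("QTY*1>1500>EA~", OTHER)]

# Both inputs produce the same sequence of token types.
print(edifact_types == other_types)
```

The point of the design is that only the token types reach the next stage, so the grammar (or hand-written parser) would not need to change when the delimiters do.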

Thanks,

Rob

Date: Sun, 7 Jun 2009 15:02:09 -0700
From: jsrs701 at yahoo.com
Subject: RE: [antlr-interest] Customizing token separators without recompiling
To: antlr-interest at antlr.org; dukie_banderjee at hotmail.com

Oh, I'm saying you wouldn't want to use a grammar at all.  The problem you've described is lexical, not grammatical.  If you simply want to break apart a line of text based on an arbitrary delimiter, it would be much easier to write a program in Perl, Python, Java, etc. that split the text based on a configuration setting.

If further parsing needs to happen on the newly-split fields, then you can attack that problem piecemeal on an individual basis.

Make sense?



