[antlr-interest] Customizing token separators without recompiling

Dukie Banderjee dukie_banderjee at hotmail.com
Mon Jun 8 08:33:33 PDT 2009


Thanks Jim, that looks more like what I originally had in mind.

Rob

----------------------------------------
> From: jimi at temporal-wave.com
> To: dukie_banderjee at hotmail.com
> Subject: Re: [antlr-interest] Customizing token separators without recompiling
> Date: Sun, 7 Jun 2009 18:52:06 -0700
> CC: jsrs701 at yahoo.com; antlr-interest at antlr.org
>
> Hi,
>
> If the entire structure is just these lines, then to be honest a parser
> is probably overkill. However, you can create a lexer rule that changes
> its definition at runtime, but you must be careful that the set of
> delimiters would never otherwise appear in the input.
>
> What you do is add a member method to the lexer that accepts the
> delimiter, then use a gated semantic predicate to select the token:
>
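> @lexer::members {
>     // Delimiter chosen at runtime, e.g. lexer.setDelim('*');
>     protected int delim;
>
>     public void setDelim(int d) {
>         delim = d;
>     }
> }
>
> // Gated semantic predicate: DELIM is only considered when the next
> // input character is the configured delimiter.
> DELIM : {input.LA(1) == delim}?=> . ;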
>
> Note, though, that with this rule you will always get DELIM for that
> character, so if you had:
>
> SEMI : ';' ;
>
> and then set the delimiter to ';', you would no longer get SEMI.
>
> Perhaps it would be best to write a custom lexer.
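A minimal sketch of such a custom lexer, in Java to match the grammar actions above. It assumes the separators are supplied at runtime (the EDIFACT-style `'`, `+` and `:` from later in this thread serve as sample values); the class name `SeparatorLexer` and its token types are hypothetical. Because the separators are ordinary fields rather than grammar literals, changing them needs no regeneration, and there is no fixed rule such as SEMI for them to shadow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer whose separator characters are chosen at
// runtime instead of being compiled into a grammar.
class SeparatorLexer {
    public enum Type { TEXT, ELEMENT_SEP, COMPONENT_SEP, SEGMENT_END }

    public record Token(Type type, String text) {}

    private final char segmentEnd;
    private final char elementSep;
    private final char componentSep;

    public SeparatorLexer(char segmentEnd, char elementSep, char componentSep) {
        this.segmentEnd = segmentEnd;
        this.elementSep = elementSep;
        this.componentSep = componentSep;
    }

    // Emits one token per configured separator character and one TEXT token
    // per maximal run of other characters. Line breaks that are not
    // themselves configured as separators are skipped.
    public List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            Type sep = c == segmentEnd ? Type.SEGMENT_END
                     : c == elementSep ? Type.ELEMENT_SEP
                     : c == componentSep ? Type.COMPONENT_SEP
                     : null;
            if (sep == null && (c == '\n' || c == '\r')) {
                continue; // ignore line breaks between segments
            }
            if (sep == null) {
                text.append(c);
                continue;
            }
            flush(text, tokens);
            tokens.add(new Token(sep, String.valueOf(c)));
        }
        flush(text, tokens);
        return tokens;
    }

    // Turns any pending run of ordinary characters into a TEXT token.
    private static void flush(StringBuilder text, List<Token> tokens) {
        if (text.length() > 0) {
            tokens.add(new Token(Type.TEXT, text.toString()));
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        SeparatorLexer lexer = new SeparatorLexer('\'', '+', ':');
        for (Token t : lexer.tokenize("DTM+2:20080523:102'")) {
            System.out.println(t.type() + " " + t.text());
        }
    }
}
```

Constructing it with, for example, `new SeparatorLexer('~', '*', ':')` would retarget the same code at `*`-separated lines like the `FST*...` ones, with `+` falling back to ordinary text.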
>
> EDI is another good idea screwed up by design-by-committee, where none
> of the members will give up their proprietary formats :(
>
> Jim
>
>
> On Jun 7, 2009, at 4:45 PM, Dukie Banderjee wrote:
>
>>
>> "If you simply want to break apart a line of text based on an
>> arbitrary
>> delimiter, it would be much easier to write a program in Perl, Python,
>> Java, etc. that split the text based on a configuration setting."
>>
>> That's basically what I'm doing right now (in C#, by hand). Are you
>> saying that ANTLR can't work at all with this?
>>
>> At some level it becomes a parsing issue. Each line has a different
>> meaning, and should perform a different action and/or gather
>> different information.
>>
>> It seems to me that these files would lend themselves very well to
>> an intermediate AST form. For example, the style of document I
>> showed you earlier was in ANSI 830 format. Another format is
>> UN/EDIFACT, which looks like this:
>> DTM+2:20080523:102'
>> QTY+1:1500:EA'
>> SCC+1++D:ZZZ'
>>
>> Although this looks totally different, it is logically the same
>> information as the previous example I showed (FST*...).
>>
>> I was hoping to use ANTLR with two different grammars to translate
>> the raw text into tokens, which could then be translated into a
>> generic command tree (basically to add records into a DB) that would
>> be functionally equivalent whether it originally came from ANSI 830
>> or UN/EDIFACT.
>>
>> It seems to me that ANTLR would have been a good tool to use to do
>> this translation. I'd rather not be forced to do the entire thing by
>> hand just because of this token separator issue.
>>
>> Is there a way I could perform the token splitting manually (as you
>> suggest), but then feed the resulting tokens into an ANTLR-generated
>> parser to do the rest of the work?
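In principle, yes: an ANTLR 3 parser only pulls tokens from a token stream, so a hand-written token source can stand in for the generated lexer (in the ANTLR 3 Java runtime that means implementing `TokenSource` and wrapping it in a `CommonTokenStream`). The sketch below deliberately avoids the ANTLR runtime so it runs on its own: `SimpleTokenSource`, `SegmentParser` and the token types are hypothetical stand-ins that mimic that pull-based hand-off, with a small recursive-descent parser in place of a generated one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hand-off of manually split tokens to a parser through a pull interface.
// All names here are hypothetical; ANTLR's own TokenSource plays this role
// when a real generated parser is used.
class TokenHandoff {
    enum Type { TEXT, ELEMENT_SEP, COMPONENT_SEP, SEGMENT_END, EOF }

    record Tok(Type type, String text) {}

    interface SimpleTokenSource {
        Tok nextToken(); // returns an EOF token once the input is exhausted
    }

    // Wraps a token list produced by hand-written splitting code.
    static SimpleTokenSource fromList(List<Tok> tokens) {
        Iterator<Tok> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : new Tok(Type.EOF, "");
    }

    record Segment(String tag, List<List<String>> elements) {}

    // Stand-in for a generated parser:
    //   segment : TEXT (ELEMENT_SEP element)* SEGMENT_END ;
    //   element : TEXT? (COMPONENT_SEP TEXT?)* ;
    static class SegmentParser {
        private final SimpleTokenSource source;
        private Tok current;

        SegmentParser(SimpleTokenSource source) {
            this.source = source;
            this.current = source.nextToken();
        }

        // Consumes the current token if it has the expected type.
        private Tok expect(Type type) {
            if (current.type() != type) {
                throw new IllegalStateException("expected " + type + " but got " + current.type());
            }
            Tok t = current;
            current = source.nextToken();
            return t;
        }

        // Empty components (e.g. the gap in "1++D") become "".
        private String optionalText() {
            return current.type() == Type.TEXT ? expect(Type.TEXT).text() : "";
        }

        Segment segment() {
            String tag = expect(Type.TEXT).text();
            List<List<String>> elements = new ArrayList<>();
            while (current.type() == Type.ELEMENT_SEP) {
                expect(Type.ELEMENT_SEP);
                List<String> components = new ArrayList<>();
                components.add(optionalText());
                while (current.type() == Type.COMPONENT_SEP) {
                    expect(Type.COMPONENT_SEP);
                    components.add(optionalText());
                }
                elements.add(components);
            }
            expect(Type.SEGMENT_END);
            return new Segment(tag, elements);
        }
    }

    public static void main(String[] args) {
        // Tokens for "SCC+1++D:ZZZ'" as a hand-written splitter might emit them.
        List<Tok> tokens = List.of(
            new Tok(Type.TEXT, "SCC"), new Tok(Type.ELEMENT_SEP, "+"),
            new Tok(Type.TEXT, "1"), new Tok(Type.ELEMENT_SEP, "+"),
            new Tok(Type.ELEMENT_SEP, "+"), new Tok(Type.TEXT, "D"),
            new Tok(Type.COMPONENT_SEP, ":"), new Tok(Type.TEXT, "ZZZ"),
            new Tok(Type.SEGMENT_END, "'"));
        Segment s = new SegmentParser(fromList(tokens)).segment();
        System.out.println(s);
    }
}
```

Supporting the other format would only change how the token list is produced; the parser, and whatever builds the command tree from its results, stays the same.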
>>
>> Thanks,
>>
>> Rob
>>
>> Date: Sun, 7 Jun 2009 15:02:09 -0700
>> From: jsrs701 at yahoo.com
>> Subject: RE: [antlr-interest] Customizing token separators without recompiling
>> To: antlr-interest at antlr.org; dukie_banderjee at hotmail.com
>>
>> Oh, I'm saying you wouldn't want to use a grammar at all. The
>> problem you've described is lexical, not grammatical. If you simply
>> want to break apart a line of text based on an arbitrary delimiter,
>> it would be much easier to write a program in Perl, Python, Java,
>> etc. that split the text based on a configuration setting.
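A minimal sketch of that approach in Java, assuming the separators are read from some configuration source, modeled here by a hypothetical `DelimiterConfig` record. `Pattern.quote` matters because `String.split` takes a regular expression, and separators such as `+` or the `*` in the `FST*...` lines are regex metacharacters. Nested fields like `2:20080523:102` are deliberately left whole for a later pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Splits EDI-style text using separators taken from configuration instead of
// a compiled grammar. DelimiterConfig and splitSegments are hypothetical names.
class ConfigurableSplitter {
    record DelimiterConfig(String segmentEnd, String elementSep) {}

    static List<List<String>> splitSegments(String text, DelimiterConfig cfg) {
        List<List<String>> segments = new ArrayList<>();
        // Pattern.quote treats the configured separator literally, since
        // String.split interprets its argument as a regular expression.
        for (String seg : text.split(Pattern.quote(cfg.segmentEnd()))) {
            String trimmed = seg.strip(); // drop surrounding whitespace such as line breaks
            if (trimmed.isEmpty()) {
                continue;
            }
            // A limit of -1 keeps empty fields, including trailing ones.
            segments.add(Arrays.asList(trimmed.split(Pattern.quote(cfg.elementSep()), -1)));
        }
        return segments;
    }

    public static void main(String[] args) {
        DelimiterConfig edifact = new DelimiterConfig("'", "+");
        String input = "DTM+2:20080523:102'\nQTY+1:1500:EA'\nSCC+1++D:ZZZ'\n";
        for (List<String> fields : splitSegments(input, edifact)) {
            System.out.println(fields);
        }
    }
}
```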
>>
>> If further parsing needs to happen on the newly-split fields, then
>> you can attack that problem piecemeal on an individual basis.
>>
>> Make sense?
>>
>>


