[antlr-interest] ANTLR 3 matching an exact number or bounded number of items

The Researcher researcher0x00 at gmail.com
Thu Mar 3 16:46:51 PST 2011


Jim,

I have been thinking a lot about what you said. When I wrote "binary data", I
took for granted that it referred to binary data that is not serialized. So
instead of the phrase "binary data", I will use the following two separate
definitions:
1. serialized binary data
2. not serialized binary data, which I will call stored data structures for
   lack of something better. This is data that is all binary and has *no*
   bytes or bit patterns anywhere in the data stream that can be used to
   designate delimiters or anchor points for use by a lexer. The data is just
   values for data structures stored at certain offsets in the file. There is
   no requirement for the data to be contiguous; there can be gaps of invalid
   data between the data structures.
   The only tokens for use by an analysis phase are byte and EOF. The byte
   token value has no meaning until assigned to a field in a structure.
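
As a rough illustration of what "stored data structures" means here, the
following Python sketch reads fields purely by offset from a buffer that
contains a gap of meaningless bytes. The layout, field names, and offsets are
hypothetical and chosen only for the example; they do not come from any real
file format.

```python
import struct

# Hypothetical layout: (field name, offset in the buffer, struct format).
# "<" is little-endian; H is an unsigned 16-bit value, I an unsigned 32-bit value.
LAYOUT = [
    ("version", 0x00, "<H"),
    ("count", 0x08, "<I"),
]


def read_fields(data, layout):
    """Give raw bytes a meaning by reading each field at its offset."""
    fields = {}
    for name, offset, fmt in layout:
        size = struct.calcsize(fmt)
        if offset + size > len(data):
            raise ValueError(f"field {name!r} runs past EOF")
        (fields[name],) = struct.unpack_from(fmt, data, offset)
    return fields


# Bytes 0x02..0x07 are a gap of invalid data; nothing in the stream
# marks where one field ends or the next begins.
sample = bytes([0x02, 0x00]) + b"\xff" * 6 + bytes([0x10, 0x00, 0x00, 0x00])
result = read_fields(sample, LAYOUT)
```

Nothing in the byte stream itself delimits a field; only the layout table
gives the bytes a meaning.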

As for serialized binary data, I have no problem with the concept of being
loose with the syntax.
As for stored data structures, it still doesn't seem right, but I am
starting to see how it applies.
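
For readers following along, here is a minimal sketch of how I understand the
"loose with the syntax" advice in Jim's quoted message below: accept any number
of bytes up to the ';' while parsing, then check the counts in a separate
pass. It is plain Python rather than ANTLR, and the function names and the
text-based hex input are assumptions made for the example.

```python
def parse_groups(tokens):
    """Loose syntax: a group is any number of hex byte tokens ended by ';'."""
    groups, current = [], []
    for tok in tokens:
        if tok == ";":
            groups.append(current)
            current = []
        else:
            current.append(int(tok, 16))
    if current:
        raise SyntaxError("missing ';' before EOF")
    return groups


def check_bounds(groups, lo, hi):
    """Semantic pass: report every group whose size falls outside lo..hi."""
    errors = []
    for index, group in enumerate(groups):
        n = len(group)
        if n > hi:
            errors.append(
                f"group {index}: too many bytes specified, "
                f"there can be only {hi} and you have {n}"
            )
        elif n < lo:
            errors.append(
                f"group {index}: too few bytes specified, "
                f"there must be at least {lo} and you have {n}"
            )
    return errors


groups = parse_groups("33 44 55 66 2B 43 33 ; 01 02 03 04 ; 0A ;".split())
errors = check_bounds(groups, 4, 4)  # exact count: lo == hi
for message in errors:
    print(message)
```

An exact count is the case lo == hi, and a bounded count such as 1 to 16 is
check_bounds(groups, 1, 16). Both bad groups are reported in one run instead
of the parse stopping at the first extra byte.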

Another point worth mentioning is that I rarely look at ANTLR as a tool for
translation, or to create Domain Specific Languages, or any of the things
one normally does with ANTLR.
With reference to "Flatland" by Edwin Abbott Abbott, I see it like the 2-D
entity looking at a 3-D entity: from one perspective it is one thing, from
another perspective it is a different thing, and yet it is still one thing.
One of my main uses for ANTLR is to create lots of mini state machines that
can be orchestrated to do something I need. As such I spend more time
breaking with tradition than I do using it as designed.

Hopefully this makes my goal clearer, which is not to apply parsing
techniques to a file, but to read the file into a fully connected set of
data structures.
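
To make "fully connected set of data structures" a little more concrete, here
is a small Python sketch that checks the MZ magic number, follows a stored
offset to the PE\0\0 signature, and links the two records. The class and
function names are made up for the example. The 0x3C location of the offset
field is taken from the PE file format rather than from anything stated in
this thread, and the image is synthetic, not a real executable.

```python
import struct
from dataclasses import dataclass
from typing import Optional

E_LFANEW_OFFSET = 0x3C  # where the DOS header stores the PE signature offset


@dataclass
class PeSignature:
    offset: int
    magic: bytes


@dataclass
class DosHeader:
    magic: bytes
    e_lfanew: int
    pe: Optional[PeSignature] = None  # link filled in after following e_lfanew


def load(data):
    """Read the DOS header, follow its offset field, and link the structures."""
    if len(data) < E_LFANEW_OFFSET + 4 or data[:2] != b"MZ":
        raise ValueError("missing MZ magic number or truncated DOS header")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    magic = data[e_lfanew:e_lfanew + 4]
    if magic != b"PE\x00\x00":
        raise ValueError(f"no PE signature at offset {e_lfanew:#x}")
    dos = DosHeader(b"MZ", e_lfanew)
    dos.pe = PeSignature(e_lfanew, magic)
    return dos


# Synthetic image: everything except the three fields below is gap bytes.
image = bytearray(0x90)
image[0:2] = b"MZ"
struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x80)
image[0x80:0x84] = b"PE\x00\x00"
dos = load(bytes(image))
```

The magic numbers are only trusted because of where they are found: the second
one is located through a value stored in the first structure, not by scanning
the stream for a pattern.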

Again, thanks for your input; I do value it and learn from it.

FYI for others
The file specification for which I am researching an ANTLR 3 grammar is
based on:
 "The Common Language Infrastructure Annotated Standard" by James S. Miller,

ECMA-335 5th edition
"Expert .NET 2.0 IL Assembler" by Serge Lidin

Eric

On Wed, Mar 2, 2011 at 10:53 PM, The Researcher <researcher0x00 at gmail.com> wrote:

> antlr-interest at antlr.org
>
> Jim,
>
> Thanks for responding.
>
> If I don't get the thought across correctly don't shoot me.
>
> I know I didn't mention at the start of this post that I was specifically
> working on binary files.
> If this applies to binary files then I am not seeing how loose with the
> syntax works here and wouldn't mind learning.
>
> The notion I am working with is that there can be only two tokens; one for
> a byte and one for EOF. There are no special bytes reserved for delimiters,
> thus one lexer rule which makes a token out of a single byte. The working
> code currently forgoes the lexer and uses a hand built stream token for
> feeding the parser. It doesn't even have the fillbuffer().
>
> Based on that, all the work falls into the parser. Once in the parser, the
> only places I know of that have definite patterns are the two magic numbers,
> i.e., MZ and PE\0\0, for two out of the possible tens of structures. There is
> no guarantee that magic numbers are unique in the file. The only means that I
> know to read a binary file is based on structure definitions, values in
> those structures and offsets into the file.
>
> Thanks, Eric
>
>
>
>
> On Wed, Mar 2, 2011 at 9:52 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>
>> That's because what you ask for is not the right way to do it. As per
>> prior posts ad nauseam (as Private Eye would say), you should be as loose
>> as you can be with the syntax rules and make lots of semantic checks in a
>> later pass. If you do what you are asking for you will end up with an
>> error message such as:
>>
>> Syntax error at 2B, not expecting 2B, expecting ';'
>>
>> Whereas with a semantic check, you will parse as many as you see and
>> produce a tree (or you could issue this in the parser but best not to
>> really), that with a verification walk says:
>>
>> ... 33 44 55 66 2B 43 33 ;
>>                ^^^^^^^^
>>
>> Too many bytes specified, there can be only 4 and you have 7.
>>
>> You will also maximize the number of real errors that you can capture in
>> one pass.
>>
>> Jim
>>
>> > -----Original Message-----
>> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> > bounces at antlr.org] On Behalf Of The Researcher
>> > Sent: Wednesday, March 02, 2011 6:09 PM
>> > To: antlr-interest at antlr.org
>> > Subject: [antlr-interest] ANTLR 3 matching an exact number or bounded
>> > number of items
>> >
>> > Does ANTLR 3 have built-in support for matching an exact number or
>> > bounded number of items that does not rely on using {...}?=>
>> >
>> > e.g., for a 32-bit value of four bytes the rule statement would be
>> >     byte[4]
>> >
>> > or for a structure that has a bound of between 1 and 16 elements the
>> > rule statement would be
>> >     struc[1:16]
>> > While ANTLR 3 uses [ ] for rule parameters, here [ ] is used to signify
>> > element bounds.
>> >
>> > I have looked high and low for this, and found nothing tangible.
>> >
>> > Abusing {...}?=> works, but I would like to stop abusing it.
>> >
>> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> > email-address
>>
>
>

