[antlr-interest] Binary support

The Researcher researcher0x00 at gmail.com
Thu Sep 15 09:44:08 PDT 2011


Hi Andi,

Yes, it can be done with ANTLR, but ANTLR is not the right tool for parsing
binary files.

The closest existing example for a binary file is the Java .class file grammar:
<http://www.antlr.org/grammar/1147639104266/classfile.tar.gz>

Basically you will be using semantic predicates for everything, which is
like calling assembly from C. On the surface it may appear to be ANTLR, but
in reality you are abusing ANTLR to do something it was not primarily designed
to do. Also, these will be one-off programs: you will have to create a new one
for each file layout.

Something you might want to do instead, although it is reinventing the wheel,
is to create your own grammar that defines binary layouts, and then use that
as input to a driver that reads the binary file. I have done both (the
predicate-heavy ANTLR grammar and the layout-driven reader), and the latter is
the better option.
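
A minimal sketch of that second approach, in Python rather than ANTLR. The
layout is written here as a plain Python list instead of being parsed from a
layout grammar, and the field kinds (magic, u8, u16, bytes, align) and all
function names are made up for illustration. It only tries to show how a
generic driver can handle a size-prefixed block, alignment padding, a
hexadecimal magic number and bit extraction from one layout description.

```python
import io
import struct

# Hypothetical layout description: (field name, kind, argument).
LAYOUT = [
    ("magic", "magic", b"\xCA\xFE"),  # fixed bytes that must match exactly
    ("size",  "u16",   None),         # big-endian unsigned 16-bit length
    ("block", "bytes", "size"),       # length comes from the earlier field
    ("pad",   "align", 4),            # skip to the next 4-byte boundary
    ("flags", "u8",    None),         # single byte holding bit fields
]


def read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means a truncated file."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError(f"expected {n} bytes, got {len(data)}")
    return data


def parse(stream, layout):
    """Generic driver: walk the layout and fill a dict of field values."""
    record = {}
    for name, kind, arg in layout:
        if kind == "magic":
            data = read_exact(stream, len(arg))
            if data != arg:
                raise ValueError(f"bad magic: {data.hex()}")
            record[name] = data
        elif kind == "u8":
            record[name] = read_exact(stream, 1)[0]
        elif kind == "u16":
            (record[name],) = struct.unpack(">H", read_exact(stream, 2))
        elif kind == "bytes":
            # arg names the field that holds the length of this block
            record[name] = read_exact(stream, record[arg])
        elif kind == "align":
            # number of padding bytes needed to reach a multiple of arg
            record[name] = read_exact(stream, -stream.tell() % arg)
        else:
            raise ValueError(f"unknown field kind: {kind}")
    return record


def bits(value, low, width):
    """Extract `width` bits from `value`, starting at bit `low` (0 = LSB)."""
    return (value >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    sample = b"\xCA\xFE" + b"\x00\x03" + b"abc" + b"\x00" + b"\x85"
    rec = parse(io.BytesIO(sample), LAYOUT)
    print(rec["size"], rec["block"], bits(rec["flags"], 4, 4))
```

Supporting a new file format then means writing a new layout table, not a
new program, which is the main advantage over a one-off grammar full of
predicates.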

Eric

On Thu, Sep 15, 2011 at 12:27 PM, <kleibi at gmx.net> wrote:

> Hi,
> I searched through the archives and through the ANTLR reference, but I got
> the feeling that building a parser for binary files is a bit hard.
>
> Are there efforts to allow something like the following:
>
>
> Interpretation of size
>
> E.g. in binary formats you often have things like the following:
>
> ---------------------------------------------
> | header | size of next block | block | ... |
> ---------------------------------------------
>
> If I understood everything correctly, I could handle this by reading the
> size in a size rule, storing it in a variable, and passing it to (or using
> it in) a block rule. I think it's not very elegant, but it should work.
>
>
> Byte alignment
>
> Often you have some sort of byte alignment in binary files. E.g. with
> four-byte alignment you end up with 0 to 3 padding bytes. I think it would
> also be possible to do this using a variable and then calling a rule from
> within an action -- but I find this also not very elegant.
>
>
> Ranges for repetitive rule execution
>
> ANTLR already supports executing a rule
>  * exactly one time
>  * zero or one time
>  * zero or unlimited times
> So I think it shouldn't be a problem to say, e.g., "execute it at least 3
> but not more than 89 times". This would also be nice because binary formats
> often impose limits, especially upper limits, on lists.
>
>
> Specifying Hexadecimal values in rules
>
> If I understood everything correctly, in current ANTLR versions it's not
> possible to specify hexadecimal (or octal or ...) values in rules. Because
> binary files mostly use raw byte values rather than UTF or ASCII text for
> magic numbers etc., this would be quite nice.
>
>
> Bit handling
>
> In binary files you often have to extract bits or bit ranges.
>
>
> Perhaps I just didn't find or understand something correctly and some of
> the things mentioned above are already possible -- then just point me to
> the right place to look.
>
>
> Bye,
> Andi

