[antlr-interest] Binary support

Douglas Godfrey douglasgodfrey at gmail.com
Thu Sep 15 11:32:56 PDT 2011


ANTLR is designed to parse stream-oriented text and does not work well
with binary data, where null bytes and line-end characters can be ordinary
data rather than delimiters.

A simple way around this is to write an override of ANTLRInputStream
that "hexifies" the input, converting every input byte into two hexadecimal
digits.

A normal ANTLR lexer and parser can then read the hexadecimal stream and
understand the file format, since every byte value is now plain text.
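
To illustrate the hexifying step, here is a minimal sketch in plain Java.
It is not tied to any ANTLR class: the name Hexify and both methods are
made up for this example, and the choice of uppercase digits is an
assumption. It only shows the byte-to-two-digits conversion described
above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of ANTLR): turns raw bytes into hex text.
final class Hexify {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Convert every byte into exactly two uppercase hexadecimal digits.
    static String hexify(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            int v = b & 0xFF;            // treat the byte as unsigned
            sb.append(HEX[v >>> 4]);     // high nibble
            sb.append(HEX[v & 0x0F]);    // low nibble
        }
        return sb.toString();
    }

    // Read a whole binary stream into memory and hexify it.
    static String hexify(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return hexify(buf.toByteArray());
    }
}
```

With ANTLR 3 the resulting string could, for example, be wrapped in an
ANTLRStringStream and handed to a lexer whose tokens are built from pairs
of hex digits. Nulls and line ends then show up as "00", "0A" and "0D"
instead of as control characters.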



On 9/15/11 12:27 PM, "kleibi at gmx.net" <kleibi at gmx.net> wrote:

>Hi,
>I searched through the archives and through the ANTLR reference, but I
>got the feeling that building a parser for binary files is a bit hard.
>
>Are there efforts to allow something like the following:
>
>
>Interpretation of size
>
>E.g. in binary formats you often have things like the following:
>
>---------------------------------------------
>| header | size of next block | block | ... |
>---------------------------------------------
>
>If I understood everything correctly, I could handle this by reading the
>size in a size rule, storing it in a variable and passing it to a block
>rule. I think it's not very elegant, but it should work.
>
>
>Byte alignment
>
>Often you have some sort of byte alignment in binary files. E.g. in a
>four byte alignment you end up with 0 to 3 empty bytes. I think it would
>also be possible to do this using a variable and then calling a rule from
>within an action -- but I find this also not very elegant.
>
>
>Ranges for repetitive rule execution
>
>ANTLR already supports executing a rule
> * exactly one time
> * zero or one time
> * zero or unlimited times
>So I think it shouldn't be a problem to say, e.g., "execute it at least 3
>but not more than 89 times". This would also be nice, because binary
>formats often put limits, especially upper limits, on list lengths.
>
>
>Specifying Hexadecimal values in rules
>
>If I understood everything correctly, in current ANTLR versions it's not
>possible to specify hexadecimal (or octal or ...) values in rules. Because
>binary files mostly use raw numeric values rather than UTF or ASCII text
>for magic numbers etc., this would be quite nice.
>
>
>Bit handling
>
>In binary files you often have to extract bits or bit ranges.
>
>
>Perhaps I just didn't find or understand something correctly and some of
>the things mentioned above are already possible -- then just point me to
>the place to look.
>
>
>Bye,
>Andi



