[antlr-interest] Binary support

Ron Burk ronburk at gmail.com
Thu Sep 15 15:03:28 PDT 2011


> I got the feeling that building a parser for binary files is a bit hard.

In general, most applications don't want to "parse" a binary file.
In many (most?) cases it is already "parsed": it looks like nothing
so much as in-memory data structures, linearized only as much as
necessary for disk. Many applications take great pains *not* to
read more of the data than they have to, in order to avoid the
disk I/O.
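
As a rough illustration of the "read only what you need" point, here
is a minimal Python sketch. The record layout (a little-endian uint32
id followed by a float64 value) and the names RECORD and read_record
are made up for the example; they do not come from any real format.

```python
import io
import struct

# Hypothetical fixed-size record: uint32 id, float64 value, little-endian,
# no padding. This layout is an assumption made for the example.
RECORD = struct.Struct("<Id")


def read_record(f, index):
    """Return record number `index` without reading any other record."""
    f.seek(index * RECORD.size)      # jump straight to the record
    return RECORD.unpack(f.read(RECORD.size))


# Build a small in-memory "file" of 1000 records and fetch just one.
sample = io.BytesIO(b"".join(RECORD.pack(i, i * 0.5) for i in range(1000)))
record = read_record(sample, 742)
```

Nothing here is "parsed" in the grammar sense: the position of the
data is computed, and only those bytes are read.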

Binary file formats also often aren't directly representable
by context-free grammars. For example, a header may contain
offsets of different objects, and the sizes of those objects may
have to be inferred from the difference in offsets. Grammars,
despite looking seductively similar because both have recursively
nested constructs, aren't a great match for this domain.
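
To make the offset example concrete, here is a small Python sketch
of a made-up container format: a uint32 count, then that many uint32
offsets, then the object bytes. No sizes are stored, so each size has
to be computed from neighbouring offsets. The layout and the function
names are assumptions for illustration only.

```python
import struct


def build_sample(payloads):
    """Serialize payloads as: count, offset table, then the raw bytes."""
    header_size = 4 + 4 * len(payloads)
    offsets = []
    pos = header_size
    for p in payloads:
        offsets.append(pos)
        pos += len(p)
    header = struct.pack("<I", len(payloads))
    table = struct.pack(f"<{len(offsets)}I", *offsets)
    return header + table + b"".join(payloads)


def read_objects(data):
    """Recover the objects; sizes come from differences between offsets."""
    (count,) = struct.unpack_from("<I", data, 0)
    offsets = list(struct.unpack_from(f"<{count}I", data, 4))
    # Each object ends where the next begins; the last one runs to the
    # end of the data.
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


objects = read_objects(build_sample([b"alpha", b"be", b"gamma!!"]))
```

The length of each object depends on a value read elsewhere in the
file, which is exactly the kind of dependency a context-free rule
cannot express on its own.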

One could imagine useful domain-specific languages for
binary file formats, but they might not look quite like
grammar tools, and a single language might not be sufficient
for all tasks. For example, generating code for a target language
(e.g., C) to access a binary file format might call for a fairly
different syntax than a language that makes it easy to reverse engineer
existing unknown file formats by building up a format
description a bit at a time.
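
One way such a description language might start out, sketched in
Python: the format is plain data (a list of field names and struct
codes) and a generic decoder walks it. Everything here (the field
list, the "DEMO" magic, the decode function) is hypothetical; it
only shows how a description can be extended one field at a time
while the rest of the file is still unknown.

```python
import struct

# Hypothetical format description: (field name, struct code), in file order.
# When reverse engineering, fields get appended as they are figured out.
HEADER_FIELDS = [
    ("magic", "4s"),
    ("version", "H"),
    ("count", "I"),
]


def decode(fields, data, offset=0):
    """Decode the described fields; return (values, offset of the rest)."""
    values = {}
    for name, code in fields:
        fmt = "<" + code                 # little-endian is assumed
        (values[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return values, offset


data = b"DEMO" + struct.pack("<HI", 2, 7) + b"not yet understood"
partial, _ = decode(HEADER_FIELDS[:1], data)     # only the magic is known
full, rest_at = decode(HEADER_FIELDS, data)      # all three fields known
```

A code generator for C would want to consume the same kind of
description quite differently, which is part of why one syntax may
not serve both jobs.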

IMHO ::D

