[antlr-interest] parsing binary files

Thomas N. thn-d at gmx.de
Mon Dec 22 06:57:16 PST 2008


Hi,

I'm updating the Classfile.g grammar to Antlr 3.1.1 and have run into some time-consuming difficulties. It seems that Antlr3 is not designed for developing byte parsers, but I think that in the case of Java class files it makes sense. (I'm from the ArgoUML project, by the way, and we have been using Antlr for a long time in our classfile import module.)

Can you help me and give feedback on the following two approaches?

The first approach is to let the Lexer read a stream of bytes. I failed miserably at this because a Lexer always deals with CharTokens, which is a huge obstacle. Also, all the provided XXXStream classes are based on chars. Can I abandon the lexer, or is it still the way to go? If so, how?

The second approach is to not use a lexer at all (why have a lexer for a byte stream anyway?). I wrote a ByteTokenStream and fed the parser with it, and my grammar has no lexer rules at all. Then I ran into the problem of dealing with literals, so I created a list of 256 tokens X00..XFF (!) and used the fact that there is a direct mapping from the token type to the byte value (a single constant offset). It works, but is this fine?
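
To illustrate the offset mapping used in approach 2, here is a minimal sketch. It is not the actual ArgoUML code and does not depend on Antlr: the class name ByteTokenMapping, the helper methods, and the offset value 4 are all assumptions made for the example. The real offset is whatever token type the generated parser assigns to X00.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a constant-offset mapping between byte values and token types. */
final class ByteTokenMapping {

    // Assumed token type of X00. Hypothetical value: take the real one from
    // the generated parser's token constants.
    static final int FIRST_BYTE_TOKEN_TYPE = 4;

    /** Maps a byte (treated as unsigned, 0..255) to the token type of X00..XFF. */
    static int toTokenType(byte b) {
        return (b & 0xFF) + FIRST_BYTE_TOKEN_TYPE;
    }

    /** Recovers the unsigned byte value from a token type. */
    static int toByteValue(int tokenType) {
        int value = tokenType - FIRST_BYTE_TOKEN_TYPE;
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException("not a byte token type: " + tokenType);
        }
        return value;
    }

    /** Returns the token name as used in the grammar, e.g. "XCA" for byte 0xCA. */
    static String tokenName(int tokenType) {
        return String.format("X%02X", toByteValue(tokenType));
    }

    /** Converts raw bytes to a list of token types, one token per byte. */
    static List<Integer> tokenize(byte[] data) {
        List<Integer> types = new ArrayList<>(data.length);
        for (byte b : data) {
            types.add(toTokenType(b));
        }
        return types;
    }

    public static void main(String[] args) {
        // The first four bytes of every Java class file (magic number 0xCAFEBABE).
        byte[] magic = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        for (int type : tokenize(magic)) {
            System.out.println(type + " -> " + tokenName(type));
        }
    }
}
```

With a mapping like this, a parser rule can match a literal such as the class file magic number as the token sequence XCA XFE XBA XBE, and an action can turn any matched token back into its byte value by subtracting the offset.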

If you want to see some code, I can attach some working stuff (approach 2).

Thanks in advance,
Thomas Neustupny

