[antlr-interest] x86 assembler parser (AT&T syntax)
jtl at ira.uka.de
Tue Feb 28 08:00:10 PST 2006
In case anyone has interest in a grammar for assembler: I wrote a
parser, transformer, and emitter for x86 assembler (AT&T syntax). It
successfully parses all assembler produced in a typical Linux kernel
build (both Linux 2.4 and 2.6). Its performance seems pretty nice
(using the Antlr C++ backend).
This was my first Antlr grammar, written in a big rush, so please
excuse any strangeness in it, and feel free to share critique.
The grammar file is here:
The project plus build instructions are here:
Some notes about the grammar:
Assembler has many mnemonics, and I wanted the mnemonic lookups to be
fast via the hash table. The problem: AT&T syntax appends a suffix
to the mnemonics to denote bit-width. I didn't see a way to separate
the mnemonic from the suffix in the lexer when using the hash table,
and so the grammar is a bit verbose with the mnemonic declarations.
I probably should generate the grammar from another grammar, to avoid
bugs that could easily accompany all of the tedious mnemonic
declarations.
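To illustrate the generation idea, here is a minimal Python sketch (not taken from the actual grammar) that expands a short list of base mnemonics into every suffixed spelling, so that a single hash lookup returns both the base mnemonic and the bit-width. The names `BASE_MNEMONICS`, `SUFFIX_BITS`, `build_mnemonic_table`, and `lookup` are hypothetical, and the mnemonic list is only a tiny sample. Simply stripping a trailing letter in the lexer would not work, since a mnemonic such as `call` ends in `l` without carrying a width suffix.

```python
# Hypothetical sample data: a few base mnemonics and the AT&T width suffixes.
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32}
BASE_MNEMONICS = ["mov", "add", "sub", "push", "pop"]


def build_mnemonic_table(bases, suffixes):
    """Map every spelling (with and without suffix) to (base, bits)."""
    table = {}
    for base in bases:
        # No suffix: the width has to be inferred from the operands.
        table[base] = (base, None)
        for suffix, bits in suffixes.items():
            table[base + suffix] = (base, bits)
    return table


MNEMONICS = build_mnemonic_table(BASE_MNEMONICS, SUFFIX_BITS)


def lookup(word):
    """One hash lookup; returns None if the word is not a known mnemonic."""
    return MNEMONICS.get(word.lower())
```

The same table could be written out as grammar text by a small generator script instead of being typed by hand, which is roughly what generating the grammar from another description would amount to. Like the grammar itself, this sketch is overly broad: it also produces spellings such as `pushb` that a real assembler would reject.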
My grammar is covered with manual tree construction commands, because
I want a tree without any syntax residue, i.e., I want the tree nodes
to represent an intention, independent of the syntax. Syntax is
controversial for x86 assembler, because it has two types of syntax,
Intel and AT&T, which are so different that they even reverse the
ordering of source and destination operands. The grammar could be
the basis of a tool that transforms between the two syntax types
(although perhaps the grammar is too attached to AT&T syntax).
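As a rough illustration of a tree node that carries intent rather than syntax, the Python sketch below stores the base mnemonic, the operand width, and the source and destination separately, and only decides on operand order and decoration when emitting. The classes `Reg`, `Imm`, and `Instr` and the two emitter functions are hypothetical names, not the node types used in my grammar, and the sketch only handles two-operand instructions with register or immediate operands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    name: str  # bare register name, e.g. "eax", without any syntax prefix


@dataclass(frozen=True)
class Imm:
    value: int


@dataclass(frozen=True)
class Instr:
    op: str      # base mnemonic without a width suffix
    bits: int    # operand width in bits
    src: object  # Reg or Imm
    dst: object  # Reg or Imm


_ATT_SUFFIX = {8: "b", 16: "w", 32: "l"}


def _att_operand(operand):
    # AT&T marks registers with '%' and immediates with '$'.
    if isinstance(operand, Reg):
        return f"%{operand.name}"
    return f"${operand.value}"


def _intel_operand(operand):
    # Intel syntax writes registers and immediates undecorated.
    if isinstance(operand, Reg):
        return operand.name
    return str(operand.value)


def emit_att(instr):
    """AT&T order: source first, destination second, width as a suffix."""
    suffix = _ATT_SUFFIX[instr.bits]
    return f"{instr.op}{suffix} {_att_operand(instr.src)}, {_att_operand(instr.dst)}"


def emit_intel(instr):
    """Intel order: destination first, source second, no suffix."""
    return f"{instr.op} {_intel_operand(instr.dst)}, {_intel_operand(instr.src)}"
```

For example, `Instr("mov", 32, Imm(1), Reg("eax"))` is emitted as `movl $1, %eax` by `emit_att` and as `mov eax, 1` by `emit_intel`. Memory operands are where the two syntaxes differ most, and they are left out here.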
The parser is incomplete for lack of time, and because I reverse
engineered the GNU assembler syntax (commands, macros, etc.) for lack
of a concise definition. My goal for the grammar is to transform the
x86 instructions that are sensitive to privilege level, while
ignoring everything else, and thus I implemented an overly broad
grammar that probably accepts illegal assembler (and probably raises
errors on valid assembler).
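The "transform the sensitive instructions, ignore everything else" idea can be sketched in a few lines of Python. This is a deliberately simplified, line-based stand-in, not how my parser works (it builds a tree), and the `SENSITIVE` set is an assumed, incomplete sample rather than the list my tool uses. The sketch only tags matching lines with a comment; a real tool would substitute a replacement sequence at that point.

```python
import re

# Assumed, incomplete sample of privilege-sensitive mnemonics (illustrative).
SENSITIVE = {"cli", "sti", "hlt", "popf", "popfl", "pushf", "pushfl"}

# First word on the line; directives (".text") and numeric labels do not match.
_MNEMONIC = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\b")


def rewrite_line(line):
    """Tag a sensitive instruction; pass every other line through untouched."""
    match = _MNEMONIC.match(line)
    if match and match.group(1).lower() in SENSITIVE:
        return f"\t# sensitive: {match.group(1).lower()}\n{line}"
    return line


def rewrite(source):
    """Apply rewrite_line to each line of an assembler source string."""
    return "\n".join(rewrite_line(line) for line in source.splitlines())
```

Because it only looks at the first word of each line, this sketch misses an instruction that follows a label on the same line, which is one reason a real parser is the better foundation.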