[antlr-interest] x86 assembler parser (AT&T syntax)
Joshua LeVasseur
jtl at ira.uka.de
Tue Feb 28 08:00:10 PST 2006
Hello all,
In case anyone has interest in a grammar for assembler: I wrote a
parser, transformer, and emitter for x86 assembler (AT&T syntax). It
successfully parses all assembler produced in a typical Linux kernel
build (both Linux 2.4 and 2.6). Its performance seems pretty nice
(using the Antlr C++ backend).
This was my first Antlr grammar, written in a big rush, so please
excuse any strangeness in it, and feel free to share critique.
The grammar file is here:
http://l4hq.org/cvsweb/cvsweb/~checkout~/afterburner/asm-parser/Asm.g
The project plus build instructions are here:
http://l4ka.org/projects/virtualization/afterburn/
Some notes about the grammar:
Assembler has many mnemonics, and I wanted the mnemonic lookups to be
fast via the hash table. The problem: AT&T syntax appends a suffix
to the mnemonics to denote bit-width. I didn't see a way to separate
the mnemonic from the suffix in the lexer when using the hash table,
and so the grammar is a bit verbose with the mnemonic declarations.
I probably should generate the grammar from another grammar, to avoid
bugs that could easily accompany all of the tedious mnemonic
declarations.
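
A minimal sketch of the generation idea, assuming the suffixed mnemonic list is expanded from a short list of base mnemonics before being written into the grammar. The function name `expand_mnemonics` and the default suffix set are hypothetical and not taken from Asm.g; real x86 has mnemonics that take no suffix or take other suffixes, so a real generator would likely need per-mnemonic suffix sets.

```cpp
#include <string>
#include <vector>

// Expand each base mnemonic with every width suffix,
// e.g. "mov" -> mov, movb, movw, movl.
// Suffix letters follow AT&T convention: b = 8-bit, w = 16-bit, l = 32-bit.
std::vector<std::string> expand_mnemonics(const std::vector<std::string>& bases,
                                          const std::string& suffixes = "bwl") {
    std::vector<std::string> out;
    for (const std::string& base : bases) {
        out.push_back(base);          // keep the unsuffixed form as well
        for (char s : suffixes) {
            out.push_back(base + s);  // one entry per width suffix
        }
    }
    return out;
}
```

Calling `expand_mnemonics({"mov", "add"})` yields eight names, which a script could then print as literal declarations in whatever form the grammar expects.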
My grammar is covered with manual tree construction commands, because
I want a tree without any syntax residue, i.e., I want the tree nodes
to represent an intention, independent of the syntax. Syntax is
controversial for x86 assembler, because it has two types of syntax,
Intel and AT&T, which are so different that they even reverse the
ordering of source and destination operands. The grammar could be
the basis of a tool that transforms between the two syntax types
(although perhaps the grammar is too attached to AT&T syntax).
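
To illustrate what a tree node "without syntax residue" might look like, here is a small sketch. The `Move` struct and the two emitter functions are hypothetical and are not the node types used in Asm.g; the sketch handles only register-to-register moves.

```cpp
#include <string>

// Hypothetical syntax-neutral node: it records intent (width, source,
// destination), not the order in which the source text listed operands.
struct Move {
    char width;        // 'b', 'w', or 'l' (AT&T width suffix letters)
    std::string src;   // register name without prefix, e.g. "ebx"
    std::string dst;   // register name without prefix, e.g. "eax"
};

// AT&T: width suffix on the mnemonic, '%' on registers, source first.
std::string emit_att(const Move& m) {
    return std::string("mov") + m.width + " %" + m.src + ", %" + m.dst;
}

// Intel: no suffix or prefix for register operands, destination first.
std::string emit_intel(const Move& m) {
    return "mov " + m.dst + ", " + m.src;
}
```

For `Move{'l', "ebx", "eax"}` the two emitters give `movl %ebx, %eax` and `mov eax, ebx`, so the operand reversal lives entirely in the emitters and never in the tree.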
The parser is incomplete, both for lack of time and because I had to
reverse engineer the GNU assembler syntax (commands, macros, etc.) for
lack of a concise definition. My goal for the grammar is to transform the
x86 instructions that are sensitive to privilege level, while
ignoring everything else, and thus I implemented an overly broad
grammar that probably accepts illegal assembler (and probably raises
errors on valid assembler).
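
The "transform the sensitive instructions, ignore everything else" approach can be sketched as a filter over lines. This is only an illustration that works on raw text rather than on a parse tree: the mnemonic set below is a small assumed subset, not the list the project uses, and `is_sensitive`, `first_word`, and `rewrite_line` are hypothetical names.

```cpp
#include <set>
#include <sstream>
#include <string>

// Illustrative subset of x86 mnemonics whose behavior depends on privilege
// level; an assumption for this sketch, not the project's actual list.
bool is_sensitive(const std::string& mnemonic) {
    static const std::set<std::string> sensitive = {
        "cli", "sti", "hlt", "popf", "pushf", "lgdt", "lidt"
    };
    return sensitive.count(mnemonic) != 0;
}

// Return the first whitespace-delimited word of a line (empty if none).
std::string first_word(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    in >> word;
    return word;
}

// Hypothetical pass: mark sensitive instructions, leave other lines untouched.
std::string rewrite_line(const std::string& line) {
    if (is_sensitive(first_word(line))) {
        return line + "  # sensitive: candidate for rewriting";
    }
    return line;
}
```

Suffixed forms such as `popfl` would not match this set as written, which is the same suffix problem described above for the lexer.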
Joshua