[antlr-interest] Using fuzzy lexer to build AST

Thu Feb 22 15:30:04 PST 2007

> ANTLRv3b6
>
> Is it possible to build an AST using a fuzzy lexer? Do I need to make
> 3 separate ANTLR source files (one for the lexer, one for the parser,
> and one for the tree parser)?

I have the same problem, except I don't need a tree. I am lazy and wanted
to say filter=true in the lexer so that I don't have to define every token
in the language I am parsing. Since there is no way to specify filter=true
in a combined grammar, I had to use two grammar files: one for the lexer
and one for the parser.
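
For anyone unfamiliar with what filter=true buys you, here is a rough
Python sketch of the idea, not ANTLR's actual implementation: try each
token rule at the current position, and if none matches, drop one
character and try again. The rule names and patterns are made up for
illustration.

```python
import re

# Hypothetical token rules; a real lexer grammar would define its own.
RULES = [
    ("NUMBER", re.compile(r"\d+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_]\w*")),
]

def fuzzy_lex(text):
    """Return (type, text) pairs for input that matches a rule; skip the rest."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched here: silently drop one character and retry.
            pos += 1
    return tokens
```

With these two rules, operators, punctuation and whitespace never reach
the parser, which is why only the interesting tokens need a definition.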

The idea I have followed is to strategically place a catch-all rule in the
parser that takes every token the lexer passes and simply does nothing
with it (no tree construction, no printing to the screen). The catch-all
is one massive rule OR'ing every non-fragment token of the lexer. All
undefined language keywords end up being swallowed as IDENTIFIERs (and
ignored in the catch-all parser rule). I am lucky in that the parser
knows when to go into catch-all mode (simply from the parsing context)
and when to get out (there is a keyword).
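
The catch-all idea can be sketched in a few lines of Python. This is only
an illustration under assumptions: the token types BEGIN and END are
hypothetical stand-ins for "the context that starts catch-all mode" and
"the keyword that ends it", and the real thing is a parser rule, not a
loop.

```python
def skip_catch_all(tokens, start="BEGIN", stop="END"):
    """Keep tokens outside catch-all regions; swallow tokens inside them.

    `tokens` is a list of (type, text) pairs. `start` and `stop` are
    hypothetical token types marking where catch-all mode begins and ends.
    """
    kept = []
    skipping = False
    for kind, text in tokens:
        if skipping:
            if kind == stop:
                skipping = False  # the keyword ends catch-all mode
            # Any other token is consumed and ignored.
        elif kind == start:
            skipping = True  # context says: enter catch-all mode
        else:
            kept.append((kind, text))
    return kept
```

Inside the skipped region every token type is accepted, which is the job
the massive OR'ed rule does in the grammar.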

Everything works according to plan for me, except that the lexer has
problems distinguishing between keywords and identifiers... see my earlier
posting today. I don't know what I'll do about that. Maybe I'll have to
ask the coders not to hide keywords inside identifiers (like
"int_is_nice", where the lexer sees "int"). This would suck though.

Good luck,
Martin