[antlr-interest] Fwd: ANTLR generating invalid Java

Fri Aug 8 16:10:07 PDT 2008

On Fri, Aug 8, 2008 at 2:42 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> OK - I am still not following you exactly here. Do you mean that you want
> the spaces and tabs to come back to the parser as individual tokens? In
> which case you just specify them as part of lexer rules or in their own
> lexer rule and they will come back as their own tokens. Whitespace has no
> special meaning to ANTLR unless you make it so.

Yes, that's what I'm doing. However the parser still contains an
attempt to invoke WS if none of the other lexical rules match; I added
a { false }? semantic predicate there so it will always fail, which is
a bit of a kludge (the error message returned is not optimal). It
isn't a big deal.

> I wonder if you are trying to do too much in the parser and what you really
> need is for the parser to pick up anything that looks like valid syntax, and
> have it produce a tree, which you then walk and match up indent levels and
> so on?

It isn't that simple... YAML is _very_ context sensitive, so that
doesn't really work, unless it were possible to set up a two-way
communication channel between the parser and the lexer, in effect
merging them. The model is very different from the one used by
"programming languages", to which ANTLR is superbly adapted.

There are several hand-coded parsers for YAML, and I'm working on two
approaches in parallel. One is using ANTLR, and the other is
machine-processing the YAML productions to a simplified
recursive-state-machine form, with some special instructions such as
"push_N", "pop_N", and "repeat_N". A run-time that executes these
instructions is relatively easy to create, and even retarget. The
problem is that it would be 100% backtracking (barring any
pre-computed optimization to the instructions). I was hoping to
leverage ANTLR's N/DFA capabilities...
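
To make the second approach a little more concrete, here is a rough
sketch of what such a run-time might look like. It is only an
illustration under assumptions: the instruction names push_N, pop_N
and repeat_N are the ones mentioned above, but their exact semantics
are guessed here (N is taken to be an indentation level kept on a
stack, and repeat_N matches its body exactly N times). The class and
helper names (BacktrackSketch, lit, seq, alt, scopeN, repeatN, block)
are hypothetical and are not taken from the YAML productions or from
any existing parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BacktrackSketch {
    /** Continuation: given the position reached so far, try to finish the match. */
    interface Cont { boolean run(int pos); }

    /** One instruction: match at pos, then hand the new position to the continuation. */
    interface Instr { boolean run(String in, int pos, Deque<Integer> n, Cont k); }

    /** Matches nothing and always succeeds. */
    static final Instr EMPTY = (in, pos, n, k) -> k.run(pos);

    static Instr lit(char c) {
        return (in, pos, n, k) ->
            pos < in.length() && in.charAt(pos) == c && k.run(pos + 1);
    }

    static Instr seq(Instr a, Instr b) {
        return (in, pos, n, k) -> a.run(in, pos, n, p -> b.run(in, p, n, k));
    }

    /** Ordered choice with full backtracking: b is tried if a, or anything after a, fails. */
    static Instr alt(Instr a, Instr b) {
        return (in, pos, n, k) -> a.run(in, pos, n, k) || b.run(in, pos, n, k);
    }

    /** Assumed meaning of a push_N ... pop_N pair: run body with N increased by delta. */
    static Instr scopeN(int delta, Instr body) {
        return (in, pos, n, k) -> {
            n.push(n.peek() + delta);              // push_N
            try {
                return body.run(in, pos, n, p -> {
                    int saved = n.pop();           // pop_N before leaving the scope
                    try {
                        return k.run(p);
                    } finally {
                        n.push(saved);             // restore in case body retries
                    }
                });
            } finally {
                n.pop();                           // undo push_N on the way out
            }
        };
    }

    /** Assumed meaning of repeat_N: match body exactly N times (N = top of stack). */
    static Instr repeatN(Instr body) {
        return (in, pos, n, k) -> times(body, n.peek(), in, pos, n, k);
    }

    private static boolean times(Instr body, int left, String in, int pos,
                                 Deque<Integer> n, Cont k) {
        if (left == 0) return k.run(pos);
        return body.run(in, pos, n, p -> times(body, left - 1, in, p, n, k));
    }

    /** True if prog consumes the whole input, starting with indentation n0. */
    static boolean matches(Instr prog, String input, int n0) {
        Deque<Integer> n = new ArrayDeque<>();
        n.push(n0);
        return prog.run(input, 0, n, pos -> pos == input.length());
    }

    /** Toy program: an "x" line at indent N, optionally followed by one at indent N+2. */
    static Instr block() {
        Instr line = seq(repeatN(lit(' ')), seq(lit('x'), lit('\n')));
        return seq(line, alt(scopeN(2, line), EMPTY));
    }

    public static void main(String[] args) {
        System.out.println(matches(block(), "x\n  x\n", 0)); // nested line, 2 spaces
        System.out.println(matches(block(), "x\n x\n", 0));  // nested line, 1 space
    }
}
```

Every alt() here retries its second branch whenever anything after the
first branch fails, which is exactly the "100% backtracking" cost: no
lookahead decision is ever computed ahead of time.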

Oren.