[antlr-interest] syntactic predicates vs. backtrack=true

Sun Feb 10 05:20:40 PST 2008

On 2/6/08, Jim Idle <jimi at temporal-wave.com> wrote:
> There are some languages that are not necessarily good targets for an
> ANTLR grammar and there isn't a good way to parse them without
> backtracking or lots of predicates. Wiki markup languages are like that
> for instance because they are a bit haphazard and tend to have lots of
> context. If you are not trying to parse such a language, then it is
> usually possible to create a syntactic parser that has very few
> ambiguities. Experience and effort is required. Experience lets you spot
> the things that are truly an ambiguity in the language spec vs something
> that you could specify in a better way.

Hey Jim, would you mind expanding on why parsing Wiki markup with
ANTLR is a bad idea? I ask because, uh, I'm trying to write a parser
for wikitext with ANTLR :) The two problems you raise are "a bit
haphazard" and "lots of context". For me, the first is the entire
reason for the ANTLR grammar: to define all the rules rigidly (for the
first time) so that end users of the wikitext actually have a fixed
understanding of what the grammar is. The second hasn't been too bad
in practice: a small number of flags like "prohibit_literal_pipe" has
solved that problem. Rather than relying on context like "I'm deeply
nested inside a table, therefore I can't start a line with a pipe", I
push the restrictions down: "This table cell is normal text, except no
line can start with a pipe".
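The push-down idea can be sketched in a few lines. This is a minimal, hand-written recursive-descent sketch in Python, not ANTLR-generated code; apart from the flag name taken from the message, `Restrictions`, `parse_text_lines` and `parse_table_cell` are hypothetical names. The table rule passes its restriction into the text rule, so the text rule never has to know it is nested inside a table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restrictions:
    # Hypothetical flag, named after the one in the message: when True,
    # a line that starts with "|" may not be consumed as ordinary text.
    prohibit_literal_pipe: bool = False


def parse_text_lines(lines, start, restrictions):
    """Consume ordinary text lines beginning at index `start`.

    Returns (text_lines, next_index). Stops at the first line the active
    restrictions forbid and leaves it for the caller to interpret.
    """
    out = []
    i = start
    while i < len(lines):
        line = lines[i]
        if restrictions.prohibit_literal_pipe and line.startswith("|"):
            break  # e.g. "|-": the enclosing table parser handles it
        out.append(line)
        i += 1
    return out, i


def parse_table_cell(lines, start):
    # The table rule pushes its restriction down to the text rule, instead
    # of the text rule asking "am I nested inside a table?"
    return parse_text_lines(lines, start, Restrictions(prohibit_literal_pipe=True))
```

With `lines = ["some cell text", "more text", "|-", "after the row"]`, `parse_table_cell(lines, 0)` stops before `"|-"`, while the same lines parsed with the default `Restrictions()` are all consumed as text. Making `Restrictions` frozen means a nested rule can derive a stricter set (for example with `dataclasses.replace`) without mutating the one its caller holds.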

The biggest difficulty has been that *everything* is valid syntax: if
it doesn't match some particular language construction, it just
renders literally:

[[image:foo.jpg|thumbnail|this is a foo!]] <- that renders an image

[[image:foo.jpg|thumbnail|this is a foo!] <- one "]" missing, so that renders literally

This means that potentially everything has to be parsed several times,
once for each candidate construct, and if all of those fail, once more
as literal text. Is this what you mean by "allowing different syntax
paths"?
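The try-each-construct-then-fall-back-to-literal strategy can be sketched as ordered alternatives with backtracking. This is a minimal, hand-written Python sketch, not ANTLR-generated code and not how MediaWiki itself parses; the node shapes and the names `_bracketed`, `parse_image_link`, `parse_wiki_link` and `parse_inline` are hypothetical, and it assumes links never span lines and only the lowercase `image:` prefix is recognised. Each alternative is tried at the same position, a failed attempt consumes nothing, and a position where nothing matches becomes literal text.

```python
def _bracketed(s, pos, prefix):
    """Match prefix ... "]]" on a single line starting at s[pos].

    Returns (inner_text, position_after_close) or None. Simplified on
    purpose: no nesting, no escapes, and the construct may not span lines.
    """
    if not s.startswith(prefix, pos):
        return None
    start = pos + len(prefix)
    end = s.find("]]", start)
    newline = s.find("\n", start)
    if end == -1 or (newline != -1 and newline < end):
        return None
    return s[start:end], end + 2


def parse_image_link(s, pos):
    # [[image:NAME|option|...|caption]] -> ("image", [NAME, option, ..., caption])
    m = _bracketed(s, pos, "[[image:")
    if m is None:
        return None
    inner, new_pos = m
    return ("image", inner.split("|")), new_pos


def parse_wiki_link(s, pos):
    # [[Target]] -> ("link", "Target")
    m = _bracketed(s, pos, "[[")
    if m is None:
        return None
    inner, new_pos = m
    return ("link", inner), new_pos


# Order matters: the more specific construct is tried first.
ALTERNATIVES = [parse_image_link, parse_wiki_link]


def parse_inline(s):
    nodes = []
    pos = 0
    literal_start = 0  # start of the pending run of literal text
    while pos < len(s):
        result = None
        for alternative in ALTERNATIVES:
            result = alternative(s, pos)  # a failed attempt consumes nothing
            if result is not None:
                break
        if result is None:
            pos += 1  # nothing matched here, so this character is literal
            continue
        if literal_start < pos:
            nodes.append(("text", s[literal_start:pos]))
        node, pos = result
        nodes.append(node)
        literal_start = pos
    if literal_start < len(s):
        nodes.append(("text", s[literal_start:]))
    return nodes
```

With this sketch, the first example from the message becomes a single `("image", ["foo.jpg", "thumbnail", "this is a foo!"])` node, while the variant missing one "]" comes back as a single `("text", ...)` node holding the original string. Every "[[" triggers a scan ahead that may fail, so the same input can be examined more than once. ANTLR 3's `memoize=true` option, used together with `backtrack=true`, caches rule results per input position to cut down this kind of repeated work.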

Anyway, I'd like to know if there are other reasons why ANTLR is a bad
choice for a wikitext parser.

Many thanks,
Steve