[antlr-interest] syntactic predicates vs. backtrack=true

Tue Feb 5 08:53:21 PST 2008

In my opinion you should never use backtracking unless performance is 
not a big deal for you. It is good for prototyping and as you say it can 
make the specification of your grammar easier/neater. 

If your grammar is ending up with lots of nested syntactic predicates 
(that cause multiple executions of them) and you dont want to put in 
the effort to solve the issues (remember that there are some places 
where you must use a predicate, such as distinguishing generics from 
logical expressions in C#), then using backtracking and memoizing might 
well be faster than all the predicates.

Backtracking basically trys each alt that might be predicted in turn 
until it finds one that works. The memoizing option allows the paresr to 
realize that it has already tried some parts of the path and skip them 
so it helps a lot performance wise, but you are still doing all that 
looking ahead and abandoning paths. With a good knowledge of what is 
happening though, you can favor the alts that are most likely to come up 
and help a lot.

I would say that if you are producing a tool that does not run 
continuously or on many many input streams, and/or the parse is 
generally a small part of the overall time, then you may find 
backtracking is just fine and infinitely preferable to lots of 
predicates.

There are some languages that are not necessarily good targets for an 
ANTLR grammar and there isn't a good way to parse them without 
backtracking or lots of predicates. Wiki markup languages are like that 
for instance because they are a bit haphazard and tend to have lots of 
context. If you are not trying to parse such a language, then it is 
usually possible to create a syntactic parser that has very few 
ambiguities. Experience and effort is required. Experience lets you spot 
the things that are truly an ambiguity in the language spec vs something 
that you could specify in a better way.

Remember that your parser is only there to filter out valid syntax from 
invalid syntax, not to try and allow different syntactic paths for 
things that are essentially the same syntax. Amalgamate all common token 
sequences, even if this allows through code that isnt really valid. 
Then filter such things out semantically in the tree parser (you should 
always end up with a syntactically sound tree). Many times language 
specs are written in an informative manner and distinguish things that 
are really semantics, such as methodName and interfaceName, which are 
syntactically the same thing and perhaps create long starting sequences 
before you can infer syntactically that you have a method declaration 
not an interface declaration and so on.

I hope that helps, but in summary, try not to use backtracking and 
memoizing, but dont throw out the options completely as there are 
reasonable situations where they are perfectly fine.

Jim

> -----Original Message-----
> From: Mark Volkmann [mailto:r.mark.volkmann at gmail.com]
> Sent: Tuesday, February 05, 2008 4:41 AM
> To: antlr-interest
> Subject: [antlr-interest] syntactic predicates vs. backtrack=true
> 
> Can someone describe a case where it's better to use syntactic
> predicates on rule alternatives than just specifying backtrack=true as
> a rule option? It certainly introduces less clutter in the grammar to
> use backtrack and of course there is less to type and less opportunity
> for error. Is it significantly more efficient to use syntactic
> predicates?
> 
> --
> R. Mark Volkmann
> Object Computing, Inc.