[antlr-interest] parsing just a subset of a grammar

Terence Parr parrt at cs.usfca.edu
Tue Nov 20 10:08:07 PST 2012


On Nov 20, 2012, at 9:47 AM, Alexander Kostikov wrote:

> Terence,
> 
> Thank you for the fuzzy parsing advice.
> 
> Fuzzy parsing seems to be the natural choice here. I tried it
> yesterday and it worked on sample data. But when I supplied a real
> file, two things came up:
> 
> 1) The parser became very loose. ANTLR no longer detects cases where
> the input almost matches the acl rule. Fuzzy parsing via 'config: (acl |
> .)* EOF' ignores all input that is not 100% described by the acl rule.
> I understand that these are conflicting goals, but it looks like the
> swallow_to_semi technique from Ivan's email could bring the benefits of
> both fuzzy parsing and error handling by making the grammar more verbose.

I think it's very sensitive to how you write the grammar. I was very happy with the fuzzy Java parser, as it wasn't loose at all. That was with v4.
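
For readers following along, here is a minimal v4 sketch of the grammar shape being discussed. Only the config rule comes from this thread. The acl and entry rules and all lexer rules are hypothetical stand-ins, since the real acl rule is not shown here.

```antlr
grammar FuzzyConfig;

// Rule quoted in the thread: at each position try acl, otherwise
// consume exactly one token via the parser wildcard.
config : (acl | .)* EOF ;

// HYPOTHETICAL acl rule, for illustration only.
acl    : 'acl' ID '{' entry* '}' ;
entry  : ID ';' ;

// HYPOTHETICAL lexer rules.
ID     : [a-zA-Z_] [a-zA-Z0-9_]* ;
WS     : [ \t\r\n]+ -> skip ;
ANY    : . ;   // catch-all so unknown characters still become tokens
```

With a shape like this, input that starts like an acl but then deviates from it would most likely be consumed token by token through the wildcard alternative without any error, which is the looseness described in point 1. How strict the result feels depends on how the rules around the wildcard are written.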
> 
> 2) The ANTLRWorks debugger took significant time to parse the real data:
> about 40 seconds per file, compared to about 1 second with my old
> regex-based parser. That was just a run under the debugger and for a
> different target language (I'm targeting CSharp3), but performance is a
> valid concern for me. I don't want a speed regression when porting
> from the current regex parser. If there is no way to build a fast
> parser, I'll introduce an intermediate representation, so that only
> the parsing speed from this intermediate representation would matter.

Ah, you must be using v3. All bets are off: the v3 fuzzy option is very slow, O(n^2).

T

