[antlr-interest] parsing just a subset of a grammar

Alexander Kostikov alex.kostikov at gmail.com
Tue Nov 20 09:47:31 PST 2012


Terence,

Thank you for the fuzzy parsing advice.

Fuzzy parsing seems to be the natural choice here. I tried it
yesterday and it worked on sample data. But when I fed it a real
file, two things came up:

1) The parser became very loose. ANTLR no longer reports cases where
the input almost matches the acl rule: fuzzy parsing via 'config: (acl |
.)* EOF' silently skips any input that the acl rule does not match
100%. I understand that these are conflicting goals, but it looks like
the swallow_to_semi technique from Ivan's email could bring the
benefits of both fuzzy parsing and error handling by making the
grammar more verbose.

2) The ANTLRWorks debugger took significant time to parse the real
data: about 40 seconds per file, compared to about 1 second with my
old regex-based parser. That was only a run under the debugger, and
for a different target language (I'm targeting CSharp3), but
performance is a valid concern for me. I don't want a speed regression
when porting from the current regex parser. If there is no way to make
the parser fast, I'll introduce an intermediate representation, so
that only the speed of parsing from this intermediate representation
would matter.
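
As a rough illustration of the intermediate step mentioned in 2), here
is a minimal Python sketch (Python only for brevity). Everything in it
is an assumption made for illustration: the function name
extract_acls, the regexes, and the assumed shape of header and body
lines are not taken from a Cisco reference or from the grammar in this
thread. It pulls out just the 'ip access-list' sections with a cheap
line scan, and it records lines that start like an ACL header but do
not match it completely, so near-misses are reported rather than
silently skipped.

```python
import re

# Assumed line shapes (hypothetical, for illustration only).
# A well-formed header: "ip access-list [extended] NAME".
HEADER = re.compile(r"^ip access-list (?:extended )?(?!extended\b)(\S+)\s*$")
# Anything that merely starts like a header.
NEAR_HEADER = re.compile(r"^ip access-list\b")
# A body line: optional index, then permit/deny/remark.
BODY = re.compile(r"^\s*(?:\d+\s+)?(?:permit|deny|remark)\b")


def extract_acls(text):
    """Return (sections, suspects).

    sections maps an ACL name to its body lines (stripped).
    suspects lists (line_number, line) for lines that look like an
    ACL header but do not match HEADER completely.
    """
    sections = {}
    suspects = []
    current = None  # body list of the ACL being collected, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = HEADER.match(line)
        if m:
            current = sections.setdefault(m.group(1), [])
        elif NEAR_HEADER.match(line):
            # Almost a header: report it instead of skipping it.
            suspects.append((lineno, line))
            current = None
        elif current is not None and BODY.match(line):
            current.append(line.strip())
        else:
            # Any other line ends the current ACL section.
            current = None
    return sections, suspects
```

Only the extracted sections would then need to go through the slower
grammar-based parser, and the suspects list gives a place to surface
input that almost matched.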

-- 
Alexander

On Mon, Nov 19, 2012 at 3:49 PM, Terence Parr <parrt at cs.usfca.edu> wrote:
> In the new v4 book and the v4 doc:
>
> http://www.antlr.org/wiki/display/ANTLR4/Wildcard+Operator+and+Nongreedy+Subrules
>
> I talk about fuzzy parsing.
>
> see
>
> http://media.pragprog.com/titles/tpantlr2/code/reference/FuzzyJava.g4
>
> Terence
>
> On Nov 19, 2012, at 11:23 AM, Alexander Kostikov wrote:
>
>> Hi,
>>
>> I'm new to ANTLR and I'm looking for some good advice.
>>
>> Here is my story. I'm parsing Cisco IOS config files. They are quite
>> loosely defined, but I don't actually need to have the whole config
>> file parsed. I'm interested in just a subset of the config file (the
>> acl rule below) and I don't really care about the other parts of the
>> file right now. That said, in the future I'll need to add other parts
>> as well (e.g. a rule for interface definitions), but again, I don't
>> need to have all of the config file parsed. I don't want to implement
>> a complete Cisco IOS grammar, since it seems that would become a very
>> hard task indeed.
>>
>> To ignore all the uninteresting parts of the config file I defined
>> the grammar this way:
>>
>> /*
>> * Parser Rules
>> */
>>
>> config: (acl | any)* EOF;
>> any: (ID|INT)* EOL;
>> acl: 'ip' 'access-list' 'extended'? ID EOL (remark | rule)+ EOF;
>> remark: (index)? 'remark' (~EOL)* EOL;
>> rule: (index)? verb protocol source source_port destination
>> destination_port flag? log? EOL;
>>
>> // Not so interesting parser rules here...
>>
>> /*
>> * Lexer Rules
>> */
>>
>> fragment
>> CHAR: 'a'..'z' | 'A'..'Z' | '_' | '-' | '.' | '+' | '/' | ':' | '%';
>> fragment
>> NUMBER: '0'..'9';
>> INT: NUMBER+;
>> ID: CHAR (CHAR | NUMBER)*;
>> EOL: ('\r' | '\n')+;
>> WS: (' ' | '\t') { $channel=HIDDEN; };
>> COMMENT: '!' (~('\r' | '\n'))* EOL { $channel=HIDDEN; };
>> ILLEGAL: .;
>>
>> It turns out ANTLR doesn't behave the way I expected =) What I wanted
>> is for ANTLR to parse the line "no ip bootp server" via the 'any'
>> rule, but ANTLR finds the 'ip' token in the line and treats the line
>> as an incorrect 'acl' rule. Specifying a syntactic predicate, "config:
>> (('ip' 'access-list') => acl | any)* EOF", only makes things worse
>> judging by the ANTLRWorks output: the parser stops almost immediately
>> with an unrecoverable error.
>>
>> My question is: is there a way to achieve the kind of filtering I'm
>> talking about (parse only 'acl', ignore anything else) via an ANTLR
>> grammar? What should I use? A syntactic predicate? Several-pass
>> parsing? A custom lexer (how do I even start implementing such a
>> beast)? Extracting all the interesting sections from a file via regex
>> before supplying them to an ANTLR grammar that is only ACL-oriented
>> (at least I know how to implement this last option)?
>>
>> -- Alexander

