[antlr-interest] parsing just a subset of a grammar

Mon Nov 19 15:49:32 PST 2012

In the new v4 book and the v4 doc:

http://www.antlr.org/wiki/display/ANTLR4/Wildcard+Operator+and+Nongreedy+Subrules

I talk about fuzzy parsing.

See

http://media.pragprog.com/titles/tpantlr2/code/reference/FuzzyJava.g4

Terence

On Nov 19, 2012, at 11:23 AM, Alexander Kostikov wrote:

> Hi,
> 
> I'm new to ANTLR and I'm looking for some good advice.
> 
> Here is my story. I'm parsing Cisco IOS config files. They are quite
> loosely defined, but I don't actually need to parse the whole config
> file. I'm interested in just a subset of it (the acl rule below) and
> I don't really care about the other parts of the file right now.
> That said, in the future I'll need to add other parts as well (e.g.
> a rule for interface definitions), but again, I won't need to parse
> the whole config file. I don't want to implement a complete Cisco IOS
> grammar, since it seems that would be a very hard task indeed.
> 
> To ignore all the uninteresting parts of the config file, I defined
> the grammar this way:
> 
> /*
> * Parser Rules
> */
> 
> config: (acl | any)* EOF;
> any: (ID|INT)* EOL;
> acl: 'ip' 'access-list' 'extended'? ID EOL (remark | rule)+ EOF;
> remark: (index)? 'remark' (~EOL)* EOL;
> rule: (index)? verb protocol source source_port destination
> destination_port flag? log? EOL;
> 
> // Not so interesting parser rules here...
> 
> /*
> * Lexer Rules
> */
> 
> fragment
> CHAR: 'a'..'z' | 'A'..'Z' | '_' | '-' | '.' | '+' | '/' | ':' | '%';
> fragment
> NUMBER: '0'..'9';
> INT: NUMBER+;
> ID: CHAR (CHAR | NUMBER)*;
> EOL: ('\r' | '\n')+;
> WS: (' ' | '\t') { $channel=HIDDEN; };
> COMMENT: '!' (~('\r' | '\n'))* EOL { $channel=HIDDEN; };
> ILLEGAL: .;
> 
> It turns out ANTLR doesn't behave the way I expected =) I wanted
> ANTLR to parse the line "no ip bootp server" via the 'any' rule, but
> ANTLR finds the 'ip' token in the line and treats the line as an
> incorrect 'acl' rule. Specifying a syntactic predicate, "config:
> (('ip' 'access-list') => acl | any)* EOF", only makes things worse
> judging by the ANTLRWorks output: the parser stops almost immediately
> with an unrecoverable error.
> 
> My question is: is there a way to achieve the kind of filtering I'm
> talking about (parse only 'acl', ignore everything else) with an
> ANTLR grammar? What should I use? A syntactic predicate? Several-pass
> parsing? A custom lexer (how do I even start implementing such a
> beast)? Or should I extract all the interesting sections from the
> file via regex before supplying them to an ANTLR grammar that is only
> ACL-oriented (at least I know how to implement this last option)?
> 
> -- Alexander
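
The last option in the question, pulling the ACL sections out of the file before an ACL-only grammar ever sees them, could look roughly like the sketch below. It is only an illustration under stated assumptions, not something from the thread: the function name extract_acl_blocks is hypothetical, and the code assumes an ACL section starts with an unindented "ip access-list" header and continues over the indented lines that follow it, which may not hold for every config.

```python
import re

# Assumption: an ACL section starts with an unindented "ip access-list ..."
# header line. A line such as "no ip bootp server" does not match because
# the pattern is anchored at the start of the line.
ACL_HEADER = re.compile(r"^ip access-list\s+\S")


def extract_acl_blocks(config_text):
    """Return each ACL section of an IOS-style config as one string.

    Hypothetical helper: lines outside ACL sections are simply dropped.
    """
    blocks = []
    current = None  # lines of the ACL section being collected, if any
    for line in config_text.splitlines():
        if ACL_HEADER.match(line):
            # A new header closes any section in progress and opens another.
            if current is not None:
                blocks.append("\n".join(current) + "\n")
            current = [line]
        elif current is not None and line[:1] in (" ", "\t"):
            # Indented line: still inside the current ACL section.
            current.append(line)
        else:
            # Anything else (blank lines, '!' comments, other commands)
            # ends the section in progress and is ignored.
            if current is not None:
                blocks.append("\n".join(current) + "\n")
            current = None
    if current is not None:
        blocks.append("\n".join(current) + "\n")
    return blocks
```

Each returned block ends with a newline, so it could be handed as-is to a grammar whose lines are terminated by EOL. The trade-off is that the section boundaries are decided by this pre-filter rather than by the grammar, so any new section type (interfaces, for example) would need its own header pattern.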
> 