[antlr-interest] parsing just a subset of a grammar

Mon Nov 19 12:28:45 PST 2012

On 11/19/2012 08:23 PM, Alexander Kostikov wrote:
> Hi,
>
> I'm new to ANTLR and I'm looking for some good advice.
>
> Here is my story. I'm parsing Cisco IOS config files. They are quite
> loosely defined, but I don't actually need to have the whole config
> file parsed. I'm interested in just a subset of the config file (the acl
> rule below) and I don't really care about the other parts of the file
> right now. That said, in the future I'll need to add other parts
> as well (e.g. a rule for interface definitions), but again, I don't need
> to have all of the config file parsed. I don't want to implement a
> complete Cisco IOS grammar, since it seems that would be a very hard
> task indeed.
>
> To ignore all the uninteresting parts of the config file I defined the
> grammar this way:
>
> /*
>   * Parser Rules
>   */
>
> config: (acl | any)* EOF;
> any: (ID|INT)* EOL;
> acl: 'ip' 'access-list' 'extended'? ID EOL (remark | rule)+ EOF;
> remark: (index)? 'remark' (~EOL)* EOL;
> rule: (index)? verb protocol source source_port destination
> destination_port flag? log? EOL;
>
> // Not so interesting parser rules here...
>
> /*
>   * Lexer Rules
>   */
>
> fragment
> CHAR: 'a'..'z' | 'A'..'Z' | '_' | '-' | '.' | '+' | '/' | ':' | '%';
> fragment
> NUMBER: '0'..'9';
> INT: NUMBER+;
> ID: CHAR (CHAR | NUMBER)*;
> EOL: ('\r' | '\n')+;
> WS: (' ' | '\t') { $channel=HIDDEN; };
> COMMENT: '!' (~('\r' | '\n'))* EOL { $channel=HIDDEN; };
> ILLEGAL: .;
>
> It turns out ANTLR doesn't behave the way I expected =) What I wanted
> is for ANTLR to parse the line "no ip bootp server" via the
> 'any' rule, but ANTLR finds the 'ip' token in the line and treats the line
> as an incorrect 'acl' rule. Specifying a syntactic predicate, "config:
> (('ip' 'access-list') => acl | any)* EOF", only makes things worse
> judging by the ANTLRWorks output: the parser stops almost immediately with an
> unrecoverable error.
>
> My question is: is there a way to achieve the kind of filtering I'm
> talking about (parse only 'acl', ignore everything else) via an ANTLR
> grammar? What should I use? A syntactic predicate? Several-pass parsing?
> A custom lexer (how do I even start implementing such a beast?)? Parsing out
> all the interesting sections from a file via regex before supplying them
> to an ANTLR grammar that is only ACL-oriented (at least I know how to
> implement this last option)?
>
> -- Alexander
>
>

This may not be exactly what you want, but take a look at the PLSQL grammar.
For embedded SQL it uses the following trick:

SEMI: ';' ;

swallow_to_semi :
         ~( SEMI )+
     ;

select: 'SELECT' swallow_to_semi SEMI;

By using this you can "bypass" all the sections you're not interested in.
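
For your line-based config the same idea could be adapted roughly like
this, with EOL playing the role of SEMI. This is an untested sketch:
swallow_to_eol is a name I made up, and your 'any' rule is rewritten on
the assumption that an uninteresting line is "any tokens up to EOL".
Since ~( EOL ) in a parser rule matches every token type except EOL, it
also accepts keyword tokens such as 'ip', which (ID|INT)* does not.

swallow_to_eol :
         ~( EOL )+
     ;

// hypothetical replacement for: any: (ID|INT)* EOL;
any: swallow_to_eol? EOL;

With this change, a line starting with 'ip' 'access-list' can match both
'acl' and 'any', so you will probably still need your syntactic predicate
(or at least to keep 'acl' as the first alternative) to steer the
decision. I have not checked what ANTLR reports for that.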

Ivan
PS: be warned, negation can make the grammar very complex if you
use many lexer tokens.