[antlr-interest] parsing just a subset of a grammar

Tue Nov 20 09:49:08 PST 2012

Ivan,

Thank you for the swallow_to_semi technique.

I've tried the fuzzy parsing Terence pointed out, but the downside is that
the parser became very loose and no longer finds input that _almost_
matches the acl rule. The swallow_to_semi technique could probably let
me avoid implementing the full parser while still finding
almost-matching input (which indicates that the rule must be updated).
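Here is a rough sketch of that combination. It assumes ANTLR 3 syntax (as in the grammar quoted below, which uses `$channel=HIDDEN`) and the Java target. The parser tries the whole `acl` rule first behind a syntactic predicate. If a line starts like an ACL header but the full rule does not match, a catch-all alternative swallows the header line and reports it. Every other line is skipped unparsed. The rule names `near_acl` and `swallow_to_eol` are hypothetical. The tokens are those of the grammar below. `acl` is assumed to have no trailing EOF, since it is matched repeatedly inside `config`.

```
config
    : ( (acl) => acl                        // the whole ACL matches: parse it
      | ('ip' 'access-list') => near_acl    // looks like an ACL but does not match
      | swallow_to_eol                      // anything else: skip it unparsed
      )*
      EOF
    ;

// Hypothetical rule: swallows the header line of an ACL that failed to match
// `acl` and reports it. ANTLR 3 does not run actions while it is evaluating
// syntactic predicates, so the message is printed only on the real parse.
near_acl
    : t='ip' 'access-list' swallow_to_eol
      { System.err.println("line " + $t.line + ": ACL header does not match rule 'acl'"); }
    ;

// Consumes every token except EOL, then the EOL itself
// (assumes every line, including the last, ends with EOL).
swallow_to_eol
    : (~EOL)* EOL
    ;
```

The `(acl) =>` predicate backtracks over the whole ACL block, which costs some speed. When it fails, only the header line is reported. The rule lines that follow are then skipped by `swallow_to_eol`, so the message points at the ACL rather than at the exact line that broke it.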

-- 
Alexander

On Mon, Nov 19, 2012 at 12:28 PM, Ivan Brezina <ibre5041 at ibrezina.net> wrote:
> On 11/19/2012 08:23 PM, Alexander Kostikov wrote:
>> Hi,
>>
>> I'm new to ANTLR and I'm looking for some good advice.
>>
>> Here is my story. I'm parsing Cisco IOS config files. They are quite
>> loosely defined, but I don't actually need the whole config file
>> parsed. I'm interested in just a subset of the config file (the acl
>> rule below) and I don't really care about the other parts of the file
>> right now. That said, in the future I'll need to add other parts as
>> well (e.g. a rule for interface definitions), but again, I don't need
>> the whole config file parsed. I don't want to implement a complete
>> Cisco IOS grammar, since it seems that would be a very hard task
>> indeed.
>>
>> To ignore all the uninteresting parts of the config file, I defined the
>> grammar this way:
>>
>> /*
>>   * Parser Rules
>>   */
>>
>> config: (acl | any)* EOF;
>> any: (ID|INT)* EOL;
>> acl: 'ip' 'access-list' 'extended'? ID EOL (remark | rule)+;
>> remark: (index)? 'remark' (~EOL)* EOL;
>> rule: (index)? verb protocol source source_port destination
>>       destination_port flag? log? EOL;
>>
>> // Not so interesting parser rules here...
>>
>> /*
>>   * Lexer Rules
>>   */
>>
>> fragment
>> CHAR: 'a'..'z' | 'A'..'Z' | '_' | '-' | '.' | '+' | '/' | ':' | '%';
>> fragment
>> NUMBER: '0'..'9';
>> INT: NUMBER+;
>> ID: CHAR (CHAR | NUMBER)*;
>> EOL: ('\r' | '\n')+;
>> WS: (' ' | '\t') { $channel=HIDDEN; };
>> COMMENT: '!' (~('\r' | '\n'))* EOL { $channel=HIDDEN; };
>> ILLEGAL: .;
>>
>> It turns out ANTLR doesn't behave the way I expected =) What I wanted
>> is for ANTLR to parse the line "no ip bootp server" via the 'any'
>> rule, but ANTLR finds the 'ip' token in the line and treats the line
>> as an incorrect 'acl' rule. Specifying a syntactic predicate, "config:
>> (('ip' 'access-list') => acl | any)* EOF", only makes things worse,
>> judging by the ANTLRWorks output: the parser stops almost immediately
>> with an unrecoverable error.
>>
>> My question is: is there a way to achieve the kind of filtering I'm
>> talking about (parse only 'acl', ignore everything else) in an ANTLR
>> grammar? What should I use? A syntactic predicate? Multi-pass parsing?
>> A custom lexer (how do I even start implementing such a beast?)? Or
>> extract all the interesting sections from a file via regex before
>> passing them to an ANTLR grammar that is only ACL-oriented (at least I
>> know how to implement this last option)?
>>
>> -- Alexander
>>
>>
>
> This may not be exactly what you want, but look at the PLSQL grammar.
> For embedded SQL it uses this trick:
>
> SEMI: ';' ;
>
> swallow_to_semi :
>          ~( SEMI )+
>      ;
>
> select: 'SELECT' swallow_to_semi SEMI;
>
> By using this you can "bypass" all the sections you're not interested in.
>
> Ivan
> PS: be warned, negation can make the grammar very complex if you
> use many lexer tokens.
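Adapted to the line-oriented Cisco config, the same trick swallows tokens up to EOL rather than SEMI. This probably also explains the original problem. In a combined ANTLR 3 grammar, each quoted literal used in a parser rule ('ip', 'access-list', 'remark', ...) becomes its own token type, and it wins over ID for the same text. As a result, "no ip bootp server" reaches the parser as ID, 'ip', ID, ID, EOL, and `any: (ID|INT)* EOL` cannot consume the 'ip' token. Negating EOL accepts every token type. Below is a minimal sketch that reuses the token names from Alexander's grammar. `swallow_to_eol` is a hypothetical name, `acl` is written without a trailing EOF, and every line, including the last one, is assumed to end with EOL.

```
config
    : ( ('ip' 'access-list') => acl   // ACL header ahead: parse the ACL in full
      | swallow_to_eol                // any other line: skip it unparsed
      )*
      EOF
    ;

// Line-oriented counterpart of swallow_to_semi: consumes every token except
// EOL, including literal tokens such as 'ip' that (ID|INT)* cannot match.
swallow_to_eol
    : (~EOL)* EOL
    ;

acl
    : 'ip' 'access-list' 'extended'? ID EOL (remark | rule)+
    ;
```

ANTLR may warn that the `(remark | rule)+` loop is ambiguous with `swallow_to_eol`, since both can match a line that starts with an index. Loops are greedy by default, so the parser keeps treating such lines as ACL entries.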
List: http://www.antlr.org/mailman/listinfo/antlr-interest