[antlr-interest] parsing just a subset of a grammar

Alexander Kostikov alex.kostikov at gmail.com
Tue Nov 20 09:45:41 PST 2012


Bernard,

Thanks for the debugging technique!

Was the resolution for '1) ID is built character by character, it
would be better to group them' to move all fragments to the very end
of the grammar?

I can't use ANTLR4 since there is no C# target for it yet (as far as I
know). I'm targeting C#, but for the sake of grammar debuggability I'm
trying out the grammar in ANTLRWorks first.

The problem with the id_or_keyword approach is that there would be too
many keywords to keep track of in the 'any' rule. Plus, the IP token is
not a keyword that always starts the 'acl' rule: 'ip' can be used as a
protocol identifier as well. It looks like I would have to use an
alternation like (IP|ID) in the parser rules, and that doesn't seem right.
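
To make the (IP|ID) idea concrete, here is a rough sketch of how it
might look if the alternation were wrapped in a single parser rule. The
rule names 'name' and 'acl_entry' are made up for illustration, and IP,
ID and INT are assumed to be lexer tokens defined elsewhere, with IP
listed before ID so that a bare 'ip' lexes as IP:

```antlr
// Sketch only: 'name' and 'acl_entry' are hypothetical rule names.
// IP, ID and INT are assumed to be lexer tokens defined elsewhere.
name
    :   ID
    |   IP      // 'ip' used as a protocol identifier, not as the acl keyword
    ;

acl_entry
    :   INT? name+      // every place that wanted ID now refers to 'name'
    ;
```

This would at least keep the alternation in one place, though every
keyword that can double as an identifier would still have to be listed
in 'name'.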

-- 
Alexander

On Tue, Nov 20, 2012 at 7:18 AM, Bernard Kaiflin
<bkaiflin.ruby at gmail.com> wrote:
> Learning every day ... I have rewritten the grammar to use fuzzy parsing in
> v4.
>
> grammar Cisco;
>
> /* Parse Cisco config file using fuzzy parsing. */
>
> config
> @init {System.out.println("Cisco last update 1606");}
>     :   .*? ( acl .*? )+
>     ;
>
> acl :   'ip' 'access-list' 'extended'? ID '\n'? ( remark | rule ) '\n'
>                {System.out.print("--- acl " + $acl.text);}
>     ;
>
> remark
>     :   INT? 'remark' ~'\n'*
>     ;
>
> rule:   INT? ID+ // the + either here or in rule acl after ( remark | rule )
>     ;            // to avoid ambiguity
>
> ID  :   ( LETTER | SPECIAL ) ( LETTER | SPECIAL | NUMBER )* ;
> INT :   NUMBER+ ;
> COMMENT : '!' .*? '\n' -> channel(HIDDEN) ;
> WS  :   [ \t\r\n]+ -> channel(HIDDEN) ;
>
> ILLEGAL : . ; // after all other lexer rules
>
> fragment LETTER  : 'a'..'z' | 'A'..'Z' ;
> fragment SPECIAL : '_' | '-' | '.' | '+' | '/' | ':' | '%' ;
> fragment NUMBER  : '0'..'9' ;
>
> To install ANTLR4 you can start here:
> http://forums.pragprog.com/forums/206/topics/11231
>
> $ echo $CLASSPATH
> .:/usr/local/lib/antlr-4.0b3-complete.jar
> $ antlr4 Cisco.g4
> $ javac Cisco*.java
> $ grun Cisco config -tokens -diagnostics -trace t.config
> [@0,0:1='no',<6>,1:0]
> [@1,2:2=' ',<9>,channel=1,1:2]
> [@2,3:4='ip',<2>,1:3]
> ...
> [@7,18:18='\n',<4>,1:18]
> [@8,19:20='ip',<2>,2:0]
> [@9,21:21=' ',<9>,channel=1,2:2]
> [@10,22:32='access-list',<3>,2:3]
> ...
> [@18,46:45='<EOF>',<-1>,4:8]
> enter   config, LT(1)=no
> Cisco last update 1606
> consume [@0,0:1='no',<6>,1:0] rule config alt=1
> consume [@2,3:4='ip',<2>,1:3] rule config alt=1
> consume [@4,6:10='bootp',<6>,1:6] rule config alt=1
> consume [@6,12:17='server',<6>,1:12] rule config alt=1
> consume [@7,18:18='\n',<4>,1:18] rule config alt=1
> enter   acl, LT(1)=ip
> consume [@8,19:20='ip',<2>,2:0] rule acl alt=1
> consume [@10,22:32='access-list',<3>,2:3] rule acl alt=1
> consume [@12,34:36='xyz',<6>,2:15] rule acl alt=1
> consume [@13,37:37='\n',<4>,2:18] rule acl alt=1
> enter   rule, LT(1)=abc
> consume [@14,38:40='abc',<6>,3:0] rule rule alt=1
> consume [@16,42:44='def',<6>,3:4] rule rule alt=1
> exit    rule, LT(1)=
>
> consume [@17,45:45='\n',<4>,3:7] rule acl alt=1
> --- acl ip access-list xyz
> abc def
> exit    acl, LT(1)=<EOF>
> exit    config, LT(1)=<EOF>
>
>
> 2012/11/20 Terence Parr <parrt at cs.usfca.edu>
>>
>> In the new v4 book and the v4 doc:
>>
>>
>> http://www.antlr.org/wiki/display/ANTLR4/Wildcard+Operator+and+Nongreedy+Subrules
>>
>> i talk about fuzzy parsing.
>>
>> see
>>
>> http://media.pragprog.com/titles/tpantlr2/code/reference/FuzzyJava.g4
>>
>> Terence
>>
>