[antlr-interest] Config file parsing grammar

Terence Parr parrt at cs.usfca.edu
Sat Nov 25 09:55:57 PST 2006


Hi.  Try the -trace or -debug option (the latter requires ANTLRWorks).
Ter
On Nov 25, 2006, at 2:21 AM, James Cook wrote:

> Howdy
>
> I've been banging on a grammar to parse Unix-style config files
> (notably /etc/hosts, /etc/ethers and dhcpd's leases file) but haven't
> had much luck.  I'm sure it's a simple fix but I've been at it for
> almost three days now and have just about reached the throwing-stuff
> stage.  =P  Anyway, here's the lexer bits:
>
> lexer grammar CommonUnixConfig;
>
> // ---------------------------------------------------------------------
> // Base
> // ---------------------------------------------------------------------
>
> WHITESPACE
>        :       (' ' | '\t')+
>        ;
>
> NEWLINE
>        :       ('\r\n' | '\n' | '\r')
>        ;
>
> COLON
>        :       ':'
>        ;
>
> DOT
>        :       '.'
>        ;
>
> STAR
>        :       '*'
>        ;
>
> DASH
>        :       '-'
>        ;
>
> HASH
>        :       '#'
>        ;
>
> SLASH
>        :       '/'
>        ;
>
> DIGIT
>        :       '0'..'9'
>        ;
>
> HEXDIGIT
>        :       DIGIT | 'a'..'f' | 'A'..'F'
>        ;
>
> LETTER
>        :       'a'..'z' | 'A'..'Z'
>        ;
>
> // ----------------------------------------------------------------------------
> // Configuration Cruft
> // ----------------------------------------------------------------------------
>
> COMMENT
>        :       HASH ~NEWLINE*
> //              { $channel=HIDDEN; System.out.println("comment"); }
>                { System.out.println("comment"); skip(); }
>        ;
>
> BLANKLINE
>        :       WHITESPACE? NEWLINE
>                { System.out.println("blankline"); skip(); }
>        ;
>
> // ----------------------------------------------------------------------------
> // Ethernet
> // ----------------------------------------------------------------------------
>
> CLIENTID
>        :       HEXPAIR COLON MACADDRESS
>        ;
>
> MACADDRESS
>        :       HEXPAIR COLON HEXPAIR COLON HEXPAIR COLON HEXPAIR COLON HEXPAIR COLON HEXPAIR
>        ;
>
> fragment
> HEXPAIR
>        :       HEXDIGIT HEXDIGIT
>        ;
>
> // ----------------------------------------------------------------------------
> // Internet Address (DNS and Bare IP)
> // ----------------------------------------------------------------------------
>
> IPADDRESS
>        :       IPV4ADDRESS | IPV6ADDRESS
>        ;
>
> IPV4ADDRESS
>        :       BYTE DOT BYTE DOT BYTE DOT BYTE
>        ;
>
> // RFC 2373 Appendix B is evil
> IPV6ADDRESS
>        :       HEXPART (COLON IPV4ADDRESS)?
>        ;
>
> HOSTNAME
>        :       DNSCHAR+ (DOT DNSCHAR+)* DOT?
>        ;
>
> // RFC 2373 Appendix B says the four parts of an IPv4address can have only
> // one to three digits
> fragment
> BYTE
>        :       DIGIT (DIGIT DIGIT?)?
>        ;
>
> fragment
> HEXPART
>        :       HEXSEQ | HEXSEQ COLON COLON HEXSEQ? | COLON COLON HEXSEQ?
>        ;
>
> fragment
> HEXSEQ
>        :       HEX4 (COLON HEX4)*
>        ;
>
> fragment
> HEX4
>        :       HEXDIGIT (HEXDIGIT (HEXDIGIT HEXDIGIT?)?)?
>        ;
>
> // As defined in RFC 1034
> fragment
> DNSCHAR
>        :       LETTER | DIGIT | DASH
>        ;
>
> ======
>
> Next up is the particular parser I've been focusing on:
>
> parser grammar Hosts;
>
> options {
>        tokenVocab = CommonUnixConfig;
> }
>
> go
>        :       hostline*
>        ;
>
> hostline
>        :       ip=IPADDRESS WHITESPACE hostname=HOSTNAME
>                (WHITESPACE alias=HOSTNAME {System.out.println("alias: " + $alias);})*
>                NEWLINE
>                {
>                        System.out.println("ip addr  : " + $ip);
>                        System.out.println("hostname : " + $hostname);
>                }
>        ;
>
> ======
>
> And then, finally, the test harness:
>
> import org.antlr.runtime.*;
>
> public class hosts
> {
>        public static void main(String args[])
>                throws Throwable
>        {
>                ANTLRFileStream in = new ANTLRFileStream(args[0]);
>                CommonUnixConfigLexer lexer = new CommonUnixConfigLexer(in);
>
>                CommonTokenStream tokens = new CommonTokenStream(lexer);
>
>                HostsParser parser = new HostsParser(tokens);
>                parser.go();
>        }
> }
>
> ======
>
> In the event that there aren't any blank lines or comments, the file
> parses properly.  However, add in a blank line or a comment and
> parsing seems to abort without throwing an exception.  =(  Also, the
> print statements never execute but I suspect I'm using them wrong.
>
> I didn't have any luck finding examples to pattern my efforts after -
> most often newlines and whitespace are ignorable whereas they're
> delimiters here.  Any help would be appreciated.  Thanks!
>
> -- 
> James
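
For comparing against whatever -trace shows, it can help to have an
independent idea of how each input line ought to be classified.  The
sketch below is plain Java with java.util.regex, not ANTLR, and it is
not the grammar above: the class name HostsLineCheck, the classify()
helper and the three patterns are all hypothetical and deliberately
looser than the lexer rules (anything made of hex digits, colons and
dots counts as an address).  It only labels a line as blankline,
comment, hostline or unrecognized, mirroring the rule names used in the
quoted grammar.

```java
import java.util.regex.Pattern;

public class HostsLineCheck {
    // Hypothetical patterns, intentionally looser than the grammar's tokens.
    // Optional spaces/tabs and nothing else.
    private static final Pattern BLANK = Pattern.compile("[ \\t]*");
    // '#' in the first column, then anything up to end of line.
    private static final Pattern COMMENT = Pattern.compile("#.*");
    // Address, whitespace, hostname, then zero or more whitespace-separated aliases.
    private static final Pattern HOSTLINE = Pattern.compile(
        "[0-9A-Fa-f:.]+[ \\t]+[A-Za-z0-9.-]+([ \\t]+[A-Za-z0-9.-]+)*");

    // Expects a single line with its line terminator already removed.
    static String classify(String line) {
        if (BLANK.matcher(line).matches()) return "blankline";
        if (COMMENT.matcher(line).matches()) return "comment";
        if (HOSTLINE.matcher(line).matches()) return "hostline";
        return "unrecognized";
    }

    public static void main(String[] args) {
        String[] sample = {
            "# local hosts",
            "",
            "127.0.0.1 localhost loopback",
            "::1\tlocalhost6"
        };
        for (String line : sample) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

Running the same file through this and through the generated parser
with tracing on should make it easier to see at which line the two stop
agreeing.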


