[antlr-interest] Config file parsing grammar
Terence Parr
parrt at cs.usfca.edu
Sat Nov 25 09:55:57 PST 2006
Hi. Try the -trace or -debug options (the latter requires ANTLRWorks).
Ter
On Nov 25, 2006, at 2:21 AM, James Cook wrote:
> Howdy
>
> I've been banging on a grammar to parse Unix-style config files
> (notably /etc/hosts, /etc/ethers and dhcpd's leases file) but haven't
> had much luck. I'm sure it's a simple fix but I've been at it for
> almost three days now and have just about reached the throwing-stuff
> stage. =P Anyway, here's the lexer bits:
>
> lexer grammar CommonUnixConfig;
>
> // ---------------------------------------------------------------------
> // Base
> // ---------------------------------------------------------------------
>
> WHITESPACE
> : (' ' | '\t')+
> ;
>
> NEWLINE
> : ('\r\n' | '\n' | '\r')
> ;
>
> COLON
> : ':'
> ;
>
> DOT
> : '.'
> ;
>
> STAR
> : '*'
> ;
>
> DASH
> : '-'
> ;
>
> HASH
> : '#'
> ;
>
> SLASH
> : '/'
> ;
>
> DIGIT
> : '0'..'9'
> ;
>
> HEXDIGIT
> : DIGIT | 'a'..'f' | 'A'..'F'
> ;
>
> LETTER
> : 'a'..'z' | 'A'..'Z'
> ;
>
> // ----------------------------------------------------------------------------
> // Configuration Cruft
> // ----------------------------------------------------------------------------
>
> COMMENT
> : HASH ~NEWLINE*
> // { $channel=HIDDEN; System.out.println("comment"); }
> { System.out.println("comment"); skip(); }
> ;
>
> BLANKLINE
> : WHITESPACE? NEWLINE
> { System.out.println("blankline"); skip(); }
> ;
>
> // ----------------------------------------------------------------------------
> // Ethernet
> // ----------------------------------------------------------------------------
>
> CLIENTID
> : HEXPAIR COLON MACADDRESS
> ;
>
> MACADDRESS
> : HEXPAIR COLON HEXPAIR COLON HEXPAIR COLON HEXPAIR
> COLON HEXPAIR COLON HEXPAIR
> ;
>
> fragment
> HEXPAIR
> : HEXDIGIT HEXDIGIT
> ;
>
> // ----------------------------------------------------------------------------
> // Internet Address (DNS and Bare IP)
> // ----------------------------------------------------------------------------
>
> IPADDRESS
> : IPV4ADDRESS | IPV6ADDRESS
> ;
>
> IPV4ADDRESS
> : BYTE DOT BYTE DOT BYTE DOT BYTE
> ;
>
> // RFC 2373 Appendix B is evil
> IPV6ADDRESS
> : HEXPART (COLON IPV4ADDRESS)?
> ;
>
> HOSTNAME
> : DNSCHAR+ (DOT DNSCHAR+)* DOT?
> ;
>
> // RFC 2373 Appendix B says the four parts of an IPv4address can have only
> // one to three digits
> fragment
> BYTE
> : DIGIT (DIGIT DIGIT?)?
> ;
>
> fragment
> HEXPART
> : HEXSEQ | HEXSEQ COLON COLON HEXSEQ? | COLON COLON HEXSEQ?
> ;
>
> fragment
> HEXSEQ
> : HEX4 (COLON HEX4)*
> ;
>
> fragment
> HEX4
> : HEXDIGIT (HEXDIGIT (HEXDIGIT HEXDIGIT?)?)?
> ;
>
> // As defined in RFC 1034
> fragment
> DNSCHAR
> : LETTER | DIGIT | DASH
> ;
>
> ======
>
> Next up is the particular parser I've been focusing on:
>
> parser grammar Hosts;
>
> options {
> tokenVocab = CommonUnixConfig;
> }
>
> go
> : hostline*
> ;
>
> hostline
> : ip=IPADDRESS WHITESPACE hostname=HOSTNAME (WHITESPACE
> alias=HOSTNAME{System.out.println("alias: " + $alias);})* NEWLINE
> {
> System.out.println("ip addr : " + $ip);
> System.out.println("hostname : " + $hostname);
> }
> ;
>
> ======
>
> And then, finally, the test harness:
>
> import org.antlr.runtime.*;
>
> public class hosts
> {
> public static void main(String args[])
> throws Throwable
> {
> ANTLRFileStream in = new ANTLRFileStream(args[0]);
> CommonUnixConfigLexer lexer = new CommonUnixConfigLexer(in);
>
> CommonTokenStream tokens = new CommonTokenStream(lexer);
>
> HostsParser parser = new HostsParser(tokens);
> parser.go();
> }
> }
>
> ======
>
> In the event that there aren't any blank lines or comments, the file
> parses properly. However, add in a blank line or a comment and
> parsing seems to abort without throwing an exception. =( Also, the
> print statements never execute but I suspect I'm using them wrong.
>
> I didn't have any luck finding examples to pattern my efforts after -
> most often newlines and whitespace are ignorable whereas they're
> delimiters here. Any help would be appreciated. Thanks!
>
> --
> James
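
For comparison, here is a minimal plain-Java sketch of the line-oriented behaviour the grammar above is aiming for: newlines end an entry, runs of spaces or tabs separate fields, and comments and blank lines are dropped. It does not use ANTLR and is not a fix for the grammar. The class name HostsLineSketch and its Entry type are made up for illustration, and it assumes that '#' always starts a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of ANTLR or of the grammar above. */
final class HostsLineSketch {

    /** One parsed entry: address, canonical hostname, and zero or more aliases. */
    static final class Entry {
        final String address;
        final String hostname;
        final List<String> aliases;

        Entry(String address, String hostname, List<String> aliases) {
            this.address = address;
            this.hostname = hostname;
            this.aliases = aliases;
        }
    }

    /** Splits the text on newlines, drops comments and blank lines, and parses the rest. */
    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        // Accept the same three line endings as the NEWLINE rule.
        for (String rawLine : text.split("\r\n|\n|\r")) {
            // Assumption: everything from '#' to the end of the line is a comment.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue; // blank line or comment-only line
            }
            // Fields are separated by runs of spaces or tabs, like WHITESPACE.
            String[] fields = line.split("[ \t]+");
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected address and hostname: " + rawLine);
            }
            List<String> aliases = Arrays.asList(fields).subList(2, fields.length);
            entries.add(new Entry(fields[0], fields[1], new ArrayList<>(aliases)));
        }
        return entries;
    }

    public static void main(String[] args) {
        String sample = "# local hosts\n"
                + "\n"
                + "127.0.0.1\tlocalhost loopback\n"
                + "   \n"
                + "192.168.0.10 gateway # router\n";
        for (Entry e : parse(sample)) {
            System.out.println(e.address + " -> " + e.hostname + " " + e.aliases);
        }
    }
}
```

Running something like this over the same input file and comparing its entries with what the parser prints may help narrow down which line the grammar stops on.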