[antlr-interest] Freeform Text Parsing

Tue Dec 21 06:25:57 PST 2010

I plan to use ANTLR to parse generated (and hence predictable) English 
sentences.  To verify that I know what I'm doing, I wanted to create a grammar 
that parses a simple address book into an entry tree.

The grammar below is wrong, but it expresses my intent.  In a parser rule, 
(~(NEWLINE))+ doesn't grab arbitrary non-newline text; it matches one or more 
tokens other than NEWLINE, and only tokens the lexer already knows about, 
which isn't what I want.  What's the right way to do this?  Or is ANTLR 
unsuitable for grammars that can't identify string literals at the lexing 
stage?

Sample input:

John Rossi
Home
555-7293

Michael Raster
Work
555-8374
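
To make the intended result concrete, here is a minimal sketch in plain Java 
(not ANTLR) of the entry tree I'm after for the sample input above.  The names 
AddressBookSketch, Entry, parse and toTree are made up for illustration, and 
the sketch assumes every entry is exactly three non-blank lines (name, contact 
type, phone) with blank lines only between entries.

```java
import java.util.ArrayList;
import java.util.List;

public class AddressBookSketch {

    // Hypothetical record mirroring ^(ENTRY ^(NAME ..) ^(CONTACTTYPE ..) ^(PHONE ..)).
    static final class Entry {
        final String name;
        final String contactType; // "HOME" or "WORK", like the imaginary tokens
        final String phone;

        Entry(String name, String contactType, String phone) {
            this.name = name;
            this.contactType = contactType;
            this.phone = phone;
        }

        // Renders the entry in LISP-like tree notation.
        String toTree() {
            return "(ENTRY (NAME " + name + ") (CONTACTTYPE " + contactType
                    + ") (PHONE " + phone + "))";
        }
    }

    // Assumption: three non-blank lines per entry; blank lines are separators.
    static List<Entry> parse(String text) {
        List<String> lines = new ArrayList<String>();
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        if (lines.size() % 3 != 0) {
            throw new IllegalArgumentException("expected 3 lines per entry");
        }

        List<Entry> entries = new ArrayList<Entry>();
        for (int i = 0; i < lines.size(); i += 3) {
            String rawType = lines.get(i + 1);
            String type;
            if (rawType.equals("Home")) {
                type = "HOME";
            } else if (rawType.equals("Work")) {
                type = "WORK";
            } else {
                throw new IllegalArgumentException("unknown contact type: " + rawType);
            }
            // Name and phone are taken as arbitrary non-newline text.
            entries.add(new Entry(lines.get(i), type, lines.get(i + 2)));
        }
        return entries;
    }

    public static void main(String[] args) {
        String input = "John Rossi\nHome\n555-7293\n\n"
                     + "Michael Raster\nWork\n555-8374\n";
        for (Entry e : parse(input)) {
            System.out.println(e.toTree());
        }
    }
}
```

The point is only that name and phone are "whatever is on the line", while 
the contact type is one of two fixed words; that is the behaviour I'd like the 
ANTLR grammar to reproduce.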

Grammar:

grammar AddressBook;
options {
output=AST;
ASTLabelType=CommonTree;
}

tokens {
ENTRIES;
ENTRY;
NAME;
CONTACTTYPE;
PHONE;
HOME;
WORK;
}

@header {
package org.roxton;
}

@lexer::header {
package org.roxton;
}

addressbook
    : (entry (NEWLINE)?)+ -> ^(ENTRIES entry+)
    ;

entry
    : name NEWLINE contactType NEWLINE phone NEWLINE
      -> ^(ENTRY ^(NAME name) ^(CONTACTTYPE contactType) ^(PHONE phone))
    ;

name
    : (~(NEWLINE))+
    ;

contactType
    : ( 'Home' -> HOME
      | 'Work' -> WORK
      )
    ;

phone
    : (~(NEWLINE))+
    ;

NEWLINE:'\r'? '\n' ;