[antlr-interest] Freeform Text Parsing
John Rossi
kjhedran at yahoo.com
Tue Dec 21 07:24:05 PST 2010
> You need to include the following rule at the end of your lexer. Without
> wildcards, all allowed characters must appear in explicit rules.
>
> ANY_CHAR : . ;
Thanks, Sam. With that rule added, this input:

John Rossi
Home
555-7293

Homer Raster
Work
555-8374

now yields:
(ENTRIES
(ENTRY (NAME J o h n R o s s i) (CONTACTTYPE HOME) (PHONE 5 5 5 - 7 2 9 3))
(ENTRY (NAME Home r R a s t e r) (CONTACTTYPE WORK) (PHONE 5 5 5 - 8 3 7 4))
)
Two things:
1) If I were to write an application to consume this tree, I wouldn't want each
character to be in its own child node. Is there a reasonable way to package
them up? Something in the grammar, or a second pass?
2) The "Home" in "Homer Raster" comes out as its own token inside the name,
which makes me feel like I did something wrong. Is that squick justified?
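
For (1), the second-pass option might look something like the sketch below. It is only an illustration: `Node` is a hypothetical stand-in for CommonTree so the example runs without the ANTLR jar, and `collapse`, `joinChildren` and `chars` are names made up here. Against the real runtime I would expect to do the same walk with `getChildCount()`, `getChild(i)` and `getText()`.

```java
import java.util.ArrayList;
import java.util.List;

final class CollapseChars {
    // Hypothetical stand-in for CommonTree: a label plus ordered children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Concatenate the text of every direct child, in order.
    static String joinChildren(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child : parent.children) {
            sb.append(child.text);
        }
        return sb.toString();
    }

    // Replace the per-character children of NAME and PHONE nodes with a
    // single child holding the joined text; recurse into everything else.
    static void collapse(Node node) {
        if (node.text.equals("NAME") || node.text.equals("PHONE")) {
            String joined = joinChildren(node);
            node.children.clear();
            node.children.add(new Node(joined));
            return;
        }
        for (Node child : node.children) {
            collapse(child);
        }
    }

    // Test helper: build a node with one child per character, which is
    // the shape the grammar currently produces.
    static Node chars(String label, String s) {
        Node n = new Node(label);
        for (char ch : s.toCharArray()) {
            n.add(new Node(String.valueOf(ch)));
        }
        return n;
    }

    public static void main(String[] args) {
        Node entry = new Node("ENTRY")
            .add(chars("NAME", "John Rossi"))
            .add(new Node("CONTACTTYPE").add(new Node("HOME")))
            .add(chars("PHONE", "555-7293"));
        collapse(entry);
        System.out.println(entry.children.get(0).children.get(0).text);
        System.out.println(entry.children.get(2).children.get(0).text);
    }
}
```

Because the join uses each child's text, a multi-character child such as the "Home" token inside "Homer" would simply contribute its four characters to the result.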
Thoughts? Thanks for your help,
-John
-------
grammar AddressBook;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    ENTRIES;
    ENTRY;
    NAME;
    CONTACTTYPE;
    PHONE;
    HOME;
    WORK;
}

addressbook
    : (entry (NEWLINE NEWLINE)?)+ -> ^(ENTRIES entry+) ;

entry
    : name NEWLINE contactType NEWLINE phone
      -> ^(ENTRY ^(NAME name) ^(CONTACTTYPE contactType) ^(PHONE phone)) ;

name : (~(NEWLINE))+ ;

contactType
    : ('Home' -> ^(HOME) | 'Work' -> ^(WORK)) ;

phone : (~(NEWLINE))+ ;

NEWLINE : '\r'? '\n' ;
ANY_CHAR : . ;
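
On (2), my guess at what is happening, as a toy model rather than anything taken from the generated code: the quoted 'Home' and 'Work' in contactType seem to become lexer tokens of their own, and the lexer runs without knowing which parser rule is active. So it matches "Home" wherever those four characters appear and falls back to ANY_CHAR for the rest. `ToyLexer` and `tokenize` below are hypothetical names for this sketch and are not part of ANTLR.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyLexer {
    // Assumption: literals quoted in parser rules get their own token types.
    static final String[] LITERALS = { "Home", "Work" };

    // Split the input into token texts: a literal if one starts at the
    // current position, else a newline, else a single character.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String matched = null;
            for (String lit : LITERALS) {
                if (input.startsWith(lit, i)) {
                    matched = lit;
                    break;
                }
            }
            if (matched == null) {
                if (input.startsWith("\r\n", i)) {
                    matched = "\r\n";   // NEWLINE : '\r'? '\n'
                } else {
                    matched = String.valueOf(input.charAt(i)); // ANY_CHAR (or a bare '\n')
                }
            }
            tokens.add(matched);
            i += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The first two tokens come out as "Home" and "r".
        System.out.println(tokenize("Homer Raster"));
    }
}
```

If this model is right, the split is a consequence of how the literals are tokenized rather than a mistake in the tree rewrite, and it would show up for any name that contains "Home" or "Work".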