[antlr-interest] VCard parser

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Wed Aug 19 03:16:48 PDT 2009


Andreas Volz wrote:
> Hello,
> 
> I'm just starting with ANTLR and am trying to build a VCard parser. This is
> the data I'd like to parse:
> 
> BEGIN:VCARD
> VERSION:3.0
> N:Mustermann;Max
> FN:Max Mustermann
> ORG:Wikipedia
> URL:http://de.wikipedia.org/
> EMAIL;TYPE=INTERNET:max.mustermann@example.org
> TEL;TYPE=voice,work,pref:+49 1234 56788
> ADR;TYPE=intl,work,postal,parcel:;;Musterstraße 1;Musterstadt;;12345;Germany
> END:VCARD
> 
> My very first try is this:
> 
> grammar VCard;
> 
> BEGIN	:	'BEGIN:VCARD';
> 
> END	:	'END:VCARD';
> 
> VCARD	:	BEGIN LINE+ END;
> 
> LINE	: 	KEY DP VALUE;
> 
> KEY	:	~':';
> 
> DP 	:	':';
> 
> VALUE	:	('a'..'z' |'A'..'Z' )+;
> 
> 
> But this is only a first step. I couldn't get a more complex rule to work.
> 
> A problem that I often get is the message:
> 
> "The following token definitions can never be matched because prior tokens
>  match the same input"
> 
> I get it even if I don't use that token anywhere. Could anyone explain to me
> why this very simple grammar doesn't work:
> 
> grammar test;
> 
> KEY	:	('a'..'z' |'A'..'Z' )+;
> 
> VALUE	:	('a'..'z' |'A'..'Z' )+;
> 
> LINE	:	KEY '=' VALUE;
> 
> => "The following token definitions can never be matched because prior tokens 
> match the same input: VALUE"
> 
> How would I do this very simple key/value example?

You're mixing up lexing and parsing. KEY and VALUE match exactly the same 
input, so the lexer always picks KEY, and VALUE can never be matched. If 
you really want LINE as a lexer rule, KEY and VALUE have to be fragment 
rules. It makes more sense to have a single rule, say SYMBOL, and a rule 
EQ, as follows:

SYMBOL	:	('a'..'z'|'A'..'Z')+;

EQ	:	'=';

And then a *parser* rule for a line:

line	:	SYMBOL EQ SYMBOL ;

This ignores any other problems you'll have with this parsing strategy, 
and particularly any problems using it for VCARD.
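
Assembled into one combined grammar, the pieces above might look like the 
sketch below. It assumes ANTLR 3 syntax and the Java target; the grammar 
name KeyValue, the file rule, and the NEWLINE and WS rules are additions 
for illustration and are not part of the original example.

```antlr
grammar KeyValue;

// Parser rules (lower-case names) work on the tokens the lexer produces.
file	:	line+ EOF ;

line	:	SYMBOL EQ SYMBOL NEWLINE ;

// Lexer rules (upper-case names): each one describes a distinct token.
SYMBOL	:	('a'..'z'|'A'..'Z')+ ;

EQ	:	'=' ;

NEWLINE	:	'\r'? '\n' ;

// Assumption: blanks between tokens are not significant, so hide them.
WS	:	(' '|'\t')+ { $channel = HIDDEN; } ;
```

Key and value are the same kind of token here; it is their position in 
the line rule that tells them apart. Note that this sketch expects every 
line, including the last one, to end with a newline.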

The general point is that lexer rules shouldn't refer to other lexer 
rules unless those rules are fragments (which never produce tokens of 
their own).
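
For comparison, here is a rough sketch of the all-lexer variant with 
fragment rules, again assuming ANTLR 3. The grammar name FragmentTest and 
the parser rule line are illustrative only, added so that the combined 
grammar has a parser rule to start from.

```antlr
grammar FragmentTest;

// The parser only ever sees whole LINE tokens.
line	:	LINE ;

// LINE is the only rule here that produces a token.
LINE	:	KEY '=' VALUE ;

// Fragments are helpers used inside other lexer rules. They are never
// emitted as tokens, so two identical definitions no longer conflict.
fragment
KEY	:	('a'..'z'|'A'..'Z')+ ;

fragment
VALUE	:	('a'..'z'|'A'..'Z')+ ;
```

The drawback is that the parser receives a single LINE token containing 
the whole text, so the key and the value would have to be split apart 
again by hand. That is why the SYMBOL/EQ approach is usually the better 
fit.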

-- 
Sam Barnett-Cormack

