[antlr-interest] How to handle multicharacter delimiters?

Andrew Lentvorski bsder at allcaps.org
Wed Jul 25 17:17:52 PDT 2007


Okay, I've got gated predicates running:

Given the following input and grammar:

$date
Fri Jan 26 11:28:51 2007
$end


grammar vcdfile2;

@lexer::members {
     boolean flgFreeText = false;
}

vcd	:	(declaration_command WS*)+ EOF;

declaration_command:	DATEK FREETEXT ENDK;

DATEK	:	'$date' {flgFreeText = true;};	
ENDK	:	'$end' {flgFreeText = false;};

FREETEXT:	{flgFreeText}?=> (~'$')+;

WS	:	(' '|'\n'|'\r'|'\t') ;
ANY	:	.;

I get a stream of tokens:
(DATEK="$date", FREETEXT="Fri Jan 26 11:28:51 2007", ENDK)
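To make the gating behaviour concrete, here is a minimal hand-written Java emulation of the vcdfile2 lexer rules. It is a sketch, not ANTLR-generated code. It assumes plain longest-match resolution, which simplifies ANTLR 3's DFA-based prediction. Under that assumption FREETEXT also swallows the surrounding newlines. The class name GatedLexerSketch is hypothetical. Run on the second input below, it reproduces the stray ANY token that makes the parser rule fail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written emulation of the vcdfile2 lexer rules (not ANTLR output).
// Assumes simple longest-match behaviour for FREETEXT.
class GatedLexerSketch {
    // Mirrors the lexer member: free text is only allowed between $date and $end.
    private boolean flgFreeText = false;
    private final String input;
    private int pos = 0;

    GatedLexerSketch(String input) { this.input = input; }

    // Returns the token types produced for the whole input.
    List<String> tokenTypes() {
        List<String> types = new ArrayList<>();
        while (pos < input.length()) {
            if (input.startsWith("$date", pos)) {
                pos += 5;
                flgFreeText = true;          // DATEK action
                types.add("DATEK");
            } else if (input.startsWith("$end", pos)) {
                pos += 4;
                flgFreeText = false;         // ENDK action
                types.add("ENDK");
            } else if (flgFreeText && input.charAt(pos) != '$') {
                // FREETEXT: {flgFreeText}?=> (~'$')+ ; take the longest run of non-'$'
                while (pos < input.length() && input.charAt(pos) != '$') pos++;
                types.add("FREETEXT");
            } else if (" \n\r\t".indexOf(input.charAt(pos)) >= 0) {
                pos++;                       // WS: one whitespace character
                types.add("WS");
            } else {
                pos++;                       // ANY: a single leftover character
                types.add("ANY");
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // prints [DATEK, FREETEXT, ENDK]
        System.out.println(new GatedLexerSketch("$date\nFri Jan 26 11:28:51 2007\n$end").tokenTypes());
        // prints [DATEK, FREETEXT, ANY, FREETEXT, ENDK]
        System.out.println(new GatedLexerSketch("$date\n$ Fri Jan 26 11:28:51 2007\n$end").tokenTypes());
    }
}
```

The ANY in the second list is the token that `declaration_command: DATEK FREETEXT ENDK;` cannot accept.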


Exactly what I want.  However, if I change the input to:

$date
$ Fri Jan 26 11:28:51 2007
$end

It bombs because the '$' ends the FREETEXT token but doesn't start a new
token (strictly speaking, it matches ANY, which then fails in the
grammar).  Fine.  What I really want is "match any character unless it's
part of the phrase '$end', in that order", à la:

FREETEXT:	{flgFreeText}?=> (~'$end')+;

That, of course, doesn't work, since ~ only complements a single
character or character set, not a multi-character string.  Even with
regexes, it would probably take one of the more advanced "lookaround"
operators to pull it off.
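For comparison, here is what the lookaround version looks like as a plain regex, using Java's java.util.regex negative lookahead. A `(?!\$end)` check before each consumed character rejects only a '$' that begins "$end". This is a sketch of the regex idea only. It is not an ANTLR lexer rule, and carrying the same lookahead into the grammar itself is not shown here. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: "any character unless it starts the phrase $end",
// expressed with a negative lookahead inside a repeated group.
class EndDelimitedText {
    // (?s) lets '.' match newlines; (?!\$end) refuses to consume a '$'
    // that begins the literal "$end", so stray '$' characters are kept.
    private static final Pattern DATE_BLOCK =
        Pattern.compile("(?s)\\$date((?:(?!\\$end).)*)\\$end");

    // Returns the raw text between $date and the first following $end, or null.
    static String freeText(String input) {
        Matcher m = DATE_BLOCK.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String body = freeText("$date\n$ Fri Jan 26 11:28:51 2007\n$end");
        System.out.println(body.trim()); // prints $ Fri Jan 26 11:28:51 2007
    }
}
```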

About the only solution I can see is a combination of lexer and grammar 
rules.

grammar vcdfile2;

@lexer::members {
     boolean flgFreeText = false;
}

vcd	:	(declaration_command WS*)+ EOF;

declaration_command:	DATEK (FREETEXT | FREEDOL)* ENDK;

DATEK	:	'$date' {flgFreeText = true;};	
ENDK	:	'$end' {flgFreeText = false;};

FREETEXT:	{flgFreeText}?=> (~'$')+;
FREEDOL:	{flgFreeText}?=> '$';

WS	:	(' '|'\n'|'\r'|'\t') ;
ANY	:	.;

The FREEDOL rule gobbles up any stray '$' characters that are not part of
'$end' (ENDK is longer than '$', so it wins when the input really is
'$end').  Then the lexer goes right back to FREETEXT.

Is this the only way to do this?  I would still rather have the token
stream (DATEK, FREETEXT, ENDK), but I can probably live with a
sequence like:

(DATEK, FREETEXT, FREEDOL, FREEDOL, FREETEXT, ENDK)
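If a single FREETEXT is what downstream code wants, one option is to collapse each FREETEXT/FREEDOL run after lexing. The sketch below works on a plain list. Its Tok record is a hypothetical stand-in, not ANTLR's Token interface. Where such merging would live in a real ANTLR setup, for example in a parser action, is an assumption left open here. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-lexing pass that turns DATEK (FREETEXT|FREEDOL)* ENDK
// back into DATEK FREETEXT ENDK by concatenating the free-text pieces.
class MergeFreeText {
    // Stand-in token type for this sketch only.
    record Tok(String type, String text) {}

    // Collapses every run of FREETEXT/FREEDOL tokens into one FREETEXT token.
    static List<Tok> merge(List<Tok> tokens) {
        List<Tok> out = new ArrayList<>();
        StringBuilder run = null;  // non-null while inside a FREETEXT/FREEDOL run
        for (Tok t : tokens) {
            boolean free = t.type().equals("FREETEXT") || t.type().equals("FREEDOL");
            if (free) {
                if (run == null) run = new StringBuilder();
                run.append(t.text());
            } else {
                if (run != null) {           // close the pending run first
                    out.add(new Tok("FREETEXT", run.toString()));
                    run = null;
                }
                out.add(t);
            }
        }
        if (run != null) out.add(new Tok("FREETEXT", run.toString()));
        return out;
    }

    public static void main(String[] args) {
        // Tokens as the FREEDOL grammar would produce them for "$date\n$ Fri ...\n$end".
        List<Tok> in = List.of(
            new Tok("DATEK", "$date"), new Tok("FREETEXT", "\n"),
            new Tok("FREEDOL", "$"), new Tok("FREETEXT", " Fri Jan 26 11:28:51 2007\n"),
            new Tok("ENDK", "$end"));
        System.out.println(merge(in).stream().map(Tok::type).toList()); // prints [DATEK, FREETEXT, ENDK]
    }
}
```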

However, if there is a better way, I would love to hear it.

Thanks,
-a

