[antlr-interest] How to handle multicharacter delimiters?
Andrew Lentvorski
bsder at allcaps.org
Wed Jul 25 17:17:52 PDT 2007
Okay, I've got gated semantic predicates running.
Given the following input and grammar:
$date
Fri Jan 26 11:28:51 2007
$end
grammar vcdfile2;
@lexer::members {
boolean flgFreeText = false;
}
vcd : (declaration_command WS*)+ EOF;
declaration_command: DATEK FREETEXT ENDK;
DATEK : '$date' {flgFreeText = true;};
ENDK : '$end' {flgFreeText = false;};
FREETEXT: {flgFreeText}?=> (~'$')+;
WS : (' '|'\n'|'\r'|'\t') ;
ANY : .;
I get a stream of tokens:
(DATEK="$date", FREETEXT="Fri Jan 26 11:28:51 2007", ENDK)
Exactly what I want. However, if I change the input to:
$date
$ Fri Jan 26 11:28:51 2007
$end
It bombs, since the '$' stops FREETEXT from matching but doesn't start a
new token that the grammar accepts (strictly speaking, it matches ANY,
which then fails in the grammar). Fine. What I really want is "match any
character unless it's part of the phrase '$end', in that order", a la:
FREETEXT: {flgFreeText}?=> (~'$end')+;
That, of course, doesn't work, since '~' only negates a single character
or character set, not a multicharacter string. Even with regexes, it would
probably take one of the more advanced "lookaround" operators to pull it off.
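
To show what I mean by "lookaround", here is a minimal sketch in plain
Java using java.util.regex, not ANTLR. The class and method names
(FreeTextRegex, freeText, DATE_BLOCK) are made up for illustration. The
negative lookahead (?!\$end) is checked before every character, so a
stray '$' is accepted while the literal "$end" stops the run:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only to illustrate the lookahead idea.
public class FreeTextRegex {
    // "$date", then one or more characters none of which begins the
    // literal "$end", then "$end". DOTALL lets '.' match newlines too.
    private static final Pattern DATE_BLOCK =
        Pattern.compile("\\$date((?:(?!\\$end).)+)\\$end", Pattern.DOTALL);

    /** Returns the text between $date and the first $end, or null if none. */
    public static String freeText(String input) {
        Matcher m = DATE_BLOCK.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "$date\n$ Fri Jan 26 11:28:51 2007\n$end\n";
        // The captured text keeps the stray '$' and the surrounding newlines.
        System.out.println("[" + freeText(input) + "]");
    }
}
```

This only demonstrates the matching behaviour I am after; it says nothing
about how to express the same thing inside an ANTLR lexer rule.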
About the only solution I can see is a combination of lexer and grammar
rules:
grammar vcdfile2;
@lexer::members {
boolean flgFreeText = false;
}
vcd : (declaration_command WS*)+ EOF;
declaration_command: DATEK (FREETEXT | FREEDOL)* ENDK;
DATEK : '$date' {flgFreeText = true;};
ENDK : '$end' {flgFreeText = false;};
FREETEXT: {flgFreeText}?=> (~'$')+;
FREEDOL: {flgFreeText}?=> '$';
WS : (' '|'\n'|'\r'|'\t') ;
ANY : .;
The FREEDOL rule gobbles up any stray '$' characters that are not part of
'$end' (which is longer than '$', so it matches first). Then the lexer goes
right back to FREETEXT.
Is this the only way to do this? I would still rather have the token
stream (DATEK, FREETEXT, ENDK), but I can probably live with a
sequence like:
(DATEK, FREETEXT, FREEDOL, FREEDOL, FREETEXT, ENDK)
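
If I do live with that sequence, the pieces could be glued back together
after lexing. Below is a rough sketch in plain Java. Tok is a hypothetical
stand-in for a real token (just a type name plus text), not the ANTLR
Token class, and MergeFreeText and merge are names I invented for this
example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step, not part of the ANTLR runtime.
public class MergeFreeText {
    /** Stand-in for a lexer token: a type name plus the matched text. */
    public static final class Tok {
        public final String type;
        public final String text;
        public Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Collapses each run of FREETEXT/FREEDOL tokens into one FREETEXT. */
    public static List<Tok> merge(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        StringBuilder run = null; // text of the run being collected, if any
        for (Tok t : in) {
            boolean free = t.type.equals("FREETEXT") || t.type.equals("FREEDOL");
            if (free) {
                if (run == null) run = new StringBuilder();
                run.append(t.text);
            } else {
                if (run != null) { // a run just ended: emit it first
                    out.add(new Tok("FREETEXT", run.toString()));
                    run = null;
                }
                out.add(t);
            }
        }
        if (run != null) out.add(new Tok("FREETEXT", run.toString()));
        return out;
    }
}
```

Feeding it (DATEK, FREETEXT, FREEDOL, FREETEXT, ENDK) should give back
(DATEK, FREETEXT, ENDK), with the '$' folded into the FREETEXT text.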
However, if there is a better way, I would love to hear it.
Thanks,
-a