[antlr-interest] Overlapping token definitions
Jan van Mansum
janvanmansum at gmail.com
Fri Jul 18 02:34:44 PDT 2008
Hello group,
I am parsing the structure file export of a DataPerfect database. Strings
are exported between tildes, like this:
~This is a string~
There are also strings that specify the format of a database field, with
some special codes, like this:
~A25~ (alphanumeric field, 25 chars long)
I have created one token rule for the general string and one for the
format string:
DP_STRING
  : '~' ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'-')* '~'
  ;
FORMAT
  : '~' (ALPHANUM|NUM) '~'
  ;
fragment
ALPHANUM
  : ('A'|'U') NUMBER ('A' NUMBER)
  ;
fragment
NUM
  : ('G'|'H'|'N') ('9'|'Z'|'*'|'-'|'+'|'.'|','|'('|'$'|'F')+
  ;
However, the lexer mostly emits a DP_STRING token when it encounters a
format string. The problem is that the format strings are a subset of the
general strings.
My question: is there a way to have the lexer match the format strings
correctly and emit DP_STRING only in the remaining cases (i.e. when the
string is not compatible with the FORMAT token definition)?
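The intended behaviour can be sketched outside ANTLR as follows. This is only an illustration in Python, not a lexer solution: it tries the more specific format pattern first and falls back to the general string. It rests on two assumptions that the grammar above does not confirm: NUMBER means one or more digits, and the trailing ('A' NUMBER) group is treated as optional so that ~A25~ qualifies as a format string. The function name classify is hypothetical.

```python
import re

# Assumption: NUMBER (not defined in the grammar above) is one or more digits.
# Assumption: the trailing ('A' NUMBER) group is optional, so ~A25~ qualifies.
ALPHANUM = r"[AU][0-9]+(?:A[0-9]+)?"
NUM = r"[GHN][9Z*\-+.,($F]+"

FORMAT = re.compile(rf"~(?:{ALPHANUM}|{NUM})~")
DP_STRING = re.compile(r"~[a-zA-Z0-9 \-]*~")


def classify(text):
    """Return the token name for a whole tilde-delimited string, or None."""
    # Try the more specific pattern first, then fall back to the general one.
    if FORMAT.fullmatch(text):
        return "FORMAT"
    if DP_STRING.fullmatch(text):
        return "DP_STRING"
    return None


if __name__ == "__main__":
    for sample in ("~A25~", "~This is a string~", "~N999.99~"):
        print(sample, classify(sample))
```

Note that ~A25~ satisfies both patterns, which is exactly the overlap described above; only the order of the two checks decides which token name it gets.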
Thanks for any help, best regards,
--
Jan van Mansum