[antlr-interest] Overlapping token definitions

Fri Jul 18 02:34:44 PDT 2008

Hello group,

I am parsing the structure file export of a DataPerfect database. Strings
are exported between tildes, like this:

~This is a string~

There are also strings that specify the format of a database field, using
some special codes, like this:

~A25~ (alphanumeric field, 25 chars long)

I created one token rule for the general string and one for the format
string:

DP_STRING
    :    '~' ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'-')* '~'
    ;

FORMAT
    :    '~' (ALPHANUM|NUM)'~'
    ;

fragment
ALPHANUM
    :    ('A'|'U') NUMBER ('A' NUMBER)
    ;

fragment
NUM
    :    ('G'|'H'|'N') ('9'|'Z'|'*'|'-'|'+'|'.'|','|'('|'$'|'F')+
    ;

However, the lexer mostly emits a DP_STRING token when it encounters a
format string. The problem is that the format strings are a subset of the
general strings.

My question: is there a way to have the lexer match the format strings
correctly and emit DP_STRING only in the remaining cases (i.e. when the
string is not compatible with the FORMAT token definition)?
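
To make the question concrete, here is a rough, untested sketch of one
direction that might work, assuming ANTLR 3 with the Java target: fold both
cases into a single rule, try the format alternative first behind a
syntactic predicate, and change the token type when it matches. Several
things in it are assumptions rather than part of the grammar above: the
NUMBER fragment (taken to be one or more digits), making the second
'A' NUMBER group optional so that ~A25~ can match, and declaring FORMAT in
a tokens section instead of as a rule.

```
// Sketch only; assumes ANTLR 3 (Java target) and has not been run.

// Near the top of the grammar: declare FORMAT as a token type
// that has no lexer rule of its own.
tokens { FORMAT; }

// One rule for every tilde-delimited string. The syntactic predicate
// speculatively matches a complete format string; if that succeeds,
// the first alternative is taken and the token type becomes FORMAT.
// Otherwise the lexer falls through to the general string alternative.
DP_STRING
    :    ( '~' (ALPHANUM | NUM) '~' ) => '~' (ALPHANUM | NUM) '~'
         { $type = FORMAT; }
    |    '~' ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'-')* '~'
    ;

fragment
ALPHANUM
    :    ('A'|'U') NUMBER ('A' NUMBER)?   // trailing group optional (assumption)
    ;

fragment
NUM
    :    ('G'|'H'|'N') ('9'|'Z'|'*'|'-'|'+'|'.'|','|'('|'$'|'F')+
    ;

fragment
NUMBER
    :    ('0'..'9')+                      // assumed definition
    ;
```

If this works as intended, ~A25~ should come out as FORMAT and
~This is a string~ as DP_STRING, but I have not been able to confirm that.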

Thanks for any help, best regards,

-- 
Jan van Mansum