[antlr-interest] Match any unicode character
Basil Shkara
bshkara at gmail.com
Sun Nov 25 01:07:23 PST 2007
Hi there,
I've hit a dead end on what seems like a simple problem, and I'm
hoping someone out there has come across it before.
I have token definitions like so:
WORD : ~('\r' | '\n' | WS)+;
WS : ' ' | '\t' | '\r' | '\n';
And I would like to be able to have a rule like this:
matchthis: '[' (WORD | WS)+;
Essentially, I would like to match a '[' followed by one or more
Unicode characters, including whitespace.
If I change the definition of WORD to be:
WORD : ~('\r' | '\n' | WS | '[')+;
Then my parser is able to match the rule above. However, I would like
to be able to use this WORD token elsewhere in my parser grammar to
match other things like:
nowmatchthis: '!' (WORD | WS)+;
That would mean creating a second WORD rule that excludes the '!'
literal. However, ANTLR complains when both of these token definitions
exist, because it makes other tokens I have defined
'unreachable'.
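To make the conflict concrete, the two definitions described above would look roughly like the sketch below. The names WORD_A and WORD_B are placeholders introduced only for illustration, and whitespace is written out as literal characters to keep the fragment self-contained.

```antlr
// Sketch only: WORD_A and WORD_B are placeholder names, not rules from the
// original grammar.

// Stops at '[' so that '[' can be matched as its own token.
WORD_A : ~('\r' | '\n' | ' ' | '\t' | '[')+ ;

// Stops at '!' so that '!' can be matched as its own token.
WORD_B : ~('\r' | '\n' | ' ' | '\t' | '!')+ ;

// Both rules match plain text such as "abc". The lexer runs without knowing
// whether the parser is inside matchthis or nowmatchthis, so it cannot
// choose between the two rules based on parser context.
```

Since the two rules accept almost exactly the same input, the overlap between them is the likely source of the 'unreachable' complaint.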
So my question is: how should I approach something like this? I would
just like to match any Unicode character after certain key characters.
Appreciate any help on the matter.
Thanks!