[antlr-interest] Match any unicode character

Sun Nov 25 01:07:23 PST 2007

Hi there,

I've hit a dead end on what seems like a simple problem, and I'm
hoping someone out there has come across it before.

I have token definitions like so:
WORD    :   ~('\r' | '\n' | WS)+;
WS      :   ' ' | '\t' | '\r' | '\n';

And I would like to be able to have a rule like this:
matchthis:	'[' (WORD | WS)+;

Essentially, I would like to match a '[' followed by one or more
Unicode characters, including any whitespace among them.

If I change the definition of WORD to be:
WORD		:	~('\r' | '\n' | WS | '[')+;

Then my parser is able to match the rule above. However, I would also
like to use this WORD token elsewhere in my parser grammar to match
other things, like:

nowmatchthis:	'!' (WORD | WS)+;

This would mean creating another WORD rule that excludes the '!'
literal. However, ANTLR doesn't accept two of these token definitions
side by side, because it means that other tokens I have defined are
'unreachable'.
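
To make the conflict concrete, the duplicated rules would look roughly
like the sketch below. The names WORD_NO_BRACKET and WORD_NO_BANG are
made up for illustration, and I've written out the whitespace
characters instead of referencing WS:

```antlr
// Sketch only: rule names are hypothetical, not from my real grammar.
// Each rule matches one or more non-whitespace characters, minus one delimiter.
WORD_NO_BRACKET : ~(' ' | '\t' | '\r' | '\n' | '[')+ ;
WORD_NO_BANG    : ~(' ' | '\t' | '\r' | '\n' | '!')+ ;
```

Both rules match the same text for any input containing neither '[' nor
'!' (for example "hello"), and as far as I can tell the lexer picks a
token without knowing which parser rule it is in, so it has no way to
choose between them.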

So my question is: how should I approach something like this? I would
just like to match any Unicode character after certain key characters.

Appreciate any help on the matter.

Thanks!