[antlr-interest] Overloaded Lexemes!

Wed Apr 28 07:22:52 PDT 2004

All,

I am new to the newsgroup and so apologise for any early 
transgressions...

I am attempting to parse a computer language that contains comments 
whose body may contain any characters. They are of the form:

COMMENT TEXT(jasdfjalk;fjkl;%$£$%lldf'slf)
COMMENT TEXT(jas...dfjalk;fjkl;%$£$%lldf'slfsd][}{}*&fdsadsvdf#'''""")
...

I am having trouble.  The following lexer rule works fine except 
when the comment body contains characters such as '.', '&' or ':', 
in which case the prediction fails.

COMMENT	: TEXT! LPAREN! (~('\r'|'\n'))* RPAREN! '\r' '\n'
	{ newline(); }
	;
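
For reference, the intended match can be sketched outside ANTLR. This 
is only an illustration, not an ANTLR solution, and it rests on 
assumptions that are not stated above: each comment sits on one line, 
begins with the literal words COMMENT TEXT, and its body runs from the 
first '(' to the last ')' on that line. The names COMMENT_RE and 
comment_body are hypothetical.

```python
import re

# Assumption: one comment per line, introduced by the literal words
# "COMMENT TEXT", with the body running up to the LAST ")" on the line.
COMMENT_RE = re.compile(r"COMMENT[ \t]+TEXT\((?P<body>[^\r\n]*)\)[ \t]*$")


def comment_body(line):
    """Return the text inside TEXT(...), or None if the line is not a comment."""
    # Drop the line terminator first so the pattern only sees the line itself.
    m = COMMENT_RE.match(line.rstrip("\r\n"))
    return m.group("body") if m else None


if __name__ == "__main__":
    print(comment_body("COMMENT TEXT(jas...dfjalk;fjkl;%$%lldf'slf)\r\n"))
```

The regular-expression engine backtracks to find the closing ')', 
which an LL(k) lexer does not do on its own, so this shows the desired 
result rather than how the ANTLR rule should be written.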

I've tried non-greedy options for the subrule (as discussed in the 
LEXER section of the ANTLR documentation) without success.

John Mitchell, in posting 11899, refers to overloaded lexemes 
(namely '.', or DOT, as in the java.g grammar).  There seems to be 
special treatment for such characters, although my attempts to mimic 
it have failed thus far (the DOT type is not 'found').

Can anyone explain how to handle such characters in this situation, 
or point me to any relevant text?

Best Regards,

Steve Taplin
