[antlr-interest] Some ANTLR questions

Thu Nov 9 05:36:28 PST 2006

Hello All:

At this point I have gotten the entire language I'm working on recognized, and
my experience with ANTLRWorks has been good for the most part (although I've
tripped over a few of the sorts of bugs that might be keeping it in beta).
I have a few questions:

1) In my language, UP and DOWN are keywords, yet when I tried to create rules
    to scan them, ANTLRWorks treated them strangely. I changed the token names
    to MYUP and MYDOWN and it worked. Are UP and DOWN keywords in ANTLR or in
    Java (my parser is generated as a Java file)?
2) I'm now ready to actually do something with the language I'm recognizing,
    and I'm wondering whether the smart thing is to go with an AST or to use
    rules to do attribute propagation (back when dinosaurs ruled the earth, I
    wrote a lex/yacc, flex/bison style compiler that used rules to do it).
    If I build an AST, do I need to add AST management annotations to every
    production, or can I do it incrementally for unit testing?
3) I had some scanning issues, since my language includes undelimited strings
    (sort of like identifiers in many languages). I created a parser rule that
    matched all alphanumeric keywords and created scanning rules at the end,
    which looked like:
ASTERISK: '*';
//The following rule was not well received by the scanner :-(
ALPHANUMSTRINGWITHWILDCARD	:  (ALPHANUM)+ ASTERISK;
// I think this needs to be last to avoid recognizing keywords.
NONKEYWORDUNDELIMITEDSTRING	: (ALPHANUM)+;

    ANTLR3 and ANTLRWorks didn't like it, and the debugger hung, leaving the
    TCP/IP port (I think it was 49153) unavailable under Windows XP (my
    employer's development environment) and preventing further debugging
    attempts until I rebooted (I didn't bother changing the debugger's port
    number).

    As a workaround, I created a parser rule:
alphaNumStringWithWildcard
	:
	// This is a bit of a hack: it allows white space between the two tokens.
	NONKEYWORDUNDELIMITEDSTRING ASTERISK
	;

    But since I have a rule that discards white space by sending it to
    channel=99, this rule accepts both 'abc*' and 'abc *', so it is a bit
    over-permissive. I would rather do this in the lexer (or fix this rule
    somehow). What is the best fix for this kind of rule?
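
    To make the over-permissiveness concrete, here is a minimal sketch in
    plain Java. It is not ANTLR-generated code: the class and method names are
    hypothetical, and I am assuming ALPHANUM means [A-Za-z0-9], which may not
    match the real fragment. It only contrasts matching on the token stream
    after white space has been dropped (like the parser rule) with matching on
    the raw characters (like the lexer rule I wanted).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from ANTLR.
class WildcardDemo {
    // Assumption: ALPHANUM is [A-Za-z0-9].
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+|\\*|\\s+");

    // Split the input into tokens and drop white space, roughly what
    // sending white space to a hidden channel does for the parser.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String text = m.group();
            if (!text.trim().isEmpty()) {
                tokens.add(text);
            }
        }
        return tokens;
    }

    // Parser-style check: an alphanumeric token followed by an asterisk
    // token. White space between them is invisible at this level.
    static boolean parserStyleMatch(String input) {
        List<String> t = tokenize(input);
        return t.size() == 2
                && t.get(0).matches("[A-Za-z0-9]+")
                && t.get(1).equals("*");
    }

    // Lexer-style check: the asterisk must directly follow the
    // alphanumeric characters, with nothing in between.
    static boolean lexerStyleMatch(String input) {
        return input.matches("[A-Za-z0-9]+\\*");
    }
}
```

    With this sketch, parserStyleMatch accepts both "abc*" and "abc *", while
    lexerStyleMatch accepts only "abc*", which is the behavior I am after.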

Regards:

Bill M.
