[antlr-interest] Problems defining StringLiteral processing in a grammar

Tue Dec 16 03:46:32 PST 2008

[Don't forget to use Reply All to keep messages on-list.]

At 23:55 16/12/2008, James Abley wrote:
 >I did see that in the Java5 grammar, but am still not 100% clear
 >on the distinction between lexer rules and parser rules.

Lexer rules take characters as input and produce tokens, and have 
an uppercase initial letter.  (By convention, token names are 
usually totally uppercase, but that's optional.)

Parser rules take tokens as input and produce no output (just 
executing embedded actions), text output (via StringTemplate), or 
tree output (via output=AST).  They have a lowercase initial 
letter.

Tree parser rules are just like parser rules except they take an 
AST as input.  (And in 3.0 they can't output a second AST; in 3.1 
that restriction is removed.)

 >My grammar builds an AST. I don't think it's possible to use
 >an action within that grammar, to strip quotes as per
 >http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461

Of course it is (although you might want to split your rule up 
into two separate tokens if you want to preserve some distinction 
between single-quoted and double-quoted strings).  The quote 
stripping is done at the lexer level, and the lexer isn't the bit 
that's responsible for building the AST anyway.

Stripping surrounding quotes is trivial, as shown on the wiki 
page.  Processing escape sequences is a bit trickier, but can be 
done in a similar fashion.
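
As a rough illustration of the two steps above, here is a minimal 
Java sketch of a helper that a lexer action could call.  The class 
name StringLiteralText, its method names, and the particular set 
of escapes handled (\n, \t, \r, and "backslash means take the next 
character literally") are assumptions made for this example; they 
come from neither the wiki page nor any particular grammar.

```java
public final class StringLiteralText {
    private StringLiteralText() {}

    /** Removes one pair of matching surrounding quotes (' or "), if present. */
    public static String stripQuotes(String raw) {
        if (raw.length() >= 2) {
            char first = raw.charAt(0);
            char last = raw.charAt(raw.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return raw.substring(1, raw.length() - 1);
            }
        }
        return raw; // not a quoted literal: leave it alone
    }

    /** Replaces a small, assumed set of backslash escapes. */
    public static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = body.charAt(++i); // consume the escaped character
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                default:  out.append(next); break; // \\ \" \' and anything else
            }
        }
        return out.toString();
    }

    /** Strips the quotes first, then processes the escapes. */
    public static String decode(String raw) {
        return unescape(stripQuotes(raw));
    }
}
```

Inside a lexer rule's action this would presumably be wired up as 
something like setText(StringLiteralText.decode(getText())), with 
the escape set adjusted to whatever your language actually allows.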

(Although usually you'd use "$text = x" rather than "setText(x)" 
and "$text" instead of "getText()" nowadays.  I think the wiki 
page is out of date.)

One caveat to stripping the text during lexer processing, though: 
it can lead to confusing error messages (if you want to print the 
token text as part of the error), since the text won't have the 
quotes/escapes any more.
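
To make the trade-off concrete, here is a small Java sketch of the 
opposite arrangement: the raw text is kept as written and the 
quotes are only removed when the value is asked for, so an error 
message can still show the original.  The class LiteralValue and 
its methods are hypothetical and are not part of ANTLR; this only 
illustrates the caveat and is not a recommendation from the 
message above.

```java
public final class LiteralValue {
    private final String raw; // text exactly as it appeared in the input

    public LiteralValue(String raw) {
        this.raw = raw;
    }

    /** The original text, quotes included, for use in diagnostics. */
    public String raw() {
        return raw;
    }

    /** The text without one pair of matching surrounding quotes. */
    public String value() {
        if (raw.length() >= 2) {
            char first = raw.charAt(0);
            char last = raw.charAt(raw.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                return raw.substring(1, raw.length() - 1);
            }
        }
        return raw;
    }

    /** Builds an error message that quotes the input as the user typed it. */
    public String describeError(String problem) {
        return problem + " near " + raw;
    }
}
```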