[antlr-interest] Problems defining StringLiteral processing in a grammar

Tue Dec 16 04:37:37 PST 2008

2008/12/16 Gavin Lambert <antlr at mirality.co.nz>:
> [Don't forget to use Reply All to keep messages on-list.]

Sorry!

>
> At 23:55 16/12/2008, James Abley wrote:
>>I did see that in the Java5 grammar, but am still not 100% clear
>>on the distinction between lexer rules and parser rules.
>
> Lexer rules take characters as input and produce tokens, and have an
> uppercase initial letter.  (By convention, token names are usually totally
> uppercase, but that's optional.)
>
> Parser rules take tokens as input and either have no output (just executing
> embedded actions), text output (via StringTemplate), or tree output (via
> output=AST).  They have a lowercase initial letter.
>
> Tree parser rules are just like parser rules except they take an AST as
> input.  (And in 3.0 they can't output a second AST; in 3.1 that restriction
> is removed.)
>
>>My grammar builds an AST. I don't think it's possible to use
>>an action within that grammar to strip quotes, as per
>>http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461
>
> Of course it is (although you might want to split your rule up into two
> separate tokens if you want to preserve some distinction between
> single-quoted and double-quoted strings).  That's also done at the lexer
> level, and the lexer isn't the bit that's responsible for building the AST
> anyway.
>
> Stripping surrounding quotes is trivial, as shown on the wiki page.
>  Processing escape sequences is a bit trickier, but can be done in a similar
> fashion.
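Quote stripping from the wiki page can be extended to escape processing with a small helper that the lexer action calls. Below is a minimal Java sketch. It assumes a Java target and C-style escapes. The class name `StringLiterals`, the method `unquote`, and the exact escape set are hypothetical illustrations, not part of ANTLR.

```java
// Hypothetical helper, not part of the ANTLR runtime.
final class StringLiterals {
    private StringLiterals() {}

    /**
     * Strips matching surrounding quotes (single or double) from a
     * string-literal token and decodes C-style backslash escapes.
     * The supported escape set is an assumption; adjust to your language.
     * Throws IllegalArgumentException on malformed input.
     */
    static String unquote(String literal) {
        if (literal.length() < 2) {
            throw new IllegalArgumentException("not a quoted literal: " + literal);
        }
        char quote = literal.charAt(0);
        boolean quoted = (quote == '"' || quote == '\'')
                && literal.charAt(literal.length() - 1) == quote;
        if (!quoted) {
            throw new IllegalArgumentException("not a quoted literal: " + literal);
        }
        String body = literal.substring(1, literal.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, copied as-is
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash in: " + literal);
            }
            char e = body.charAt(++i); // the character after the backslash
            switch (e) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case '\\': out.append('\\'); break;
                case '\'': out.append('\''); break;
                case '"': out.append('"'); break;
                default:
                    throw new IllegalArgumentException("unknown escape \\" + e + " in: " + literal);
            }
        }
        return out.toString();
    }
}
```

In an ANTLR 3 lexer rule with a Java target, the helper could be called from the rule's action as `{ $text = StringLiterals.unquote($text); }`, with the class imported through a `@lexer::header` block. Because the decoding happens in the lexer, anything that later prints the token text sees the decoded value, not the original spelling.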

So the lexer would have the code for processing any escape sequence
that I want? I think I'll check out the JavaFX sources to see if I can
find a good example. IIRC, Java processes Unicode escape sequences in
source files prior to feeding anything else to the compiler. It's a
separate stage. Not sure that's something to be emulated...

http://74.125.77.132/search?q=cache:nKOkLkJvcoIJ:www.javapuzzlers.com/java-puzzlers-sampler.pdf+java+puzzlers+unicode&hl=en&ct=clnk&cd=3&gl=uk#6
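Java's separate Unicode-escape stage (JLS §3.3) can be sketched as a pre-pass over the raw characters that runs before the lexer sees them. Below is a minimal sketch that assumes Java's rules:

- A backslash can begin an escape only if an even number of contiguous raw backslashes precede it.
- One or more `u` characters may follow the backslash.
- Exactly four ASCII hex digits must come next.

The class `UnicodeEscapes` is hypothetical. The sketch counts only raw backslashes and does not model every corner case of the specification.

```java
// Hypothetical pre-pass approximating the JLS Unicode-escape translation.
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns the value of an ASCII hex digit, or -1 if ch is not one. */
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /**
     * Replaces every eligible Unicode escape (a backslash, one or more
     * 'u' characters, then four hex digits) with the UTF-16 code unit it denotes.
     */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int rawBackslashes = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                rawBackslashes = 0;
                i++;
                continue;
            }
            // Only a backslash preceded by an even number of raw backslashes may start an escape.
            boolean eligible = rawBackslashes % 2 == 0;
            int j = i + 1;
            if (eligible && j < src.length() && src.charAt(j) == 'u') {
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at index " + i);
                }
                int code = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(src.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit in Unicode escape at index " + i);
                    }
                    code = code * 16 + digit;
                }
                // The produced character is not a raw backslash, so it never starts another escape.
                out.append((char) code);
                rawBackslashes = 0;
                i = j + 4;
            } else {
                out.append(c);
                rawBackslashes++;
                i++;
            }
        }
        return out.toString();
    }
}
```

The translated character never starts another escape. As a result, input such as `\u005cu0041` yields the six characters `\u0041` rather than `A`. A separate pre-pass can produce surprises of this kind.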

>
> (Although usually you'd use "$text = x" rather than "setText(x)" and "$text"
> instead of "getText()" nowadays.  I think the wiki page is out of date.)
>
> One caveat to stripping the text during lexer processing, though, is that this
> can lead to confusing error messages (if you want to print the token text as
> part of the error), since it won't have the quotes/escapes any more.
>
>

Thanks for your comprehensive responses.

Cheers,

James