[antlr-interest] Too many uses for escape character giving me lexer troubles.

Terence Parr parrt at cs.usfca.edu
Wed Mar 14 10:05:33 PDT 2007


On Mar 13, 2007, at 6:51 PM, Jeremy D. Frens wrote:

> I'm using ANTLR v3 (and quite liking it).
>
> In my language (http://nolatte.sf.net/), the backslash character is the
> escape character, and it gets used for (at least) two different tasks.
> Here's a stripped down grammar:
>
> atom		:  WORD | IDENTIFIER ;
> WORD		:  ( ('a'..'z') | ( '\\' '{' ) )+ ;
> IDENTIFIER	:   '\\' ('a'..'z')+ ;
>
> The key is that the backslash gets used for two purposes: as a real
> escape character (to escape '{' in a WORD) and as the beginning of an
> IDENTIFIER.  The problem comes in when my grammar tries to scan and/or
> parse something like this:
>
>   abc\xyz
>
> This should be two tokens: a WORD "abc" and an IDENTIFIER "\xyz".
> However, since the backslash is allowed at all in a WORD, the lexer
> consumes it, and then it gets confused by the 'x'.

Try putting the IDENTIFIER rule before WORD.
Ter
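
For illustration only, here is a small hand-written Python sketch of the
tokenization the question asks for. It is not ANTLR output and not taken
from either message; the function name tokenize and the (type, lexeme)
tuple format are assumptions made up for this example. The idea it shows
is that a backslash only joins a WORD when the next character is '{',
which takes two characters of lookahead; otherwise the WORD ends and the
backslash can start an IDENTIFIER.

```python
import string

LETTERS = set(string.ascii_lowercase)  # stands in for 'a'..'z'


def tokenize(text):
    """Split text into (type, lexeme) pairs for the stripped-down grammar.

    Hypothetical helper for illustration; not generated by ANTLR.
    """
    tokens = []
    i = 0
    n = len(text)
    while i < n:
        start = i
        if text[i] == "\\" and i + 1 < n and text[i + 1] in LETTERS:
            # IDENTIFIER: a backslash followed by one or more letters.
            i += 1
            while i < n and text[i] in LETTERS:
                i += 1
            tokens.append(("IDENTIFIER", text[start:i]))
        elif text[i] in LETTERS or text.startswith("\\{", i):
            # WORD: letters and escaped braces. Peek past the backslash
            # before consuming it, so "\x" is left for IDENTIFIER.
            while i < n:
                if text[i] in LETTERS:
                    i += 1
                elif text.startswith("\\{", i):
                    i += 2
                else:
                    break
            tokens.append(("WORD", text[start:i]))
        else:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
    return tokens
```

With this sketch, tokenize("abc\\xyz") (the input abc\xyz from the
question) yields a WORD "abc" followed by an IDENTIFIER "\xyz", while an
escaped brace such as a\{b stays inside a single WORD.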

