[antlr-interest] Too many uses for escape character giving me lexer troubles.

Thu Mar 15 00:35:16 PDT 2007

On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
> Terence Parr wrote:
> > On Mar 13, 2007, at 6:51 PM, Jeremy D. Frens wrote:
> >> In my language (http://nolatte.sf.net/), the backslash character is the
> >> escape character, and it gets used for (at least) two different tasks.
> >> Here's a stripped down grammar:
> >>
> >> atom        :  WORD | IDENTIFIER ;
> >> WORD        :  ( ('a'..'z') | ( '\\' '{' ) )+ ;
> >> IDENTIFIER    :   '\\' ('a'..'z')+ ;
> >>
> >> The key is that the backslash gets used for two purposes: as a real
> >> escape character (to escape '{' in a WORD) and as the beginning of an
> >> IDENTIFIER.  The problem comes in when my grammar tries to scan and/or
> >> parse something like this:
> >>
> >>   abc\xyz
> >>
> >> This should be two tokens: a WORD "abc" and an IDENTIFIER "\xyz".
> >> However, since the backslash is allowed at all in a WORD, the lexer
> >> consumes it, and then it gets confused by the 'x'.
> > 
> > Try putting IDENTIFIER before WORD.
> 
> Same problem.  Three more observations:

  I would simply not do that at the lexer level.

  What would the following give?

    atom : word | identifier ;
    word : ( LOWCASE | BACKSLASH OBRACE )+ ;
    identifier : BACKSLASH LOWCASE+ ;
    BACKSLASH : '\\' ;
    OBRACE : '{' ;
    LOWCASE : 'a'..'z' ;
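  To make the parser-level split concrete, here is a minimal Python sketch of what it amounts to. It is not ANTLR and not generated code, and the names "lex" and "parse_atoms" are hypothetical, chosen only for illustration. The lexer emits one token per character (BACKSLASH, OBRACE, LOWCASE). The parser then looks two tokens ahead, which assumes the generated parser can do the same at that decision. A backslash followed by '{' stays inside the word. A backslash followed by a lowercase letter ends the word and starts an identifier.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, matched text)


def lex(src: str) -> List[Token]:
    """One token per character, mirroring BACKSLASH, OBRACE and LOWCASE."""
    tokens: List[Token] = []
    for ch in src:
        if ch == "\\":
            tokens.append(("BACKSLASH", ch))
        elif ch == "{":
            tokens.append(("OBRACE", ch))
        elif "a" <= ch <= "z":
            tokens.append(("LOWCASE", ch))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def parse_atoms(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group tokens into ("word", text) and ("identifier", text) atoms.

    Two tokens of lookahead decide what a BACKSLASH means:
    BACKSLASH OBRACE continues a word, BACKSLASH LOWCASE starts an identifier.
    """
    atoms: List[Tuple[str, str]] = []
    i = 0
    n = len(tokens)

    def la(k: int) -> str:
        # Type of the token k positions ahead of i, or "EOF" past the end.
        return tokens[i + k][0] if i + k < n else "EOF"

    while i < n:
        if la(0) == "BACKSLASH" and la(1) == "LOWCASE":
            # identifier : BACKSLASH LOWCASE+ ;
            text = tokens[i][1]
            i += 1
            while la(0) == "LOWCASE":
                text += tokens[i][1]
                i += 1
            atoms.append(("identifier", text))
        else:
            # word : ( LOWCASE | BACKSLASH OBRACE )+ ;
            text = ""
            while True:
                if la(0) == "LOWCASE":
                    text += tokens[i][1]
                    i += 1
                elif la(0) == "BACKSLASH" and la(1) == "OBRACE":
                    text += tokens[i][1] + tokens[i + 1][1]
                    i += 2
                else:
                    break  # a BACKSLASH not followed by '{' ends the word
            if not text:
                raise ValueError(f"unexpected token {la(0)} at position {i}")
            atoms.append(("word", text))
    return atoms
```

  With this split, "abc\xyz" comes out as the word "abc" followed by the identifier "\xyz", and "ab\{cd" stays a single word. One trade-off: words are now assembled by the parser. If the lexer skips whitespace, it can no longer separate LOWCASE tokens, so "ab c" and "abc" would likely parse the same unless whitespace is handled explicitly.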

-- 
  Eric Deplagne