[antlr-interest] Too many uses for escape character giving me lexer troubles.

Jeremy D. Frens jdfrens at calvin.edu
Thu Mar 15 07:46:26 PDT 2007



Eric Deplagne wrote:
> On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
>>>> atom        :  WORD | IDENTIFIER ;
>>>> WORD        :  ( ('a'..'z') | ( '\\' '{' ) )+ ;
>>>> IDENTIFIER    :   '\\' ('a'..'z')+ ;
>>>>
>>>> The key is that the backslash gets used for two purposes: as a real
>>>> escape character (to escape '{' in a WORD) and as the beginning of an
>>>> IDENTIFIER.
>   I would simply not do that at lexer level.
> 
>   What would the following give ?:
> 
>     atom : word | identifier;
>     word : ( LOWCASE | BACKSLASH OBRACE )+;
>     identifier : BACKSLASH LOWCASE+;
>     BACKSLASH : '\\';
>     OBRACE : '{';
>     LOWCASE : 'a'..'z';

I've thought about this solution, but I haven't tried it yet.  I'm
inclined to go this way, if only so that I can move forward.  However,
part of me is still intrigued by the lexer-level problem.

This approach has another intriguing consequence: in the lexer, I can't
throw away the BACKSLASH with "BACKSLASH!" (another intriguing question:
why not?), but in the parser I can.
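
To make the parser-level idea concrete, here is a minimal sketch in plain
Python (not ANTLR, and not generated by it).  It assumes the whole input is
a single atom, and the names lex and parse_atom are hypothetical helpers
invented for this illustration.  The lexer emits only single-character
tokens, and the "parser" decides what a BACKSLASH means from the token that
follows it, dropping the backslash from the result.

```python
# Hypothetical token names mirroring the quoted grammar.
TOKEN_KINDS = {"\\": "BACKSLASH", "{": "OBRACE"}


def lex(text):
    """Split the input into single-character (kind, char) tokens."""
    tokens = []
    for ch in text:
        if ch in TOKEN_KINDS:
            tokens.append((TOKEN_KINDS[ch], ch))
        elif "a" <= ch <= "z":
            tokens.append(("LOWCASE", ch))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def parse_atom(tokens):
    """atom : word | identifier ;  returns (rule, text) without backslashes."""
    if not tokens:
        raise ValueError("empty input")
    kinds = [kind for kind, _ in tokens]

    # identifier : BACKSLASH LOWCASE+ ;
    if kinds[0] == "BACKSLASH" and len(kinds) > 1 and kinds[1] == "LOWCASE":
        if any(kind != "LOWCASE" for kind in kinds[2:]):
            raise ValueError("identifier may contain only lowercase letters")
        return ("identifier", "".join(ch for _, ch in tokens[1:]))

    # word : ( LOWCASE | BACKSLASH OBRACE )+ ;
    chars = []
    i = 0
    while i < len(tokens):
        kind, ch = tokens[i]
        if kind == "LOWCASE":
            chars.append(ch)
            i += 1
        elif (kind == "BACKSLASH" and i + 1 < len(tokens)
              and tokens[i + 1][0] == "OBRACE"):
            chars.append("{")  # keep the brace, drop the escaping backslash
            i += 2
        else:
            raise ValueError(f"unexpected token {kind} at position {i}")
    return ("word", "".join(chars))
```

For example, parse_atom(lex(r"\abc")) yields ("identifier", "abc") and
parse_atom(lex(r"ab\{c")) yields ("word", "ab{c").  The point is only that
the choice between the two uses of the backslash is made after
tokenization, with one token of lookahead.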

jdf

--
* Jeremy D. Frens * Professor, Computer Science * jdfrens at calvin.edu *
   ``You are only young once, but you can stay immature indefinitely.''
                                 -- Unknown


