[antlr-interest] Too many uses for escape character giving me lexer troubles.
Eric Deplagne
Eric.Deplagne at loria.fr
Thu Mar 15 00:35:16 PDT 2007
On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
> Terence Parr wrote:
> > On Mar 13, 2007, at 6:51 PM, Jeremy D. Frens wrote:
> >> In my language (http://nolatte.sf.net/), the backslash character is the
> >> escape character, and it gets used for (at least) two different tasks.
> >> Here's a stripped-down grammar:
> >>
> >> atom : WORD | IDENTIFIER ;
> >> WORD : ( ('a'..'z') | ( '\\' '{' ) )+ ;
> >> IDENTIFIER : '\\' ('a'..'z')+ ;
> >>
> >> The key is that the backslash gets used for two purposes: as a real
> >> escape character (to escape '{' in a WORD) and as the beginning of an
> >> IDENTIFIER. The problem comes in when my grammar tries to scan and/or
> >> parse something like this:
> >>
> >> abc\xyz
> >>
> >> This should be two tokens: a WORD "abc" and an IDENTIFIER "\xyz".
> >> However, since the backslash is allowed at all in a WORD, the lexer
> >> consumes it, and then it gets confused by the 'x'.
> >
> > try putting ID before WORD
>
> Same problem. Three more observations:
I would simply not do this at the lexer level.
What would the following give?
atom : word | identifier;
word : ( LOWCASE | BACKSLASH OBRACE )+;
identifier : BACKSLASH LOWCASE+ ;
BACKSLASH : '\\';
OBRACE : '{';
LOWCASE : 'a'..'z';
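To show what this buys, here is a small hand-written sketch in plain Python. It is not ANTLR-generated code, and the names `lex` and `parse_atoms` and the string token types are illustrative assumptions. The lexer emits only single-character tokens. When the parser sees a BACKSLASH, it looks one token further: OBRACE continues a word, and LOWCASE starts an identifier. This is roughly the two-token decision the grammar above leaves to the parser.

```python
# Hypothetical sketch of the parser-level approach: the lexer emits only
# single-character tokens, and the parser decides what a backslash means.
from typing import List, Tuple

Token = Tuple[str, str]  # (token type or atom kind, text)


def lex(text: str) -> List[Token]:
    """Split input into BACKSLASH, OBRACE and LOWCASE tokens."""
    tokens: List[Token] = []
    for ch in text:
        if ch == "\\":
            tokens.append(("BACKSLASH", ch))
        elif ch == "{":
            tokens.append(("OBRACE", ch))
        elif "a" <= ch <= "z":
            tokens.append(("LOWCASE", ch))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def parse_atoms(tokens: List[Token]) -> List[Token]:
    """Parse a token list into ("word", ...) and ("identifier", ...) atoms.

    word       : ( LOWCASE | BACKSLASH OBRACE )+ ;
    identifier : BACKSLASH LOWCASE+ ;
    """
    atoms: List[Token] = []
    pos = 0

    def peek(offset: int = 0) -> str:
        # Type of the token `offset` positions ahead, or "EOF" past the end.
        i = pos + offset
        return tokens[i][0] if i < len(tokens) else "EOF"

    while pos < len(tokens):
        if peek() == "BACKSLASH" and peek(1) == "LOWCASE":
            # identifier: a backslash followed by a letter
            start = pos
            pos += 1
            while peek() == "LOWCASE":
                pos += 1
            atoms.append(("identifier", "".join(t for _, t in tokens[start:pos])))
        else:
            # word: letters and escaped braces, stopping before an identifier
            start = pos
            while True:
                if peek() == "LOWCASE":
                    pos += 1
                elif peek() == "BACKSLASH" and peek(1) == "OBRACE":
                    pos += 2
                else:
                    break
            if pos == start:
                raise ValueError(f"unexpected token {tokens[pos]!r}")
            atoms.append(("word", "".join(t for _, t in tokens[start:pos])))
    return atoms
```

With this sketch, `parse_atoms(lex("abc\\xyz"))` (the input abc\xyz) yields a word `abc` followed by an identifier `\xyz`. An escaped brace such as a\{b stays inside a single word.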
--
Eric Deplagne