[antlr-interest] Too many uses for escape character giving me lexer troubles.
Eric Deplagne
Eric.Deplagne at loria.fr
Thu Mar 15 10:13:50 PDT 2007
On Thu, 15 Mar 2007 11:29:43 -0400, John B. Brodie wrote:
>
> >> On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
> >>>>> atom : WORD | IDENTIFIER ;
> >>>>> WORD : ( ('a'..'z') | ( '\\' '{' ) )+ ;
> >>>>> IDENTIFIER : '\\' ('a'..'z')+ ;
> >>>>>
> >>>>> The key is that the backslash gets used for two purposes: as a real
> >>>>> escape character (to escape '{' in a WORD) and as the beginning of an
> >>>>> IDENTIFIER.
> >> I would simply not do that at lexer level.
> >>
> >> What would the following give ?:
> >>
> >> atom : word | identifier;
> >> word : ( LOWCASE | BACKSLASH OBRACE )+;
> >> identifier : BACKSLASH LOWCASE+;
> >> BACKSLASH : '\\';
> >> OBRACE : '{';
> >> LOWCASE : 'a'..'z';
> >
> >I've thought about this solution, but I haven't tried it yet. I'm
> >probably inclined to go this way just so that I can move forward (if for
> >no other reason). However, there's a part of me that's intrigued.
> >
>
> Pardon me for butting in... I have not been following this discussion, so
> maybe this suggestion is completely bogus. But how about (untested):
>
> atom : WORD | IDENTIFIER ;
> WORD : ('a'..'z') WORD_TAIL ;
> IDENTIFIER : '\\' ( ( '{' WORD_TAIL { $type=WORD; } )
>                   | ('a'..'z')+
>                   ) ;
> fragment
> WORD_TAIL : ( ('a'..'z') | ( '\\' '{' ) )* ;
>
> basically this is just left-factoring the handling of the initial backslash
> character...
>
> Hope this helps
> -jbb
This $type= statement just looks like a horrible hack to me...
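
For readers who want to see what the left-factoring amounts to, here is a rough hand-written sketch in Python. It is not ANTLR and not code from this thread: the function names, the whitespace-separates-tokens rule, and the error handling are all assumptions made for illustration. The point it tries to show is that after a backslash, one more character decides whether the token is a WORD or an IDENTIFIER, which is the decision the $type= action encodes. The tail here accepts zero or more items so that a one-letter word still lexes.

```python
import string

LOWER = set(string.ascii_lowercase)


def scan_word_tail(text, pos):
    """Consume zero or more items: a lowercase letter, or backslash + '{'."""
    while pos < len(text):
        if text[pos] in LOWER:
            pos += 1
        elif text.startswith("\\{", pos):
            pos += 2
        else:
            break
    return pos


def tokenize(text):
    """Return (type, lexeme) pairs. Whitespace as separator is an assumption."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        start = pos
        if ch in LOWER:
            # A word that starts with a plain letter.
            pos = scan_word_tail(text, pos + 1)
            kind = "WORD"
        elif ch == "\\":
            # Left-factored backslash: look at the next character to decide.
            nxt = text[pos + 1] if pos + 1 < len(text) else ""
            if nxt == "{":
                # Escaped brace: this is really a WORD (the $type=WORD case).
                pos = scan_word_tail(text, pos + 2)
                kind = "WORD"
            elif nxt in LOWER:
                # Backslash followed by letters: an IDENTIFIER.
                pos += 2
                while pos < len(text) and text[pos] in LOWER:
                    pos += 1
                kind = "IDENTIFIER"
            else:
                raise ValueError(f"unexpected character after backslash at {pos}")
        else:
            raise ValueError(f"unexpected character {ch!r} at {pos}")
        tokens.append((kind, text[start:pos]))
    return tokens
```

For example, tokenize("ab\\{c \\name") treats the first chunk as a single WORD containing an escaped brace and the second as an IDENTIFIER. Whether this reads as cleaner than a type override in the grammar is a matter of taste; the branching is the same either way.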
--
Eric Deplagne