[antlr-interest] Too many uses for escape character giving me lexer troubles.
    Jeremy D. Frens
    jdfrens at calvin.edu
    Wed Mar 14 18:37:07 PDT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Terence Parr wrote:
> On Mar 13, 2007, at 6:51 PM, Jeremy D. Frens wrote:
>> In my language (http://nolatte.sf.net/), the backslash character is the
>> escape character, and it gets used for (at least) two different tasks.
>> Here's a stripped down grammar:
>>
>> atom        :  WORD | IDENTIFIER ;
>> WORD        :  ( ('a'..'z') | ( '\\' '{' ) )+ ;
>> IDENTIFIER    :   '\\' ('a'..'z')+ ;
>>
>> The key is that the backslash gets used for two purposes: as a real
>> escape character (to escape '{' in a WORD) and as the beginning of an
>> IDENTIFIER.  The problem comes in when my grammar tries to scan and/or
>> parse something like this:
>>
>>   abc\xyz
>>
>> This should be two tokens: a WORD "abc" and an IDENTIFIER "\xyz".
>> However, since the backslash is allowed at all in a WORD, the lexer
>> consumes it, and then it gets confused by the 'x'.
> 
> try putting ID before WORD
Same problem.  Three more observations:

1. Interpreting the input in ANTLRWorks as a WORD, I get a
   MismatchedTokenException (complaining about getting an 'x' instead
   of a '{').

2. Interpreting it in ANTLRWorks as an atom, I get what appears to be a
   valid AST, although the leaf node has "abc\xyz" in it, as if that
   were the text of the leaf token.

3. Running the generated Java code, the lexer actually returns *just*
   "z" as a WORD for the "abc\xyz" input.

At first, I thought the problem was with follow sets, but now I'm not
so sure.  Shouldn't a simple lookahead of k=2 detect when to stop the
current WORD and start an IDENTIFIER?  I originally suspected follow
sets because, if a follow set contained only single characters, then
"\" would be ambiguous: it's sometimes part of a WORD, and it sometimes
follows a WORD.  But it occurs to me now that follow sets don't
normally enter into a lexer.
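
To make the k=2 idea concrete, here is a rough hand-written sketch in
Java of the decision I would expect the lexer to make.  This is not
ANTLR-generated code and not how ANTLR actually builds its lexers; the
class name TwoCharLookaheadLexer and its methods are made up purely for
illustration.  The only point is that peeking one character past the
backslash is enough to tell "\{" (still inside a WORD) from "\x" (start
of an IDENTIFIER).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner for the two stripped-down rules:
//   WORD       : ( 'a'..'z' | '\\' '{' )+ ;
//   IDENTIFIER : '\\' ('a'..'z')+ ;
// An illustration of two-character lookahead only, not ANTLR output.
class TwoCharLookaheadLexer {

    // True when position i exists and holds a lowercase ASCII letter.
    static boolean isLower(String s, int i) {
        return i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z';
    }

    // True when a WORD can start or continue at position i: either a
    // letter, or a backslash immediately followed by '{' (the k=2 check).
    static boolean wordContinues(String s, int i) {
        if (isLower(s, i)) {
            return true;
        }
        return i + 1 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == '{';
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (input.charAt(i) == '\\' && isLower(input, i + 1)) {
                // Backslash followed by a letter: this is an IDENTIFIER.
                i++;
                while (isLower(input, i)) {
                    i++;
                }
                tokens.add("IDENTIFIER(" + input.substring(start, i) + ")");
            } else if (wordContinues(input, i)) {
                // Letters and escaped braces: keep extending the WORD,
                // stopping before any backslash not followed by '{'.
                while (wordContinues(input, i)) {
                    i += isLower(input, i) ? 1 : 2;
                }
                tokens.add("WORD(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc\\xyz"));
        System.out.println(tokenize("ab\\{c\\xyz"));
    }
}
```

With this sketch, the input abc\xyz splits into a WORD "abc" followed
by an IDENTIFIER "\xyz", which is the tokenization I am after.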
jdf
- --
* Jeremy D. Frens * Professor, Computer Science * jdfrens at calvin.edu *
   ``It just as easily could have gone the other way.''
           -- Chicago Cubs manager Don Zimmer on
               his team's 4-4 record
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFF+KNDOcBu2deY79IRAnB/AKC0/qSCCkbnJ0EHJggYiLRwIUO3pgCcDouC
UdNn3O8HG7Yeowa5Auad2Tw=
=1iMw
-----END PGP SIGNATURE-----
    
    