[antlr-interest] Simple lexer grammar doesn't match '\''

Gavin Lambert antlr at mirality.co.nz
Wed Aug 29 04:43:13 PDT 2007


At 22:40 29/08/2007, Mauro Pellicioli wrote:
 >fragment
 >LINK:'<a href="' STRING_LINK {System.out.println("Link: "+$STRING_LINK.text);}
 >'">' STRING {System.out.println("Hotel: "+$STRING.text);} '</a>';
[...]
 >fragment
 >STRING: ( ('\u0020'..'\u003B') | '\u003D' | ('\u003F'..'\u007E')
 >|('\u0080'..'\u017F') )+;
[...]
 ><a
 >href="/hotel/us/enfant-plaza.html?sid=b02d5b4438247c402f4a43539dfc9
 >d8c">L'Enfant
 >Plaza Hotel</a>
 >
 >Output:
 >
 >Link:
 >/hotel/us/enfant-plaza.html?label=short-index.htmlerrorc_search_in_
 >invalid%3Dsi;sid=1892815e8db2e96caca618e2377948d8
 >Hotel: L
 >
 >Instead of:
 >
 >Link:/hotel/us/enfant-plaza.html?sid=b02d5b4438247c402f4a43539dfc9d
 >8c
 >Hotel:L'Enfant Plaza Hotel
 >Address:480 L'Enfant Plaza, SW, Washington (Washington DC)
 >
 >
 >It seems that STRING rule fails when it encounters a ' char (hex
 >value 0x27), but STRING has the correct range of chars.

Are you certain that it's a ' character (0x27) and not a ’ 
character (0x2019)?  Because it definitely looks like the latter 
one in your email message....

(Given that U+2019 is also byte 0x92 in the standard Windows-1252 
codepage, it's not that uncommon to see it in the wild.  You should 
probably be accepting it too.)


