[antlr-interest] Simple lexer grammar doesn't match '\''
Gavin Lambert
antlr at mirality.co.nz
Wed Aug 29 04:43:13 PDT 2007
At 22:40 29/08/2007, Mauro Pellicioli wrote:
>fragment
>LINK:'<a href="' STRING_LINK {System.out.println("Link:
>"+$STRING_LINK.text);} '">' STRING {System.out.println("Hotel:
>"+$STRING.text);} '</a>';
[...]
>fragment
>STRING: ( ('\u0020'..'\u003B') | '\u003D' | ('\u003F'..'\u007E')
>|('\u0080'..'\u017F') )+;
[...]
><a
>href="/hotel/us/enfant-plaza.html?sid=b02d5b4438247c402f4a43539dfc9
>d8c">L'Enfant
>Plaza Hotel</a>
>
>Output:
>
>Link:
>/hotel/us/enfant-plaza.html?label=short-index.htmlerrorc_search_in_
>invalid%3Dsi;sid=1892815e8db2e96caca618e2377948d8
>Hotel: L
>
>Instead of:
>
>Link:/hotel/us/enfant-plaza.html?sid=b02d5b4438247c402f4a43539dfc9d
>8c
>Hotel:L'Enfant Plaza Hotel
>Address:480 L'Enfant Plaza, SW, Washington (Washington DC)
>
>
>It seems that STRING rule fails when it encounters a ' char (hex
>value 0x27), but STRING has the correct range of chars.
Are you certain that it's a ' character (0x27) and not a ’
character (0x2019)? Because it definitely looks like the latter
one in your email message...
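
One way to settle the question is to dump the code points of the text that actually reaches the lexer. The following is only a minimal sketch: the class and method names (ApostropheCheck, inStringRule, describe) are made up for illustration, and inStringRule simply mirrors the character ranges from the quoted STRING rule rather than calling into ANTLR.

```java
class ApostropheCheck {
    // Mirrors the ranges of the quoted STRING rule (an assumption: copied by hand).
    static boolean inStringRule(int cp) {
        return (cp >= 0x20 && cp <= 0x3B)
            || cp == 0x3D
            || (cp >= 0x3F && cp <= 0x7E)
            || (cp >= 0x80 && cp <= 0x17F);
    }

    // Lists each code point of s and whether STRING would accept it.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
            sb.append(String.format("U+%04X %s\n", cp,
                inStringRule(cp) ? "ok" : "NOT in STRING")));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample text with a typographic apostrophe (U+2019).
        System.out.print(describe("L\u2019Enfant"));
    }
}
```

If the apostrophe shows up as U+0027, it is inside the '\u0020'..'\u003B' range and the rule should match it; if it shows up as U+2019, it falls outside every range in STRING, which would explain the match stopping after "L".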
(Given that 0x2019 is also byte 0x92 in the standard Windows-1252
codepage, it's not that uncommon to see it in the wild. You should
probably be accepting it too.)
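
The codepage point can be checked with a small sketch like the one below. It assumes the JVM ships the "windows-1252" charset (common, but not one of the charsets Java guarantees), and the names Cp1252Quote, decodeSingleByte and accepted are hypothetical. The accepted method is the STRING ranges from the quoted grammar with 0x2019 added, which in the grammar itself would presumably mean adding a '\u2019' alternative to STRING.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252Quote {
    // Decodes a single byte with the given charset and returns its code point.
    static int decodeSingleByte(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs).codePointAt(0);
    }

    // Hypothetical widened check: the quoted STRING ranges plus U+2019.
    static boolean accepted(int cp) {
        return (cp >= 0x20 && cp <= 0x3B)
            || cp == 0x3D
            || (cp >= 0x3F && cp <= 0x7E)
            || (cp >= 0x80 && cp <= 0x17F)
            || cp == 0x2019;
    }

    public static void main(String[] args) {
        // Assumption: "windows-1252" is available on this JVM.
        int asCp1252 = decodeSingleByte(0x92, Charset.forName("windows-1252"));
        int asLatin1 = decodeSingleByte(0x92, StandardCharsets.ISO_8859_1);
        System.out.printf("windows-1252: U+%04X, ISO-8859-1: U+%04X%n",
            asCp1252, asLatin1);
        System.out.println("accepted: " + accepted(asCp1252));
    }
}
```

The same byte decodes to a different character depending on the charset used to read the page, so it is worth confirming which encoding the input is opened with before adjusting the grammar.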