[antlr-interest] Accessing lexer characters programmatically?
Gavin Lambert
antlr at mirality.co.nz
Mon Apr 28 06:54:12 PDT 2008
At 01:42 29/04/2008, Susan Jolly wrote:
>It would be useful if I could determine prior to scanning whether
>or not the supplied table is consistent with the lexer; i.e.
>whether it includes conversions for the (non-ASCII) characters
>used in certain of the lexer rules such as
>
>TYPE1: '\u20AC'|'\u00A5'|'\u2026';
>TYPE2: '\u201D'|'\u2019';
[...]
>I could, of course, keep a separate list of the characters of
>concern but that would create an extra maintenance point. Is
>there a better way?
Maybe I'm just misunderstanding what you're asking here, but why
does it need to be prior to scanning?
It seems to me that the simplest way to do what you ask is to
let the lexer run to completion, adding each extended character
you encounter to a list of "used" characters. After that (either
between the lexer and the parser, or after the parser) you
compare the used list against the supplied conversion table.
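As a rough sketch of that idea (not ANTLR API; the helper names and the token texts are hypothetical, and the conversion table is a stand-in for the supplied one):

```java
import java.util.*;

public class CharTableCheck {
    // Collect every non-ASCII character that actually appears in the
    // token texts produced by a completed lexer run.
    static Set<Character> usedExtendedChars(Iterable<String> tokenTexts) {
        Set<Character> used = new TreeSet<>();
        for (String text : tokenTexts) {
            for (char c : text.toCharArray()) {
                if (c > 0x7F) {   // only extended (non-ASCII) characters
                    used.add(c);
                }
            }
        }
        return used;
    }

    // Report which used characters have no entry in the conversion table.
    static Set<Character> missingFromTable(Set<Character> used,
                                           Map<Character, String> table) {
        Set<Character> missing = new TreeSet<>(used);
        missing.removeAll(table.keySet());
        return missing;
    }
}
```

You would run this once after the lexer finishes, feeding it the texts of the emitted tokens; anything returned by `missingFromTable` is a character the input used but the table cannot convert.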
Of course, this will only report problems where a character is
actually encountered in the input but missing from the conversion
table; if you need to report mismatches for characters that never
appear in the input stream, it won't be sufficient.