[antlr-interest] Accessing lexer characters programmatically?

Gavin Lambert antlr at mirality.co.nz
Mon Apr 28 06:54:12 PDT 2008


At 01:42 29/04/2008, Susan Jolly wrote:
 >It would be useful if I could determine prior to scanning 
whether
 >or not the supplied table is consistent with the lexer; i.e. 
whether
 >it includes conversions for the (non-ASCII) characters used in
 >certain of the lexer rules such as
 >
 >TYPE1: '\u20AC'|'\u00A5'|'\u2026';
 >TYPE2: '\u201D'|'\u2019';
[...]
 >I could, of course, keep a separate list of the characters of
 >concern but that would create an extra maintenance point.   Is
 >there a better way?

Maybe I'm just misunderstanding what you're asking here, but why 
does it need to be prior to scanning?

It seems to me that the simplest way to do what you're asking is to 
let the lexer run to completion, adding each extended character you 
encounter to a list of "used" characters.  Afterwards (either 
between the lexer and parser, or after the parser) you compare that 
used list against the supplied conversion table.

Of course, this will only report a problem when a character is 
actually encountered in the input but missing from the conversion 
table; if you need to report mismatches for characters that never 
appear in the input stream, it won't be sufficient.
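To sketch the idea (purely hypothetical names here -- a real version 
would walk the ANTLR token stream rather than plain strings, and the 
conversion table would come from your own supplied data):

```java
import java.util.*;

public class UsedCharCheck {
    // Hypothetical conversion table: extended character -> conversion.
    static final Map<Character, String> TABLE = new HashMap<Character, String>();
    static {
        TABLE.put('\u20AC', "euro");
        TABLE.put('\u00A5', "yen");
    }

    // Collect every non-ASCII character actually seen in the token texts
    // (standing in for characters encountered while lexing).
    static Set<Character> usedExtendedChars(List<String> tokenTexts) {
        Set<Character> used = new TreeSet<Character>();
        for (String text : tokenTexts)
            for (char c : text.toCharArray())
                if (c > 0x7F) used.add(c);
        return used;
    }

    // Report which used characters have no entry in the conversion table.
    static Set<Character> missingFromTable(Set<Character> used) {
        Set<Character> missing = new TreeSet<Character>(used);
        missing.removeAll(TABLE.keySet());
        return missing;
    }
}
```

The comparison is then just a set difference, so it runs in 
negligible time after lexing finishes.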
