[antlr-interest] Re: Handling Lots of Keywords?

Thomas Brandon tom at psy.unsw.edu.au
Mon Oct 6 20:28:39 PDT 2003


As long as you use actual keywords (defined in the tokens section), 
Antlr should scale OK (see 
http://www.antlr.org/doc/metalang.html#TokensSection). Having 1000 
rules for your keywords would probably be a rather large performance 
hit due to large bitsets (depending on where they are used, I guess). 
But if you declare them as keywords, all Antlr does is add them to a 
Hashtable and test against it in the testLiterals routine.

However, it might be better to use your own checking code to avoid 
having to put all the keywords in the grammar. If you maintain your 
own Hashtable and use a semantic action like:
IDENT_OR_KEYWORD:
    IDENT
    { if(isBrailleKeyword($getText)) $setType(BRAILLE_KEYWORD); }
    ;

where boolean isBrailleKeyword(String) is your function to check 
against your hashtable. That way you just need to maintain your 
hashtable and don't need to maintain keywords in your grammar. I did 
something similar with the data stored in an XML file, which lets you 
associate other info with the keywords all in one place. This 
should scale as well as a Hashtable scales, which for only 1000 items 
shouldn't be too bad.
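
For illustration, here is a minimal sketch of what such a lookup class 
might look like in Java. Only the isBrailleKeyword(String) name comes 
from the rule above; the class name, the describe() helper and the 
sample entries are hypothetical placeholders, not real braille math 
keywords. In practice the table would be filled from wherever you keep 
your keyword data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; only isBrailleKeyword(String) is named in the post.
final class BrailleKeywords {
    // Maps keyword text to associated info (here, a plain description).
    private static final Map<String, String> KEYWORDS = new HashMap<String, String>();

    static {
        // Placeholder entries for illustration only; a real table would be
        // loaded from an external source such as a data file.
        KEYWORDS.put("alpha", "Greek small letter alpha");
        KEYWORDS.put("sum", "summation sign");
        KEYWORDS.put("integral", "integral sign");
    }

    private BrailleKeywords() {
        // Static utility class; not meant to be instantiated.
    }

    // Returns true if the given token text is a known keyword.
    static boolean isBrailleKeyword(String text) {
        return text != null && KEYWORDS.containsKey(text);
    }

    // Returns the info stored for a keyword, or null if it is not one.
    static String describe(String text) {
        return text == null ? null : KEYWORDS.get(text);
    }

    public static void main(String[] args) {
        System.out.println(isBrailleKeyword("alpha"));
        System.out.println(isBrailleKeyword("x"));
        System.out.println(describe("sum"));
    }
}
```

With the class in the lexer's package, the action in the rule above 
would call BrailleKeywords.isBrailleKeyword($getText). Storing a value 
per keyword, rather than a bare set, is what leaves room for the extra 
info mentioned above.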

Tom.
--- In antlr-interest at yahoogroups.com, "dotlessbraille" 
<easjolly at i...> wrote:
> I am trying to analyze braille texts using the current US standard 
> representation for braille math.  Braille uses 63 characters (and 
> the space). It is typically represented electronically with the 63 
> ASCII codes corresponding to the small (xor capital) letters and all 
> but 5 of the special characters so the input is well-defined.
> 
> If the tokenization is treated as a lexical problem, there is the 
> unusual feature that there are more than 1000 keywords, some with 
> more than a dozen characters.  (The keywords are mainly used to 
> represent mathematical symbols by a notation more intuitive than 
> Unicode character codes.)
> 
> If any of you have ever dealt with this number of keywords, I'd be 
> grateful for advice.   
> 
> Thanks,
> Susan


 
