[antlr-interest] big lexer problem

Wed Aug 15 08:17:58 PDT 2012

Hi,

I'm having a big problem with a very large generated Lexer.java. Any help would
be appreciated.

The language is COBOL, and I have found several reasons why the lexer gets
too big:

1. I'm adding semantic predicates to the lexer to simulate "lexer states",
as in YACC and JavaCC. It looks like this:

       // matches things like AXX(9).99 after a 'PIC' keyword
       PICTURE_STRING: {lexerState==PICTURE_STATE}?=> blah blah

   The lexer without semantic predicates is 18K lines.
   When I add predicates to one or two of the lexer rules, it grows to more
than 20K lines.
   When I add just one more, it explodes to more than 60K lines and ANTLR gives
up generating the lexer with the error: code is too long.

2. COBOL has a LOT of keywords, which may explain the original 18K lines.

3. I have tokens referencing other tokens.
   I've inlined most of them now, as suggested by others, but the size has
not been reduced much.
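
To make the "lexer state" idea from item 1 concrete, here is a minimal sketch
in plain Java, independent of ANTLR. It is only an illustration of the
behaviour the predicate is meant to produce, not the real grammar or the
generated lexer. The class name LexerStateSketch, the State enum and the
tokenize method are all hypothetical, and the input is assumed to be
whitespace-separated words (no optional IS after PIC, no trailing period).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not ANTLR output and not the real COBOL grammar. */
public class LexerStateSketch {
    // Two assumed lexer states: normal text, and "the previous word was PIC".
    enum State { DEFAULT, PICTURE }

    /** Splits on whitespace and labels each word according to the current state. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.DEFAULT;
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty or all-blank input produces no tokens
            }
            if (state == State.PICTURE) {
                // Plays the role of the gated rule: reachable only in PICTURE state.
                tokens.add("PICTURE_STRING:" + word);
                state = State.DEFAULT;
            } else if (word.equalsIgnoreCase("PIC") || word.equalsIgnoreCase("PICTURE")) {
                // Seeing the keyword switches the state for the next word only.
                tokens.add("KEYWORD:" + word);
                state = State.PICTURE;
            } else {
                tokens.add("WORD:" + word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("05 AMOUNT PIC AXX(9).99 VALUE"));
    }
}
```

In this sketch the word after PIC is labelled PICTURE_STRING, and the same
text anywhere else is an ordinary WORD. That context dependence is what the
semantic predicate is standing in for.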

So the questions are:
1. How can I generate a smaller lexer without removing the semantic predicates?
2. If that's not possible, how can I simulate "lexer states" without semantic
predicates?
3. Is there any other solution?

Thanks.

-- 
Regards,

Yang, Zhaohui