[antlr-interest] big lexer problem
Zhaohui Yang
yezonghui at gmail.com
Wed Aug 15 08:17:58 PDT 2012
Hi,
I'm having a big problem with a very large generated Lexer.java. Any help is appreciated.
The language is COBOL, and I have found several reasons why the lexer gets
too big:
1. I'm adding semantic predicates to the lexer to simulate "lexer states"
as in YACC and JavaCC. It looks like
PICTURE_STRING: {lexerState==PICTURE_STATE}?=> blah blah
// matches things like AXX(9).99 after a 'PIC' keyword
The lexer without semantic predicates is 18K lines.
When I add predicates to one or two of the lexer rules, it grows to more
than 20K lines.
When I add just one more, it explodes to more than 60K lines and ANTLR gives
up generating the lexer with the error: code is too long.
2. COBOL has a LOT of keywords, which may explain the original 18K lines.
3. I have tokens referencing other tokens.
I've inlined most of them now, as suggested by others, but the size has
not gone down much.
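
To make the "lexer state" idea in point 1 concrete, here is a minimal hand-written Java sketch of the behaviour being simulated. It is not ANTLR-generated code and does not use the ANTLR runtime. The class name LexerStateSketch, the State enum, the token labels, and the simplified patterns are all assumptions made for illustration. A real COBOL picture string has stricter rules (for example, a sentence-ending period is not part of it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a "lexer state": the same input text is
// tokenized differently depending on what the lexer has just seen.
class LexerStateSketch {
    // Assumed states; the post only mentions a picture state.
    enum State { DEFAULT, PICTURE }

    // Ordinary tokens: runs of letters, digits and hyphens (simplified).
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9-]+");
    // Simplified picture string: any run of non-blank characters.
    private static final Pattern PICTURE_STRING = Pattern.compile("\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            // The current state selects which rule is allowed to match.
            Pattern rule = (state == State.PICTURE) ? PICTURE_STRING : WORD;
            Matcher m = rule.matcher(input);
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            String text = m.group();
            if (state == State.PICTURE) {
                tokens.add("PICTURE_STRING:" + text);
                state = State.DEFAULT;   // leave the state after one token
            } else if (text.equalsIgnoreCase("PIC") || text.equalsIgnoreCase("PICTURE")) {
                tokens.add("PIC:" + text);
                state = State.PICTURE;   // the next token is a picture string
            } else {
                tokens.add("WORD:" + text);
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("05 AMOUNT PIC AXX(9).99"));
    }
}
```

In this sketch, AXX(9).99 is accepted only because the preceding PIC token switched the state. In the default state the same text would fail at the parenthesis.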
So my questions are:
1. How can I generate a smaller lexer without removing the semantic predicates?
2. If that's not possible, how can I simulate "lexer states" without semantic
predicates?
3. Is there any other solution?
Thanks.
--
Regards,
Yang, Zhaohui