[antlr-interest] Lexer and Java keywords

Wed Dec 9 15:51:50 PST 2009

On Thu, Dec 10, 2009 at 12:21 AM, Jim Idle <jimi at temporal-wave.com> wrote:
> The issue is that your lexer is too complicated for the standard timeout on analysis values.

That is no good, because I plan to add more to it :-)

> Use:
>
> -Xconversiontimeout=32000
>
> And it will generate just fine.
>

Great, thank you. I was not aware that such a config option existed.

> You might also play with:
>
> -Xmaxswitchcaselabels 30000 -Xminswitchalts 1
>
> To generate switches rather than DFA tables and see if it makes any difference to code size etc.
>
> Also, rather than list every valid character like you do, when there is no ambiguity just accept anything for identifier, then make a semantic check for illegal characters. The lexer will be much simpler and your error messages much nicer.
>

Actually, the Java Language Specification says something similar to
your observation: a “Java letter” is a character for which the method
Character.isJavaIdentifierStart(int) returns true, and a “Java
letter-or-digit” is a character for which the method
Character.isJavaIdentifierPart(int) returns true.
I think I can use this to build semantic predicates, but I am not sure
how to write a rule that says (pseudo-code) "if
Character.isJavaIdentifierPart(LA(1)) then consume until
!Character.isJavaIdentifierPart(LA(1))". I will have to read up on the
matter, I guess. Thank you very much for your help.
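
For reference, the scanning logic from the pseudo-code can be sketched in
plain Java, outside of any grammar. This is only a rough illustration:
the class name IdentifierScan and the method scanIdentifier are made up
for this example, and it does not show how to wire the check into an
ANTLR lexer rule. It requires the first character to satisfy
isJavaIdentifierStart and then consumes while isJavaIdentifierPart holds.

```java
public final class IdentifierScan {

    /**
     * Hypothetical helper: returns the index just past the Java identifier
     * that begins at 'start', or -1 if no identifier begins there.
     */
    static int scanIdentifier(CharSequence input, int start) {
        if (start < 0 || start >= input.length()) {
            return -1;
        }
        // Work on code points so supplementary characters are handled too.
        int cp = Character.codePointAt(input, start);
        if (!Character.isJavaIdentifierStart(cp)) {
            return -1;
        }
        int i = start + Character.charCount(cp);
        // Consume while the next code point is a valid identifier part.
        while (i < input.length()) {
            cp = Character.codePointAt(input, i);
            if (!Character.isJavaIdentifierPart(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    public static void main(String[] args) {
        String src = "count_1 = 42;";
        int end = scanIdentifier(src, 0);
        // Prints the identifier at the start of the input, if there is one.
        System.out.println(end < 0 ? "<no identifier>" : src.substring(0, end));
    }
}
```

The same two Character methods could presumably be called from a
semantic predicate on the lookahead character, which would replace the
long list of explicit character ranges in the identifier rule.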

-- 
Greetings
Marcin Rzeźnicki