[antlr-interest] Lexer and Java keywords

Wed Dec 9 22:09:03 PST 2009

Jim Idle wrote:
> The issue is that your lexer is too complicated for the standard timeout on analysis values. Use:
> 
> -Xconversiontimeout=32000
> 
> And it will generate just fine.
[...]

This is probably due to listing the character ranges for JavaLetter and
JavaLetterOrDigit explicitly. The technique below (based on code from the
ECMAScript 3 grammar by Patrick Hulsmeijer) will probably make the lexer
small enough to generate with the default timeout. Note that you'll have
to adjust it for any differences between the identifier syntax of the
language you're trying to parse and that of Java -- I notice that you had
'\u0000'..'\u0008' | '\u000e'..'\u001b' in JavaLetterOrDigit, for example.

fragment IdentifierStartASCII
  : 'a'..'z'
  | 'A'..'Z'
  | '$'
  | '_'
  ;

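// Non-ASCII characters are tested at run time by the predicate rather
// than being listed as explicit ranges in the grammar.
fragment IdentifierPart
  : IdentifierStartASCII
  | '0'..'9'
  | { Character.isJavaIdentifierPart(input.LA(1)) }?
      { matchAny(); }
  ;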

// This generates mIdentifierRest() used below.
fragment IdentifierRest
  : IdentifierPart*
  ;

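// The first alternative handles identifiers that start with an ASCII
// character. Otherwise the action checks the first character by hand,
// consumes it, and calls the generated IdentifierRest method for the rest.
IDENTIFIER
  : IdentifierStartASCII IdentifierRest
  | { if (!Character.isJavaIdentifierStart(input.LA(1))) {
        throw new NoViableAltException("identifier start", 0, 0, input);
      }
      matchAny(); mIdentifierRest(); }
  ;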
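
To see what the two java.lang.Character predicates used above accept, and
so what you might need to adjust for another language, here is a minimal
standalone sketch. The class IdentifierCharCheck and its isIdentifier()
helper are made up for illustration and are not part of the grammar or of
ANTLR. Like the lexer rules, it tests one UTF-16 char at a time.

```java
final class IdentifierCharCheck {
    // Hypothetical helper: applies the same start/part tests as the
    // IDENTIFIER rule, one UTF-16 code unit at a time.
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ordinary cases.
        System.out.println(isIdentifier("foo1"));     // true
        System.out.println(isIdentifier("1abc"));     // false
        System.out.println(isIdentifier("$x_y"));     // true

        // "Ignorable" control characters count as identifier parts in
        // Java, which is why ranges such as '\u0000'..'\u0008' show up in
        // a JavaLetterOrDigit rule. They are not valid start characters.
        System.out.println(Character.isJavaIdentifierPart('\u0000'));  // true
        System.out.println(Character.isJavaIdentifierPart('\u001b'));  // true
        System.out.println(Character.isJavaIdentifierPart('\t'));      // false
        System.out.println(Character.isJavaIdentifierStart('\u0000')); // false
    }
}
```

If the language you're parsing should not accept those control characters,
the predicate in IdentifierPart would need an extra check to exclude them.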

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
