[antlr-interest] Lexer and Java keywords

Sun Dec 13 10:09:42 PST 2009

ANTLR can only know so much from your grammar. In a lexer, when it sees a character it doesn't understand, all it can do is report an error, consume the character and move on, and that error will not mean much to the user. In general, the idea is to leave errors to be reported as far up the tool chain as you can, because there you have context for the error whereas ANTLR does not.

So you match ID with the inverse set of the characters that definitely must belong to a different token, then check that what you gathered is a valid identifier. Your error can then be "'x' is not a valid character for identifier use", whereas otherwise you just get "Invalid character...", which is much less useful. ANTLR knows only that 'this does not belong here'.
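A minimal Java sketch of the "then check what you gathered" step, assuming the lexer's ID rule has already matched the text loosely. `IdentifierCheck` and `checkIdentifier` are hypothetical names, not part of ANTLR; the check relies on the standard `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` methods, so it assumes the language being parsed uses Java's identifier rules. Keywords are not handled here.

```java
import java.util.Optional;

public class IdentifierCheck {

    /**
     * Validates text that a deliberately loose ID lexer rule has already
     * gathered. Returns a user-facing message for the first bad character,
     * or Optional.empty() if every character is legal in a Java identifier.
     * Keywords are NOT checked here.
     */
    static Optional<String> checkIdentifier(String text) {
        if (text.isEmpty()) {
            return Optional.of("an identifier cannot be empty");
        }
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // works for supplementary characters too
            boolean first = (i == 0);
            boolean ok = first
                    ? Character.isJavaIdentifierStart(cp)
                    : Character.isJavaIdentifierPart(cp);
            if (!ok) {
                String ch = new String(Character.toChars(cp));
                return Optional.of(first
                        ? "'" + ch + "' is not a valid character to start an identifier"
                        : "'" + ch + "' is not a valid character for identifier use");
            }
            i += Character.charCount(cp);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The second sample is "cafe" with an accented e: non-ASCII letters are legal.
        String[] samples = {"count", "caf\u00e9", "9lives", "a#b"};
        for (String s : samples) {
            System.out.println(s + " -> " + checkIdentifier(s).orElse("ok"));
        }
    }
}
```

Because the lexer no longer rejects `#` or a leading digit itself, the message can name the offending character and say why it is wrong, which is the context ANTLR's own "Invalid character" error lacks.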

Jim

> -----Original Message-----
> From: Marcin Rzeznicki [mailto:marcin.rzeznicki at gmail.com]
> Sent: Sunday, December 13, 2009 8:16 AM
> To: Sam Harwell
> Cc: Jim Idle; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer and Java keywords
> 
> 2009/12/10 Sam Harwell <sharwell at pixelminegames.com>:
> > You're making this too complicated. Parse the identifier as loosely
> > as absolutely possible. Many improper identifiers actually don't cause
> > any problems in parsing, so you can treat them as valid and report
> > them as compiler errors, like semantic problems, in post-AST analysis;
> > the identifiers are just string literal keys that reference code
> > constructs. After you perform semantic analysis, check each identifier
> > (variable and method names, etc.) by calling the Character class
> > methods. Log the errors, but you don't have to stop the analysis
> > because of them.
> 
> Right, but doesn't ANTLR already try to resynchronize the parser after
> an error, so that the analysis doesn't actually stop but catches up?
> 
> >
> > The general rule is: don't engineer your parser to fail until you can
> > no longer provide useful error messages. You can always stop early
> > manually; for example, I sometimes throw an OperationCancelledException
> > in an error listener to stop a background parse for IDE IntelliSense
> > once a user-specified number of errors has been logged.
> >
> 
> Good idea, but isn't that what ANTLR does automatically? I mean,
> doesn't it avoid failing unless it absolutely has to?
> 
> --
> Greetings
> Marcin Rzeźnicki
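Sam's two suggestions (check identifiers after parsing and log the errors without stopping, then cancel early once a user-specified number of errors has been logged) can be sketched in Java as below. `ErrorLog` and `AnalysisCancelledException` are hypothetical names standing in for his error listener and the .NET OperationCancelledException; wiring this into a generated ANTLR parser is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ErrorLog {

    /** Hypothetical unchecked exception used to abandon the analysis early. */
    public static class AnalysisCancelledException extends RuntimeException {
        public AnalysisCancelledException(String message) {
            super(message);
        }
    }

    private final int maxErrors;
    private final List<String> errors = new ArrayList<>();

    public ErrorLog(int maxErrors) {
        if (maxErrors < 1) {
            throw new IllegalArgumentException("maxErrors must be at least 1");
        }
        this.maxErrors = maxErrors;
    }

    /** Records an error and lets the caller carry on until the limit is reached. */
    public void report(String message) {
        errors.add(message);
        if (errors.size() >= maxErrors) {
            throw new AnalysisCancelledException(
                    "stopped after " + errors.size() + " errors");
        }
    }

    public List<String> errors() {
        return Collections.unmodifiableList(errors);
    }

    /** Post-AST identifier check using the standard Character methods. */
    static boolean isJavaIdentifier(String s) {
        return !s.isEmpty()
                && Character.isJavaIdentifierStart(s.codePointAt(0))
                && s.codePoints().allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        ErrorLog log = new ErrorLog(2); // user-specified limit
        // Hypothetical identifiers collected from an AST after parsing.
        String[] identifiers = {"ok1", "a#b", "fine", "9lives", "x$y"};
        try {
            for (String id : identifiers) {
                if (!isJavaIdentifier(id)) {
                    log.report("'" + id + "' is not a valid identifier");
                }
            }
        } catch (AnalysisCancelledException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(log.errors());
    }
}
```

With a limit of 2, the loop records the errors for `a#b` and `9lives`, then cancels before it reaches `x$y`. Using an unchecked exception keeps the cancel path out of every method signature between the listener and the top-level call; stopping as soon as the count reaches the limit (rather than exceeding it) is an assumption of this sketch.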