[antlr-interest] antlr v4 wish list

Sam Harwell sharwell at pixelminegames.com
Tue Mar 29 20:29:35 PDT 2011


Hi Martin,

Replying to the individual points:

1. A token only needs to know the start position in the input stream and the
length. Considering a file may easily have hundreds of thousands of tokens,
it's very important to not add any information to the token that can be
efficiently derived in another manner, especially if that information is
infrequently used by applications. For example, the line/column information
can be efficiently derived if the lexer maintains an internal array of line
offsets (index 0 contains 0, the start position of line 0; index 1 contains
the offset to the start of line 1; and so on).
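
To make the line-offset idea concrete, here is a minimal sketch in Java. The
class name LineIndex and its methods are hypothetical and are not part of the
ANTLR runtime. It also assumes the table is built by scanning the text for
'\n', whereas a real lexer would more likely append to the table as it consumes
newlines. Lines and columns are 0-based, matching the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an ANTLR class): maps a char offset to line/column. */
final class LineIndex {
    // lineStarts[i] is the offset of the first character of line i (0-based).
    private final int[] lineStarts;

    LineIndex(CharSequence text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // line 0 always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                starts.add(i + 1); // the next line begins right after the newline
            }
        }
        lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** 0-based line containing the given offset, found by binary search. */
    int lineOf(int offset) {
        int pos = Arrays.binarySearch(lineStarts, offset);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the line we
        // want is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** 0-based column of the given offset within its line. */
    int columnOf(int offset) {
        return offset - lineStarts[lineOf(offset)];
    }
}
```

With a table like this, a token that stores only its start offset and length
can still report line/column on demand, at the cost of one int per line of
input rather than extra fields on every token.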

3. The current notation is pretty simple once you see it. Also, it's well
documented in the books.

4. With proper integration into the build system, generated files aren't
checked into source control or distributed. The ANTLR project itself
generates code from its V2 and V3 grammars at build time, and my .NET
projects do the same for V3 grammars (using my C# port of the Tool), so the
generated files never take up space in source control.

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin d'Anjou
Sent: Tuesday, March 29, 2011 9:33 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] antlr v4 wish list

Hello,

My suggestions, for what it's worth:

1) In the Runtime section:
* Tokens and Trees should both know their start/stop line, start/stop char
position to make IDEs easier.

Not only for IDEs, but also for debugging on the command line in a terminal.
The file name is also needed.

2) Lexer debug enhancement:
Option on the lexer constructor to have the lexer print some debug info: 
token type by name, token value, filename, line and char position, without
having to replace antlr's built-in classes.

3) General:
I have spent many hours on a ridiculous little problem: the grammar
declaration statement! So I suggest enforcing the grammar type in the
grammar declaration:
parser grammar MyGrammar;
lexer grammar MyGrammar;
mixed grammar MyGrammar;  // lexer and parser grammar
tree grammar MyGrammar;

4) Gigantic source files, as described here:
http://v2kparse.blogspot.com/2008/06/first-pass-uploaded-to-sourceforce.html
Maybe this has been solved already?

Regards,
Martin d'Anjou

