[antlr-interest] antlr v4 wish list

Wed Mar 30 06:57:02 PDT 2011

Sam,

A token needs to know both start and end position.  Especially when
you add in the restriction that *synthetic* tokens should respond with
the positions for the entire rule that created them (if they weren't
based on another token).  Basically, you need Tree and Token to always
be able to provide locations in the original stream (even if those
locations are best guess) regardless of how many tree transformations
have taken place.  Whether it internally uses a shared array of line
offsets or stores duplicates in every token, I don't care, but pushing
all of that onto every language implementer is not a good trade off.
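
For reference, the shared line-offset array that Sam describes in the quoted
message below might look roughly like the following. This is only a sketch
under stated assumptions: `LineIndex`, `lineOf`, and `columnOf` are
hypothetical names and not part of any ANTLR runtime, lines and columns are
0-based, and only '\n' is treated as a line terminator.

```java
import java.util.Arrays;

/** Hypothetical helper (not an ANTLR class): maps char offsets to line/column. */
final class LineIndex {
    /** lineStarts[n] is the offset of the first character of line n (0-based). */
    private final int[] lineStarts;
    private final int length;

    LineIndex(CharSequence text) {
        int[] starts = new int[16];
        int count = 1; // starts[0] == 0: line 0 begins at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') { // assumption: '\n' is the only terminator
                if (count == starts.length) {
                    starts = Arrays.copyOf(starts, count * 2);
                }
                starts[count++] = i + 1; // next line begins after the newline
            }
        }
        this.lineStarts = Arrays.copyOf(starts, count);
        this.length = text.length();
    }

    /** Returns the 0-based line containing the given offset (binary search). */
    int lineOf(int offset) {
        if (offset < 0 || offset > length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int pos = Arrays.binarySearch(lineStarts, offset);
        // Not found: pos == -(insertionPoint) - 1, and the line is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the 0-based column of the given offset within its line. */
    int columnOf(int offset) {
        return offset - lineStarts[lineOf(offset)];
    }
}
```

With something like this owned by the runtime, a token that stores only its
start offset and length can still report both ends: the end offset is
start + length - 1, and the line and column of either end come from the same
shared index, so nothing is duplicated per token.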

Matt

On Tue, Mar 29, 2011 at 11:29 PM, Sam Harwell
<sharwell at pixelminegames.com> wrote:
> Hi Martin,
>
> Replying to the individual points:
>
> 1. A token only needs to know the start position in the input stream and the
> length. Considering a file may easily have hundreds of thousands of tokens,
> it's very important to not add any information to the token that can be
> efficiently derived in another manner, especially if that information is
> infrequently used by applications. For example, the line/column information
> can be efficiently derived if the lexer maintains an internal array of line
> offsets (index 0 contains 0, the start position of line 0; index 1 contains
> the offset to the start of line 1; etc...).
>
> 3. The current notation is pretty simple once you see it. Also, it's well
> documented in the books.
>
> 4. With proper integration into the build system, generated files aren't
> checked into source control or distributed. The ANTLR project itself
> generates V2 and V3 grammars, and my .NET projects generate V3 grammars
> (using my C# port of the Tool) at build time, so the generated files never
> take up space in source control.
>
> Sam
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin d'Anjou
> Sent: Tuesday, March 29, 2011 9:33 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] antlr v4 wish list
>
> Hello,
>
> My suggestions, for what it's worth:
>
> 1) In the Runtime section:
> * Tokens and Trees should both know their start/stop line, start/stop char
> position to make IDEs easier.
>
> Not only for IDEs, but also for debugging on the command line in a terminal.
> The file name is also needed.
>
> 2) Lexer debug enhancement:
> Option on the lexer constructor to have the lexer print some debug info:
> token type by name, token value, filename, line and char position, without
> having to replace antlr's built-in classes.
>
> 3) General:
> I have spent many hours on a ridiculous little problem: the grammar
> declaration statement! So I suggest enforcing the grammar type in the
> grammar declaration:
> parser grammar MyGrammar;
> lexer grammar MyGrammar;
> mixed grammar MyGrammar;  // lexer and parser grammar
> tree grammar MyGrammar;
>
> 4) Gigantic source files, as described here:
> http://v2kparse.blogspot.com/2008/06/first-pass-uploaded-to-sourceforce.html
> Maybe this has been solved already?
>
> Regards,
> Martin d'Anjou
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>