[antlr-interest] ANTLR modification

Sun Oct 23 07:53:10 PDT 2005

> I have made a small modification to the ANTLR Token, 
> CommonToken, and CharScanner classes. The changes make it 
> possible for a Token (and an AST node as well) to have an 
> offset assigned to it that represents the offset from the 
> beginning of the character stream. I have a need for this, 
> so I added it, and thought others might find it useful as well.

Can the changes be isolated to a CharScanner sub-class and a CommonToken
subclass?

Not everyone needs (or can afford the cost of) the offset data.
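
To make the sub-class idea concrete, here is a rough sketch. It does not use
the real ANTLR classes: SimpleToken and SimpleScanner are hypothetical
stand-ins for CommonToken and CharScanner, and markTokenStart() is a hook I
made up so the token records where it starts rather than where it ends.
Treat it as an illustration of the shape, not as a patch.

```java
// Hypothetical stand-in for a plain token; NOT the real antlr.CommonToken.
class SimpleToken {
    final int type;
    final String text;
    SimpleToken(int type, String text) { this.type = type; this.text = text; }
}

// Opt-in token sub-class: only users who ask for it pay for the extra int.
class OffsetToken extends SimpleToken {
    private int offset = -1;
    OffsetToken(int type, String text) { super(type, text); }
    int getOffset() { return offset; }
    void setOffset(int offset) { this.offset = offset; }
}

// Hypothetical stand-in scanner: yields whitespace-separated WORD tokens.
class SimpleScanner {
    static final int EOF = 0, WORD = 1;
    protected final String input;
    protected int pos = 0;

    SimpleScanner(String input) { this.input = input; }

    // Advance past one character of input.
    protected void consume() { pos++; }

    // Assumed hook, called just before the first character of each token.
    protected void markTokenStart() { }

    protected SimpleToken makeToken(int type, String text) {
        return new SimpleToken(type, text);
    }

    SimpleToken nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            consume();
        }
        markTokenStart();
        if (pos >= input.length()) {
            return makeToken(EOF, "");
        }
        StringBuilder sb = new StringBuilder();
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            sb.append(input.charAt(pos));
            consume();
        }
        return makeToken(WORD, sb.toString());
    }
}

// All offset bookkeeping lives here; the base scanner is untouched.
class OffsetScanner extends SimpleScanner {
    private int offset = 0;      // characters consumed so far
    private int tokenStart = 0;  // offset of the current token's first character

    OffsetScanner(String input) { super(input); }

    @Override protected void consume() { super.consume(); offset++; }

    @Override protected void markTokenStart() { tokenStart = offset; }

    @Override protected SimpleToken makeToken(int type, String text) {
        OffsetToken t = new OffsetToken(type, text);
        t.setOffset(tokenStart);
        return t;
    }
}
```

Code that wants offsets constructs an OffsetScanner and casts the result of
nextToken() to OffsetToken; everyone else keeps using the base classes and
pays nothing.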

> The changes include adding virtual getOffset/setOffset 
> methods to the Token class, and re-implementing these in the 
> CommonToken class. The CharScanner class has a new member var 
> "offset" that is initially set to 0 in its constructor(s) 
> and then incremented for each call of the
> consume() method. This member is then used when the 
> CharScanner creates a new Token in the makeToken() method. 
> The Token is then assigned this value by calling the Token's 
> setOffset() method.
> 
> Would there be any interest in having this merged into the 
> general ANTLR codebase? If so, what's the best way to submit this?

Problems I see with this:
1. Lexer state isn't stored in CharScanner. There is a separate class for
it (LexerSharedInputState).
2. Not everyone needs (or can afford) the overhead of both line/col and
offset tracking. Perhaps a more generic mechanism for specifying (perhaps on
construction) whether to track offsets, line/col or both?
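
For point 2, something along these lines is what I have in mind. This is
only a sketch: the Track enum and PositionTracker class are names I invented
for illustration, nothing like them exists in ANTLR today, and returning -1
for an untracked value is just one possible convention.

```java
import java.util.EnumSet;

// Hypothetical switches for what the lexer should keep track of.
enum Track { LINE_COLUMN, OFFSET }

// Hypothetical position state, configured once at construction time.
class PositionTracker {
    private final boolean trackLineCol;
    private final boolean trackOffset;
    private int line = 1;
    private int column = 1;
    private int offset = 0;

    PositionTracker(EnumSet<Track> modes) {
        this.trackLineCol = modes.contains(Track.LINE_COLUMN);
        this.trackOffset = modes.contains(Track.OFFSET);
    }

    // Called once per consumed character; skips whatever was not requested.
    void consume(char c) {
        if (trackOffset) {
            offset++;
        }
        if (trackLineCol) {
            if (c == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
    }

    // Each getter returns -1 when that kind of tracking is switched off.
    int getLine()   { return trackLineCol ? line : -1; }
    int getColumn() { return trackLineCol ? column : -1; }
    int getOffset() { return trackOffset ? offset : -1; }
}
```

A lexer built with EnumSet.of(Track.OFFSET) would then skip the line/col
bookkeeping entirely, and EnumSet.allOf(Track.class) would give both.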

Cheers,

Micheal