[antlr-interest] String lexing and partial tokens

Mon Nov 27 16:53:18 PST 2006

Loring,

To my mind, 5-10% (assuming you mean runtime) is still quite an overhead for such a feature, when there are other ways of achieving the same result that cost essentially nothing (at least in the C runtime, anyway).

I would be interested in how you measure this, because anything that causes the token string to be created, rather than the token just holding indexes into the source, would surely have a higher overhead than that. I suppose that when a token spec puts ! on a fixed-length leading or trailing part of the token, the start and end indexes could simply be adjusted at token emit time. It just doesn't seem such a big deal to me, so long as there are reasonable ways of achieving the same thing manually. It seems that removing the " from strings is in fact the main use for this functionality.
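
To illustrate the index adjustment I mean, here is a minimal sketch in Python. It is not ANTLR's runtime or anything Yggdrasil does; Token, emit, trim_lead and trim_trail are hypothetical names made up for the example. The token keeps only offsets into the source, and the fixed-length leading and trailing parts are skipped by arithmetic at emit time, so no string is built unless someone asks for the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token that stores only offsets into the shared source."""
    source: str
    start: int  # index of the first character, inclusive
    stop: int   # index of the last character, inclusive

    @property
    def text(self) -> str:
        # The string is materialized only when it is actually requested.
        return self.source[self.start:self.stop + 1]


def emit(source: str, start: int, stop: int,
         trim_lead: int = 0, trim_trail: int = 0) -> Token:
    """Emit a token, skipping fixed-length leading/trailing parts.

    trim_lead and trim_trail stand in for the lengths of the parts a
    grammar author would have marked with '!'. They are assumptions of
    this sketch, not ANTLR options.
    """
    return Token(source, start + trim_lead, stop - trim_trail)


src = 'x = "hello";'
# Suppose the lexer matched the string literal at offsets 4..10, quotes included.
tok = emit(src, 4, 10, trim_lead=1, trim_trail=1)
```

Here tok.text gives the contents without the quotes, and the only extra cost at emit time is two integer additions. This only works when the trimmed parts have a fixed length and sit at the ends of the token; editing in the middle of a token would still need a copy.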

However, I don't believe that Ter rejected looking at this out of hand, just that for the moment there are plenty of other things to work on. That said, for my part, I think it is just a matter of documenting some ways to achieve the same thing and people getting used to them. I don't think that people object to changing their ways of doing things if the new ways are reasonable. While it is obviously a lot easier to just add ! to the matching text, you write the grammar once, whereas the resulting lexer will presumably run many times; it seems worth the small effort at grammar specification time to keep the lexer as trim as possible.

I am a fan of the ANTLR 3 approach of simplification over ANTLR 2, which generally yields leaner generated code and transfers a certain amount of the effort to the grammar author. There are limits to this of course, but I think ANTLR 3 is a reasonable blend, given that it makes grammar programming in general so much easier than its predecessors did.

However, I am sure that your efforts in this regard will be appreciated if they turn out to yield something that has very little overhead and takes little time to incorporate into the main ANTLR product.

Jim

-----Original Message-----
From: Loring Craymer [mailto:lgcraymer at yahoo.com] 
Sent: Monday, November 27, 2006 4:27 PM
To: Jim Idle; antlr-interest at antlr.org
Subject: Re: [antlr-interest] String lexing and partial tokens

--- Jim Idle <jimi at intersystems.com> wrote:

..
> You can  ask Jim Idle about that, but we decided to
> use methods for  
> setting the text rather than implementing ! which
> makes everything  
> inefficient. I could swear there was something in
> the documentation.

! in the lexer does not "make everything inefficient";
you just have to be smart about the implementation. 
The lexer editing via ! that is currently in the
Yggdrasil 0.5b releases (I'll have b2 out soon) costs
about 5-10% (rough estimate from looking at generated
code); once I can analyze which rules edit, that drops
still further.

--Loring

