[antlr-interest] unicode 16bit versus new 21bit stuff

Mike Lischke lists at lischke-online.de
Sat Jun 19 11:09:59 PDT 2004


Hi Terence, 


> I'm secretly planning to allow all sorts of cool stuff like 
> parsers that can handle single char tokens w/o going to the 
> lexer and so on.  
> Having the parser at runtime be able to distinguish char from 
> token type just by looking at the value was going to be mighty handy.

How is that going to work? Will you let the parser take over some of the lexer's functionality? How often do you expect
one will need to recognise single-character tokens? Aren't entire words usually what gets tokenized? Well, right, there
are the symbols used, for example, in expressions. But then you'd need a separate path in the parser, a la "if it is a
single-char token, use it now; if not, ask the lexer what it really is". That doesn't sound like a big speed
improvement to me.
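
To make the quoted idea concrete, here is a minimal Java sketch of how a parser could tell a character from a token type just by looking at the value. It is not ANTLR code: the class name, the helper names and the choice to number token types starting directly above the 21-bit Unicode range are all assumptions made for illustration.

```java
// Hypothetical sketch: one int label that is either a character or a token type.
final class CharOrTokenLabel {
    // Highest Unicode code point (the "21 bit" range): 0x10FFFF.
    static final int MAX_CHAR = Character.MAX_CODE_POINT;

    // Assumption: token types are numbered upward from just above the character range.
    static final int FIRST_TOKEN_TYPE = MAX_CHAR + 1;

    // A label in 0..0x10FFFF is taken to be a plain character.
    static boolean isChar(int label) {
        return label >= 0 && label <= MAX_CHAR;
    }

    // Anything above the character range is taken to be a token type.
    static boolean isTokenType(int label) {
        return label >= FIRST_TOKEN_TYPE;
    }

    // Maps the n-th token type (0-based) into the shared label space.
    static int tokenLabel(int tokenIndex) {
        return FIRST_TOKEN_TYPE + tokenIndex;
    }

    public static void main(String[] args) {
        int plus = '+';             // a single-char "token" taken straight from the input
        int ident = tokenLabel(0);  // a token type that would come from the lexer
        System.out.println(isChar(plus));
        System.out.println(isTokenType(ident));
    }
}
```

The check itself is a single range comparison; the extra branch in the parser that decides which path to take is the cost questioned above.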

What I would rather like is for the lexer to do more work, so that I could use tokens with overlapping definitions. For
instance, currently I have to make

  INPUT_CHARACTER: ~('\n' | '\r' | '\u2028' | '\u2029');

protected, otherwise it conflicts with almost anything in my grammar. Currently I cannot define:

  DIGIT:           '0'..'9';
  HEX_DIGIT:       DIGIT | 'a'..'f';
  OCTAL_DIGIT:     '0'..'7';
 
without making all three rules protected. I think you get the pattern.
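
The overlap can be seen in a small Java sketch that tests one input character against the three character sets from the rules above. This only illustrates the ambiguity; it is not how ANTLR builds its lexer decisions, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which of the three rules could match a single character?
final class OverlapDemo {
    static boolean isDigit(char c)      { return c >= '0' && c <= '9'; }
    static boolean isHexDigit(char c)   { return isDigit(c) || (c >= 'a' && c <= 'f'); }
    static boolean isOctalDigit(char c) { return c >= '0' && c <= '7'; }

    // Collects the name of every rule whose character set contains c.
    static List<String> matchingRules(char c) {
        List<String> rules = new ArrayList<>();
        if (isDigit(c))      rules.add("DIGIT");
        if (isHexDigit(c))   rules.add("HEX_DIGIT");
        if (isOctalDigit(c)) rules.add("OCTAL_DIGIT");
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(matchingRules('5')); // all three rules match
        System.out.println(matchingRules('9')); // DIGIT and HEX_DIGIT match
        System.out.println(matchingRules('a')); // only HEX_DIGIT matches
    }
}
```

With all three rules visible as tokens, an input such as '5' has three candidate token types and nothing in the character itself decides between them, which is why the rules end up protected and are only called from other rules.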

> In the previous version, I made a number of decisions based 
> upon the current state of the art in CPU speed / 
> architecture, which of course changes pretty damn fast.  I 
> wonder if we shouldn't just go 64 bit for the token types 
> leaving a full 32-bits for characters and for token types all 
> within the same value.  

I think ANTLR is pretty damn fast, and going 64 bit wouldn't hurt that much (I can't speak for C++, though). I believe you
will gain much more than you lose by going that route.
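
As a rough sketch of what a 64-bit label could look like in Java, the following packs a token type and a character into one long. The layout (token type in the upper 32 bits, character in the lower 32 bits) and all names are my assumptions; the quoted proposal does not fix a layout.

```java
// Hypothetical sketch: a 64-bit label holding a token type and a character.
final class PackedLabel {
    // Assumed layout: upper 32 bits = token type, lower 32 bits = character.
    static long pack(int tokenType, int codePoint) {
        // Mask the low half so a negative int cannot smear into the upper bits.
        return ((long) tokenType << 32) | (codePoint & 0xFFFFFFFFL);
    }

    // Recovers the token type from the upper 32 bits.
    static int tokenType(long label) {
        return (int) (label >>> 32);
    }

    // Recovers the character from the lower 32 bits.
    static int codePoint(long label) {
        return (int) label;
    }

    public static void main(String[] args) {
        long label = pack(7, 0x1F600);
        System.out.println(tokenType(label));                   // 7
        System.out.println(Integer.toHexString(codePoint(label))); // 1f600
    }
}
```

Packing and unpacking are a shift and a mask each, so the main cost of going 64 bit would be the wider comparisons and the larger tables, not the encoding itself.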

> Hmm...I wonder how fast 64-bit processors will become the 
> norm (G5s are there and AMD is too, right?)?  

I read that Intel will release their 64-bit variant earlier than originally planned, and with Linux and other OSes
already 64-bit capable, I believe 64 bit will soon come to the desktop too. Let's not argue about whether it makes sense
for the average user to have 32 more bits without knowing what to spend them on; they will come quite soon.

>How horrible 
> does Java do 64-bit ints now for comparison and other rot?  

Can't tell, I have not yet used Java on a 64 bit platform.

> ANTLR 3.0 won't be available for a while...perhaps 64 bits 
> ain't that bad an idea.  

Sure, and don't forget the headlines: "ANTLR, the first native 64 bit compiler generator in the world" ;-)

> Too bad I don't have #define or a typename I could use so the 
> actual type could be changed later.  Would be nice to see 
> LabelType instead of int.

This would make a decision much easier, indeed. And neither generics nor wrapper classes would help, so even in the
future there seem to be no alternatives.

Mike
--
www.soft-gems.net



 