[antlr-interest] Any thoughts of using java.util.Scanner (jdk5.x)
christopher.d.schultz at comcast.net
Thu Jan 20 21:08:36 PST 2005
> Any thoughts of using java.util.Scanner (jdk5.x) for tokenizing?
One major problem with the new Scanner class is that it doesn't work
well with heterogeneous tokens. ANTLR's scanner (tokenizer), like the
tokenizers shipped with many other compiler compilers, is very good at
recognizing tokens that are completely orthogonal to one another.
You simply can't write a single regular expression that returns tokens
which sometimes look like "AN_IDENTIFIER" and sometimes look like
"3.141592654288". Sure, you can split on whitespace, but that fails
whenever tokens aren't separated by spaces, as in "x*3.14".
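To make the whitespace problem concrete, here is a small sketch (the class and method names are made up for illustration) that runs java.util.Scanner with its default whitespace delimiter over the same input, once with spaces and once without:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WhitespaceSplitDemo {
    // Collects every token Scanner yields with its default delimiter,
    // which is "one or more whitespace characters".
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With spaces: prints [AN_IDENTIFIER, *, 3.141592654288]
        System.out.println(tokens("AN_IDENTIFIER * 3.141592654288"));
        // Without spaces: prints [AN_IDENTIFIER*3.141592654288] (one token)
        System.out.println(tokens("AN_IDENTIFIER*3.141592654288"));
    }
}
```

With spaces you get three tokens; without them, Scanner hands back the whole expression as a single token and you are left to pick it apart yourself.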
The approach given in this article for handling heterogeneous tokens is
to layer one Scanner on top of another. However, the base-level Scanner
has to produce very simple tokens, and you then have to layer
successively smarter Scanners on top of it. I think that a
custom-generated tokenizer (à la ANTLR, lex/yacc, JavaCC, JLex/CUP,
etc.) makes more sense than a very generic Scanner class (which is
essentially a regex used to split a String).
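For contrast, here is a rough hand-written sketch of what a dedicated tokenizer does instead. It is not ANTLR-generated code, and TinyLexer, Kind and Token are hypothetical names. It looks at the next character to decide which token rule applies, then consumes the longest run that fits that rule, so no whitespace is needed between tokens:

```java
import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    // Hypothetical token kinds for this sketch only.
    enum Kind { IDENTIFIER, NUMBER, OPERATOR }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Picks a rule from the current character, then consumes as much
    // input as that rule allows before emitting a token.
    static List<Token> lex(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is never required
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < s.length() && isIdentPart(s.charAt(i))) i++;
                out.add(new Token(Kind.IDENTIFIER, s.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                // Optional fraction: '.' must be followed by at least one digit.
                if (i + 1 < s.length() && s.charAt(i) == '.'
                        && Character.isDigit(s.charAt(i + 1))) {
                    i++;
                    while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                }
                out.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else {
                // Anything else is treated as a one-character operator.
                out.add(new Token(Kind.OPERATOR, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("AN_IDENTIFIER*3.141592654288"));
    }
}
```

Running main prints [IDENTIFIER(AN_IDENTIFIER), OPERATOR(*), NUMBER(3.141592654288)]: identifiers and numbers are told apart by their first character, which is exactly what a single whitespace-delimited Scanner can't do.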
Probably a better reason not to use java.util.Scanner is compatibility:
ANTLR would then require Java 1.5, whereas today it only requires
Java 1.1.