[antlr-interest] Using the TokenRewriteStream

Susan Jolly easjolly at ix.netcom.com
Mon Jul 30 13:53:53 PDT 2007


Thanks Ter.  

First, yes I know about inserting imaginary/extra Tokens in the lexer and
I've found that very useful.  However, in my app it is much easier to
discover the need for certain insertions in the parser rather than in the
lexer.  (BTW, here's where ANTLR v3 shines.  Previous print-to-braille
translators use complex custom-coded finite state machines to determine
these insertions.)

I see now by looking at the source code instead of the comments that the
Rewrite operation does accept Objects.

However, I'm probably going to need to modify TokenRewriteStream.toString()
to use and return a buffer that is a List (of imaginary and real Tokens)
rather than a StringBuffer.

One of the things I need to be able to do is to interline the input (which
is print text) with its braille translation.  ("Interline" means alternating
a line of print with a line of braille.) This requires adding line breaks to
the input so as to keep it in sync with the braille since each line of the
braille translation has a maximum length and must be terminated by a hard
return.  (The print lines typically have more characters than the braille
lines.) This alone I could do by estimating the length of a braille line by
totaling the String length of the real Tokens and then invoking toString
again with a smaller "end" index if the imaginary Tokens added by the
Rewrite make the line too long. (I can't put the line breaks in at a later
point because I'd lose the synchronization with the input.)
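
A rough sketch of that estimate-then-back-off idea, in Java. Everything here is
hypothetical: BrailleLineFitter and RangeRenderer are illustrative names, and
RangeRenderer merely stands in for rendering a token range with the rewrite
insertions applied (the role toString with start and end indexes would play).
It assumes token indexes are inclusive and that a line always holds at least
one token.

```java
import java.util.List;

// Hypothetical helper, not part of ANTLR.
final class BrailleLineFitter {
    /** Stand-in for rendering tokens start..end (inclusive) with insertions applied. */
    interface RangeRenderer {
        String render(int start, int end);
    }

    /** Returns the index of the last token that fits on the line beginning at start. */
    static int fitLine(List<String> realTokenText, RangeRenderer renderer,
                       int start, int maxLen) {
        // Step 1: estimate the line using only the real tokens' text lengths.
        int end = start;
        int total = realTokenText.get(start).length();
        while (end + 1 < realTokenText.size()
                && total + realTokenText.get(end + 1).length() <= maxLen) {
            end++;
            total += realTokenText.get(end).length();
        }
        // Step 2: render with the inserted (imaginary) text included and
        // retry with a smaller end index while the result is too long.
        while (end > start && renderer.render(start, end).length() > maxLen) {
            end--;
        }
        return end;
    }
}
```

The next line would then start at the returned index plus one, so the print
and braille line breaks stay tied to the same token boundary.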

However, there are multiple ways to represent braille and a sighted user
might want more than one of them. In other words, the general case is that
the output consists of sets of lines where each set has one or more of the
following: a line of print, a line of standard braille (that controls the
line length), and possibly additional lines with other alternatives.

This is (I think) easier to do if I associate each Token with an object that
holds all of these alternatives, which is why I want to get a List of the
Tokens rather than the concatenated text.
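
To make the List idea concrete, here is a minimal Java sketch under some
assumptions. The Cell class and the interline method are hypothetical names,
not ANTLR classes; each Cell carries one string per representation, and the
standard braille string alone decides where a line set ends. The sample
strings used with it would be placeholders, not real braille.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of ANTLR.
final class InterlineSketch {
    /** One entry per token (real or imaginary), with every representation attached. */
    static final class Cell {
        final String print;     // original print text, empty for imaginary tokens
        final String braille;   // standard braille; controls the line length
        final String alternate; // another representation, may be empty
        Cell(String print, String braille, String alternate) {
            this.print = print;
            this.braille = braille;
            this.alternate = alternate;
        }
    }

    /** Groups cells into line sets of {print, braille, alternate}. */
    static List<String[]> interline(List<Cell> cells, int maxBrailleLen) {
        List<String[]> sets = new ArrayList<>();
        StringBuilder p = new StringBuilder();
        StringBuilder b = new StringBuilder();
        StringBuilder a = new StringBuilder();
        int count = 0; // cells in the current line set
        for (Cell c : cells) {
            // Start a new set when the braille line would overflow.
            if (count > 0 && b.length() + c.braille.length() > maxBrailleLen) {
                sets.add(new String[] { p.toString(), b.toString(), a.toString() });
                p.setLength(0);
                b.setLength(0);
                a.setLength(0);
                count = 0;
            }
            p.append(c.print);
            b.append(c.braille);
            a.append(c.alternate);
            count++;
        }
        if (count > 0) {
            sets.add(new String[] { p.toString(), b.toString(), a.toString() });
        }
        return sets;
    }
}
```

Because the break is decided once per Cell, every representation in a set
covers the same tokens, which is the synchronization the concatenated text
cannot give.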

The pictures at the tops of these two pages show what the output looks like
except that the first one doesn't include the original print text.
http://www.dotlessbraille.org/screencap.htm
http://www.dotlessbraille.org/gemini.htm

Susan



