[antlr-interest] Best practice to handle Lexer backtracking demand

Joachim Schrod jschrod at acm.org
Mon Aug 16 15:04:51 PDT 2010


Gerald Rosenberg writes:

Gerald,

>   The attached grammar illustrates two different patterns that
> could work to identify the markers.

Thanks a lot for these patterns. They illustrate usages that I was
not aware of; I learned something new today. :-)

> However, there is an open question about whether a valid marker
> can appear without prefix, suffix, or any escaped characters.

As I wrote, it always does. (The escaping is done in the data, not
in the marker strings.)

Actually, I have even found a different solution. In a first pass,
I filter/rewrite the input and insert characters that won't appear
in the data at the start and end of each marker string. (I use
\u0001 and \u0002, respectively.) With these characters as
delimiters, I can now define data word tokens that don't include
them. Thus I can formulate catch-all rules that match text as long
as any marker string and cover all prefixes of all marker strings.
No more NoViableAltException.
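The two-pass idea can be sketched in Python rather than as an ANTLR grammar. This is a minimal illustration, not the grammar from this thread. It assumes the marker strings are known in advance and never occur verbatim inside the data. The marker list (`<<REC>>`, `<<END>>`) and the function names `delimit_markers` and `tokenize` are hypothetical. Once every marker is wrapped in \u0001 ... \u0002, a data word is simply a run of characters without those two delimiters. A data fragment like `<<` (a prefix of a marker) can therefore no longer be mistaken for the start of a marker, and the tokenizer never has to backtrack.

```python
import re

# Hypothetical marker strings; the post does not name the real ones.
MARKERS = ["<<REC>>", "<<END>>"]

START, END = "\u0001", "\u0002"  # delimiters assumed never to occur in the data

def delimit_markers(text, markers):
    """First pass: wrap every marker occurrence in START ... END."""
    # Try longer markers first, in case one marker is a prefix of another.
    alternatives = sorted(markers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in alternatives))
    return pattern.sub(lambda m: START + m.group(0) + END, text)

# A token is either a delimited marker or a run of non-delimiter characters.
TOKEN = re.compile("\u0001([^\u0001\u0002]*)\u0002|[^\u0001\u0002]+")

def tokenize(text):
    """Second pass: split delimited text into (kind, value) tokens.

    Assumes well-formed input: every START has a matching END.
    """
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            tokens.append(("MARKER", m.group(1)))
        else:
            tokens.append(("DATA", m.group(0)))
    return tokens

if __name__ == "__main__":
    raw = "<<REC>>Müller << Schmidt<<END>>"
    print(tokenize(delimit_markers(raw, MARKERS)))
```

In the usage example, the `<<` inside the data comes out as part of a single DATA token, between the two MARKER tokens.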

So I solved the problem by redefining the task... I hope I won't
have to parse another file in the future that needs lexer
backtracking. ;-) Hmm, maybe by combining full-fledged lexer
generators like JFlex with ANTLR parsers? Judging from the docs,
this shouldn't be too hard to realize.

Anyhow, thanks very much for the very fast and persistent responses
to my questions. As a newbie to ANTLR, I feel very welcome.

Cheers,
	Joachim

PS: Let me add a few final comments on how the input came to be as
convoluted as it is. It is created as a VARBLK file on a mainframe
system, where the marker strings are always at the start and at the
end of a line. Transferring it as a text file (which would insert
proper line ends) to my Unix servers destroys the umlauts: they are
mapped to []{} and the like. Transferring it as a binary file leaves
the umlauts intact (well, in EBCDIC, but that's easy to handle), but
doesn't insert newlines. Data file creation and transfer are done by
another company and I'm not in a position to change this, so I have
to analyze the files as we receive them.
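For the binary-transfer case, a minimal Python sketch of the EBCDIC handling could look like the code below. It rests on several assumptions the post does not confirm. First, "VARBLK" is taken to mean a variable-length record format (RECFM=VB). Second, the transfer is assumed to have kept each record's 4-byte record descriptor word (RDW), whose first two bytes hold the big-endian record length including the RDW itself; transfer tools often strip RDWs unless told otherwise. Third, the data is assumed to use code page 273 (German EBCDIC), which Python ships as the `cp273` codec. If these assumptions hold, the record lengths restore the line boundaries that the binary transfer lost. The function names are hypothetical.

```python
import struct

def split_vb_records(data):
    """Split a RECFM=VB byte stream into records, assuming RDWs were kept."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated RDW at offset %d" % pos)
        # RDW: 2-byte big-endian length (including the 4 RDW bytes), then 2 bytes.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 4 or pos + length > len(data):
            raise ValueError("bad record length %d at offset %d" % (length, pos))
        records.append(data[pos + 4:pos + length])
        pos += length
    return records

def decode_records(data, codec="cp273"):
    """Decode each record from EBCDIC; cp273 is an assumed German code page."""
    return [rec.decode(codec) for rec in split_vb_records(data)]

if __name__ == "__main__":
    payload = "Müller".encode("cp273")
    record = struct.pack(">HH", len(payload) + 4, 0) + payload
    print(decode_records(record + record))
```

With the records split apart, each one could be written out as a separate line before the marker-delimiting pass runs.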

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod				Email: jschrod at acm.org
Roedermark, Germany
