[antlr-interest] parsing a mix of structured and free-form text
milan at milica.com.au
Tue Jun 20 01:43:59 PDT 2006
I am trying to use ANTLR to parse files that are mostly structured
(being generated by another program), but here and there have some text
that's nearly free form, as it is basically user-input. The place where
this unstructured text appears within the structured text varies, but is
defined by the grammar.
Ideally, what I would like to do, at the point in parsing when I know
that unstructured text follows, is to simply read enough characters
(these are fixed-width fields, so I know how many I need to read), so that
parsing of the structured text can continue.
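To make the idea concrete, here is a minimal sketch in plain Java of the
behaviour I am after. It is not ANTLR code and uses no ANTLR API; the class
OnDemandScanner, its methods, and the sample record layout (a keyword, one
separator character, then a 12-character free-form field) are all made up
for illustration.

```java
/** Hypothetical scanner that never reads ahead of what the parser asks for. */
class OnDemandScanner {
    private final String input;
    private int pos = 0; // index of the next unread character

    OnDemandScanner(String input) {
        this.input = input;
    }

    /** Structured mode: skip whitespace, return the next word, or null at end of input. */
    String nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    /** Free-form mode: return exactly `width` raw characters, with no tokenizing at all. */
    String readFixed(int width) {
        if (pos + width > input.length()) {
            throw new IllegalStateException("input too short for a field of width " + width);
        }
        String field = input.substring(pos, pos + width);
        pos += width;
        return field;
    }

    public static void main(String[] args) {
        // Assumed layout: keyword, one separator, 12-character free-form field, keyword.
        OnDemandScanner s = new OnDemandScanner("NOTE it's (raw)! END");
        System.out.println(s.nextToken());               // structured keyword
        s.readFixed(1);                                  // skip the single separator
        System.out.println("[" + s.readFixed(12) + "]"); // free-form field, kept verbatim
        System.out.println(s.nextToken());               // structured parsing resumes
    }
}
```

Because nothing is tokenized before it is requested, the quote and the
parentheses inside the field never reach the structured rules.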
The problem here is the look-ahead tokens: the lexer runs a bit ahead of
the parser and consumes input characters in advance.
The places where this unstructured text appears are such that there's no
need to use look-ahead tokens to decide which grammar rule to apply.
I have used ANTLR for some simpler things. I have also used Bison and Flex
before, where I used Flex start conditions to control grabbing characters
when places with unstructured text are approached. But I'm not familiar
enough with ANTLR to know how to do this, or whether it's possible at all.
I don't know whether it would be possible to get the text of the look-ahead
tokens, discard them, and force the lexer to continue from a different
position in the input stream.
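As a rough illustration of that second idea, here is a sketch in plain Java
(again, not ANTLR API; RewindableLexer, its methods, and the sample layout are
hypothetical). The lexer keeps k tokens buffered ahead of the parser, but
remembers where the last consumed token ended, so the buffered tokens can be
thrown away and the raw field read from that offset.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical lexer with k tokens of look-ahead that can be discarded and rewound. */
class RewindableLexer {
    /** A token plus the input offset just past its last character. */
    static final class Token {
        final String text;
        final int end;
        Token(String text, int end) { this.text = text; this.end = end; }
    }

    private final String input;
    private final int k;                 // number of look-ahead tokens to keep buffered
    private int pos = 0;                 // how far the lexer has read (ahead of the parser)
    private int consumedEnd = 0;         // end of the last token the parser really consumed
    private final Deque<Token> lookahead = new ArrayDeque<>();

    RewindableLexer(String input, int k) {
        this.input = input;
        this.k = k;
    }

    /** Lex one space-delimited token from the current position, or null at end of input. */
    private Token lex() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos >= input.length()) return null;
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != ' ') pos++;
        return new Token(input.substring(start, pos), pos);
    }

    /** Consume the next token, first topping the buffer up to k tokens beyond it. */
    String next() {
        while (lookahead.size() < k + 1) {
            Token t = lex();
            if (t == null) break;
            lookahead.addLast(t);
        }
        Token t = lookahead.pollFirst();
        if (t == null) return null;
        consumedEnd = t.end;
        return t.text;
    }

    /** Discard the look-ahead, rewind, skip `skip` separator characters, read `width` raw ones. */
    String readRaw(int skip, int width) {
        lookahead.clear();               // tokens lexed from the free-form text are meaningless
        int start = consumedEnd + skip;
        if (start + width > input.length()) {
            throw new IllegalStateException("input too short for a field of width " + width);
        }
        pos = start + width;             // the lexer resumes right after the field
        consumedEnd = pos;
        return input.substring(start, pos);
    }
}
```

With the input "NOTE it's (raw)! END" and k = 1, calling next(), then
readRaw(1, 12), then next() yields the keyword, the untouched 12-character
field, and the trailing keyword, even though the lexer had already tokenized
part of the field as look-ahead. Rewinding to the end of the last consumed
token, rather than to the start of the first buffered one, keeps any leading
blanks that belong to the field.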
Any help/hints/ideas are very much welcome.