[antlr-interest] parsing a mix of structured and free-form text

Thu Jun 22 07:39:06 PDT 2006

It may not be what you want, but I'd use flex for the scanner and ANTLR
for everything else; that lets you use scanner states wherever necessary.
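
To make the scanner-state idea concrete, here is a rough sketch in Python
rather than in flex or ANTLR. Everything in it is made up for illustration:
the OnDemandScanner class, the token set, the record layout (a name, a
colon, then free text) and the 8-character field width. The only point is
that the scanner produces one token per request, so the parser can switch
it to "raw" reading before any free-form characters have been tokenized.

```python
import re

# Hypothetical token set for the structured part of the input.
TOKEN_RE = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]+)|(?P<NUM>\d+)|(?P<COLON>:))")

# Assumed width of the free-form field; in practice the grammar supplies it.
FIELD_WIDTH = 8


class OnDemandScanner:
    """Scans one token per call, so nothing is consumed ahead of the parser."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        # "Structured" state: match exactly one token at the current offset.
        m = TOKEN_RE.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        return m.lastgroup, m.group(m.lastgroup)

    def read_raw(self, width):
        # "Free-form" state: take exactly `width` characters, untokenized.
        if self.pos + width > len(self.text):
            raise SyntaxError("input ends inside a fixed-width field")
        raw = self.text[self.pos:self.pos + width]
        self.pos += width
        return raw


def parse_record(scanner):
    # Hypothetical rule: NAME ':' <FIELD_WIDTH characters of free text>
    kind, key = scanner.next_token()
    if kind != "NAME":
        raise SyntaxError("expected a field name")
    kind, _ = scanner.next_token()
    if kind != "COLON":
        raise SyntaxError("expected ':'")
    # The parser knows free text follows, so it bypasses tokenization here.
    return key, scanner.read_raw(FIELD_WIDTH)


scanner = OnDemandScanner("note:hi: 42!!name:J. Smith")
first = parse_record(scanner)
second = parse_record(scanner)
```

In flex the same switch would be a start condition that a parser action
selects with BEGIN. Either way it only works if the scanner has not already
tokenized past the point where the switch happens.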

Compared with an ANTLR v2 lexer, a flex-generated scanner is also much faster.

On 6/20/06, Milan Durovic <milan at milica.com.au> wrote:
> Hi,
>
> I am trying to use ANTLR to parse files that are mostly structured
> (being generated by another program), but here and there have some text
> that's nearly free-form, as it is basically user input. The place where
> this unstructured text appears within the structured text varies, but is
> defined by the grammar.
>
> Ideally, what I would like to do, at the point in parsing when I know
> that unstructured text follows, is to simply read enough characters
> (these are fixed-width fields, so I know how many I need to read), so that
> parsing of the structured text can continue.
>
> The problem here is the look-ahead tokens: the lexer runs a bit ahead of
> the parser and chews up input characters in advance.
>
> The places where this unstructured text appears are such that there's no
> need to use look-ahead tokens to decide which grammar rule to apply.
>
> I used ANTLR for some simpler things. I also used Bison and Flex before
> and used Flex states to control grabbing characters when places with
> unstructured text are approached. But I'm not familiar enough with ANTLR
> to know how to do it, or whether it's possible at all.
>
> I don't know if it would be possible to get the text of look-ahead
> tokens, discard them, and force the lexer to continue from a different
> position in the input stream.
>
> Any help/hints/ideas are very much welcome.
>
> Milan
>