[antlr-interest] parsing a mix of structured and free-form text

Tue Jun 27 06:56:57 PDT 2006

Hi!

Your problem sounds like the standard problem of handling comments:
you simply want to skip them. Usually this is done in the lexer.

Isn't that possible? Do you really need the parser stage to determine
which parts you can ignore and which you cannot? If so, are you really
sure this is the case?
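
To illustrate the lexer-level skipping mentioned above, here is a minimal
sketch in plain Python rather than ANTLR grammar syntax. The token names and
the '#'-to-end-of-line comment syntax are assumptions made up for the example;
they are not taken from the original poster's grammar. The point is only that
skipped spans are dropped inside the lexer, so the parser never sees them.

```python
import re

# Hypothetical token set for illustration only.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),      # assumed comment syntax: '#' to end of line
    ("WS",      r"\s+"),          # whitespace between tokens
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
SKIPPED = {"COMMENT", "WS"}       # token kinds the parser should never see


def tokenize(text):
    """Yield (kind, text) pairs, discarding skipped spans inside the lexer."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup not in SKIPPED:
            yield m.lastgroup, m.group()


if __name__ == "__main__":
    for token in tokenize("width 10 # user note\nheight 20"):
        print(token)
```

This only works when the lexer can recognise the ignorable text on its own,
from the characters alone, which is exactly the question raised above.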

Oliver

2006/6/20, Milan Durovic <milan at milica.com.au>:
> Hi,
>
> I am trying to use ANTLR to parse files that are mostly structured
> (being generated by another program), but here and there have some text
> that's nearly free form, as it is basically user input. The place where
> this unstructured text appears within the structured text varies, but is
> defined by the grammar.
>
> Ideally, what I would like to do, at the point in parsing when I know
> that unstructured text follows, is to simply read enough characters
> (these are fixed-width fields, so I know how many I need to read), so that
> parsing of the structured text can continue.
>
> The problem here is the look-ahead tokens: the lexer goes a bit ahead of
> the parser and chews up input characters in advance.
>
> The places where this unstructured text appears are such that there's no
> need to use look-ahead tokens to decide which grammar rule to apply.
>
> I used ANTLR for some simpler things. I also used Bison and Flex before
> and used Flex states to control grabbing characters when places with
> unstructured text are approached. But I'm not familiar enough with ANTLR
> to know how to do it, or whether it's possible at all.
>
> I don't know if it would be possible to get the text of look-ahead
> tokens, discard them, and force the lexer to continue from a different
> position in the input stream.
>
> Any help/hints/ideas are very much welcome.
>
> Milan
>
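
The approach described in the quoted message (read a known number of raw
characters at a point the grammar dictates, then resume normal tokenizing)
can be sketched as follows. This is plain Python with a hand-written lexer,
not ANTLR, and it says nothing about how to achieve the same in ANTLR. The
class OnDemandLexer, the helper functions, and the record format
IDENT '=' <raw field> ';' are all hypothetical names invented for the
example. The key assumption is the one stated in the message: no look-ahead
token is needed at the point where the raw field begins, so the lexer can
scan strictly on demand and never run ahead of the parser.

```python
import re


class OnDemandLexer:
    """Lexer that scans only when asked, so it never runs ahead of the parser."""

    # Leading whitespace is skipped; trailing whitespace is left untouched,
    # so a raw field starts immediately after the preceding token.
    _TOKEN = re.compile(
        r"\s*(?:(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<PUNCT>[=;]))"
    )

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        """Scan and return exactly one (kind, text) token."""
        m = self._TOKEN.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"bad input at offset {self.pos}")
        self.pos = m.end()
        return m.lastgroup, m.group(m.lastgroup)

    def read_raw(self, width):
        """Consume exactly `width` characters without tokenizing them."""
        if self.pos + width > len(self.text):
            raise EOFError("fixed-width field runs past end of input")
        raw = self.text[self.pos:self.pos + width]
        self.pos += width
        return raw


def expect(lexer, kind, text=None):
    """Fetch one token and check its kind (and optionally its exact text)."""
    got_kind, got_text = lexer.next_token()
    if got_kind != kind or (text is not None and got_text != text):
        raise SyntaxError(f"expected {text or kind}, got {got_text!r}")
    return got_text


def parse_record(lexer, width):
    """Parse the hypothetical record: IDENT '=' <width raw characters> ';'."""
    name = expect(lexer, "IDENT")
    expect(lexer, "PUNCT", "=")
    note = lexer.read_raw(width)   # the parser knows the field width here
    expect(lexer, "PUNCT", ";")
    return name, note


if __name__ == "__main__":
    lexer = OnDemandLexer("note=it's; ok;")
    print(parse_record(lexer, 8))
```

The raw field in the usage example contains a quote and a semicolon that the
token rules would otherwise reject or misread; because the parser asks for it
with read_raw, those characters never reach the tokenizer. This plays a role
loosely similar to the Flex states mentioned in the message, with the parser
rather than the lexer deciding when to switch.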