[antlr-interest] Context-sensitive lexer

Fri Jun 17 14:20:34 PDT 2011

Hi Jonas,

On Fri, Jun 17, 2011 at 11:09 PM, Jonas <jonas.hagmar at gmail.com> wrote:

> Hi Bart,
>
> Thank you for the excellent input on the problem. I hope your approach
> can be adapted to overcome all the difficulties coming from the
> context sensitivity of the file format I have to deal with. For
> example, the title text can be any character sequence, leading to a
> definition of your WORD token that I fear might clash with patterns
> needed to pick out identifiers in, e.g., algebraic expressions later
> in the file. Moreover, the whitespace in the title text is actually
> significant. If the title text is "foo$3        bar__!" (without the
> quotes), that is exactly what the user expects to see when using the
> program reading the file. In other places, whitespace acts like a list
> separator, and in some places it should just be ignored. With your
> approach, wouldn't that mean that I have to include the whitespace in
> all relevant parser rules, even when it should be ignored?

I'm not sure what you mean by all of that, sorry. My post was mainly meant to
emphasize my point of _not_ doing so much inside the lexer.
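
As a rough illustration of that point, here is a minimal sketch in Python
rather than ANTLR grammar syntax. Everything in it is an assumption made up
for the example: the token names (NL, SPACE, WORD), the helper functions, and
the toy format in which the first line is the title and the second line is a
whitespace-separated list. The lexer stays context-free and keeps the
whitespace; each parser rule then decides what whitespace means for it.

```python
import re

# Hypothetical token set: the lexer only splits the input into line breaks,
# runs of spaces/tabs, and runs of everything else. It knows nothing about
# context.
TOKEN_RE = re.compile(r"(?P<NL>\r\n|\r|\n)|(?P<SPACE>[ \t]+)|(?P<WORD>[^ \t\r\n]+)")

def tokenize(text):
    """Return a list of (type, text) pairs; whitespace is kept as tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def split_lines(tokens):
    """Group tokens into lines, dropping the NL tokens themselves."""
    line = []
    for kind, value in tokens:
        if kind == "NL":
            yield line
            line = []
        else:
            line.append((kind, value))
    if line:
        yield line

def parse_title(line):
    """Title rule: every token, including SPACE, is kept verbatim."""
    return "".join(value for _, value in line)

def parse_list(line):
    """List rule: SPACE only separates items, so it is dropped here."""
    return [value for kind, value in line if kind == "WORD"]

def parse(text):
    """Assumed toy format: first line is the title, second line is a list."""
    lines = list(split_lines(tokenize(text)))
    return parse_title(lines[0]), parse_list(lines[1])
```

With this sketch, parse("foo$3   bar__!\na  b   c\n") keeps the three spaces
inside the title but treats the spaces on the second line purely as
separators. Only the rules that care about whitespace have to mention it.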

Perhaps you'd like to post a more detailed explanation of the language
you're trying to parse?

Regards,

Bart.