[antlr-interest] Another parsing question

Mon Aug 4 17:38:53 PDT 2008

Greetings!

Carter Cheng continued an ongoing thread by writting:
>The difficulty is with the language I am working with in the first
>case it should be two tokens ']' ')' but the second case it should be
>one token '])' without intervening whitespace between the ']' and
>')'.
>
>The only way I can see of solving this problem is to make white space
>explicit in the grammar. I.e. litter my rules with whitespace tokens
>and omit a whitespace token in the case when i expect a '])'. Is this
>the correct way to do this with ANTLRv3?
>

Perhaps making Whitespace significant to the parser is your only
choice, but I am sure your grammar will get really ugly and probably
be ambiguous requiring lots of expensive lookahead in predicates.

But...

Are the `([` `])` and `[` `]` (and maybe `(` `)` ) tokens properly
nested in your language?

e.g. is `([` .. `[` .. `])` .. `]` legal (where the .. is some
other legal construct)?

if not, ie these things do follow a proper (usual?) nested stucture,
then I think you can keep a state within the lexer itself regarding
how to interpret the `])` pair of characters.

so, depending on your answer to this nesting question, the rest of
this message may be helpful or may be just a bunch of junk.
(maybe it is just a bunch of junk always ;-)

under the requirement of proper nesting I believe you could create a
stack of expected closing brackets inside the lexer.

when you lex a `[` you push a `]` on the to expect closing form stack.

when you lex a `([` you push a `])` on the to expect closing form stack.

when you lex a `(` you push a `)` on the to expect closing form stack.
and possibly any other bracketing pair your language has ( `{` `}` ?).

and then when you encounter a `]` you can examine the top of the stack
in order to decide whether or not a `)` immediately following that `]`
should be treated as the `])` or not; and then, of course, pop the
stack.

so I think the above sketch will work for sentences in your language
that have correct syntax.

I am not so sure about how well the above will work for sentences that
contain syntax errors.

if the user enters something like `([` .. `)` (ie. forgetting the `]`)
then you can use what is on the stack to provide a better error
message?

but if the user enters something similar to `(` .. `([` .. `]` .. `)`
-- not sure the above will recover from these kinds of syntax
errors. Might have to peek below the top of the stack to try to
resolve, and having all the bracketing forms push/pop the stack may be
necessary for that...

But, anyway, hope this may help lead to a proper solution.
   -jbb