[antlr-interest] Another parsing question

Loring Craymer lgcraymer at yahoo.com
Mon Aug 4 19:08:49 PDT 2008


A better approach is to use a semantic predicate (sempred) to look for whitespace on the hidden channel.  That requires subclassing CommonTokenStream to get access to the channel (or adding a method that looks for the "next token" on a specific channel).
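
To make the idea concrete, here is a rough, self-contained sketch of the check such a predicate would perform. It deliberately does not use the ANTLR runtime: the Tok class, the channel constants and the helper names are all made up for illustration, and a real version would ask the token stream the same question instead.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a token buffer. Tok, DEFAULT and HIDDEN are hypothetical
// stand-ins for illustration only, not ANTLR classes or constants.
class HiddenChannelSketch {
    static final int DEFAULT = 0;
    static final int HIDDEN = 99;

    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Splits input into single-character tokens; runs of whitespace become
    // one token on the HIDDEN channel rather than being thrown away.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (Character.isWhitespace(s.charAt(i))) {
                int j = i;
                while (j < s.length() && Character.isWhitespace(s.charAt(j))) j++;
                out.add(new Tok(s.substring(i, j), HIDDEN));
                i = j;
            } else {
                out.add(new Tok(String.valueOf(s.charAt(i)), DEFAULT));
                i++;
            }
        }
        return out;
    }

    // The question the predicate asks: is the very next token in the raw
    // buffer on the default channel, i.e. is nothing hidden in between?
    static boolean nextIsAdjacent(List<Tok> toks, int i) {
        return i + 1 < toks.size() && toks.get(i + 1).channel == DEFAULT;
    }

    // True when the token at i is "]" and a ")" follows with no whitespace.
    static boolean isCloseBracketParen(List<Tok> toks, int i) {
        return toks.get(i).text.equals("]")
            && nextIsAdjacent(toks, i)
            && toks.get(i + 1).text.equals(")");
    }
}
```

With this model, "])" passes the check while "] )" does not, which is the distinction the parser rule needs in order to treat the pair as one closing form or two.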

--Loring



----- Original Message ----
> From: John B. Brodie <jbb at acm.org>
> To: carter_cheng at yahoo.com
> Cc: antlr-interest at antlr.org
> Sent: Monday, August 4, 2008 5:38:53 PM
> Subject: Re: [antlr-interest] Another parsing question
> 
> Greetings!
> 
> Carter Cheng continued an ongoing thread by writing:
> >The difficulty is with the language I am working with in the first
> >case it should be two tokens ']' ')' but the second case it should be
> >one token '])' without intervening whitespace between the ']' and
> >')'.
> >
> >The only way I can see of solving this problem is to make white space
> >explicit in the grammar. I.e. litter my rules with whitespace tokens
> >and omit a whitespace token in the case when i expect a '])'. Is this
> >the correct way to do this with ANTLRv3?
> >
> 
> Perhaps making Whitespace significant to the parser is your only
> choice, but I am sure your grammar will get really ugly and probably
> be ambiguous requiring lots of expensive lookahead in predicates.
> 
> But...
> 
> Are the `([` `])` and `[` `]` (and maybe `(` `)` ) tokens properly
> nested in your language?
> 
> e.g. is `([` .. `[` .. `])` .. `]` legal (where the .. is some
> other legal construct)?
> 
> if not, i.e. these things do follow a proper (usual?) nested structure,
> then I think you can keep a state within the lexer itself regarding
> how to interpret the `])` pair of characters.
> 
> so, depending on your answer to this nesting question, the rest of
> this message may be helpful or may be just a bunch of junk.
> (maybe it is just a bunch of junk always ;-)
> 
> under the requirement of proper nesting I believe you could create a
> stack of expected closing brackets inside the lexer.
> 
> when you lex a `[` you push a `]` onto the stack of expected closing forms.
> 
> when you lex a `([` you push a `])` onto the stack of expected closing forms.
> 
> when you lex a `(` you push a `)` onto the stack of expected closing forms.
> and possibly any other bracketing pair your language has ( `{` `}` ?).
> 
> and then when you encounter a `]` you can examine the top of the stack
> in order to decide whether or not a `)` immediately following that `]`
> should be treated as the `])` or not; and then, of course, pop the
> stack.
> 
> so I think the above sketch will work for sentences in your language
> that have correct syntax.
> 
> I am not so sure about how well the above will work for sentences that
> contain syntax errors.
> 
> if the user enters something like `([` .. `)` (ie. forgetting the `]`)
> then you can use what is on the stack to provide a better error
> message?
> 
> but if the user enters something similar to `(` .. `([` .. `]` .. `)`
> -- not sure the above will recover from these kinds of syntax
> errors. Might have to peek below the top of the stack to try to
> resolve, and having all the bracketing forms push/pop the stack may be
> necessary for that...
> 
> But, anyway, hope this may help lead to a proper solution.
>    -jbb
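
For reference, the bracket-stack idea in the quoted message could be sketched roughly as follows. This is a standalone toy lexer, not an ANTLR grammar, and it makes assumptions the thread does not confirm: that `([` written without whitespace is always the compound opener, that brackets nest properly, and that every other non-whitespace character can be emitted as its own token. The class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical hand-written lexer illustrating the expected-closer stack.
class BracketStackSketch {
    static List<String> lex(String s) {
        List<String> tokens = new ArrayList<>();
        Deque<String> expected = new ArrayDeque<>();  // closers we still expect
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean hasNext = i + 1 < s.length();
            if (Character.isWhitespace(c)) {
                i++;                                  // whitespace is skipped
            } else if (c == '(' && hasNext && s.charAt(i + 1) == '[') {
                tokens.add("([");                     // assumed: always compound
                expected.push("])");
                i += 2;
            } else if (c == '(') {
                tokens.add("(");
                expected.push(")");
                i++;
            } else if (c == '[') {
                tokens.add("[");
                expected.push("]");
                i++;
            } else if (c == ']') {
                // Only fuse "])" when the innermost open form was "([".
                boolean compound = "])".equals(expected.peek())
                        && hasNext && s.charAt(i + 1) == ')';
                tokens.add(compound ? "])" : "]");
                i += compound ? 2 : 1;
                expected.poll();                      // pop; no-op if empty
            } else if (c == ')') {
                tokens.add(")");
                expected.poll();
                i++;
            } else {
                tokens.add(String.valueOf(c));        // any other character
                i++;
            }
        }
        return tokens;
    }
}
```

Under these assumptions, `([a])` lexes as `([`, `a`, `])`, while `(x[a])` lexes as `(`, `x`, `[`, `a`, `]`, `)`. The sketch does nothing clever for mismatched input; it simply pops whatever is on top, so the error-recovery questions raised in the message remain open.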



      


