[antlr-interest] Optional spaces question

Wed Jan 18 06:08:32 PST 2012

On Wed, Jan 18, 2012 at 8:17 AM, Thomas Thomsen <thomas at t-t.dk> wrote:

> I am pretty new to ANTLR and am building a DSL. I like ANTLR a lot, but I
> am struggling with a problem regarding optional whitespace. My problem is
> that I need to distinguish between "f(x)" and "f  (x)" -- note the space
> between "f" and "(x)" in the latter (I am putting whitespace on the hidden
> channel, and I want to continue to do that). The former is a function call,
> the latter something different.
>
> I found a post on this list from 2007 ("Handling optional spaces") which
> addresses the exact same question. One suggestion was to have the lexer
> absorb the left parenthesis if there is no space in between:
>
> ID : ('a'..'z') + ;
> FUNCTION_CALL: ID '(' ;
>
> Then the lexer would return "f(" as a FUNCTION_CALL token if there is no
> space in between. This works, but it is not too pretty and complicates
> things elsewhere in my code. The other suggestion was to check the hidden
> channel for whitespace tokens by means of Java code (actually C# in my
> case). But since I am not yet too familiar with the inner workings of
> ANTLR, this scares me a bit.
>
> So I was thinking of a third strategy: Have a simple preprocessor look
> through the input file, and if a letter is directly followed by a left
> parenthesis, put some special character in between. So the preprocessor
> transforms "f(x)" into "f&(x)", where "&" is a (glue) character not used
> elsewhere in the grammar. And afterwards, it would be much easier to
> distinguish between "f&(x)" and "f  (x)" in ANTLR.
>
> Is this question or strategy completely stupid for some reason?
>
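
The preprocessor idea from the question is small enough to sketch. This is
only an illustration in Python, not ANTLR code: the add_glue name is made up,
and it assumes identifiers are lowercase letters only (as in the ID rule
quoted above) and that "&" appears nowhere else in the input.

```python
import re

GLUE = "&"  # assumption: this character is unused elsewhere in the grammar


def add_glue(source: str) -> str:
    """Insert GLUE between a lowercase letter and an immediately following '('."""
    # The lookbehind keeps the letter in place; only the '(' is rewritten.
    return re.sub(r"(?<=[a-z])\(", GLUE + "(", source)


if __name__ == "__main__":
    print(add_glue("f(x)"))
    print(add_glue("f  (x)"))
```

A plain text substitution like this would also rewrite "f(" inside string
literals or comments if the DSL has them, which is one reason to consider
handling the distinction in the lexer or in a later pass instead.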

Personally, I think avoiding the inner workings of ANTLR because they seem
scary is a bad habit to pick up.

When I started using ANTLR I spent lots of hours learning how it worked by
using the debugger. While I am not an expert at everything ANTLR, I don't
fear it.

One thing I have learned is that while the lexer and parser are probably
capable of determining if an input is acceptable, that doesn't mean that
the lexer and parser should do all of the work of accepting the input.

If you think of accepting an input as three steps:
1. Use the lexer to convert the input to tokens.
2. Use the parser to accept unambiguous input.
3. Use tree manipulation to validate and accept valid input.
then you can let the parser pass along input that is unambiguous but may not
be valid, and sort out its meaning and validity in the next step.
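
The three steps above can be sketched as follows. This is a toy in plain
Python rather than ANTLR, and every name in it (Token, lex, parse, classify)
is hypothetical. It assumes the input is just ID ( ID ) with optional
whitespace, and it mimics the hidden channel with a flag on each token. The
parser accepts both "f(x)" and "f  (x)" and only records whether whitespace
was present; the decision about what that means is left to a later step.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str     # "ID", "LPAREN", "RPAREN" or "WS"
    text: str
    hidden: bool  # True for whitespace, mimicking a hidden channel


TOKEN_RE = re.compile(r"(?P<ID>[a-z]+)|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<WS>\s+)")


def lex(source):
    """Step 1: convert the input to tokens, keeping whitespace but marking it hidden."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(Token(m.lastgroup, m.group(), m.lastgroup == "WS"))
        pos = m.end()
    return tokens


def parse(tokens):
    """Step 2: accept ID ( ID ) and note, without judging, whether ID and '(' were separated."""
    visible = [(i, t) for i, t in enumerate(tokens) if not t.hidden]
    if [t.kind for _, t in visible] != ["ID", "LPAREN", "ID", "RPAREN"]:
        raise ValueError("input is not of the form ID ( ID )")
    name_index, lparen_index = visible[0][0], visible[1][0]
    spaced = lparen_index - name_index > 1  # a hidden token sits in between
    return {"name": visible[0][1].text, "arg": visible[2][1].text, "spaced": spaced}


def classify(node):
    """Step 3: decide the meaning after parsing is done."""
    return "other" if node["spaced"] else "call"


if __name__ == "__main__":
    for text in ("f(x)", "f  (x)"):
        print(text, "->", classify(parse(lex(text))))
```

The point of the split is that the grammar itself stays the same for both
spellings; only the last step needs to know that the space matters.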

For me, once the input is converted to a tree, it is easier to analyze and
manipulate because you can
1. search backward and forward
2. change the structure of the branches
3. change the info in the nodes
4. add and remove nodes and branches
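
As a rough illustration of that kind of tree work, here is a small Python
sketch. It is not ANTLR's tree API. The Node class and the APPLY, SPACE, CALL
and OTHER node kinds are invented for the example, and it assumes the parser
left a SPACE marker child wherever whitespace separated the name from the
parenthesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and then all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def resolve(root):
    """Turn each APPLY node into CALL or OTHER and drop its SPACE marker."""
    for node in walk(root):
        if node.kind == "APPLY":
            spaced = any(c.kind == "SPACE" for c in node.children)
            # Change the info in the node.
            node.kind = "OTHER" if spaced else "CALL"
            # Remove the marker child now that it has been used.
            node.children = [c for c in node.children if c.kind != "SPACE"]
    return root


if __name__ == "__main__":
    tree = Node("APPLY", children=[Node("ID", "f"), Node("SPACE"), Node("ID", "x")])
    print(resolve(tree).kind)
```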

Hope this sheds some light on the problem.

Eric

>
> Best regards, and thanks for all the good work on ANTLR,
>
> -Thomas Thomsen
>