[antlr-interest] Optional spaces question

Wed Jan 18 05:17:15 PST 2012

I am pretty new to ANTLR and am using it to build a DSL. I like ANTLR a lot,
but I am struggling with a problem regarding optional whitespace. My problem is
that I need to distinguish between "f(x)" and "f  (x)" -- note the space
between "f" and "(x)" in the latter (I am putting whitespace on the hidden
channel, and I want to continue to do that). The former is a function call,
the latter something different.

I found a post on this list from 2007 ("Handling optional spaces") which
addresses the exact same question. One suggestion was to have the lexer
absorb the left parenthesis if there is no space in between:

ID            : ('a'..'z')+ ;
FUNCTION_CALL : ID '(' ;

Then the lexer would return "f(" as a FUNCTION_CALL token if there is no
space in between. This works, but it is not very pretty and complicates
things elsewhere in my code. The other suggestion was to check the hidden
channel for whitespace-tokens by means of Java code (actually C# in my
case). But since I am not yet too familiar with the inner workings of
ANTLR, this scares me a bit.
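
As far as I understand it, the idea behind that second suggestion boils
down to comparing character offsets: if the identifier ends exactly where
the parenthesis starts, no whitespace was skipped. Below is a minimal
standalone sketch of that check in Python, only to illustrate the logic.
It does not use ANTLR at all; Tok, visible_tokens and is_function_call are
hypothetical names, and the toy tokenizer only knows lowercase identifiers,
parentheses and whitespace.

```python
import re
from dataclasses import dataclass


@dataclass
class Tok:
    kind: str   # "ID", "LPAREN" or "RPAREN"
    text: str
    start: int  # offset of the first character
    stop: int   # offset of the last character (inclusive)


# Toy lexer (an assumption for this sketch, not the real grammar).
TOKEN_RE = re.compile(r"(?P<ID>[a-z]+)|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<WS>\s+)")


def visible_tokens(source):
    """Tokenize and drop whitespace, mimicking a hidden channel."""
    toks = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "WS":
            toks.append(Tok(m.lastgroup, m.group(), m.start(), m.end() - 1))
    return toks


def is_function_call(toks, i):
    """True if token i is an ID directly followed by '(' with no gap."""
    if i + 1 >= len(toks):
        return False
    left, right = toks[i], toks[i + 1]
    return (left.kind == "ID" and right.kind == "LPAREN"
            and left.stop + 1 == right.start)
```

With this, is_function_call(visible_tokens("f(x)"), 0) is true and the same
call on "f  (x)" is false, although both inputs yield the same visible
tokens. Whether the equivalent test is easy to express as a predicate in
the ANTLR parser is exactly the part I am unsure about.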

So I was thinking of a third strategy: Have a simple preprocessor look
through the input file, and if a letter is directly followed by a left
parenthesis, put some special character in between. So the preprocessor
transforms "f(x)" into "f&(x)", where "&" is a (glue) character not used
elsewhere in the grammar. And afterwards, it would be much easier to
distinguish between "f&(x)" and "f  (x)" in ANTLR.

Is this question or strategy completely stupid for some reason?

Best regards, and thanks for all the good work on ANTLR,

-Thomas Thomsen