[antlr-interest] Optional spaces question

Eric researcher0x00 at gmail.com
Wed Jan 18 06:48:28 PST 2012


On Wed, Jan 18, 2012 at 9:08 AM, Eric <researcher0x00 at gmail.com> wrote:

>
>
> On Wed, Jan 18, 2012 at 8:17 AM, Thomas Thomsen <thomas at t-t.dk> wrote:
>
>> I am pretty new to ANTLR and am writing a DSL. I like ANTLR a lot, but I
>> am struggling with a problem regarding optional whitespace. My problem is
>> that I need to distinguish between "f(x)" and "f  (x)" -- note the space
>> between "f" and "(x)" in the latter (I am putting whitespace on the hidden
>> channel, and I want to continue to do that). The former is a function
>> call; the latter is something different.
>>
>> I found a post on this list from 2007 ("Handling optional spaces") which
>> addresses the exact same question. One suggestion was to have the lexer
>> absorb the left parenthesis if there is no space in between:
>>
>> ID : ('a'..'z') + ;
>> FUNCTION_CALL: ID '(' ;
>>
>> The lexer would then return "f(" as a FUNCTION_CALL token if there is no
>> space in between. This works, but it is not very pretty and complicates
>> things elsewhere in my code. The other suggestion was to check the hidden
>> channel for whitespace tokens using Java code (actually C# in my
>> case). But since I am not yet very familiar with the inner workings of
>> ANTLR, this scares me a bit.
>>
>> So I was thinking of a third strategy: Have a simple preprocessor look
>> through the input file, and if a letter is directly followed by a left
>> parenthesis, put some special character in between. So the preprocessor
>> transforms "f(x)" into "f&(x)", where "&" is a (glue) character not used
>> elsewhere in the grammar. And afterwards, it would be much easier to
>> distinguish between "f&(x)" and "f  (x)" in ANTLR.
>>
>> Is this question or strategy completely stupid for some reason?
>>
>
> Personally, I think avoiding the inner workings of ANTLR because it is
> scary is a bad trait to pick up.
>
> When I started using ANTLR I spent many hours learning how it worked by
> using the debugger. While I am not an expert at everything ANTLR, I don't
> fear it.
>
> One thing I have learned is that while the lexer and parser are probably
> capable of determining if an input is acceptable, that doesn't mean that
> the lexer and parser should do all of the work of accepting the input.
>
> If you think of accepting an input as three steps:
> 1. Use the lexer to convert the input to tokens.
> 2. Use the parser to accept unambiguous input.
> 3. Use tree manipulation to validate and accept valid input.
> then you can let the parser pass input that may not be valid, but that is
> unambiguous, on to the next step and sort out the meaning and validity there.
>
> For me, once the input is converted to a tree, it is easier to analyze and
> manipulate because you can
> 1. search backward and forward
> 2. change the structure of the branches
> 3. change the info in the nodes
> 4. add and remove nodes and branches
>
> Hope this sheds some light on the problem.
>
> Eric
>
>
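Thomas's third strategy, a preprocessor that glues a letter to a directly following left parenthesis, is cheap to try before touching the grammar at all. A minimal sketch in Java, assuming `&` really is unused everywhere else in the DSL; the class name `GluePreprocessor` is made up for illustration:

```java
// Hypothetical "glue" preprocessor: a letter immediately followed by '('
// gets a '&' inserted between them, so "f(x)" becomes "f&(x)".
class GluePreprocessor {
    static final char GLUE = '&'; // assumed not to appear anywhere else in the DSL

    static String glue(String source) {
        StringBuilder out = new StringBuilder(source.length() + 16);
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            out.append(c);
            boolean nextIsParen = i + 1 < source.length() && source.charAt(i + 1) == '(';
            if (Character.isLetter(c) && nextIsParen) {
                out.append(GLUE); // only directly adjacent letter-paren pairs are glued
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(glue("f(x)"));   // f&(x)
        System.out.println(glue("f  (x)")); // f  (x)
    }
}
```

A plain character scan like this knows nothing about string literals or comments, so an `f(` inside a string would be glued too; a real preprocessor for the DSL would have to skip those regions.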

Another option, though I don't use it myself, would be the token stream
rewrite API. You should be able to take the tokens from the lexer with the
space not on the hidden channel; then, when you see the pattern ID SPACE
LEFT_PAREN, you could rewrite it to SOMETHING_DIFFERENT before passing it on
to the parser. If you don't want the parser to see SPACE tokens at all, you
could also use the stream rewrite to remove them.

Additionally, once the parser has built the tree, you can create tables,
cross references and other data structures to help reach the final goal;
nothing limits you to using only the tree.

One way to make a grammar easier to write is to make the rules less
stringent. If you think of an input value as a dog but don't know how to
define a dog using grammar rules, try creating a rule for animals and then
sort out whether the animal is a dog once you have the tree.

Or, in your case, I would avoid putting the space on the hidden channel,
pass it all the way through to the tree, and then sort it out there.
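A minimal sketch of the "loose rule, decide later" idea, using a hand-written recursive-descent parser in Java instead of an ANTLR grammar; all names (`LooseParse`, `Node`, the labels `call` and `other`) are hypothetical. The parser accepts `f(x)` and `f  (x)` as the same node shape and only records whether spaces preceded the `(`; a separate pass over the tree gives that flag its meaning:

```java
// Hypothetical sketch, not ANTLR: a deliberately loose recursive-descent parser.
class LooseParse {
    // Tree node; the same shape is used for "f(x)" and "f  (x)".
    static final class Node {
        final String name;
        final boolean spaceBeforeParen; // whitespace is kept in the tree, not hidden
        final Node arg;                 // null when no parenthesized part follows
        Node(String name, boolean spaceBeforeParen, Node arg) {
            this.name = name;
            this.spaceBeforeParen = spaceBeforeParen;
            this.arg = arg;
        }
    }

    private final String src;
    private int pos = 0;

    private LooseParse(String src) { this.src = src; }

    static Node parse(String src) {
        LooseParse p = new LooseParse(src);
        Node root = p.expr();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return root;
    }

    // expr : ID ( ' '* '(' expr ')' )?   (the "animal" rule: spacing is accepted either way)
    private Node expr() {
        int start = pos;
        while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("identifier expected at " + pos);
        String name = src.substring(start, pos);
        int afterName = pos;
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        if (pos < src.length() && src.charAt(pos) == '(') {
            boolean spaced = pos > afterName; // record the spacing instead of rejecting
            pos++; // consume '('
            Node arg = expr();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("')' expected at " + pos);
            }
            pos++; // consume ')'
            return new Node(name, spaced, arg);
        }
        return new Node(name, false, null);
    }

    // Separate pass over the tree: only here does the spacing get a meaning.
    // "call" and "other" are placeholder labels for the two interpretations.
    static String describe(Node n) {
        if (n.arg == null) return n.name;
        String kind = n.spaceBeforeParen ? "other" : "call";
        return kind + "(" + n.name + ", " + describe(n.arg) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(parse("f(x)")));   // call(f, x)
        System.out.println(describe(parse("f  (x)"))); // other(f, x)
    }
}
```

Keeping the decision in `describe` rather than in `expr` means the grammar stays simple, and changing what a spaced application means later touches only the tree pass.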

A third option might be syntactic predicates, but again I suspect that
you would have to pass the SPACE tokens to the parser, which requires the
parser rules to deal with spaces everywhere.
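For comparison, the hidden-channel check mentioned in Thomas's original question keeps whitespace away from the parser rules entirely: hidden tokens are still buffered in the token stream, so when the parser sees an `ID` followed by `(` it can inspect the token just before the `(` in the full buffer. A toy model of that idea in plain Java, not using the ANTLR runtime; `Kind`, `Channel`, `nextVisible` and `isCallAt` are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a hidden-channel check (hypothetical names, not ANTLR runtime API).
class HiddenChannelCheck {
    enum Kind { ID, WS, LEFT_PAREN, RIGHT_PAREN }
    enum Channel { DEFAULT, HIDDEN }

    static final class Tok {
        final Kind kind;
        final Channel channel;
        final String text;
        Tok(Kind kind, Channel channel, String text) {
            this.kind = kind;
            this.channel = channel;
            this.text = text;
        }
    }

    // Lexer: every token is kept in the buffer; whitespace is merely marked HIDDEN.
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetter(src.charAt(i))) i++;
                out.add(new Tok(Kind.ID, Channel.DEFAULT, src.substring(start, i)));
            } else if (Character.isWhitespace(c)) {
                while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
                out.add(new Tok(Kind.WS, Channel.HIDDEN, src.substring(start, i)));
            } else if (c == '(' || c == ')') {
                Kind k = (c == '(') ? Kind.LEFT_PAREN : Kind.RIGHT_PAREN;
                out.add(new Tok(k, Channel.DEFAULT, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // Index of the next DEFAULT-channel token at or after i, or -1 if none.
    static int nextVisible(List<Tok> toks, int i) {
        while (i < toks.size() && toks.get(i).channel != Channel.DEFAULT) i++;
        return i < toks.size() ? i : -1;
    }

    // True when the token at idIndex is an ID, the next visible token is '(',
    // and the token directly before that '(' is not hidden whitespace.
    static boolean isCallAt(List<Tok> toks, int idIndex) {
        if (toks.get(idIndex).kind != Kind.ID) return false;
        int paren = nextVisible(toks, idIndex + 1);
        if (paren < 0 || toks.get(paren).kind != Kind.LEFT_PAREN) return false;
        Tok before = toks.get(paren - 1); // the ID itself when nothing lies between
        boolean hiddenSpace = before.channel == Channel.HIDDEN && before.kind == Kind.WS;
        return !hiddenSpace;
    }

    public static void main(String[] args) {
        System.out.println(isCallAt(lex("f(x)"), 0));   // true
        System.out.println(isCallAt(lex("f  (x)"), 0)); // false
    }
}
```

In a real grammar this check would have to be wired into the parser, for example from a predicate; the details depend on the ANTLR version and target language in use.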

Eric



>>
>> Best regards, and thanks for all the good work on ANTLR,
>>
>> -Thomas Thomsen