[antlr-interest] Whitespace question

Graham Wideman gwlist at grahamwideman.com
Fri Oct 9 12:47:03 PDT 2009


At 10/9/2009 12:35 PM, Reid Rivenburgh wrote:
>Indhu Bharathi wrote:
>> Try something like
>> 
>> r     :     lbl=searchTerm ({spaceFollows($lbl.stop)}?=> lbl=searchTerm)*
>>       ;
>> 
>> @members {
>>       public boolean spaceFollows(Token tkn) {
>>             return input.get(tkn.getTokenIndex()+1).getType()==WS;
>>       }
>> }
>
>Interesting, thanks for the tip.  Since you offered a fix, then, I 
>assume this isn't a situation that indicates a fundamental design flaw? 

What Indhu provided was not a "fix" for a "fundamental design flaw" per se; it was one way to handle a tricky language problem.

You are free to *not* send whitespace to the hidden channel -- then it will appear in the token stream that the parser sees, and you can write rules about it. Your grammar can then specify explicitly where whitespace is required or permitted.  But in most languages that means you have to handle whitespace in almost every rule, which is tedious.  Hiding the whitespace from the parser is usually the desirable strategy, and greatly simplifies the grammar.

But that means that you end up needing to "special case" the places where your grammar *does* require whitespace.  There are several alternatives, including, for example, checking that the subsequent token begins two or more character positions after the last character of your number token.

But in fact, that may not be sufficient either. For example, you may want to permit a closing parenthesis to directly follow your numbers, and thus not require whitespace in that case.
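
To make the position check and the parenthesis exception concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: the Tok class, the type names "NUMBER", "WORD" and "RPAREN", and the helpers gapBetween and mayFollowNumber are all made up for this example and are not ANTLR API. In a real ANTLR 3 parser you would read the offsets from your token objects instead (I believe CommonToken exposes them as getStartIndex() and getStopIndex(), but check your version).

```java
public class WhitespaceGap {
    // Hypothetical stand-in for a lexer token: a type name plus the inclusive
    // character offsets of its first and last characters in the input.
    public static final class Tok {
        final String type;
        final int start;
        final int stop;

        public Tok(String type, int start, int stop) {
            this.type = type;
            this.start = start;
            this.stop = stop;
        }
    }

    // True when at least one character lies between the last character of
    // prev and the first character of next. That character is presumably
    // whitespace, but it could be any input hidden from the parser.
    public static boolean gapBetween(Tok prev, Tok next) {
        return next.start - prev.stop >= 2;
    }

    // Assumed language rule for this sketch: a number must be followed by a
    // gap, except that a closing parenthesis may follow it directly.
    public static boolean mayFollowNumber(Tok number, Tok next) {
        return gapBetween(number, next) || next.type.equals("RPAREN");
    }

    public static void main(String[] args) {
        // Input "12 34": NUMBER at 0..1, NUMBER at 3..4
        System.out.println(mayFollowNumber(new Tok("NUMBER", 0, 1), new Tok("NUMBER", 3, 4)));
        // Input "12abc": NUMBER at 0..1, WORD at 2..4
        System.out.println(mayFollowNumber(new Tok("NUMBER", 0, 1), new Tok("WORD", 2, 4)));
        // Input "(12)": NUMBER at 1..2, RPAREN at 3..3
        System.out.println(mayFollowNumber(new Tok("NUMBER", 1, 2), new Tok("RPAREN", 3, 3)));
    }
}
```

A check like mayFollowNumber could be called from a semantic predicate in the same way as spaceFollows in the quoted grammar. Which tokens get the "may follow directly" exemption is a decision about your language, not something the tool can settle for you.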

So it's a matter of clarifying some of the fiddly details allowed/disallowed in your language, and deciding whether it's worth dealing with whitespace in the grammar itself, or generally ignoring the whitespace and adding code (actions or semantic predicates) where needed to handle a special case.

-- Graham


