[antlr-interest] Whitespace: More than meets the eye?

Wed Aug 5 22:59:39 PDT 2009

Graham Wideman wrote:
> Ah-hah -- OK, time for slap on the forehead. (Mine! It must be the >100 
> degree weather here.)
> 
> Thanks for your answers. Yes, of COURSE it works as you say. Somehow, 
> after not really worrying about how the lexer works, my brain got stuck 
> thinking that lexer and parser work more analogously than they actually do.
> 
> Whereas at any juncture the parser only tries certain rules as predicted 
> by the grammar and the current state, the lexer effectively "tries all 
> its rules" every time it's starting to discern the next token.
> 
> So in that process, if the next characters match a rule that discards 
> the characters, (a la whitespace), then that pattern functions as an 
> optional separator.
> 
> And I also see that in order to do anything with whitespace at the 
> parser level, either whitespace has to not be discarded (in which case 
> many parser rules will have to deal with it) or custom code will need to 
> be included in the relevant rules to look at the hidden channel etc.

Don't forget that 'whitespace' is arbitrary - you could consider spaces 
to be whitespace, but not, say, tabs or newlines. I believe there are 
languages where this is the case - spaces are never significant, but 
some other types of whitespace are.
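
To make that concrete, here is a rough sketch in Python rather than 
ANTLR. It only illustrates the idea and is not how ANTLR's generated 
lexer works internally. The rule names and the tokenize() helper are 
made up for the example: runs of spaces are thrown away, while tabs and 
newlines come through as real tokens the parser would have to handle.

```python
import re

# Hypothetical rule table: (token name, pattern, discard?).
# Every rule is tried at each position, in order, much as a lexer
# "tries all its rules" whenever it starts on the next token.
RULES = [
    ("SPACE", re.compile(r" +"), True),       # discarded, never reaches the parser
    ("TAB", re.compile(r"\t"), False),        # kept: significant in this sketch
    ("NEWLINE", re.compile(r"\n"), False),    # kept: significant in this sketch
    ("ID", re.compile(r"[A-Za-z_]\w*"), False),
    ("INT", re.compile(r"\d+"), False),
]

def tokenize(text):
    """Return a list of (name, lexeme) pairs, dropping discarded matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern, discard in RULES:
            m = pattern.match(text, pos)
            if m:
                if not discard:
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError("unexpected character at offset %d" % pos)
    return tokens
```

Because SPACE is discarded, it acts as an optional separator anywhere 
between tokens, whereas a parser for this token stream would have to 
mention TAB and NEWLINE explicitly wherever they are allowed.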

I'm curious as to why you want to sometimes consider whitespace, though. 
Is this a self-designed language, or a specification you're working from 
that makes whitespace 'sometimes' significant?

Your example was a function call or declaration. You can always get help 
from the lexer here: if every case is either one where there *must* be a 
space or one where there *mustn't* be, with nothing in between, have 
tokens that include the lparen.
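
Roughly what I mean by a token that includes the lparen, again sketched 
in Python rather than as an ANTLR grammar. The names ID_LPAREN and 
lex_calls() are invented for illustration, and which construct each 
token sequence should mean is up to your language: an identifier 
immediately followed by "(" becomes one token, while an identifier, 
some whitespace, and then "(" lex as separate ID and LPAREN tokens. 
The parser can then tell the two cases apart without ever seeing 
whitespace.

```python
import re

# Hypothetical rule table: (token name, pattern, discard?).
# ID_LPAREN is listed before ID so that "f(" wins over plain "f".
CALL_RULES = [
    ("ID_LPAREN", re.compile(r"[A-Za-z_]\w*\("), False),  # identifier glued to "("
    ("WS", re.compile(r"[ \t]+"), True),                  # discarded
    ("ID", re.compile(r"[A-Za-z_]\w*"), False),
    ("LPAREN", re.compile(r"\("), False),
    ("RPAREN", re.compile(r"\)"), False),
]

def lex_calls(text):
    """Return (name, lexeme) pairs; whitespace is matched but dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern, discard in CALL_RULES:
            m = pattern.match(text, pos)
            if m:
                if not discard:
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError("unexpected character at offset %d" % pos)
    return tokens
```

With this, "f(x)" starts with a single ID_LPAREN token, while "f (x)" 
starts with ID followed by LPAREN, so parser rules can require one 
form or the other.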

-- 
Sam Barnett-Cormack