[antlr-interest] Whitespace: More than meets the eye?

Wed Aug 5 19:00:07 PDT 2009

Hi Loring

Thanks for your comments. So when you say that "ANTLR makes no assumptions", I take that to mean ANTLR makes these assumptions:

a) whitespace is permitted between any tokens.

b) whitespace can be omitted between tokens so long as the beginning of the subsequent token doesn't make the end of the previous token ambiguous.

Is this correct?  

Further, is it correct to assume that the "WhiteSpace" lexer token name is special, i.e. recognized by ANTLR as the pattern to be matched between tokens (and discarded by virtue of the hidden channel specifier)? I ask because elsewhere I see "WS" used instead.

Thanks,

Graham

At 8/5/2009 05:41 PM, Loring Craymer wrote:
>ANTLR makes no assumptions about whitespace; ANTLR lexers just slice up a 
>stream (array) of characters into tokens according to a lexer grammar.  For 
>most programming language grammars, whitespace can be ignored (and lexer 
>grammars tend to put whitespace tokens on a "hidden" channel) since it 
>basically serves to separate meaningful tokens.
>
>If you want to require or forbid whitespace in parser rules, you either have 
>to not ignore whitespace or insert semantic predicates to look for 
>whitespace or its absence.  This gets messy fast, so modern languages avoid 
>the old FORTRAN "interesting" lexical behavior, and you will not find much 
>expertise in dealing with really outré lexer problems here--it is unusual to 
>find a need to solve such problems.
>
>--Loring
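
To make the points above concrete, here is a minimal sketch in plain Python (not ANTLR, and not ANTLR's API) of a lexer that slices a string into tokens and routes whitespace to a hidden channel. Everything in it is an assumption made for illustration: the rule names (ID, INT, PLUS, Blank), the channel numbers, and the helpers tokenize, visible and has_gap_before. The whitespace rule is deliberately called "Blank" to show that, in this model, what hides the token is the channel it is assigned to and not the name of the rule.

```python
import re
from typing import List, NamedTuple, Tuple

# Channel numbers are arbitrary values chosen for this sketch.
DEFAULT, HIDDEN = 0, 1


class Token(NamedTuple):
    kind: str
    text: str
    channel: int


# (rule name, pattern, channel). The whitespace rule could have any name;
# only its channel makes it invisible to the consumer of the tokens.
RULES = [
    ("ID", r"[A-Za-z_]\w*", DEFAULT),
    ("INT", r"\d+", DEFAULT),
    ("PLUS", r"\+", DEFAULT),
    ("Blank", r"\s+", HIDDEN),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in RULES))
CHANNEL = {name: chan for name, _, chan in RULES}


def tokenize(text: str) -> List[Token]:
    """Slice text into tokens, keeping hidden-channel tokens in the list."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        kind = m.lastgroup
        tokens.append(Token(kind, m.group(), CHANNEL[kind]))
        pos = m.end()
    return tokens


def visible(tokens: List[Token]) -> List[Tuple[str, str]]:
    """What a parser reading only the default channel would see."""
    return [(t.kind, t.text) for t in tokens if t.channel == DEFAULT]


def has_gap_before(tokens: List[Token], index: int) -> bool:
    """True if the token at this position in the full list (hidden tokens
    included) directly follows a hidden token. This is the kind of check a
    parser would need in order to require or forbid whitespace."""
    return index > 0 and tokens[index - 1].channel == HIDDEN
```

In this sketch, tokenize("a+b") and tokenize("a + b") yield the same visible tokens, while "ab" and "a b" do not: the whitespace changes where the lexer cuts, which is the sense of (b) above. A parser that cares about whitespace would have to inspect the full token list, for example with has_gap_before, rather than only the default channel.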