[antlr-interest] Whitespace: More than meets the eye?

Graham Wideman gwlist at grahamwideman.com
Thu Aug 6 01:52:06 PDT 2009


Hi Sam,

Thanks for your comments. More below on your questions:

>I'm curious as to why you want to sometimes consider whitespace, though. 
>Is this a self-designed language, or a specification you're working from 
>that makes whitespace 'sometimes' significant?
>
>Your example was a function call or declaration. You can always get help
>from the lexer here if there are situations where there *must* be a 
>space, and situations where there *mustn't* be a space, and nothing 
>else... have tokens that include the lparen.

Yes, I am looking for the least messy way to tackle a few of these issues in PHP. (The function example I gave was just a simple illustration, not an actual problem in PHP.)

One example in PHP is the use of "$" as a prefix on identifiers, but only some of the time.

An ordinary variable:

    $myvar    = 'hello';
    $othervar = $myvar;

Everywhere such a variable appears, the dollar prefix is required, and no space is allowed between the "$" and the name. Now it's tempting to write the grammar as:

    variableName
        : Dollar Identifier ...
    ...
    Identifier
        : ('a'..'z' | 'A'..'Z' | '_')  ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
        ;

This Identifier rule works for all named things in PHP. The problem is that whitespace is skipped by the lexer, so the parser rule would accept whitespace between $ and Identifier, which the actual PHP parser rejects.

So maybe it's better to put the "$" at the beginning of the lexer rule for Identifier (call it DollarIdentifier or something), so that the "$" and the name are lexed as a single token with no room for whitespace in between.
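A minimal sketch of the DollarIdentifier idea, written in Python rather than as an ANTLR grammar so it can be run standalone. The token names (VARIABLE, IDENTIFIER, ARROW, STRING, PUNCT) and the tokenize() function are hypothetical. The token set covers only the snippets in this post, not real PHP. The "$" is matched as part of the variable token, so there is no point at which skipped whitespace could separate it from the name.

```python
import re
from typing import List, Tuple

# Hypothetical token set covering only the snippets in this post (not real PHP).
# VARIABLE fuses '$' and the name into one token, like a DollarIdentifier rule.
TOKEN_SPEC = [
    ("VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ARROW", r"->"),
    ("STRING", r"'[^']*'"),
    ("PUNCT", r"[=;(){}]"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs, dropping whitespace like a skipped WS rule."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            # A lone '$' (e.g. "$ myvar") matches no rule, so it is a lexical error.
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this approach, "$ myvar" becomes a lexical error rather than something the parser has to police. The trade-off is that the "$" now lives in the lexer, while names that appear without it (such as a member name after "->") still arrive as plain IDENTIFIER tokens.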

But then you get to variables that are members of a class/object. 

    class C {
        var $mymember = 'Hello';
        ...
    }
    $c = new C();
    print $c->mymember;

Note that the declaration uses a $ prefix but the usage does not: the only $ is on the object variable, not on the member name. But I'm somewhat loath to handle the $ sometimes in lexer rules and sometimes in parser rules, as that seems likely to cause confusion later. (Maybe not... I haven't yet assessed how messy it gets going down this path.)
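One alternative, not proposed in the thread and shown only as a hedged sketch, is to keep "$" as its own token everywhere so that it is always handled in parser rules. The lexer then records whether whitespace was skipped before each token. A parser rule can reject "$ myvar" while still accepting both "var $mymember" and "$c->mymember". The Token class, its spaced flag, and the tokenize(), parse_variable() and parse_member_access() functions are all hypothetical names, written in Python for illustration rather than as ANTLR actions.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    kind: str
    text: str
    spaced: bool  # True if whitespace was skipped just before this token


# Hypothetical token set: '$' is always a separate DOLLAR token.
MASTER = re.compile(
    r"(?P<WS>\s+)|(?P<DOLLAR>\$)|(?P<ARROW>->)"
    r"|(?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)|(?P<PUNCT>[=;(){}])"
)


def tokenize(src: str) -> List[Token]:
    """Skip whitespace, but remember on the next token that some was skipped."""
    tokens: List[Token] = []
    pos, spaced = 0, False
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup == "WS":
            spaced = True
        else:
            tokens.append(Token(m.lastgroup, m.group(), spaced))
            spaced = False
        pos = m.end()
    return tokens


def parse_variable(tokens: List[Token], i: int) -> Tuple[str, int]:
    """Parser-side rule: DOLLAR IDENTIFIER, with no whitespace in between."""
    if i + 1 >= len(tokens) or tokens[i].kind != "DOLLAR" or tokens[i + 1].kind != "IDENTIFIER":
        raise SyntaxError("expected '$' followed by a name")
    if tokens[i + 1].spaced:
        raise SyntaxError("whitespace is not allowed between '$' and the name")
    return tokens[i + 1].text, i + 2


def parse_member_access(tokens: List[Token], i: int) -> Tuple[Tuple[str, str], int]:
    """$obj->member: the object needs '$', the member name does not.
    Spacing around '->' is not checked here."""
    obj, i = parse_variable(tokens, i)
    if i + 1 >= len(tokens) or tokens[i].kind != "ARROW" or tokens[i + 1].kind != "IDENTIFIER":
        raise SyntaxError("expected '->' followed by a member name")
    return (obj, tokens[i + 1].text), i + 2
```

The cost of this design is that every token carries an extra flag, and every rule where adjacency matters has to check it explicitly. The benefit is that "$" is treated the same way in declarations, ordinary uses and member access.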

I do see ways to lex/parse this more strictly; I'm just looking for the least messy one.

-- Graham
