[antlr-interest] Whitespace: More than meets the eye?
Graham Wideman
gwlist at grahamwideman.com
Thu Aug 6 01:52:06 PDT 2009
Hi Sam,
Thanks for your comments. More below on your questions:
>I'm curious as to why you want to sometimes consider whitespace, though.
>Is this a self-designed language, or a specification you're working from
>that makes whitespace 'sometimes' significant?
>
>Your example was a function call or declaration. You can always get help
>from the lexer here if there are situations where there *must* be a
>space, and situations where there *mustn't* be a space, and nothing
>else... have tokens that include the lparen.
Yes, I am considering the least-messy way to tackle a few of these issues in PHP. (And the function example I gave was just a simple example, not a problem in PHP.)
One example in PHP is the "$" prefix on identifiers, which is required in some contexts but not in others.
An ordinary variable:
$myvar = 'hello';
$othervar = $myvar;
Everywhere that such a variable appears, the dollar prefix is required, and no space is allowed. Now it's tempting to write the grammar as:
variableName
: Dollar Identifier ...
...
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
This Identifier rule is good for all named things in PHP, but the parser rule would allow whitespace between $ and Identifier, which is not accepted by the actual PHP parser.
So, maybe it's better to stick the "$" at the beginning of the lexer rule for Identifier (call it DollarIdentifier or something).
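To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than ANTLR. It only imitates a lexer that skips whitespace; the rule tables SEPARATE and FUSED and the helper tokenize are hypothetical names for illustration, not part of any PHP grammar or of ANTLR.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Hypothetical rule tables. WS is matched and then dropped, imitating a
# lexer that sends whitespace to a hidden channel.
SEPARATE = [("Dollar", r"\$"), ("Identifier", IDENT), ("WS", r"\s+")]
FUSED = [("DollarIdentifier", r"\$" + IDENT), ("Identifier", IDENT), ("WS", r"\s+")]


def tokenize(text, rules):
    """Return the token kinds found in text, skipping whitespace.

    Raises ValueError if no rule matches at some position.
    """
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    kinds, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"cannot lex at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            kinds.append(m.lastgroup)
        pos = m.end()
    return kinds


# With separate tokens, the parser sees the same stream with or without a space.
print(tokenize("$myvar", SEPARATE))
print(tokenize("$ myvar", SEPARATE))

# With the fused token, only the space-free form lexes as a variable.
print(tokenize("$myvar", FUSED))
```

With the SEPARATE rules, "$myvar" and "$ myvar" produce identical token streams, so a parser rule like "Dollar Identifier" cannot tell them apart. With the FUSED rules, "$ myvar" fails in the lexer because a lone "$" matches nothing.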
But then you get to variables that are members of a class/object.
class C {
var $mymember = 'Hello';
...
}
$c = new C();
print $c->mymember;
Note how the declaration uses a $ prefix, but the usage does not (the only $ is on the object variable, not on the name of the member variable). I'm somewhat loath to handle the $ sometimes in lexer rules and sometimes in parser rules, as this seems apt to cause confusion later. (Maybe not... I haven't assessed how messy it gets going down this path.)
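As a rough illustration of that asymmetry, the Python sketch below lexes the declaration and the usage with a fused "$" token. The token names (DollarIdentifier, Identifier, Arrow) and the helper member_name are assumptions made up for this example; they are not taken from any existing PHP grammar.

```python
import re

# Hypothetical token set: '$' is fused into DollarIdentifier by the lexer.
TOKEN_RE = re.compile(r"""
    (?P<DollarIdentifier>\$[A-Za-z_][A-Za-z0-9_]*)
  | (?P<Identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<Arrow>->)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs for text, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot lex at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def member_name(token):
    """Strip the '$' so a declaration and a usage map to the same name."""
    kind, text = token
    return text[1:] if kind == "DollarIdentifier" else text


decl = tokenize("var $mymember")   # member declaration
use = tokenize("$c->mymember")     # member access

print(decl)
print(use)
print(member_name(decl[1]) == member_name(use[2]))
```

The member name arrives as a DollarIdentifier in the declaration and as a plain Identifier in the access, so any later stage that matches the two has to normalize the name somewhere, as member_name does here. Whether that normalization is less messy than handling "$" in parser rules is exactly the open question.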
I do indeed see ways to lex/parse this more strictly; I'm just looking for the least messy one.
-- Graham