[antlr-interest] Emitting (additional) imaginary tokens in the C target
Wincent Colaiuta
win at wincent.com
Thu Jun 14 11:09:41 PDT 2007
On 14/6/2007, at 16:15, Jim Idle wrote:
> However, I would be surprised if you actually did need to do this. I am
> not even sure that Ter did this on the Python example because it was the
> only way to deal with the stupid indent (I have not really looked at
> that problem), but what makes you think that you need to emit two tokens
> from a single rule rather than have two rules?
There are some elements of the grammar in which whitespace has
syntactic importance, like in Python. I think you're probably right
that this can be done without emitting multiple tokens. I just saw
the example in the book and thought that if I could emit a couple of
extra imaginary tokens (like the INDENT and DEDENT tokens in the
Python example) then it might make some of the rules in the parser
simpler.
You know, it would make parsing things like:
# a Python-like language
def method
    in this grammar methods have no "end" delimiter
As easy as parsing:
# a Ruby-like language
def method
    this grammar's methods do have an "end" delimiter
end
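To make the idea concrete, here is a rough sketch of the kind of pre-pass I have in mind, written in Python for brevity rather than against the C target. It is not ANTLR's API and not the code from the book's Python example; the function name and the token names (INDENT, DEDENT, NEWLINE, LINE) are placeholders of my own, and it assumes indentation uses spaces only and that every dedent returns to an earlier indentation level.

```python
# Sketch only: a hand-written pre-pass, not ANTLR's API.
# All names (with_indent_tokens, INDENT, DEDENT, NEWLINE, LINE) are hypothetical.

def with_indent_tokens(source):
    """Turn changes in leading whitespace into imaginary INDENT/DEDENT tokens."""
    tokens = []
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments do not affect block structure
        # Assumption: indentation is spaces only; tabs are not handled.
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            # Shallower: close blocks until we are back at an enclosing level.
            stack.pop()
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", text))
        tokens.append(("NEWLINE", ""))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


if __name__ == "__main__":
    sample = "def method\n    body of the method\ndef other\n"
    for kind, text in with_indent_tokens(sample):
        print(kind, text)
```

With a token stream like this, a parser rule for a method could treat INDENT ... DEDENT much as it would treat an explicit "end" delimiter, which is the simplification I was hoping for.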
But if this starts getting too messy then I'll forget about it and
just try to handle all the possibilities in the parser. One of the
things I find myself going back and forth with is keeping the balance
of complexity between the parser and lexer appropriate: I try to keep
the parser simple and the lexer starts getting too complicated, so I
wipe the slate clean and start with a real simple lexer and then
the parser gets out of control... Or I start with both really simple and
try to incrementally add the ability to parse more and more input and
the ambiguity warnings and non-LL(*) errors start to mount and I hit
a roadblock... Will dominate this beast eventually, I hope!
Cheers,
Wincent