[antlr-interest] Emitting (additional) imaginary tokens in the C target

Thu Jun 14 11:09:41 PDT 2007

On 14/6/2007, at 16:15, Jim Idle wrote:

> However, I would be surprised if you actually did need to do this. I am
> not even sure that Ter did this on the Python example because it was the
> only way to deal with the stupid indent (I have not really looked at
> that problem), but what makes you think that you need to emit two tokens
> from a single rule rather than have two rules?

There are some elements of the grammar in which whitespace has  
syntactic importance, like in Python. I think you're probably right  
that this can be done without emitting multiple tokens. I just saw  
the example in the book and thought that if I could emit a couple of  
extra imaginary tokens (like the INDENT and DEDENT tokens in the  
Python example), then it might make some of the rules in the parser  
simpler.

You know, it would make parsing things like:

   # a Python-like language
   def method
      in this grammar methods have no "end" delimiter

as easy as parsing:

   # a Ruby-like language
   def method
     this grammar's methods do have an "end" delimiter
   end
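
To make the idea concrete, here is a rough sketch of the usual indentation-stack technique. It is plain Python rather than anything from the ANTLR C runtime, and the function name add_indent_tokens and the LINE token type are made up purely for illustration. It assumes indentation uses spaces only and treats each non-blank line as a single token:

```python
def add_indent_tokens(lines):
    """Yield (type, text) pairs, adding imaginary INDENT/DEDENT tokens.

    Hypothetical helper for illustration only; not part of any ANTLR API.
    """
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments never open or close a block
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open exactly one new block.
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known level.
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise ValueError("inconsistent dedent: %r" % line)
        yield ("LINE", stripped)
    # End of input closes every block that is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


if __name__ == "__main__":
    source = ["def method", "   body of the method", "next statement"]
    for token in add_indent_tokens(source):
        print(token)
```

With tokens like these in the stream, a DEDENT plays the same role for the parser as the explicit "end" keyword does in the Ruby-like case.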

But if this starts getting too messy then I'll forget about it and  
just try to handle all the possibilities in the parser. One of the  
things I find myself going back and forth with is keeping the balance  
of complexity between the parser and lexer appropriate: I try to keep  
the parser simple and the lexer starts getting too complicated, so I  
wipe the slate clean and start with a really simple lexer and then the  
parser gets out of control... Or I start with both really simple and  
try to incrementally add the ability to parse more and more input and  
the ambiguity warnings and non-LL(*) errors start to mount and I hit  
a roadblock... I'll tame this beast eventually, I hope!

Cheers,
Wincent