[antlr-interest] Changing/affecting the Lexer from the Parser?

Bernard Kaiflin bkaiflin.ruby at gmail.com
Sat Nov 10 08:49:54 PST 2012


Yes, only the Ruby parser (the one I wrote by hand) knows if it is in the
middle of an expression and if the / is a division. If it is expecting an
atom, it knows that the / starts a regexp and can ask the lexer to rewind
and recreate a token with the whole regexp.
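
A minimal sketch of that idea in Python, not the actual Ruby lexer. It is
simplified: instead of rewinding, the parser passes a flag when it asks for
the next token. The names ToyLexer, next_token, expect_atom and
lex_with_parser_hint are hypothetical, and the loop that tracks expect_atom
only stands in for a real parser's state.

```python
import re

class ToyLexer:
    """Toy on-demand lexer: the caller says whether an atom is expected."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, expect_atom):
        # Skip whitespace before the next token.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return ("EOF", "")
        if self.text[self.pos] == "/":
            if expect_atom:
                # An atom is expected, so try to read a whole regexp literal.
                m = re.compile(r"/(?:\\.|[^/\\\n])*/").match(self.text, self.pos)
                if m:
                    self.pos = m.end()
                    return ("REGEXP", m.group())
            # In the middle of an expression: a plain division operator.
            self.pos += 1
            return ("DIV", "/")
        for kind, pattern in (("INT", r"\d+"), ("ID", r"[A-Za-z_]\w*")):
            m = re.compile(pattern).match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return (kind, m.group())
        raise SyntaxError(f"unexpected character at offset {self.pos}")


def lex_with_parser_hint(text):
    """Stand-in for a parser: an atom is expected at the start and after DIV."""
    lexer = ToyLexer(text)
    tokens = []
    expect_atom = True
    while True:
        tok = lexer.next_token(expect_atom)
        if tok[0] == "EOF":
            return tokens
        tokens.append(tok)
        expect_atom = tok[0] == "DIV"
```

With this sketch, the first / in "/a b/ / 2" starts a regexp because an atom
is expected there, while the second / is a division.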

If I understand correctly, you have a grammar

array     : 'ARR' '(' index '.' dimension ')' ;
dimension : start ':' stop ;

(index, start and stop are probably INT in your grammar, but I give them
names for clarity). As the file is tokenized in advance, the lexer has created

ARR or ID
LPAR
FLOAT
COLON
INT
RPAR

instead of

ARR or ID
LPAR
INT
DOT
INT
COLON
INT
RPAR

And now the token stream no longer matches the grammar. Before going further,
please tell me if this is correct.
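
To make the two token lists concrete, here is a small Python sketch of a
lexer that tokenizes everything in advance. It is not ANTLR's lexer or
CommonTokenStream; the token patterns, the function eager_token_types and
the allow_float switch are assumptions made for illustration. With floats
allowed, the first alternative that matches wins and 1.2 becomes one FLOAT;
with floats disabled, the same input gives the INT DOT INT sequence that the
array rule expects.

```python
import re

# Hypothetical token definitions; FLOAT is listed first so it wins over INT.
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("COLON", r":"),
    ("DOT", r"\."),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
INT_ONLY = re.compile(r"\d+")


def eager_token_types(text, allow_float=True):
    """Tokenize the whole input up front and return the token type names."""
    types = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        if kind == "FLOAT" and not allow_float:
            # Pretend the lexer knows it is inside ARR(...): take digits only.
            m = INT_ONLY.match(text, pos)
            kind = "INT"
        if kind != "WS":
            types.append(kind)
        pos = m.end()
    return types
```

eager_token_types("ARR(1.2:4)") gives the first list above, and
eager_token_types("ARR(1.2:4)", allow_float=False) gives the second. A lexer
that runs before the parser has no way to know which setting applies.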


2012/11/10 Juancarlo Añez <apalala at gmail.com>

> Bernard,
>
> On Sat, Nov 10, 2012 at 10:48 AM, Bernard Kaiflin
> <bkaiflin.ruby at gmail.com> wrote:
>
> > I still don't see the relationship between 2 ARR(1:5) ARR(1.2:4)
> > ARR(1.#I:#J) and a Python CommonTokenStream. Is it a special version of
> > Natural? Do you have the specifications for this language?
> >
>
> With the existing CommonTokenStream, the 1.2 in ARR(1.2:4) has been lexed
> as a float before the parser started, which is way before the parser gets
> to the expression. The Python CommonTokenStream bootstraps itself by
> tokenizing all input on the first call to any of the methods that return a
> token.
>
> I built the grammar for Natural from the reference material, which includes
> sort-of grammar descriptions.
>
> I think that a language like Ruby requires a parser-guided lexer. I've
> built some of those by hand before, and they are very efficient. But
> Natural's grammar was too big (~3000 lines) to try to approach it by hand.
>
> Cheers,
>
> --
> Juancarlo *Añez*

