[antlr-interest] Changing/affecting the Lexer from the Parser?

Juancarlo Añez apalala at gmail.com
Sat Nov 10 12:48:57 PST 2012


Bernard,

That is correct.

I know that the solution is what you describe for Ruby. I wanted to know if
someone had done something similar with ANTLR.

-- Juancarlo


On Sat, Nov 10, 2012 at 12:19 PM, Bernard Kaiflin
<bkaiflin.ruby at gmail.com> wrote:

> Yes, only the Ruby parser (the one I wrote by hand) knows if it is in the
> middle of an expression and if the / is a division. If it is expecting an
> atom, it knows that the / starts a regexp and can ask the lexer to rewind
> and recreate a token with the whole regexp.
>
> If I understand correctly, you have a grammar
>
> array     : 'ARR' '(' index '.' dimension ')' ;
> dimension : start ':' stop ;
>
> (index, start and stop are probably replaced by INT, but I give them names
> for clarity). As the file is tokenized in advance, the lexer has created
>
> ARR or ID
> LPAR
> FLOAT
> COLON
> INT
> RPAR
>
> instead of
>
> ARR or ID
> LPAR
> INT
> DOT
> INT
> COLON
> INT
> RPAR
>
> And now the token stream mismatches the grammar. Before going further,
> please tell me if it's correct.
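
The mismatch described above can be reproduced with a minimal sketch of a lexer that tokenizes everything up front. This is plain Python with the `re` module, not ANTLR's API; the token names and patterns are assumptions chosen to mirror the lists above, and whitespace is not handled.

```python
import re

# Hypothetical token definitions; FLOAT is tried before INT, so "1.2" wins.
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("INT",   r"\d+"),
    ("ID",    r"[A-Za-z_]\w*"),
    ("LPAR",  r"\("),
    ("RPAR",  r"\)"),
    ("COLON", r":"),
    ("DOT",   r"\."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Tokenize the whole input in advance, with no knowledge of parser state."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]


# The lexer cannot know it is inside ARR(...), so it emits FLOAT for "1.2"
# where the grammar wants INT DOT INT.
print(tokenize("ARR(1.2:4)"))
```

With these assumed rules the result contains a single FLOAT token for `1.2`, so a rule expecting `index '.' dimension` has nothing to match against.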
>
>
> 2012/11/10 Juancarlo Añez <apalala at gmail.com>
>
>> Bernard,
>>
>> On Sat, Nov 10, 2012 at 10:48 AM, Bernard Kaiflin
>> <bkaiflin.ruby at gmail.com> wrote:
>>
>> > I still don't see the relationship between 2 ARR(1:5) ARR(1.2:4)
>> > ARR(1.#I:#J)
>> > and a Python CommonTokenStream. Is it a special version of Natural? Do
>> > you have the specifications for this language?
>> >
>>
>> With the existing CommonTokenStream, the 1.2 in ARR(1.2:4) has been lexed
>> as a float before the parser started, which is way before the parser gets
>> to the expression. The Python CommonTokenStream bootstraps itself by
>> tokenizing all input on the first call to any of the methods that return a
>> token.
>>
>> I built the grammar for Natural from the reference material, which
>> includes sort-of grammar descriptions.
>>
>> I think that a language like Ruby requires a parser-guided lexer. I've
>> built some of those by hand before, and they are very efficient. But
>> Natural's grammar was too big (~3000 lines) to try to approach it by hand.
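
For illustration, a parser-guided lexer of the kind mentioned above might look like the following hand-written sketch. It is not ANTLR code: `GuidedLexer`, `parse_array`, and the `allow_float` flag are hypothetical names, the `array` and `dimension` rules are flattened into one function, and whitespace is not handled.

```python
import re


class GuidedLexer:
    """On-demand lexer; the caller says whether a FLOAT may be produced next."""

    RULES = [
        ("FLOAT", re.compile(r"\d+\.\d+")),
        ("INT", re.compile(r"\d+")),
        ("ID", re.compile(r"[A-Za-z_]\w*")),
        ("PUNCT", re.compile(r"[().:]")),
    ]

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, allow_float=True):
        for name, pattern in self.RULES:
            if name == "FLOAT" and not allow_float:
                continue  # the parser has told us a float cannot appear here
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return name, m.group()
        raise SyntaxError(f"unexpected input at offset {self.pos}")


def parse_array(text):
    """Parse ARR(index.start:stop), i.e. the array and dimension rules combined."""
    lexer = GuidedLexer(text)

    def expect(kind, value=None):
        # Inside ARR(...), never let the lexer build a FLOAT.
        tok = lexer.next_token(allow_float=False)
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok!r}")
        return tok[1]

    expect("ID", "ARR")
    expect("PUNCT", "(")
    index = int(expect("INT"))
    expect("PUNCT", ".")
    start = int(expect("INT"))
    expect("PUNCT", ":")
    stop = int(expect("INT"))
    expect("PUNCT", ")")
    return index, start, stop


print(parse_array("ARR(1.2:4)"))
```

Because tokens are produced only when the parser asks for them, the same input `1.2` can be a FLOAT in an ordinary expression and INT DOT INT inside an array reference.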
>>
>> Cheers,
>>
>> --
>>
>> Juancarlo *Añez*
>>
>>
>
>


-- 
Juancarlo *Añez*

