[antlr-interest] A proposal for keywords

shmuel siegel antlr at shmuelhome.mine.nu
Wed May 24 03:36:13 PDT 2006


I have a problem with the simple reading of your suggestion. My lexer 
rule for STRING strips off the quotes because I don't need or want them 
in my later processing. A purely textual test for LITERAL_0 will 
therefore match not only the token 0 but also the token "0", and I 
don't want it to match the token "0". That is why I wanted the matching 
rule for LITERAL_0 to take into account that the token type is INTEGER.

This is not a hypothetical scenario. I have had the problem of STRINGs 
being interpreted as keywords, so I added the following override to my 
2.7.6 lexer:
    /**
     * Skip the literals-table lookup for STRING tokens, so that a quoted
     * string whose text happens to equal a keyword stays a STRING.
     */
    public int testLiteralsTable(int ttype)
    {
        if(ttype == STRING)
            return ttype;
        else
            return super.testLiteralsTable(ttype);
    }
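
To show the effect without ANTLR on the classpath, here is a minimal 
stand-alone sketch of the same guard. Everything in it is an assumption 
made for illustration: the class name, the token type values, the 
LITERALS map standing in for the lexer's literals table, and the helper 
guardedTestLiteralsTable. It is not the code ANTLR generates.

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralGuardSketch {
    // Hypothetical token types; a generated lexer defines its own values.
    public static final int INTEGER = 4;
    public static final int STRING = 5;
    public static final int LITERAL_0 = 6;

    // Stand-in for the literals table: token text -> literal token type.
    public static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("0", LITERAL_0);
    }

    // Text-only lookup: retypes any token whose text is in the table,
    // regardless of the type the lexer rule assigned.
    public static int testLiteralsTable(String text, int ttype) {
        Integer literalType = LITERALS.get(text);
        return literalType != null ? literalType : ttype;
    }

    // Guarded lookup: STRING tokens never consult the table.
    public static int guardedTestLiteralsTable(String text, int ttype) {
        if (ttype == STRING) {
            return ttype;
        }
        return testLiteralsTable(text, ttype);
    }

    public static void main(String[] args) {
        // The quotes are already stripped, so the STRING "0" has text 0.
        System.out.println("unguarded STRING:  " + testLiteralsTable("0", STRING));
        System.out.println("guarded STRING:    " + guardedTestLiteralsTable("0", STRING));
        System.out.println("guarded INTEGER:   " + guardedTestLiteralsTable("0", INTEGER));
    }
}
```

With the unguarded lookup the STRING token comes back as LITERAL_0; with 
the guard it stays STRING, while the INTEGER token is still retyped.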

Shmuel


Loring Craymer wrote:
> The example fits, although the mechanism does not quite match your
> description.  The lexer would return a token for 0 with type INTEGER; the
> parser would try to match "0", logically as type LITERAL_0; since the text
> matches, the token would be matched as a LITERAL_0 and the type field
> changed.
>
> --Loring
>
>
>
>   
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of shmuel siegel
>> Sent: Tuesday, May 23, 2006 3:50 PM
>> To: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] A proposal for keywords
>>
>> Loring Craymer wrote:
>>     
>>> ...
>>> For option 2, literal types should be bound in the parser:  that is, the
>>> lexer binds the generic type to the token (TEXT or NUMBER, for example)
>>> and the parser dynamically looks up the next token in the literals table
>>> whenever attempting to match a literal.  That is, "if" would be first
>>> typed as TEXT but matched (and retyped) as LITERAL_if when matching an
>>> occurrence of "if" in the parser.
>>>
>>>       
>> Why are you limiting yourself to this type of situation? Why don't we
>> expand the concept? Why not let the lexer return a set of token types
>> that match the TEXT? The parser rule would have to resolve the
>> ambiguity. The grammar rule would be considered ambiguous if the parser
>> would accept two different types that had the same TEXT.
>>
>> Let's say that I have a positional parameter in a function call that can
>> take on the number zero or an empty string (I actually have such a
>> grammar). It would be nice to be able to specify this explicitly
>> (without predicates) even though "zero" is also an "integer" and "empty
>> string" is also a "string".
>>
>> Or is this what you are saying?
>>
>> Shmuel
>>     

