[antlr-interest] A proposal for keywords

Wed May 24 13:11:06 PDT 2006

Shmuel--

This is an implementation problem.  To specifically look for a token with
text "0" and type NUMBER would require running an interpreted version of the
lexer over keyword tokens in the parser grammar during ANTLR analysis.  You
can already differentiate and identify such tokens "manually" with a
semantic predicate (in ANTLR 3, not ANTLR 2).  As long as there is machinery
for fine-tuning, I would rather see a simple mechanism that handles most of
the cases than request significant extra machinery.  Ter has enough to do
already.

--Loring

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of shmuel siegel
> Sent: Wednesday, May 24, 2006 3:36 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] A proposal for keywords
> 
> I have a problem with the simple reading of your suggestion. My lexer
> rule for STRING strips off quotes because I don't need or want them in
> my later processing. Therefore the textual test for LITERAL_0 will not
> only match the token 0 but it will also match the token "0". I don't
> want it to match the token "0". I therefore wanted the matching rule for
> LITERAL_0 to take into account that the token type is INTEGER.
> 
> This is not a hypothetical scenario. I have had the problem of STRINGs
> being interpreted as keywords. I have added the following code to my
> 2.7.6 lexer.
>     /**
>      * Don't want to override strings
>      * The strings were meant to stay strings
>      */
>     public int testLiteralsTable(int ttype)
>     {
>         if(ttype == STRING)
>             return ttype;
>         else
>             return super.testLiteralsTable(ttype);
>     }
> 
> Shmuel
> 
> 
> Loring Craymer wrote:
> > The example fits, although the mechanism does not quite match your
> > description.  The lexer would return a token for 0 with type INTEGER;
> > the parser would try to match "0", logically as type LITERAL_0; since
> > the text matches, the token would be matched as a LITERAL_0 and the
> > type field changed.
> >
> > --Loring
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of shmuel siegel
> >> Sent: Tuesday, May 23, 2006 3:50 PM
> >> To: antlr-interest at antlr.org
> >> Subject: Re: [antlr-interest] A proposal for keywords
> >>
> >> Loring Craymer wrote:
> >>
> >>> ...
> >>> For option 2, literal types should be bound in the parser:  that is,
> >>> the lexer binds the generic type to the token (TEXT or NUMBER, for
> >>> example) and the parser dynamically looks up the next token in the
> >>> literals table whenever attempting to match a literal.  For example,
> >>> "if" would be first typed as TEXT but matched (and retyped) as
> >>> LITERAL_if when matching an occurrence of "if" in the parser.
> >>>
> >>>
> >> Why are you limiting yourself to this type of situation? Why don't we
> >> expand the concept? Why not let the lexer return a set of token types
> >> that match the TEXT? The parser rule would have to resolve the
> >> ambiguity. The grammar rule would be considered ambiguous if the parser
> >> would accept two different types that had the same TEXT.
> >>
> >> Let's say that I have a positional parameter in a function call that
> >> can take on the number zero or an empty string (I actually have such a
> >> grammar). It would be nice to be able to specify this explicitly
> >> (without predicates) even though "zero" is also an "integer" and "empty
> >> string" is also a "string".
> >>
> >> Or is this what you are saying?
> >>
> >> Shmuel
> >>