[antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

Jim Idle jimi at temporal-wave.com
Wed Mar 23 10:32:06 PDT 2011


The article should cover this, but in short: only the MATCH is done in
upper case. The text for the tokens is taken directly from the input
stream, so it keeps the case that was entered. Because the lexer tests a
single character at each position rather than two, it is smaller and
should be faster (though with the C target, the compiler can often compile
away the function calls).

The setUcaseLA method is just a convenience that installs an upper case
comparison method. Remember though that the built-in one only handles
ASCII, so be careful with other character sets.
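
To make the distinction concrete, here is a minimal, self-contained sketch
of the idea in plain C. It is not the ANTLR 3 C runtime: MiniStream,
mini_LA, mini_match and mini_text are hypothetical names invented for this
illustration. It assumes the lexer compares against upper case literals
through a lookahead function, while token text is copied from the original
buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature input stream, for illustration only.
 * These are NOT the ANTLR 3 C runtime types or functions. */
typedef struct {
    const char *data;   /* original input, never modified */
    size_t      len;    /* number of characters in data */
    size_t      pos;    /* index of the next unconsumed character */
    int         ucase;  /* non-zero: lookahead is reported in upper case */
} MiniStream;

#define MINI_EOF (-1)

/* Lookahead: return the i-th character ahead (1-based). When ucase is
 * set, the value is upper-cased for comparison only; the buffer itself
 * is untouched. toupper() in the default "C" locale only maps ASCII. */
int mini_LA(const MiniStream *s, size_t i)
{
    assert(s != NULL && i >= 1);
    size_t at = s->pos + i - 1;
    if (at >= s->len) {
        return MINI_EOF;
    }
    unsigned char c = (unsigned char)s->data[at];
    return s->ucase ? toupper(c) : (int)c;
}

/* Try to match an upper case literal such as "JOIN". On success the
 * literal is consumed and 1 is returned; otherwise the stream position
 * is left unchanged and 0 is returned. */
int mini_match(MiniStream *s, const char *literal)
{
    size_t n = strlen(literal);
    for (size_t i = 0; i < n; i++) {
        if (mini_LA(s, i + 1) != (unsigned char)literal[i]) {
            return 0;
        }
    }
    s->pos += n;
    return 1;
}

/* Copy the token text [start, stop) from the ORIGINAL buffer into out,
 * so the case that was entered is preserved. */
void mini_text(const MiniStream *s, size_t start, size_t stop,
               char *out, size_t outsz)
{
    assert(s != NULL && start <= stop && stop <= s->len);
    assert(outsz > stop - start);
    memcpy(out, s->data + start, stop - start);
    out[stop - start] = '\0';
}
```

With the input "Join fooBar" and ucase set, mini_match(&s, "JOIN")
succeeds, yet mini_text() for the same span still yields "Join", and the
variable name comes back as "fooBar". Only the comparison path sees upper
case, which is the behaviour described above.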

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Justin Murray
> Sent: Wednesday, March 23, 2011 10:26 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Q: move from v2 to v3 parser grammar.
> Rewrite tree rule
>
> Jim,
>
> I have a question regarding your comment on case insensitivity. I have
> been using the "slowest" case insensitive lexer technique, as this is
> the first I have seen a viable alternative (on the page that you linked
> to). The grammar I am working with is a bit strange in that all of the
> keywords in the language are case insensitive, but some rules, such as
> variable names, are case sensitive. My question is, how far reaching is
> the setUcaseLA() function (I am using the C target)? My variable name
> rule accepts both uppercase and lowercase letters, and when I do
> $tok.text->chars, I need to get the string in the original case that
> was entered. So long as that is unaffected, I will be happy to get rid
> of all of my "fragment A : ('A'|'a');" rules.
>
> Thanks,
>
> - Justin
>
> On 3/22/2011 5:27 PM, Jim Idle wrote:
> >> -----Original Message-----
> >> From: Ruslan Zasukhin [mailto:ruslan_zasukhin at valentina-db.com]
> >> Sent: Tuesday, March 22, 2011 2:21 PM
> >
> >>> However, using lower case literals in your parser directly is not a
> >>> good idea.  Use real tokens so that your error messages are better
> >> Simple example, please?
> > Instead of:
> >
> > rule : 'join' somerule;
> >
> > Use:
> >
> > rule : JOIN somerule;
> >
> > // Lexer rule to match:
> > //
> > JOIN : 'join';
> >
> > And for case insensitivity I specify the token specs all in UPPER
> > rather than lower and then override the input stream as per:
> >
> > http://www.antlr.org/wiki/pages/viewpage.action?pageId=1782
> >
> > Although someone has added instructions for generating the slowest
> > case insensitive lexers in the world with individual letter rules.
> > Use the input stream override method in general.
> >
> >
> >
> >>
> >>> and remember
> >>> that SQL is generally case insensitive so you will need a [trivial]
> >>> custom input stream.
> >> Of course we do remember this :)
> >>
> >> And once the grammar starts to breathe, we will still work on
> >> * case-insensitive of SQL text
> >> * UTF-16 for input  -- clarify ..
> >
> > UTF-16 input encoding is just a matter of telling the Java input
> > stream to open the file in that encoding.
> >
> > Jim
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>

