[antlr-interest] Q: move from v2 to v3 parser grammar. Rewrite tree rule

Justin Murray jmurray at aerotech.com
Wed Mar 23 10:26:15 PDT 2011


Jim,

I have a question regarding your comment on case insensitivity. I have 
been using the "slowest" case insensitive lexer technique, as this is 
the first time I have seen a viable alternative (on the page that you 
linked to). The grammar I am working with is a bit strange in that all of the 
keywords in the language are case insensitive, but some rules, such as 
variable names, are case sensitive. My question is, how far reaching is 
the setUcaseLA() function (I am using the C target)? My variable name 
rule accepts both uppercase and lowercase letters, and when I do 
$tok.text->chars, I need to get the string in the original case that was 
entered. So long as that is unaffected, I will be happy to get rid of 
all of my "fragment A : ('A'|'a');" rules.
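
A minimal sketch of the behaviour being asked about, assuming a stream 
that upper-cases only what lookahead reports and never touches the 
underlying buffer. Every name here (demo_stream, demo_la, 
demo_match_keyword, demo_scan_identifier, demo_token_text) is 
hypothetical and is not the ANTLR3 C runtime API; this models the 
assumption, it does not confirm what setUcaseLA() actually does:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEMO_EOF (-1)

/* Hypothetical stand-in for an input stream; NOT an ANTLR3 runtime type. */
typedef struct {
    const char *data;     /* original input, never modified */
    size_t      len;      /* number of characters in data */
    size_t      pos;      /* index of the next unread character */
    int         ucase_la; /* non-zero: lookahead reports upper-cased chars */
} demo_stream;

/* Returns the i-th upcoming character (1-based), upper-cased on the fly
 * when ucase_la is set. The buffer itself is left untouched. */
static int demo_la(const demo_stream *s, size_t i)
{
    if (i == 0 || s->pos + i - 1 >= s->len) return DEMO_EOF;
    unsigned char c = (unsigned char)s->data[s->pos + i - 1];
    return s->ucase_la ? toupper(c) : (int)c;
}

/* Matches a keyword spelled in UPPER case against the lookahead and
 * consumes it on success. Simplification: no word-boundary check. */
static int demo_match_keyword(demo_stream *s, const char *upper_kw)
{
    size_t n = strlen(upper_kw);
    for (size_t i = 0; i < n; i++) {
        if (demo_la(s, i + 1) != (unsigned char)upper_kw[i]) return 0;
    }
    s->pos += n;
    return 1;
}

/* Consumes a run of letters and returns the index just past it. */
static size_t demo_scan_identifier(demo_stream *s)
{
    while (demo_la(s, 1) != DEMO_EOF && isalpha(demo_la(s, 1))) s->pos++;
    return s->pos;
}

/* Copies the token text [start, stop) from the ORIGINAL buffer, so the
 * case the user typed is what comes back. */
static void demo_token_text(const demo_stream *s, size_t start, size_t stop,
                            char *out, size_t out_size)
{
    assert(out_size > 0 && start <= stop && stop <= s->len);
    size_t n = stop - start;
    if (n >= out_size) n = out_size - 1;
    memcpy(out, s->data + start, n);
    out[n] = '\0';
}
```

In this model, with the input "Join myVar" and ucase_la set, 
demo_match_keyword(&s, "JOIN") succeeds, and the text copied out for 
the identifier that follows is still "myVar", because matching reads 
the upper-cased lookahead while the text is taken from the buffer.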

Thanks,

- Justin

On 3/22/2011 5:27 PM, Jim Idle wrote:
>> -----Original Message-----
>> From: Ruslan Zasukhin [mailto:ruslan_zasukhin at valentina-db.com]
>> Sent: Tuesday, March 22, 2011 2:21 PM
>
>>> However, using lower case literals in your parser directly is not a
>>> good idea.  Use real tokens so that your error messages are better
>> Simple example, please?
> Instead of:
>
> rule : 'join' somerule;
>
> Use:
>
> rule : JOIN somerule;
>
> // Lexer rule to match:
> //
> JOIN : 'join';
>
> And for case insensitivity I specify the token specs all in UPPER rather
> than lower and then override the input stream as per:
>
> http://www.antlr.org/wiki/pages/viewpage.action?pageId=1782
>
> Although someone has added instructions for generating the slowest case
> insensitive lexers in the world with individual letter rules. Use the
> input stream override method in general.
>
>
>
>>
>>> and remember
>>> that SQL is generally case insensitive so you will need a [trivial]
>>> custom input stream.
>> Of course we do remember this :)
>>
>> And after the grammar starts to breathe, we will still work on
>> * case-insensitive of SQL text
>> * UTF-16 for input  -- clarify ..
>
> UTF-16 input encoding is just a matter of telling the Java input stream to
> open the file in that encoding.
>
> Jim
>

