[antlr-interest] New Guy Question...

Jim Idle jimi at temporal-wave.com
Thu Jun 9 11:50:15 PDT 2011


The article describes a general method, not a universal solution. If you
have a language where such semantics apply, you will need a specific
solution. In general these semantics are ignored for programming languages
though, so this is somewhat pedantic.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of William Clodius
> Sent: Wednesday, June 08, 2011 10:44 PM
> To: antlr-interest interest
> Subject: Re: [antlr-interest] New Guy Question...
>
> Note that matching in terms of UPPER case is generally a bad idea.
> There are languages with characters that do not appear at the start of
> words. As upper case has come to be primarily used to indicate the
> start of words in selective contexts, such characters need not have a
> proper mapping to upper case. The German ß is the best known such
> character in languages with Latin-based character sets, but it is not
> the only such example. However, if a language has a notion of case,
> there is always a mapping to lower case and for simple case folding
> that is to be preferred.
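
As a rough illustration of the point about ß, here is a minimal Python
sketch. The helper names are made up for this example, and Python is used
only because its built-in str.upper() and str.lower() apply the Unicode
case mappings; nothing here is ANTLR-specific.

```python
def fold_upper(name):
    """Hypothetical fold key built by upper-casing (the approach warned against)."""
    return name.upper()


def fold_lower(name):
    """Hypothetical fold key built by lower-casing."""
    return name.lower()


word = "Stra\u00dfe"  # "Straße", containing U+00DF LATIN SMALL LETTER SHARP S

# Upper-casing has no single-character target for U+00DF, so Python's full
# case mapping expands it to "SS" and the string grows by one character.
print(len(word), len(fold_upper(word)))

# Lower-casing maps each character of this word to exactly one character,
# so offsets into the folded text still line up with the original.
print(len(word), len(fold_lower(word)))
```

Whether a length change matters depends on how the folded text is used; it
matters most when a lexer compares one character of lookahead at a time.
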
>
> In many ways the problem of dealing with case is similar to the problem
> of dealing with normalization, where the same character can be
> represented by more than one combination of code points. As part of its
> process of dealing with normalization, for programming languages the
> UNICODE consortium recommended a couple of straightforward means of
> dealing with identifier uniqueness. These are covered in "Unicode Standard
> Annex #31, Unicode Identifier and Pattern Syntax"
> http://www.unicode.org/reports/tr31/
> These have a straightforward implementation in terms of the UNICODE
> character property tables, and it is a small matter of programming to
> implement their lexical classes for identifiers.
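
A minimal sketch of an identifier check along the lines of UAX #31, in
Python. It rests on two assumptions that do not come from the message
above: that Python's str.isidentifier(), which follows Python's own
identifier rules (derived from the XID_Start and XID_Continue properties,
plus a leading underscore), is a close enough stand-in for the annex's
default identifier syntax, and that NFKC is the normalization form wanted.
The function name canonical_identifier is hypothetical.

```python
import unicodedata


def canonical_identifier(name):
    """Return a normalized form of `name`, or raise ValueError if it is
    not an identifier.

    Assumption: NFKC is the chosen normalization form, and Python's own
    identifier rules are used as an approximation of UAX #31.
    """
    normalized = unicodedata.normalize("NFKC", name)
    if not normalized.isidentifier():
        raise ValueError("not an identifier: %r" % name)
    return normalized


# The same visible name spelled two ways: a precomposed e-acute, and a
# plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw strings differ, but their normalized forms compare equal, so a
# symbol table keyed on the normalized form treats them as one identifier.
print(precomposed == decomposed)
print(canonical_identifier(precomposed) == canonical_identifier(decomposed))
```

A lexer would typically keep the original spelling for error messages and
use the normalized form only as the lookup key.
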
>
> On Jun 6, 2011, at 4:56 PM, Jim Idle wrote:
>
> > No, that is not correct, please look at the WIKI article. The input
> > stream merely MATCHES in upper case, it does NOT change the input
> > stream itself, hence both the keywords and anything else are case
> > preserved when you ask for their text; that is the whole point of
> > doing it that way. Then you specify the tokens in the lexer using
> > upper case only and it has the side effect of simplifying the lexer
> > rules as well as not creating a method call to match every letter of
> > every keyword (which is a bad idea even with JIT inlining).
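
The idea can be sketched outside ANTLR roughly as follows. The class below
is a hypothetical stand-in written in Python; it is not the stream class
from the wiki article and not part of the ANTLR runtime, and only the name
la echoes ANTLR's lookahead method. Lookahead returns an upper-cased view
of each character, while the stored text is never altered.

```python
class UpperCaseLookaheadStream:
    """Hypothetical character stream: matches in upper case, keeps the text."""

    EOF = -1

    def __init__(self, text):
        self.text = text  # original characters, never modified
        self.pos = 0      # index of the next character to consume

    def la(self, i=1):
        """Return the i-th lookahead character (i >= 1), upper-cased."""
        index = self.pos + i - 1
        if index >= len(self.text):
            return self.EOF
        ch = self.text[index]
        upper = ch.upper()
        # Some characters (such as U+00DF) upper-case to more than one
        # character; leave those unchanged so lookahead stays one char wide.
        return upper if len(upper) == 1 else ch

    def consume(self):
        self.pos += 1

    def substring(self, start, stop):
        """Return the original, case-preserved text in [start, stop)."""
        return self.text[start:stop]


def match_keyword(stream, keyword):
    """Match an upper-case keyword at the current position.

    Returns the matched text exactly as written in the input, or None
    (with the position restored) if the keyword does not match.
    """
    start = stream.pos
    for expected in keyword:
        if stream.la(1) != expected:
            stream.pos = start
            return None
        stream.consume()
    return stream.substring(start, stream.pos)


stream = UpperCaseLookaheadStream("SeLeCt foo")
matched = match_keyword(stream, "SELECT")
```

Here the keyword is written once, in upper case, and the text handed back
is the spelling the user typed, so identifiers taken from the same stream
keep their case as well.
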
> >
> > Jim
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of Douglas Godfrey
> >> Sent: Monday, June 06, 2011 12:41 PM
> >> To: Marco Hunsicker
> >> Cc: antlr-interest at antlr.org
> >> Subject: Re: [antlr-interest] New Guy Question...
> >>
> >> When you implement case insensitive keywords, you may still want
> >> case sensitive identifiers.
> >> If the input stream does case folding, you can't use case sensitive
> >> identifiers.
> >>
> >> On Sun, Jun 5, 2011 at 5:58 PM, Marco Hunsicker <devel at hunsicker.de>
> >> wrote:
> >>
> >>>> You have to handle case insensitivity the hard way:
> >>>>
> >>>> fragment A
> >>>>     :    'A' | 'a';
> >>>>
> >>>> [...]
> >>>
> >>> I don't think it's a necessity to do it this way. Actually, I think it
> >>> would be better using a specialized input stream that does any
> >>> necessary transformation. Your mileage may vary ;)
> >>>
> >>> Cheers,
> >>>
> >>> Marco
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>
> >>
> >
>
>

