[antlr-interest] languages without reserved words

Fri Mar 10 01:45:13 PST 2006

If I understand you correctly, you might find Rob Colquhoun's DATABASIC
grammar interesting. Let's take the following line in databasic ...

REM: REM = REM(6,4) ; REM this assigns the remainder of 6,4 to the
variable called REM

As you can see, I've used REM four times, in four different states,
namely label, identifier, function reference, and keyword! Different
dialects of databasic allow all four variants!

The way I think Rob has tackled it, and I certainly would, is a token
filter. Unfortunately that's not easy with v2 - I gather it should be a
lot easier with v3.

Okay, with the databasic language it's fairly easy, but the lexer simply
marks everything it recognises as a keyword, and the token filter
maintains state, flipping keywords into identifiers everywhere that a
keyword is not a legal option.

Cheers,
Wol

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Martin Probst
Sent: 02 March 2006 11:04
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] languages without reserved words

> is there a standard way or a best practice to implement grammars for
languages 
> that allow identifiers to be anything, including the keywords of the
language 
> itself?

There are several ways, and all of them suck somehow.
      * stateful lexing - create a lexer that keeps track of a lexical
        state and only check for keywords in certain states - pro: you
        always know what your token is - con: pretty complicated and
        messy to implement in ANTLR at the moment (might get better soon
        with island grammars!)
      * disambiguation with predicates - always return tokens as
        keywords, have a big rules called "identifier" that lists NCNAME
        and all your keywords, and then use syntactical predicates in
        the parser to work around the ambiguities. This is a little ugly
        and may be slow.
      * Wait for ANTLR 3 - in ANTLR 3 your problem might get solved
        without the syntactical predicates

Stateful lexing is nice because it keeps the complexity within the
lexer. But it means that you cannot just stop lexing somewhere in the
source file and restart lexing later - also handling lexical errors is a
lot more difficult.

Martin

* ************************************************************************ *

This transmission is intended for the named recipient only. It may contain private and confidential information. If this has come to you in error you must not act on anything disclosed in it, nor must you copy it, modify it, disseminate it in any way, or show it to anyone. Please e-mail the sender to inform us of the transmission error or telephone ECA International immediately and delete the e-mail from your information system.

Telephone numbers for ECA International offices are: Sydney +61 (0)2 8272 5300, Hong Kong + 852 2121 2388, London +44 (0)20 7351 5000 and New York +1 212 582 2333.

* ************************************************************************ *