[antlr-interest] Excluding words as tokens

Fri May 22 07:23:26 PDT 2009

Greetings!
On Friday 22 May 2009 05:07:55 am schlpbch at gmx.ch wrote:
> I'm trying to write a grammar to check the correctness of package names with
> respect to our meta model. In doing so I ran into the following problems:
>
> ABSTRACTOBJECT:
>   SERVICEOBJECT | QUERYOBJECT
>
> SERVICEOBJECT: UPPERCASE (LOWERCASE|UPPERCASE)+ DIGIT*
>
> QUERYOBJECT: UPPERCASE (LOWERCASE|UPPERCASE)+ DIGIT* "Query"
>
> Of course this production is ambiguous, i.e.
>
>   DateQuery
>
> can be a SERVICEOBJECT token as well as a QUERYOBJECT token.
>
> What I would like to express is that a QUERYOBJECT has to end with "Query"
> otherwise it is a SERVICEOBJECT which would resolve the ambiguity. How can
> this be expressed in ANTLR?

fragment QUERYOBJECT : ; // empty rule that only declares the token type; or define it in a tokens{} section
SERVICEOBJECT
  : UPPERCASE (LOWERCASE|UPPERCASE)+ DIGIT* ( 'Query' { $type = QUERYOBJECT; } )? // retype the token if it ends in Query
  ;
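
One thing worth testing with the rule above: (LOWERCASE|UPPERCASE)+ can itself consume the letters of Query, so for an input like DateQuery I would expect the optional suffix to be reached only when digits come first (Date2Query). A way around that is to match the whole name and then decide the type from its text. The plain-Java sketch below shows only that decision logic; it is not ANTLR-generated code, and the class and method names (ObjectNameClassifier, classify) are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: decides the token type from the matched text.
final class ObjectNameClassifier {
    // UPPERCASE (LOWERCASE|UPPERCASE)+ DIGIT* "Query", as in the question
    private static final Pattern QUERY =
        Pattern.compile("[A-Z][A-Za-z]+[0-9]*Query");
    // UPPERCASE (LOWERCASE|UPPERCASE)+ DIGIT*
    private static final Pattern SERVICE =
        Pattern.compile("[A-Z][A-Za-z]+[0-9]*");

    /** Returns "QUERYOBJECT", "SERVICEOBJECT", or null if the text is neither. */
    static String classify(String text) {
        // QUERYOBJECT wins whenever the name ends with Query, which resolves the ambiguity.
        if (QUERY.matcher(text).matches()) {
            return "QUERYOBJECT";
        }
        if (SERVICE.matcher(text).matches()) {
            return "SERVICEOBJECT";
        }
        return null;
    }
}
```

In a lexer action the same test could be made on the token text (for example with endsWith) before assigning $type, assuming the Java target.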

> A related problem I have is that I would like to express that COMPONENTNAME
> can consist of LOWERCASE+ DIGIT* except 'model' or 'impl'.
>
> I tried expressing these constraints as a regular expression, playing
> around with '~' but somehow didn't get it working.
>
> Does anybody have an example of how this can be expressed?

To me this is very similar to the natural way keywords are distinguished from
identifiers in a lexer for a programming language such as Java or C.

How will you handle the strings "model" and "impl" when they appear in your
input? I assume that they are some kind of keywords in your language. So:

MODEL_KEYWORD : 'model' ;
IMPL_KEYWORD  : 'impl' ;
COMPONENTNAME : LOWERCASE+ DIGIT* ;

Your parser rules must then deal properly with these two keyword tokens.
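
To illustrate the keyword-versus-identifier idea outside of ANTLR, here is a minimal plain-Java sketch: match the general COMPONENTNAME shape first, then give the two reserved words their own token types. The class and method names (ComponentNameTokens, tokenType) are hypothetical, and this is only the classification logic, not what ANTLR generates.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: keyword lookup on top of an identifier pattern.
final class ComponentNameTokens {
    // LOWERCASE+ DIGIT*
    private static final Pattern COMPONENT = Pattern.compile("[a-z]+[0-9]*");

    /** Returns the token type name, or null if the text is not a component name at all. */
    static String tokenType(String text) {
        if (!COMPONENT.matcher(text).matches()) {
            return null;
        }
        // Exact matches on the reserved words become keyword tokens.
        if (text.equals("model")) {
            return "MODEL_KEYWORD";
        }
        if (text.equals("impl")) {
            return "IMPL_KEYWORD";
        }
        // Everything else, including longer names such as "models", stays a COMPONENTNAME.
        return "COMPONENTNAME";
    }
}
```

A parser rule that accepts only COMPONENTNAME then rejects "model" and "impl" automatically, because they arrive as different token types.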

> Thank you very much for your input.
> Andreas

Hope I helped...
-----
  -jbb