[antlr-interest] define two tokens with the same allowed characters

Austin Hastings Austin_Hastings at Yahoo.com
Wed Oct 3 02:16:19 PDT 2007


It's not clear what you're doing, but you need to get your head around 
separate layers of analysis.

Your token for ID seems weird - do you really allow things like plus, 
minus, and equals (+-=) in your identifier names?
 
That's something for you to answer. Let's look at the bigger picture:

You want to recognize a function that takes three args. Presumably it's 
a language builtin. So make a special token for it, and move on:

/* You could do this in-line, but it's easier to read the parser code this way. */
KW_ISEQUAL: 'IsEqual';
ID: ('a'..'z')+ ;

So what does your language look like? Well, part of it looks like this:

fn_isequal
    : KW_ISEQUAL '(' arg1=ID ',' arg2=ID ',' arg3=ID ')'
      {
        System.out.println("You called IsEqual - I know because I'm inside the rule - with three args: "
            + $arg1.text + ", " + $arg2.text + ", and " + $arg3.text);
      }
    ;


Notice how we changed from UPPERCASE to lowercase production names? 
ANTLR treats names that start with an uppercase letter as lexer rules and 
names that start with a lowercase letter as parser rules, and recognizing 
a function call is the parser's job, not the lexer's. The Lexer does 
lexical analysis - it recognizes keywords, identifiers, and the like. The 
Parser does syntactic analysis - it recognizes combinations of tokens as 
being valid, or not.
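
To make the two layers concrete, here is a minimal hand-written sketch in plain Java of roughly what the lexer and parser do for this one rule. It is not what ANTLR generates; the class and method names (IsEqualSketch, lex, parseIsEqual) are made up for illustration, and skipping whitespace is an assumption, since the grammar above has no whitespace rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated lexer and parser; all names are illustrative.
class IsEqualSketch {
    enum Type { KW_ISEQUAL, ID, LPAREN, RPAREN, COMMA }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Layer 1, lexical analysis: characters in, tokens out.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // assumption: whitespace is skipped
            if (c == '(') { tokens.add(new Token(Type.LPAREN, "(")); i++; continue; }
            if (c == ')') { tokens.add(new Token(Type.RPAREN, ")")); i++; continue; }
            if (c == ',') { tokens.add(new Token(Type.COMMA, ",")); i++; continue; }
            int start = i;
            while (i < input.length() && isAsciiLetter(input.charAt(i))) i++;
            String word = input.substring(start, i);
            if (word.equals("IsEqual")) {
                tokens.add(new Token(Type.KW_ISEQUAL, word)); // the keyword gets its own token type
            } else if (word.matches("[a-z]+")) {
                tokens.add(new Token(Type.ID, word));         // ID: ('a'..'z')+
            } else {
                throw new IllegalArgumentException("lexical error at position " + start);
            }
        }
        return tokens;
    }

    // Layer 2, syntactic analysis: tokens in, the three argument texts out.
    static List<String> parseIsEqual(List<Token> tokens) {
        Type[] expected = {
            Type.KW_ISEQUAL, Type.LPAREN, Type.ID, Type.COMMA,
            Type.ID, Type.COMMA, Type.ID, Type.RPAREN
        };
        if (tokens.size() != expected.length) {
            throw new IllegalArgumentException("syntax error: wrong number of tokens");
        }
        List<String> args = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            Token t = tokens.get(i);
            if (t.type != expected[i]) {
                throw new IllegalArgumentException("syntax error at token " + i + ": " + t.text);
            }
            if (t.type == Type.ID) args.add(t.text);
        }
        return args;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz]
        System.out.println(parseIsEqual(lex("IsEqual(foo, bar, baz)")));
    }
}
```

The parser never looks at individual characters, only at token types. That is why deciding what counts as an identifier belongs in the lexer, and deciding what counts as a call belongs in the parser.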
 
In this case, the keyword "IsEqual" followed by a '(' and three 
identifiers separated by commas and a ')' is a valid phrase in the 
language. It's up to you to recognize other valid phrases, and you can 
create intermediate productions to do that, like:

call_builtin_func
    : fn_isequal
    | fn_notequal
    | fn_lessthan
    | fn_greaterthan
    ;

This presumes you've built those other function calls, as you did for 
isequal.
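
As a rough illustration of how a rule like call_builtin_func picks among its alternatives, the Java sketch below chooses one by looking only at the first token. The keyword spellings other than IsEqual are assumptions (only the rule names appear above), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: selects an alternative of call_builtin_func from the first token.
class BuiltinDispatchSketch {
    static String chooseAlternative(List<String> tokenTexts) {
        if (tokenTexts.isEmpty()) {
            throw new IllegalArgumentException("no tokens");
        }
        switch (tokenTexts.get(0)) {
            case "IsEqual":     return "fn_isequal";
            case "NotEqual":    return "fn_notequal";    // assumed keyword spelling
            case "LessThan":    return "fn_lessthan";    // assumed keyword spelling
            case "GreaterThan": return "fn_greaterthan"; // assumed keyword spelling
            default:
                throw new IllegalArgumentException("not a builtin call: " + tokenTexts.get(0));
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("IsEqual", "(", "a", ",", "b", ",", "c", ")");
        System.out.println(chooseAlternative(tokens)); // prints fn_isequal
    }
}
```

Because each builtin starts with its own keyword token, one token is enough to tell the alternatives apart, which is another reason to give each builtin a dedicated token.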
 
At some point, you need to figure out what your language looks like at 
the high level. It may well be that there is a grammar already available 
for the language. But the high-level view of what the language should be 
like will inform how you break it down into productions - just like your 
high level view of a program informs how you create functions or methods 
to write the program.

=Austin


OJAY78 at gmx.de wrote:
> Hi,
>
> thanks for the answer. I tried it your way so that I have only one ID token.
>
> ID: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.'|'+'|'-'|'#'|'=')+
>
> My function should distinguish between the different tokens, so my function looks like this: ISEQUAL'('FIELDTYPE','ID','ID')' but then I get an error that ID is a non-unique reference.
>
> What I want is for the function to know how to handle calls like this: IsEqual(formattrib,test#enabled,=enabled)
>
> I thought that I could take the second ID token in my function and check whether it starts with the '=' character, and if it does, it is the second ID.
>
> Any advice on where I went wrong?
>
> Thanks
>   


