[antlr-interest] Fwd: Lexer doesn't agree with me (gives other tokens than I need)

Sun Apr 12 05:54:44 PDT 2009

---------- Forwarded message ----------
From: Bill Mayfield <antlrinterest at gmail.com>
Date: Sun, Apr 12, 2009 at 2:36 PM
Subject: Re: Lexer doesn't agree with me (gives other tokens than I need)
To: Alexander Brown <abrown at analytics8.com>

Hi Alexander,

Here is the definition of my "function" rule

function
    :    // LEFT and RIGHT keywords are also function names
        ( NonQuotedIdentifier | LEFT | RIGHT )
        LPAREN ( functionArgument ( COMMA functionArgument )* )? RPAREN
    ;

I had to add the LEFT & RIGHT tokens because otherwise the parser doesn't
recognize that LEFT(functionArgument, functionArgument) is also a
function...

Prior to this I had defined another rule called "joinOn" that needs the LEFT
& RIGHT token types, see:

joinOn
    :    ( INNER | ( LEFT | RIGHT | FULL {} ) ( OUTER )? )? JOIN
    ;

My AST subtree for function would ideally be:

^( FUNCTION NonQuotedIdentifier ^( FUNCTIONARGUMENTS functionArgument+ )? )

but now I have to make it:

^( FUNCTION NonQuotedIdentifier? LEFT? RIGHT? ^( FUNCTIONARGUMENTS functionArgument+ )? )

which messes up the AST in my opinion...
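
For reference, one way the tree might be kept uniform is to factor the
function name into its own rule and retype the keyword tokens in a rewrite.
This is an untested sketch that assumes ANTLR 3 with output=AST; the rule
name "functionName" is hypothetical and not part of the grammar above:

function
    :    functionName LPAREN ( functionArgument ( COMMA functionArgument )* )? RPAREN
        -> ^( FUNCTION functionName ^( FUNCTIONARGUMENTS functionArgument+ )? )
    ;

// hypothetical helper rule: LEFT and RIGHT stay keyword tokens in the lexer,
// but the tree node built here has type NonQuotedIdentifier, with text and
// position copied from the matched keyword token
functionName
    :    NonQuotedIdentifier
    |    LEFT  -> NonQuotedIdentifier[$LEFT]
    |    RIGHT -> NonQuotedIdentifier[$RIGHT]
    ;

With something like this, "joinOn" could keep using the LEFT and RIGHT tokens
unchanged, since only the node built inside functionName is retyped.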

I hope this makes my situation and problem clearer. Any help would be
appreciated :o)

Regards,
Bill

On Sun, Apr 12, 2009 at 12:51 PM, Alexander Brown <abrown at analytics8.com> wrote:

>   Hi,
>
> It's sort of an odd question in the sense that LEFT or RIGHT (either as
> outer join type specifiers or as character value functions in TSQL) are
> legitimate keywords rather than identifiers (like column and table names
> or schema qualifiers, etc).  There's no ambiguity at a parser level for
> those two scenarios though, so there isn't any need to force the lexer to
> generate an identifier in one scenario and a keyword in another.
>
> I can only imagine that you want to identify the keywords as
> identifiers for one of two reasons: either the DB doesn't constrain users from
> using keywords as identifiers (CREATE TABLE TABLE, for example) or what
> you want in your AST is to produce a generic character function node for
> all character functions with a specific signature (function_name LPAREN
> character_value_expression COMMA numeric_value_expression RPAREN, for
> example).  Even in the latter scenario I don't think you really want to
> identify the function 'RIGHT' or 'LEFT' as an identifier.
>
> All this being said, you could probably rewrite the AST to do what
> you want (haven't tried it though).  If you provide some more detail
> about what you are trying to achieve at the AST level, perhaps I could
> suggest a way to achieve it.
>
> Alex
>   *Alexander Brown*
> Partner | Analytics 8 | Tel +61 2 9299 4430 | Mob +61 424 043 485|
> abrown at analytics8.com | www.analytics8.com
>
>
> ------------------------------
>
> Hi,
>
> I'm creating a parser for a SQL dialect (sue me :oP) and I'm facing a
> problem regarding the lexer generating the wrong kind of token in a certain
> context.
>
> Basically I have defined two tokens called LEFT & RIGHT which are needed to
> parse SQL joins (left outer join, right outer join, etc...)
>
> LEFT : 'left' ;
> RIGHT : 'right' ;
>
> The problem occurs when I'm matching the SQL *functions* LEFT & RIGHT.
>
> LEFT (functionArgs)
> RIGHT (functionArgs)
>
> I want the function name to be an IDENTIFIER token but no can do due to the
> lexer... It gives me a LEFT or RIGHT token obviously :'o(
>
> What are the general recommendations you experienced ANTLR buffs can give
> me? The parser is generating an AST so I don't really care how it matches as
> long as I can keep my AST neat 'n tidy :o/
>
> Thanks! Bill
>
>