[antlr-interest] Oracle SQL Parser

Bart Kiers bkiers at gmail.com
Thu Jun 14 02:54:00 PDT 2012


Hi Rick,

You can test the fact that the comments work by running the following
grammar through ANTLRWorks' debugger:

grammar T;

parse
 : select EOF
 ;

select
 : Select column=Id From table=Id {System.out.println("column=" +
$column.text + ", table=" + $table.text);}
 ;

Select : 'select';
>From   : 'from';
Id     : ('a'..'z' | 'A'..'Z')+;

Comment
 : '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
 | '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
 | '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; }
 ;

Space
 : (' ' | '\t' | '\r' | '\n')+ {skip();}
 ;

Simply press CTRL+D in ANTLRWorks and use the following input to debug
(also make sure ANTLRWorks starts with the 'parse' rule!):

select
// comment 1
dt
-- comment 2
from
/* comment 3 */
dual

You will then see the following output (you need to select the "output" tab
to see it):

column=dt, table=dual

Regards,

Bart.


On Wed, Jun 13, 2012 at 9:31 PM, Brown, Rick <RBROWN at allstate.com> wrote:

> **
> Bart,
>
> Thanks very much for the reply!  To provide some more context, I am trying
> to parse Oracle SQL files using a combination of PowerShell and a parser
> from a project by MacroScope.  The basic approach I am using is described
> in this article: http://sev17.com/2010/04/parsing-sql-for-table-names/.
>
> The MacroScope parser is based on Anltr3, and the project includes the
> grammar file as well as the generated parser and lexer files.  I
> contacted MacroScope, and they confirmed that they did not include support
> for comments at all in their implementation (it is rather old).  So, I set
> about trying to find a replacement Oracle SQL grammar file that included
> comments, which led me to the file I mentioned in my original post.
>
> I am generating the lexer and parser files using the above grammar, and
> then adding them to the build of the MacroScope parser.  I then use the
> resulting MacroScope classes in a PowerShell script to extract the names
> of the tables and columns in a given sql file.
>
> Here are a few examples of what happens when I use this method for parsing
> a sql file:  The first file looks like this:
>
> select sysdate dt
> /* This is a comment */
>   from dual
>
> Running my PowerShell/MacroScope script, I get the column name as 'dt'
> and the table name as 'dual', which is what I would expect.
>
> If I run the same script using this sql file:
>
> select sysdate dt
> // This is a comment
>   from dual
>
> I get the column name as 'dt', but I do not get any table name.
>
> If I run the script using this sql file:
>
> -- This is a comment
> select sysdate dt from dual
>
> The script generates an error indicating that there is no data to parse.
>
> My assumption is that everything after the start of the comment is being
> included as part of the comment, all the way through the end of the file,
> rather than stopping at the end of the line.  Again, I am not an expert, so
> this is only my assumption based on the behavior I am seeing.  It could
> be that the problem is something else entirely, but it seems odd that the
> multi-line style of comments (/* */) is processed correctly, but the
> single-line style (// or --) is not.
>
> Thanks again for your help!
>
> ~ Rick
>
>  ------------------------------
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Wednesday, June 13, 2012 3:30 AM
> *To:* Brown, Rick
> *Cc:* antlr-interest at antlr.org
> *Subject:* Re: [antlr-interest] Oracle SQL Parser
>
>  On Wed, Jun 13, 2012 at 12:20 AM, Brown, Rick <RBROWN at allstate.com>wrote:
>
>> ... but the problem seems to be that the
>> comment is not being ended at the end of a line; instead, it seems to
>> consider everything that follows, the entire rest of the file, as part
>> of the comment. ...
>>
>
> I don't see anything wrong with the rule. The Comment rule has 3
> alternatives:
>
> Comment
>   :  '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
> }                         // alternative 1
>   |  '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
> }                         // alternative 2
>   |  '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; }  //
> alternative 3
>   ;
>
> Alternative 1 and 2 must end with a line break, and alternative 3 must end
> with '*/'.
>
> Could you tell how you came to the conclusion that there's a problem with
> the rule? Do you have input that gets tokenized incorrectly? If so, could
> you send it to the list as well?
>
> Regards,
>
> Bart.
>


More information about the antlr-interest mailing list