[antlr-interest] Oracle SQL Parser
Bart Kiers
bkiers at gmail.com
Thu Jun 14 02:54:00 PDT 2012
Hi Rick,
You can test the fact that the comments work by running the following
grammar through ANTLRWorks' debugger:
grammar T;
parse
: select EOF
;
select
: Select column=Id From table=Id {System.out.println("column=" +
$column.text + ", table=" + $table.text);}
;
Select : 'select';
>From : 'from';
Id : ('a'..'z' | 'A'..'Z')+;
Comment
: '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
| '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
| '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; }
;
Space
: (' ' | '\t' | '\r' | '\n')+ {skip();}
;
Simply press CTRL+D in ANTLRWorks and use the following input to debug
(also make sure ANTLRWorks starts with the 'parse' rule!):
select
// comment 1
dt
-- comment 2
from
/* comment 3 */
dual
You will then see the following output (you need to select the "output" tab
to see it):
column=dt, table=dual
Regards,
Bart.
On Wed, Jun 13, 2012 at 9:31 PM, Brown, Rick <RBROWN at allstate.com> wrote:
> **
> Bart,
>
> Thanks very much for the reply! To provide some more context, I am trying
> to parse Oracle SQL files using a combination of PowerShell and a parser
> from a project by MacroScope. The basic approach I am using is described
> in this article: http://sev17.com/2010/04/parsing-sql-for-table-names/.
>
> The MacroScope parser is based on Anltr3, and the project includes the
> grammar file as well as the generated parser and lexer files. I
> contacted MacroScope, and they confirmed that they did not include support
> for comments at all in their implementation (it is rather old). So, I set
> about trying to find a replacement Oracle SQL grammar file that included
> comments, which led me to the file I mentioned in my original post.
>
> I am generating the lexer and parser files using the above grammar, and
> then adding them to the build of the MacroScope parser. I then use the
> resulting MacroScope classes in a PowerShell script to extract the names
> of the tables and columns in a given sql file.
>
> Here are a few examples of what happens when I use this method for parsing
> a sql file: The first file looks like this:
>
> select sysdate dt
> /* This is a comment */
> from dual
>
> Running my PowerShell/MacroScope script, I get the column name as 'dt'
> and the table name as 'dual', which is what I would expect.
>
> If I run the same script using this sql file:
>
> select sysdate dt
> // This is a comment
> from dual
>
> I get the column name as 'dt', but I do not get any table name.
>
> If I run the script using this sql file:
>
> -- This is a comment
> select sysdate dt from dual
>
> The script generates an error indicating that there is no data to parse.
>
> My assumption is that everything after the start of the comment is being
> included as part of the comment, all the way through the end of the file,
> rather than stopping at the end of the line. Again, I am not an expert, so
> this is only my assumption based on the behavior I am seeing. It could
> be that the problem is something else entirely, but it seems odd that the
> multi-line style of comments (/* */) is processed correctly, but the
> single-line style (// or --) is not.
>
> Thanks again for your help!
>
> ~ Rick
>
> ------------------------------
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Wednesday, June 13, 2012 3:30 AM
> *To:* Brown, Rick
> *Cc:* antlr-interest at antlr.org
> *Subject:* Re: [antlr-interest] Oracle SQL Parser
>
> On Wed, Jun 13, 2012 at 12:20 AM, Brown, Rick <RBROWN at allstate.com>wrote:
>
>> ... but the problem seems to be that the
>> comment is not being ended at the end of a line; instead, it seems to
>> consider everything that follows, the entire rest of the file, as part
>> of the comment. ...
>>
>
> I don't see anything wrong with the rule. The Comment rule has 3
> alternatives:
>
> Comment
> : '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
> } // alternative 1
> | '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
> } // alternative 2
> | '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; } //
> alternative 3
> ;
>
> Alternative 1 and 2 must end with a line break, and alternative 3 must end
> with '*/'.
>
> Could you tell how you came to the conclusion that there's a problem with
> the rule? Do you have input that gets tokenized incorrectly? If so, could
> you send it to the list as well?
>
> Regards,
>
> Bart.
>
More information about the antlr-interest
mailing list