[antlr-interest] Oracle SQL Parser
Bart Kiers
bkiers at gmail.com
Thu Jun 14 23:24:30 PDT 2012
Hi Rick,
You're welcome.
I took the liberty to CC the list for you.
Cheers,
Bart.
On Fri, Jun 15, 2012 at 12:18 AM, Brown, Rick <RBROWN at allstate.com> wrote:
> **
> Hi Bart,
>
> Thanks again for the reply, and thanks for taking the time to create a
> test grammar using the "comment" definition. As you ably demonstrate, the
> grammar is fine, so I did some more digging to find the problem. It turns
> out to be related to my PowerShell script. I was using the Get-Content
> cmdlet to obtain the text from a sql file. However, this cmdlet does not
> preserve newlines; it in fact creates an array of strings, one for each
> line of text. When this is passed to the parser, the parser ends up seeing
> just one long string, with no newlines included, so the end of the comment
> is never found until it reaches end-of-file.
>
> The fix was to change this line:
> $sql = Get-Content $sqlFileName
>
> to this line:
> $sql = [system.io.file]::readalltext($sqlFileName)
>
> and everything now works fine.
>
> Thank you very much for your help!
>
> ~ Rick
>
> p.s. I would post this response to the group, but I am not sure how to
> properly send the email in a way that will associate it as being a response
> to a particular message. Do I just use the same Subject line above and
> send it, or do I need to do something else?
>
> ------------------------------
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, June 14, 2012 4:54 AM
> *To:* Brown, Rick
> *Cc:* antlr-interest at antlr.org interest
>
> *Subject:* Re: [antlr-interest] Oracle SQL Parser
>
> Hi Rick,
>
> You can test the fact that the comments work by running the following
> grammar through ANTLRWorks' debugger:
>
> grammar T;
>
> parse
> : select EOF
> ;
>
> select
> : Select column=Id From table=Id {System.out.println("column=" +
> $column.text + ", table=" + $table.text);}
> ;
>
> Select : 'select';
> From : 'from';
> Id : ('a'..'z' | 'A'..'Z')+;
>
> Comment
> : '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
> | '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN; }
> | '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; }
> ;
>
> Space
> : (' ' | '\t' | '\r' | '\n')+ {skip();}
> ;
>
> Simply press CTRL+D in ANTLRWorks and use the following input to debug
> (also make sure ANTLRWorks starts with the 'parse' rule!):
>
> select
> // comment 1
> dt
> -- comment 2
> from
> /* comment 3 */
> dual
>
> You will then see the following output (you need to select the "output"
> tab to see it):
>
> column=dt, table=dual
>
> Regards,
>
> Bart.
>
>
> On Wed, Jun 13, 2012 at 9:31 PM, Brown, Rick <RBROWN at allstate.com> wrote:
>
>> **
>> Bart,
>>
>> Thanks very much for the reply! To provide some more context, I am
>> trying to parse Oracle SQL files using a combination of PowerShell and a
>> parser from a project by MacroScope. The basic approach I am using is
>> described in this article:
>> http://sev17.com/2010/04/parsing-sql-for-table-names/.
>>
>> The MacroScope parser is based on Anltr3, and the project includes the
>> grammar file as well as the generated parser and lexer files. I
>> contacted MacroScope, and they confirmed that they did not include support
>> for comments at all in their implementation (it is rather old). So, I set
>> about trying to find a replacement Oracle SQL grammar file that included
>> comments, which led me to the file I mentioned in my original post.
>>
>> I am generating the lexer and parser files using the above grammar, and
>> then adding them to the build of the MacroScope parser. I then use the
>> resulting MacroScope classes in a PowerShell script to extract the names
>> of the tables and columns in a given sql file.
>>
>> Here are a few examples of what happens when I use this method for
>> parsing a sql file: The first file looks like this:
>>
>> select sysdate dt
>> /* This is a comment */
>> from dual
>>
>> Running my PowerShell/MacroScope script, I get the column name as 'dt'
>> and the table name as 'dual', which is what I would expect.
>>
>> If I run the same script using this sql file:
>>
>> select sysdate dt
>> // This is a comment
>> from dual
>>
>> I get the column name as 'dt', but I do not get any table name.
>>
>> If I run the script using this sql file:
>>
>> -- This is a comment
>> select sysdate dt from dual
>>
>> The script generates an error indicating that there is no data to parse.
>>
>> My assumption is that everything after the start of the comment is being
>> included as part of the comment, all the way through the end of the file,
>> rather than stopping at the end of the line. Again, I am not an expert, so
>> this is only my assumption based on the behavior I am seeing. It could
>> be that the problem is something else entirely, but it seems odd that the
>> multi-line style of comments (/* */) is processed correctly, but the
>> single-line style (// or --) is not.
>>
>> Thanks again for your help!
>>
>> ~ Rick
>>
>> ------------------------------
>> *From:* Bart Kiers [mailto:bkiers at gmail.com]
>> *Sent:* Wednesday, June 13, 2012 3:30 AM
>> *To:* Brown, Rick
>> *Cc:* antlr-interest at antlr.org
>> *Subject:* Re: [antlr-interest] Oracle SQL Parser
>>
>> On Wed, Jun 13, 2012 at 12:20 AM, Brown, Rick <RBROWN at allstate.com>wrote:
>>
>>> ... but the problem seems to be that the
>>> comment is not being ended at the end of a line; instead, it seems to
>>> consider everything that follows, the entire rest of the file, as part
>>> of the comment. ...
>>>
>>
>> I don't see anything wrong with the rule. The Comment rule has 3
>> alternatives:
>>
>> Comment
>> : '--' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
>> } // alternative 1
>> | '//' ~('\n'|'\r')* '\r'? '\n' { $channel=HIDDEN;
>> } // alternative 2
>> | '/*' ( options {greedy=false;} : . )* '*/' { $channel=HIDDEN; } //
>> alternative 3
>> ;
>>
>> Alternative 1 and 2 must end with a line break, and alternative 3 must
>> end with '*/'.
>>
>> Could you tell how you came to the conclusion that there's a problem with
>> the rule? Do you have input that gets tokenized incorrectly? If so, could
>> you send it to the list as well?
>>
>> Regards,
>>
>> Bart.
>>
>
>
More information about the antlr-interest
mailing list