Fwd: [antlr-interest] parsing incomplete syntax

Bans VGLab bans.vglab at gmail.com
Fri Jun 30 06:33:57 PDT 2006


From: Bans VGLab <bans.vglab at gmail.com>
To: Michiel Vermandel <Michiel_Vermandel at axi.be>

Hello Michiel

I have no definite solution, but I can offer some views that might be of
some help.

For this particular problem of showing columns (of course, this is not
always applicable to the general problem of this kind), we can have a
coarse-grained grammar that is more forgiving in nature.

For instance, in your case, accept anything that conforms to something like
this:

"SELECT" select_list "FROM" table_list "WHERE" condition

Now, don't force select_list, table_list and condition to conform to their
(respective) exact format. Instead allow them to be a generic string-token.
This will accept any crappy input like:

SELECT  i have a black pen FROM tt pp qq uu WHERE i am good

But there is a good side to this: you can walk this tree using a
coarse-grained tree parser and then subject the strings "i have a black pen",
"tt pp qq uu" and "i am good" to further parsing, each specific to the form
expected for that clause. I'd term this fine-grained parsing.
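
To make the coarse level concrete, here is a minimal sketch in Python. It
is only an illustration: a regular expression stands in for the level-1
ANTLR grammar, and the names COARSE and coarse_parse are made up for this
example. It also assumes the keywords never appear inside string literals
or subqueries, which a real grammar would have to handle.

```python
import re

# Level-1 "grammar": only the keyword skeleton matters.
# Each clause body is captured as an opaque string for later parsing.
COARSE = re.compile(
    r"^\s*SELECT\s+(?P<select_list>.*?)\s+FROM\s+(?P<table_list>.*?)"
    r"(?:\s+WHERE\s+(?P<condition>.*?))?\s*$",
    re.IGNORECASE | re.DOTALL,
)

def coarse_parse(sql):
    """Return the raw clause strings, or None if the skeleton is missing."""
    match = COARSE.match(sql)
    return match.groupdict() if match else None

print(coarse_parse("SELECT  i have a black pen FROM tt pp qq uu WHERE i am good"))
```

Each of the three strings it returns can then be handed to its own
fine-grained parser.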

The best part is that you can now give the user a pinpointed error message.
For example, when you try parsing "i have a black pen" to obtain the
column list, you can easily print an error like:
   "Expecting ',' but found 'a' "

Similarly, when you parse the string "i am good" to obtain the condition
AST, you can easily pin-point the error like:
    "Unexpected token 'am' "

You see, this kind of hierarchical (multi-level) parsing, with each lower
level finer than the one above it, allows you to parse ahead even if there
is an error near the beginning.
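
As a rough sketch of one fine-grained level, the following Python parses
the select-list string on its own. The rule it assumes (each entry is a
column name with an optional alias, entries separated by commas) and the
names ClauseError and parse_column_list are my own simplifications for
illustration, not anything taken from a real SQL grammar.

```python
import re

# Identifiers (optionally qualified with dots) or any single other character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9.]*|\S")

class ClauseError(Exception):
    """Raised with a message that names the offending token."""

def is_name(token):
    return token[0].isalpha() or token[0] == "_"

def parse_column_list(text):
    """Parse 'col [alias], col [alias], ...' or '*' into (column, alias) pairs."""
    tokens = TOKEN.findall(text)
    if tokens == ["*"]:
        return [("*", None)]
    columns, pos = [], 0
    while True:
        if pos == len(tokens):
            raise ClauseError("Expecting column name but found end of input")
        if not is_name(tokens[pos]):
            raise ClauseError(f"Expecting column name but found '{tokens[pos]}'")
        name, pos = tokens[pos], pos + 1
        alias = None
        if pos < len(tokens) and is_name(tokens[pos]):
            alias, pos = tokens[pos], pos + 1  # optional alias after the column
        columns.append((name, alias))
        if pos == len(tokens):
            return columns
        if tokens[pos] != ",":
            raise ClauseError(f"Expecting ',' but found '{tokens[pos]}'")
        pos += 1  # skip the comma

try:
    parse_column_list("i have a black pen")
except ClauseError as err:
    print(err)  # Expecting ',' but found 'a'
```

Because the error is raised inside the select-list parser only, the
table list and the condition can still be parsed independently.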

Now, how can this help you in the problem you have - text-completion? This
can be outlined as:

1. Say the user inputs:
     SELECT * FROM customer T, employee E WHERE T.

2. You can easily parse it using level-1 grammar to obtain strings
       "*",
      "customer T, employee E" and
      "T."

3. Parse "*". This is a valid input.

4. Parse "customer T, employee E". This is a valid input, and parsing it
yields two tables aliased as T and E. Now, behind the scenes, you can run a
separate thread to fetch and cache the columns of T and E from their schemas.

5. Parse "T.". Now here's the tricky part. At this point you need some logic
that looks up the alias T in the list of tables and their columns, and then
displays the cached columns of the table T.
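
The five steps above can be sketched end to end in Python. Everything here
is hypothetical: SCHEMA is a hard-coded stand-in for the cached column
lists (its column names are invented), the regular expression stands in
for the level-1 grammar, and parse_table_list and complete are names made
up for this example. The background thread is left out to keep it short.

```python
import re

# Stand-in for the column cache; a real tool would fill this from the schema.
SCHEMA = {
    "customer": ["id", "name", "city"],
    "employee": ["id", "name", "salary"],
}

def parse_table_list(text):
    """Map each alias (or bare table name) to its table, case-insensitively."""
    aliases = {}
    for item in text.split(","):
        parts = item.split()
        if not parts or len(parts) > 2:
            raise ValueError(f"Bad table reference: '{item.strip()}'")
        aliases[parts[-1].lower()] = parts[0].lower()
    return aliases

def complete(sql, schema=SCHEMA):
    """Suggest columns for a trailing 'alias.' or 'alias.prefix' in the WHERE part."""
    m = re.match(r"\s*SELECT\s+(.*?)\s+FROM\s+(.*?)\s+WHERE\s+(.*)$",
                 sql, re.IGNORECASE | re.DOTALL)
    if not m:
        return []
    aliases = parse_table_list(m.group(2))
    tail = re.search(r"([A-Za-z_]\w*)\.(\w*)$", m.group(3))
    if not tail:
        return []
    table = aliases.get(tail.group(1).lower())
    prefix = tail.group(2).lower()
    return [col for col in schema.get(table, []) if col.startswith(prefix)]

print(complete("SELECT * FROM customer T, employee E WHERE T."))
# ['id', 'name', 'city']
```

An unknown alias simply yields an empty suggestion list instead of an
error, which seems the friendlier behaviour while the user is still typing.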

Hmmmmm...looks like a very specific solution. But it indeed can be applied
in general to any problem of similar kind.

Another consideration is performance. Bringing more hierarchy into the
grammar might slow things down if it is not designed with that in mind. You
need to strike a proper balance!

Cheers
Sujeet
---------------------
PS: My real name is Sujeet Banerjee. I can be reached at
sujeet.banerjee at gmail.com
I have deep interest in lexers/parsers. I have worked for BEA systems, in
getting their Liquiddata JDBC driver out in the market. Refer to the
publication:
http://doi.ieeecomputersociety.org/10.1109/ICDE.2006.147
---------------------



On 6/30/06, Michiel Vermandel <Michiel_Vermandel at axi.be> wrote:
>
>
> Hi,
>
> We have our own Oracle Designer/developer environment and I have written a
> Sql/Plsql/Forms Plsql code parser which is linked to our repository.
> The parser looks up all used objects in the repository. This part works
> just fine.
> But (as in many other dev tools) the user wants to look up data while
> typing: eg: "select  * from  drtable t where t." and then pressing a key
> combination to look up the available columns of table drtable.
> The problem is that at this point the statement is incomplete and no
> grammar rule can be applied.
> What should be the best approach to solve these kinds of issues (not only
> for looking up fields of tables but in general).
> If the AST should say (simplified)
>
> select_statement
>    select_list
>      *
>    from
>     table_list
>        table
>          drtable
>          t
>     where_condition
>         t
>        dot
>
> then I would be saved.
>
> I need the AST tree completely up to the last token.
> (of course other statements can follow this one)
>
> Any suggestions, best practices?
> Anyone done this before (in other language)?
>
> Thanks!
>
>
>