Fwd: [antlr-interest] parsing incomplete syntax

Bans VGLab bans.vglab at gmail.com
Fri Jun 30 08:06:44 PDT 2006


Hello Michiel

#1
Would you parse "customer T, employee E" as one single token? If so, I guess
at "level 1"  I need to build a Lexer rule consuming anything but literals
(testLiterals)?

Yes.


#2
Would you create a new parser instance to parse these "level 1" strings?
Then these "level 1+" parsers should be very specific for their goal, right?

Yes. In fact, you may not create a different Parser instance. You can write
the rules for L2 grammar in the same Parser. The rules can take args, which
could be the strings passed on from L1 gramar rules. These L2 rules can be
invoked as methods/functions within the action blocks of L1 rules.

You won't have to share the vocab (as you'd do in case of different
parsers). This might bring in some efficiency.

#3
If (a > object1.

Again, you can have something like
    "if" condition
And, in L2 you can have the condition tree, which can make use of
symbol-table (that contains list of Objects visible in the scope of the
block).

Note ANTLR allows you to write rules and you can use them as
argumented-method calls. For instance, in L2 you may need to have some AI to
remove the unwanted stuff. Say for the following string:
   "(((22 *obj1.val)+ obj2."
can be processed to return two information -
     1. List of objects, say, obj1 and obj2
     2. Which of the objects is currently in focus (ie. the position of the
last '.')
These two info should be helpful enough to compute the data needed, or bring
it from a precalculated cache.

Thus, at L2 itself you can have different grammars (set of co-related
rules). Again, they can be clubbed in the same parser instance, and can be
used as utility methods.

So, the generic approach for L1 could be to fragment the input structure
pivoted on the keywords like "IF" ... A "divide and conquer" strategy. Now
at subsequent levels L2, L3...so  on, be more specific.

More levels, more accuracy in pointing errors, and more precise info an be
shown to the users. But slower the processing would be.

Hope this helps. Please do ask question if you need more clarification :)

Cheers
Sujeet

On 6/30/06, Michiel Vermandel <Michiel_Vermandel at axi.be> wrote:
>
>
> Hi Sujeet
>
> Interesting approach!
>
> How do you see a level-2 parse?
> I mean, let's assume we have parsed in a coarse-grain way and thus have
> got the strings
>
>        "*",
>      "customer T, employee E" and
>      "T."
>
> 1) Would you parse "customer T, employee E" as one single token? If so, I
> guess at "level 1"  I need to build a Lexer rule consuming anything but
> literals (testLiterals)?
>
> 2) Would you create a new parser instance to parse these "level 1"
> strings?
> Then these "level 1+" parsers should be very specific for their goal,
> right?
>
> 3) back to "level 1", other example:
>
> If (a > object1.
>
> So now I want a list of methods of object 1.
> Here I have a different case than the select example because even the
> basic structure of the if statement is incomplete.
> How do you see this?
> Should I create my coarse-grain if rule so that it even accepts "if"
> (condition)?
> So without the "then" "end if" literals?
> But then in the end i need to know if the "level 1" rules are complete...
> Seems complecated...
> In the same way you can see the problem in a "level 2" parse:
> " (a > object1."
> This again is not a valid condition...
> So I need a sort of level-2 coarse-grain rule...
> ...
>
> Or... I create two parsers. 1 fine-grain, as I have know, just to see if
> the syntax is complete and  a second coarse-grain parser for the syntax
> completion.
> Might by the way...
> Then when the fine-grain parser failes on an incomplete or malformed
> statement, I try to collect only that statement (text of start of statement
> upto start of next statement) and pass this to the coarse grained parser?!
> Then at max this one will have to parse 1 statement, and only on request
> (when the user wants to have syntax completion).
>
> Well, guess you have pointed me a direction... Isn't 100% clear to me
> though how to build the coarse-grain parser without having to rewrite the
> grammar a second time (maintenance you know...) but I might find something.
>
> Thanks for the direction, any additional ideas are welcome!
>
> Michiel
>
>
>
>
>
>  *"Bans VGLab" <bans.vglab at gmail.com>*
> Sent by: antlr-interest-bounces at antlr.org
>
> 30/06/2006 15:33
>   To
> antlr-interest at antlr.org  cc
>
>  Subject
> Fwd: [antlr-interest] parsing incomplete syntax
>
>
>
>
>
>
>
> From: *Bans VGLab* <*bans.vglab at gmail.com* <bans.vglab at gmail.com>>
> To: Michiel Vermandel <*Michiel_Vermandel at axi.be *<Michiel_Vermandel at axi.be>
> >
>
> Hello Michiel
>
> I have no definite solution. But I can give some views, which might be of
> some help.
>
> For this particular problem of showing columns (of course, this is not
> always applicable for a general problem of this kind), We can have a
> coarse-grain grammar, that is more forgiving in nature.
>
> For instance, in your case, accept anything that conforms to something
> like this:
>
> "SELECT" select_list "FROM" table_list "WHERE" condition
>
> Now, don't force select_list, table_list and condition to conform to their
> (respective) exact format. Instead allow them to be a generic string-token.
> This will accept any crappy input like:
>
> SELECT  i have a black pen FROM tt pp qq uu WHERE i am good
>
> But you see a good thing about this. That is, you can walk this tree using
> a coarse-grain tree-parser and then subject the strings "i have a black
> pen", "tt pp qq uu" and "i am good" for further parsing, specific to the
> form expected. I'd term this as fine-grained parsing.
>
> The best part of it is, you can now pin-point the error-message to the
> user. For example, when you try parsing "i have a black pen" to obtain
> column-list, you can easily print an error like:
>   "Expecting ',' but found 'a' "
>
> Similarly, when you parse the string "i am good" to obtain the condition
> AST, you can easily pin-point the error like:
>    "Unexpected token 'am' "
>
> You see, this kind of hierarchical (multi-level parsing, with next lower
> level being finer and finer) allows you to parse ahead, even if there is an
> error in the beginning.
>
> Now, how can this help you in the problem you have - text-completion? This
> can be outlined as:
>
> 1. Say the user inputs:
>     SELECT * FROM customer T, employee E WHERE T.
>
> 2. You can easily parse it using level-1 grammar to obtain strings
>       "*",
>      "customer T, employee E" and
>      "T."
>
> 3. parse "*". This is a valid input.
>
> 4. parse "customer T, employee E". This is a valid input and parsing
> results into two tables aliased as T and E. Now, behind the scenes, you can
> run a separate thread to fetch and cache the colums of T and E from their
> schemas.
>
> 5. parse "T.". Now here's the tricky part. You now have the option to
> build some AI that looks up the list of tables and their columns. Display
> the cached columns of the table T.
>
> Hmmmmm...looks like a very specific solution. But it indeed can be applied
> in general to any problem of similar kind.
>
> Another consideration is of performance. Bringing in more hierarchy in
> grammar might slow things down, if not programmed with anticipation. Need to
> strike a proper deal!
>
> Cheers
> Sujeet
> ---------------------
> PS: My real name is Sujeet Banerjee. I can be reached at *
> sujeet.banerjee at gmail.com* <sujeet.banerjee at gmail.com>
> I have deep interest in lexers/parsers. I have worked for BEA systems, in
> getting their Liquiddata JDBC driver out in the market. Refer to the
> publication:*
> **http://doi.ieeecomputersociety.org/10.1109/ICDE.2006.147 *<http://doi.ieeecomputersociety.org/10.1109/ICDE.2006.147>
> ---------------------
>
>
>
> On 6/30/06, *Michiel Vermandel* <*Michiel_Vermandel at axi.be *<Michiel_Vermandel at axi.be>>
> wrote:
>
> Hi,
>
> We have our own Oracle Designer/developer environment and I have written a
> Sql/Plsql/Forms Plsql code parser which is linked to our repository.
> The parser looks up all used objects in the repository. This part works
> just fine.
> But ( as in many other dev tools) the user wants to lookup data while
> typing: eg: "select  * from  drtable t where t*.*" and then pressing a key
> combination to lookup the available columns of table drtable.
> The problem is that at this point the statement is incomplete and no
> grammer rule can be applied.
> What should be the best approach to solve these kinds of issues (not only
> for looking up fields of tables but in general).
> If the AST should say (simplified)
>
> select_statement
>   select_list
>     *
>   from
>    table_list
>       table
>         drtable
>         t
>    where_condition
>        t
>       dot
>
> then I would be saved.
>
> I need the AST tree completely up to the last token.
> (of course other statements can follow this one)
>
> Any suggestions, best practices?
> Anyone done this before (in other language)?
>
> Thanks!
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20060630/2b366c29/attachment-0001.html


More information about the antlr-interest mailing list