[antlr-interest] Is it possible to get FOLLOW set from antlr runtime? (longish)

Wed Jul 7 05:08:27 PDT 2004

Hi,

Sorry if this is a newbie question, but let me explain:

- Prerequisites: I have a relatively uncomplicated grammar that can
verify input and put it into a usable data structure for me to work
with. This part works.

- Problem: To facilitate input, I would like to build autocompletion
into the user interface. At various stages, I have different lists
that the user should be able to pick from, if [s]he does not want to
type it all by hand. To do this, I imagined I would be able to re-use
the parser that takes the final output to work out where the user is
in the language and, by checking which tokens are possible next, pick
the correct list of possibilities to show.

Is this at all possible? Or even practical? 

I can see two basic approaches I can use if I can't query antlr at runtime:

- For each keystroke, parse the whole thing, putting char+lineno into
the tokens and comparing them to the actual position of the caret in
the edit field, combined with a manually updated "NextValidToken" list
on every successfully read token.

- Work with exceptions: parse the whole input on every keystroke,
record where the exception occurred, and calculate NextValidToken from
that.

The problem with both approaches is that I have to manually create and
update the NextValidToken on each rule.
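
To make the first approach concrete, here is a minimal sketch in Python
rather than in antlr itself. It assumes tokens are plain
whitespace-separated words and uses character offsets instead of
char+lineno; the names tokenize, tokens_before_caret and
partial_word_at_caret are made up for illustration.

```python
import re

# Assumption: tokens are whitespace-separated words, as in a Foo/Bar style input.
TOKEN = re.compile(r"\S+")

def tokenize(text):
    """Return (token_text, start_offset, end_offset) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]

def tokens_before_caret(text, caret):
    """Keep only the tokens that end at or before the caret position."""
    return [tok for tok, start, end in tokenize(text) if end <= caret]

def partial_word_at_caret(text, caret):
    """Return the typed prefix of the token the caret sits inside, or ''."""
    for tok, start, end in tokenize(text):
        if start < caret < end:
            return text[start:caret]
    return ""
```

The complete tokens would be handed to the parser, and the partial
word (if any) could be used to filter the list shown to the user.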

Example (pseudocode):

begin : Rule1 Rule2

Rule1 : "Foo"|"Fuu"
Rule2 : "Bar"|"Baz"

This parser would allow me to write any of Foo Bar, Foo Baz, Fuu Bar,
or Fuu Baz.

Assume the user has written Foo. Now I would like to show Bar and Baz
to the user to allow him to pick one. Running the parser on Foo would
give a syntax error, but not before Rule1 was evaluated.
So if I rewrite it like this:

Example (pseudocode):

begin : Rule1 {NextValidToken="Rule2"} Rule2

Rule1 : "Foo"|"Fuu"
Rule2 : "Bar"|"Baz"

I should be able to pick up (when the syntax error exception occurs),
that the user has come to a point where he must choose from Rule2
(which then must be available in a copy from the program).
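
A rough sketch of what I mean, as a hand-written parser in Python
instead of generated antlr code. ToyParser, ParseError, CHOICES and
suggestions are names invented for this sketch; CHOICES plays the role
of the copy of each rule's alternatives that the program has to keep.

```python
# Hand-maintained copy of each rule's alternatives (the part I would like to avoid).
CHOICES = {"Rule1": ["Foo", "Fuu"], "Rule2": ["Bar", "Baz"]}

class ParseError(Exception):
    pass

class ToyParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.next_valid = None  # name of the rule expected at this point

    def match_rule(self, rule):
        """Consume one token if it is an alternative of `rule`, else fail."""
        if self.pos >= len(self.tokens) or self.tokens[self.pos] not in CHOICES[rule]:
            raise ParseError("expected %s" % rule)
        self.pos += 1

    def begin(self):
        self.next_valid = "Rule1"
        self.match_rule("Rule1")
        self.next_valid = "Rule2"  # mirrors the {NextValidToken="Rule2"} action
        self.match_rule("Rule2")

def suggestions(text):
    """Return the list to show the user, or [] if the input parses completely."""
    parser = ToyParser(text.split())
    try:
        parser.begin()
    except ParseError:
        return CHOICES[parser.next_valid]
    return []
```

With this, suggestions("Foo") gives the Rule2 alternatives, while a
complete input such as "Foo Bar" gives an empty list.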

Is my understanding correct on this subject?

This is all good, and definitely possible in my smallish grammar, but
it seems such a waste that I have to re-invent the wheel to obtain what
the parser system already knows, i.e. the FOLLOW set of Rule1.
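
For reference, this is the kind of information I am after. The sketch
below is the textbook FIRST/FOLLOW fixpoint computation written in
Python, independent of antlr; the grammar encoding (a dict of rule name
to alternatives), the "$" end-of-input marker and the function names
are assumptions of the sketch, not anything antlr provides.

```python
EPS = ""   # marks "can derive nothing" inside a FIRST set
EOF = "$"  # end-of-input marker

def first_of_sequence(seq, first, grammar):
    """FIRST set of a symbol sequence; contains EPS if every symbol can be empty."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no FIRST set grows any more
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:  # iterate until no FOLLOW set grows any more
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(alt[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

GRAMMAR = {
    "begin": [["Rule1", "Rule2"]],
    "Rule1": [["Foo"], ["Fuu"]],
    "Rule2": [["Bar"], ["Baz"]],
}
FOLLOW = compute_follow(GRAMMAR, "begin")
```

Here FOLLOW["Rule1"] contains exactly Bar and Baz. In a larger grammar
a global FOLLOW set can be bigger than what is valid at one particular
point in the input, so it would only be an upper bound on the list to
show.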

Of course, I can't always rely on exceptions to achieve my goal.
Consider the following example:

Example (pseudocode):

begin : Rule1 {NextValidToken="Rule2"} [Rule2]?

Rule1 : "Foo"|"Fuu"
Rule2 : "Bar"|"Baz"

In this case the parser would return "success" from parsing input
"Foo", since Rule2 is optional. 

Does anybody have some insights into how this is best handled? Should
I handle it in the lexer, so that it only provides data up to the
edit-box caret and thus starves the parser of input after the Rule1
terminal? Would this allow me to introspect the parse tree at that
point to see what the last state was, and thus read out the NextValid
stages?
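
One way the caret idea could look, again as a hand-written Python
sketch and not as antlr code: the input is cut at the caret, and the
parser notes what it was looking for whenever it runs out of tokens,
so an optional Rule2 still yields suggestions even though the parse
succeeds. CaretParser, peek_in and complete are hypothetical names.

```python
RULE1 = ("Foo", "Fuu")
RULE2 = ("Bar", "Baz")

class CaretParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.expected_at_end = set()  # tokens the parser wanted when input ran out

    def peek_in(self, choices):
        """True if the next token is in choices; records choices at end of input."""
        if self.pos >= len(self.tokens):
            self.expected_at_end.update(choices)
            return False
        return self.tokens[self.pos] in choices

    def match(self, choices):
        if not self.peek_in(choices):
            raise ValueError("expected one of %s" % (choices,))
        self.pos += 1

    def begin(self):
        self.match(RULE1)
        if self.peek_in(RULE2):  # optional Rule2
            self.pos += 1
        if self.pos < len(self.tokens):
            raise ValueError("unexpected trailing input")

def complete(text, caret):
    """Suggestions for the caret position, ignoring everything after it."""
    parser = CaretParser(text[:caret].split())
    try:
        parser.begin()
    except ValueError:
        pass  # a syntax error is fine; only the recorded expectations matter
    return sorted(parser.expected_at_end)
```

For the input "Foo" with the caret at the end, complete returns Bar
and Baz although the parse itself succeeds. A half-typed word before
the caret is not handled here and simply produces no suggestions.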

Sorry if this reads a bit rambly.

Any help would be much appreciated.

Regards
Asger Jensen
