[antlr-interest] Incremental Parsing: a simple practical solution
Christian Boitet
Christian.Boitet at imag.fr
Sat Jul 21 03:19:54 PDT 2007
Hi, 21/7/07
At 18:44 -0700 16/07/07, Sohail Somani wrote:
>Well my possibly flawed assumption was that the grammar would look
>something like this:
>
>// csv parser
>startRule: singleRecord*;
>
>...
>
>If you start your parser at startRule then it will exhaust all input
>before returning, but if you start it at singleRecord, it should only
>care about enough tokens to satisfy the one rule.
As Jean-Claude Durand & I wrote earlier, there is a problem here
because the LA seem to be constructed with the first non-terminal as
axiom only.
Then, if you call your parser with another non-terminal as axiom,
your LA may not work the way you expect.
In other words, the FOLLOW sets associated with each NT are in
general different if you change the axiom. For example, if you write
// csv parser
startRule: singleRecord* "ENDRECORDS";
...
-"ENDRECORDS" and singleRecord but not EOF are in
FOLLOW(singleRecord) if the axiom is startRule,
- but not if the axiom is singleRecord, in which case EOF only is expected.
If you parse a single record followed by "ENDRECORDS" with
singleRecords as axiom, it will work while it should not. And if you
have associated some action to ENDRECORDS, like performing terminal
actions on the internal representation of your list of records, you
would perform it for every incremental input of the form singleRecord
"ENRECORDS".
Our solution in practice is simply to build as many parsers as we
have increments, using the necessary rules only, or putting the
"increment rule" first. If some NTs become unaccessible, that is OK,
even expected.
That limits the set of increments to a predefined subset, true
enough, but in practice it is not so interesting to go down to the
smallest possible increments.
For example, if we compile decorated trees (in our case, linguistic
trees), there is no need to go down to take each attribute as an
increment: simply modify the whole list (which looks like a list of
attributes in an XML tree), and recompile it after local
modification(s).
Finally, Terence (I think) mentioned that grammar inheritance would
be sometime be included in ANTLR-3, which would give a better
solution.
Best,
Christian Boitet & Jean-Claude Durand
>On Tue, 2007-07-17 at 11:30 +1000, Jonathan Thomas wrote:
>> How would one go about doing that, Sohail?
>> I'm also mildly interested in such a solution.
>>
>> Sohail Somani said the following on 17/07/2007 11:25 AM:
>> > On Mon, 2007-07-16 at 20:51 -0400, Benji Smith wrote:
>> >
>> > > The JavaCC parser works well, except that I can't parse the file
>> > > record-by-record. I have to parse the entire file, returning a
>> > > collection of Record objects at the end.
>> > >
>> >
>> > Set up your grammar to only do one record at a time. Then keep parsing
>> > until you get back a null record (or whatever)
>> >
> > > Should work ;-)
More information about the antlr-interest
mailing list