[antlr-interest] Incremental Parsing: a simple practical solution

Christian Boitet Christian.Boitet at imag.fr
Sat Jul 21 03:19:54 PDT 2007


Hi,				21/7/07

At 18:44 -0700 16/07/07, Sohail Somani wrote:
>Well my possibly flawed assumption was that the grammar would look
>something like this:
>
>// csv parser
>startRule: singleRecord*;
>
>...
>
>If you start your parser at startRule then it will exhaust all input
>before returning, but if you start it at singleRecord, it should only
>care about enough tokens to satisfy the one rule.

As Jean-Claude Durand & I wrote earlier, there is a problem here 
because the LA seem to be constructed with the first non-terminal as 
axiom only.

Then, if you call your parser with another non-terminal as axiom, 
your LA may not work the way you expect.

In other words, the FOLLOW sets associated with each NT are in 
general different if you change the axiom. For example, if you write

// csv parser
startRule: singleRecord* "ENDRECORDS";
...

-"ENDRECORDS" and singleRecord but not EOF are in 
FOLLOW(singleRecord) if the axiom is startRule,
- but not if the axiom is singleRecord, in which case EOF only is expected.

If you parse a single record followed by "ENDRECORDS" with 
singleRecords as axiom, it will work while it should not. And if you 
have associated some action to ENDRECORDS, like performing terminal 
actions on the internal representation of your list of records, you 
would perform it for every incremental input of the form singleRecord 
"ENRECORDS".

Our solution in practice is simply to build as many parsers as we 
have increments, using the necessary rules only, or putting the 
"increment rule" first. If some NTs become unaccessible, that is OK, 
even expected.

That limits the set of increments to a predefined subset, true 
enough, but in practice it is not so interesting to go down to the 
smallest possible increments.

For example, if we compile decorated trees (in our case, linguistic 
trees), there is no need to go down to take each attribute as an 
increment: simply modify the whole list (which looks like a list of 
attributes in an XML tree), and recompile it after local 
modification(s).

Finally, Terence (I think) mentioned that grammar inheritance would 
be sometime be included in ANTLR-3, which would give a better 
solution.

Best,

Christian Boitet & Jean-Claude Durand

>On Tue, 2007-07-17 at 11:30 +1000, Jonathan Thomas wrote:
>>  How would one go about doing that, Sohail?
>>  I'm also mildly interested in such a solution.
>>
>>  Sohail Somani said the following on 17/07/2007 11:25 AM:
>>  > On Mon, 2007-07-16 at 20:51 -0400, Benji Smith wrote:
>>  >  
>>  > > The JavaCC parser works well, except that I can't parse the file
>>  > > record-by-record. I have to parse the entire file, returning a
>>  > > collection of Record objects at the end.
>>  > >    
>>  >
>>  > Set up your grammar to only do one record at a time. Then keep parsing
>>  > until you get back a null record (or whatever)
>>  >
>  > > Should work ;-)



More information about the antlr-interest mailing list