[antlr-interest] finish/stop parsing without closing reader

Sat Oct 24 17:48:15 PDT 2009

Well, investigating my 'problem' further, I have come to understand that
passing the input stream on from the lexer/parser to another component, which
would store the binary data that follows once a certain rule has matched, is
not such a good idea. The main reason is that the lexer loads a big chunk of
the stream into an internal buffer for further processing.
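
The read-ahead effect can be reproduced with plain java.io, using a
BufferedReader as a stand-in for the lexer's input buffering. This is an
analogy, not ANTLR's actual code, and the class name ReadAheadDemo is made up
for illustration. After asking for only the first line, nothing is left in the
underlying reader:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadAheadDemo {
    /** Reads only the first line through a BufferedReader and returns what is
     *  left in the underlying reader afterwards. */
    static String leftInUnderlyingReader(String message) throws IOException {
        StringReader underlying = new StringReader(message);
        BufferedReader buffered = new BufferedReader(underlying);
        buffered.readLine(); // the consumer only asks for the header line...
        // ...but BufferedReader has already filled its internal buffer
        // (8192 chars by default) from the underlying reader.
        StringBuilder rest = new StringBuilder();
        int c;
        while ((c = underlying.read()) != -1) {
            rest.append((char) c);
        }
        return rest.toString();
    }

    public static void main(String[] args) throws IOException {
        String rest = leftInUnderlyingReader("D4|header\n<binary data>");
        System.out.println("left over: \"" + rest + "\""); // left over: ""
    }
}
```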

What I am now looking into is something like this:

d_data
  : oigId syncpointId uncompressedLength
    (BINARY { store/append $BINARY.text.getBytes() to file; })+
  ;

// zlib deflated data containing 0x00 and 0xff :-(
BINARY
  : ('\u0000'..'\u00ff') [match at least 1 time, at most n * 1024 times]
  ;

But how could that ever work? I mean, 0x00 and 0xff at least have special
meanings. And which characters exactly would be matched by (.)+, for that
matter? Probably not EOF.
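
Whether 0x00 and 0xff survive at all depends first on how the bytes were
turned into chars before the lexer ever sees them. A minimal check with the
JDK charsets, using made-up sample bytes: ISO-8859-1 maps every byte
0x00..0xff to exactly one char and back, while UTF-8 replaces malformed
sequences and so corrupts deflated data. Note also that String.getBytes()
without an argument, as in the $BINARY.text.getBytes() idea above, uses the
platform default charset. CharsetRoundTrip is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Made-up sample bytes of the kind deflated data contains, including 0x00 and 0xff.
    static final byte[] SAMPLE = {0x78, (byte) 0x9c, 0x00, (byte) 0xff, (byte) 0x80};

    /** Decodes bytes to text with the given charset, then encodes the text back. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        String text = new String(data, charset);
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte 0x00..0xff to the char U+0000..U+00FF and back.
        System.out.println(Arrays.equals(SAMPLE, roundTrip(SAMPLE, StandardCharsets.ISO_8859_1))); // true
        // UTF-8 turns the malformed bytes 0x9c, 0xff and 0x80 into U+FFFD.
        System.out.println(Arrays.equals(SAMPLE, roundTrip(SAMPLE, StandardCharsets.UTF_8)));      // false
    }
}
```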

Maybe another approach would be some global state shared by the parser and
the lexer, through which the lexer could be switched to writing the remaining
bytes to a file (directly from the input stream). But as far as I understand
things (and that's not very much), the lexer would not be called again after
the parser is done recognizing. The problem is that I can't determine when to
switch the lexer to byte processing.
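
One way around having to switch the lexer at all, sketched here as an
assumption rather than anything ANTLR provides: read the header bytes yourself
from the raw InputStream up to the known number of '|' separators (five in the
T2|D4|...| test message), hand only that header string to the existing
lexer/parser (for instance through ANTLR's ANTLRStringStream), and copy
whatever remains in the stream to the file. HeaderSplitter, readHeader and
copyPayload are hypothetical names; the 1024-byte chunk size just mirrors the
n * 1024 idea above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class HeaderSplitter {
    /** Reads single bytes until 'separators' '|' chars have been seen.
     *  Byte-by-byte reading means nothing after the last '|' is consumed. */
    static String readHeader(InputStream in, int separators) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int seen = 0;
        while (seen < separators) {
            int b = in.read();
            if (b == -1) {
                throw new IOException("EOF inside header");
            }
            header.write(b);
            if (b == '|') {
                seen++;
            }
        }
        // ISO-8859-1 keeps a one-to-one byte/char mapping for the header text.
        return new String(header.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Copies the rest of the stream in 1024-byte chunks; returns the byte count. */
    static long copyPayload(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[1024];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] message = "T2|D4|42|82737|40|\0\u00ff-binary".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(message);
        String header = readHeader(in, 5);      // would be fed to the lexer/parser
        ByteArrayOutputStream payload = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        long copied = copyPayload(in, payload);
        System.out.println(header);  // T2|D4|42|82737|40|
        System.out.println(copied);  // 9
    }
}
```

If the stream is wrapped in a BufferedInputStream, the same wrapper has to be
used for both calls; otherwise the bytes it buffered ahead are lost, which is
the same problem as with the lexer.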

Honestly, I'm kinda stuck... Any ideas?

- Horst

On 24.10.09 15:07, "Horst Dehmer" <horst.dehmer at inode.at> wrote:

> Hello!
> 
> For one special parser rule in my grammar I have to stop parsing and read the
> rest of the input from the reader as (compressed) binary data. Is there any
> way to instruct the parser to stop reading further tokens once the topmost
> rule has been successfully recognized?
> 
> Any help is highly appreciated!
> 
> - Horst
> 
> More details: The rule recognizes header information for the binary data that
> follows:
> 
> d_data returns [D_DATA pdu]
> @init {
>   pdu = null;
> }
>   : 'D4|' oigId = oig_id '|' syncpointId = syncpoint_id '|'
>     uncompressedLength = length '|'
>     {
>       SyncpointDescriptor syncpoint = ...
>       pdu = new D_DATA(syncpoint);
>     }
>   ;
> 
> The binary data following the last '|' can become quite big, and I have to
> store it to disk as a file.
> After the header is recognized the parser returns, but the input reader is
> closed. Without EOF in the rule, it seems the ANTLRReaderStream consumes the
> additional bytes:
> 
> CharStream charStream = new ANTLRReaderStream(reader);
> PduLexer lexer = new PduLexer(charStream);
> TokenStream tokenStream = new CommonTokenStream(lexer);
> PduParser parser = new PduParser(tokenStream);
> ...
> 
> With a trailing EOF in the rule, the parser naturally complains about the
> additional input:
> line 1:36 extraneous input '<binary data>' expecting EOF
> The test case shows that T2/D4 and the syncpoint token are recognized
> properly, but reading '<binary data>' fails because the reader/stream is
> closed.
> 
> @Test
> public void parse_whole() throws RecognitionException, IOException {
>     final BigDecimal OIG = new BigDecimal("10903008203000000001");
>     final String SYNCPOINT = "82737";
>     final long LENGTH = 40;
>     final String FORMAT = "T2|D4|%20.0f|%s|%d|<binary data>";
>     final String MESSAGE = String.format(FORMAT, OIG, SYNCPOINT, LENGTH);
> 
>     StringReader reader = new StringReader(MESSAGE);
>     T_PDU t_pdu = parserDriver.parse(reader); // OK.
>     D_PDU d_pdu = ((T_DATA) t_pdu).getPdu(); // OK.
>     D_DATA d_data = (D_DATA) d_pdu; // OK.
>     SyncpointDescriptor token = d_data.getToken();
>     
>     assertEquals(OIG, token.oigId); // OK.
>     assertEquals(SYNCPOINT, token.syncpointId); // OK.
>     assertEquals(LENGTH, token.uncompressedLength); // OK.
> 
>     try {
>         // reader should be positioned at the rest of the message, i.e. '<binary data>'.
>         char[] buffer = new char["<binary data>".length()];
>         reader.read(buffer);
>     }
>     catch (IOException exception) {
>         // NOT OK:
>         // java.io.IOException: Stream closed
>     }
> }
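
The "Stream closed" failure is consistent with a char stream that reads the
whole Reader into memory in its constructor and then closes it. EagerCharStream
below is a hypothetical stand-in that only reproduces that behaviour with
java.io; it is not ANTLRReaderStream's source. It shows both effects at once:
the rest of the message ends up in the in-memory buffer, and the caller's
reader is closed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in mimicking the observed behaviour: load everything, then close.
class EagerCharStream {
    private final String text;

    EagerCharStream(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // the whole input, binary tail included
            }
        } finally {
            reader.close(); // the caller's reader is unusable from here on
        }
        text = sb.toString();
    }

    String text() {
        return text;
    }

    /** True if the caller's reader fails once the stream has been built from it. */
    static boolean readerClosedAfterLoad(String message) throws IOException {
        StringReader reader = new StringReader(message);
        new EagerCharStream(reader);
        try {
            reader.read();
            return false;
        } catch (IOException expected) {
            return true; // StringReader reports "Stream closed"
        }
    }

    public static void main(String[] args) throws IOException {
        String message = "T2|D4|42|82737|40|<binary data>";
        EagerCharStream stream = new EagerCharStream(new StringReader(message));
        System.out.println(stream.text().endsWith("<binary data>")); // true
        System.out.println(readerClosedAfterLoad(message));           // true
    }
}
```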
