[antlr-interest] continuation of parsing from a stream

rhalin rhalin at yahoo.com
Sun Aug 18 15:51:26 PDT 2002


Heyyas,

I'm attempting to convert a scanner and parser I've been using in
flex/bison to ANTLR. My main reason for doing so is that I require
multiple instances (C++ classes) of the system, and flex/bison don't
do a good job of supporting this.

So far I've been pleased with what I've found in ANTLR, but one
problem still eludes me.  The string I'm trying to parse is a
stream-based protocol read from a socket.  The protocol is loosely
XML-based.  

The problem is that the protocol has no terminating character to say
"I'm done giving you new data."  Taking into account things like
internet lag, the string most likely will not be received all at one
time.  For instance, the client may receive these two strings:

"<tag><tag"
"><tag>"

To make matters worse, the last character in the output data does not
-have- to be ">"

What I need is a way for the scanner and parser to remember where they
left off, and not just throw away that information because they've
reached the end of the string, or have been called again.  With flex /
bison it was simple: replace the first string with the second string
and continue parsing where the first one left off.  Is there maybe an
option for doing this in the lexer and parser classes that I can use?

Below is my current (skeletal) code, what the output is, and what I
want it to be, for a better example.

I'm fairly new to ANTLR, so any help or suggestions would really be
appreciated.  Thanks in advance.

--Rhalin

<pre>
/********    XMLParser.g ********/
header {
#include <iostream>
}

options {
	language="Cpp";
}

class XMLParser extends Parser;
options {
  k=2;
}


parsing
:
(rule)?
			{
				std::cout << "done parse!\n";
			}

		;

rule
:
(rulage)+ END
			{
				std::cout << "done string!\n";
			}
		
		;

rulage
:
OPEN (UNDEFINED_TOKEN)* CLOSE
			{
				std::cout << "<> found!\n";
				/* skip */
			}
		;

class XMLLexer extends Lexer;
options {
	filter=UNDEFINED_TOKEN;
	k=2;
}

WS
:
(' '
	|	'\t'
	|	'\n'
	|	'\r')
		{ _ttype = ANTLR_USE_NAMESPACE(antlr)Token::SKIP; }
	;

OPEN
:
'<'
		;

CLOSE
:
'>'
		;

END
	:	';'
		;


protected
UNDEFINED_TOKEN
:
.
				{
					std::cout << "Bad Token! No cookie!\n";
					/* skip */
				}
		;


/********  main.cpp  ********/
#include <iostream>
#include <sstream>
#include "XMLLexer.hpp"
#include "XMLParser.hpp"

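int main()
{
	ANTLR_USING_NAMESPACE(std)
	ANTLR_USING_NAMESPACE(antlr)
	istringstream data;
	data.str("<><");                     // first chunk, ends mid-tag
	XMLLexer myXMLLexer(data);
	XMLParser myXMLParser(myXMLLexer);
	myXMLParser.parsing();
	data.str("><>< k >;");               // second chunk, as if read later from the socket
	myXMLParser.parsing();               // should resume where the first call stopped
	return 0;
}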
/********* output **********/
<> found!
line 1: expecting CLOSE, found ''
line 1: expecting END, found ''
done parse!
done parse!
/********* desired output **********/
<> found!
<> found!
<> found!
<> found!
Bad Token! No cookie!
<> found!
done string!
done parse!
/********* end **********/
</pre>


 
