[antlr-interest] yet another syntactic predicate problem

pcristip pcristip at yahoo.com
Wed Jun 4 06:47:43 PDT 2003


Hi,

I'd really appreciate some help on the following problem:

I have a grammar that is supposed to parse the body of an HTML file,
and everything works except one thing: the data between the tags
can be of two kinds:
1. a topic, which is normal text except that it must start with a
letter (in fact only the letters a through e) followed by a '.' char
(e.g. "A.")
2. normal cdata, which is the case when the text is not a topic

I managed to make this work except for the case when the text starts 
with spaces.
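To make the intended rule concrete, here is a minimal sketch in Java (the language the grammar's actions appear to be written in) that decides whether a run of text is a topic, skipping any number of leading whitespace characters first. The class and method names are hypothetical, and matching is case-insensitive on the assumption that the lexer is set up that way, since the examples use "A." while the rules below list lowercase 'a'..'e'.

```java
// Hypothetical helper, not part of the grammar: classifies a run of text
// between tags as a topic or as plain cdata.
final class TopicScan {
    enum Kind { TOPIC, CDATA }

    static Kind classify(CharSequence text) {
        int i = 0;
        // Unbounded peek past leading whitespace; nothing is consumed.
        while (i < text.length() && isSpace(text.charAt(i))) {
            i++;
        }
        // A topic is a letter a-e followed by '.', e.g. "A.".
        if (i + 1 < text.length()
                && isTopicLetter(text.charAt(i))
                && text.charAt(i + 1) == '.') {
            return Kind.TOPIC;
        }
        return Kind.CDATA;
    }

    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Assumption: case-insensitive, so 'A'..'E' count as well as 'a'..'e'.
    static boolean isTopicLetter(char c) {
        char lower = Character.toLowerCase(c);
        return lower >= 'a' && lower <= 'e';
    }

    public static void main(String[] args) {
        System.out.println(classify("A. some text here")); // TOPIC
        System.out.println(classify("A normal text"));     // CDATA
        System.out.println(classify("   B. title"));       // TOPIC
    }
}
```

The number of whitespace characters the loop skips is unbounded, which is exactly what a fixed number of lookahead characters cannot express.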

Now I have something like:
(in the parser)
topic		:	TOPICID^ topicbody
			;

topicbody	:
				(	options { greedy=true; }
					:
					text | font
				)*
			;
text		:	PCDATA ;


(in the lexer)

TOPICID
			:
				('a' | 'b' | 'c' | 'd' | 'e') '.'
			;


PCDATA
			:
				({ LA(2)!='.' }? ('a'|'b'|'c'|'d'|'e'))	| ~('a'|'b'|'c'|'d'|'e'|'<'|'>')
				(
					options {
						generateAmbigWarnings=false;
					}
				:	'\r' '\n'	{newline();}
				|	'\r'		{newline();}
				|	'\n'		{newline();}
				|	~('<'|'\n'|'\r'|'"'|'>')
				)*
			;


This works for texts like "A. some text here" and "A normal text",
but if there are leading spaces, as in "   B. title", the text is
matched as data, not as a topic.
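The following rough Java model of the posted TOPICID and PCDATA rules reproduces the reported behavior. It is a sketch, not ANTLR's generated lexer: it assumes tag-free input, case-insensitive letters, and that TOPICID wins whenever a letter a-e is followed by '.' (which is what the post observes), and it ignores line counting. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Rough model of how the posted TOPICID and PCDATA rules split tag-free text.
// Assumptions: no '<' or '>' in the input, case-insensitive letters,
// and newline() bookkeeping is ignored.
final class PostedLexerModel {
    enum Type { TOPICID, PCDATA }

    static boolean isTopicLetter(char c) {
        char lower = Character.toLowerCase(c);
        return lower >= 'a' && lower <= 'e';
    }

    static List<Type> tokenize(String in) {
        List<Type> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '<' || c == '>') {
                throw new IllegalArgumentException("tags are outside this sketch");
            }
            boolean dotNext = i + 1 < in.length() && in.charAt(i + 1) == '.';
            if (isTopicLetter(c) && dotNext) {
                // TOPICID: letter a-e followed by '.'
                tokens.add(Type.TOPICID);
                i += 2;
            } else if (isTopicLetter(c)) {
                // First PCDATA alternative ({ LA(2)!='.' }? 'a'..'e'): as posted,
                // the trailing loop is not part of it, so it matches one character.
                tokens.add(Type.PCDATA);
                i += 1;
            } else {
                // Second PCDATA alternative: any other first character, including
                // a space, then the loop runs up to the next '<', '>' or '"'.
                i++;
                while (i < in.length() && "<>\"".indexOf(in.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(Type.PCDATA);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("A. some text here")); // [TOPICID, PCDATA]
        System.out.println(tokenize("   B. title"));       // [PCDATA]
    }
}
```

In this model the leading space of "   B. title" is matched by the ~('a'|...|'>') alternative, so the lexer commits to PCDATA at the first character and its loop then consumes "B. title" as well; no TOPICID token is produced for that text.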

I tried to solve this by modifying the topic rule, with no luck. Then
I thought the best solution would be a syntactic predicate, because
the lookahead is not fixed in this case: an arbitrary number of spaces
can come before the point where you can tell which rule to match.
So I ended up with this construct:

topic_or_answer : ((spaces)? TOPICID) => topic
                |  text
                ;

spaces          : WS ;

(and in the lexer)
protected
WS			:	(
					options {
						generateAmbigWarnings=false;
					}
				:	' '
				|	'\t'
				|	'\n'	{ newline(); }
				|	"\r\n"	{ newline(); }
				|	'\r'	{ newline(); }
				)+
			;


This doesn't work (otherwise you wouldn't be reading these lines :) ).
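Because a syntactic predicate in the parser looks at tokens rather than characters, it can only see what the lexer emitted. Here is a minimal token-level model of `((spaces)? TOPICID) => topic | text`, written as a Java sketch with a hypothetical class name. It assumes, as in the posted lexer, that `WS` is `protected` and so is never returned to the parser as a token of its own, and that the posted PCDATA rule (which accepts a space as its first character) turns "   B. title" into a single PCDATA token.

```java
import java.util.List;

// Token-level model of the parser decision ((spaces)? TOPICID) => topic | text.
// Hypothetical sketch: token types only, no token text.
final class PredicateModel {
    enum Type { WS, TOPICID, PCDATA }

    // Mirrors the predicate: an optional WS token, then a TOPICID token.
    static boolean predicateMatches(List<Type> tokens) {
        int i = 0;
        if (i < tokens.size() && tokens.get(i) == Type.WS) {
            i++;
        }
        return i < tokens.size() && tokens.get(i) == Type.TOPICID;
    }

    static String chooseAlternative(List<Type> tokens) {
        return predicateMatches(tokens) ? "topic" : "text";
    }

    public static void main(String[] args) {
        // Assumed lexer output for "   B. title": a single PCDATA token.
        System.out.println(chooseAlternative(List.of(Type.PCDATA))); // text
        // The token sequence the predicate was written for.
        System.out.println(chooseAlternative(
                List.of(Type.WS, Type.TOPICID, Type.PCDATA)));       // topic
    }
}
```

In this model the predicate cannot succeed on "   B. title", because the parser receives a PCDATA token where the predicate expects an optional WS followed by TOPICID.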

Can someone give me a hint? What did I do wrong?

Thanks,
Chris




 
