[antlr-interest] newbie greedy option question

stephane richard kabnot at gmail.com
Fri Aug 28 01:42:44 PDT 2009


Hi all.

I'm trying to build a simple XHTML recognizer (for whitespace
compression) as a way of learning ANTLR. Here's a sample of what
I'd like to match:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>
            XHTML
            Example
        </title>
    </head>
    <body>
        <p>
            Please Choose a Day:
            <br /><br />
            <select name="day">
                <option selected="selected">Monday</option>
                <option>Tuesday</option>
                <option>Wednesday</option>
            </select>
        </p>
    </body>
</html>


This is the grammar:

grammar Html;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

prog	: 	tag_element *
	;

element	:	tag_element
	| 	text_element
	;

tag_element
	: 	open_tag element* close_tag
	|	empty_tag
	;

open_tag:	OPEN_TAG name attribute* CLOSE_TAG
	;
	
close_tag
	:	OPEN_TAG '/' name CLOSE_TAG
	;

empty_tag
	:	OPEN_TAG name '/' CLOSE_TAG
	;

attribute
	:	namespace? ID '=' '"' (options{greedy=false;}: .)* '"'
	;

namespace
	:	ID ':'
	;


name	:	ID
	;
text_element	
	:	(~(OPEN_TAG) | WS)+
	;


ID  		: ('a'..'z'|'A'..'Z')+ ;
INT		: '0'..'9'+ ;
NEWLINE		: '\r'? '\n' ;
WS 		: (' '|'\t'|'\n'|'\r')+ {skip();} ;
OPEN_TAG	: '<';
CLOSE_TAG	: '>';


My problem is with the text_element rule. I'd like to match everything,
including whitespace, until the recognizer finds an OPEN_TAG. While the
current rule works, it gives me these warnings and this error:

[10:32:12] warning(200): Html.g:43:21: Decision can match input such
as "WS" using multiple alternatives: 1, 2, 3
As a result, alternative(s) 3,2 were disabled for that input
[10:32:12] warning(200): Html.g:43:21: Decision can match input such
as "{CLOSE_TAG..ID, INT..':'}" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
[10:32:12] error(201): Html.g:43:21: The following alternatives can
never be matched: 2
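
One possible reading of these messages, offered as a sketch rather than
a confirmed fix: Html.g:43 appears to be the loop in text_element. In a
parser rule, ~(OPEN_TAG) matches any token other than OPEN_TAG, WS
included, so the separate WS alternative (alternative 2) can never be
chosen. Alternative 3 looks like the loop exit, which conflicts with
looping because element* in tag_element can start another text_element
on the same tokens. A variant with a single alternative and an explicit
greedy option might look like this (untested, and the choice of
greedy=true is an assumption about the intended behaviour):

// Sketch only: ~(OPEN_TAG) already covers every other token type,
// so no separate WS alternative is listed.
// greedy=true states explicitly that the loop should keep consuming
// tokens instead of exiting when both choices are possible.
text_element
	:	(options {greedy=true;} : ~(OPEN_TAG))+
	;

Separately, the WS lexer rule calls skip(), so whitespace is discarded
before the parser runs and no parser rule can match it. Replacing
{skip();} with {$channel=HIDDEN;} would keep the whitespace tokens in
the token stream, although the parser still would not see them
directly.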

How could I handle this case properly?

Regards,
Kabnot

