[antlr-interest] xml grammar

Terence Parr parrt at cs.usfca.edu
Wed Nov 16 09:44:45 PST 2005


On Nov 15, 2005, at 3:34 PM, Martin Probst wrote:

> Hi,
>
>> Sure.  From my html.g file:
>>
>> OTABLE	
>> 	:	"<table"
>> 	;
>
> But that's a parser for a particular XML application, e.g. it returns
> OTABLE as a token, and not QNAME vs. ELEMENT_CONTENT.

True, but imagine

TAG : '<' ID  (WS (ATTR)*)? '>' ;

and you have XML tags, don't you?

> If you want a
> generic XML parser, you'll run into that problem, as "foo" is a valid
> QNAME after a "<" or as an attribute name, but also valid
> ELEMENT_CONTENT in other places.

Yeah, I remember the PCDATA part being a hassle.

> Are there any plans on enhancing ANTLR Lexers by providing support for
> stateful lexers? I'm aware that it's doable using semantic predicates,
> but that's quite cumbersome. It could be really easy, e.g.
>
> state DEFAULT:
>   ELEMENT_START -> ELEMENT_STATE:
>     '<';
> state ELEMENT_STATE:
>   QNAME -> ATTR_LIST_STATE:
>     ...;

Well, yes, I've considered allowing you to specify a start rule for  
the lexer so you can do context-sensitive lexing.  Pretty cool, eh?   
Only issue is, how do you call a random method in Java w/o function  
pointers?  Reflection is SLOOOOW and not supported in all targets...
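
[Editor's sketch, not from the original message.] One way to pick a lexer start rule at run time without reflection is to give each rule an object behind a small interface. The object then stands in for the function pointer. The names below (StartRuleDispatch, Rule, lexFrom, and the QNAME/PCDATA rule bodies) are hypothetical and are not ANTLR API. The sketch only shows that the same input can lex differently depending on which rule you start in:

```java
import java.util.HashMap;
import java.util.Map;

class StartRuleDispatch {
    /** Stand-in for a function pointer: one object per lexer start rule. */
    interface Rule {
        /** Returns the text matched at pos, or null if the rule does not match. */
        String match(String in, int pos);
    }

    // Hypothetical rule table, keyed by rule name.
    static final Map<String, Rule> RULES = new HashMap<>();
    static {
        // QNAME (simplified assumption): a run of letters and digits.
        RULES.put("QNAME", (in, pos) -> {
            int j = pos;
            while (j < in.length() && Character.isLetterOrDigit(in.charAt(j))) j++;
            return j > pos ? in.substring(pos, j) : null;
        });
        // PCDATA (simplified assumption): everything up to the next '<'.
        RULES.put("PCDATA", (in, pos) -> {
            int j = in.indexOf('<', pos);
            if (j < 0) j = in.length();
            return j > pos ? in.substring(pos, j) : null;
        });
    }

    /** Looks the start rule up by name and invokes it: an ordinary interface call. */
    static String lexFrom(String startRule, String in, int pos) {
        Rule r = RULES.get(startRule);
        if (r == null) throw new IllegalArgumentException("no such rule: " + startRule);
        return r.match(in, pos);
    }
}
```

Starting from "QNAME" on the input "foo bar<" matches only "foo", while starting from "PCDATA" matches "foo bar". A switch over an integer rule id would work as well and avoids the map lookup.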

> and so on. I guess it would be not so much work to implement that in
> ANTLR, and it would be a really big improvement for people that have to
> implement stateful lexers. Do you have any ideas on this?

Sure do! :)  Use the start rule idea, but also you can simply invoke
another lexer to handle everything for PCDATA or do it the other way:
have a special lexer for inside the tag.  Just call it like an island
grammar, right?
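
[Editor's sketch, not from the original message.] The "special lexer for inside the tag" idea might look roughly like this in plain Java. An outer lexer handles character data and hands control to an inner lexer at each '<'. The class, method, and token names are hypothetical, and the tag syntax is heavily simplified (no attribute values, no quoting, no entities):

```java
import java.util.ArrayList;
import java.util.List;

class IslandLexerSketch {
    /**
     * Inner ("island") lexer: tokenizes the inside of one tag.
     * pos points just past '<'; returns the index just past the closing '>'.
     */
    static int lexTag(String in, int pos, List<String> out) {
        int i = pos;
        while (i < in.length() && in.charAt(i) != '>') {
            if (Character.isWhitespace(in.charAt(i))) { i++; continue; }
            int j = i;
            // A name runs until whitespace or '>' (simplified assumption).
            while (j < in.length() && !Character.isWhitespace(in.charAt(j))
                    && in.charAt(j) != '>') j++;
            out.add("NAME(" + in.substring(i, j) + ")");
            i = j;
        }
        if (i == in.length()) throw new IllegalArgumentException("unterminated tag");
        return i + 1;
    }

    /** Outer lexer: emits PCDATA and delegates to lexTag whenever it sees '<'. */
    static List<String> lex(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            if (in.charAt(i) == '<') {
                out.add("TAG_OPEN");
                i = lexTag(in, i + 1, out);   // hand off to the inner lexer
                out.add("TAG_CLOSE");
            } else {
                int j = in.indexOf('<', i);
                if (j < 0) j = in.length();
                out.add("PCDATA(" + in.substring(i, j) + ")");
                i = j;
            }
        }
        return out;
    }
}
```

For the input "<a b>a b", the text "a b" inside the tag comes back as two NAME tokens, while the same text after the tag comes back as a single PCDATA token. That is the context sensitivity discussed in the quoted question.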

Ter

