[antlr-interest] xml grammar

Martin Probst mail at martin-probst.com
Tue Nov 15 15:34:53 PST 2005


Hi,

> Sure.  From my html.g file:
> 
> OTABLE	
> 	:	"<table" (WS (ATTR)*)? '>'
> 	;

But that's a parser for one particular XML application, i.e. it returns
OTABLE as a token rather than distinguishing QNAME from ELEMENT_CONTENT.
If you want a generic XML parser, you'll run into that problem: "foo" is
a valid QNAME after a "<" or as an attribute name, but it is also valid
ELEMENT_CONTENT in other places.

Are there any plans to add support for stateful lexers to ANTLR? I'm
aware that it's doable using semantic predicates, but that's quite
cumbersome. It could be really easy, e.g.

state DEFAULT:
  ELEMENT_START -> ELEMENT_STATE:
    '<';
state ELEMENT_STATE:
  QNAME -> ATTR_LIST_STATE:
    ...;

and so on. I guess it would not be much work to implement that in
ANTLR, and it would be a really big improvement for people who have to
implement stateful lexers. Do you have any ideas on this? I would be
interested in doing that, but I certainly won't have time until April
next year ...
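
To make the idea a bit more concrete, here is a rough sketch, in Python
rather than ANTLR, of the behaviour such a state table would describe.
Only DEFAULT, ELEMENT_STATE, ATTR_LIST_STATE, ELEMENT_START and QNAME
come from the example above; the other token names, the regular
expressions and the tokenize() helper are made up for illustration, and
real XML needs a lot more than this.

```python
import re

# Transition table: state -> list of (token name, regex, next state).
# The three state names follow the example above; everything else here
# (extra tokens, regexes) is an assumption made for illustration only.
RULES = {
    "DEFAULT": [
        ("ELEMENT_START", r"<", "ELEMENT_STATE"),
        ("ELEMENT_CONTENT", r"[^<]+", "DEFAULT"),
    ],
    "ELEMENT_STATE": [
        ("QNAME", r"[A-Za-z_][\w.:-]*", "ATTR_LIST_STATE"),
        ("SLASH", r"/", "ELEMENT_STATE"),
    ],
    "ATTR_LIST_STATE": [
        ("WS", r"\s+", "ATTR_LIST_STATE"),
        ("QNAME", r"[A-Za-z_][\w.:-]*", "ATTR_LIST_STATE"),
        ("EQ", r"=", "ATTR_LIST_STATE"),
        ("STRING", r'"[^"]*"', "ATTR_LIST_STATE"),
        ("ELEMENT_END", r"/?>", "DEFAULT"),
    ],
}

COMPILED = {
    state: [(name, re.compile(pattern), nxt) for name, pattern, nxt in rules]
    for state, rules in RULES.items()
}


def tokenize(text, state="DEFAULT"):
    """Return (token name, text) pairs, trying only the current state's rules."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, regex, next_state in COMPILED[state]:
            match = regex.match(text, pos)
            if match:
                if name != "WS":  # whitespace inside a tag is skipped
                    tokens.append((name, match.group()))
                pos = match.end()
                state = next_state
                break
        else:
            raise ValueError(f"no rule matches in state {state} at offset {pos}")
    return tokens
```

With this table, tokenize('<foo bar="foo">foo</foo>') reports the "foo"
after "<" as QNAME and the "foo" between the tags as ELEMENT_CONTENT,
purely because of the state the lexer is in when it sees the text.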

Martin


