[antlr-interest] xml grammar
Terence Parr
parrt at cs.usfca.edu
Wed Nov 16 09:44:45 PST 2005
On Nov 15, 2005, at 3:34 PM, Martin Probst wrote:
> Hi,
>
>> Sure. From my html.g file:
>>
>> OTABLE
>> : "<table"
>> ;
>
> But that's a parser for a particular XML application, e.g. it returns
> OTABLE as a token, and not QNAME vs. ELEMENT_CONTENT.
True, but imagine
TAG : '<' ID (WS (ATTR)*)? '>' ;
and you have XML tags, don't you?
> If you want a
> generic XML parser, you'll run into that problem, as "foo" is a valid
> QNAME after a "<" or as an attribute name, but also valid
> ELEMENT_CONTENT in other places.
Yeah, I remember the PCDATA part being a hassle.
> Are there any plans on enhancing ANTLR Lexers by providing support for
> stateful lexers? I'm aware that it's doable using semantic predicates,
> but that's quite cumbersome. It could be really easy, e.g.
>
> state DEFAULT:
> ELEMENT_START -> ELEMENT_STATE:
> '<';
> state ELEMENT_STATE:
> QNAME -> ATTR_LIST_STATE:
> ...;
Well, yes, I've considered allowing you to specify a start rule for
the lexer so you can do context-sensitive lexing. Pretty cool, eh?
Only issue is, how do you call a random method in Java w/o function
pointers? Reflection is SLOOOOW and not supported in all targets...
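For what it's worth, here is a minimal sketch of the stateful idea in plain Java, assuming a hand-written lexer rather than anything ANTLR generates. An enum plus a switch picks the rule set for the current state, so no reflection or function pointers are needed. The names (StatefulLexer, Mode, the token labels) are hypothetical and only loosely mirror the states in the quoted proposal; attributes, closing tags and error handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer; not ANTLR output and not an ANTLR API.
class StatefulLexer {
    // Lexer states, loosely mirroring DEFAULT / ELEMENT_STATE from the proposal.
    enum Mode { DEFAULT, ELEMENT }

    private final String input;
    private int pos = 0;
    private Mode mode = Mode.DEFAULT;

    StatefulLexer(String input) { this.input = input; }

    // Tokens are returned as "TYPE:text" strings to keep the sketch short.
    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            String tok;
            // A plain switch dispatches to the rule for the current state.
            switch (mode) {
                case DEFAULT: tok = lexDefault(); break;
                case ELEMENT: tok = lexElement(); break;
                default: throw new IllegalStateException("unknown mode " + mode);
            }
            if (tok != null) tokens.add(tok); // null means "skipped whitespace"
        }
        return tokens;
    }

    // Outside a tag: '<' switches state, anything else is element content.
    private String lexDefault() {
        if (input.charAt(pos) == '<') {
            pos++;
            mode = Mode.ELEMENT;
            return "ELEMENT_START:<";
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '<') pos++;
        return "ELEMENT_CONTENT:" + input.substring(start, pos);
    }

    // Inside a tag: '>' switches back, whitespace is skipped, the rest is a name.
    private String lexElement() {
        char c = input.charAt(pos);
        if (c == '>') {
            pos++;
            mode = Mode.DEFAULT;
            return "ELEMENT_END:>";
        }
        if (Character.isWhitespace(c)) {
            pos++;
            return null;
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '>'
                && !Character.isWhitespace(input.charAt(pos))) pos++;
        return "QNAME:" + input.substring(start, pos);
    }

    public static void main(String[] args) {
        // The same text "foo" comes out as QNAME inside the tag and as
        // ELEMENT_CONTENT outside it.
        System.out.println(new StatefulLexer("<foo>foo").tokenize());
    }
}
```

The point of the sketch is only that the current state decides which rule runs; a generated lexer would presumably do the same dispatch with a switch over an integer state.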
> and so on. I guess it would be not so much work to implement that in
> ANTLR, and it would be a really big improvement for people that have to
> implement stateful lexers. Do you have any ideas on this?
Sure do! :) Use the start rule idea, but also you can simply invoke
another lexer to handle everything for PCDATA or do it the other way:
have a special lexer for inside the tag. Just call it like an island
grammar, right?
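Roughly, the two-lexer version could look like the sketch below, again in plain Java and under the same assumptions: an outer lexer treats everything outside a tag as PCDATA and hands the shared input position to a separate lexer whenever it sees '<'. All class and token names here (IslandLexerDemo, Cursor, TagLexer, ContentLexer) are made up for illustration and are not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of one lexer invoking another on the same input.
class IslandLexerDemo {
    // Shared cursor so both lexers read from the same character stream.
    static final class Cursor {
        final String text;
        int pos = 0;
        Cursor(String text) { this.text = text; }
        boolean atEnd() { return pos >= text.length(); }
        char peek() { return text.charAt(pos); }
    }

    // Inner "island" lexer: only understands what sits between '<' and '>'.
    static final class TagLexer {
        void lex(Cursor in, List<String> out) {
            in.pos++; // consume '<'
            out.add("TAG_OPEN");
            while (!in.atEnd() && in.peek() != '>') {
                if (Character.isWhitespace(in.peek())) { in.pos++; continue; }
                int start = in.pos;
                while (!in.atEnd() && in.peek() != '>'
                        && !Character.isWhitespace(in.peek())) in.pos++;
                out.add("NAME:" + in.text.substring(start, in.pos));
            }
            // Emit TAG_CLOSE only if the closing '>' is actually present.
            if (!in.atEnd()) {
                in.pos++;
                out.add("TAG_CLOSE");
            }
        }
    }

    // Outer lexer: everything outside a tag is PCDATA; tags are delegated.
    static final class ContentLexer {
        private final TagLexer tagLexer = new TagLexer();

        List<String> lex(String text) {
            Cursor in = new Cursor(text);
            List<String> out = new ArrayList<>();
            while (!in.atEnd()) {
                if (in.peek() == '<') {
                    tagLexer.lex(in, out); // hand off to the island lexer
                } else {
                    int start = in.pos;
                    while (!in.atEnd() && in.peek() != '<') in.pos++;
                    out.add("PCDATA:" + in.text.substring(start, in.pos));
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(new ContentLexer().lex("a<b c>d"));
    }
}
```

Either lexer can be the "outer" one; the only requirement in this sketch is that both advance the same cursor, so control returns to the caller at the right input position.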
Ter