[antlr-interest] languages without reserved words
Martin Probst
mail at martin-probst.com
Wed Mar 8 04:47:01 PST 2006
> > The problem is where to check for NCNAMEs and where not to. You
> > certainly have some parts in your grammar where you expect NCNAMEs and
> > some parts where you have to test for the operators. The knowledge where
> > that is appropriate and where not is not available to ANTLR.
>
> To be honest, I don't follow you here. The check for NCNAMEs is correct in
> every case, and I do expect NCNAMEs in all those places.
Sorry, my wording was poor. What I meant was: when to interpret a
character sequence that is both a valid NCNAME and a valid keyword as
an NCNAME, and when as a keyword.
> I guess you also meant s/operators/keywords/? Then yes, this knowledge is
> available, since the grammar contains e.g. (simplified)
>
> data_term : "declare" ("ns-prefix" NCNAME ASSIGN STRING)* LP (data_term)* RP
> | (...) => NCNAME (AT NCNAME)? (COLON NCNAME) LP (data_term)* RP
> | NCNAME
> ;
So, at the start of data_term, "declare" is a keyword, and an NCNAME in
that position cannot be "declare". After the input "declare" "ns-prefix",
however, a further "declare" is legal and is an NCNAME. I think we'll end
up with the conclusion we already had: keyword-free grammars are messy,
and the distinction between parser and lexer doesn't really hold for them.
> The big problem with this is that guessing won't pick it up and will thus
> fail. In the above grammar I need the guessing in the second line (left
> empty) to see if there is at least an LP coming up.
I think the only way to keep guessing working with such a language in
ANTLR 2 is to have a stateful lexer.
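
To make "stateful lexer" a bit more concrete, here is a rough sketch in plain Java. It is not ANTLR-generated code; the class name, the keyword set and the nameExpected flag are made up for illustration, and it only splits on whitespace. The idea is simply that the keyword lookup is switched off whenever the grammar can only accept a name next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not ANTLR-generated code: a lexer whose keyword
// lookup is switched on and off by a single state flag.
class StatefulLexerSketch {
    // Assumed keyword set, taken from the simplified data_term rule.
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("declare", "ns-prefix"));

    /** Classifies whitespace-separated words as KEYWORD or NCNAME. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        boolean nameExpected = false; // lexer state: next word must be a name
        for (String word : input.trim().split("\\s+")) {
            if (!nameExpected && KEYWORDS.contains(word)) {
                out.add("KEYWORD(" + word + ")");
                // Assumption: after "ns-prefix" only an NCNAME can follow.
                nameExpected = word.equals("ns-prefix");
            } else {
                out.add("NCNAME(" + word + ")");
                nameExpected = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("declare ns-prefix declare"));
    }
}
```

With the input "declare ns-prefix declare", the first "declare" comes out as a keyword and the last one as an NCNAME. A real lexer would need more states than this one flag, and the parser would have to set them, which is exactly where the parser/lexer separation breaks down.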
> But thanks for your help anyway!
>
> Actually, I just had an idea: what about switching testLiterals off completely
> in the lexer and (in the generated parser code) in every place where there's
> an
> LA(1)==SOME_KEYWORD
> replace the code with
> LA(1)==NCNAME && testLiteralsTable(LA(1))==SOME_KEYWORD
> ?
> That would work, no? And it could be done with a little script?
You could also write:
data_term: { LT(1).getText().equals("declare") }? NCNAME "ns-prefix" ...
| NCNAME
;
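
Outside of ANTLR, the same idea (the lexer only ever emits NCNAME, and the parser decides by looking at the token text) might look roughly like this in plain Java. This is only a sketch: the Token class, spells() and chooseAlternative() are hypothetical names, and it assumes "declare" counts as a keyword only when "ns-prefix" follows, as in the rule above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not ANTLR-generated code.
class ContextualKeywordSketch {
    /** Minimal token: a type name plus the matched text. */
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    /** True if the token is an NCNAME whose text spells the given keyword. */
    static boolean spells(Token t, String keyword) {
        return t.type.equals("NCNAME") && t.text.equals(keyword);
    }

    /**
     * Chooses an alternative of a simplified data_term at position pos.
     * Assumption: "declare" is a keyword only when "ns-prefix" follows.
     */
    static String chooseAlternative(List<Token> tokens, int pos) {
        boolean hasNext = pos + 1 < tokens.size();
        if (spells(tokens.get(pos), "declare")
                && hasNext && spells(tokens.get(pos + 1), "ns-prefix")) {
            return "declaration";
        }
        return "name";
    }

    public static void main(String[] args) {
        List<Token> decl = Arrays.asList(
                new Token("NCNAME", "declare"), new Token("NCNAME", "ns-prefix"));
        List<Token> name = Arrays.asList(new Token("NCNAME", "declare"));
        System.out.println(chooseAlternative(decl, 0));
        System.out.println(chooseAlternative(name, 0));
    }
}
```

The first call picks the declaration alternative, the second treats a lone "declare" as an ordinary name. The cost is that every keyword test becomes a string comparison in the parser.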
Martin