[antlr-interest] languages without reserved words

Martin Probst mail at martin-probst.com
Wed Mar 8 04:47:01 PST 2006


> > The problem is where to check for NCNAMEs and where not to. You
> > certainly have some parts in your grammar where you expect NCNAMEs and
> > some parts where you have to test for the operators. The knowledge where
> > that is appropriate and where not is not available to ANTLR.
>
> To be honest, I don't follow you here. The check for NCNAMEs is correct in 
> every case, and I do expect NCNAMEs in all those places. 

Sorry, my wording was a little bad. What I meant is "when to interpret a
character sequence that is both a valid NCNAME and a valid keyword as an
NCNAME, and when to interpret it as a keyword".

> I guess you also meant s/operators/keywords/? Then yes, this knowledge is 
> available, since the grammar contains e.g. (simplified)
> 
> data_term : "declare" ("ns-prefix" NCNAME ASSIGN STRING)* LP (data_term)* RP
>           | (...) => NCNAME (AT NCNAME)? (COLON NCNAME) LP (data_term)* RP
>           | NCNAME
>           ;

So, at the start of data_term, "declare" is a keyword, and the NCNAME of
the other alternatives cannot be "declare". Directly after "declare"
"ns-prefix", however, the input "declare" is legal and is an NCNAME. I
think we'll end up with the conclusion we already had: keyword-free
grammars are messy, and the distinction between parser and lexer doesn't
really hold for them.

> The big problem with this is that guessing won't pick it up and will thus 
> fail. In the above grammar I need the guessing in the second line (left 
> empty) to see if there is at least an LP coming up.

I think the only way to keep guessing working with such a language in
ANTLR 2 is to have a stateful lexer.
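
A minimal sketch of the stateful-lexer idea, in plain Java rather than
against ANTLR 2's lexer classes. All names here (StatefulLexerSketch,
classifyWord, typesOf) are made up for illustration, and the rule "a word
is a keyword depending on the previous token" is an assumption that only
fits the simplified data_term grammar quoted above:

```java
import java.util.ArrayList;
import java.util.List;

class StatefulLexerSketch {
    enum Type { DECLARE, NS_PREFIX, NCNAME, ASSIGN, STRING, LP, RP, AT, COLON }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // The lexer state: type of the previously emitted token (null at start).
    private Type previous = null;

    // Assumed rule: "declare" is a keyword unless an NCNAME is required next
    // (after "ns-prefix", AT or COLON); "ns-prefix" is a keyword only right
    // after the "declare" keyword or after a STRING.
    private Type classifyWord(String word) {
        if (word.equals("declare") && previous != Type.NS_PREFIX
                && previous != Type.AT && previous != Type.COLON) {
            return Type.DECLARE;
        }
        if (word.equals("ns-prefix")
                && (previous == Type.DECLARE || previous == Type.STRING)) {
            return Type.NS_PREFIX;
        }
        return Type.NCNAME;
    }

    // Simplified name characters; real NCNAMEs are stricter than this.
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.';
    }

    private static Type punctuation(char c) {
        switch (c) {
            case '(': return Type.LP;
            case ')': return Type.RP;
            case '=': return Type.ASSIGN;
            case '@': return Type.AT;
            case ':': return Type.COLON;
            default: throw new IllegalArgumentException("unexpected character: " + c);
        }
    }

    List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<Token>();
        previous = null;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            Token t;
            if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                t = new Token(Type.STRING, input.substring(i + 1, end));
                i = end + 1;
            } else if (isNameChar(c)) {
                int start = i;
                while (i < input.length() && isNameChar(input.charAt(i))) i++;
                String word = input.substring(start, i);
                t = new Token(classifyWord(word), word);
            } else {
                t = new Token(punctuation(c), String.valueOf(c));
                i++;
            }
            previous = t.type; // update the state after every token
            out.add(t);
        }
        return out;
    }

    // Convenience helper: the token types of an input, separated by spaces.
    static String typesOf(String input) {
        StringBuilder sb = new StringBuilder();
        for (Token t : new StatefulLexerSketch().tokenize(input)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.type);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second "declare" follows "ns-prefix", so it comes out as NCNAME.
        System.out.println(typesOf("declare ns-prefix declare = \"urn:x\" ( foo )"));
    }
}
```

Because the decision is made while the token is created, it does not
depend on parser actions, which are not executed while guessing. Whether
the previous token is enough state for the full grammar is a separate
question.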

> But thanks for your help anyway!
> 
> Actually, I just had an idea: what about switching testLiterals off completely 
> in the lexer and (in the generated parser code) in every place where there's 
> an 
>   LA(1)==SOME_KEYWORD 
> replace the code with 
>   LA(1)==NCNAME && testLiteralsTable(LA(1))==SOME_KEYWORD
> ?
> That would work, no? And it could be done with a little script?

You could also write:

data_term: { LT(1).getText().equals("declare") }? NCNAME "ns-prefix" ...
	| NCNAME
	;
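
To show what such a text check amounts to at parse time, here is a rough
hand-written sketch in plain Java. It is not ANTLR-generated code: the
class PredicateParserSketch, the helpers word/sym/parse and the result
format are invented for illustration, and "ns-prefix" is also checked by
text here, on the assumption that the lexer emits every word as NCNAME:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PredicateParserSketch {
    enum Type { NCNAME, ASSIGN, STRING, LP, RP }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    static Token word(String text) { return new Token(Type.NCNAME, text); }
    static Token sym(Type type) { return new Token(type, ""); }

    private final List<Token> tokens;
    private int pos = 0;

    PredicateParserSketch(List<Token> tokens) { this.tokens = tokens; }

    // The next unconsumed token, or null at end of input.
    private Token lookahead() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // The "semantic predicate": next token is an NCNAME with this exact text.
    private boolean nextIsWord(String text) {
        Token t = lookahead();
        return t != null && t.type == Type.NCNAME && t.text.equals(text);
    }

    private Token match(Type type) {
        Token t = lookahead();
        if (t == null || t.type != type) {
            throw new IllegalStateException("expected " + type + " at token " + pos);
        }
        pos++;
        return t;
    }

    // data_term : {declare}? NCNAME ({ns-prefix}? NCNAME NCNAME ASSIGN STRING)*
    //             LP (data_term)* RP
    //           | NCNAME
    String dataTerm() {
        if (!nextIsWord("declare")) {
            return match(Type.NCNAME).text;       // plain name
        }
        match(Type.NCNAME);                       // the word "declare"
        List<String> prefixes = new ArrayList<String>();
        while (nextIsWord("ns-prefix")) {
            match(Type.NCNAME);                   // the word "ns-prefix"
            prefixes.add(match(Type.NCNAME).text); // any name, even "declare"
            match(Type.ASSIGN);
            match(Type.STRING);
        }
        match(Type.LP);
        List<String> children = new ArrayList<String>();
        while (lookahead() != null && lookahead().type != Type.RP) {
            children.add(dataTerm());
        }
        match(Type.RP);
        return "decl(" + String.join(",", prefixes) + ";"
                + String.join(",", children) + ")";
    }

    static String parse(Token... input) {
        return new PredicateParserSketch(Arrays.asList(input)).dataTerm();
    }

    public static void main(String[] args) {
        // declare ns-prefix declare = "urn:x" ( a b )
        System.out.println(parse(word("declare"), word("ns-prefix"), word("declare"),
                sym(Type.ASSIGN), new Token(Type.STRING, "urn:x"),
                sym(Type.LP), word("a"), word("b"), sym(Type.RP)));
    }
}
```

Note that with this gating a lone "declare" at the start of data_term
always takes the first alternative, so it can never be a plain name
there.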

Martin


