[antlr-interest] languages without reserved words

Michael Brade brade at informatik.uni-muenchen.de
Wed Mar 8 03:42:45 PST 2006


On Wednesday 08 March 2006 11:33, Martin Probst wrote:
> > Not quite. There's a line "if (LA(1)==NCNAME)" in the code "to know if an
> > NCNAME comes up". The trick I would need is simply a
> > lexer.testLiterals=false; call right before it. That's all. I could
> > modify the generated code by hand, but that's something I'd rather avoid
> > for now.
>
> The problem is where to check for NCNAMEs and where not to. You
> certainly have some parts in your grammar where you expect NCNAMEs and
> some parts where you have to test for the operators. The knowledge where
> that is appropriate and where not is not available to ANTLR.
To be honest, I don't follow you here. The check for NCNAMEs is correct in 
every case, and I do expect NCNAMEs in all those places. 

I guess you also meant s/operators/keywords/? Then yes, this knowledge is 
available, since the grammar contains e.g. (simplified)

data_term : "declare" ("ns-prefix" NCNAME ASSIGN STRING)* LP (data_term)* RP
          | (...) => NCNAME (AT NCNAME)? (COLON NCNAME) LP (data_term)* RP
          | NCNAME
          ;

[LP = '(', RP = ')', guessing left empty for simplicity, see below]

and no NCNAME may be tested against the keywords (literals), whereas
"declare" etc. must be. The generated code could be modified to switch
testLiterals on and off in each and every place where NCNAME is used.
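
To make the on/off behaviour concrete, here is a minimal sketch in Python of
what a literals test does to a single word. It is a toy model, not ANTLR code:
the function classify, the LITERALS table and the token names in it are made
up for illustration.

```python
# Toy model of a lexer literals table (hypothetical names, not ANTLR's API).
LITERALS = {
    "declare": "LITERAL_declare",
    "ns-prefix": "LITERAL_ns_prefix",
}


def classify(word, test_literals):
    """Return the token type for a word that matched the NCNAME rule.

    With test_literals switched on, a word found in the literals table is
    turned into its keyword token; with it switched off, every word stays
    an NCNAME.
    """
    if test_literals and word in LITERALS:
        return LITERALS[word]
    return "NCNAME"


print(classify("declare", True))
print(classify("declare", False))
```

The same input word "declare" comes out as a keyword token or as a plain
NCNAME depending only on the flag, which is why the flag would have to be
right at every place where the grammar mentions NCNAME.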

> If it's really exactly one place, then this probably means you're just
> referring to it from exactly one place, i.e. the "identifier" rule is
> only accessed from one point. You can then switch off literal testing in
> the calling rule before the branching decision is made, e.g.
>
> fooRule:
>   BAR BAZ { lexer.testLiterals = false; } identifier;
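The big problem with this is that guessing won't pick it up: actions are not
executed while a syntactic predicate is being evaluated, so the predicate
still sees keyword tokens and fails. In the grammar above I need the
predicate in the second alternative (left empty there) to see if there is at
least an LP coming up.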
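
A rough Python sketch of why the embedded action does not help while guessing.
This is a toy model under the assumption that actions are skipped whenever a
guessing counter is non-zero; the class ToyParser and all of its method names
are hypothetical, not generated ANTLR code.

```python
class ToyParser:
    """Toy stand-in for a parser with a guessing counter (hypothetical)."""

    def __init__(self):
        self.guessing = 0          # > 0 while a syntactic predicate is evaluated
        self.test_literals = True  # stands in for lexer.testLiterals

    def action_disable_literals(self):
        # Embedded actions are only executed when the parser is not guessing.
        if self.guessing == 0:
            self.test_literals = False

    def foo_rule(self):
        # ... BAR BAZ would be matched here ...
        self.action_disable_literals()
        # Report which setting the following identifier would be lexed with.
        return self.test_literals

    def predicate_then_rule(self):
        # First pass: the rule is tried speculatively inside a predicate.
        self.guessing += 1
        seen_while_guessing = self.foo_rule()
        self.guessing -= 1
        # Second pass: the rule is run for real, so the action fires.
        seen_for_real = self.foo_rule()
        return seen_while_guessing, seen_for_real


print(ToyParser().predicate_then_rule())
```

During the speculative pass the flag is still on, so a word like "declare"
would come back as a keyword and the predicate would not see an NCNAME there.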

But thanks for your help anyway!

Actually, I just had an idea: what about switching testLiterals off
completely in the lexer and then, in the generated parser code, replacing
every
  LA(1)==SOME_KEYWORD 
with 
  LA(1)==NCNAME && testLiteralsTable(LA(1))==SOME_KEYWORD
?
That would work, no? And it could be done with a little script?
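
A minimal sketch of such a script in Python, assuming the keyword comparisons
appear in the generated source literally as LA(k)==TOKEN_NAME. The function
name rewrite_keyword_tests and the token name LITERAL_declare are illustrative
assumptions, and the replacement text simply mirrors the expression proposed
above; whether testLiteralsTable can be called like that from the parser would
still need checking against the real generated code.

```python
import re


def rewrite_keyword_tests(source, keyword_tokens, ident_token="NCNAME"):
    """Rewrite LA(k)==KEYWORD tests into NCNAME plus literals-table checks.

    source         -- text of the generated parser
    keyword_tokens -- token names that stand for keywords (assumed known)
    ident_token    -- token the lexer returns once testLiterals is off
    """
    if not keyword_tokens:
        return source
    # Longest names first, so a name that is a prefix of another cannot win.
    names = "|".join(
        re.escape(t) for t in sorted(keyword_tokens, key=len, reverse=True)
    )
    pattern = re.compile(r"LA\((\d+)\)\s*==\s*(" + names + r")\b")

    def repl(match):
        k, keyword = match.group(1), match.group(2)
        # Parenthesised so the new && cannot change the meaning of a
        # larger boolean expression around the original comparison.
        return (
            f"(LA({k})=={ident_token} && "
            f"testLiteralsTable(LA({k}))=={keyword})"
        )

    return pattern.sub(repl, source)


line = "if ((LA(1)==LITERAL_declare) && (LA(2)==NCNAME)) {"
print(rewrite_keyword_tests(line, ["LITERAL_declare"]))
```

Comparisons written in another form, for example as case labels in a switch
statement, are not touched by this pattern and would need separate handling.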

Cheers,
-- 
Michael Brade;                 KDE Developer, Student of Computer Science
  |-mail: echo brade !#|tr -d "c oh"|s\e\d 's/e/\@/2;s/$/.org/;s/bra/k/2'
  °--web: http://www.kde.org/people/michaelb.html

KDE 4: Beyond Your Expectations