[antlr-interest] Literals, Predicates and Actions
Jim Idle
jimi at temporal-wave.com
Fri Mar 6 11:16:24 PST 2009
Julian Mensch wrote:
> Hi,
>
> I'm a newbie to ANTLR working on translating a long ACCENT
> grammar to ANTLR under the C target, and have a few
> questions about the use of literals to define tokens in combined
> lexer-parser grammars. I understand that literals in parser rules
> create implicit lexer rules, and I find this to be a very useful
> feature for naturalistic languages that have a set of keywords of
> notable size which can increase frequently, and include frequent
> alternatives.
>
> What I'm wondering is if I can somehow apply global predicates
>
No, you cannot do this. I strongly advise that you do not use literals
in your grammar. While at first it seems more intuitive and is perfectly
fine for simple grammars, as soon as you want to provide good error
messages, or walk a tree, you will find that they get in the way. You
won't know what T42 actually is, and it will even change names when you
add and change literals.
Before too long, it is like the scene in The Matrix where the guy says
"I don't see the code any more, I just see blonde, brunette..." Looking
at LCURLY will mean exactly the same thing to you as '{'.
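A minimal sketch of the named-token style, assuming an ANTLR 3 combined
grammar for the C target. The grammar name and rules are made up for
illustration and are not taken from the original poster's grammar:

```antlr
grammar Block;   // hypothetical grammar name

options { language = C; }

// Parser rules refer to named tokens instead of '{', '}' and ';' literals,
// so error messages and tree walkers see LCURLY rather than a generated
// T42-style name.
block     : LCURLY statement* RCURLY ;
statement : ID SEMI ;

LCURLY : '{' ;
RCURLY : '}' ;
SEMI   : ';' ;
ID     : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS     : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Because every token has a name chosen by the grammar author, adding or
removing a literal elsewhere does not rename anything.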
> Predicates for literals would also be really useful, in
> the case, for example, where you have a limited set
> of keywords that are universal to the language, but
> your ever-expanding larger set is only valid in some
> lexical circumstances. For example:
>
> @literals
> { isUniversalKeyword(GETTEXT()->chars) || inFullKeywordMode }?
>
Don't use these macros directly; use $text instead, otherwise you will be
subject to the vagaries of me changing my mind ;-)
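A rough sketch of using $text (together with $type) in a lexer action
rather than the runtime macros, assuming ANTLR 3 with the C target. The
grammar name, the KEYWORD token, the keywords.h header and the signature of
isUniversalKeyword() are all assumptions for illustration:

```antlr
grammar KeywordSketch;   // hypothetical grammar name

options { language = C; }

tokens { KEYWORD; }      // imaginary token type, assigned in the action below

// Hypothetical header declaring the user-supplied isUniversalKeyword().
@lexer::includes { #include "keywords.h" }

word : IDENT | KEYWORD ;

// isUniversalKeyword() is assumed to take the matched characters and
// return non-zero when they spell a keyword.
IDENT
    : ('a'..'z' | 'A'..'Z' | '_')+
      {
          if (isUniversalKeyword($text->chars)) { $type = KEYWORD; }
      }
    ;
```

The action only goes through $text and $type, so it does not depend on how
the underlying macros happen to be spelled in a given runtime release.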
> I know there's no such thing as the "@literals"
> construct I'm showing here, but I'm wondering if there's
> any way to duplicate the effect I'm going for with it.
>
Well, it is just re-inventing the wheel really. I understand where you
are coming from, but if you go with what the tool does now, you will
soon find it all second nature.
> Currently I'm matching all keywords as IDENT and
> using string tables, setType() and tokens with 'fragment
Use $type where you can, of course. I tend not to use the ident method; I
just use an identifier rule that allows the keywords. Which approach is
best depends on preference and circumstance, of course.
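The identifier-rule approach can be sketched roughly as follows, assuming
ANTLR 3 with the C target. The grammar name, the keywords and the decl rule
are invented for illustration:

```antlr
grammar KeywordsAsIds;   // hypothetical grammar name

options { language = C; }

// A declaration whose name may be an ordinary identifier or a keyword.
decl : KW_VAR identifier SEMI ;

// Parser rule that accepts keyword tokens wherever a plain name is allowed.
identifier : ID | KW_VAR | KW_BEGIN | KW_END ;

// Keyword rules come before ID so they win when both match the same text.
KW_VAR   : 'var' ;
KW_BEGIN : 'begin' ;
KW_END   : 'end' ;
SEMI     : ';' ;
ID       : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS       : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With this layout the lexer always produces real keyword tokens, and the
parser decides where a keyword may also serve as a name. The cost is that
every new keyword has to be added to the identifier rule as well.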
Jim