[antlr-interest] Literals, Predicates and Actions

Fri Mar 6 11:16:24 PST 2009

Julian Mensch wrote:
>   Hi,
>
>   I'm a newbie to ANTLR working on translating a long ACCENT
> grammar to ANTLR under the C target, and have a few
> questions about the use of literals to define tokens in combined
> lexer-parser grammars. I understand that literals in parser rules
> create implicit lexer rules, and I find this to be a very useful
> feature for naturalistic languages that have a set of keywords of
> notable size which can increase frequently, and include frequent
> alternatives.
>
>   What I'm wondering is if I can somehow apply global predicates
>   
No, you cannot do this. I strongly advise against using literals in
your grammar. They seem more intuitive at first and are perfectly fine
for simple grammars, but as soon as you want to provide good error
messages or walk a tree, you will find that they get in the way. You
won't know what T42 actually is, and its generated name will even
change when you add or change literals.

Before too long, it is like the scene in The Matrix where the guy says
"I don't see the code any more, I just see blond, brunette..." Looking
at LCURLY will mean exactly the same thing to you as '{'.
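
To make the contrast concrete, here is a minimal sketch in ANTLR 3
syntax. The grammar name (Blocks) and the rule names (block, stat) are
made up for illustration and do not come from the original grammar. Each
punctuation token gets an explicit lexer rule, so the generated token
type is called LCURLY rather than something auto-numbered like T42:

```antlr
grammar Blocks;   // hypothetical grammar name

options { language = C; }

// Parser rules refer to named tokens instead of the literals '{' and '}'.
block : LCURLY stat* RCURLY ;
stat  : IDENT SEMI ;

// Explicit lexer rules: the generated token types keep these names,
// so error messages and tree walkers can refer to LCURLY directly.
LCURLY : '{' ;
RCURLY : '}' ;
SEMI   : ';' ;
IDENT  : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS     : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Adding another named token later should leave the existing names as they
are, which is the main practical gain over implicit literals.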

>   Predicates for literals would also be really useful, in
> the case, for example, where you have a limited set
> of keywords that are universal to the language, but
> your ever-expanding larger set is only valid in some
> lexical circumstances. For example:
>
> @literals
>   { isUniversalKeyword(GETTEXT()->chars) || inFullKeywordMode }?
>   
Don't use these macros directly; use $text, otherwise you will be
subject to the vagaries of me changing my mind ;-)
>   I know there's no such thing as the "@literals"
> construct I'm showing here, but I'm wondering if there's
> any way to duplicate the effect I'm going for with it.
>   
Well, it is just re-inventing the wheel really. I understand where you 
are coming from, but if you go with what the tool does now, you will 
soon find it all second nature.
> Currently I'm matching all keywords as IDENT and
> using string tables, setType() and tokens with 'fragment
Use $type where you can, of course. I tend not to use the IDENT method;
I just use an identifier rule that allows the keywords. Which approach
is best depends on preference and circumstance.
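
As a rough sketch of the identifier-rule approach, again in ANTLR 3
syntax: the keyword names (KW_FROB, KW_TWIDDLE) and the parser rule name
(identifier) are hypothetical placeholders, not taken from the original
grammar.

```antlr
// Keywords are real tokens with their own lexer rules. They are listed
// before IDENT so that they are preferred when both rules match.
KW_FROB    : 'frob' ;
KW_TWIDDLE : 'twiddle' ;
IDENT      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;

// The parser, not the lexer, decides where a keyword may double as a
// plain name: use this rule wherever an identifier is legal.
identifier : IDENT | KW_FROB | KW_TWIDDLE ;
```

With the IDENT method instead, the retyping would live in the action of
the IDENT lexer rule, along the lines of
{ $type = lookupKeyword($text->chars); }, where lookupKeyword() is an
assumed helper standing in for your own string-table lookup that returns
a token type.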

Jim