[antlr-interest] Tokens and literals: how to avoid conflicts?

Tue Jul 15 11:11:24 PDT 2008

On Tue, 2008-07-15 at 17:09 +0200, Gioele Barabucci wrote:

> Jim Idle wrote:
> > I really, once again, cannot stress too much the fact that new users
> > should not use the inline 'quote' rules in the parser. They really send
> > you down the wrong streets until you are completely familiar with the
> > parser/lexer process. I look at your grammar and see the obvious
> > problems, but I just don't see how new users would.
> Could you please point me to guides or tutorials about the ANTLR lexer and
> the correct "style" I should use to write token rules? I could not find
> anything on the net.

You are probably best off buying the book, but also read the wiki articles
that come up if you search for 'lexer', and find an example grammar that
does something close to what you want; these will all help you.

> 
> > ID    : 'ID' ;
> > IDENT : ('a'..'z' | 'A'..'Z')+ ;
> > HASH  : '#'   // Many things prefix with HASH, differentiate them here
> >                (  (FIX)=>FIX  { $type = FIX; }
> >                 | (IMP)=>IMP  { $type = IMP; }
> >                 | // Neither keyword: sometimes HASH is just HASH and not pounds
> >                )
> >             ;
> > 
> > Now, in the parser use the token names:
> > 
> > stmt: ID S idName S (IMP|FIX) EOF ;
> > idName : HASH IDENT;
> 
> Thank you for this solution: I'll use it in the many similar cases I have
> in my grammar.
> 
> Sadly, this solution only solves the problem where there is a specific
> character that can be used to discriminate. What about this example, where
> the text of a keyword can also appear in other rules:
> 
> stmt: ID S simple_name S ('#IMP'|'#FIX') EOF;
> 
> simple_name: NAME;
> ID: 'id';
> NAME: ('a'..'z')+ ;
> S: (' '|'\n')+ ;
> 
> This grammar will recognise 'id ix #FIX' but will fail on 'id id #FIX' with
> the usual MismatchedTokenException. The keyword 'id' cannot be recognized
> as a NAME token.

simple_name
    : NAME
    | ID                // keyword arriving as an ID token
        -> NAME[$ID]    // only if you build a tree: retype it as NAME
    ;
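To see why accepting the keyword token in simple_name fixes 'id id #FIX', here
is a minimal hand-written recursive-descent sketch in Java. It is not
ANTLR-generated code and does not use the ANTLR runtime; the class name, the
Type enum and the accepts() helper are hypothetical. It assumes the lexer has
already produced a list of token types, with the '#IMP'/'#FIX' literals
modelled as single IMP/FIX tokens, and it only recognises input (no tree
rewrite).

```java
import java.util.List;

// Hypothetical hand-written sketch of the parser rules in this thread (not ANTLR output).
public class SimpleNameSketch {
    // Token types as the lexer would emit them; '#IMP'/'#FIX' arrive as IMP/FIX.
    enum Type { ID, NAME, S, FIX, IMP, EOF }

    private final List<Type> tokens;
    private int pos = 0;

    SimpleNameSketch(List<Type> tokens) {
        this.tokens = tokens;
    }

    // One-token lookahead; running off the end is treated as EOF.
    private Type la() {
        return pos < tokens.size() ? tokens.get(pos) : Type.EOF;
    }

    private void match(Type expected) {
        if (la() != expected) {
            throw new IllegalStateException("mismatched token: expected " + expected + ", got " + la());
        }
        pos++;
    }

    // stmt : ID S simple_name S (IMP | FIX) EOF ;
    void stmt() {
        match(Type.ID);
        match(Type.S);
        simpleName();
        match(Type.S);
        if (la() == Type.IMP || la() == Type.FIX) {
            pos++;
        } else {
            throw new IllegalStateException("mismatched token: expected IMP or FIX, got " + la());
        }
        match(Type.EOF);
    }

    // simple_name : NAME | ID ;  the keyword token ID is accepted as a name here
    void simpleName() {
        if (la() == Type.NAME || la() == Type.ID) {
            pos++;
        } else {
            throw new IllegalStateException("mismatched token: expected NAME or ID, got " + la());
        }
    }

    // Returns true if the whole token list matches stmt.
    static boolean accepts(List<Type> tokens) {
        try {
            new SimpleNameSketch(tokens).stmt();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "id id #FIX": the second 'id' arrives as ID, which simple_name now accepts.
        System.out.println(accepts(List.of(Type.ID, Type.S, Type.ID, Type.S, Type.FIX, Type.EOF)));
        // "id #FIX #FIX": FIX is not a name, so this is still rejected.
        System.out.println(accepts(List.of(Type.ID, Type.S, Type.FIX, Type.S, Type.FIX, Type.EOF)));
    }
}
```

Dropping the `la() == Type.ID` test from simpleName() reproduces the original
failure: the token after the first S is ID, so the sketch reports a mismatched
token, much as the generated parser raised MismatchedTokenException. More
keywords are handled the same way, by adding each keyword's token type to that
check.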

Sometimes you can use hoisted predicates (see the example in the
downloadable example grammars). Sometimes you might need a predicate in
a rule that calls it, to say "definitely see this as a variable", such
as:

... something ((simple_name)=>simple_name)? ID ID ID

> 
> Is there a way to tell ANTLR "look for the characters 'id' only when in the
> ID token, in all the other cases classify it as NAME (or whatever fits
> it)"?

Only if you can determine this fact from the lexer context. Remember
that the lexer runs first and turns the input into tokens; only then
does the parser run, so it cannot change the token type the lexer has
already chosen.
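A small Java sketch can make the lexer-first point concrete. It is a
hypothetical hand-written lexer, not ANTLR's generated code, for the grammar
in the question (ID: 'id'; NAME: ('a'..'z')+; S), with the '#' handling of the
HASH rule quoted above folded in. It approximates the lexer's choice as
"longest match, ties go to the keyword" and never consults the parser, so the
word id always becomes ID wherever it appears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the small grammar in this thread (not ANTLR output).
public class KeywordLexerSketch {
    enum Type { ID, NAME, S, HASH, FIX, IMP }

    // Turns the whole input into token types; it never asks the parser anything.
    static List<Type> lex(String in) {
        List<Type> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\n') {
                // S : (' '|'\n')+ ;
                while (i < in.length() && (in.charAt(i) == ' ' || in.charAt(i) == '\n')) {
                    i++;
                }
                out.add(Type.S);
            } else if (c >= 'a' && c <= 'z') {
                // Consume the longest run of letters, as NAME : ('a'..'z')+ would.
                int start = i;
                while (i < in.length() && in.charAt(i) >= 'a' && in.charAt(i) <= 'z') {
                    i++;
                }
                // ID : 'id' matches the same text, so the tie goes to the keyword;
                // longer words such as "ix" or "idx" stay NAME.
                out.add(in.substring(start, i).equals("id") ? Type.ID : Type.NAME);
            } else if (c == '#') {
                // Like the HASH rule: look past '#' and retype the token if FIX or IMP follows.
                i++;
                if (in.startsWith("FIX", i)) {
                    i += 3;
                    out.add(Type.FIX);
                } else if (in.startsWith("IMP", i)) {
                    i += 3;
                    out.add(Type.IMP);
                } else {
                    out.add(Type.HASH);
                }
            } else {
                throw new IllegalArgumentException("no token rule matches '" + c + "' at offset " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("id ix #FIX")); // [ID, S, NAME, S, FIX]
        System.out.println(lex("id id #FIX")); // [ID, S, ID, S, FIX]
    }
}
```

The second id is already an ID token before the parser sees anything, which
is why the fix has to go in the parser (simple_name accepting ID) rather than
in the lexer.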

> 
> This happens quite often in my grammar (obviously this is just a simple
> test-case for my problems): I have many keywords that lose their special
> meaning once they are not in a certain position.
> 

Just list them all as alternatives in your simple_name rule (tip: don't
add lots at once; add a few, build the grammar, resolve any ambiguities
it causes, add a few more, etc.). Build everything in small steps so you
can identify errors as you introduce them.

Jim