[antlr-interest] Lexer context problem

Harald M. Müller harald_m_mueller at gmx.de
Sat Dec 1 01:27:44 PST 2007


Hi Keith -

you wrote:
> 
> Could someone explain what I'm doing wrong here

Forgive me for being blunt, but what you are doing wrong is using that
predicate at all. What you are doing is a BIG SMELL: controlling the sequencing
of your LEXER with knowledge about the order of symbols at the PARSER level.
Lexing (tokenizing) is based on the assumption that you do it WITHOUT
knowing anything about how the tokens (words) are used in the grammar. Period.

(There are special cases where this does not work, and some have been cited
here; some way of occasionally (rarely!) passing information to the
lexer at a defined point would be helpful. But typical languages have only a
handful of gimmicks of that sort: C#, AFAIK, has 2 [>> and keyword
identifiers]; Pascal has 1 [the .. from your example].)

OK, to get into the details: the problem with your grammar is that

   ..

is a valid name (I'm not sure you want this, but maybe it has to be like
that; ...... is also a valid name). That's the reason why "NAME jumps in".
There are at least four ways out; in the first three, throw away that
brackets flag:
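A quick way to confirm this outside ANTLR is to transcribe the NAME rule into a java.util.regex pattern. This is a minimal sketch, assuming Java because the grammar's actions are Java; the class name NameRuleCheck is hypothetical, and it only checks which strings the rule accepts, without modelling how ANTLR predicts a token.

```java
import java.util.regex.Pattern;

// Hypothetical transcription of Keith's NAME rule:
//   ('A'..'Z'|'a'..'z'|'.'|'_') ('0'..'9'|'A'..'Z'|'a'..'z'|'.'|'_')+
// i.e. one leading char, then ONE OR MORE further chars (minimum length 2).
public class NameRuleCheck {
    static final Pattern NAME = Pattern.compile("[A-Za-z._][0-9A-Za-z._]+");

    // True if the whole string would be accepted by the NAME rule.
    public static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Abc", "..", "......", "..2", "1", "["}) {
            System.out.println(s + " -> " + isName(s));
        }
    }
}
```

It reports true for "Abc", "..", "......" and also "..2", and false for "1" and "[". So inside "Abc[1..2]" the text "..2" is a NAME candidate longer than DOTDOT's "..", which may be why only the inputs with a digit right after the dots fail.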

a) Define that sequences of periods are not valid names (e.g. by removing
the dot from the first parenthesized group of NAME).
b) Define that .. is not a name by moving DOTDOT *before* rule NAME.
c) If .. must be a name in some places, use my "lexing parser" idea and
define

     DOTDOT : '..' ;
     NAME: ...as you have it...

In the parser, say

     name : NAME | DOTDOT ;

and now continue e.g. with

     tuple : name OSB INT DOTDOT INT CSB ;

d) If you are still not happy, first go to whoever defined your language,
ask what the heck this '..' name is supposed to be and mean, and
convince him or her that you want to do a), b) or c). If that does not
work, use that brackets flag WITH => (a gated predicate):

NAME : {!brackets}? => ...what you have...
     | '..' { $type = DOTDOT; }
     ;

Then expect to spend a day testing, and to write a long comment on why you did
not use a), b) or c).
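Option a) can be sketched outside ANTLR as a hand-written lexer. This is a minimal, hypothetical Java sketch (the class DotDotLexer and its token strings are illustrative, not ANTLR-generated code): NAME may no longer start with '.', so ".." always becomes DOTDOT, and the lexer consults no parser state and no brackets flag. Like the original grammar, it has no whitespace rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer sketching option a): NAME may no longer start
// with '.', so ".." is always DOTDOT. It keeps no parser state (no brackets flag).
public class DotDotLexer {
    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isNameChar(char c) {
        return isLetter(c) || isDigit(c) || c == '.' || c == '_';
    }

    public static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = in.length();
        while (i < n) {
            char c = in.charAt(i);
            if (isLetter(c) || c == '_') {
                // NAME: letter or '_' first (the '.' was removed from this set),
                // then one or more digits, letters, '.' or '_' as in the original rule.
                int start = i++;
                while (i < n && isNameChar(in.charAt(i))) {
                    i++;
                }
                if (i - start < 2) {
                    throw new IllegalArgumentException("NAME needs at least 2 chars at " + start);
                }
                tokens.add("NAME(" + in.substring(start, i) + ")");
            } else if (isDigit(c)) {
                // INT: a lone '0', or a nonzero digit followed by any digits.
                int start = i++;
                if (c != '0') {
                    while (i < n && isDigit(in.charAt(i))) {
                        i++;
                    }
                }
                tokens.add("INT(" + in.substring(start, i) + ")");
            } else if (in.startsWith("..", i)) {
                tokens.add("DOTDOT");
                i += 2;
            } else if (c == '[') {
                tokens.add("OSB");
                i++;
            } else if (c == ']') {
                tokens.add("CSB");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Abc[1..2]", "Abc[1..]", "Abc[..1]"}) {
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

Under these assumptions, "Abc[1..2]" comes out as NAME(Abc), OSB, INT(1), DOTDOT, INT(2), CSB, and the other two inputs from Keith's mail lex the same way minus the missing INT. One trade-off of option a): a name such as "x..y" is still a single NAME, because dots remain legal after the first character.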

Hope this helps!

Regards
Harald M.


> 
> Giving the input "Abc[1..2]" to the following
> -----
> grammar Test;
> 
> @lexer::members {
> 	boolean brackets;
> }
> 
> tuple	:	NAME OSB INT DOTDOT INT CSB;
> 
> NAME	
> 	:	{!brackets}?  
> 		('A'..'Z'|'a'..'z'|'.'|'_')('0'..'9'|'A'..'Z'|'a'..'z'|'.'|'_')+
> 	;
> 
> OSB
> 	:	'[' {brackets=true;}
> 	;
> 
> DOTDOT
> 	:	{brackets}? '..'
> 	;
> 
> CSB
> 	:	']' {brackets=false;}
> 	;
> 
> INT
> 	:	'0' | ('1'..'9')('0'..'9')*
> 	;
> -----
> I always get an error "line 1:6 rule NAME failed predicate: {!brackets}?"
> 
> NAME seems to be jumping in when (as far as my understanding 
> goes, which isn't very far) it shouldn't. The Open Square 
> Brackets (OSB) sets 'brackets' true so NAME should not be 
> recognised. Strangely, if I reduce 'tuple' to:
> 
> 	tuple	:	NAME OSB INT DOTDOT CSB ; // with the input "Abc[1..]" works
> 
> but..
> 
> 	tuple	:	NAME OSB DOTDOT INT CSB ; // with the input "Abc[..1]" fails??
> 
> I've just bought the book and scanned that for an answer but 
> can't find anything.
> 
> Thanks in advance
> Keith
> 


