[antlr-interest] lexer question from newbie

Stuart smcg2297 at frii.com
Thu Dec 22 16:40:29 PST 2005


I am trying to use ANTLR for the first time.  I have used 
yacc/lex only a couple of times, and this is my first time 
using an LL parser, so I am basically clueless... :-)

Below is my attempt at a lexer for a simple LaTeX-like
language.  It was fine until I added the ECHR rule.  
Now, I get a warning:
  latex.g: warning:lexical nondeterminism between rules CMD and TEXT upon
  latex.g:     k==1:'\\'
  latex.g:     k==2:'A'..'Z','a'..'z'
 
I know I do not understand LL parsing, but I thought
that with k=2 the lexer could decide between an ECHR
and a CMD when it sees a '\\', and that any [a-zA-Z]
characters that follow would be either TCHRs (ending up
in a TEXT token) or part of a CMD, depending on whether
a CMD or a TEXT/TCHR/ECHR was already being matched.
So I can't see where the ambiguity is coming from.
How can "\\A" be anything but a CMD?
Obviously I am missing something (probably very obvious).
Can someone enlighten me and, more importantly, give
me some hints on how to fix this?

//------------------------------------
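class LatexLexer extends Lexer;
options {
    k = 2;                             // two characters of lookahead
    charVocabulary='\u0000'..'\u007F'; // ascii
}
LCB : '{' ;
RCB : '}' ;
// A command: backslash, one or more letters, optional trailing '*'.
CMD : '\\' ( 'a'..'z' | 'A'..'Z' )+ ('*')? ;
// An escaped character: backslash followed by one of ' ' & $ % { }.
protected
ECHR : '\\' (' ' | '&' | '$' | '%' | '{' | '}') ;
// A plain text character: anything except backslash, braces, brackets, newline.
protected
TCHR : (~( '\\' | '{' | '}' | '[' | ']' | '\n')) ;
// A run of plain characters, escaped characters, and newlines.
TEXT : (TCHR | ECHR | '\n' {$nl})+ ;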
//-----------------------------------
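
One possible explanation can be sketched in Python (this is not ANTLR 
output).  It assumes that ANTLR 2 compares lookahead one depth at a 
time ("linear approximate lookahead") rather than comparing whole 
two-character sequences.  The names cmd_prefixes, text_prefixes and 
per_depth are made up for this illustration, and what may follow a 
one-character TEXT is ignored.

```python
import string

LETTERS = set(string.ascii_letters)
ASCII = {chr(c) for c in range(0x80)}    # charVocabulary '\u0000'..'\u007F'
ESCAPED = set(" &$%{}")                  # second character of ECHR
TCHR = ASCII - set("\\{}[]\n")           # the protected TCHR rule

# Two-character prefixes that CMD can really start with.
cmd_prefixes = {("\\", c) for c in LETTERS}

# Two-character prefixes that TEXT can really start with.
text_first = TCHR | {"\n"}               # one-character alternatives of TEXT
text_prefixes = {("\\", c) for c in ESCAPED}       # ECHR comes first
text_prefixes |= {(a, b) for a in text_first       # TCHR or '\n' comes first,
                  for b in text_first | {"\\"}}    # then TEXT continues

def per_depth(prefixes):
    """Collapse prefixes into one character set per lookahead depth."""
    return [{p[0] for p in prefixes}, {p[1] for p in prefixes}]

# Exact comparison: whole two-character sequences.
exact_conflict = cmd_prefixes & text_prefixes

# Assumed approximation: compare depth 1 and depth 2 independently.
approx_conflict = [c & t for c, t in
                   zip(per_depth(cmd_prefixes), per_depth(text_prefixes))]

print("exact overlap size:", len(exact_conflict))
print("depth 1 overlap:", sorted(approx_conflict[0]))
print("depth 2 overlap:", "".join(sorted(approx_conflict[1])))
```

In this model the exact sets do not overlap, so "\\A" really can only 
start a CMD.  The per-depth sets do overlap, on '\\' at depth 1 and on 
the letters at depth 2, which matches the k==1 and k==2 lines of the 
warning.  If the assumption holds, the warning would come from the 
approximation (a letter can be the second character of a TEXT that 
starts with a TCHR) and not from a real ambiguity in the grammar.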



