[antlr-interest] lexing expression ('a'..'z')+ not matching single character input

Matt Harrison matt at ebi.ac.uk
Tue Dec 12 09:10:31 PST 2006


Salute, fellow antlers.

I'm a recent convert to the world of language recognition/parsing using 
ANTLR, although I have used Perl/Python for "simple" parsing tasks for 
many, many man-months.

I am having trouble diagnosing why the (common) lexer expression 
"('a'..'z')+" is not matching any single-character input (e.g. "n") in my 
grammar. Are there any situations in which this expression should not 
match a single character in the range 'a'-'z'?

Thanks for your time.
Matt

~~~
The offending parser rule is as follows:

substituent
        :   IDENTIFIER
            (HYPHEN IDENTIFIER)*
        ;
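
To make the expectation concrete, here is a rough Python sketch of the shape this rule should accept. It uses the standard `re` module rather than ANTLR, it ignores whitespace skipping, and the names `SUBSTITUENT` and `is_substituent` are hypothetical helpers invented for illustration only.

```python
import re

# Hypothetical stand-ins for the grammar rules (not ANTLR output):
# IDENTIFIER is ('a'..'z')+ and HYPHEN is '-'.
IDENTIFIER = r"[a-z]+"
HYPHEN = r"-"

# substituent : IDENTIFIER (HYPHEN IDENTIFIER)* ;
SUBSTITUENT = re.compile(rf"{IDENTIFIER}(?:{HYPHEN}{IDENTIFIER})*")


def is_substituent(text: str) -> bool:
    """Return True if the whole string has the shape of a substituent."""
    return SUBSTITUENT.fullmatch(text) is not None
```

Under this reading, a single lowercase letter such as "n" is a valid substituent on its own, because `+` means one or more characters.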


The lexer is pretty basic:

class FooBarLexer extends Lexer;

options {
    k=3;        /*  lookahead  */
}

//~~~~~~~~~~~~~~~~~~~~  token separators & delimiters  ~~~~~~~~~~~~~~~~~~~~~~//

COLON
        options { paraphrase="a colon separator"; }
        :   ':'
        ;

COMMA
        options { paraphrase="a comma"; }
        :   ','
        ;

HYPHEN
        options { paraphrase="an internal linkage delimiter '-'"; }
        :   '-'
        ;

PIPE
        options { paraphrase="a residue substitution delimiter"; }
        :   '|'
        ;

SEMICOLON
        options { paraphrase="a residue/linkage token separator"; }
        :   ';'
        ;

LPARENTHESIS
        options { paraphrase="a linkage delimiter"; }
        :   '('
        ;

RPARENTHESIS
        options { paraphrase="a linkage delimiter"; }
        :   ')'
        ;
 

//~~~~~~~~~~~~~~~~~~~~~~~~~~~ identifiers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//

INTEGER
        options { paraphrase="a positive integer or zero"; }
        :   ('1'..'9')  ('0'..'9')*
        |   '0'
        ;

IDENTIFIER
        options { paraphrase="a residue name/identifier"; }
        :   ('a'..'z')+
        ;

//~~~~~~~~~~~~~~~~~~~~~~~  section type identifiers  ~~~~~~~~~~~~~~~~~~~~~~~~//

RES
        options { paraphrase="a RES (residue) section start identifier"; }
        :   "RES"
        ;

LIN
        options { paraphrase="a LIN (linkage) section start identifier"; }
        :   "LIN"
        ;

PRO
        options { paraphrase="a PRO (heterogeneity due to uncertainty) section start identifier"; }
        :   "PRO"
        ;

REP
        options { paraphrase="a REP (repeat) section start identifier"; }
        :   "REP"
        ;

STA
        options { paraphrase="a STA (heterogeneity due to a statistical distribution) section start identifier"; }
        :   "STA"
        ;

ISO
        options { paraphrase="an ISO (isotope) section start identifier"; }
        :   "ISO"
        ;

AGL
        options { paraphrase="an AGL (aglycon) section start identifier"; }
        :   "AGL"
        ;

    

CR
        : ( '\r' '\n' )
        | '\n'                                  {   newline(); $setType( Token.SKIP );  }
        ;

WS
        : (' '| '\t' )                          {   $setType( Token.SKIP );  }
        ;
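
For comparison, the token stream I would expect from these rules can be approximated with a small regex-based tokenizer in Python. This is only a sketch and not the code ANTLR generates: `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical names, and as a simplification every line ending and blank is skipped.

```python
import re

# Hypothetical token table that loosely mirrors the lexer rules above.
SECTION_NAMES = ("RES", "LIN", "PRO", "REP", "STA", "ISO", "AGL")
TOKEN_SPEC = [
    ("COLON", r":"),
    ("COMMA", r","),
    ("HYPHEN", r"-"),
    ("PIPE", r"\|"),
    ("SEMICOLON", r";"),
    ("LPARENTHESIS", r"\("),
    ("RPARENTHESIS", r"\)"),
    ("INTEGER", r"[1-9][0-9]*|0"),
    ("IDENTIFIER", r"[a-z]+"),
] + [(name, name) for name in SECTION_NAMES] + [
    # Simplification: all line endings, spaces and tabs are discarded.
    ("SKIP", r"\r\n|\n|[ \t]"),
]

# One alternation with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, token_text) pairs, dropping skipped input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("n"))         # [('IDENTIFIER', 'n')]
print(tokenize("n-acetyl"))  # [('IDENTIFIER', 'n'), ('HYPHEN', '-'), ('IDENTIFIER', 'acetyl')]
```

In this approximation the single character "n" comes out as one IDENTIFIER token, which is the behaviour I expected from the ANTLR lexer as well.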

 


-- 
Dr Matt Harrison
BTech (Biotech) Hons PhD
Glycobiology Bioinformatician
European Bioinformatics Institute UK
http://www.ebi.ac.uk   +44 (0)1223 492533


