[antlr-interest] lexing expression ('a'..'z')+ not matching single character input

Matt Harrison matt at ebi.ac.uk
Wed Dec 13 03:03:29 PST 2006


Unfortunately, it doesn't. For some bizarre reason, ('a'..'z')+ 
stubbornly refuses to match any single alphabetic character, regardless 
of context; that is, I can call the rule 'substituent' below directly 
with a single character of input and it doesn't match, nor will it match 
if a single character 'substituent' occurs in the middle of a token stream.
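
To be clear about what I expect from the expression, here is a minimal
sketch in plain Java, using java.util.regex instead of ANTLR. The class
and method names (LowercaseRun, isLowercaseRun) are made up for
illustration, and the regex [a-z]+ is only my assumed analogue of
('a'..'z')+; it says nothing about what the generated lexer actually does.

```java
import java.util.regex.Pattern;

public class LowercaseRun {
    // Assumed regex analogue of the lexer expression ('a'..'z')+ :
    // one or more lowercase ASCII letters.
    private static final Pattern LOWERCASE_RUN = Pattern.compile("[a-z]+");

    // Returns true when the whole input is a non-empty run of 'a'..'z'.
    static boolean isLowercaseRun(String input) {
        return LOWERCASE_RUN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseRun("n"));    // single character: true
        System.out.println(isLowercaseRun("nac"));  // several characters: true
        System.out.println(isLowercaseRun(""));     // empty input: false
        System.out.println(isLowercaseRun("N"));    // upper case: false
    }
}
```

A one-or-more repetition should accept a single character, which is why
the behaviour described above surprises me.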

Perhaps a bug in ANTLR? More likely it is due to something else I am
missing through my inexperience with ANTLR, but I can't for the life
of me discern what.

cheers,
Matt Harrison

ps: "identifiers" for my particular parsing problem are only lower-case, 
and indeed, allowing upper-case ids introduces non-determinism with all 
of the constant upper-case keywords defined elsewhere in the lexer.

Vinay Veeramachaneni wrote:
> Hi,
>  
> Your grammar seems to be fine. You should consider including
> uppercase letters in identifiers too.
>  
> IDENTIFIER   options { paraphrase="a residue name/identifier"; }
>
>        :     ('a'..'z' | 'A'..'Z')+ ;
>
> This should solve the problem.
>  
> Regards,
> Vinay
>
>  
> On 12/12/06, *Matt Harrison* <matt at ebi.ac.uk> wrote:
>
>     Salute, fellow antlers.
>
>     I'm a recent convert to the world of language recognition/parsing
>     using ANTLR, although I have used Perl/Python for "simple" parsing
>     tasks for many, many man-months.
>
>     I am having trouble diagnosing why the (common) lexer expression
>     "('a'..'z')+" is not matching any single-character input (e.g. "n")
>     in my grammar. Are there any situations under which this expression
>     should not match a single character in the range 'a' - 'z'?
>
>     Thanks for your time.
>     Matt
>
>     ~~~
>     The offending parser rule is as follows:
>
>     substituent
>
>            :   IDENTIFIER
>
>                (HYPHEN IDENTIFIER)*
>
>            ;
>
>
>     The lexer is pretty basic:
>
>     class FooBarLexer extends Lexer;
>
>     options {
>
>        k=3;        /*  lookahead  */
>
>     }
>
>     //~~~~~~~~~~~~~~~~~~~~  token separators & delimiters  ~~~~~~~~~~~~~~~~~~~~~~//
>
>
>
>     COLON
>
>            options { paraphrase="a colon separator"; }
>
>            :   ':'
>
>            ;
>
>
>
>     COMMA
>
>            options { paraphrase="a comma"; }
>
>            :     ','
>
>            ;
>
>     HYPHEN
>
>            options { paraphrase="an internal linkage delimiter '-'"; }
>
>            :     '-'
>
>            ;
>
>     PIPE
>
>            options { paraphrase="a residue substitution delimiter"; }
>
>            :     '|'
>
>            ;
>
>     SEMICOLON
>
>            options { paraphrase="a residue/linkage token separator"; }
>
>            :   ';'
>
>            ;
>
>
>
>     LPARENTHESIS
>
>            options { paraphrase="a linkage delimiter"; }
>
>            :   '('
>
>            ;
>
>
>
>     RPARENTHESIS
>
>            options { paraphrase="a linkage delimiter"; }
>
>            :   ')'
>
>            ;
>
>
>     //~~~~~~~~~~~~~~~~~~~~~~~~~~~ identifiers
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
>
>     INTEGER
>
>            options { paraphrase="a positive integer or zero"; }
>
>            :     ('1'..'9')  ('0'..'9')*
>
>            |   '0'
>
>            ;
>
>
>
>     IDENTIFIER
>
>            options { paraphrase="a residue name/identifier"; }
>
>            :     ('a'..'z')+
>
>            ;
>
>     //~~~~~~~~~~~~~~~~~~~~~~~  section type identifiers  ~~~~~~~~~~~~~~~~~~~~~~~~//
>
>     RES
>
>            options { paraphrase="a RES (residue) section start identifier"; }
>
>            :   "RES"
>
>            ;
>
>
>
>     LIN
>
>            options { paraphrase="a LIN (linkage) section start identifier"; }
>
>            :   "LIN"
>
>            ;
>
>
>
>     PRO
>
>            options { paraphrase="a PRO (heterogeneity due to uncertainty) section start identifier"; }
>
>            :   "PRO"
>
>            ;
>
>
>
>     REP
>
>            options { paraphrase="a REP (repeat) section start identifier"; }
>
>            :   "REP"
>
>            ;
>
>
>
>     STA
>
>            options { paraphrase="a STA (heterogeneity due to a statistical distribution) section start identifier"; }
>
>            :   "STA"
>
>            ;
>
>
>
>     ISO
>
>            options { paraphrase="an ISO (isotope) section start identifier"; }
>
>            :   "ISO"
>
>            ;
>
>
>
>     AGL
>
>            options { paraphrase="an AGL (aglycon) section start identifier"; }
>
>            :   "AGL"
>
>            ;
>
>
>
>     CR
>
>            : ( '\r' '\n' )
>
>            | '\n'                                  {   newline();   $setType( Token.SKIP );  }
>
>            ;
>
>
>
>     WS
>
>            : (' '| '\t' )                          {   $setType( Token.SKIP );  }
>
>            ;
>
>
>
>
>     --
>     Dr Matt Harrison
>     BTech (Biotech) Hons PhD
>     Glycobiology Bioinformatician
>     European Bioinformatics Institute UK
>     http://www.ebi.ac.uk   +44 (0)1223 492533
>
>

