[antlr-interest] lexing expression ('a'..'z')+ not matching single character input

Wed Dec 13 08:09:38 PST 2006

Hi.  What is the error message?  Note that IDENTIFIER will need to
include A..Z if it is to match the keywords, which you have in upper case.
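
As a rough sanity check outside ANTLR, here is a minimal sketch that mimics the token rules quoted below with Python's `re` module. It is only a stand-in: the names `TOKEN_SPEC`, `MASTER` and `tokenize` are made up for illustration, the keywords are lumped into one hypothetical KEYWORD type, and regex matching does not reproduce how an ANTLR-generated lexer predicts alternatives with k=3. It only shows what one would expect from the rules as written: a lower-case-only IDENTIFIER accepts a single character and does not overlap with the upper-case keywords.

```python
import re

# Hypothetical stand-in for the lexer rules quoted in this thread.
# This is plain regex matching, NOT ANTLR's generated lexer.
TOKEN_SPEC = [
    ("KEYWORD", r"RES|LIN|PRO|REP|STA|ISO|AGL"),  # upper-case section keywords
    ("IDENTIFIER", r"[a-z]+"),                    # same intent as ('a'..'z')+
    ("INTEGER", r"[1-9][0-9]*|0"),
    ("HYPHEN", r"-"),
    ("SKIP", r"[ \t\r\n]+"),                      # whitespace and newlines
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs, dropping skipped whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("n"))
    print(tokenize("a-bc"))
    print(tokenize("RES n"))
```

If the generated lexer rejects "n" while a sketch like this accepts it, the rule text itself is probably not the culprit, which is why the exact error message matters.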

Ter
On Dec 13, 2006, at 3:03 AM, Matt Harrison wrote:

>
> Unfortunately, it doesn't. For some bizarre reason, ('a'..'z')+  
> stubbornly refuses to match any single alphabetic character,  
> regardless of context; that is, I can call the rule 'substituent'  
> below directly with a single character of input and it doesn't  
> match, nor will it match if a single character 'substituent' occurs  
> in the middle of a token stream.
>
> Perhaps a bug in ANTLR? Surely this has got to be due to something  
> else I am missing due to my inexperience with ANTLR, but I can't  
> for the life of me discern what.
>
> cheers,
> Matt Harrison
>
> ps: "identifiers" for my particular parsing problem are only lower-case,
> and indeed, allowing upper-case ids introduces non-determinism with all
> of the constant upper-case keywords defined elsewhere in the lexer.
>
> Vinay Veeramachaneni wrote:
>> Hi,
>>  Your grammar seems to be fine. You should consider including
>> uppercase letters in identifiers too.
>>  IDENTIFIER   options { paraphrase="a residue name/identifier"; }
>>
>>        :     ('a'..'z' | 'A'..'Z')+ ;
>>
>> This should solve the problem.
>>  Regards,
>> Vinay
>>
>>  On 12/12/06, Matt Harrison <matt at ebi.ac.uk> wrote:
>>
>>     Salute, fellow antlers.
>>
>>     I'm a recent convert to the world of language recognition/parsing
>>     using ANTLR, although I have used Perl/Python for "simple" parsing
>>     tasks for many, many man-months.
>>
>>     I am having trouble diagnosing why the (common) lexer expression
>>     "('a'..'z')+" is not matching any single-character input (e.g. "n")
>>     in my grammar. Are there any situations under which this expression
>>     should not match a single character in the range 'a' - 'z'?
>>
>>     Thanks for your time.
>>     Matt
>>
>>     ~~~
>>     The offending parser rule is as follows:
>>
>>     substituent
>>
>>            :   IDENTIFIER
>>
>>                (HYPHEN IDENTIFIER)*
>>
>>            ;
>>
>>
>>     The lexer is pretty basic:
>>
>>     class FooBarLexer extends Lexer;
>>
>>     options {
>>
>>        k=3;        /*  lookahead  */
>>
>>     }
>>
>>     //~~~~~~~~~~~~~~~~~~~~  token separators & delimiters  ~~~~~~~~~~~~~~~~~~~~~~//
>>
>>
>>
>>     COLON
>>
>>            options { paraphrase="a colon separator"; }
>>
>>            :   ':'
>>
>>            ;
>>
>>
>>
>>     COMMA
>>
>>            options { paraphrase="a comma"; }
>>
>>            :     ','
>>
>>            ;
>>
>>     HYPHEN
>>
>>            options { paraphrase="an internal linkage delimiter '-'"; }
>>
>>            :     '-'
>>
>>            ;
>>
>>     PIPE
>>
>>            options { paraphrase="a residue substitution delimiter"; }
>>
>>            :     '|'
>>
>>            ;
>>
>>     SEMICOLON
>>
>>            options { paraphrase="a residue/linkage token separator"; }
>>
>>            :   ';'
>>
>>            ;
>>
>>
>>
>>     LPARENTHESIS
>>
>>            options { paraphrase="a linkage delimiter"; }
>>
>>            :   '('
>>
>>            ;
>>
>>
>>
>>     RPARENTHESIS
>>
>>            options { paraphrase="a linkage delimiter"; }
>>
>>            :   ')'
>>
>>            ;
>>
>>
>>     //~~~~~~~~~~~~~~~~~~~~~~~~~~~ identifiers
>>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
>>
>>     INTEGER
>>
>>            options { paraphrase="a positive integer or zero"; }
>>
>>            :     ('1'..'9')  ('0'..'9')*
>>
>>            |   '0'
>>
>>            ;
>>
>>
>>
>>     IDENTIFIER
>>
>>            options { paraphrase="a residue name/identifier"; }
>>
>>            :     ('a'..'z')+
>>
>>            ;
>>
>>     //~~~~~~~~~~~~~~~~~~~~~~~  section type identifiers  ~~~~~~~~~~~~~~~~~~~~~~~~//
>>
>>     RES
>>
>>            options { paraphrase="a RES (residue) section start identifier"; }
>>
>>            :   "RES"
>>
>>            ;
>>
>>
>>
>>     LIN
>>
>>            options { paraphrase="a LIN (linkage) section start identifier"; }
>>
>>            :   "LIN"
>>
>>            ;
>>
>>
>>
>>     PRO
>>
>>            options { paraphrase="a PRO (heterogeneity due to uncertainty) section start identifier"; }
>>
>>            :   "PRO"
>>
>>            ;
>>
>>
>>
>>     REP
>>
>>            options { paraphrase="a REP (repeat) section start identifier"; }
>>
>>            :   "REP"
>>
>>            ;
>>
>>
>>
>>     STA
>>
>>            options { paraphrase="a STA (heterogeneity due to a statistical distribution) section start identifier"; }
>>
>>            :   "STA"
>>
>>            ;
>>
>>
>>
>>     ISO
>>
>>            options { paraphrase="an ISO (isotope) section start identifier"; }
>>
>>            :   "ISO"
>>
>>            ;
>>
>>
>>
>>     AGL
>>
>>            options { paraphrase="an AGL (aglycon) section start identifier"; }
>>
>>            :   "AGL"
>>
>>            ;
>>
>>
>>
>>     CR
>>
>>            : ( '\r' '\n' )
>>
>>            | '\n'                                  {   newline(); $setType( Token.SKIP );  }
>>
>>            ;
>>
>>
>>
>>     WS
>>
>>            : (' '| '\t' )                          {   $setType( Token.SKIP );  }
>>
>>            ;
>>
>>
>>
>>
>>     --
>>     Dr Matt Harrison
>>     BTech (Biotech) Hons PhD
>>     Glycobiology Bioinformatician
>>     European Bioinformatics Institute UK
>>     http://www.ebi.ac.uk   +44 (0)1223 492533
>>
>>