[antlr-interest] lexing expression ('a'..'z')+ not matching single character input
Matt Harrison
matt at ebi.ac.uk
Wed Dec 13 03:03:29 PST 2006
Unfortunately, it doesn't. For some bizarre reason, ('a'..'z')+
stubbornly refuses to match any single alphabetic character, regardless
of context: if I call the rule 'substituent' below directly with a
single character of input, it doesn't match, nor does it match when a
single-character 'substituent' occurs in the middle of a token stream.
Perhaps a bug in ANTLR? More likely it is something I am missing through
my inexperience with ANTLR, but I can't for the life of me discern what.
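To pin down the behaviour I expect, here is a minimal sketch in Python
(not ANTLR) of how I read the IDENTIFIER and HYPHEN lexer rules and the
'substituent' parser rule. The names TOKEN_RE, tokenize and
is_substituent are hypothetical helpers made up for this illustration;
nothing here is generated by or taken from ANTLR, and it says nothing
about why the real lexer fails.

```python
import re

# Hypothetical stand-ins for two lexer rules from the grammar:
#   IDENTIFIER : ('a'..'z')+ ;
#   HYPHEN     : '-' ;
TOKEN_RE = re.compile(r"(?P<IDENTIFIER>[a-z]+)|(?P<HYPHEN>-)")


def tokenize(text):
    """Return (token_type, lexeme) pairs; raise ValueError on unmatched input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def is_substituent(text):
    """Check text against: substituent : IDENTIFIER (HYPHEN IDENTIFIER)* ;"""
    try:
        kinds = [kind for kind, _ in tokenize(text)]
    except ValueError:
        return False
    # A valid sequence has odd length and alternates IDENTIFIER, HYPHEN, ...
    if len(kinds) % 2 == 0:
        return False
    return all(
        kind == ("IDENTIFIER" if index % 2 == 0 else "HYPHEN")
        for index, kind in enumerate(kinds)
    )
```

Under this reading, is_substituent("n") should be true, just as it is
for longer lower-case names joined by hyphens, which is why the failure
on single characters surprises me.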
cheers,
Matt Harrison
ps: "identifiers" for my particular parsing problem are only lower-case,
and indeed, allowing upper-case ids introduces non-determinism with all
of the constant upper-case keywords defined elsewhere in the lexer.
Vinay Veeramachaneni wrote:
> Hi,
>
> Your grammar seems to be fine. You should consider including
> uppercase letters in identifiers too.
>
> IDENTIFIER
>     options { paraphrase="a residue name/identifier"; }
>     : ('a'..'z' | 'A'..'Z')+
>     ;
>
> This should solve the problem.
>
> Regards,
> Vinay
>
>
> On 12/12/06, Matt Harrison <matt at ebi.ac.uk> wrote:
>
> Salute, fellow antlers.
>
> I'm a recent convert to the world of language recognition/parsing
> using ANTLR, although I have used Perl/Python for "simple" parsing
> tasks for many, many man-months.
>
> I am having trouble diagnosing why the (common) lexer expression
> "('a'..'z')+" is not matching any single-character input (e.g. "n")
> in my grammar. Are there any situations under which this expression
> should not match a single character in the range 'a'-'z'?
>
> Thanks for your time.
> Matt
>
> ~~~
> The offending parser rule is as follows:
>
> substituent
>     : IDENTIFIER (HYPHEN IDENTIFIER)*
>     ;
>
>
> The lexer is pretty basic:
>
> class FooBarLexer extends Lexer;
>
> options {
>     k=3; /* lookahead */
> }
>
> //~~~~~~~~~~~~~~~~~~~~ token separators & delimiters ~~~~~~~~~~~~~~~~~~~~~~//
>
> COLON
>     options { paraphrase="a colon separator"; }
>     : ':'
>     ;
>
> COMMA
>     options { paraphrase="a comma"; }
>     : ','
>     ;
>
> HYPHEN
>     options { paraphrase="an internal linkage delimiter '-'"; }
>     : '-'
>     ;
>
> PIPE
>     options { paraphrase="a residue substitution delimiter"; }
>     : '|'
>     ;
>
> SEMICOLON
>     options { paraphrase="a residue/linkage token separator"; }
>     : ';'
>     ;
>
> LPARENTHESIS
>     options { paraphrase="a linkage delimiter"; }
>     : '('
>     ;
>
> RPARENTHESIS
>     options { paraphrase="a linkage delimiter"; }
>     : ')'
>     ;
>
>
> //~~~~~~~~~~~~~~~~~~~~~~~~~~~ identifiers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
>
> INTEGER
>     options { paraphrase="a positive integer or zero"; }
>     : ('1'..'9') ('0'..'9')*
>     | '0'
>     ;
>
> IDENTIFIER
>     options { paraphrase="a residue name/identifier"; }
>     : ('a'..'z')+
>     ;
>
> //~~~~~~~~~~~~~~~~~~~~~~~ section type identifiers ~~~~~~~~~~~~~~~~~~~~~~~~//
>
> RES
>     options { paraphrase="a RES (residue) section start identifier"; }
>     : "RES"
>     ;
>
> LIN
>     options { paraphrase="a LIN (linkage) section start identifier"; }
>     : "LIN"
>     ;
>
> PRO
>     options { paraphrase="a PRO (heterogeneity due to uncertainty) section start identifier"; }
>     : "PRO"
>     ;
>
> REP
>     options { paraphrase="a REP (repeat) section start identifier"; }
>     : "REP"
>     ;
>
> STA
>     options { paraphrase="a STA (heterogeneity due to a statistical distribution) section start identifier"; }
>     : "STA"
>     ;
>
> ISO
>     options { paraphrase="an ISO (isotope) section start identifier"; }
>     : "ISO"
>     ;
>
> AGL
>     options { paraphrase="an AGL (aglycon) section start identifier"; }
>     : "AGL"
>     ;
>
>
>
> CR
>     // the action now covers both alternatives; previously it was
>     // attached only to the bare '\n' alternative
>     : ( '\r' '\n' | '\n' )
>       { newline(); $setType( Token.SKIP ); }
>     ;
>
> WS
>     : ( ' ' | '\t' )
>       { $setType( Token.SKIP ); }
>     ;
>
>
>
>
> --
> Dr Matt Harrison
> BTech (Biotech) Hons PhD
> Glycobiology Bioinformatician
> European Bioinformatics Institute UK
> http://www.ebi.ac.uk +44 (0)1223 492533
>
>