[antlr-interest] lexing expression ('a'..'z')+ not matching single character input
Matt Harrison
matt at ebi.ac.uk
Tue Dec 12 09:10:31 PST 2006
Salute, fellow antlers.
I'm a recent convert to the world of language recognition/parsing using
ANTLR, although I have used Perl/Python for "simple" parsing tasks for
many, many man-months.
I am having trouble diagnosing why the (common) lexer expression
"('a'..'z')+" is not matching any single-character input (e.g. "n") in my
grammar. Are there any situations in which this expression should not
match a single character in the range 'a'..'z'?
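To make the expectation concrete, here is a minimal plain-Java sketch of what a one-or-more loop over 'a'..'z' should accept. It is hand-written for illustration only, not code generated by ANTLR, and the names IdentifierSketch and matchIdentifier are hypothetical.

```java
// Illustrative sketch only: mimics the one-or-more loop ('a'..'z')+ by hand.
// IdentifierSketch and matchIdentifier are hypothetical names, not ANTLR API.
class IdentifierSketch {

    /** Returns how many leading characters of input fall in 'a'..'z'. */
    static int matchIdentifier(String input) {
        int i = 0;
        while (i < input.length()
                && input.charAt(i) >= 'a'
                && input.charAt(i) <= 'z') {
            i++;
        }
        return i; // 0 means the (...)+ loop did not match even once
    }

    public static void main(String[] args) {
        // A single lowercase character should satisfy the loop once.
        System.out.println(matchIdentifier("n"));
        // A longer input should stop at the first non-lowercase character.
        System.out.println(matchIdentifier("abc-def"));
    }
}
```

Under this reading, a lone "n" satisfies the loop after one iteration, so I would expect the lexer to produce an IDENTIFIER token for it.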
Thanks for your time.
Matt
~~~
The offending parser rule is as follows:
substituent
: IDENTIFIER
(HYPHEN IDENTIFIER)*
;
The lexer is pretty basic:
class FooBarLexer extends Lexer;
options {
k=3; /* lookahead */
}
//~~~~~~~~~~~~~~~~~~~~ token separators & delimiters ~~~~~~~~~~~~~~~~~~~~~~//
COLON
options { paraphrase="a colon separator"; }
: ':'
;
COMMA
options { paraphrase="a comma"; }
: ','
;
HYPHEN
options { paraphrase="an internal linkage delimiter '-'"; }
: '-'
;
PIPE
options { paraphrase="a residue substitution delimiter"; }
: '|'
;
SEMICOLON
options { paraphrase="a residue/linkage token separator"; }
: ';'
;
LPARENTHESIS
options { paraphrase="a linkage delimiter"; }
: '('
;
RPARENTHESIS
options { paraphrase="a linkage delimiter"; }
: ')'
;
//~~~~~~~~~~~~~~~~~~~~~~~~~~~ identifiers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
INTEGER
options { paraphrase="a positive integer or zero"; }
: ('1'..'9') ('0'..'9')*
| '0'
;
IDENTIFIER
options { paraphrase="a residue name/identifier"; }
: ('a'..'z')+
;
//~~~~~~~~~~~~~~~~~~~~~~~ section type identifiers ~~~~~~~~~~~~~~~~~~~~~~~~//
RES
options { paraphrase="a RES (residue) section start identifier"; }
: "RES"
;
LIN
options { paraphrase="a LIN (linkage) section start identifier"; }
: "LIN"
;
PRO
options { paraphrase="a PRO (heterogeneity due to uncertainty) section start identifier"; }
: "PRO"
;
REP
options { paraphrase="a REP (repeat) section start identifier"; }
: "REP"
;
STA
options { paraphrase="a STA (heterogeneity due to a statistical distribution) section start identifier"; }
: "STA"
;
ISO
options { paraphrase="an ISO (isotope) section start identifier"; }
: "ISO"
;
AGL
options { paraphrase="an AGL (aglycon) section start identifier"; }
: "AGL"
;
CR
: ( '\r' '\n' )
| '\n' { newline(); $setType( Token.SKIP ); }
;
WS
: (' '| '\t' ) { $setType( Token.SKIP ); }
;
--
Dr Matt Harrison
BTech (Biotech) Hons PhD
Glycobiology Bioinformatician
European Bioinformatics Institute UK
http://www.ebi.ac.uk +44 (0)1223 492533