[antlr-interest] Missing something basic about lexer tokens
Sheila M. Morrissey
Sheila.Morrissey at ithaka.org
Fri Nov 19 15:58:12 PST 2010
Hello,
I am working on a recognizer that processes a text file. Each line starts with one of a short list of about 20 characters (mostly upper-case or lower-case letters, plus a few special characters), immediately followed by a "name" (characters or dashes), one or two spaces, and then various space-delimited stretches of text consisting of any ASCII characters except newline, followed by a newline.
The first letter is significant - it indicates what sort of "command" each line is.
Here's a simplified version of the grammar, with just one of these "commands" specified:
grammar ElementAttributes;
options {
language = Java;
}
@parser::header {}
@lexer::header {}
elementAttributes : elementAttributeCommand+ EOF;
/**
e.g.
Aname IMPLIED
*/
elementAttributeCommand : ACMD NAME SPACE+ ATTRTYPE NEWLINE;
ATTRTYPE : ('IMPLIED'|'CDATA'|'NOTATION'|'ENTITY'|'TOKEN'|'ID'|'DATA');
ACMD : 'A';
NEWLINE: '\r'? '\n';
SPACE: ' ';
NAME : (NAMESTARTCHAR NAMECHAR*);
fragment LOWERCASELETTER : ('a'..'z');
fragment UPPERCASELETTER : ('A'..'Z');
fragment DIGIT : ('0'..'9');
fragment DASH : ('-');
fragment NAMESTARTCHAR : (LOWERCASELETTER | UPPERCASELETTER);
fragment NAMECHAR : (NAMESTARTCHAR | DIGIT | DASH);
If I run this on a file consisting only of the following line (terminated with NEWLINE):
Aname IMPLIED
I get the following error:
line 1:0 required (...)+ loop did not match anything at input 'Aname'
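The error reports the offending token's text as 'Aname', which suggests the lexer handed the parser one token covering the whole run rather than an ACMD followed by a NAME. The sketch below is a simplified model of that behaviour, not ANTLR's actual prediction algorithm: it assumes a plain "longest match wins, earlier rule breaks ties" lexer. The class name LongestMatchSketch, the method tokenTypes, and the regular expressions standing in for the lexer rules are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchSketch {
    // Rule names in the same order as the lexer rules in the grammar.
    static final String[] NAMES = {"ATTRTYPE", "ACMD", "NEWLINE", "SPACE", "NAME"};

    // Regular expressions approximating each lexer rule (an assumption of this sketch).
    static final Pattern[] RULES = {
        Pattern.compile("IMPLIED|CDATA|NOTATION|ENTITY|TOKEN|ID|DATA"),
        Pattern.compile("A"),
        Pattern.compile("\\r?\\n"),
        Pattern.compile(" "),
        Pattern.compile("[A-Za-z][A-Za-z0-9-]*")
    };

    // Returns the token type names produced for the input under longest-match rules.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int bestRule = -1;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                if (m.lookingAt()) {
                    int len = m.end() - pos;
                    // Strictly longer wins; on a tie the earlier rule is kept.
                    if (len > bestLen) {
                        bestLen = len;
                        bestRule = i;
                    }
                }
            }
            if (bestRule < 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            types.add(NAMES[bestRule]);
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        // The single test line from the post, terminated with a newline.
        System.out.println(tokenTypes("Aname IMPLIED\n"));
    }
}
```

Under these assumptions the first token comes out as NAME (covering "Aname") and no ACMD token is ever produced, so elementAttributeCommand has nothing to match at line 1:0. ACMD and NAME both accept a leading 'A', and the lexer chooses between them without knowing what the parser expects.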
How should I be declaring the lexer rules so that 'A' at start of line is recognized as a command token, and yet still make it possible for the "NAME" immediately following it to be unambiguously recognized?
Thanks
sheila