[antlr-interest] Missing something basic about lexer tokens

Fri Nov 19 15:58:12 PST 2010

Hello,

I am working on a recognizer that processes a text file. Each line starts with one of a short list of about 20 characters (mostly upper-case or lower-case letters, plus a few special characters), immediately followed by a "name" (characters or dashes), one or two spaces, and then various space-delimited stretches of text made up of any ASCII characters except newline, terminated by a newline.

The first character is significant: it indicates what sort of "command" each line is.
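
To make the line format concrete, here is a minimal hand-written Java sketch that splits one such line into its command character, name, and remaining text. It is only an illustration of the input shape, not part of the grammar. The class and method names are made up for this example, and it assumes a name is a letter followed by letters, digits, or dashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandLineSketch {
    // Group 1: the one-character command.
    // Group 2: the name (assumed: a letter, then letters, digits, or dashes).
    // Group 3: the rest of the line after one or more spaces.
    private static final Pattern LINE =
        Pattern.compile("(.)([A-Za-z][A-Za-z0-9-]*) +(.*)");

    /** Returns {command, name, rest}, or null if the line does not fit the assumed shape. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("Aname IMPLIED");
        // prints "A | name | IMPLIED"
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

The point of the sketch is that the command is identified purely by its position (the first character of the line), not by which character it is.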

Here's a simplified version of the grammar, with just one of these "commands" specified:

grammar ElementAttributes;

options {
  language = Java;
}
@parser::header {}
@lexer::header {}

elementAttributes : elementAttributeCommand+ EOF;

/**
e.g.
Aname IMPLIED
*/

elementAttributeCommand : ACMD NAME SPACE+ ATTRTYPE NEWLINE;

ATTRTYPE : ('IMPLIED'|'CDATA'|'NOTATION'|'ENTITY'|'TOKEN'|'ID'|'DATA');
ACMD : 'A';
NEWLINE:    '\r'? '\n';
SPACE:      ' ';
NAME : (NAMESTARTCHAR NAMECHAR*);

fragment LOWERCASELETTER : ('a'..'z');
fragment UPPERCASELETTER : ('A'..'Z');
fragment DIGIT : ('0'..'9');
fragment DASH  : ('-');
fragment NAMESTARTCHAR : (LOWERCASELETTER | UPPERCASELETTER);
fragment NAMECHAR : (NAMESTARTCHAR | DIGIT | DASH);

If I run this on a file consisting of only the following line (terminated by a newline):
Aname IMPLIED

I get the following error:
line 1:0 required (...)+ loop did not match anything at input 'Aname'

How should I declare the lexer rules so that an 'A' at the start of a line is recognized as a command token, while the "NAME" immediately following it can still be recognized unambiguously?

Thanks
sheila