[antlr-interest] howto ignore unknown tokenstreams/recordsets

Mon Jan 17 13:01:41 PST 2005

> This one matches everything that consists of only letters. The
> testLiterals option makes sure the items added in the tokens section
> get recognized as such: before returning from the ID rule, ANTLR checks
> the matched text against the entries in the tokens table. That is, they
> get passed to the parser as A .. D, and unknown tokens get passed as ID.
> You could use that to make the catch-all rule. At least that should be
> the general idea, I think.

Hi,

Thanks, it seems to work, but I get some nondeterminism warnings. How can I
remove them? Are the warnings relevant?
Thanks,
Oliver

-- input --
A 1 2;
B 1;
X 1 2;
C "abc";
d 1 "abc";
Y "test" 123;

-- grammar ---
...    
class MyLexer extends Lexer;

options 
{
    k=2;
    charVocabulary='\u0000'..'\u007F';
    caseSensitive=false;
    caseSensitiveLiterals=false;
    testLiterals = false;
}

tokens
{
	A="a";
	B="b";
	C="c";
	D="d";
}

NUM : ( '0'..'9' )+;

IDENT options { testLiterals = true; } : ( 'a'..'z' ) ( '0'..'9' | 'a'..'z' )*;

STRING : '"'!
    ( ~( '\'' | '"' | '\n' | '\r' ) )*
    ( '"'!
    | // nothing -- write error message
    )
;

EODS : ';'; 

DELIM 
	: ( ' '
	| '\t'
	| '\f'
	|	( "\r\n"
		| '\r'
		| '\n'
		)
		{ newline(); }
	)
	{ $setType(antlr::Token::SKIP); }
;

class MyParser extends Parser;

parse : datasets EOF;

datasets : ( headers EODS )* ( section1 EODS )*;

headers : a | b | ign;

section1 : c | d | ign;

a : A NUM NUM;
b : B NUM;
c : C STRING;
d : D NUM STRING;
ign : IDENT ( ~( EODS ) )* ;
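
-- sketch --
The grammar above can be approximated in a few lines of Python to see which
token types the parser receives and why records that start with an unknown
IDENT are ambiguous. This is only a sketch under assumptions: the regular
expressions stand in for the lexer rules, the names LITERALS, tokenize,
first_tokens and count_splits are hypothetical and do not come from ANTLR, and
only the first token of each record is checked. Because ign appears in both
headers and section1, a record such as "X 1 2;" fits either loop, which may be
one source of the nondeterminism warnings.

```python
import re

# Literals from the tokens { ... } section. The lookup is case-insensitive
# because the grammar sets caseSensitiveLiterals=false.
LITERALS = {"a": "A", "b": "B", "c": "C", "d": "D"}

# Rough regex stand-ins for the lexer rules (assumption: not ANTLR's exact
# matching; unlike the '"'! suffix in STRING, the quotes are kept here).
TOKEN_RE = re.compile(
    r"""(?P<NUM>[0-9]+)
      | (?P<IDENT>[a-z][0-9a-z]*)
      | (?P<STRING>"[^'"\r\n]*"?)
      | (?P<EODS>;)
      | (?P<DELIM>[ \t\f\r\n]+)""",
    re.VERBOSE | re.IGNORECASE,
)

SAMPLE = 'A 1 2;\nB 1;\nX 1 2;\nC "abc";\nd 1 "abc";\nY "test" 123;\n'


def tokenize(text):
    """Return (type, text) pairs; DELIM is dropped, like $setType(SKIP)."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "DELIM":
            continue
        if kind == "IDENT":
            # testLiterals=true: consult the literals table before
            # returning IDENT; unknown words stay IDENT.
            kind = LITERALS.get(value.lower(), "IDENT")
        tokens.append((kind, value))
    return tokens


def first_tokens(tokens):
    """Token type that starts each ';'-terminated record."""
    firsts, at_start = [], True
    for kind, _ in tokens:
        if at_start:
            firsts.append(kind)
        at_start = kind == "EODS"
    return firsts


def count_splits(firsts):
    """Count ways to divide the records into ( headers )* ( section1 )*."""
    header_ok = {"A", "B", "IDENT"}   # headers  : a | b | ign
    section_ok = {"C", "D", "IDENT"}  # section1 : c | d | ign
    return sum(
        all(k in header_ok for k in firsts[:p])
        and all(k in section_ok for k in firsts[p:])
        for p in range(len(firsts) + 1)
    )


if __name__ == "__main__":
    firsts = first_tokens(tokenize(SAMPLE))
    print(firsts)
    print("possible header/section1 splits:", count_splits(firsts))
```

For the input above the sketch finds two possible splits: the X record can be
the last header or the first section1 record. A grammar in which ign is
reachable from only one of the two loops at any point would not have this
particular overlap.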
