[antlr-interest] Stuck

Ryan Daum ryan at darksleep.com
Mon Aug 20 10:26:16 PDT 2007


Hi all,

I'm writing a fairly simple grammar for the following protocol:

http://www.belfry.com/fuzzball/trebuchet/mcp.html

However, I'm stuck on a problem at the lexer level that I can't seem to
solve.  I believe it's my final issue before I have a working parser.

Basically, I have a number of rules which can match a combination of
characters:

        fragment
        LINE 	:	(LINE_CHAR)* EOF;
        IDENT 	:	ALPHA (ALPHA|DIGIT|'-' ~(SPACE | COLON | OTHER_CHAR))* ;
        
        
        UNQUOTED_STRING 
        	:	SIMPLE_CHAR+;
        
        QUOTED_STRING 
        	:	'"' (SIMPLE_CHAR | SPACE | '\\' QUOTE_CHAR | COLON | STAR)* '"';
        
This is all fine; individually the rules work well.  However, in the rule:

        messageContinue
          	:	STAR SPACE datatag SPACE IDENT COLON SPACE LINE 
          		-> ^(MESSAGE_CONTINUE datatag LINE);

Working against the following line:

        * 9b76 text: This is some sample text.

I always get a MismatchedTokenException because the parser seems to want
to turn everything after the final SPACE into an IDENT, rather than a
LINE.  The intention of "LINE" is just to collect all input after that
SPACE in a messageContinue; I do not want the rest of the lexer's rules
to apply at all.
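
To make the intended split concrete, here is a minimal sketch in Python
(not ANTLR, and not part of the attached grammar).  The helper name
split_continue and the regular expression are assumptions made purely for
illustration; the datatag and key patterns only approximate
UNQUOTED_STRING and IDENT.

```python
import re

# Hypothetical helper mirroring the parser rule
#   STAR SPACE datatag SPACE IDENT COLON SPACE LINE
# The character classes are simplified approximations of the grammar's
# UNQUOTED_STRING and IDENT rules, not an exact translation.
CONTINUE_RE = re.compile(
    r"\* "                               # STAR SPACE
    r"(?P<datatag>[^ :*\"\\]+) "         # datatag SPACE
    r"(?P<key>[A-Za-z_][A-Za-z0-9_-]*)"  # IDENT
    r": "                                # COLON SPACE
    r"(?P<line>.*)"                      # LINE: the rest of the input, untokenized
)

def split_continue(text):
    """Return (datatag, key, line) for a continuation line, or None."""
    m = CONTINUE_RE.fullmatch(text)
    if m is None:
        return None
    return m.group("datatag"), m.group("key"), m.group("line")

print(split_continue("* 9b76 text: This is some sample text."))
```

The point of the sketch is only that the last group is taken verbatim,
with no further token rules applied to it.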

I'm not that adept with this stuff, but I haven't had many problems
until now.  Can anybody help me with this?  Is this a token precedence
problem?  Is this even possible with Antlr v3?  I've attached the entire
grammar, which works inside AntlrWorks.  The test failure can be
replicated with the example line above.
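
The symptom can be modelled outside ANTLR with a small Python sketch.  It
assumes a lexer that chooses tokens without knowing which parser rule is
active, and it uses a longest-match strategy as a rough stand-in; it is
not ANTLR's actual prediction algorithm.  The names TOKEN_RULES and
tokenize are hypothetical, and DIGIT and QUOTED_STRING are left out for
brevity.

```python
import re

# Simplified stand-ins for the lexer rules; list order breaks ties between
# equally long matches.  LINE is deliberately absent: it is declared as a
# fragment in the grammar, and fragment rules do not produce tokens.
TOKEN_RULES = [
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")),
    ("UNQUOTED_STRING",
     re.compile(r"[A-Za-z0-9_~`!@$%^&()=+{}\[\]|';?/><.,#-]+")),
    ("SPACE", re.compile(r" ")),
    ("COLON", re.compile(r":")),
    ("STAR", re.compile(r"\*")),
]

def tokenize(text):
    """Split text into (token_name, lexeme) pairs using longest match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; earlier rules win ties.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError("no rule matches at position %d" % pos)
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

for token in tokenize("* 9b76 text: This is some sample text."):
    print(token)
```

In this model the words after the final SPACE come out as IDENT and
UNQUOTED_STRING tokens, and no LINE token is ever produced, which matches
the mismatch described above.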

Ryan

-------------- next part --------------
grammar Mcp;

options {
output=AST;
backtrack=true;
memoize=true;
filter=true;
}

tokens {
MESSAGE_START;
MESSAGE_CONTINUE;
MESSAGE_END;
KEY_VALUES;
KEY;
CONTINUE_KEY;
}

@header {
package com.thimbleware.cometMoo;
}

@lexer::header {
package com.thimbleware.cometMoo;
}


// Tree grammar

// Grammar

message        :	(messageStart  | messageContinue | messageEnd) ;
messageStart
	:	IDENT SPACE authKey keyVals? EOF -> ^(MESSAGE_START authKey keyVals?)
;

messageContinue
  	:	STAR SPACE datatag SPACE IDENT COLON SPACE LINE 
  		-> ^(MESSAGE_CONTINUE datatag LINE);

	
messageEnd	:	COLON SPACE datatag EOF;


authKey	:	UNQUOTED_STRING;
datatag	:	UNQUOTED_STRING;
keyVals	:	(SPACE keyval)+ -> ^(KEY_VALUES keyval+);
keyval	:	key !(COLON) !(SPACE) value ;

value	:	(UNQUOTED_STRING | QUOTED_STRING | IDENT)
	;


key	:	regularKey | continueKey;

regularKey
	:	IDENT -> ^(KEY IDENT);
continueKey
	:	IDENT STAR -> ^(CONTINUE_KEY IDENT);
	


// Lexer

fragment
LINE 	:	(LINE_CHAR)* EOF;
IDENT 	:	ALPHA (ALPHA|DIGIT|'-' ~(SPACE | COLON | OTHER_CHAR))* ;



UNQUOTED_STRING 
	:	SIMPLE_CHAR+;
	
QUOTED_STRING 
	:	'"' (SIMPLE_CHAR | SPACE | '\\' QUOTE_CHAR | COLON | STAR)* '"';


SPACE 	:	' ';

fragment
QUOTE_CHAR
	:	('"' | '\\');

fragment
SIMPLE_CHAR
	:	ALPHA | DIGIT | OTHER_CHAR;

fragment
OTHER_CHAR
	:	'-' | '~' | '`' | '!' | '@' |  '$' | '%' | '^'
        | '&' | '(' | ')' | '=' | '+' | '{' | '}' | '[' | ']' | '|' 
        | '\'' | ';' | '?' | '/' | '>' | '<' | '.' | ',' | '#'
	;

fragment
LINE_CHAR
	:	SIMPLE_CHAR | QUOTE_CHAR | SPACE | COLON | STAR;
	
STAR	:	'*';

fragment
ALPHA
	:	('A'..'Z'
	|	'a'..'z'
	|	'_')
	;
	



DIGIT	:	'0'..'9'
		;
		

COLON
	:	':'
	;


