[antlr-interest] Beginner Parsing wiki markup

pepone pepone pepone.onrez at gmail.com
Fri Apr 21 22:20:42 PDT 2006


I am trying to build a lexer and parser for a wiki-like language.

I want to parse links like [[http://www.google.com || google]]

The problem is that I don't know how to match www.google.com

I tried:
domain_name:
 (name(DOT))+ name
;


but when I compile the grammar I get a warning like this:


ANTLR Parser Generator   Version 2.7.5 (20060420)   1989-2005 jGuru.com
wikigramar.g:28: warning:nondeterminism upon
wikigramar.g:28:     k==1:DIGIT,LETTER
wikigramar.g:28:     k==2:DIGIT,LETTER
wikigramar.g:28:     k==3:DIGIT,LETTER
wikigramar.g:28:     k==4:WS,OPTION_SEPARATOR,DIGIT,LETTER
wikigramar.g:28:     k==5:WS,OPTION_SEPARATOR,NEWLINE,DIGIT,LETTER
wikigramar.g:28:     k==6:WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
wikigramar.g:28:     k==7:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
wikigramar.g:28:     k==8:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
wikigramar.g:28:     k==9:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
wikigramar.g:28:     k==10:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
wikigramar.g:28:     between alt 1 and exit branch of block
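
A possible reading of this warning, offered as a sketch rather than a confirmed fix: in domain_name, both the loop body (name DOT) and the text after the loop start with name, and name is (word)+, so it can be longer than any fixed k. Left-factoring the rule so that the shared name is matched only once may remove the ambiguity. The fragment below is an untested assumption about how the rule could be rewritten; it uses only rule and token names that already appear in the grammar in this post.

```
// Hypothetical left-factored form of domain_name (untested sketch).
// The first name is matched once; every further label is introduced by DOT,
// so the loop decision depends only on whether the next token is DOT.
domain_name:
	name (DOT name)+
	;
```

With this shape the rule still requires at least one DOT, as the original does, so www.google.com would be matched as name DOT name DOT name.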


/*===grammar begin====*/

header {
	#include <sstream>
	#include <iostream>
	#include <qdom.h>
}

options {
	language="Cpp";
}

class WikiParser extends Parser;

options {
	buildAST = true;
	exportVocab=WIKI;
	k = 10;
}
	
protocol:
	(HTTP_PROTOCOL)|(FTP_PROTOCOL)
	;

name:
	(word)+
	;
domain_name:
	(name(DOT))+(name)
	;
	
url:
	protocol(URL_SEPARATOR) (domain_name) (WS)? (OPTION_SEPARATOR)
	;

link:
	(WIKI_TAG_BEGIN^
	(url)?(word|NEWLINE|WS)+
	WIKI_TAG_END)
	;
	
word:
	((DIGIT)|(LETTER))
	;

/**
 * Lexer
 */
class WikiLexer extends Lexer;
options {	
	k = 7;
	exportVocab=WIKI;
}


DIGIT: ('0'..'9');

LETTER: ('a'..'z')|('A'..'Z');

NEWLINE
	options {
		generateAmbigWarnings=false;
	}
	:	'\r' | '\n';

WS: ' '|'\t';

WIKI_TAG_BEGIN:
	"[["
	;
	
WIKI_TAG_END:
	"]]"
	;

FTP_PROTOCOL:
	"ftp"
	;
	
HTTP_PROTOCOL:
	"http"
	;

URL_SEPARATOR:
	"://"
	;
DOT:
	'.'
	;
	
SLASH:
	'/'
	;
	
OPTION_SEPARATOR:
	"||"
	;

/*=======grammar end=================*/

--
play tetris http://pepone.on-rez.com/tetris
run gentoo http://gentoo-notes.blogspot.com/

