[antlr-interest] Is ANTLR suitable for wiki grammar parsing?

Wed May 23 11:26:16 PDT 2007

Hello

I'm trying to evaluate ANTLR to determine whether or not it would be  
a good fit for a wiki that we're currently developing.

Essentially, the question boils down to how elegantly it would handle
a wide variety of somewhat unstructured input. In other words, users
are going to be entering rather freeform content (e.g. copying and
pasting from Word or some other source of text), and I want ANTLR
to accept all of the input but match special sequences.

An example of this would be:

"This is some *bold* wiki content that might also be _underlined_ in  
places"

The grammar would simply output each character that doesn't
match a special rule, and recognize *bold* and
_underlined_ text specially.

I've written a small ANTLR grammar which is able to parse this, but
it fails pretty quickly on input like:

"This is some *irregular** input_"

In that case, I'd just like *irregular* to be parsed as a bolded
word, and since the remaining * and _ have no closing symbols, I'd
like them treated as ordinary characters like 'a', 'b', 'c', etc.
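To make the desired behaviour concrete, here is a minimal sketch of it as a hand-written scanner rather than an ANTLR grammar. The function name `render` and the `<b>`/`<u>` output tags are hypothetical, chosen only for illustration. The assumption is that a marker counts as "opening" only if the same marker appears again later with at least one character in between; otherwise it is copied through as a plain character.

```python
# Hypothetical sketch (not ANTLR): a hand-written scanner showing the
# intended fallback behaviour for *bold* and _underlined_ markup.
MARKERS = {"*": "b", "_": "u"}  # assumed mapping from marker to output tag


def render(text: str) -> str:
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in MARKERS:
            # Look for the nearest matching closing marker further on.
            close = text.find(ch, i + 1)
            if close > i + 1:  # found, and the span between is non-empty
                inner = render(text[i + 1:close])  # allow nesting, e.g. *a _b_ c*
                tag = MARKERS[ch]
                out.append(f"<{tag}>{inner}</{tag}>")
                i = close + 1
                continue
        # Ordinary character, or a marker with no usable closing partner:
        # copy it through unchanged.
        out.append(ch)
        i += 1
    return "".join(out)
```

For example, `print(render("This is some *irregular** input_"))` prints `This is some <b>irregular</b>* input_`: the first pair of asterisks becomes bold, and the leftover `*` and `_` are treated like any other character. Inside ANTLR itself, something similar may be achievable with a syntactic predicate that tries the bolded alternative first and otherwise lets the marker fall through as a plain character, but that grammar variant is not shown or tested here.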

Is it possible and reasonable to use ANTLR for this purpose?  Can I  
create a grammar which will accept ANYTHING, and simply be able to  
parse out the bits and pieces that are interesting?

I'm pasting in the grammar I created. I apologize in advance for any
mistakes in it.

-Collin

------------------

grammar WikiGrammar;

wiki
	: phrase+
	;

phrase
	: bolded
	| underlined
	| anychars
	;

bolded
	: ASTERISK phrase ASTERISK
	;

underlined
	: UNDERSCORE phrase UNDERSCORE
	;

anychars
	: (CHAR)+
	;

UNDERSCORE
	: '_'
	;	

ASTERISK
	: '*'
	;

CHAR
	: .
	;