[antlr-interest] Is ANTLR suitable for wiki grammar parsing?

Tue Jun 5 06:35:33 PDT 2007

Hi

Thanks for your reply. I'll admit that, even after reading the PDF, I'm
still a little confused about how to accomplish what I want. I tried
your suggestion with this grammar:

grammar WikiGrammar;

wiki
	: phrase+
	;

phrase
	: bolded
	| underlined
	| ( options {greedy=false;} : .)+
	;

bolded
	: '*' phrase '*'
	;

underlined
	: '_' phrase '_'
	;

With the input

"Hello"

I got a NoViableAltException.

I'm having trouble figuring out exactly how to accomplish this.
Essentially, I just want to echo whatever input I receive while
recognizing recursive markup patterns. Any ideas on how I can get this
example (with bold and underline) to do that?
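
To make the behaviour I'm after concrete, here is a minimal sketch in
plain Python (not ANTLR, and not a proposed solution). It assumes that
'*' maps to bold and '_' to underline, that an opener with no matching
closer is ordinary text, and that HTML is the output format. The names
MARKERS, _parse, render and wiki_to_html are made up for illustration.

```python
import html

# Assumed mapping from markup character to an HTML tag (illustrative only).
MARKERS = {"*": "b", "_": "u"}


def _parse(text, pos, open_markers):
    """Parse text[pos:] inside the spans in open_markers (innermost last).

    Returns (nodes, next_pos) once the innermost span is closed, or at end
    of input when no span is open. Returns None if the span cannot be closed.
    """
    nodes, buf = [], []
    closer = open_markers[-1] if open_markers else None

    def flush():
        # Move buffered literal characters into the node list.
        if buf:
            nodes.append("".join(buf))
            buf.clear()

    while pos < len(text):
        ch = text[pos]
        if ch == closer:
            flush()
            # An empty span such as "**" is rejected, so both stars stay literal.
            return (nodes, pos + 1) if nodes else None
        if ch in open_markers:
            # An outer span would close before this one does: give up here.
            return None
        if ch in MARKERS:
            # Speculatively treat ch as an opener; fall back to literal text.
            attempt = _parse(text, pos + 1, open_markers + (ch,))
            if attempt is not None:
                flush()
                children, pos = attempt
                nodes.append((MARKERS[ch], children))
                continue
        buf.append(ch)
        pos += 1
    if closer is not None:
        return None  # input ended while the span was still open
    flush()
    return nodes, pos


def render(nodes):
    """Turn the node list back into text, wrapping recognized spans in tags."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(html.escape(node, quote=False))
        else:
            tag, children = node
            out.append(f"<{tag}>{render(children)}</{tag}>")
    return "".join(out)


def wiki_to_html(text):
    nodes, _ = _parse(text, 0, ())
    return render(nodes)


print(wiki_to_html("This is some *irregular** input_"))
# prints: This is some <b>irregular</b>* input_
```

In this sketch "Hello" comes back unchanged and "*bold _both_*" becomes
"<b>bold <u>both</u></b>". The speculative attempt followed by a
fallback to literal text is the part I don't know how to express in a
grammar.
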

Many thanks
Collin

> Hello,
>
> It is worth mentioning that my mail client (Mozilla Thunderbird)
> deals with it very well. Maybe having a look at their source could be
> useful (don't ask me where precisely though!).
>
> I see that you don't define any whitespace in your grammar. Maybe
> dealing with the input line by line could make things simpler?
>
> What about enabling backtracking? Why not define a non-greedy (.)+
> rule for anychars? I think the latter would match when the other
> rules don't. I'm not 100% sure, but it is my impression that the
> generated parser behaves a bit differently than when it's in a
> different rule.
>
> Tell me what that gives:
>
>   phrase
>       : bolded
>       | underlined
>       | ( options {greedy=false;} : .)+
>       ;
>
> MA
>
> Collin VanDyck wrote:
> > Hello
> >
> > I'm trying to evaluate ANTLR to determine whether or not it would be
> > a good fit for a wiki that we're currently developing.
> >
> > Essentially, the question boils down to how elegantly it would handle
> > a wide variety of somewhat unstructured input.  In other words, users
> > are going to be entering rather freeform content (e.g. copying and
> > pasting from Word or some other character source), and I want ANTLR
> > to be able to accept all of the input but match special sequences.
> >
> > An example of this would be:
> >
> > "This is some *bold* wiki content that might also be _underlined_ in
> > places"
> >
> > The grammar would simply output each character that doesn't fall
> > into a special rule, and recognize *bold* and _underlined_
> > specially.
> >
> > I've written a small ANTLR grammar which is able to parse this, but
> > fails pretty quickly when you do things like:
> >
> > "This is some *irregular** input_"
> >
> > In the latter case, I'd really just like the first *irregular* to be
> > parsed as a bolded word and, since the other characters don't have
> > closing symbols, to treat them as regular characters like 'a', 'b',
> > 'c', etc.
> >
> > Is it possible and reasonable to use ANTLR for this purpose?  Can I
> > create a grammar which will accept ANYTHING, and simply be able to
> > parse out the bits and pieces that are interesting?
> >
> > I'm pasting in the grammar I created.  I apologize in advance for the
> > incorrectness of it.
> >
> > -Collin
> >
> > ------------------
> >
> > grammar WikiGrammar;
> >
> > wiki
> >     : phrase+
> >     ;
> >
> > phrase
> >     : bolded
> >     | underlined
> >     | anychars
> >     ;
> >
> > bolded
> >     : ASTERISK phrase ASTERISK
> >     ;
> >
> > underlined
> >     : UNDERSCORE phrase UNDERSCORE
> >     ;
> >
> > anychars
> >     : (CHAR)+
> >     ;
> >
> > UNDERSCORE
> >     : '_'
> >     ;
> >
> > ASTERISK
> >     : '*'
> >     ;
> >
> > CHAR
> >     : .
> >     ;
> >
> >
> >
> >

-----
Collin VanDyck
CTO - Hannon Hill