[antlr-interest] Is ANTLR suitable for wiki grammar parsing?

Wed Jun 6 15:21:27 PDT 2007

I think it depends on what you mean by structure. For a parser, what
really matters is whether you can superimpose a structure that lets you
parse the input without the result being so tortuous that it tells you
this is the wrong hammer. Here, I am not convinced that you cannot do
this reasonably with ANTLR, though a hand-crafted piece of code may be
the smallest solution.

I think first you have to work out what the rules are for the language
properly, and not try to discover them as you go. For instance, is
*bold phrase* allowed, or must this be *bold* *phrase* or *"bold
phrase"*? I bet there is some limitation like that; otherwise any code
that sees a '*' anywhere would have to scan to the end of the text, and
if there is a genuine typo you will get out of sync.
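
To make that concrete, here is a minimal hand-written sketch in Python
(not ANTLR) under one assumed rule: a '*' opens bold only if a closing
'*' follows on the same line with at least one non-'*' character in
between, and any other '*' is plain text. The rule and the name
split_bold are my assumptions for illustration; the real wiki syntax
may well differ.

```python
import re

# Assumed rule: *...* is bold only when both stars are on the same line
# and at least one non-star character sits between them.
BOLD = re.compile(r"\*([^*\n]+)\*")

def split_bold(text):
    """Return a list of ("bold" | "text", fragment) pairs."""
    parts = []
    pos = 0
    for match in BOLD.finditer(text):
        if match.start() > pos:
            # Plain text between the previous bold run and this one.
            parts.append(("text", text[pos:match.start()]))
        parts.append(("bold", match.group(1)))
        pos = match.end()
    if pos < len(text):
        # Trailing text, including any unmatched '*'.
        parts.append(("text", text[pos:]))
    return parts
```

With this rule, split_bold("*bold* abc*de") gives a bold "bold"
followed by the plain text " abc*de", because the scan for a closing
'*' never crosses a line and an unmatched '*' simply stays literal.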

Then, I think, you move on to the parser. I will spend a little time on
this, but if you start very small, does the following work just for the
valid cases of bold text? Is it anywhere close? (Remember that you
cannot run this in interpretive mode in ANTLRWorks.)

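grammar wiki;

// Zero or more chunks, so input can continue after a bold run.
body	: text* EOF
	;

// The predicate looks ahead for a complete bold run before committing;
// otherwise a single token is consumed as plain text.
text 	: (BOLD DROSS+ BOLD)=> BOLD DROSS+ BOLD
	| .
	;

// DROSS is listed last so that WS and BOLD take precedence over it.
WS 	: ' ' | '\t' | '\n' | '\r' ;
BOLD	: '*' ;
DROSS	: . ;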
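
If it helps to see the lookahead idea outside ANTLR, here is a rough
Python model of it: peek for BOLD DROSS+ BOLD, commit if it is there,
otherwise consume one token as plain text, and repeat until the input
runs out. This is not generated by ANTLR; the function names and the
way the rule is applied repeatedly are my assumptions.

```python
def tokenize(text):
    """Mimic the three lexer rules: WS, BOLD, and DROSS for anything else."""
    tokens = []
    for ch in text:
        if ch in " \t\n\r":
            tokens.append(("WS", ch))
        elif ch == "*":
            tokens.append(("BOLD", ch))
        else:
            tokens.append(("DROSS", ch))
    return tokens

def bold_end(tokens, i):
    """Lookahead only: index just past BOLD DROSS+ BOLD at i, else None."""
    if i >= len(tokens) or tokens[i][0] != "BOLD":
        return None
    j = i + 1
    while j < len(tokens) and tokens[j][0] == "DROSS":
        j += 1
    # Need at least one DROSS token and then a closing BOLD.
    if j > i + 1 and j < len(tokens) and tokens[j][0] == "BOLD":
        return j + 1
    return None

def parse(text):
    """Return ("bold", phrase) for bold runs and ("other", char) otherwise."""
    tokens = tokenize(text)
    out, i = [], 0
    while i < len(tokens):
        end = bold_end(tokens, i)
        if end is not None:
            # Commit: join the DROSS characters between the two stars.
            out.append(("bold", "".join(ch for _, ch in tokens[i + 1:end - 1])))
            i = end
        else:
            # Fallback: one token of plain text, stray '*' included.
            out.append(("other", tokens[i][1]))
            i += 1
    return out
```

Note that in this model *bold phrase* is not bold, because the space
becomes a WS token and breaks the DROSS+ run. That is exactly the kind
of rule that needs deciding before the grammar is written.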

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Collin VanDyck
> Sent: Wednesday, June 06, 2007 2:56 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Is ANTLR suitable for wiki grammar
> parsing?
> 
> 
> On Jun 6, 2007, at 2:26 PM, Randall R Schulz wrote:
> 
> > On Wednesday 06 June 2007 11:16, Martin d'Anjou wrote:
> >>> However, I cannot match something like:
> >>>
> >>> *bold* abc*de
> >>>
> >>> As it fails because there is no following '*' after de.
> >>>
> >>> And I think that this is essentially my problem.  I do want
> >>> something like
> >>>
> >>> *bold* abc*de
> >>>
> >>> To be accepted, and I'd like for the *bold* to be matched in the
> >>> bolded parser rule, but since the rest of the line doesn't match,
> >>> to simply count abc*de as a regular phrase.
> >>>
> >>> Is this possible?
> >>
> >> I am very interested in knowing if this is possible as well. I have
> >> many problems where input is very unstructured, and I am not
> >> convinced ANTLR is the right solution for those problems.
> >
> > My original feeling about the OP's problem is just this. Context-free
> > grammars are all about structure. Rigid structure, precisely defined.
> > I don't see a parser generator as the tool of choice for loosely
> > structured or imprecisely defined inputs.
> >
> > The problem is that the number of rules you'd need and the
> > difficulty in preventing unwanted interactions between those rules
> > make this a problem that verges on the insoluble with what a CFG
> > parser generator gives you.
> >
> > IMO, of course.
> 
> Yes, this is what I'm beginning to feel is true about my quest to use
> ANTLR for this purpose.  No shame on ANTLR of course, it's seeming
> like it's simply a case of a great tool, but for a different job.  If
> anyone does have any recommendations on tools to accomplish what I'm
> trying to do, I would certainly appreciate it, though it is not my
> intent to throw this list's traffic off-topic.
> 
> -Collin
> 
> 
> -----
> Collin VanDyck
> CTO - Hannon Hill
>