[antlr-interest] Is ANTLR suitable for wiki grammar parsing?
Collin VanDyck
collin.vandyck at hannonhill.com
Tue Jun 5 06:35:33 PDT 2007
Hi
Thanks for your reply. I'll admit, even after reading the PDF, I'm a
little confused about how to accomplish what I want. I tried your
suggestion with this grammar:
grammar WikiGrammar;

wiki
    : phrase+
    ;

phrase
    : bolded
    | underlined
    | ( options {greedy=false;} : .)+
    ;

bolded
    : '*' phrase '*'
    ;

underlined
    : '_' phrase '_'
    ;
With the input "Hello", I got a NoViableAltException.
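The most likely cause is the lexer, not the parser. In an ANTLR 3 combined grammar, quoted literals such as '*' and '_' in parser rules become implicit lexer tokens, and "." inside a parser rule matches any *token*, not any character. No lexer rule matches 'H', so the generated lexer rejects "Hello" before phrase is ever tried. The Python sketch below is not ANTLR code. It imitates that situation and shows how a catch-all rule like CHAR : . in the earlier grammar avoids it. The names tokenize and catch_all are hypothetical.

```python
# Hypothetical sketch (not ANTLR-generated code): a lexer that knows only the
# literal tokens '*' and '_', plus an optional catch-all CHAR rule like the
# original grammar's "CHAR : . ;".

def tokenize(text, catch_all=False):
    """Return a list of (token_type, char) pairs, or raise on an unmatched char."""
    tokens = []
    for pos, ch in enumerate(text):
        if ch == '*':
            tokens.append(('ASTERISK', ch))
        elif ch == '_':
            tokens.append(('UNDERSCORE', ch))
        elif catch_all:
            tokens.append(('CHAR', ch))  # every other character is plain text
        else:
            # Roughly the situation ANTLR's lexer reports as a
            # NoViableAltException: no rule can start with this character.
            raise ValueError(f"no viable alternative at character {ch!r} (index {pos})")
    return tokens
```

Without the catch-all, tokenize("Hello") fails on the very first character. With catch_all=True it yields five CHAR tokens, which a parser wildcard or a CHAR-based rule can then consume.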
I'm still unsure how exactly to accomplish this. Essentially, I just
want to be able to spit out whatever input I receive while recognizing
recursive markup patterns. Any ideas on how I can get this example
(with bold and underline) to do that?
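The sketch below shows the desired behaviour as a hand-written recursive-descent parser in Python. It is a swapped-in technique for illustration, not ANTLR's API and not what ANTLR would generate. It assumes HTML-style <b>/<u> output, and render, parse_phrases and MARKERS are hypothetical names. The key idea is backtracking, similar in spirit to the suggestion in the reply below: when a '*' or '_' never finds its closing marker, the parser rewinds and emits it as an ordinary character.

```python
# Hypothetical sketch, not ANTLR-generated code: recursive descent with
# backtracking. Markup that never closes falls back to plain text.

MARKERS = {'*': 'b', '_': 'u'}  # marker character -> output tag (assumed HTML-style)


def render(text):
    """Render wiki text, emitting unmatched markers literally."""
    out, _ = parse_phrases(text, 0, closer=None)
    return out


def parse_phrases(text, pos, closer):
    """Parse from pos until end of input or until `closer` is seen.

    Returns (output, pos_of_closer). If `closer` is given but never found,
    returns (None, pos) so the caller can backtrack.
    """
    parts = []
    while pos < len(text):
        ch = text[pos]
        if ch == closer:
            return ''.join(parts), pos
        if ch in MARKERS:
            inner, end = parse_phrases(text, pos + 1, closer=ch)
            if inner:  # closed and non-empty: treat as markup
                tag = MARKERS[ch]
                parts.append(f'<{tag}>{inner}</{tag}>')
                pos = end + 1  # skip past the closing marker
                continue
            # Backtrack: no closing marker (or an empty span), so this
            # marker is just an ordinary character.
        parts.append(ch)
        pos += 1
    if closer is not None:
        return None, pos  # hit end of input without finding the closer
    return ''.join(parts), pos
```

With this sketch, render("This is some *irregular** input_") gives "This is some <b>irregular</b>* input_", and nested input such as "*bold _and underlined_*" gives "<b>bold <u>and underlined</u></b>". It does no HTML escaping and no memoization, so pathological input could be slow. Treat it as an illustration of the fallback semantics, not a production renderer.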
Many thanks
Collin
> Hello,
>
> It is worth mentioning that my mail client (Mozilla Thunderbird)
> handles this kind of markup very well. Having a look at its source
> might be useful (don't ask me where exactly, though!).
>
> I see that you don't define any whitespace in your grammar. Maybe
> dealing with the input line by line could make things simpler?
>
> What about enabling backtracking? Or why not define a non-greedy (.)+
> rule for anychars? I think the latter would match when the other
> rules don't. I'm not 100% sure, but my impression is that the
> generated parser behaves a bit differently when the loop is inlined
> than when it's in a separate rule.
>
> Tell me what that gives:
>
> phrase
>     : bolded
>     | underlined
>     | ( options {greedy=false;} : .)+
>     ;
>
> MA
>
> Collin VanDyck wrote:
> > Hello
> >
> > I'm trying to evaluate ANTLR to determine whether or not it would be a
> > good fit for a wiki that we're currently developing.
> >
> > Essentially, the question boils down to how elegantly it would handle a
> > wide variety of somewhat unstructured input. In other words, users are
> > going to be entering fairly freeform content (e.g. copying and
> > pasting from Word or some other character source), and I want ANTLR to
> > be able to accept all of the input but match special sequences.
> >
> > An example of this would be:
> >
> > "This is some *bold* wiki content that might also be _underlined_ in
> > places"
> >
> > The grammar would simply output each character that doesn't fall
> > into a special rule, and recognize *bold* and _underlined_
> > specially.
> >
> > I've written a small ANTLR grammar which is able to parse this, but
> > fails pretty quickly when you do things like:
> >
> > "This is some *irregular** input_"
> >
> > In the latter case, I'd really just like the first *irregular* to be
> > parsed as a bolded word and, since the other characters don't have
> > closing symbols, to treat them as ordinary characters like 'a', 'b',
> > 'c', etc.
> >
> > Is it possible and reasonable to use ANTLR for this purpose? Can I
> > create a grammar which will accept ANYTHING, and simply be able to parse
> > out the bits and pieces that are interesting?
> >
> > I'm pasting in the grammar I created. I apologize in advance for
> > its incorrectness.
> >
> > -Collin
> >
> > ------------------
> >
> > grammar WikiGrammar;
> >
> > wiki
> >     : phrase+
> >     ;
> >
> > phrase
> >     : bolded
> >     | underlined
> >     | anychars
> >     ;
> >
> > bolded
> >     : ASTERISK phrase ASTERISK
> >     ;
> >
> > underlined
> >     : UNDERSCORE phrase UNDERSCORE
> >     ;
> >
> > anychars
> >     : (CHAR)+
> >     ;
> >
> > UNDERSCORE
> >     : '_'
> >     ;
> >
> > ASTERISK
> >     : '*'
> >     ;
> >
> > CHAR
> >     : .
> >     ;
-----
Collin VanDyck
CTO - Hannon Hill