[antlr-interest] Is ANTLR suitable for wiki grammar parsing?
Collin VanDyck
collin.vandyck at hannonhill.com
Tue Jun 5 07:50:58 PDT 2007
Hi Tom,
Thanks for clarifying that! That helps out quite a bit. I have made
some progress with the grammar. Using lexer rules to define ASTERISK
('*'), UNDERLINE ('_'), and CHAR (.), I am able to parse

    Hello

and it matches the "phrase" parser rule for each of the five
characters in the input stream.
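
As a rough illustration of what those three lexer rules do to the
input, here is a sketch in plain Python rather than ANTLR-generated
code. The function name tokenize and the (type, text) tuple layout are
assumptions made only for this example:

```python
# Hypothetical stand-in for the ASTERISK, UNDERLINE and CHAR lexer rules.
# This is not ANTLR-generated code; it only mimics their effect.
ASTERISK, UNDERLINE, CHAR = "ASTERISK", "UNDERLINE", "CHAR"


def tokenize(text):
    """Turn each input character into exactly one (type, text) token."""
    tokens = []
    for ch in text:
        if ch == "*":
            tokens.append((ASTERISK, ch))
        elif ch == "_":
            tokens.append((UNDERLINE, ch))
        else:
            # CHAR : . is the catch-all for every other character
            tokens.append((CHAR, ch))
    return tokens


if __name__ == "__main__":
    print(tokenize("Hello"))
```

Every character becomes exactly one token, so "Hello" produces five
CHAR tokens, and the parser rules only ever see tokens, never raw
characters.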
Moving on to something like

    *bold*

it fails with ( options {greedy=false;} : CHAR)+, throwing a
mismatched token exception. It matches '*' for the start of the
'bolded' parser rule, and then 'o' for the last alternative in the
'phrase' rule, but then fails because the next character is not a '*'.
Changing the last alternative in the phrase rule to ( options
{greedy=true;} : CHAR)+ solves this.
However, I cannot match something like

    *bold* abc*de

because there is no closing '*' after "de".

I think that this is essentially my problem. I do want input like

    *bold* abc*de

to be accepted. I'd like the *bold* to be matched by the bolded
parser rule, and, since the rest of the line doesn't match that rule,
for abc*de to simply count as a regular phrase.

Is this possible?

grammar WikiGrammar;

wiki
    : phrase+
    ;

phrase
    : bolded
    | underlined
    | ( options {greedy=true;} : CHAR)+
    ;

bolded
    : ASTERISK phrase ASTERISK
    ;

underlined
    : UNDERLINE phrase UNDERLINE
    ;

ASTERISK
    : '*'
    ;

UNDERLINE
    : '_'
    ;

CHAR
    : .
    ;
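
The fallback asked about above (accept a delimited span when its
closing delimiter exists, otherwise treat the delimiter as ordinary
text) can be sketched by hand. This is a minimal backtracking sketch
in Python, not ANTLR output and not a claim about how ANTLR resolves
the grammar; the names parse, _phrases, _marked and the tuple node
shape are hypothetical:

```python
# Hand-written backtracking sketch of the desired wiki-markup behaviour.
# Nodes are ("text", str) or ("bold" | "underline", [child nodes]).
MARKS = {"*": "bold", "_": "underline"}


def parse(text):
    """Parse the whole input into a list of nodes."""
    nodes, _ = _phrases(text, 0, ())
    return nodes


def _append_text(nodes, ch):
    """Add a literal character, merging it into a preceding text node."""
    if nodes and nodes[-1][0] == "text":
        nodes[-1] = ("text", nodes[-1][1] + ch)
    else:
        nodes.append(("text", ch))


def _marked(text, pos, closers):
    """Try to match mark, phrases, mark at pos; return None on failure."""
    mark = text[pos]
    inner, end = _phrases(text, pos + 1, closers + (mark,))
    if inner and end < len(text) and text[end] == mark:
        return (MARKS[mark], inner), end + 1
    return None  # no usable closing delimiter: caller backtracks


def _phrases(text, pos, closers):
    """Parse phrases until end of input or a delimiter listed in closers."""
    nodes = []
    while pos < len(text) and text[pos] not in closers:
        ch = text[pos]
        span = _marked(text, pos, closers) if ch in MARKS else None
        if span is not None:
            node, pos = span
            nodes.append(node)
        else:
            # Plain character, or a delimiter that could not be closed
            _append_text(nodes, ch)
            pos += 1
    return nodes, pos


if __name__ == "__main__":
    print(parse("*bold* abc*de"))
```

With this sketch, parse("*bold* abc*de") gives a bold node containing
the text "bold", followed by a single text node " abc*de". Retrying
after a failed span can rescan the same input several times, so it is
meant to show the intended result rather than an efficient strategy.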
On Jun 5, 2007, at 10:27 AM, Thomas Brandon wrote:
> As phrase is a parser rule "." means any token rather than any
> character, as your only tokens are '*' and '_' this is all that
> will be matched. You need a lexer rule to deal with other characters.
>
> Tom.
>
> On 6/5/07, Collin VanDyck <collin.vandyck at hannonhill.com> wrote:
>
> Hi
>
> Thanks for your reply. I'll admit, even after reading the PDF, I'm a
> little confused about how to accomplish what I want. I tried using
> your suggestion with this grammar:
>
> grammar WikiGrammar;
>
> wiki
>     : phrase+
>     ;
>
> phrase
>     : bolded
>     | underlined
>     | ( options {greedy=false;} : .)+
>     ;
>
> bolded
>     : '*' phrase '*'
>     ;
>
> underlined
>     : '_' phrase '_'
>     ;
>
>
> With the input
>
>     "Hello"
>
> I got a NoViableAltException.
>
> I'm not sure exactly how to accomplish this. Essentially, I just
> want to be able to spit out whatever input I receive, and be able to
> recognize recursive markup patterns. Any ideas on how I can get this
> example (with bold and underline) to do this?
>
> Many thanks
> Collin
>
>
> -----
> Collin VanDyck
> CTO - Hannon Hill
>
>
>
-----
Collin VanDyck
CTO - Hannon Hill