[antlr-interest] Is ANTLR suitable for wiki grammar parsing?

Collin VanDyck collin.vandyck at hannonhill.com
Tue Jun 5 07:50:58 PDT 2007


Hi Tom,

Thanks for clarifying that!  That helps quite a bit.  I have made
some progress with the grammar. Using lexer rules to define ASTERISK
('*'), UNDERLINE ('_'), and CHAR (.), I am able to parse:

Hello

And it matches the "phrase" parser rule for each of the five  
characters in the input stream.
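
To make the tokenization concrete, here is a small Python sketch (not
ANTLR output) of how the three lexer rules would split the input. The
function name tokenize and the (name, text) tuple layout are made up
for illustration; only the token names come from the grammar further
down.

```python
def tokenize(text):
    """Mimic the three lexer rules: '*' -> ASTERISK, '_' -> UNDERLINE,
    any other single character -> CHAR."""
    names = {"*": "ASTERISK", "_": "UNDERLINE"}
    return [(names.get(ch, "CHAR"), ch) for ch in text]


if __name__ == "__main__":
    print(tokenize("Hello"))   # five CHAR tokens
    print(tokenize("*bold*"))  # ASTERISK, four CHAR tokens, ASTERISK
```

Since every character becomes its own token, the parser rules see one
CHAR per letter, which is why "Hello" reaches the phrase rule as five
separate tokens.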

Moving on to something like

*bold*

It fails here with ( options {greedy=false;}  : CHAR)+, with a
mismatched token exception.  It matches '*' for the start of the
'bolded' parser rule, and then 'b' for the last alternative in the
'phrase' rule, but then fails because the next character is not a '*'.

Changing the last alternative in the phrase rule to ( options
{greedy=true;} : CHAR)+ solves this.

However, I cannot match something like:

*bold* abc*de

It fails because there is no closing '*' after de.

And I think that this is essentially my problem.  I do want something  
like

*bold* abc*de

to be accepted. I'd like the *bold* to be matched by the bolded
parser rule and, since the rest of the line doesn't match any markup
rule, for abc*de to simply count as a regular phrase.

Is this possible?



grammar WikiGrammar;

wiki
	: phrase+
	;

phrase
	: bolded
	| underlined
	| ( options {greedy=true;} : CHAR)+
	;
	
bolded
	: ASTERISK phrase ASTERISK
	;
	
underlined
	: UNDERLINE phrase UNDERLINE
	;
	
ASTERISK
	: '*'
	;
	
UNDERLINE
	: '_'
	;

CHAR
	: .
	;
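
The behavior being asked for, sketched in Python rather than ANTLR so
it can be run directly. This is only an illustration of the idea (try
the markup rule, and fall back to plain text if the closing marker
never shows up), not a claim about how ANTLR does or should implement
it. The names parse, _parse_until and _append_text are hypothetical,
and unlike the grammar above the sketch allows several phrases between
a pair of markers.

```python
MARKERS = {"*": "bold", "_": "underline"}


def _append_text(nodes, ch):
    """Append ch to the trailing text node, or start a new text node."""
    if nodes and nodes[-1][0] == "text":
        nodes[-1] = ("text", nodes[-1][1] + ch)
    else:
        nodes.append(("text", ch))


def _parse_until(text, pos, closer):
    """Parse nodes from pos up to closer (or to end of input if closer
    is None). Returns (nodes, index_of_closer), or None when a closer
    was required but never found."""
    nodes = []
    while pos < len(text):
        ch = text[pos]
        if ch == closer:
            return nodes, pos
        if ch in MARKERS:
            # Speculatively try the markup rule for this marker.
            attempt = _parse_until(text, pos + 1, ch)
            if attempt is not None and attempt[0]:
                inner, end = attempt
                nodes.append((MARKERS[ch], inner))
                pos = end + 1
                continue
        # Ordinary character, or a marker with no partner: plain text.
        _append_text(nodes, ch)
        pos += 1
    return (nodes, pos) if closer is None else None


def parse(text):
    """Return a list of ("bold" | "underline", children) and
    ("text", string) nodes for the whole input."""
    nodes, _ = _parse_until(text, 0, None)
    return nodes


if __name__ == "__main__":
    print(parse("*bold* abc*de"))
```

For *bold* abc*de this gives a bold node containing the text "bold",
followed by a single text node " abc*de". The repeated speculative
attempts can be slow on inputs with many unmatched markers, so this is
a description of the desired result more than a practical parser.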
	
	


On Jun 5, 2007, at 10:27 AM, Thomas Brandon wrote:

> As phrase is a parser rule, "." means any token rather than any
> character. Since your only tokens are '*' and '_', those are all
> that will be matched. You need a lexer rule to deal with other
> characters.
>
> Tom.
>
> On 6/5/07, Collin VanDyck <collin.vandyck at hannonhill.com> wrote:
>
> Hi
>
> Thanks for your reply. I'll admit, even after reading the PDF, I'm a
> little confused on how to accomplish what I want. I tried using your
> suggestion, and tried this grammar:
>
> grammar WikiGrammar;
>
> wiki
>         : phrase+
>         ;
>
> phrase
>         : bolded
>         | underlined
>         | ( options {greedy=false;} : .)+
>         ;
>
> bolded
>         : '*' phrase '*'
>         ;
>
> underlined
>         : '_' phrase '_'
>         ;
>
>
> With the input
>
> "Hello"
>
> And I got the NoViableAltException.
>
> I'm a little unsure how exactly to accomplish this.  Essentially, I
> just want to be able to spit out whatever input I receive, and be
> able to recognize recursive markup patterns.  Any ideas on how I can
> get this example (with bold and underline) to do this?
>
> Many thanks
> Collin
>
>
> -----
> Collin VanDyck
> CTO - Hannon Hill
>
>
>



-----
Collin VanDyck
CTO - Hannon Hill



