[antlr-interest] Handling wiki-style plain text with optional markup?

Wed Apr 1 18:18:02 PDT 2009

I'm still struggling to get this working.

For example, a Wikipedia "title" looks like this:

STARTOFLINE == title text == WHITESPACE? EOL

There are also a number of other control characters ('*', '<', etc.) that can appear inside a title string. I need to be able to parse this:

                STARTOFLINE == title about == signs and * codes == EOL

As a title with the text "title about == signs and * codes", and the following as a plain-text paragraph:

                STARTOFLINE Some text about == signs EOL

The problem is that every grammar I have come up with either parses everything as a paragraph and never ends up creating a "title", or can parse simple titles but becomes stuck on a title with an embedded '=='.
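The title rule described above can be sketched in plain Python with a regular expression instead of an ANTLR grammar, so treat this only as an illustration of one matching strategy. The opening `==` is anchored at the start of the line and the closing `==` at the end, and the text in between is matched greedily so that it backtracks to the *last* `==` on the line. An embedded `==` therefore stays in the title text, and a line that does not start with `==` never becomes a title. The name `parse_title` is hypothetical, and only the level-2 `== ... ==` form from the example is handled.

```python
import re
from typing import Optional

# Hypothetical helper: handles only the level-2 "== text ==" form from the example.
#   ^==         opening marker at the very start of the line
#   (.+)        greedy, so the regex backtracks to the LAST "==" on the line
#   ==[ \t]*$   closing marker, optionally followed by trailing whitespace
TITLE_RE = re.compile(r"^==(.+)==[ \t]*$")


def parse_title(line: str) -> Optional[str]:
    """Return the title text if the whole line is a title, otherwise None."""
    m = TITLE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    text = m.group(1).strip()
    # "== ==" contains no title text, so it is not treated as a title.
    return text or None


print(parse_title("== title about == signs and * codes =="))  # -> title about == signs and * codes
print(parse_title("Some text about == signs"))                 # -> None
```

The same idea carries over to a grammar: the title's closing delimiter is only recognised when it is followed by optional whitespace and the end of the line, not at the first `==` encountered.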

You can try out this behaviour in the Wikipedia sandbox: http://en.wikipedia.org/wiki/Wikipedia:SANDBOX

Basically, I want similar behaviour: if no other rule matches, the entire line is treated as a plain-text paragraph.
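The "if nothing else matches, it's a paragraph" behaviour can be sketched as an ordered list of line rules with an unconditional default at the end. This is a hand-written Python sketch, not ANTLR; the `(kind, text)` node shape, the function names, and the `*` bullet rule are assumptions added for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical node shape: (kind, text).
Node = Tuple[str, str]


def title_rule(line: str) -> Optional[Node]:
    # "== text ==", greedy so an embedded "==" stays in the title text.
    m = re.match(r"^==(.+)==[ \t]*$", line)
    if m and m.group(1).strip():
        return ("title", m.group(1).strip())
    return None


def bullet_rule(line: str) -> Optional[Node]:
    # Assumed extra rule for illustration: a line starting with "*" is a list item.
    m = re.match(r"^\*\s*(.*)$", line)
    if m:
        return ("bullet", m.group(1).strip())
    return None


# Rules are tried in this order; the first one that recognises the line wins.
RULES: List[Callable[[str], Optional[Node]]] = [title_rule, bullet_rule]


def classify(line: str) -> Node:
    for rule in RULES:
        node = rule(line)
        if node is not None:
            return node
    # Fall-through: any line no rule recognises is a plain-text paragraph,
    # so classification can never fail.
    return ("paragraph", line)


def parse(text: str) -> List[Node]:
    # splitlines() drops the line terminators; blank lines are skipped here.
    return [classify(line) for line in text.splitlines() if line.strip()]


doc = "== title about == signs and * codes ==\nSome text about == signs\n* an item\n"
for node in parse(doc):
    print(node)
```

Because each rule must recognise the *whole* line or return nothing, a half-matched title can never leave the parser stuck; the line simply drops through to the paragraph default.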

Peter Bertok
Technical Consultant
Experteq IT Services

peter.bertok at experteq.com
Mobile: 61 402 994404
PO Box 445, St Leonards, NSW 2065
www.experteq.com

From: Indhu Bharathi [mailto:indhu.b at s7software.com]
Sent: Wednesday, 1 April 2009 8:09 PM
To: Peter Bertok
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Handling wiki-style plain text with optional markup?

Take a look at the 'filter' mode of the ANTLR lexer. It lets you skip text you are not interested in.
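As I understand it, in filter mode the lexer tries its rules at each input position and, when none of them match, skips a character and tries again. The sketch below only emulates that idea in Python; it is not ANTLR's implementation, and the token names and patterns (`LINK`, `BOLD`) are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical token patterns, tried in this order at every input position.
TOKEN_PATTERNS = [
    ("LINK", re.compile(r"\[\[([^\[\]]+)\]\]")),  # [[target]]
    ("BOLD", re.compile(r"'''([^']+)'''")),        # '''bold text'''
]


def filter_tokens(text: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)  # anchored at pos, not a search
            if m:
                tokens.append((name, m.group(1)))
                pos = m.end()
                break
        else:
            # No pattern matches here: skip one character and try again.
            pos += 1
    return tokens


print(filter_tokens("See [[Main Page]] and '''bold''' text"))
# -> [('LINK', 'Main Page'), ('BOLD', 'bold')]
```

Note that the skipped characters are simply discarded, so filtering on its own does not pass unrecognised input through as text; collecting the skipped characters into text tokens would be needed for that.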

Posting a more specific example of what you need might help.

Peter Bertok wrote:

I'm working on a trivial embedded "wiki" style content management system for a web project, and I'm trying to design an ANTLR grammar for it, but I'm getting stuck.

How does one write a grammar that matches a variety of rules, including nested brackets and the like, but that always has a "fall-through" rule (called something like "plainText") for any section of the input it can't match, so that parsing never fails? I'd like it to take anything it doesn't understand and simply pass it through as a text node.
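The nested-brackets-with-fallback requirement can be sketched outside ANTLR in two passes. First, pair up brackets with a stack. Then build a tree in which every matched `[...]` becomes a group and everything else, including a stray `[` or `]`, is kept as plain text, so the parse cannot fail. This is a hand-written Python sketch; the node shapes `("text", ...)` / `("group", [...])` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical node shapes: ("text", str) or ("group", [child nodes]).
Node = Tuple[str, Union[str, list]]


def match_brackets(text: str) -> Dict[int, int]:
    """Map the index of each matched '[' to the index of its ']'."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(text):
        if ch == "[":
            stack.append(i)
        elif ch == "]" and stack:
            pairs[stack.pop()] = i
    # '[' left on the stack and ']' seen with an empty stack stay unmatched.
    return pairs


def build(text: str, start: int, end: int, pairs: Dict[int, int]) -> List[Node]:
    nodes: List[Node] = []
    buf: List[str] = []
    i = start
    while i < end:
        if text[i] == "[" and i in pairs:
            if buf:
                nodes.append(("text", "".join(buf)))
                buf = []
            close = pairs[i]
            # Matched pairs nest properly, so the group's contents lie in (i, close).
            nodes.append(("group", build(text, i + 1, close, pairs)))
            i = close + 1
        else:
            # Plain characters and unmatched brackets fall through as text.
            buf.append(text[i])
            i += 1
    if buf:
        nodes.append(("text", "".join(buf)))
    return nodes


def parse_inline(text: str) -> List[Node]:
    # Never fails: matched brackets become groups, every other character is text.
    # (Very deep nesting could still exceed Python's recursion limit.)
    return build(text, 0, len(text), match_brackets(text))


print(parse_inline("see [a [b] c] and ] or [ here"))
```

Deciding bracket matching in a separate pass keeps the fallback cheap: the tree builder never has to guess and backtrack when a `[` turns out to have no partner.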
