[antlr-interest] Handling wiki-style plain text with optional markup?
Peter Bertok
peter.bertok at experteq.com
Wed Apr 1 18:18:02 PDT 2009
I'm still struggling to get this working.
For example, a Wikipedia "title" looks like this:
STARTOFLINE == title text == WHITESPACE? EOL
There are a bunch of other control characters ('*', '<', etc.) that can also appear inside a title string. I need to be able to parse this:
STARTOFLINE == title about == signs and * codes == EOL
It should be parsed as a title with the text "title about == signs and * codes". The following, on the other hand, should be parsed as a plain-text paragraph:
STARTOFLINE Some text about == signs EOL
The problem is that every grammar I have come up with either parses everything as a paragraph and never creates a "title", or can parse simple titles but gets stuck on a title with an embedded '=='.
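One way to avoid getting stuck on the embedded '==' is to look at the whole line before choosing a rule. A line is a title only if it starts with '==' and the *last* '==' on the line is followed by nothing but optional whitespace. The following is a minimal hand-written Python sketch of that decision, not ANTLR code. It assumes a single title level marked by exactly two '=' signs, and the names `TITLE` and `classify_line` are hypothetical. In an ANTLR 3 grammar the rough analogue would be to try the title alternative first and fall back to the paragraph alternative (for example with a syntactic predicate or backtracking), but the sketch does not show that.

```python
import re

# Hypothetical single-level title pattern: "==" at the very start of the line,
# then any text, then the LAST "==" on the line, then optional whitespace.
# The greedy (.*) is what keeps an embedded "==" inside the title text.
TITLE = re.compile(r"^==(.*)==\s*$")


def classify_line(line: str) -> tuple[str, str]:
    """Return ("title", text) or ("paragraph", line) for one line without its EOL."""
    m = TITLE.match(line)
    if m:
        text = m.group(1).strip()
        if text:  # treat "====" or "== ==" as plain text rather than an empty title
            return ("title", text)
    # Nothing else matched: the whole line is a plain-text paragraph.
    return ("paragraph", line)


print(classify_line("== title about == signs and * codes =="))
# ('title', 'title about == signs and * codes')
print(classify_line("Some text about == signs"))
# ('paragraph', 'Some text about == signs')
```

Because the decision is made per line, a line that merely contains '==' somewhere in the middle never starts a title, and a title's text may contain '==' or '*' freely.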
You can try this behaviour in the Wiki sandbox: http://en.wikipedia.org/wiki/Wikipedia:SANDBOX
Basically, I want similar behaviour: if no other rule matches, the entire line is a plain-text paragraph.
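The "if no other rule matches" behaviour amounts to an ordered list of line rules with a paragraph fallback at the end. The Python sketch below is illustrative only. `title_rule` reuses the "last '==' on the line" idea, and `bullet_rule` ('* ' at the start of a line) is a hypothetical extra rule included just to show rule priority. None of these names come from ANTLR or MediaWiki.

```python
import re
from typing import Callable, Optional

# A line rule returns a (kind, text) node, or None if it does not apply.
LineRule = Callable[[str], Optional[tuple[str, str]]]


def title_rule(line: str) -> Optional[tuple[str, str]]:
    # Greedy match: the title text runs up to the LAST "==" on the line.
    m = re.match(r"^==(.*)==\s*$", line)
    if m and m.group(1).strip():
        return ("title", m.group(1).strip())
    return None


def bullet_rule(line: str) -> Optional[tuple[str, str]]:
    # Hypothetical extra rule: "* item" at the start of a line.
    if line.startswith("* "):
        return ("bullet", line[2:].strip())
    return None


RULES: list[LineRule] = [title_rule, bullet_rule]  # earlier rules win


def parse_lines(text: str) -> list[tuple[str, str]]:
    nodes = []
    for line in text.splitlines():
        for rule in RULES:
            node = rule(line)
            if node is not None:
                nodes.append(node)
                break
        else:
            # No rule matched: the whole line is a plain-text paragraph.
            nodes.append(("paragraph", line))
    return nodes
```

For example, `parse_lines("== Intro ==\n* first point\nSome text about == signs")` yields a title, a bullet and a paragraph node, in that order. Since the fallback sits outside the rule list, adding new rules can never make parsing fail.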
Peter Bertok
Technical Consultant
Experteq IT Services
peter.bertok at experteq.com
Mobile: 61 402 994404
PO Box 445, St Leonards, NSW 2065
www.experteq.com
From: Indhu Bharathi [mailto:indhu.b at s7software.com]
Sent: Wednesday, 1 April 2009 8:09 PM
To: Peter Bertok
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Handling wiki-style plain text with optional markup?
Take a look at the 'filter' mode of the ANTLR lexer. It will let you skip text you are not interested in.
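As a rough picture of what ANTLR 3's lexer filter mode (`options { filter=true; }`) does: at each input position it tries the token rules in order, and when none of them matches it skips a character and tries again. The Python sketch below imitates that scanning loop so the behaviour is easy to see. It is an illustration, not ANTLR's implementation, and the `LINK` and `BOLD` token patterns are hypothetical wiki markup chosen for the example.

```python
import re

# Hypothetical token patterns, tried in order at each input position.
TOKEN_RULES = [
    ("LINK", re.compile(r"\[\[([^\]]+)\]\]")),
    ("BOLD", re.compile(r"'''(.+?)'''")),
]


def filter_tokens(text: str) -> list[tuple[str, str]]:
    """Emit tokens for recognised markup and silently skip everything else."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)  # anchored at pos, not at 0
            if m:
                tokens.append((name, m.group(1)))
                pos = m.end()
                break
        else:
            pos += 1  # no rule matched here: skip one character, as filter mode does
    return tokens
```

For example, `filter_tokens("See [[Main Page]] and '''this'''.")` returns `[('LINK', 'Main Page'), ('BOLD', 'this')]`. For a wiki renderer the skipped characters would normally have to be kept as plain text rather than discarded.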
Posting a more specific example of what you need might help.
Peter Bertok wrote:
I'm working on a trivial embedded "wiki" style content management system for a web project, and I'm trying to design an ANTLR grammar for it, but I'm getting stuck.
How does one write a grammar that can match a variety of rules, including things like nested brackets, but that always falls through to a rule called something like "plainText" whenever it can't match a section of the input, so that parsing never fails? I'd like it to take anything it doesn't understand and simply pass it through as a text node.
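The fall-through idea can be sketched as a small recursive-descent parser in which any alternative that fails simply degrades to text, so parsing cannot fail. This is a hand-written Python sketch, not an ANTLR grammar. The `Text` and `Group` node types, and the use of '[' and ']' as the only bracket pair, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Text:
    value: str


@dataclass
class Group:
    # Hypothetical node for a bracketed "[ ... ]" region, which may nest.
    children: list["Node"] = field(default_factory=list)


Node = Union[Text, Group]


def _append_text(nodes: list[Node], s: str) -> None:
    # Merge adjacent characters so plain text comes out as one Text node.
    if nodes and isinstance(nodes[-1], Text):
        nodes[-1].value += s
    else:
        nodes.append(Text(s))


def _parse_seq(src: str, pos: int, in_group: bool) -> tuple[list[Node], int, bool]:
    """Parse from pos; return (nodes, next_pos, closed).

    closed is True only when in_group is True and a matching ']' was consumed.
    """
    nodes: list[Node] = []
    while pos < len(src):
        ch = src[pos]
        if ch == "]" and in_group:
            return nodes, pos + 1, True
        if ch == "[":
            children, end, closed = _parse_seq(src, pos + 1, in_group=True)
            if closed:
                nodes.append(Group(children))
                pos = end
                continue
            # No matching ']': fall through and treat '[' as plain text.
        _append_text(nodes, ch)  # anything unrecognised is plain text
        pos += 1
    return nodes, pos, False


def parse(src: str) -> list[Node]:
    nodes, _, _ = _parse_seq(src, 0, in_group=False)
    return nodes
```

For example, `parse("x [a [b] c] y")` gives a text node, a group containing a nested group, and a trailing text node. An unmatched '[' or a stray ']' just ends up inside a `Text` node. Retrying after an unmatched '[' makes this quadratic in the worst case, which is probably acceptable for short wiki fragments.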