[antlr-interest] Beginner Parsing wiki markup
Prashant Deva
prashant.deva at gmail.com
Sat Apr 22 02:19:24 PDT 2006
From a quick look, it seems your lexer is not set up properly.
You are not ignoring whitespace.
Also, I would move the 'word' rule from your parser into the lexer.
I would suggest downloading an evaluation copy of ANTLR Studio (placidsystems.com)
and using its lexer wizard to create the lexer quickly and without errors.
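
For illustration, the two lexer suggestions above could look roughly like the
fragment below. It is only a sketch in ANTLR 2.7 syntax for the C++ target used
in the quoted grammar; the token name WORD and the reshaped domain_name rule are
assumptions, not taken from the original grammar. The nondeterminism warning
most likely comes from (name DOT)+ name: both another loop iteration and the
final name begin with an arbitrarily long run of DIGIT/LETTER tokens, so no
fixed k can tell them apart. With one WORD token per run, and the loop moved
after the first WORD, a single token of lookahead (DOT or not) should be enough.

```
// Lexer fragment (goes inside "class WikiLexer extends Lexer;").

// WORD is a hypothetical token replacing the parser rule 'word':
// one token per run of letters and digits, not one per character.
WORD : ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;

DOT : '.' ;

// Match whitespace, then mark the token as skipped so that it
// never reaches the parser.
WS : ( ' ' | '\t' )+
     { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
   ;

// Parser fragment (goes inside "class WikiParser extends Parser;").

// After the first WORD, seeing a DOT means "loop again" and anything
// else means "exit", assuming a DOT cannot follow a domain name.
domain_name : WORD ( DOT WORD )+ ;
```

If whitespace is skipped like this, the WS references in the url and link
parser rules would have to be removed. Note also that "http" and "ftp" now
overlap with WORD in the lexer; in ANTLR 2 that kind of overlap is usually
handled through the literals table (the testLiterals lexer option) rather than
with separate lexer rules.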
--
Prashant Deva
Creator, ANTLR Studio
Founder, Placid Systems, www.placidsystems.com
On 4/22/06, pepone pepone <pepone.onrez at gmail.com> wrote:
>
> I am trying to build a lexer and parser for a wiki-like language.
>
> I'm trying to parse links like [[http://www.google.com || google]]
>
> The problem is that I don't know how to match www.google.com
>
> I tried
> domain_name:
> (name(DOT))+(name)
> ;
>
>
> but when I compile I get a warning like
>
>
> ANTLR Parser Generator Version 2.7.5 (20060420) 1989-2005 jGuru.com
> wikigramar.g:28: warning:nondeterminism upon
> wikigramar.g:28: k==1:DIGIT,LETTER
> wikigramar.g:28: k==2:DIGIT,LETTER
> wikigramar.g:28: k==3:DIGIT,LETTER
> wikigramar.g:28: k==4:WS,OPTION_SEPARATOR,DIGIT,LETTER
> wikigramar.g:28: k==5:WS,OPTION_SEPARATOR,NEWLINE,DIGIT,LETTER
> wikigramar.g:28: k==6:WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28: k==7:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28: k==8:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28: k==9:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28: k==10:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28: between alt 1 and exit branch of block
>
>
> /*===grammar begin====*/
>
> header {
> #include <sstream>
> #include <iostream>
> #include <qdom.h>
> }
>
> options {
> language="Cpp";
> }
>
> class WikiParser extends Parser;
>
> options {
> buildAST = true;
> exportVocab=WIKI;
> k = 10;
> }
>
> protocol:
> (HTTP_PROTOCOL)|(FTP_PROTOCOL)
> ;
>
> name:
> (word)+
> ;
> domain_name:
> (name(DOT))+(name)
> ;
>
> url:
> protocol(URL_SEPARATOR) (domain_name) (WS)? (OPTION_SEPARATOR)
> ;
>
> link:
> (WIKI_TAG_BEGIN^
> (url)?(word|NEWLINE|WS)+
> WIKI_TAG_END)
> ;
>
> word:
> ((DIGIT)|(LETTER))
> ;
>
> /**
> * Lexer
> */
> class WikiLexer extends Lexer;
> options {
> k = 7;
> exportVocab=WIKI;
> }
>
>
> DIGIT: ('0'..'9');
>
> LETTER: ('a'..'z')|('A'..'Z');
>
> NEWLINE
> options {
> generateAmbigWarnings=false;
> }
> : '\r' | '\n';
>
> WS: ' '|'\t';
>
> WIKI_TAG_BEGIN:
> "[["
> ;
>
> WIKI_TAG_END:
> "]]"
> ;
>
> FTP_PROTOCOL:
> "ftp"
> ;
>
> HTTP_PROTOCOL:
> "http"
> ;
>
> URL_SEPARATOR:
> "://"
> ;
> DOT:
> '.'
> ;
>
> SLASH:
> '/'
> ;
>
> OPTION_SEPARATOR:
> "||"
> ;
>
> /*=======grammar end=================*/
>
> --
> play tetris http://pepone.on-rez.com/tetris
> run gentoo http://gentoo-notes.blogspot.com/
>