[antlr-interest] Beginner Parsing wiki markup

Prashant Deva prashant.deva at gmail.com
Sat Apr 22 02:19:24 PDT 2006


From a quick look, it seems your lexer is not coded properly: you are not
ignoring whitespace. Also, I would move the 'word' rule from your parser
into the lexer.
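
As a rough sketch of what that could look like (ANTLR 2.7 syntax, C++ target
as in the quoted grammar): the WORD rule, the literal-based protocol rule and
the skip action below are assumptions for illustration, not something taken
from the original grammar, and all other rules are left out. The warning on
domain_name most likely arises because both "name DOT" and the final "name"
can begin with any number of word tokens, so no fixed k can tell them apart.
Once a whole run of letters and digits is a single token, the rule can be
left-factored.

```antlr
// Sketch only: fragments, not a complete grammar.

class WikiParser extends Parser;
options {
        buildAST = true;
        exportVocab = WIKI;
}

// "http" and "ftp" become parser literals. The lexer matches them as
// WORD and the literals table then changes the token type.
protocol
        :       "http" | "ftp"
        ;

// Left-factored: after each WORD the only choice is DOT or exit.
domain_name
        :       WORD ( DOT WORD )+
        ;

class WikiLexer extends Lexer;
options {
        exportVocab = WIKI;
        testLiterals = false;   // only test rules that ask for it
}

// One token per run of letters and digits, not one per character.
WORD
        options {
                testLiterals = true;
        }
        :       ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+
        ;

// Whitespace is matched but never handed to the parser.
WS      :       ( ' ' | '\t' )+
                { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
        ;

DOT     :       '.'
        ;
```

Note that if whitespace is skipped this way, the WS references in the url and
link rules have to go, and the link text is then seen as a sequence of WORD
tokens without the spaces between them. If the spaces matter, keep WS as a
real token instead and only apply the WORD change.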

I would suggest downloading an evaluation copy of ANTLR Studio (placidsystems.com)
and using its lexer wizard to create the lexer quickly and without any errors.


--
Prashant Deva
Creator, ANTLR Studio
Founder, Placid Systems, www.placidsystems.com

On 4/22/06, pepone pepone <pepone.onrez at gmail.com> wrote:
>
> I am trying to build a lexer and parser for a wiki-like language.
>
> I'm trying to parse links like [[http://www.google.com || google]]
>
> The problem is that I don't know how to match www.google.com
>
> I tried
> domain_name:
> (name(DOT))+ name
> ;
>
>
> but when I compile I get a warning like
>
>
> ANTLR Parser Generator   Version 2.7.5 (20060420)   1989-2005 jGuru.com
> wikigramar.g:28: warning:nondeterminism upon
> wikigramar.g:28:     k==1:DIGIT,LETTER
> wikigramar.g:28:     k==2:DIGIT,LETTER
> wikigramar.g:28:     k==3:DIGIT,LETTER
> wikigramar.g:28:     k==4:WS,OPTION_SEPARATOR,DIGIT,LETTER
> wikigramar.g:28:     k==5:WS,OPTION_SEPARATOR,NEWLINE,DIGIT,LETTER
> wikigramar.g:28:     k==6:WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28:     k==7:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28:     k==8:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28:     k==9:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28:     k==10:EOF,WS,OPTION_SEPARATOR,NEWLINE,WIKI_TAG_END,DIGIT,LETTER
> wikigramar.g:28:     between alt 1 and exit branch of block
>
>
> /*===grammar begin====*/
>
> header {
>         #include <sstream>
>         #include <iostream>
>         #include <qdom.h>
> }
>
> options {
>         language="Cpp";
> }
>
> class WikiParser extends Parser;
>
> options {
>         buildAST = true;
>         exportVocab=WIKI;
>         k = 10;
> }
>
> protocol:
>         (HTTP_PROTOCOL)|(FTP_PROTOCOL)
>         ;
>
> name:
>         (word)+
>         ;
> domain_name:
>         (name(DOT))+(name)
>         ;
>
> url:
>         protocol(URL_SEPARATOR) (domain_name) (WS)? (OPTION_SEPARATOR)
>         ;
>
> link:
>         (WIKI_TAG_BEGIN^
>         (url)?(word|NEWLINE|WS)+
>         WIKI_TAG_END)
>         ;
>
> word:
>         ((DIGIT)|(LETTER))
>         ;
>
> /**
> * Lexer
> */
> class WikiLexer extends Lexer;
> options {
>         k = 7;
>         exportVocab=WIKI;
> }
>
>
> DIGIT: ('0'..'9');
>
> LETTER: ('a'..'z')|('A'..'Z');
>
> NEWLINE
>         options {
>                 generateAmbigWarnings=false;
>         }
>         :       '\r' | '\n';
>
> WS: ' '|'\t';
>
> WIKI_TAG_BEGIN:
>         "[["
>         ;
>
> WIKI_TAG_END:
>         "]]"
>         ;
>
> FTP_PROTOCOL:
>         "ftp"
>         ;
>
> HTTP_PROTOCOL:
>         "http"
>         ;
>
> URL_SEPARATOR:
>         "://"
>         ;
> DOT:
>         '.'
>         ;
>
> SLASH:
>         '/'
>         ;
>
> OPTION_SEPARATOR:
>         "||"
>         ;
>
> /*=======grammar end=================*/
>
> --
> play tetris http://pepone.on-rez.com/tetris
> run gentoo http://gentoo-notes.blogspot.com/
>