[antlr-interest] simple URL extractor

bace.spam at gmx.net bace.spam at gmx.net
Tue Apr 24 00:34:55 PDT 2007


Hi all,

I want to extract a URL from a text with ANTLR v3. To separate the URL from the rest of the text, I want to search for each occurrence of 'http://'.
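
To make clear what result I am after, here is a minimal sketch in plain Java, without ANTLR. The class and method names are made up for illustration, and it assumes that a URL ends at the next whitespace character, which is only a simplification and not part of my grammar.

```java
import java.util.ArrayList;
import java.util.List;

class UrlScan {
    // Hypothetical helper: collects every substring that starts with "http://".
    // Assumption: a URL ends at the next whitespace character or at end of input.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        int start = text.indexOf("http://");
        while (start >= 0) {
            int end = start;
            // Advance to the first whitespace character after the URL start.
            while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                end++;
            }
            urls.add(text.substring(start, end));
            // Continue searching after the URL that was just found.
            start = text.indexOf("http://", end);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extractUrls("text http://www.goolge.com/index.html further text"));
    }
}
```

The grammar should give me the same URL, but additionally split into host, port and path.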

So I defined the lexer rule:
HTTP_INDICATOR : 'http://';

and the parser rule:
url : HTTP_INDICATOR host (port)? (SLASH path)*;


If I use this definition and input something like
'text http://www.goolge.com/index.html further text'
then the parser doesn't work as I expected. The error message says that 't' was expected instead of 'm'. (It seems to try to match the 'html' against 'http://'.) But why?
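
To illustrate the symptom, here is a hand-written Java sketch of a matcher that commits to 'http://' as soon as it sees an 'h' and never backtracks. This is not the code ANTLR generates and I do not know whether the generated lexer really works this way; the class and method names are made up. It only reproduces the kind of error I get.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOnFirstChar {
    static final String LITERAL = "http://";

    // Hypothetical matcher: on every 'h' it commits to LITERAL and reports
    // a mismatch instead of backing up and trying something else.
    static List<String> scan(String input) {
        List<String> events = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != LITERAL.charAt(0)) {
                i++; // not an 'h': skip the character
                continue;
            }
            // Count how many characters of LITERAL match from position i.
            int j = 0;
            while (j < LITERAL.length() && i + j < input.length()
                    && input.charAt(i + j) == LITERAL.charAt(j)) {
                j++;
            }
            if (j == LITERAL.length()) {
                events.add("matched 'http://' at " + i);
            } else if (i + j < input.length()) {
                events.add("at " + (i + j) + ": expected '" + LITERAL.charAt(j)
                        + "' but found '" + input.charAt(i + j) + "'");
            } else {
                events.add("at end of input: expected '" + LITERAL.charAt(j) + "'");
            }
            i += j; // resume at the first character that was not consumed
        }
        return events;
    }

    public static void main(String[] args) {
        for (String e : scan("text http://www.goolge.com/index.html further text")) {
            System.out.println(e);
        }
    }
}
```

With my input, this sketch first matches the real 'http://' and then complains inside 'html' that 't' was expected but 'm' was found, which looks like the message I see.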

Does anyone have an idea how I can tell the lexer to search for the 'http://'?

And when I tried to put the 'http://' literal directly in the parser rule (instead of HTTP_INDICATOR), I got an exception. Is it true that I cannot use literals in parser rules (I got an exception every time)? But the examples for ANTLR v3 do use literals in parser rules?!

Best,
Markus