[antlr-interest] simple URL extractor

Tue Apr 24 03:33:11 PDT 2007

bace.spam at gmx.net wrote:
> Hi all,
> 
> I want to extract a URL from a text with antlr v3. To separate the URL from the remaining text, I want to search for each occurrence of 'http://'.
> 
> So I defined the lexer rule:
> HTTP_INDICATOR : 'http://';
> 
> and parser rule:
> url : HTTP_INDICATOR host (port)? (SLASH path)*;
> 
> 
> If I use this definition and input something like
> 'text http://www.goolge.com/index.html further text'
> then the parser doesn't work as I imagined. The error message says that 't' was expected instead of 'm'. (The parser wants to match the 'html' with 'http://'.) But why?
> 
> Does anyone have an idea how I can tell the lexer to search for the 'http://'?

I suppose that you need to set the option filter=true; so that all text
that is not of interest to you is implicitly discarded. Otherwise your
first grammar looks fine.
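
A minimal sketch of what that could look like, assuming a standalone
lexer grammar. The names UrlFilter and URL are made up for illustration,
and the URL rule simply takes everything up to the next whitespace
instead of splitting host, port and path:

```antlr
lexer grammar UrlFilter;

// Filter mode: the lexer tries the rules below in order and skips
// any input character at which none of them matches.
options { filter=true; }

// Hypothetical rule: 'http://' followed by one or more characters
// that are not whitespace.
URL
    : 'http://' (~(' '|'\t'|'\r'|'\n'))+
    ;
```

With your sample input this should leave a single URL token and drop
'text' and 'further text', but I have not run it against your grammar.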

> And when I tried to put the 'http://' in this parser rule (instead of
> the HTTP_INDICATOR), I got an exception. Is it true that I cannot use
> literals in parser rules (I got an exception every time)? But the
> examples for antlr v3 do use literals in parser rules?!

Parser rules may contain literals, BUT not exclusively! You have to call
another rule in a parser rule; otherwise it is a lexer rule.

Best regards,
Johannes Luber