[antlr-interest] simple URL extractor
Johannes Luber
jaluber at gmx.de
Tue Apr 24 03:33:11 PDT 2007
bace.spam at gmx.net wrote:
> Hi all,
>
> I want to extract a URL from text with ANTLR v3. To separate the URL from the rest of the text, I want to search for each occurrence of 'http://'.
>
> So I defined the lexer rule:
> HTTP_INDICATOR : 'http://';
>
> and parser rule:
> url : HTTP_INDICATOR host (port)? (SLASH path)*;
>
>
> If I use this definition and input something like
> 'text http://www.goolge.com/index.html further text'
> then the parser doesn't work as I expected. The error message says that 't' was expected instead of 'm' (it tries to match the 'html' against 'http://'). But why?
>
> Does anyone have an idea how I can tell the lexer to search for 'http://'?
I suppose that you need to set the option filter=true; to implicitly
discard all text you are not interested in. Otherwise your first grammar
looks fine.
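A minimal sketch of a filter-mode grammar, assuming ANTLR v3 with the Java target. As far as I know, filter=true is an option for lexer grammars, so in this sketch the URL is matched by a single lexer rule rather than by the parser rule url. In filter mode the lexer tries its non-fragment rules at the current position and, if none of them matches, skips one character and tries again, so a word like 'html' is simply skipped instead of causing an error. The grammar name URLExtractor and the fragment rules HOST, PORT and PATH are hypothetical and only approximate real URL syntax.

```antlr
lexer grammar URLExtractor;

options { filter = true; }

// Hypothetical rule: the only token the filter looks for.
// Text that cannot start a URL is skipped one character at a time.
URL
    : 'http://' HOST (':' PORT)? ('/' PATH?)*
      { System.out.println("found URL: " + $text); }
    ;

// Rough approximations of URL parts, for illustration only.
fragment HOST : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '.')+ ;
fragment PORT : ('0'..'9')+ ;
fragment PATH : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '.' | '_')+ ;
```

After generating the Java code (e.g. with java org.antlr.Tool URLExtractor.g), calling nextToken() on the generated lexer until it returns EOF should print each URL found in the input. Note that a URL at the end of a sentence would keep the trailing '.', since HOST and PATH accept dots.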
> And when I tried to put 'http://' directly in this parser rule (instead of
> HTTP_INDICATOR), I got an exception. Is it true that I cannot use
> literals in parser rules (I got an exception every time)? But the
> examples for ANTLR v3 do use literals in parser rules?!
Parser rules may contain literals, but not exclusively! A parser rule has to
call at least one other rule; otherwise it is a lexer rule.
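A minimal sketch, assuming an ANTLR v3 combined grammar, of parser rules that mix literals ('http://', ':', '/', '.') with calls to other rules. The grammar name URLParser and the rules host, port, path, ID and INT are hypothetical. There is no whitespace rule and no filtering here, so this grammar is meant to be fed a URL on its own, not running text.

```antlr
grammar URLParser;

// Parser rules (lowercase names): literals combined with rule references.
url  : 'http://' host (':' port)? ('/' path)* ;
host : ID ('.' ID)* ;
port : INT ;
path : ID ('.' ID)* ;

// Lexer rules (uppercase names).
ID  : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')* ;
INT : ('0'..'9')+ ;
```

ID is made to start with a letter so that a port number such as 8080 is lexed as INT rather than ID; as a consequence, host labels that begin with a digit are not accepted by this sketch.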
Best regards,
Johannes Luber