[antlr-interest] Fast Token-Recognition (Possible in ANTLR?)

Sun Feb 24 02:08:34 PST 2008

Thanks for your replies!

>  IMHO it's much simpler to
>  use a standalone component implementing fast search for multiple strings
>  (there's a lot of them available, in various languages) than ANTLR.

Hm... I couldn't find any trie implementations for Java that would fit
my needs, maybe you could provide me with some links?

2008/2/24, Vaclav Barta <vbar at comp.cz>:
> On Sunday 24 February 2008 00:44:32 Timmy Turner wrote:
>  > I want to find occurences of certain strings in a rather large text.
>  > Like finding {"sdf","foo","bar"} in "asdffoosomerandomtextbar".
>  >
>  > Of course I could try myLongString.indexOf(myListOfKeywords.get(i))
>  > but I was hoping to find a faster way. Then I found this here:
>  > http://en.wikipedia.org/wiki/Trie were I couldn't help it but think of
>  > ANTLR (and that chapter about the maze in Terence Parr's book).
>  >
>  > So I thought of creating a grammar consiting just of tokens, and
>  > letting ANTLR to generate a lexer for it which then I could use to
>  > spot these strings.
>  >
>  > The question remaining is just wther ANTLR really uses tries - or
>  > maybe there's an even better approach?
>
> AFAIK it does (it's really the standard way), but it might be too heavy for
>  the task you describe. Consider the case when you're looking for
>  {"ab","abc"} - do you want to find the "ab" in "abc"? ANTLR by default doesn't
>  (because programming languages generally don't), and while it might be
>  possible to recognize common prefixes with ANTLR, IMHO it's much simpler to
>  use a standalone component implementing fast search for multiple strings
>  (there's a lot of them available, in various languages) than ANTLR.
>
>         Bye
>                 Vasek
>
> --
>  http://www.mangrove.cz/
>  Open Source integration
>