[antlr-interest] Fast Token-Recognition (Possible in ANTLR?)

Sun Feb 24 00:40:15 PST 2008

On Sunday 24 February 2008 00:44:32 Timmy Turner wrote:
> I want to find occurences of certain strings in a rather large text.
> Like finding {"sdf","foo","bar"} in "asdffoosomerandomtextbar".
>
> Of course I could try myLongString.indexOf(myListOfKeywords.get(i))
> but I was hoping to find a faster way. Then I found this here:
> http://en.wikipedia.org/wiki/Trie were I couldn't help it but think of
> ANTLR (and that chapter about the maze in Terence Parr's book).
>
> So I thought of creating a grammar consiting just of tokens, and
> letting ANTLR to generate a lexer for it which then I could use to
> spot these strings.
>
> The question remaining is just wther ANTLR really uses tries - or
> maybe there's an even better approach?
AFAIK it does (it's really the standard way), but it might be too heavy for 
the task you describe. Consider the case when you're looking for 
{"ab","abc"} - do you want to find the "ab" in "abc"? ANTLR by default doesn't 
(because programming languages generally don't), and while it might be 
possible to recognize common prefixes with ANTLR, IMHO it's much simpler to 
use a standalone component implementing fast search for multiple strings 
(there's a lot of them available, in various languages) than ANTLR.

	Bye
		Vasek
--
http://www.mangrove.cz/
Open Source integration