[pmwiki-users] Faster searches and categories
Patrick R. Michaud
pmichaud at pobox.com
Mon Sep 12 14:52:02 CDT 2005
On Mon, Sep 12, 2005 at 02:45:28PM -0400, Martin Fick wrote:
> I mean optimistic because you are hoping that someone builds the
> index before you need it. The worst case scenario is that there
> is no index and the first category pagelist request needs to
> search every page.
Yes. But I figure that one-time costs aren't truly significant
in the long run, and the site admin is generally going to be the
person incurring the one-time cost.
> Turns out the grep is still slightly faster in most
> situations. The situation where it is slower is actually
> when I just search the Category pages. My find is not
terribly smart: it does not use the pattern passed in to
> limit the pages searched (the filtering is handled by
> pmwiki afterwards so it still works). This means that it
> is actually searching the entire site and it is still
> within a few percent of the index method's time!
Oh, I totally agree that the grep will drastically speed things up.
I'm just not sure how to make use of it in a portable manner
at the moment. Even in the grepsearch.php code, there's a likelihood
that the script will fail totally once the number of files grows
large enough, because their names won't all fit in a single shell
command line or in an environment variable.
As an aside, I'm concerned that the $ginclp variable in
grepsearch.php makes it possible for anyone to execute
arbitrary commands on the server-- consider the effect of
(if you attempt this, do it on a BACKUP!):
(:pagelist 'foo bar ; rm -rf . ; echo' :)
which I think causes the executed shell command to become
cd wiki.d; F=`find . -type f |grep -v '^\./\.'`;
grep -l -i -e foo bar ; rm -rf \. ; echo \$F |sed -es'|^.*/||g'
which would be a really Bad Thing. So there needs to be
some sort of guards put in place to prevent that sort of
thing from happening...
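One possible shape for such a guard (a sketch under my own
assumptions, not a patch to grepsearch.php): hand the user-supplied
pattern to grep as a single quoted argument rather than interpolating
it into the command string, so shell metacharacters like ';' reach
grep as literal search text and are never re-parsed by the shell:

```shell
#!/bin/sh
# Hostile input like the pagelist example above:
pattern='foo bar ; rm -rf . ; echo'
d=$(mktemp -d)                          # stand-in for wiki.d
printf 'foo bar ; rm -rf . ; echo\n' > "$d/PageA"
printf 'harmless\n' > "$d/PageB"
# "$pattern" arrives at grep as ONE argument; the ';' and 'rm -rf'
# inside it are just characters in the regex, not commands.
find "$d" -type f -print0 \
  | xargs -0 grep -l -i -e "$pattern" \
  | sed -e 's|^.*/||'
# prints: PageA
rm -rf "$d"
```

On the PHP side, the analogous move would be to run the pattern
through escapeshellarg() before building the command string.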