[pmwiki-users] Faster searches and categories

Mon Sep 12 14:52:02 CDT 2005

On Mon, Sep 12, 2005 at 02:45:28PM -0400, Martin Fick wrote:
> I mean optimistic because you are hoping that someone builds the
> index before you need it.  The worst case scenario is that there
> is no index and the first category pagelist request needs to 
> search every page.

Yes.  But I figure that one-time costs aren't truly significant
in the long run, and the site admin is generally going to be the
person incurring the one-time cost.

> Turns out the grep is still slightly faster in most
> situations. The situation where it is slower is actually 
> when I just search the Category pages.  My find is not
> terribly smart: it does not use the pattern passed in to
> limit the pages searched (the filtering is handled by
> pmwiki afterwards so it still works).  This means that it
> is actually searching the entire site and it is still
> within a few percent of the index method's time!

Oh, I totally agree that the grep will drastically speed things up.  
I'm just not sure how to make use of it in a portable manner
at the moment.  Even in the grepsearch.php code, the script is
likely to fail outright once the site has enough page files that
their names no longer all fit in a single shell command line or
in an environment variable.
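
As a rough sketch of one way around the length limit (this is not
what grepsearch.php does; the function name search_pages and its
arguments are made up for illustration), find can hand the file
names to grep in batches, so no single command line ever has to
hold them all:

```shell
# search_pages DIR TERM  (hypothetical helper, for illustration only)
# Print the names of files in DIR whose contents contain TERM,
# ignoring case. Files whose names start with a dot are skipped.
search_pages() {
    dir=$1
    term=$2
    # "-exec ... {} +" makes find run grep once per batch of file
    # names, keeping each invocation under the system's argument
    # limit. -F makes grep treat TERM as a literal string.
    ( cd "$dir" &&
      find . -type f ! -name '.*' -exec grep -l -i -F -e "$term" {} + ) |
        sed -e 's|^.*/||'   # strip the leading "./" (and any directory)
}
```

Something like "search_pages wiki.d foo" would then list the
matching page files no matter how many pages the site has.  It
still depends on find and grep being available, so it does not
settle the portability question.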

As an aside, I'm concerned that the $ginclp variable in 
grepsearch.php makes it possible for anyone to execute 
arbitrary commands on the server-- consider the effect of
(if you attempt this, do it on a BACKUP!):

    (:pagelist 'foo bar ; rm -rf . ; echo' :)

which I think causes the executed shell command to become

    cd wiki.d; F=`find . -type f |grep -v '^\./\.'`; 
    grep -l -i -e foo bar ; rm -rf \. ; echo \$F |sed -es'|^.*/||g'

which would be a really Bad Thing.  So some sort of guard
needs to be put in place to prevent that sort of thing from
happening...
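
One possible guard, sketched here in shell under the assumption
that the search text ends up pasted into a command string, is to
single-quote it before it gets there.  The helper name quote_arg
is hypothetical; in PHP the built-in escapeshellarg() does the
same job.

```shell
# quote_arg STRING  (hypothetical helper, for illustration only)
# Wrap STRING in single quotes, rewriting every embedded ' as '\''
# so that the shell reads the result as exactly one literal word.
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Example: the quoted text is safe to splice into a command string,
# because ';' inside single quotes is just another character.
term='foo bar ; rm -rf . ; echo'
cmd="grep -l -i -F -e $(quote_arg "$term") -- *"
```

With the quoting in place, the whole search text reaches grep as
a single pattern argument, and the "rm -rf ." is never seen by
the shell as a command.  Avoiding the shell altogether and
passing the terms as separate arguments would be stricter still.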

Pm