[pmwiki-users] Yahoo! Slurp is broken (was: pmwiki.org performance)

Patrick R. Michaud pmichaud at pobox.com
Thu Mar 16 08:01:27 CST 2006

On Thu, Mar 16, 2006 at 12:57:50PM +0100, Sebastian Siedentopf wrote:
> On 15.03.2006 at 21:16, Patrick R. Michaud wrote:
> > If anyone else has other insights, ideas, or suggestions,
> > I'd be very interested in hearing them.
> The Slurp crawler should respect a delay statement in the robots.txt:  
> http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html

I've already got a Crawl-delay in my robots.txt file, although
Slurp might not be recognizing it under the "User-agent: *" record.
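
One way to see how a robots.txt parser resolves Crawl-delay for a given
agent is Python's urllib.robotparser.  The sketch below is only an
illustration: both robots.txt texts are made up (they are not the actual
pmwiki.org file), and it shows how Python's parser on a recent Python 3
behaves, not how Slurp behaves.  Whether Slurp itself honors a delay
given only under "User-agent: *" is exactly the open question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; NOT the real pmwiki.org file.
GENERIC_ONLY = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

# Same idea, but with a record that names Slurp explicitly.
WITH_SLURP_RECORD = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-delay Python's parser resolves for `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


# Delay resolved for Slurp when only the "*" record exists.
generic_delay = crawl_delay_for(GENERIC_ONLY, "Slurp")
# Delay resolved for Slurp when it has its own record.
specific_delay = crawl_delay_for(WITH_SLURP_RECORD, "Slurp")
print(generic_delay, specific_delay)
```

If a crawler is suspected of ignoring the generic record, giving it its
own "User-agent:" record, as in the second text, removes that ambiguity.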

To me, Crawl-delay doesn't address the real issue.
As I understand it, Crawl-delay is intended to control the *rate*
of requests.  Yahoo! recommends a Crawl-delay of 5 to 10,
which I assume is in seconds (it is for msnbot).  In other
words, it keeps Slurp from sending requests too quickly
during any particular crawl.
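
To put a number on what such a delay allows, here is a rough
back-of-the-envelope sketch.  It assumes the value is in seconds and that
a single crawler waits exactly that long between consecutive requests;
both are assumptions, not documented Slurp behavior.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(crawl_delay_seconds):
    """Upper bound on requests per day for one crawler honoring the delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


# The recommended range of 5 to 10, assumed to be seconds.
for delay in (5, 10):
    print(delay, max_requests_per_day(delay))
```

Even at a delay of 10, that ceiling is in the thousands of requests per
day, so the setting limits pace but says nothing about how often the same
page gets fetched again.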

My issue isn't with the speed at which Slurp is sending requests -- it's
not that Slurp sends requests too fast for the server to keep up with.
My beef is with the number of (redundant) requests Slurp is sending 
in a relatively short period of time.  Here are the top five 
requests again (excluding robots.txt), over a period of fifteen days:

    380     /
    230     /wiki/PITS/PITS
    182     /wiki/Cookbook/HomePage
    178     /wiki/Cookbook/Cookbook?from=Cookbook.HomePage
    166     /wiki/PmWiki/PmWiki

I mean, does Yahoo! really need to be retrieving the pmwiki.org
home page over 25 times per day?!?  
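
The per-day figure comes straight from the counts above divided by the
fifteen-day window.  A small sketch of that arithmetic, using only the
numbers listed (the variable names are just for illustration):

```python
DAYS = 15

# Request counts per path over the fifteen-day period, as listed above.
counts = {
    "/": 380,
    "/wiki/PITS/PITS": 230,
    "/wiki/Cookbook/HomePage": 182,
    "/wiki/Cookbook/Cookbook?from=Cookbook.HomePage": 178,
    "/wiki/PmWiki/PmWiki": 166,
}

# Average number of fetches per day for each path.
per_day = {path: n / DAYS for path, n in counts.items()}

for path, rate in per_day.items():
    print(f"{rate:6.1f}  {path}")
```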

It just seems to me that Slurp is much more aggressive than it
should be.


