[pmwiki-users] Yahoo! Slurp is broken (was: pmwiki.org performance)

Patrick R. Michaud pmichaud at pobox.com
Thu Mar 16 20:59:41 CST 2006


On Fri, Mar 17, 2006 at 11:23:37AM +1300, Robin Sheat wrote:
> On Friday 17 March 2006 03:01, Patrick R. Michaud wrote:
> > It just seems to me that Slurp is much more aggressive than it
> > should be.
> Yahoo may have a contact address for that kind of thing. Apparently you 
> can ask Google to slow down the rate of spidering of your site, Yahoo may 
> do the same. 

Yahoo! claims that you can slow down the rate, but it's not really
the rate that bugs me as much as the fact that they hit some pages
a dozen times or more per day.

> Also, (again, apparently), if you return 302 (IIRC, 'Not 
> Modified') to a request the spiders will learn how often your site 
> changes, and adjust accordingly. Is there any way to get PmWiki to return 
> this to spiders and browsers as appropriate? 

It's a 304 Not Modified status.  PmWiki already does this when
the $EnableIMSCaching option is set.  It's a bit harder to do 
this for spiders, because that requires actually parsing the
If-Modified-Since date and time and doing a before/after
comparison, but I may look into doing that for robots as
well.
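
The before/after comparison mentioned above can be sketched as follows: parse the client's If-Modified-Since header into a point in time and compare it with the page's last-modification time, instead of matching the header text. This is a minimal Python illustration, not PmWiki's own code (PmWiki is written in PHP); http_date and conditional_status are hypothetical names, and the last-modification time is assumed to be a Unix timestamp.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(ts: float) -> str:
    """Format a Unix timestamp as an HTTP date, e.g. 'Fri, 17 Mar 2006 02:59:41 GMT'."""
    return format_datetime(datetime.fromtimestamp(int(ts), tz=timezone.utc), usegmt=True)


def conditional_status(if_modified_since: Optional[str], last_mod_ts: float) -> int:
    """Return 304 if the client's cached copy is at least as new as last_mod_ts, else 200."""
    if not if_modified_since:
        return 200  # no conditional header: send the full page
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: play it safe and send the full page
    if client_time.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for '-0000'; treat it as UTC
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare whole seconds.
    if client_time.timestamp() >= int(last_mod_ts):
        return 304
    return 200
```

Because the header is parsed, "Thu, 16 Mar 2006 20:59:41 -0600" and "Fri, 17 Mar 2006 02:59:41 GMT" are treated as the same instant, which a plain string comparison would miss.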

> It would make things faster for users, too. 

Try setting $EnableIMSCaching = 1; in local/config.php and see
if it helps.  (pmwiki.org runs with this set.)

> I know it's not a trivial problem, but perhaps when a 
> page is saved it could work out a list of dependencies, and when a 
> request is sent it checks the dates on all those pages, if they're all 
> older than the time on the request, then it doesn't need to regenerate 
> the page. I guess you'd also need a flag to say that if this page, or any 
> of the dependencies has dynamic markup then the page is always sent 
> fresh.
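
Robin's dependency-tracking proposal can be sketched roughly as below. This is a minimal Python illustration of the idea, not something PmWiki implements (PmWiki itself is PHP); PageInfo, newest_time, is_fresh and the page names in the test are hypothetical. It assumes each saved page records its own save time, the pages it pulls content from, and whether it contains dynamic markup; a dynamic or missing page anywhere in the dependency chain forces a fresh copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageInfo:
    mtime: int                                     # time the page was last saved (Unix seconds)
    deps: List[str] = field(default_factory=list)  # pages whose content this page pulls in
    dynamic: bool = False                          # output can change even when no page is saved


def newest_time(name: str, pages: Dict[str, PageInfo]) -> Optional[int]:
    """Newest save time across `name` and everything it depends on, transitively.

    Returns None when the page must always be regenerated because it, or
    something it depends on, is dynamic or missing.
    """
    newest, stack, seen = 0, [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already counted; also guards against include cycles
        seen.add(current)
        info = pages.get(current)
        if info is None or info.dynamic:
            return None
        newest = max(newest, info.mtime)
        stack.extend(info.deps)
    return newest


def is_fresh(name: str, pages: Dict[str, PageInfo], client_ts: int) -> bool:
    """True if the client's copy (dated client_ts) is still current, so a 304 could be sent."""
    newest = newest_time(name, pages)
    return newest is not None and newest <= client_ts
```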

At the moment PmWiki doesn't bother computing dependencies;
instead, it uses the time of the most recent change to any page
on the site as the last-modification time for every page on the
site.  Since page updates are generally less frequent than browse
requests, this works out pretty well, even (or especially) for
pages that have dynamic markup: any edit anywhere invalidates
every cached copy, so a page that pulls in content from other
pages is re-sent whenever any of that content changes.
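
The site-wide approach is simpler still. A minimal Python sketch, assuming each page's save time is available as a Unix timestamp; site_last_modified and is_fresh_sitewide are hypothetical names, not PmWiki functions.

```python
from typing import Dict


def site_last_modified(page_mtimes: Dict[str, int]) -> int:
    """One timestamp for the whole site: the newest save time of any page."""
    return max(page_mtimes.values(), default=0)


def is_fresh_sitewide(page_mtimes: Dict[str, int], client_ts: int) -> bool:
    """A client's copy of ANY page is current only if nothing on the site changed since."""
    return site_last_modified(page_mtimes) <= client_ts
```

No dependency list is needed: saving any page moves the single timestamp forward, so every cached copy is invalidated at once. A real implementation would more likely update one stored timestamp on each save than scan every page on each request.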

Pm
