[pmwiki-users] slurp is broken

christian.ridderstrom at gmail.com christian.ridderstrom at gmail.com
Fri Jul 13 08:24:53 CDT 2007


On Wed, 19 Jul 2006, Patrick R. Michaud wrote:

> On Wed, Jul 19, 2006 at 09:34:44PM +0200, christian.ridderstrom at gmail.com wrote:
>> On Wed, 19 Jul 2006, Patrick R. Michaud wrote:
>>> On Wed, Jul 19, 2006 at 11:36:53AM -0500, JB wrote:
>>>> PM,
>>>>
>>>> Can I please get a copy of your robots.txt file?
>>>
>>> Also, for any who are interested, here's the relevant
>>> sections of my root .htaccess file, which denies certain
>>> user agents at the webserver level instead of waiting
>>> for PmWiki to do it:
>>>
>>>   # HTTrack and MSIECrawler are just plain annoying
>>>   RewriteEngine On
>>>   RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
>>>   RewriteCond %{HTTP_USER_AGENT} MSIECrawler
>>>   RewriteRule ^wiki/ - [F,L]
>>>
>>>   # block ?action= requests for these spiders
>>>   RewriteCond %{QUERY_STRING} action=[^rb]
>>>   RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
>>>   RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
>>>   RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
>>>   RewriteCond %{HTTP_USER_AGENT} Teoma [OR]
>>>   RewriteCond %{HTTP_USER_AGENT} ia_archive
>>>   RewriteRule .* - [F,L]
>>
>> The obvious solution: Add this to some PmWiki page?  Perhaps something
>> about administrative tasks? Or something related to robots.txt?
>
> It probably belongs in Cookbook.ControllingWebRobots, which also needs
> to be rewritten to be up-to-date with PmWiki 2.1.  There also needs
> to be a link in the administrative tasks section, or at least a
> FAQ question.

I'm going through old posts. Should I place the above on
Cookbook.ControllingWebRobots?  (I wonder whether there's a problem with
placing it in an official place - no security risks?)
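
In case it helps whoever reviews the page, here is a rough Python sketch of
what I understand the quoted rules to do. It is only my approximation of the
logic, not Apache itself: the function name is_forbidden and its arguments
are made up for illustration, and it assumes the rules sit in the root
.htaccess so the path is matched without a leading slash.

```python
import re

# User agents refused for anything under wiki/ (first rule block).
BLOCKED_EVERYWHERE = ("HTTrack", "MSIECrawler")
# Spiders refused only when the query string has a matching action= part.
BLOCKED_ON_ACTION = ("Googlebot", "Slurp", "msnbot", "Teoma", "ia_archive")
# Same pattern as the RewriteCond: "action=" followed by any character
# other than "r" or "b".
ACTION_PATTERN = re.compile(r"action=[^rb]")


def is_forbidden(path, query_string, user_agent):
    """Hypothetical helper: True if the quoted rules would answer 403."""
    # Rule 1: deny the listed agents for URLs starting with wiki/.
    if path.startswith("wiki/") and any(
        name in user_agent for name in BLOCKED_EVERYWHERE
    ):
        return True
    # Rule 2: deny the listed spiders when the action= pattern matches.
    if ACTION_PATTERN.search(query_string) and any(
        name in user_agent for name in BLOCKED_ON_ACTION
    ):
        return True
    return False
```

With this sketch, a Googlebot request with ?action=edit is refused, while
?action=rss gets through because the character after "action=" is an "r".
Ordinary browsers are not affected by either rule.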

/C

-- 
Christian Ridderström, +46-8-768 39 44               http://www.md.kth.se/~chr

