[pmwiki-users] slurp is broken
christian.ridderstrom at gmail.com
Fri Jul 13 08:24:53 CDT 2007
On Wed, 19 Jul 2006, Patrick R. Michaud wrote:
> On Wed, Jul 19, 2006 at 09:34:44PM +0200, christian.ridderstrom at gmail.com wrote:
>> On Wed, 19 Jul 2006, Patrick R. Michaud wrote:
>>> On Wed, Jul 19, 2006 at 11:36:53AM -0500, JB wrote:
>>>> PM,
>>>>
>>>> Can I please get a copy of your robots.txt file?
>>>
>>> Also, for any who are interested, here are the relevant
>>> sections of my root .htaccess file, which denies certain
>>> user agents at the webserver level instead of waiting
>>> for PmWiki to do it:
>>>
>>> # HTTrack and MSIECrawler are just plain annoying
>>> RewriteEngine On
>>> RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
>>> RewriteCond %{HTTP_USER_AGENT} MSIECrawler
>>> RewriteRule ^wiki/ - [F,L]
>>>
>>> # block ?action= requests for these spiders
>>> RewriteCond %{QUERY_STRING} action=[^rb]
>>> RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
>>> RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
>>> RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
>>> RewriteCond %{HTTP_USER_AGENT} Teoma [OR]
>>> RewriteCond %{HTTP_USER_AGENT} ia_archive
>>> RewriteRule .* - [F,L]
>>
>> The obvious solution: Add this to some PmWiki page? Perhaps something
>> about administrative tasks? Or something related to robots.txt?
>
> It probably belongs in Cookbook.ControllingWebRobots, which also needs
> to be rewritten to be up-to-date with PmWiki 2.1. There also needs
> to be a link in the administrative tasks section, or at least a
> FAQ question.

I'm going through old posts. Should I put the above on
Cookbook.ControllingWebRobots? (I wonder whether there's any problem with
putting it somewhere official, though I don't see a security risk.)
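If it does go on Cookbook.ControllingWebRobots, it may help to spell out how mod_rewrite combines the conditions. Below is a minimal Python sketch of the two rule sets as quoted. It is only a model, not Apache. Conditions chained with [OR] form one group. A condition without [OR] is ANDed with the group that follows it. The second rule set therefore reads "query string matches action=[^rb] AND the agent is one of the listed spiders", which lets spiders still fetch action=rss or action=browse. The helper name is_forbidden and the sample user-agent strings are made up for illustration. The sketch assumes the rules sit in a .htaccess file whose directory contains wiki/, and it ignores RewriteBase and any other rules in the real file.

```python
import re

# Rule set 1 (hypothetical model): site copiers are refused anything under wiki/.
COPIER_AGENTS = [r"HTTrack", r"MSIECrawler"]

# Rule set 2: spiders are refused ?action= requests unless the action's first
# letter is r or b (e.g. action=rss, action=browse).
SPIDER_AGENTS = [r"Googlebot", r"Slurp", r"msnbot", r"Teoma", r"ia_archive"]
ACTION_PATTERN = r"action=[^rb]"


def is_forbidden(path: str, query_string: str, user_agent: str) -> bool:
    """Return True if the quoted rules would answer 403 Forbidden.

    `path` is relative to the .htaccess directory with no leading slash,
    which is what RewriteRule sees in per-directory context. RewriteCond
    patterns are unanchored and case-sensitive (no [NC] flag), so
    re.search is used for every test.
    """
    # HTTrack [OR] MSIECrawler, then RewriteRule ^wiki/ - [F,L]
    if any(re.search(p, user_agent) for p in COPIER_AGENTS):
        if re.search(r"^wiki/", path):
            return True

    # QUERY_STRING condition AND (Googlebot OR Slurp OR ... OR ia_archive),
    # then RewriteRule .* - [F,L], which matches any path.
    if re.search(ACTION_PATTERN, query_string):
        if any(re.search(p, user_agent) for p in SPIDER_AGENTS):
            return True

    return False


if __name__ == "__main__":
    slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp)"  # sample string
    print(is_forbidden("wiki/Main/HomePage", "action=edit", slurp))  # True
    print(is_forbidden("wiki/Main/HomePage", "action=rss", slurp))   # False
    print(is_forbidden("wiki/Main/HomePage", "", "HTTrack 3.0x"))    # True
```

Running a few user-agent strings from the access log through a model like this is a cheap way to check the intent before touching the live .htaccess.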
/C
--
Christian Ridderström, +46-8-768 39 44 http://www.md.kth.se/~chr