[pmwiki-users] PMWiki default documents vs search engines

Patrick R. Michaud pmichaud at pobox.com
Wed Dec 13 17:14:13 CST 2006


On Wed, Dec 13, 2006 at 10:28:34PM -0000, porneL wrote:
> 
> When I search for information about PMWiki I rarely find PMWiki's own site  
> and documentation. The problem is that every PMWiki installation contains  
> full copy of documentation, which gets indexed by search engines:
> http://www.google.com/search?q=pmwiki+%22documentation+index%22 (217000  
> results)
> 
> * it's lots of needlessly duplicated content which is off-topic for most  
> of the sites
> * searches for pmwiki-related information bring outdated/unauthoritative  
> documents
> * exploits against pmwiki can easily find potential victims
> 
> My suggestion is to exclude (in default configuration) PMWiki group from  
> being indexed by search engines.

PmWiki is already doing this -- PmWiki's default configuration has excluded 
the PmWiki group pages from being indexed by search engines since at 
least version 0.6.1 (February 2004).

You can prove this for yourself -- just do an install of PmWiki,
and then "view source" on any of the pages in the PmWiki group
(except PmWiki.PmWiki).  In the HTML output you'll see the 
<meta name='robots' content='noindex,nofollow' /> tag right near
the top of the document.
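
For anyone who would rather script that check than "view source" by hand, here is a minimal sketch in Python. It only inspects an HTML string for a robots meta tag; the function names are made up for illustration and are not part of PmWiki.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the list of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_excluded_from_index(html):
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)
```

Feeding it the HTML of a page from the PmWiki group should report "noindex" when the skin emits the tag, and an empty list when the skin drops it.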

However, apparently a lot of site administrators have created
or are using skin templates that don't have a <!--HeaderText-->
or <!--HTMLHeader--> directive.  As a result, the robots meta tag
that PmWiki produces
isn't being included in the output, and the search engines are
therefore indexing the pages.  All of the links that I followed
from the google search above went to sites that seemingly didn't
have <!--HeaderText--> or <!--HTMLHeader--> in the skin template.
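
A site administrator could look for such templates with something like the following rough sketch. It assumes that skin templates are files ending in ".tmpl" somewhere under a skins directory; that layout and the function name are assumptions made for this example, so adjust them to the actual installation.

```python
from pathlib import Path

# The two directive spellings mentioned in this message.
HEADER_DIRECTIVES = ("<!--HTMLHeader-->", "<!--HeaderText-->")


def templates_missing_directive(skins_root):
    """Return the template files under skins_root that contain neither directive.

    Assumes templates are named *.tmpl (an assumption, not checked against PmWiki).
    """
    missing = []
    for path in sorted(Path(skins_root).rglob("*.tmpl")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not any(directive in text for directive in HEADER_DIRECTIVES):
            missing.append(path)
    return missing
```

Any file it returns is a candidate for the problem described above: the robots meta tag has nowhere to go in the output.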

My guess is that the only real fix for this would be to have
PmWiki force a <!--HTMLHeader--> directive into any skin template
that omits it.  I'm not entirely sure that the PmWiki core should
be taking too many liberties with skin templates, however.

The other possibility would be to have PmWiki abort with an
error message if it loads a skin template that doesn't have the
required <!--HTMLHeader--> tag.  I guess that seems more reasonable.
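
To make the two options concrete, here is a small sketch of both policies, written in Python purely for illustration. It is not PmWiki code, and the names (ensure_header_directive, MissingHeaderDirective, the strict flag) are invented for the example. With strict=False it forces the directive in just before </head>; with strict=True it aborts instead.

```python
import re

# The two directive spellings mentioned in this message.
HEADER_DIRECTIVES = ("<!--HTMLHeader-->", "<!--HeaderText-->")


class MissingHeaderDirective(Exception):
    """Raised when a template lacks the header directive and cannot be used."""


def ensure_header_directive(template_text, strict=False):
    """Return template text that is guaranteed to contain a header directive.

    strict=False: insert <!--HTMLHeader--> before </head> when it is missing.
    strict=True:  refuse the template by raising MissingHeaderDirective.
    """
    if any(directive in template_text for directive in HEADER_DIRECTIVES):
        return template_text  # nothing to do
    if strict:
        raise MissingHeaderDirective("skin template has no <!--HTMLHeader--> directive")
    match = re.search(r"</head\s*>", template_text, re.IGNORECASE)
    if match is None:
        # No obvious place to put the directive, so give up rather than guess.
        raise MissingHeaderDirective("skin template has no </head> to insert before")
    insert_at = match.start()
    return template_text[:insert_at] + "<!--HTMLHeader-->\n" + template_text[insert_at:]
```

The strict variant has the advantage that the skin author finds out about the problem immediately, rather than having the template silently altered.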

At any rate, at this point I'll take the position that PmWiki
is already doing the right thing with its default settings, and
the problem is from the broken skin templates on all of those
sites.

However, in looking at the issue I just realized that pmwiki.org
*is* using the default distribution setting, which means that the
PmWiki group pages on pmwiki.org aren't being indexed by search
engines!  Oooooooops!  :-)

So, I'll adjust pmwiki.org's settings to not use the distribution
default and allow those pages to be indexed.  Thanks for pointing
this out, and hopefully those pages will start appearing on search
engines above the copies that are coming from other sites.

Thanks!

Pm



