[pmwiki-users] HTML-Wiki correspondence in search engines

Patrick R. Michaud pmichaud at pobox.com
Wed Jun 1 12:51:04 CDT 2005


On Wed, Jun 01, 2005 at 10:19:51AM -0700, John W Morris wrote:
>    In standard HTML we can put meta tags and such for search engines to key
>    on for each page if we want.

You can do this in PmWiki as well, using the (:description:), (:keywords:),
and (:robots:) directives.
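For example, a page's wiki source might begin with lines roughly like
the following (the text and keywords here are only placeholders):

    (:description A one-sentence summary of this page for search engines:)
    (:keywords pmwiki, documentation, search:)
    (:robots index,follow:)

PmWiki turns these into the corresponding <meta> tags in the HTML it
sends to browsers and crawlers.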

>    How do search engines react to Wiki sites?  Wiki pages are created on the
>    fly so they are not "really there" for the search engine to follow a set
>    of normal links to index the site.  

Well, there are at least two different types of search engines.  Search
engines that use a web crawler to collect content (this includes all of
the major search engines such as Google, Yahoo!, and MSN Search) don't
actually index the contents of files on a hard disk -- they index what
browsers see on the web.  Thus, the fact that (wiki) pages may be created
"on the fly" is invisible and irrelevant to these search engines--they
see the same result.

The other type of search engine, which indexes files on disk, will
likely have trouble with PmWiki pages or any other dynamic site, because
the page doesn't exist in its rendered form.  Even here, however, most
such engines have a "spider" or "crawl" mode in which they index content
as if it came from a browser as well, and wikis work just fine with that.

>    Also... the pages contain history that
>    is not generally germane to what is on the displayed page,

By default, PmWiki includes <meta> tags in the output that tell search
engine crawlers not to index the contents of the history, edit, upload,
and other utility pages.
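For instance, the HTML for an ?action=edit or ?action=diff view
typically carries a robots meta tag along these lines:

    <meta name='robots' content='noindex,nofollow' />

which tells a well-behaved crawler to skip that particular view.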

>    The goal is to get a Wiki site to be represented in search engines as well
>    as a standard pure static HTML site would be.

If you do a Google search for "pmwiki", I think you'll find that PmWiki
sites are reasonably well indexed.  I just did a search and Google says
"Results 1-10 of about 901,000 for pmwiki", which says that it's not
having much trouble finding these sites.  :-)

>    I've poked around with the Wiki search engine but it is not
>    clear.  References to search robots and how to allow or disallow them
>    don't really explain the results of their actions.  Help, please, in
>    understanding this aspect of wiki-dom.

Well, the issue of search robots and allowing or disallowing them isn't
really specific to wikis -- it's the same as for all web sites.  Don't let
the dynamic nature of wiki pages throw you too far -- from a web crawler's
perspective, it's all the same.
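As a quick illustration, the same site-wide mechanisms you'd use for a
static site still apply; a robots.txt at the web root might look roughly
like this (the paths are only examples):

    User-agent: *
    Disallow: /uploads/
    Disallow: /local/

and PmWiki's own <meta> tags then handle the per-page cases on top of
that.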

That said, we definitely could use a good document describing how search
engines work and the features that PmWiki provides for better managing
them.  I'm not sure when I'll find time to write such a document, although
searching the pmwiki-users archives is likely to come up with some
useful source materials.

>    One answer I understand would be to insert a meta tag in the html template
>    but is this the best way to get represented in the search engine.  

You can do that too, but the (:description:) and (:keywords:) directives
are likely to give better results because they can be specific to each
page.  (Of course, they can also be placed in a GroupHeader to generate
keywords specific to each wiki group.)
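As a sketch, a Cookbook.GroupHeader page (the group name and keywords
are only placeholders) could contain:

    (:keywords cookbook, recipes, addons:)
    (:description Recipes and add-ons contributed by PmWiki users:)

and every page in that group would then carry those <meta> tags.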

Pm


