[pmwiki-users] Google local site search

H. Fox haganfox at users.sourceforge.net
Thu Dec 29 17:13:02 CST 2005


On 12/29/05, Patrick R. Michaud <pmichaud at pobox.com> wrote:
> On Thu, Dec 29, 2005 at 11:52:48AM -0700, H. Fox wrote:
> > On 12/29/05, Patrick R. Michaud <pmichaud at pobox.com> wrote:
> > > On Wed, Dec 28, 2005 at 03:24:27PM -0700, H. Fox wrote:
> > > > So, for example, a page's Edit and History links become self-referring
> > > > links, correct?
> > >
> > > Yes.
> >
> > Then it seems like it will mislead the search index.
> >
> > Every page will be indexed as a link that says "Edit" for instance.
> > Nearly every page on pmwiki.org will have eight extra self-referring
> > links that lack rel='nofollow' attributes?
>
> Ummm... search indexes normally index the contents of the page,
> not the contents of links to the page.  Or am I mistaken here?

[I seem to write about SEO here frequently, despite my lack of
professional expertise on the subject, but here goes...]

I think the link text is taken into account when the search engine
indexes a page's content.  If you have a link like this

<a href='http://example.org/zdkhox.html'>Ziggleboo definition</a>

then the index *does* take the link text, "Ziggleboo definition", into account.

Having a lot of other links to the same page, like

<a href='http://example.org/zdkhox.html'>Edit</a>
<a href='http://example.org/zdkhox.html'>Page History</a>
<a href='http://example.org/zdkhox.html'>View</a>
<a href='http://example.org/zdkhox.html'>Search</a>
<a href='http://example.org/zdkhox.html'>etc...</a>

would seem to affect the resulting indexing(*)... unless of course the
search robot was told not to follow the extraneous links.  ;-)

(*)  Think of it as "diluting" the significance of the link text that
should get advanced rank.  If *every* link that points to that page has
the same link text, that's a pretty good indication the page should be
ranked highly by the search engine when a search for that text is
submitted.
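
To make the "dilution" idea concrete, here is a toy Python sketch. It
is only an illustration of the arithmetic, not a description of how any
real search engine weighs link text. The names AnchorTextCollector and
anchor_text_share are made up for this example. It tallies the link
text of every link to one URL and reports what share of those links
carries a given text.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the link text of every <a href=...>, grouped by target URL."""

    def __init__(self):
        super().__init__()
        self.texts = defaultdict(Counter)  # href -> Counter of link texts
        self._href = None                  # href of the <a> being read, if any
        self._buf = []                     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._buf = []

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.texts[self._href]["".join(self._buf).strip()] += 1
            self._href = None


def anchor_text_share(html, url, text):
    """Fraction of the links to `url` whose link text equals `text`."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    counts = collector.texts[url]
    total = sum(counts.values())
    return counts[text] / total if total else 0.0
```

With one descriptive link plus four action links that all point at the
same URL, the descriptive text accounts for one link in five. With the
action links left out (or not followed), it accounts for all of them.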

> > (I still don't understand why the default skin doesn't use nofollow
> > attributes in the PageAction links.)
>
> Oh, we can probably do that -- I forgot that we can just add
> rel='nofollow' to the wikistyle before each action not to be
> followed.
>
> > In that case, why cloak for googlebot?  Why not keep the ?action=
> > parameters intact and use the rel='nofollow' attribute for bots that
> > understand it?
>
> Because there may be links where an author forgets the nofollow.

If that were the case it would be a small price to pay.  (Philosophy
#3. Avoid gratuitous features)

I'm trying, perhaps a bit clumsily, to suggest that authors shouldn't
need to remember that because the nofollow attribute would be slipped
in automatically.

> For example...
>
> > > And even if the Skins include rel='nofollow' in the templates,
> > > what about markup...?
> > >
> > >     [[OtherPage]]
> > >     [[OtherPage?action=edit]]
> > >     [[OtherPage?action=dc]]
> >
> > It's a separate issue.  The bottom two should get rel='nofollow'
> > attributes by default.
>
> Er, wrong.  ?action=dc (like ?action=rss) should be followable
> by robots.

Then only the middle one.  (You tricked me there!)

My point remains the same: Slip a rel='nofollow' attribute into links
that robots shouldn't follow.
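
A rough Python sketch of what "slipping in" the attribute could look
like on already-rendered HTML. This is not PmWiki code: add_nofollow,
needs_nofollow and FOLLOWABLE_ACTIONS are names invented here, and the
regular expressions are deliberately naive. The only actions treated as
followable are the two mentioned in this thread (dc and rss).

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption for this sketch: actions that robots may follow (dc and rss,
# as mentioned in the thread). Every other ?action= value gets nofollow.
FOLLOWABLE_ACTIONS = {"dc", "rss"}

A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
HREF = re.compile(r"""href\s*=\s*(['"])(.*?)\1""", re.IGNORECASE)
REL = re.compile(r"\brel\s*=", re.IGNORECASE)  # naive: also matches "rel=" inside a URL


def needs_nofollow(href):
    """True when the link carries an ?action= that robots should not follow."""
    actions = parse_qs(urlsplit(href).query).get("action", [])
    return any(action not in FOLLOWABLE_ACTIONS for action in actions)


def add_nofollow(html):
    """Add rel='nofollow' to matching <a> tags that have no rel attribute yet."""
    def fix(match):
        tag = match.group(0)
        href = HREF.search(tag)
        if href is None or REL.search(tag) or not needs_nofollow(href.group(2)):
            return tag  # leave the tag untouched
        return tag[:-1] + " rel='nofollow'>"
    return A_TAG.sub(fix, html)
```

Run over the three example links, only the ?action=edit one would be
changed. The plain link and the ?action=dc link pass through untouched.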

>  And I don't entirely agree that it's a separate issue,
> since sidebars and other items may contain markup with edit links
> or other actions.  Even if there are three links to ?action=edit
> with rel='nofollow' and one that omits it, then the link is likely to
> get followed.

Then the robot will hit the <meta name='robots' ...> tag.

Besides, I'm suggesting that such links get the rel='nofollow'
attribute without the author doing anything.  Sorry if that wasn't clear.

> > > So, the advantage of cloaking "?action=" is that it will work
> > > even for robots that don't understand rel="nofollow".
> >
> > Then it should be used for those, and not for robots that are known to
> > understand rel='nofollow'.
> >
> > I'm not entirely convinced it "works" for robots that understand
> > rel='nofollow', since you're showing the robot a lot of misleading
> > links (i.e. "transformed pages") for no reason.
>
> Can you give me an example of this -- i.e.,  how the links are
> misleading, and how it won't "work" for robots that understand
> rel='nofollow'?

I think of it as "communicating with the search engine".

1) Leave the action and add nofollow and you're telling Google's bot

   "We haven't morphed any links for your robot.  This link with
   ?action=edit is an 'Edit This Page' link and it should not
   be followed by your robot."

2) Omit the action and nofollow and you're saying either

   "We morphed this "Edit" link for your bot.  You'll need to
   figure out how to deal with that.  We're not doing any
   monkey business here.  Really!"

or, if they don't catch on,

   "Here are a bunch of links to the same page, each
    having different link text.  You'll need to figure out
    how to deal with them."  (See above.)

Put another way:  Why morph the page if you don't need to?  The extra
information you are omitting is potentially significant to the search
engine indexing algorithm, even if the significance may be minor. 
Besides, it's more obviously "honest" to not morph the page.

Since I've been "clear as mud" so far, here's what I'm suggesting
should be possible, if not by default then with some configuration
setting:

* Links that robots shouldn't follow should automagically get a
rel='nofollow' attribute.
* Robots that disregard the nofollow attribute -- and only those
robots -- should get the stealth treatment.
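
The two points above could be sketched in Python roughly as follows.
Everything here is hypothetical: link_policy, the user-agent substrings
and the list of robots assumed to honor rel='nofollow' are placeholders
for illustration, not an actual PmWiki setting or a verified list.

```python
# Hypothetical list: user-agent substrings of robots assumed to honor
# rel='nofollow'. A real list would have to be researched and maintained.
HONORS_NOFOLLOW = ("googlebot",)

# Hypothetical hints used to guess that a visitor is a robot at all.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def link_policy(user_agent):
    """Return (add_nofollow, cloak_actions) for a request's User-Agent."""
    ua = user_agent.lower()
    is_robot = any(hint in ua for hint in ROBOT_HINTS)
    honors = any(name in ua for name in HONORS_NOFOLLOW)
    # Everyone gets rel='nofollow' on action links. Only robots that are
    # not known to honor it get the ?action= parameters stripped.
    return True, is_robot and not honors
```

Under these assumptions a browser and Googlebot both see the real
?action= links with rel='nofollow', while an unrecognized crawler gets
the stealth treatment.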

Hagan

> Also, can anyone tell me which robots implement rel='nofollow' such
> that they don't follow the link?  I know that Google doesn't (since
> July 2005), but what about the others?
>
> Pm



