[pmwiki-users] yet another documentation suggestion ...

Thu Aug 4 10:20:40 CDT 2005

Patrick R. Michaud wrote:

> On Wed, Aug 03, 2005 at 11:22:23AM +1200, John Rankin wrote:
> 
>> Yes! It seems to me that people use many techniques for finding 
>> things in a large body of text, including a table of contents, an 
>> index and a search. One issue I see is that the page index (list of
>>  pages in alphabetical order) isn't very helpful in large page 
>> collections, because the sort is not necessarily in a useful order.
>> 
> 
> This is a very good point.  I wonder how hard it would be to add a 
> "relevance" measurement to the search, so that it could order pages 
> based on a predicted relevance instead of just alphabetically?

Search engines apply a lot of heuristics in that area. After all, since
the search algorithm cannot really measure relevance, it has to guess it.

I can think of the following heuristics:
* How many pages refer to the page in question
   (this rates the "overall quality" of the page)
* How often any of the search terms appear on the page
* Whether the search terms are clustered or distributed evenly
   ("clustered" gets the better rank; evenly distributed terms
   suggest the page just happens to mention each of them, while
   clustered terms suggest they are used together with a
   specific meaning)
* Whether the search terms appear near the beginning of the page
   (nearer to the beginning improves the rank)
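
To make the four heuristics above a bit more concrete, here is a rough
sketch in Python (not PmWiki code). Everything in it is an assumption
made for illustration: the names relevance() and rank_pages(), the
log damping of the link count, the way clustering is measured, and
above all the equal weights, which are placeholders rather than tuned
values.

```python
import math
import re

# Placeholder weights: equal weighting is an arbitrary assumption, not a tuned value.
WEIGHTS = {"links": 1.0, "frequency": 1.0, "clustering": 1.0, "position": 1.0}


def relevance(text, terms, inbound_links, weights=WEIGHTS):
    """Guess a relevance score for one page; higher means more relevant."""
    words = re.findall(r"\w+", text.lower())
    wanted = {t.lower() for t in terms}
    # Word positions at which any of the search terms occurs.
    hits = [i for i, w in enumerate(words) if w in wanted]
    if not hits:
        return 0.0  # none of the search terms is on the page

    total = len(words)
    signals = {
        # 1. How many pages refer to this page (log-damped so that
        #    large link counts do not swamp the other signals).
        "links": math.log1p(inbound_links),
        # 2. How often the search terms appear, relative to page length.
        "frequency": len(hits) / total,
        # 3. Clustering: hits per word between the first and the last hit.
        "clustering": len(hits) / (hits[-1] - hits[0] + 1),
        # 4. Position: 1.0 if the first hit is the first word, lower further down.
        "position": 1.0 - hits[0] / total,
    }
    return sum(weights[name] * value for name, value in signals.items())


def rank_pages(pages, inbound, terms):
    """Return the names of matching pages, best guess first."""
    scored = [(relevance(text, terms, inbound.get(name, 0)), name)
              for name, text in pages.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0]
```

Here pages would map page names to page text and inbound would map
page names to the number of pages linking to them; pages that contain
none of the terms are dropped from the result.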

How to weigh them against each other, I don't know. We could ask Google, 
but I suspect that this is the #1 trade secret of the company :-)

Cf. http://en.wikipedia.org/wiki/Page_rank . I haven't checked the links 
that lead away from that page, but they looked interesting, too.

>> What if we had an (:index text:) directive?

We'd continually be battling out-of-date (:index...:) directives,
particularly since errors in them don't cause any obvious problems (at
worst, a page will not be found, which is exactly the kind of error that
never gets reported).

Regards,
Jo