[pmwiki-users] search does not find text with markup

Mikael Nilsson mini at nada.kth.se
Tue Dec 20 16:56:31 CST 2005


On Tue, 2005-12-20 at 16:08 -0600, Patrick R. Michaud wrote:
> Here's where things stand in 2.1.beta14.  When a page is saved,
> PmWiki runs the markup text through the MarkupToHTML function 
> (excluding things such as (:include:) and (:pagelist:)) and then
> saves the first 600 bytes as an "excerpt" attribute.  This leading
> text is then readily available for things like RSS feeds and
> searches, and can be used to provide some idea of a page's contents
> in the absence of an explicit (:description:) directive.
> At the moment the 600 byte limit on excerpts is primarily there
> to prevent the internal $PCache from taking up too much memory,
> and also to keep disk space requirements down.
> 
> However, we could modify this somewhat -- we could save the entire
> rendered text, and we could strip the HTML tags from the excerpt.
> This could nicely resolve the problem described above, since the 
> excerpt would be searchable as well as the markup text.  It
> would also allow searches to easily display the text surrounding
> a found search term.  
> 
> The downsides of this approach are: 
> 1.  by removing the HTML from an excerpt we're left with only 
>     the text -- no structural indications such as paragraphs or lists
>     in the excerpt,
> 2.  storing the rendered text in the page file increases the
>     page file size a bit (although probably not too significantly
>     except for large pages),
> 3.  PmWiki's memory-based page cache can get too large if each 
>     page's excerpt attribute is stored there.
> 
> Still, these three downsides might be a good trade for the
> extra functionality we might get as a result.  Any opinions?
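
To make the proposal above concrete, here is a rough sketch in Python (not
PmWiki's PHP, and not its actual code) of stripping the HTML from rendered
text so it becomes searchable, and of pulling out the text surrounding a
found term. The names strip_html and context, the list of block-level tags,
and the width parameter are all my own assumptions for illustration.

```python
import html
import re

# Block-level tags become a space so neighbouring paragraphs do not run
# together; every other tag is dropped outright so that a word split by
# inline markup (e.g. foo<b>bar</b>) is searchable as one word.
BLOCK_RE = re.compile(
    r"</?(?:p|div|br|li|ul|ol|h[1-6]|tr|td|th|table)\b[^>]*>", re.I)
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(rendered):
    """Return plain text: tags removed, entities decoded, whitespace collapsed."""
    text = BLOCK_RE.sub(" ", rendered)
    text = TAG_RE.sub("", text)
    text = html.unescape(text)
    return " ".join(text.split())


def context(text, term, width=30):
    """Return the text around the first case-insensitive match, or None."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end]


plain = strip_html("<p>foo<b>bar</b></p><p>baz &amp; qux</p>")
snippet = context(plain, "BAZ", width=3)
```

This also shows downside 1 directly: once the tags are gone, the paragraph
boundary survives only as a single space.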

Well, to me it sounds like you need a simple text indexing engine using
flat-file databases, like phpdig: http://www.phpdig.net/index.php or
maybe SEARCpHp:  http://www.hansanderson.com/php/search/

For each page that is saved, you let the engine re-index that page. It's
very similar to the linkindex files PmWiki maintains.
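
Roughly what I have in mind, as a minimal Python sketch rather than anything
phpdig or PmWiki actually does: a word-to-pages index kept in a single flat
file, where saving a page replaces only that page's entries. The JSON file
format and the function names (load_index, reindex_page, save_index, search)
are assumptions for illustration.

```python
import json
import os
import re

WORD_RE = re.compile(r"\w+")


def load_index(path):
    """Load the flat-file index, or start empty if it does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def reindex_page(index, pagename, text):
    """Replace all entries for one page; index maps word -> sorted page names."""
    # Drop the page's old entries first so removed words stop matching.
    for word in list(index):
        pages = [p for p in index[word] if p != pagename]
        if pages:
            index[word] = pages
        else:
            del index[word]
    # Add one entry per distinct lower-cased word in the new text.
    for word in {w.lower() for w in WORD_RE.findall(text)}:
        index[word] = sorted(set(index.get(word, [])) | {pagename})
    return index


def save_index(index, path):
    """Write to a temporary file and rename, so readers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(index, f)
    os.replace(tmp, path)


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = [w.lower() for w in WORD_RE.findall(query)]
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)
```

On each save you would call load_index, then reindex_page with the page's
plain text, then save_index. Scanning every word to drop old entries is slow
for a big wiki; a real engine would also keep a per-page word list.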

However, I'm pretty sure you've already considered this....

/Mikael

-- 
Plus ça change, plus c'est la même chose




