[pmwiki-users] UTF-8 case insensitive index

Petko Yotov 5ko at free.fr
Thu Apr 12 16:04:41 CDT 2007


On Thursday 12 April 2007 21:45, you wrote:
> I'm working on the utf-8 case insensitive index issue...

Thanks Patrick, that is good news! :-))

>
> In addition to making the index case insensitive, should we also
> have it normalize strings to remove accent marks altogether
> for purposes of comparisons?
>
> Then someone could enter a search term without accents and
> still be able to quickly find pages that had the accented
> forms of the search term.  (Or vice-versa.)

Yes, that would be great. However, may I suggest that the conversion arrays 
be customizable? While in French the accented letters have their own code 
positions, in Cyrillic and Greek there is also a "combining grave accent" 
(U+0300), a separate character placed after the letter, that can and should 
be stripped (it looks like "`" and is often written as "̀"). Some Czech 
diacritical characters should probably remain unchanged.

The whole function PageIndexTerms should be made customizable if possible, 
so that other users can adapt it for languages we here don't know.
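
To illustrate the idea, here is a minimal sketch in Python rather than in 
PmWiki's PHP. It is not the actual PageIndexTerms code: the function name 
normalize_term and its keep_letters parameter are hypothetical, and the 
approach (Unicode decomposition followed by dropping combining marks) is only 
one possible way to build such a conversion.

```python
import unicodedata


def normalize_term(term, keep_letters=frozenset()):
    """Return a case-insensitive, accent-stripped form of `term`.

    keep_letters is a hypothetical customization hook: a set of lowercase
    precomposed letters whose diacritics must be preserved.
    """
    out = []
    # Case-fold first, then compose so each accented letter is one character
    # wherever Unicode defines a precomposed form.
    for ch in unicodedata.normalize("NFC", term.casefold()):
        if ch in keep_letters:
            out.append(ch)  # site-specific exception, left untouched
            continue
        # Decompose into base letter + combining marks, then drop the marks.
        # A stand-alone combining mark (e.g. U+0300 after a Cyrillic letter
        # with no precomposed form) is dropped here as well.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


# French: precomposed accented letters lose their accents.
print(normalize_term("Élève"))
# Cyrillic: the separate combining grave accent is removed.
print(normalize_term("а\u0300"))
# Czech: a site may choose to keep some letters unchanged.
print(normalize_term("Čeština", keep_letters={"č", "š"}))
```

With the defaults, the Cyrillic letter "й" would also be reduced to "и", 
which is probably not wanted; passing keep_letters={"й"} avoids that. This is 
the kind of per-language decision that a customizable conversion would leave 
to the site administrator.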

Thanks!
Petko




