[pmwiki-users] UTF-8 case insensitive index
Petko Yotov
5ko at free.fr
Thu Apr 12 16:04:41 CDT 2007
On Thursday 12 April 2007 21:45, you wrote:
> I'm working on the utf-8 case insensitive index issue...
Thanks Patrick, that is good news! :-))
>
> In addition to making the index case insensitive, should we also
> have it normalize strings to remove accent marks altogether
> for purposes of comparisons?
>
> Then someone could enter a search term without accents and
> still be able to quickly find pages that had the accented
> forms of the search term. (Or vice-versa.)
Yes, that would be great. However, may I suggest that the conversion arrays
be customizable? While in French the accented letters occupy separate code
positions, in Cyrillic and Greek there is also a "combining grave accent", a
separate character (U+0300) after the letter, that may and should be stripped
("`", often written as "̀"). Some Czech diacritical characters should probably
remain unchanged.
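
To illustrate the kind of customizable folding meant here, a minimal sketch in Python (not PmWiki code; the names `fold_term` and `KEEP_AS_IS` and the choice of Czech letters are assumptions made up for the example). It decomposes each character, drops combining marks such as U+0300, and leaves a configurable set of letters untouched:

```python
import unicodedata

# Hypothetical, site-configurable set of letters whose diacritics must survive.
# The Czech letters below are only an example choice.
KEEP_AS_IS = set("čřšžěů")

def fold_term(term, keep=KEEP_AS_IS):
    """Lowercase a term and strip accents, except for letters listed in `keep`."""
    out = []
    # Compose first so that "c" + combining caron is recognized as "č".
    for ch in unicodedata.normalize("NFC", term):
        lowered = ch.lower()
        if lowered in keep:
            out.append(lowered)
            continue
        # Decompose and drop combining marks (e.g. U+0300, combining grave).
        for part in unicodedata.normalize("NFD", lowered):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)
```

With the default set, `fold_term("Élève")` gives `"eleve"`, while the "č" in "Černý" is preserved. Note that an unconditional fold would also turn the Cyrillic "й" into "и", since "й" decomposes to "и" plus a combining breve; a site that wants to keep it would add it to the set.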
The whole function PageIndexTerms should be made customizable if possible --
other users could adapt it for languages that we here don't know.
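
As a rough sketch of what a replaceable term function could look like (again Python, not the actual PmWiki implementation; `default_terms`, `accent_free_terms` and `build_index` are hypothetical names), the indexer takes the term extractor as a parameter so that a site can swap in its own:

```python
import re
import unicodedata

def default_terms(text):
    """Stand-in for a built-in default: lowercased words of 3+ characters."""
    return {w.lower() for w in re.findall(r"\w{3,}", text)}

def accent_free_terms(text):
    """Example site-specific replacement: drop combining marks before splitting."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return default_terms(stripped)

def build_index(pages, terms=default_terms):
    """Map each term to the set of page names containing it."""
    index = {}
    for name, text in pages.items():
        for term in terms(text):
            index.setdefault(term, set()).add(name)
    return index
```

Calling `build_index(pages, terms=accent_free_terms)` indexes "élève" and "Eleve" under the same key, while the default keeps them apart. The same function would have to be applied to the search terms as well.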
Thanks!
Petko