[pmwiki-users] UTF-8 support and mbstrings

Athan ssb at in.gr
Wed Jun 7 03:17:32 CDT 2006


I was able to modify PmWiki to enable case-insensitive search.

I created a function named utf8tolower in xlpage-utf-8.php, modeled on
utf8toupper. It is essentially utf8toupper "reversed".
In pagelist.php, I replaced strtolower($t) with a call to utf8tolower, so
the page index is now created and stored in UTF-8.
In line 232 of pagelist.php ....
- if (!preg_match($i, $text))
+ if (!preg_match(utf8tolower($i), utf8tolower($text)))

After these mods, search works case-insensitively with any non-English
language.
This is only a draft fix, though. For better results, a new CaseConversions
array (upper -> lower) is necessary.
It also requires xlpage-utf-8.php. Another issue is performance when the
mbstring functions are not available.
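
For illustration only, here is a minimal sketch of what such an
upper -> lower conversion might look like. The function name
utf8tolower_sketch and the small Greek table are assumptions made up for
this example, not the code from the patch above or anything shipped with
PmWiki. A real table would need to cover every cased letter.

```php
<?php
// Hypothetical helper (not PmWiki's code): lowercase a UTF-8 string,
// preferring mbstring and falling back to a lookup table.
// This file must itself be saved as UTF-8.
function utf8tolower_sketch($s) {
  // Illustrative upper -> lower table, Greek subset only.
  static $lower = array(
    "Α" => "α", "Β" => "β", "Γ" => "γ", "Δ" => "δ",
    "Ε" => "ε", "Η" => "η", "Θ" => "θ", "Ν" => "ν",
  );
  // Fast path when the mbstring extension is loaded.
  if (function_exists('mb_strtolower')) return mb_strtolower($s, 'UTF-8');
  // Fallback: fold ASCII letters byte-for-byte. This is safe for UTF-8
  // because ASCII bytes never occur inside a multibyte sequence.
  $s = strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
  // Then map the multibyte upper-case letters through the table.
  return strtr($s, $lower);
}

echo utf8tolower_sketch("ΑΘΗΝΑ Wiki"), "\n";
```

One caveat with this approach: lowercasing a whole regex pattern, as in
the pagelist.php change above, also lowercases escapes such as \S or \W,
which changes their meaning, so it is only safe for patterns built from
plain search terms.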

I hope you will consider including something like this. Search is a
primary feature for a wiki, and sometimes a quick-and-dirty fix like this
is better than nothing.

Athan

"Patrick R. Michaud" <pmichaud at pobox.com> wrote in message 
news:20060606143900.GA26132 at host.pmichaud.com...
> On Tue, Jun 06, 2006 at 02:55:43PM +0300, Athan wrote:
>> Any hope to see such a version of pmwiki ?
>> The current version works fine with single-byte chars but lacks
>> case-insensitive search when using non-Latin UTF-8 strings.
>> So, why not an mbstring version? Most hosts support PHP with mbstring
>> compiled in. Besides that, it is very easy to have it disabled when
>> the mbstring functions are not available.
>
> PmWiki uses preg_match for its (case-insensitive) text search -- this is
> faster than calling the string or mbstring functions.  Unfortunately,
> there isn't an mbstring version of preg_match available.
>
> (Yes, there's an mb_eregi function that does pattern matching, but
> unfortunately it uses a somewhat different syntax from the pcre-based
> pattern matching functions.)
>
> So, in the case of search it's not a simple matter of replacing
> functions with mbstring equivalents -- it requires reworking the
> entire algorithm to be able to use mb_eregi, or avoiding the pattern-match
> searches altogether.
>
> However, I'm looking to modularize the pagelist functions anyway, so
> perhaps text search can be placed into its own module.  Then it would
> be much easier to have a mbstring version of text search.
>
> Votes are being recorded at http://www.pmwiki.org/wiki/PITS/00682.  :-)
>
> Pm 






