[pmwiki-users] UTF-8 support and mbstrings

Patrick R. Michaud pmichaud at pobox.com
Tue Jun 6 09:39:00 CDT 2006


On Tue, Jun 06, 2006 at 02:55:43PM +0300, Athan wrote:
> Any hope of seeing such a version of PmWiki?
> The current version works fine with single-byte characters but lacks
> case-insensitive search when using non-Latin UTF-8 strings.
> So, why not an mbstring version? Most hosts support PHP with mbstring
> compiled in. Besides that, it would be very easy to disable it when the
> mbstring functions are not available.

PmWiki uses preg_match for its (case-insensitive) text search -- this is
faster than calling the string or mbstring functions.  Unfortunately,
there isn't an mbstring version of preg_match available.  

(Yes, there's an mb_eregi function that does pattern matching, but 
it uses a somewhat different syntax from the PCRE-based pattern 
matching functions.)
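
To make the syntax difference concrete, here is a minimal sketch. The
search term and page text are made-up sample data, and this is not how
PmWiki's search is actually written; it only contrasts the two calling
conventions on a non-Latin UTF-8 string.

```php
<?php
// Sample data (assumed for illustration): a lower-case Greek search term
// and text containing the same word in upper case, both encoded as UTF-8.
$term = 'βικι';
$text = 'ΒΙΚΙ';

// PCRE style: the pattern carries its own delimiters, and the trailing "i"
// modifier asks for case-insensitive matching. As written, the match is
// byte-oriented, so the multibyte upper/lower-case letters are not paired.
$pcre_hit = preg_match('/' . preg_quote($term, '/') . '/i', $text);

// mbstring style: no delimiters and no modifiers. Case-insensitivity comes
// from calling mb_eregi() rather than mb_ereg(), and the encoding is set
// separately through mb_regex_encoding().
$mb_hit = null;
if (function_exists('mb_eregi')) {
    mb_regex_encoding('UTF-8');
    $mb_hit = mb_eregi($term, $text);
}
```

A pattern written for one family cannot simply be handed to the other,
which is why swapping function names is not enough.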

So, in the case of search it's not a simple matter of replacing 
functions with mbstring equivalents -- it requires reworking the 
entire algorithm to be able to use mb_eregi, or avoiding the pattern-match
searches altogether.  
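
As a rough sketch of the second option (avoiding pattern matching
altogether), something like the following could work. The function name
text_contains_ci is hypothetical and not part of PmWiki, and this approach
is likely slower than the current preg_match search.

```php
<?php
// Hypothetical helper, not part of PmWiki: case-insensitive substring test
// that avoids regular expressions entirely.
function text_contains_ci($needle, $haystack, $encoding = 'UTF-8') {
    // Treat an empty search term as matching everything.
    if ($needle === '') {
        return true;
    }
    if (function_exists('mb_strtolower')) {
        // Lower-case both sides in the given encoding, then do a plain
        // multibyte substring search.
        $n = mb_strtolower($needle, $encoding);
        $h = mb_strtolower($haystack, $encoding);
        return mb_strpos($h, $n, 0, $encoding) !== false;
    }
    // Fallback when mbstring is not compiled in: byte-oriented, so only
    // ASCII letters are matched case-insensitively.
    return stripos($haystack, $needle) !== false;
}
```

The function_exists() check is one way to get the "disabled when mbstring
is not available" behaviour mentioned in the original question.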

However, I'm looking to modularize the pagelist functions anyway, so 
perhaps text search can be placed into its own module.  Then it would
be much easier to have an mbstring version of text search.

Votes are being recorded at http://www.pmwiki.org/wiki/PITS/00682.  :-)

Pm



