[pmwiki-users] 2.2.0-beta43 released (drafts, expressions, diff, utf-8)
Patrick R. Michaud
pmichaud at pobox.com
Mon Apr 16 07:36:54 CDT 2007
On Mon, Apr 16, 2007 at 10:04:55AM +0200, Petko Yotov wrote:
> On Sunday 15 April 2007 23:16, Patrick R. Michaud wrote:
> > Searches on sites using utf-8 are now performed case-insensitively
> > for accented characters.
>
> Hello Patrick.
>
> There is a problem with the $CaseConversions array, line 214:
>
> "\xc9\xbd" => "\x171\xa4",
You're correct. It's now fixed for the next release.
> I tried to find what is written in the source [1] but could not find the
> sequence C9BD.
"\xc9\xbd" is a UTF-8 byte sequence, while the UnicodeData.txt file
lists codepoint values (not UTF-8 bytes). \xc9\xbd is the UTF-8
encoding of U+027D, so the entry to look for in that file is 027D.
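To make the codepoint-versus-bytes distinction concrete, here is a minimal sketch. Python is used only for illustration (it is not what PmWiki itself runs on), and the variable names are made up for this example.

```python
# Codepoint value as it appears in UnicodeData.txt (hex 027D).
codepoint = 0x027D

# The same character expressed as a UTF-8 byte sequence.
utf8_bytes = chr(codepoint).encode("utf-8")

print(utf8_bytes.hex())  # c9bd
```

Searching UnicodeData.txt for "C9BD" finds nothing because the file is keyed by codepoint, and the bytes only appear after encoding.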
The uppercase conversion of U+027D is U+2C64. The codepoint U+2C64
requires a 3-byte UTF-8 encoding, but the program I wrote to translate
codepoints to UTF-8 was set up to handle only 1-byte and 2-byte
encodings, so it mis-encoded the codepoint as \x171\xa4.
So, the uppercase conversion of U+027D (\xc9\xbd in UTF-8) is
U+2C64 (\xe2\xb1\xa4 in UTF-8).
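The arithmetic can be sketched as follows. This is an illustration in Python, not the actual conversion program: both function names are hypothetical, and the assumption that the lead byte was built by plain addition with no range check is a guess that happens to reproduce the \x171 value.

```python
def encode_two_byte_only(cp):
    """Hypothetical encoder that only knows the 1- and 2-byte forms."""
    if cp < 0x80:
        return [cp]
    # Assumed behaviour: no check for cp > 0x7FF, so the lead "byte"
    # can overflow past 0xFF for larger codepoints.
    return [0xC0 + (cp >> 6), 0x80 + (cp & 0x3F)]


def encode_utf8(cp):
    """UTF-8 encoding for 1- to 4-byte forms (surrogates not rejected)."""
    if cp < 0x80:
        return [cp]
    if cp < 0x800:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


# U+027D fits in two bytes, so both encoders agree: c9 bd.
print([hex(b) for b in encode_two_byte_only(0x027D)])  # ['0xc9', '0xbd']

# U+2C64 needs three bytes; the limited encoder overflows the lead byte.
print([hex(b) for b in encode_two_byte_only(0x2C64)])  # ['0x171', '0xa4']
print(bytes(encode_utf8(0x2C64)).hex())                # e2b1a4
```

Under that assumption, 0x2C64 >> 6 is 0xB1, and 0xC0 + 0xB1 is 0x171, which no longer fits in a single byte. The trailing \xa4 is the same in both the wrong and the correct sequence, since it carries only the low six bits.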
I had already caught the 3-byte instances elsewhere in the
tables, but apparently missed this one. Thanks for catching it!
Pm