[pmwiki-users] PmWiki.org UTF-8 migration complete

Petko Yotov 5ko at 5ko.fr
Mon Oct 31 10:53:00 CDT 2011


On Monday 31 October 2011 09:14:23 Oliver Betz wrote:
> there is no difference between "xlpage-utf-8.php after XLPage()" and
> "xlpage-utf-8.php disabled", both result in correct display for old
> and new pages but wrong XLPage translations.
> 
> Seemingly that's because PmWikiDe.XLPage causes xlpage-utf-8.php to be
> included, correct?

Correct, but I plan to disable this automatic inclusion in a future version and 
have admins add the include_once() line to config.php themselves.

> Next step: After removing the 'xlpage-i18n' => 'utf-8' entry from
> PmWikiDe.XLPage, I get different results:
> 
> "xlpage-utf-8.php after XLPage()" results in wrong display of old
> pages.

Right. This affects old pages, as well as some other pages like RecentChanges 
that were last modified with version 2.2.29 or earlier, and possibly pages 
written by recipes with PmWiki version 2.2.30 and earlier. Newer versions of 
PmWiki always store the charset= attribute; previous versions did not add this 
attribute if it was missing.

> and "xlpage-utf-8.php disabled" results in correct display.
> Well, that's somewhat surprising. Can it be true that removing UTF-8
> entries completely fixes the display problems? Even if this is true,
> it's no solution, because new pages are stored as ISO-8859-1 instead
> of UTF-8.

Yes, but your wiki is no longer in UTF-8. This is the usual case of someone 
upgrading from an earlier version without migrating to UTF-8: even if the 
documentation is UTF-8, PmWiki will convert it on the fly to the older 
encoding. It is supposed to work without flaw.

> Next step (suggestion from your other mail):
>  $DefaultPageCharset = array(''=>'ISO-8859-1');
>  include_once("$FarmD/scripts/xlpage-utf-8.php");
>  XLPage('de','PmWikiDe.XLPage');
> 
> seems to cure the problem: New pages are UTF-8, pages without
> declaration and XLPage translations are displayed correctly, and it
> doesn't matter whether PmWikiDe.XLPage contains "'xlpage-i18n' =>
> 'utf-8'" or not.

This looks correct.
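
For reference, here are the same three lines as they might sit in 
local/config.php, with comments. This is only a sketch of the setup discussed 
above; the comments are an interpretation of this thread, not a full 
description of what the scripts do.

```php
<?php if (!defined('PmWiki')) exit();

# Pages stored without a charset= attribute (old pages) are
# assumed to be ISO-8859-1.
$DefaultPageCharset = array(''=>'ISO-8859-1');

# Enable UTF-8 explicitly, instead of relying on the
# 'xlpage-i18n' => 'utf-8' entry in PmWikiDe.XLPage.
include_once("$FarmD/scripts/xlpage-utf-8.php");

# Load the German translations.
XLPage('de','PmWikiDe.XLPage');
```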

> Gladly I have neither other encodings nor page names with
> international characters.

Absolutely. But we'll have to migrate wikis with international page names too.

> Is $DefaultPageCharset = array(''=>'ISO-8859-1'); only used to guess
> the charset of pages not containing a definition, or are there other
> effects?

It will also be used to override a wrong charset for pages saved with versions 
2.2.29 and earlier in the encodings ISO-8859-2, ISO-8859-9 or ISO-8859-13.
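
As an illustration only, assume a wiki whose old pages were really written in 
ISO-8859-9, some of them saved with no charset= attribute and some wrongly 
marked as ISO-8859-1. The empty key is the form used earlier in this thread; 
the second key/value pair is an assumption about how the override is written, 
so please check it against the documentation of $DefaultPageCharset for your 
version before relying on it.

```php
# ''           => pages with no charset= attribute
# 'ISO-8859-1' => pages wrongly marked as ISO-8859-1 (assumed syntax)
$DefaultPageCharset = array(
  ''           => 'ISO-8859-9',
  'ISO-8859-1' => 'ISO-8859-9',
);
```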

Thanks a lot for trying the UTF-8 migration and reporting what you find. As I 
wrote on Sept. 18, this migration is not trivial and shouldn't be rushed until 
we document it, and your reports will help us write that documentation.

Petko


