[pmwiki-users] Defaulting PmWiki to utf8

Patrick R. Michaud pmichaud at pobox.com
Wed Nov 14 17:11:51 CST 2007


On Wed, Nov 14, 2007 at 11:28:50PM +0100, Arrigo Marchiori wrote:
> > Personally, I'm very much in favor of switching PmWiki's default
> > to utf8 -- it will bring us some huge benefits -- but the big 
> > obstacle is that migrating existing sites from the old iso-8859-1
> > default to a utf8 default may be somewhat complicated and/or
> > problematic.  Thus I'm seeking comments and opinions.
> 
> In Italian we use some accented characters (à, è, é, ì, ...), so I
> think a charset change would be a major step for every PmWiki-based
> Italian site. The same goes for French, German...
> 
> As a regular UTF-8 user (as you can see from this e-mail :-), I
> personally think that the whole Internet should switch to UTF-8. But
> I also see rather poor support for this charset on some systems.
> I'm afraid that some web servers or FTP clients may not accept
> filenames encoded in UTF-8. I hope someone can prove me wrong!

Actually, I tend to run into the reverse situation, where a number
of operating systems (notably Mac OSX, but also some versions of 
Windows) will accept filenames encoded in UTF-8 but not in 
another character set.  That's one reason I'm keen to switch.  :-)


> > So, what I'm seeing at the moment is that if we switch to using
> > utf8 by default, admins of existing sites have to be notified 
> > somehow that the default has changed and told how to configure
> > the site to continue using iso-8859-1, or given a procedure to
> > somehow convert the site's pages to utf8.  And once someone
> > starts the utf8 conversion, it can get a bit messy to try to
> > convert back.
> 
> Yes, I think that a big red label should be in the upgrade
> instructions, with a pointer to a recipe or something that explains
> how to convert page text. I don't know about page names, though... :-/

We'd have a recipe to take care of the conversion.  It's not
difficult to write; it's just a pain if any unexpected errors
occur.  The first step would undoubtedly be to ensure a complete
backup of the wiki.d/ directory.  :-)
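A minimal sketch of that first step, assuming a Python helper run next to the wiki.d/ directory. The function name `backup_wiki_dir` and the `.bak-` suffix are hypothetical and not part of PmWiki; the helper simply copies the whole page directory to a timestamped sibling before any conversion touches it.

```python
import shutil
import time
from pathlib import Path


def backup_wiki_dir(wiki_dir: str = "wiki.d") -> Path:
    """Copy the whole page directory to a timestamped sibling (hypothetical helper)."""
    src = Path(wiki_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    # copytree raises FileExistsError instead of overwriting, so an
    # existing backup is never clobbered by a second run.
    shutil.copytree(src, dest)
    return dest
```

Copying the page files byte for byte means no encoding decision is made at this stage, so the backup is a faithful copy whatever charset the pages are in.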

> I suggest doing the first test with the PmWiki localized
> documentation: that's a good ready-made example of foreign language
> text! :-)

Indeed.

> As for how to implement a charset conversion, the only idea I have is
> to use something like html_entity_decode(htmlentities(text)). I'm
> afraid that converting the filenames can only be left to each site
> admin.
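For page names stored as filenames, one possible approach is sketched below in Python. The assumptions are ours, not PmWiki's: filenames are handled as raw bytes (as on most Unix filesystems), any name that is already valid UTF-8 (which includes plain ASCII) is left alone, and every other name is assumed to be ISO-8859-1. The helper names `utf8_name` and `rename_pages_to_utf8` are hypothetical.

```python
import os


def utf8_name(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8, else re-encode it from ISO-8859-1."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")


def rename_pages_to_utf8(wiki_dir: bytes = b"wiki.d") -> list[tuple[bytes, bytes]]:
    """Rename page files in wiki_dir to UTF-8 names; return (old, new) pairs."""
    renamed = []
    # Passing a bytes path makes os.listdir return the names as raw bytes.
    for old in sorted(os.listdir(wiki_dir)):
        new = utf8_name(old)
        if new == old:
            continue
        target = os.path.join(wiki_dir, new)
        if os.path.exists(target):
            # Refuse to overwrite a page that already has the UTF-8 name.
            raise FileExistsError(target)
        os.rename(os.path.join(wiki_dir, old), target)
        renamed.append((old, new))
    return renamed
```

Because already-valid names are skipped, a second run over the same directory renames nothing, which matters if the first run is interrupted.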

As I mentioned, the steps of the actual conversion aren't all
that difficult -- PHP provides utf8_encode and utf8_decode functions
that automatically convert between iso-8859-1 and utf-8.
The hard part is knowing _when_ a conversion is needed, and when
things should be left alone.
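To illustrate why deciding _when_ to convert is the hard part, here is a minimal Python sketch (the name `to_utf8` is hypothetical). The transcoding itself is the same byte mapping that PHP's utf8_encode applies to ISO-8859-1 input; the decision rule, "leave anything that already decodes as UTF-8 alone", is an assumed heuristic, not something PmWiki does.

```python
def to_utf8(data: bytes) -> bytes:
    """Convert page text to UTF-8 unless it already decodes as UTF-8.

    Assumed heuristic: valid UTF-8 is left alone (so running the
    conversion twice is harmless); anything else is treated as
    ISO-8859-1, the same mapping PHP's utf8_encode() applies.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode("iso-8859-1").encode("utf-8")


latin1 = "perché città".encode("iso-8859-1")
once = to_utf8(latin1)
assert once == "perché città".encode("utf-8")
assert to_utf8(once) == once  # already converted: left alone

# The weak spot: the ISO-8859-1 text "Ã©" is the byte pair C3 A9, which is
# also valid UTF-8 (for "é"), so the heuristic wrongly leaves it alone.
ambiguous = "Ã©".encode("iso-8859-1")
assert to_utf8(ambiguous).decode("utf-8") == "é"
```

Such collisions are rare in ordinary prose, but they show that no byte-level test can be fully certain which charset a page was written in.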

> That's my two cents.

Thanks, very helpful!

Pm

