[pmwiki-users] Defaulting PmWiki to utf8
Patrick R. Michaud
pmichaud at pobox.com
Wed Nov 14 17:11:51 CST 2007
On Wed, Nov 14, 2007 at 11:28:50PM +0100, Arrigo Marchiori wrote:
> > Personally, I'm very much in favor of switching PmWiki's default
> > to utf8 -- it will bring us some huge benefits -- but the big
> > obstacle is that migrating existing sites from the old iso-8859-1
> > default to a utf8 default may be somewhat complicated and/or
> > problematic. Thus I'm seeking comments and opinions.
>
> In Italian we use some accented characters (à, è, é, ì, ...), so I think
> a charset change would be a major step for every PmWiki-based Italian
> site. Same thing for French, German...
>
> As a regular UTF-8 user (you can see it by this e-mail :-) I
> personally think that the whole Internet should switch to UTF-8. But
> I'm also seeing rather poor support for this charset on some
> systems. I'm afraid that some web servers or FTP clients may not
> accept filenames encoded in UTF-8. I hope someone can contradict me!
Actually, I tend to run into the reverse situation, where a number
of operating systems (notably Mac OSX, but also some versions of
Windows) will accept filenames encoded in UTF-8 but not in
another character set. That's one reason I'm keen to switch. :-)
> > So, what I'm seeing at the moment is that if we switch to using
> > utf8 by default, admins of existing sites have to be notified
> > somehow that the default has changed and told how to configure
> > the site to continue using iso-8859-1, or given a procedure to
> > somehow convert the site's pages to utf8. And once someone
> > starts the utf8 conversion, it can get a bit messy to try to
> > convert back.
>
> Yes, I think that a big red label should be in the upgrade
> instructions, with pointer to a recipe or something that explains how
> to convert page text. I don't know about page names, though... :-/
We'd have a recipe to take care of the conversion. It's not
difficult to write, it's just a pain if any unexpected errors
occur. The first step would undoubtedly be to ensure a complete
backup of the wiki.d/ directory. :-)
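A minimal Python sketch of that first backup step, assuming the pages live as plain files in a wiki.d/ directory. The function name backup_wiki_dir and the timestamped destination name are hypothetical illustrations, not part of PmWiki or of any existing conversion recipe.

```python
import shutil
import time
from pathlib import Path


def backup_wiki_dir(wiki_d: str = "wiki.d") -> Path:
    """Copy the whole page directory to a timestamped sibling before converting.

    Hypothetical helper: the "<name>.backup-YYYYmmdd-HHMMSS" naming is an
    assumption made for this sketch.
    """
    src = Path(wiki_d)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.backup-{stamp}")
    # copytree raises FileExistsError if dest already exists, so an earlier
    # backup is never silently overwritten. It copies with shutil.copy2 by
    # default, which also preserves file modification times.
    shutil.copytree(src, dest)
    return dest
```

Copying the directory as a whole (rather than file by file during conversion) keeps the "convert back" path trivial: if anything goes wrong, the original wiki.d/ can simply be restored from the copy.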
> I suggest doing the first test with the PmWiki localized
> documentation: that's a good ready-made example of foreign language
> text! :-)
Indeed.
> About how to implement a charset conversion, the only idea I have is
> to use something like html_entity_decode(htmlentities(text)). I'm
> afraid that converting the filenames could only be left to each site
> admin.
As I mentioned, the steps of the actual conversion aren't all
that difficult -- PHP provides the utf8_encode() and utf8_decode()
functions, which convert between iso-8859-1 and utf-8.
The hard part is knowing _when_ a conversion is needed, and when
things should be left alone.
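For illustration, the same byte-level mapping can be sketched in Python (PmWiki itself is PHP, and all helper names here are hypothetical). The needs_conversion check is an assumed heuristic, not PmWiki's logic: if a page's bytes already decode as valid UTF-8, leave them alone; otherwise treat them as ISO-8859-1 and re-encode.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (same mapping as PHP's utf8_encode)."""
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Reverse direction; characters outside ISO-8859-1 become '?',
    mirroring what PHP's utf8_decode does."""
    return data.decode("utf-8").encode("iso-8859-1", errors="replace")


def needs_conversion(data: bytes) -> bool:
    """Assumed heuristic: bytes that are not valid UTF-8 are taken to be
    legacy ISO-8859-1. Pure ASCII is valid UTF-8, so it is left alone."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    # Caveat: some ISO-8859-1 text (e.g. "Ã©", bytes C3 A9) is also valid
    # UTF-8 and would wrongly be skipped here.
    return False


def convert_if_needed(data: bytes) -> bytes:
    """Convert only when the heuristic says so, so running it twice is safe."""
    return latin1_to_utf8(data) if needs_conversion(data) else data
```

Because convert_if_needed skips text that already decodes as UTF-8, re-running a conversion over a half-converted wiki.d/ does not double-encode pages. The heuristic is not foolproof, though: an ISO-8859-1 page that happens to contain a sequence such as "Ã©" is also valid UTF-8 and would be left untouched, which is exactly why deciding _when_ to convert is the hard part.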
> These were my two cents.
Thanks, very helpful!
Pm