[pmwiki-users] Converting international documentation to UTF-8
Petko Yotov
5ko at 5ko.fr
Sat Jul 30 19:08:05 CDT 2011
On Saturday 30 July 2011 06:36:58 Carlos AB wrote:
> I would like to have more information on how can we start converting
> PmWiki international documentation to utf-8?
I started working on it, but I found a number of annoying bugs which slowed me
down. We will provide downloads of the translations both in UTF-8 and in the
older encodings where needed, in the next few days.
It would also help those WikiGroups to reduce the number of pages whose
names contain international characters.
Petko
P.S. I was thinking about having PmWiki convert these pages on-the-fly: the
same file would work both in Latin-1 and in UTF-8 wikis. But this will be more
complex than I thought.
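As a rough illustration of the on-the-fly idea (not PmWiki code, which is PHP), here is a minimal Python sketch. The function name convert_for_wiki and its parameters are hypothetical, and it assumes the stored charset of the page is already known, which is exactly the hard part. Characters the target charset cannot represent are written as numeric character references, which is one possible choice among several.

```python
import codecs


def convert_for_wiki(raw: bytes, stored: str, wiki: str) -> bytes:
    """Re-encode page bytes from the stored charset to the wiki's charset.

    Hypothetical helper: `stored` is the charset the file was saved in,
    `wiki` is the charset the running wiki expects.
    """
    # Normalise aliases ("utf8", "UTF-8", "latin-1", ...) before comparing.
    if codecs.lookup(stored).name == codecs.lookup(wiki).name:
        return raw  # same encoding, nothing to do
    text = raw.decode(stored)
    # Characters missing from the target charset become &#NNN; references.
    return text.encode(wiki, errors="xmlcharrefreplace")
```

For example, a UTF-8 page served by an ISO-8859-1 wiki keeps "é" as a single byte, while a character such as "ž", which Latin-1 lacks, is emitted as a numeric reference.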
The Charset page attribute was added to PmWiki betas more than 4 years ago,
but unfortunately some important things were omitted and it cannot be trusted.
1. Wikis using the encodings iso8859-2, -9 and -13 keep saving the wrong
charset, ISO8859-1 (this will be corrected for version 2.2.30). We can fix
those wikigroups on pmwiki.org, but for an existing page on some other wiki,
we cannot be sure what the encoding really is: "ISO8859-1" or one of the
others. The Charset attribute can be trusted only when it is "UTF-8".
2. The Charset page attribute is added by the function PostPage() when a page
is edited and saved. Some recipes or functions modify pages with the function
WritePage(), which doesn't save this attribute: for example, RecentChanges
pages will not normally contain the attribute, and an automatic script may not
know whether the page has already been converted to UTF-8.
3. Pages last saved with a PmWiki version <2.2.0-beta43 or <2.2.0-stable don't
have the Charset attribute at all. There are many wikis still running 2.1.27.
(Well, since the attribute can't be trusted anyway, this isn't a big deal.)
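The three points above boil down to one rule: believe the attribute only when it says UTF-8, and otherwise fall back to what the wiki is configured to use. A minimal Python sketch of that rule follows. It assumes the page file holds one "key=value" attribute per line with a "charset=" key; the function name trusted_charset and the wiki_default parameter are hypothetical, not part of PmWiki.

```python
def trusted_charset(raw: bytes, wiki_default: str) -> str:
    """Return the charset to assume for a page file.

    Assumption: attributes are stored one per line as key=value and the
    charset attribute appears as a line starting with "charset=".
    """
    prefix = b"charset="
    for line in raw.split(b"\n"):
        if line.startswith(prefix):
            value = line[len(prefix):].strip().decode("ascii", "replace")
            if value.upper() == "UTF-8":
                return "UTF-8"  # the only value that can be trusted
            break  # any other value may be wrong, so ignore it
    # Attribute missing or untrustworthy: use the wiki's own encoding.
    return wiki_default
```

A page that claims ISO8859-1 on an iso8859-2 wiki is therefore treated as iso8859-2, and a page with no attribute at all (written by WritePage() or by an old PmWiki version) is handled the same way.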