[pmwiki-users] Re: Changing charset from ISO-8859-1 to UTF-8
akabaila at pcug.org.au
Thu Apr 7 18:31:02 CDT 2005
On Friday 08 April 2005 02:17, pmwiki-users-request at pmichaud.com wrote:
> Message: 5
> Date: Thu, 7 Apr 2005 18:13:16 +0200
> From: Laurent Meister <meister at apfelwiki.de>
> Subject: Re: [pmwiki-users] Changing charset from ISO-8859-1 to UTF-8
> To: "Patrick R. Michaud" <pmichaud at pobox.com>
> Cc: Pmwiki-users at pmichaud.com
> Message-ID: <c39a0775c8e4d1937999660f39ebe4d7 at apfelwiki.de>
> Content-Type: text/plain; charset="us-ascii"
> Am 07.04.2005 um 16:01 schrieb Patrick R. Michaud:
> > On Thu, Apr 07, 2005 at 11:37:16AM +0200, Laurent Meister wrote:
> >> Hello,
> >> is there a nice way to convert existing pages from a pmwiki written
> >> with ISO-8859-1 to UTF-8?
> > Depends on what you consider to be "nice". :-)
> > You can often enlist a browser's help in this -- edit a page that is
> > encoded in ISO-8859-1, copy the markup text to the clipboard, and
> > then paste it into the edit form for a site that is set up for utf-8.
> > The browser will often handle the encoding translation.
> > Still, I suppose this is more work than one would want to do, especially
> > for a large site.
> That is the reason why I posted my problem here. On ApfelWiki we have
> nearly 1500 wikipages. And it would be a lot of work. :-(
> > I could see about coming up with a translation
> > module for PmWiki, that could do mass conversion of pages or perhaps
> > even automatically converting character sets on the fly.
> That's what I consider to be nice ;-)
> >> Some browsers seem not to show chars like the "€" euro symbol
> >> properly. Switching the page to UTF-8 would solve this problem.
> > You might also treat the euro-symbol chars as markup to be converted to
> > €, either when the page is rendered or when it is saved. This might
> > eliminate the need to convert pages to utf-8. Let me know if you want to
> > try this approach. :-)
> You mean to make a special markup script for this?
> Laurent Meister (kt007)
I have a similar problem with a much smaller volume of files - conversion
from Lithuanian iso-8859-13 to utf-8 on my HAN (Home Area Network). I plan
to write a small Python script to convert the Lithuanian diacriticals from
iso-8859-13 to utf-8. The script would read one file and write the
result to another file with a slightly modified name.
The idea is to have the "sheep safe and the wolf fed" - the old iso-8859-13
files will remain intact and the new utf-8 files with modified names will be ready.
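A minimal sketch of such a script, assuming Python 3 and its standard
"iso-8859-13" codec. The function name convert_file and the "-utf8" name
suffix are made up for illustration; nothing here is PmWiki-specific.

```python
from pathlib import Path


def convert_file(src, src_encoding="iso-8859-13", suffix="-utf8"):
    """Write a UTF-8 copy of src next to it and return the new path.

    The original file is only read, never modified.
    convert_file and the "-utf8" suffix are hypothetical names.
    """
    src = Path(src)
    # e.g. page.txt -> page-utf8.txt
    dst = src.with_name(src.stem + suffix + src.suffix)
    # Work on raw bytes so that line endings pass through unchanged.
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode("utf-8"))
    return dst
```

For iso-8859-1 sources the same function should work with
src_encoding="iso-8859-1"; only the codec name changes.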
The mapping of iso-8859-13 to utf-8 is very similar to iso-8859-1 to utf-8. I
would be happy to share the script with you, though I do realise that a
similar mapping can be completed with php - alas, I don't program php and I
am fast becoming a slow learner.
Actually, in some respects my conversion problems are worse than yours - I
just looked at a daily news summary from Lithuania - the email is in
iso-8859-4, which was used for Lithuanian for a while, with iso-8859-13 being
now a standard, or so it is claimed.
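The practical consequence is that the source charset has to be known for
each file before converting. A small sketch, again assuming Python 3 and
its standard codecs (to_utf8 is a made-up helper name): the same bytes
decode to a different word when the wrong Baltic charset is assumed.

```python
def to_utf8(raw, source_encoding):
    """Re-encode raw bytes from source_encoding to UTF-8 (hypothetical helper)."""
    return raw.decode(source_encoding).encode("utf-8")


word = "ačiū"
raw_4 = word.encode("iso-8859-4")   # bytes as an iso-8859-4 sender would produce them

right = to_utf8(raw_4, "iso-8859-4").decode("utf-8")
wrong = to_utf8(raw_4, "iso-8859-13").decode("utf-8")

print(right == word)   # decoding with the matching charset recovers the word
print(wrong == word)   # decoding as iso-8859-13 yields different letters
```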
Good luck in your move to utf-8!