[pmwiki-users] Problem with utf & i18n (French accented characters)
Petko Yotov
5ko at free.fr
Sat Jun 9 14:10:51 CDT 2007
On Saturday 09 June 2007, Patrick R. Michaud wrote:
> On Fri, Jun 08, 2007 at 04:42:51PM -0400, Donald Z. Osborn wrote:
> > I am setting up a wiki (farm) that needs to be in two working
> > languages (English & French) and accommodate texts in some West African
> > languages that use extended Latin scripts - so it should accommodate
> > UTF-8.
> >
> > The early set-up is okay now except that I encountered an odd display
> > issue with the accented French characters in the interface: Basically,
> > although the output is in utf-8 and my browsers are set to utf-8 I am
> > getting the black diamond in Firefox 2 and empty square in MSIE7 for
> > the accented characters. Switch to iso-8859-1 and everything appears
> > normal. This is not what I expected.
Donald, the easiest way is to open your UTF-8 page for editing alongside the
same page at PmWikiFr, then copy the text and paste it into your page.
This is especially important for the XLPage page.
>
> For a variety of reasons, the PmWikiFr.* pages (including PmWikiFr.XLPage)
> have been built using iso-8859-1 encoding instead of utf-8. So, they
> will tend to not display correctly inside of a utf-8 encoded page.
>
> Thus far PmWiki doesn't have the capability to automatically translate
> between character encodings, because many PHP installations don't
> provide the necessary translation functions. In recent versions
> of PmWiki 2.2.0-beta I've started storing the character encoding
> identification as part of the page so that PmWiki can eventually
> do this sort of translation, but we're not quite there yet.
>
> Since my machine _does_ have the necessary translations, it might be
> possible for me to come up with utf-8 versions of the PmWikiFr.*
> and other pages, and publish them simultaneously with the
> iso-8859-1 versions. But managing all of that -- separate sets of
> encodings for each language translations, and trying to explain
> to admins when to use each -- is likely to be a real headache.
>
> I'm very much open for suggestions on this topic.
>
> It would be very cool if we could find a good way to seamlessly
> convert existing iso-8859-1 and other sites to using utf-8
> (with the option to remain iso-8859-1 for those that want it).
The iconv program, available on most unix platforms, can do this, and so can
the PHP iconv() function. Users could copy the downloaded PmWikiFr/* pages, and
the new "import" feature of PmWiki could translate them into utf-8.
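
To illustrate the kind of conversion iconv performs (on the command line,
roughly "iconv -f ISO-8859-1 -t UTF-8 infile > outfile"), here is a minimal
Python sketch. The helper name latin1_to_utf8 and the sample string are made
up for the example and are not part of PmWiki.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Read the bytes as ISO-8859-1 text and re-encode that text as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical interface string, stored the way an iso-8859-1 page would hold it.
# In ISO-8859-1 the accented "É" is the single byte 0xC9.
sample = "Éditer la page".encode("iso-8859-1")

# Read as UTF-8, that lone byte is an incomplete multi-byte sequence, which is
# why browsers show a replacement character instead of the accent.
try:
    sample.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

converted = latin1_to_utf8(sample)
print(valid_as_utf8)
print(converted.decode("utf-8"))
```

The conversion is lossless in this direction, since every ISO-8859-1 byte has
a UTF-8 equivalent.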
For "seamless conversion", a function similar to PageIndexUpdate, to the
various Backup recipes, or even to the recipe that converts all pages to
CompressedPageStore may do the trick.
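
Such a pass over all pages might look roughly like the sketch below. It is
only an outline under stated assumptions: it treats each page file as opaque
text, the function name convert_tree and the directory arguments are
hypothetical, and it is not how PmWiki or any existing recipe does it. A real
conversion would also have to update whatever encoding marker the page itself
stores.

```python
from pathlib import Path


def convert_tree(src_dir, dst_dir, src_encoding="iso-8859-1"):
    """Write a UTF-8 copy of every file in src_dir into dst_dir.

    Files that already decode as UTF-8 (including pure ASCII) are copied
    unchanged, so running the pass twice does not double-encode anything.
    Returns the names of the files that were actually re-encoded.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")  # already valid UTF-8: leave the bytes alone
        except UnicodeDecodeError:
            raw = raw.decode(src_encoding).encode("utf-8")
            converted.append(path.name)
        (dst / path.name).write_bytes(raw)
    return converted
```

Writing into a separate directory keeps the originals intact, so sites that
want to remain iso-8859-1 lose nothing. The "already valid UTF-8" check is a
heuristic: a short iso-8859-1 text can, in rare cases, happen to be valid
UTF-8 as well.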
Like jdd, I also feel that UTF-8 is a good thing. It is the best choice for
new installations, especially for multilanguage sites, and especially now that
PmWiki has much better support for it (AsSpaced, case-insensitive search).
MediaWiki and DokuWiki, the other most popular PHP wikis, ship with
UTF-8 by default.
Thanks,
Petko