[pmwiki-users] i18n and iso-8859-13

Patrick R. Michaud pmichaud at pobox.com
Sat Apr 2 10:01:31 CST 2005


On Sat, Apr 02, 2005 at 08:01:17PM +1000, Algis Kabaila wrote:
> 
> Thank you for the answers.  I suspect that, at least in part, mapping is 
> accomplished by the web server, when it is invoked.  I base this opinion (and 
> it is only an opinion, not knowledge) on the following observation:
> [...]

Mappings aren't performed directly by the webserver, but the webserver
*is* supposed to inform the browser of the correct mapping to be used,
via the Content-Type header.  I suspect that what you're seeing is the 
result of a webserver not sending an appropriate header.  Let's walk 
through your diagnosis and see if that explains things...
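As a rough illustration, here is Python standing in for what a browser does internally. The charset parameter is read out of a Content-Type value and then used to turn the raw bytes into characters. The helper names `charset_from_content_type` and `decode_body` are made up for this sketch, and the iso-8859-1 fallback is an assumption. Header parsing is done by the standard library's `email.message.Message`.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def decode_body(raw, header_value, fallback="iso-8859-1"):
    """Decode response bytes with the announced charset (assumed fallback if none)."""
    charset = charset_from_content_type(header_value) or fallback
    # errors="replace" mimics a browser showing U+FFFD instead of failing outright
    return raw.decode(charset, errors="replace")


raw = "ąžuolas".encode("iso-8859-13")  # Lithuanian text stored as iso-8859-13 bytes
print(decode_body(raw, "text/html; charset=iso-8859-13"))  # right mapping
print(decode_body(raw, "text/html; charset=UTF-8"))        # wrong mapping
```

The bytes never change. Only the mapping named in the header does, and that alone decides which glyphs appear.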

> I have a sample of Lithuanian text in my home page 
> (http://www.pcug.org.au/~akabaila) on a separate HTML page lituanus.html.  I 
> recently edited it at home, specifying iso-8859-13.  I used  SuSE9.2 (kde 
> 3.3), Konqueror and Kate for editing and testing.  It all went fine - I could 
> see the correct glyphs in their correct places.  It confirms your suggestion 
> that the browser does the mapping.
> 
> Before uploading, I thought it would be worthwhile to look at the page on my
> home "server", which runs Apache 2.0.49, as a web page.  (At present my Apache
> is still "out of the box", without any reconfiguration at all.)  All glyphs
> were wrong - I think they were from the iso-8859-1 space.  That suggests that
> the web server does at least influence the mapping.

Actually, an out-of-the-box Apache may well be specifying a charset.
For example, on my FC3 server the default Apache configuration specifies

    AddDefaultCharset UTF-8

which tells Apache to put "UTF-8" in the Content-Type header in the absence
of any other information from the filename.  And according to 
http://httpd.apache.org/docs-2.0/mod/core.html#adddefaultcharset,
this will also override any charset specified in the body of the 
document via a <meta> element.

Now then, a browser will tend to trust the charset given by the
webserver's Content-Type: header in preference to a <meta>
element, so the browser displays the document as though it were
UTF-8 encoded.

To see this at work, I copied the lituanus.html document onto my
server, at http://www.pmichaud.com/sandbox/lituanus.html.  It displays
incorrectly in my version of Firefox, because Firefox has been told
by the webserver to display the document as UTF-8.  (You can check
which encoding Firefox is using via Tools => Page Info.)

However, if I create a sandbox2/ directory, and place the following
in sandbox2/.htaccess:

    AddDefaultCharset Off

then Apache won't put a charset parameter on the Content-Type header.
Firefox then uses the encoding it finds in the <meta> tag, and everything 
displays correctly (http://www.pmichaud.com/sandbox2/lituanus.html).
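The two sandbox cases can be sketched as a toy model. One function stands in for Apache choosing a Content-Type from its AddDefaultCharset setting, and another stands in for a browser that prefers the header charset over the <meta> one. Both functions are hypothetical simplifications. They ignore much of what real Apache and real browsers do, such as byte-order marks, user overrides, and content sniffing. The iso-8859-1 browser fallback is also an assumption.

```python
from email.message import Message


def apache_content_type(add_default_charset):
    """Toy model of the Content-Type Apache sends for a static .html file.

    add_default_charset is the AddDefaultCharset argument: "On", "Off" or a charset.
    """
    setting = add_default_charset.lower()
    if setting == "off":
        return "text/html"                      # no charset parameter at all
    if setting == "on":
        return "text/html; charset=iso-8859-1"  # Apache's built-in default charset
    return f"text/html; charset={add_default_charset}"


def browser_charset(content_type, meta_charset, fallback="iso-8859-1"):
    """Toy browser rule: header charset first, then <meta> charset, then a fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or meta_charset or fallback


META = "iso-8859-13"  # what lituanus.html declares in its <meta> element
for setting in ("UTF-8", "Off"):
    header = apache_content_type(setting)
    print(f"AddDefaultCharset {setting}: {header!r} -> {browser_charset(header, META)}")
```

With `UTF-8` the header wins and the <meta> declaration is never consulted. With `Off` the header carries no charset, so the <meta> value takes effect.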

So, the mapping itself is still performed entirely by the browser; the
webserver is just telling the browser to use the wrong mapping.
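To make "the wrong mapping" concrete, here is a small sketch. It takes the same iso-8859-13 bytes and decodes them three ways. The sample letters are chosen for illustration and are not taken from lituanus.html. Decoding as iso-8859-1 yields the wrong-but-plausible glyphs Algis saw. Decoding as UTF-8 yields replacement characters, because the bytes are not valid UTF-8.

```python
text = "ąčęėįšųūž"               # Lithuanian letters outside ASCII (sample only)
raw = text.encode("iso-8859-13")  # the bytes as they would sit in the file

as_declared = raw.decode("iso-8859-13")          # correct mapping
as_latin1 = raw.decode("iso-8859-1")             # every byte is valid, glyphs are wrong
as_utf8 = raw.decode("utf-8", errors="replace")  # invalid sequences become U+FFFD

print(as_declared)
print(as_latin1)
print(as_utf8)
```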

> In desperation, regardless 
> of the bad glyphs, I decided to upload lituanus.html to my ISP (TIP), which 
> runs Apache 1.xx, probably expertly configured by the web gurus.  

I suspect they have AddDefaultCharset Off.  This would likely be true
if they built Apache from original sources (this is Apache's default), 
rather than using the version that came with their Linux distro.  

> [...]
> It all sounds logical and reasonable so far.  Now for the "unreasonable" bit: 
> I configured the home PmWiki for Lithuanian characters by including the
> following line in the .../local/config.php file:
> [...]
> Well, in spite of my Apache not being able to display lituanus.html correctly, 
> the PmWiki, running on the same unconfigured Apache displays Lithuanian glyphs 
> correctly.  I am happy about it, but why is it so?  That is the real question 
> that I can not answer and that "blows out of the water" my tentative 
> conclusions.  


... because PmWiki directly sets the Content-Type header that the
webserver sends back, as opposed to using the <meta> tag to do it.
The scripts/xlpage-iso-8859-13.php file does

    $HTTPHeaders[] = "Content-type: text/html; charset=iso-8859-13";

which eventually becomes a PHP header() call that modifies the
HTTP response returned by the webserver.  Since there's an explicit
Content-Type header, Apache doesn't supply one (with the incorrect
encoding), and everything works.
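PmWiki does this in PHP via `$HTTPHeaders` and `header()`. The sketch below is only an analogy and is not PmWiki code. It shows the same idea as a minimal Python WSGI application: the application itself names the charset in its Content-Type header. The page content and the `CHARSET` constant are made up for illustration, and the app is driven directly with a stand-in `start_response` rather than run under a real server.

```python
from wsgiref.util import setup_testing_defaults

CHARSET = "iso-8859-13"  # hypothetical choice, mirroring xlpage-iso-8859-13.php


def app(environ, start_response):
    body = "<p>ąžuolas</p>".encode(CHARSET)
    start_response("200 OK", [
        # Explicit charset, analogous to PmWiki's $HTTPHeaders[] entry
        ("Content-Type", f"text/html; charset={CHARSET}"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


captured = {}


def start_response(status, headers, exc_info=None):
    """Minimal stand-in that records what the app would send."""
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)  # fills in the required WSGI environ keys
body = b"".join(app(environ, start_response))
print(captured["headers"]["Content-Type"])
```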

> Also, I looked with Konqueror at the code that PmWiki produces on the
> pmwiki.org site, but cannot see any "charset=xxxx" specification.  Where is
> it? Is it in CSS and if so how can I access it?

It's in the HTTP response headers (where it's supposed to be according 
to the relevant standards).  The <meta http-equiv='...'> tag that many
HTML documents use is just a workaround that was developed for those
cases where one didn't want to (or couldn't) reconfigure the webserver
for a different character encoding.
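For contrast with the header, the <meta http-equiv> form has to be dug out of the document body itself. Here is a minimal sketch using the standard library's `html.parser`. The class and function names are hypothetical, and it only handles the http-equiv form discussed here.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset from the first <meta http-equiv="Content-Type"> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if attr.get("http-equiv", "").lower() == "content-type" and attr.get("content"):
            msg = Message()
            msg["Content-Type"] = attr["content"]  # reuse MIME parameter parsing
            self.charset = msg.get_content_charset()


def meta_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-13"></head>'
print(meta_charset(page))
```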

> Any pointers to:
> 1. How to install utf-8 with another language (for me, Lithuanian) into 
> PmWiki;

This already exists in PmWiki -- all one has to do is specify

   'xlpage-i18n' => 'utf-8',

in the PmWikiLt.XLPage file.  This tells PmWiki to load 
the scripts/xlpage-utf-8.php file (which configures PmWiki for
dealing with utf-8 encoded documents).  Similarly, if you wanted
to do things in iso-8859-13, you would do

   'xlpage-i18n' => 'iso-8859-13',

in the XLPage and this tells PmWiki to load scripts/xlpage-iso-8859-13.php.
However, I strongly recommend going with utf-8 if at all possible,
and would prefer the PmWikiLt.* pages on pmwiki.org be done in 
utf-8 instead of iso-8859-13.  
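Here is a small sketch of both points. The `xlpage_script` helper is hypothetical: its naming rule is only inferred from the two 'xlpage-i18n' examples above, not taken from PmWiki's source. The `can_encode` check shows one practical reason to favour utf-8. iso-8859-13 covers the Lithuanian letters, but it cannot represent text from other scripts, such as the Cyrillic sample word chosen here.

```python
def xlpage_script(i18n_value):
    """Hypothetical: script name implied by the 'xlpage-i18n' examples in this message."""
    return f"scripts/xlpage-{i18n_value}.php"


def can_encode(text, encoding):
    """True if every character of text has a representation in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


lithuanian = "ąčęėįšųūž"
mixed = lithuanian + " жук"  # Lithuanian plus a Cyrillic word (sample only)
for enc in ("iso-8859-13", "utf-8"):
    print(enc, xlpage_script(enc), can_encode(lithuanian, enc), can_encode(mixed, enc))
```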

> 2. How to configure Apache 2.xx and where and how learn more 
> about Apache web server, and configure it with iso-8859-13 and 
> utf-8, preferably not having to read 1000 pages of docs that 
> cover other aspects of the web server.

http://httpd.apache.org/docs/mod/core.html#adddefaultcharset
is a good starting point, but I suspect that what you ultimately
want is to set

    AddDefaultCharset off

which tells Apache to leave any charset specification to the
document itself.

Hope this helps.

Pm