[pmwiki-users] More template queries and UTF-8 display

Patrick R. Michaud pmichaud at pobox.com
Thu Jun 16 08:40:28 CDT 2005


On Thu, Jun 16, 2005 at 10:35:22PM +0930, Clytie Siddall wrote:
> They should still turn up in a search for PmWikiVi, shouldn't they?  
> I'd still rather have accented links, even if it means they only work  
> internally, as long as the search engines will pick up the main Vi  
> page. No accents just ruins the meaning of the words. :(

Welcome to the wonderful world of character sets and i18n.

You definitely want to keep the accented titles.  Search doesn't work
properly for them on pmwiki.org, because the search page 
(Main.SearchWiki) is in a group that is using iso-8859-1 encoding, 
while the PmWikiVi page titles are encoded with utf-8.  On a normal 
PmWiki installation this really isn't a problem, because the entire
site will typically use the same encoding for all of its pages, so
search works just fine.
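To make the mismatch concrete, here is a small illustrative sketch in Python (not PmWiki's own code, which is PHP). It shows that the same accented title becomes different bytes under utf-8 and iso-8859-1, so anything that compares them byte for byte will not find a match. That search compares raw bytes is an assumption made here purely for illustration.

```python
# Illustrative sketch: one page title, two different byte encodings.
# Assumption: matching is done on raw bytes, for illustration only.
title = "Sách"

utf8_bytes = title.encode("utf-8")         # encoding used for the PmWikiVi titles
latin1_bytes = title.encode("iso-8859-1")  # encoding used by the iso-8859-1 group

print(utf8_bytes)                  # b'S\xc3\xa1ch'
print(latin1_bytes)                # b'S\xe1ch'
print(utf8_bytes == latin1_bytes)  # False: a byte-for-byte match fails

# Reading the UTF-8 bytes as if they were iso-8859-1 yields mojibake.
print(utf8_bytes.decode("iso-8859-1"))  # SÃ¡ch

# Some Vietnamese letters, such as the "ẩ" in "Cẩm", have no
# iso-8859-1 representation at all.
try:
    "Cẩm".encode("iso-8859-1")
except UnicodeEncodeError:
    print("'ẩ' cannot be encoded in iso-8859-1")
```

When every page on a site shares one encoding, both sides of the comparison produce the same bytes, which is why the problem doesn't show up on a typical installation.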

> I've noticed that my browser (or the server returning the page) is  
> mangling the accented vowels horribly when showing the address in the  
> status bar. If I copy the address of the translated cookbook  
> template, for example, instead of:
>     http://www.pmwiki.org/wiki/PmWikiVi/SáchCẩmNang
> I get:
>     http://www.pmwiki.org/wiki/PmWikiVi/S%c3%a1chC%e1%ba%a9mNang
> which is UGLY. Which end is doing that? My browser is pretty good  
> with UTF-8 (OmniWeb 5.1, full of tech gadgets, um, serious tools).

PmWiki is doing it because it must do so to be standards-compliant.
According to the standard for urls (RFC 2396), urls must consist only
of characters in the US-ASCII set; everything else has to be 
escaped (http://www.ietf.org/rfc/rfc2396.txt, section 2.4):

    Data must be escaped if it does not have a representation using an
    unreserved character; this includes data that does not correspond to
    a printable character of the US-ASCII coded character set, or that
    corresponds to any US-ASCII character that is disallowed, as
    explained below.

The URI syntax was recently updated by RFC 3986, but even there it says
that non-US-ASCII characters in urls must be %-escaped.  
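The escaping itself is mechanical: encode the title as utf-8, then replace each byte that isn't an unreserved ASCII character with "%" plus its two hex digits. The Python sketch below reproduces the address from the question. The helper name escape_path_segment is hypothetical, and leaving exactly the RFC 3986 "unreserved" characters unescaped (and using lowercase hex) is an assumption chosen to match the example url, not a description of PmWiki's actual PHP code.

```python
from urllib.parse import quote, unquote

# RFC 3986 "unreserved" characters; everything else gets %-escaped.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def escape_path_segment(segment: str) -> str:
    """Hypothetical helper: %-escape one path segment as utf-8, lowercase hex."""
    out = []
    for byte in segment.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)             # plain ASCII letter/digit/-._~ stays as-is
        else:
            out.append(f"%{byte:02x}")  # every other byte becomes %xx
    return "".join(out)

page = "SáchCẩmNang"
escaped = escape_path_segment(page)
print("http://www.pmwiki.org/wiki/PmWikiVi/" + escaped)
# http://www.pmwiki.org/wiki/PmWikiVi/S%c3%a1chC%e1%ba%a9mNang

# Decoding reverses it; hex case doesn't matter when decoding.
print(unquote(escaped))  # SáchCẩmNang

# The standard library's quote() gives the same bytes in uppercase hex,
# which RFC 3986 (section 2.1) recommends; both forms are equivalent.
print(quote(page, safe=""))  # S%C3%A1chC%E1%BA%A9mNang
```

This is also why the address looks fine in a browser that decodes it for display but ugly once copied: the copied text is the escaped form that actually travels over the wire.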

Many browsers can handle non-US-ASCII characters in urls, and
RFC 3986 notes that these can work well in local or regional
contexts or with improving technology.  But there are still
browsers that can't handle them, and since PmWiki has a somewhat
global perspective, it chooses to follow the published standards
in this regard.

Pm


