[pmwiki-users] Upgrade to 2.2.35 : problem with some page using apostrophe

ABClf languefrancaise at gmail.com
Sun Nov 13 16:43:40 CST 2011


Hi!

Thank you, Petko.
I commented out the include_once script, and my apostrophes are displayed
again in the latest PmWiki version.
I'll test future versions to check whether any character-related
problems still appear.

While explaining my problem, it occurred to me that it may be related
to the one being asked about here:
http://www.generation-nt.com/reponses/probleme-d-apostrophe-et-utf8-entraide-11945.html

Yes, I guess part (or all?) of the problem is related to copy-pasted
text; my German friend used Microsoft Word to write his text before
copying it into PmWiki (though he tells me he doesn't do this all the
time).
To be confirmed...

Quote: "if your wiki doesn't have international characters in
page/file names"...
...does your condition also cover uploaded files (images, PDFs...)?

Gilles.



2011/11/13 Petko Yotov <5ko at 5ko.fr>:
> On Sunday 13 November 2011 01:32:40, Petko Yotov wrote :
>> There are indeed problems with some characters such as typographical
>> apostrophes and dashes, and yes, they are different from normal
>> apostrophes.
> ...
>> For some reason, the browsers don't treat these characters the same way as
>> PHP does. The PHP iconv() function, like the `iconv` system program,
>> appears unable to convert these characters so that the browsers display
>> them correctly.
>
> I should add the utf8_encode() function.
>
> These characters appear to be non-standard, or more precisely from a different
> standard.
>
> The code points 128-159 (0x80-0x9F) are not defined in the ISO-8859-1 charset;
> they are defined in the Windows-1252 charset:
>
>  https://en.wikipedia.org/wiki/ISO-8859-1
>  https://en.wikipedia.org/wiki/Windows-1252 (the special characters are
>    in the cells with thick green borders)
>
> From Wikipedia:
>  It is very common to mislabel Windows-1252 text with the charset label
>  ISO-8859-1. A common result was that all the quotes and apostrophes
>  (produced by "smart quotes" in Microsoft software) were replaced with
>  question marks or boxes on non-Windows operating systems, making text
>  difficult to read. Most modern web browsers and e-mail clients treat the
>  MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such
>  mislabeling. This is now standard behavior in the draft HTML 5
>  specification, which requires that documents advertised as ISO-8859-1
>  actually be parsed with the Windows-1252 encoding.
>
> So, the PHP conversion functions actually follow the standard, but the text
> sent by the browsers is not completely standard.
>
> In order to convert these characters, maybe our automatic conversion from
> ISO-8859-1 to UTF-8 should do the same: treat the page text as
> Windows-1252. Indeed, if the text contains characters at these code points,
> these characters can only be Windows-1252-encoded.
>
> Petko
>
>
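
To make the quoted explanation concrete, here is a small sketch of what
happens to a Word-style apostrophe (byte 0x92) under each charset label.
It is written in Python purely as an illustration, not as PmWiki or PHP
code; the function name legacy_to_utf8 is hypothetical, and the choice to
replace unassigned bytes rather than fail is an assumption.

```python
# Byte 0x92 is the typographic apostrophe as Microsoft Word writes it
# in Windows-1252. ISO-8859-1 has no printable character at that position.
raw = b"l\x92apostrophe"

# Read strictly as ISO-8859-1: 0x92 becomes the invisible control
# character U+0092, which browsers cannot show as an apostrophe.
as_latin1 = raw.decode("latin-1")

# Read as Windows-1252: 0x92 becomes U+2019 (right single quotation mark).
as_cp1252 = raw.decode("cp1252")


def legacy_to_utf8(data: bytes) -> bytes:
    """Convert legacy single-byte text to UTF-8, treating it as Windows-1252.

    Hypothetical helper. Bytes that Windows-1252 leaves unassigned are
    replaced with U+FFFD instead of raising an error (an assumption).
    """
    return data.decode("cp1252", errors="replace").encode("utf-8")


if __name__ == "__main__":
    print(repr(as_latin1))
    print(repr(as_cp1252))
    print(legacy_to_utf8(raw))
```

Ordinary accented letters (0xA0-0xFF) come out the same under both
labels, so treating the text as Windows-1252 only changes the result for
bytes in the 0x80-0x9F range.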



-- 

---------------------------------------
| A | de la langue française
| B | http://www.languefrancaise.net/
| C | languefrancaise at gmail.com
---------------------------------------


