[pmwiki-devel] $UploadNameChars - adding unicode characters
Petko Yotov
5ko at 5ko.fr
Mon Jul 29 02:46:22 PDT 2019
On 29/07/2019 10:38, Simon wrote:
> https://pmwiki.org/wiki/PmWiki/UploadVariables#UploadNameChars
> From the page
> The set of characters allowed in upload names. Defaults to "-\w. ", which
> means alphanumerics, hyphens, underscores, dots, and spaces can be used in
> upload names, and everything else will be stripped.
> $UploadNameChars = "-\\w. !"; # allow dash, letters, digits, dots, spaces and exclamations
> $UploadNameChars = "-\\w. \\x80-\\xff"; # allow Unicode
> Isn't \\x80-\\xff just extended ASCII?
If the charset/encoding of your wiki is ISO-8859-1/Latin-1/Windows-1252
or another 8-bit encoding, \x80-\xff are the characters at positions 128
to 255 of the code page, see
https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout
If you have enabled UTF-8 (variable-length, 8-32 bits/character) for
your wiki, it is a different encoding: the single bytes \x00-\x7f are
the same as ASCII (and as in most 8-bit code pages), while every other
character takes 2, 3 or 4 bytes, all of which come from the \x80-\xff
range.
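To check which bytes a character actually takes in UTF-8, a small standalone PHP sketch can print them in the \xNN notation used by $UploadNameChars. It assumes PHP 7.4 or later (for the "\u{...}" escape and arrow functions); utf8_bytes() and all_bytes_high() are hypothetical helpers, not PmWiki functions.

```php
<?php
// Hypothetical helper (not part of PmWiki): returns the bytes of a string
// written as \xNN escapes, the notation used in $UploadNameChars.
function utf8_bytes(string $s): string {
    // bin2hex() yields two lowercase hex digits per byte; prefix each pair with \x
    return implode('', array_map(fn($h) => '\\x' . $h, str_split(bin2hex($s), 2)));
}

// Hypothetical helper: true if every byte of $s is in the \x80-\xff range.
function all_bytes_high(string $s): bool {
    foreach (str_split($s) as $byte) {
        if (ord($byte) < 0x80) {
            return false;
        }
    }
    return true;
}

// "\u{...}" (PHP 7+) produces the UTF-8 encoding of a code point;
// a UTF-8 terminal is assumed so that the characters display correctly.
foreach (["A", "\u{014C}", "\u{014D}"] as $ch) {
    echo $ch, ' => ', utf8_bytes($ch), all_bytes_high($ch) ? ' (all bytes in \x80-\xff)' : '', PHP_EOL;
}
// A => \x41
// Ō => \xc5\x8c (all bytes in \x80-\xff)
// ō => \xc5\x8d (all bytes in \x80-\xff)
```

ASCII characters stay single bytes, while both bytes of Ō and ō fall in \x80-\xff, which is why the \\x80-\\xff range on the UploadVariables page covers UTF-8 characters.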
> I'm trying to do this with no effect
>
> $UploadNameChars = "-\\w. !=\\+#\\x{014C}\\x{014D}"; # allow exclamations, equals, plus, and hash Ōō
It is strongly recommended NOT to enable exclamation marks, equals
signs, plus signs or hashes, because these characters have special
meanings in URL addresses and in PmWiki.
The exclamation mark is a stop mark for a link, a hash signifies an
internal anchor or an Ajax subpage, a plus sign is the standard encoding
of a space, and an equals sign starts the value of a URL parameter.
If you do enable these, many other things can and will break, and we
currently cannot support such configurations.
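As a quick illustration of the URL side of this, standard PHP decoding and parsing functions turn a plus sign into a space and cut the path at a hash. The wiki URL below is a made-up example, not a real PmWiki upload address.

```php
<?php
// In form/query-string encoding, '+' stands for a space,
// so decoding a name that contains '+' changes it.
echo urldecode('Photo+1.jpg'), PHP_EOL;            // Photo 1.jpg

// A '#' starts the URL fragment, so everything after it
// is no longer part of the path that reaches the server.
$url = 'https://wiki.example.org/uploads/Main/Photo#1.jpg';
echo parse_url($url, PHP_URL_PATH), PHP_EOL;       // /uploads/Main/Photo
echo parse_url($url, PHP_URL_FRAGMENT), PHP_EOL;   // 1.jpg
```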
In this byte-based pattern there is no such thing as \x{014C}: in the
UTF-8 encoding, Ō (U+014C) is the 2 bytes \xc5 and \x8c, which you would
write in your range as \\xc5\\x8c. The small letter ō is \\xc5\\x8d, so
the range would look like \\xc5\\x8c\\x8d (no need to repeat \\xc5). If
the wiki is not in UTF-8, it depends on whether the current code page
contains these characters; for example, the ISO-8859-4 code page
contains Ōō as the single bytes \xd2 and \xf2:
https://en.wikipedia.org/wiki/ISO/IEC_8859-4
so if your wiki is in ISO-8859-4, you could add \\xd2\\xf2 to the range.
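To see what such a range does to a file name, here is a hedged standalone sketch. strip_upload_name() is a hypothetical helper that, in the spirit of the "/[^$UploadNameChars]/" pattern PmWiki applies to upload names (the exact internals may differ between versions), deletes every byte outside the allowed set. Without the /u modifier PCRE works on single bytes, so \xc5, \x8c and \x8d are each allowed individually, not only as the pairs that make up Ō and ō.

```php
<?php
// Hypothetical helper: delete every byte that is not in the allowed set.
// No /u modifier, so the character class is matched byte by byte.
function strip_upload_name(string $name, string $chars): string {
    return preg_replace("/[^$chars]/", '', $name);
}

// The same double-quoted string as it would be written in config.php:
// dash, word characters, dot, space, and the UTF-8 bytes of Ō and ō.
$UploadNameChars = "-\\w. \\xc5\\x8c\\x8d";

// Ō (\xc5\x8c) and ō (\xc5\x8d) survive unchanged.
echo strip_upload_name("\u{014C}saka t\u{014D}.jpg", $UploadNameChars), PHP_EOL;
// Ä is \xc3\x84; neither byte is allowed, so it is removed, giving "pfel.jpg".
echo strip_upload_name("\u{00C4}pfel.jpg", $UploadNameChars), PHP_EOL;
```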
Enabling this could be as easy as adding to config.php:
$Charset = "ISO-8859-4";
but your local configuration files, if they contain the international
characters, need to be saved in the same encoding, see:
https://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding
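Put together, a config.php fragment for an ISO-8859-4 wiki might look like the following untested sketch, which assumes uploads are already enabled. Writing the bytes as \\xd2\\xf2 escapes keeps config.php itself pure ASCII, so the encoding the file is saved in does not matter for this part.

```php
<?php
// Sketch for local/config.php, assuming the wiki runs in ISO-8859-4.
$Charset = "ISO-8859-4";

// In ISO-8859-4, Ō is the single byte \xd2 and ō is \xf2; writing them as
// escapes keeps this file ASCII-only.
$UploadNameChars = "-\\w. \\xd2\\xf2";
```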
If the international characters are not in the code page of the wiki,
they cannot be enabled, because browsers cannot post such file names
correctly. These 2 characters are not in the Latin-1/ISO-8859-1 code
page.
If this is a vital requirement for file names, you may try enabling
UTF-8 for your wiki; browsers will then be able to post both files and
pages (wikitext, pagenames, categories) with the international
characters without transforming them into HTML entities.
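For reference, a minimal sketch of the UTF-8 route, assuming a PmWiki version that ships scripts/xlpage-utf-8.php (the PmWiki UTF-8 documentation for your version is the authority here). The include switches the wiki to UTF-8, and the \\x80-\\xff range from the UploadVariables page then admits every byte of any multi-byte character.

```php
<?php
// Sketch for local/config.php, placed early, before other customizations.
include_once("scripts/xlpage-utf-8.php");  // switch the wiki to UTF-8

// Allow all bytes \x80-\xff, i.e. the bytes of every UTF-8 multi-byte character.
$UploadNameChars = "-\\w. \\x80-\\xff";
```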
However, moving a wiki to UTF-8 is not easy if you already have uploaded
files or page names containing international characters, and you may run
into difficulties if the file system of the server does not use Unicode.
Alternatively, you could try an 8-bit encoding that does contain these
characters, but again, if it differs from the encoding of your file
system, a file or FTP browser may not show the correct characters, and a
file uploaded via FTP with such characters in its name may not be
visible on the wiki.
If having these characters in the file names on the server is not
essential, but you are annoyed when people upload files that appear with
broken names, I can suggest a custom $MakeUploadNamePatterns array that
replaces Ōō with Oo in the file name (not the text inside the file) when
a file is uploaded. Enabling this will probably break existing links in
the wiki to already uploaded files with these characters, and those
files may need to be renamed.
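The exact array was not posted in this message, so the following is only a hedged standalone sketch of the idea: an ordered pattern => replacement table, applied in order, where the transliteration entries come first so that Ō and ō become O and o before disallowed bytes are stripped. make_upload_name() and $patterns are hypothetical names; in PmWiki itself such entries would go into $MakeUploadNamePatterns, whose default contents and ordering may vary between versions, so upload.php is worth checking before relying on this.

```php
<?php
// Hypothetical sketch (not PmWiki's shipped configuration): an ordered
// pattern => replacement table, assuming a UTF-8 wiki.
$UploadNameChars = "-\\w. ";

$patterns = [
    '/\\xc5\\x8c/' => 'O',          // UTF-8 bytes of Ō -> O
    '/\\xc5\\x8d/' => 'o',          // UTF-8 bytes of ō -> o
    "/[^$UploadNameChars]/" => '',  // then drop every remaining disallowed byte
];

// Hypothetical helper: preg_replace() with array arguments applies each
// pattern in turn to the result of the previous one.
function make_upload_name(string $name, array $patterns): string {
    return preg_replace(array_keys($patterns), array_values($patterns), $name);
}

echo make_upload_name("\u{014C}saka T\u{014D}ky\u{014D}.jpg", $patterns), PHP_EOL; // Osaka Tokyo.jpg
echo make_upload_name("\u{00C4}pfel.jpg", $patterns), PHP_EOL;                      // pfel.jpg
```

Order matters here: if the stripping pattern ran first, the bytes of Ō and ō would already be gone and nothing would be left to transliterate.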
There is no easy solution unfortunately.
Petko