[pmwiki-users] convert page from one charset to an other

The Editor editor at fast.st
Fri May 23 07:29:25 CDT 2008


>  - most pages become faulty - how can one convert all of them,
> without disturbing the special PmWiki format (script on the server?)
>
> I have to do the last step (converting pages), and don't know how
>
> thanks
> jdd


I just changed the BoltWire wiki engine to use UTF-8 and included a
converter script with the last (beta) release.  I'm still not fully
clear on all the ins and outs of how these charsets work, but you
might give this a try and see whether it does what you want.  Watch
out for line breaks in the email.

Instructions:
1.  Create a folder called 'pages' in your field directory and insert
a copy of all your local pages into this folder.
2.  Paste the script below into a file named something like
utf8_fix.php and place it in the directory that contains the pages
folder you just created. It should sit side by side with your
index.php file.
3.  Call the script in your browser and it will plow through the
pages folder and its immediate subfolders, and convert each file from
ISO-8859-1 to UTF-8 (reporting on each one, and when completed). If
you have a lot of pages you might need to increase the timeout and/or
disallow user aborts.
4.  Then rename your old wiki.d folder to something like wiki.d.backup
and rename "pages" to wiki.d.  This will allow you to restore the
original pages if you notice any problems. From what I understand,
there may be situations where it is not possible to fully reverse this
conversion.
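
As a rough sketch of the timeout adjustment mentioned in step 3, lines
like these could go at the top of the script. ignore_user_abort() and
set_time_limit() are standard PHP functions, but whether your host
lets them take effect is an assumption, and the web server may enforce
its own timeout on top of PHP's.

```php
<?php
// Sketch only: settings for a long-running, browser-triggered script.

// Keep running even if the browser disconnects mid-conversion.
ignore_user_abort(true);

// Lift PHP's execution time limit for this request (0 means no limit).
set_time_limit(0);
```

With a very large wiki.d it may be safer to run the script from the
command line instead, where the browser connection is not a factor.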

Again I want to emphasize I'm not an expert on charset encodings and
I'm not even sure how to tell whether or not this script is doing
anything. All seems fine on BoltWire, but my site is entirely ASCII,
which is the same in both encodings, so I can't tell. Anyway, proceed
with caution. Any feedback/corrections from those who are more
knowledgeable in this area are welcome.  I should also note that this
script does not modify page names in any way. Stripping special
characters out of those could easily be incorporated with just a
couple of extra lines of code. But it sounds as if you have taken care
of that already, manually.
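
One way to tell whether the script has anything to do is to classify
each file before and after running it. The following is a minimal
sketch, not part of the original script; the function name
classifyEncoding is made up for illustration. It can only say that
text is not valid UTF-8, so treating such text as ISO-8859-1 remains
an assumption about how your pages were saved.

```php
<?php
// Hypothetical helper: classify a byte string as 'ascii', 'utf-8',
// or 'not utf-8'.
function classifyEncoding(string $bytes): string
{
    // No byte at or above 0x80 means plain ASCII, which is identical
    // in ISO-8859-1 and UTF-8, so conversion would not change it.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ascii';
    }
    // With the /u modifier PCRE validates the whole subject as UTF-8;
    // preg_match() returns false when that validation fails.
    if (preg_match('//u', $bytes) === 1) {
        return 'utf-8';
    }
    return 'not utf-8';
}

// Usage sketch (the path is illustrative only):
// echo classifyEncoding(file_get_contents('pages/Main.HomePage')), "\n";
```

Files reported as 'not utf-8' before the run and 'utf-8' afterwards
are the ones the conversion actually touched; 'ascii' files should
come out unchanged.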

Hope this helps someone.

Cheers,
Dan


<?php

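// Folder holding the copies of the pages to convert (no trailing slash,
// because a slash is added when the paths are built below).
$pagesDir = 'pages';
$check = BOLTlistpages('', $pagesDir, '');
foreach ((array) $check as $file) BOLTfix("$pagesDir/$file");
print_r("File system cleaning has been completed");
exit;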

// Return a sorted list of the files in $mydir and its immediate
// subfolders, as paths relative to $mydir. $pat is an optional regex
// that file names must match.
function BOLTlistpages($pat=NULL, $mydir='pages', $folder='') {
	$list = array();
	if ($folder != '') $mydir = $mydir . "/$folder";
	$skip = array('.', '..', '.htaccess');
	if (! ($handle = opendir($mydir))) return $list;
	while (false !== ($file = readdir($handle))) {
		if (in_array($file, $skip)) continue;
		if (is_dir("$mydir/$file")) {
			if (! ($handle2 = opendir("$mydir/$file"))) continue;
			while (false !== ($file2 = readdir($handle2))) {
				if (in_array($file2, $skip)) continue;
				// Deeper folders are not scanned; skip them.
				if (is_dir("$mydir/$file/$file2")) continue;
				if (($pat != NULL) && (! preg_match($pat, $file2))) continue;
				$list[] = "$file/$file2";
				}
			closedir($handle2);
			}
		else {
			if (($pat != NULL) && (! preg_match($pat, $file))) continue;
			$list[] = $file;
			}
		}
	closedir($handle);
	sort($list);
	return array_values(array_unique($list));
	}
	
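// Convert one file from ISO-8859-1 to UTF-8 in place, unless it is
// already valid UTF-8 (plain ASCII counts as valid and is left alone).
function BOLTfix($location) {
	$text = file_get_contents($location);
	if ($text === false) {
		print_r("unreadable: $location<br>");
		return;
		}
	// Only convert text that is NOT yet valid UTF-8; encoding valid
	// UTF-8 a second time would garble every non-ASCII character.
	if (! BOLTutf8_check($text)) {
		// utf8_encode() assumes ISO-8859-1 input; deprecated as of PHP 8.2.
		file_put_contents($location, utf8_encode($text));
		print_r("fixed: $location<br>");
		}
	else print_r("ok: $location <br>");
	}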

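// Return true if $text is empty or valid UTF-8. With the /u modifier
// preg_match() fails (returns false) on input that is not valid UTF-8.
function BOLTutf8_check ($text) {
	if (strlen($text) == 0) return true;
	return (preg_match('/^.{1}/us', $text, $match) == 1);
	}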


