[pmwiki-users] Sync of local pmwiki with remote pmwiki

Sat Jul 23 06:18:11 CDT 2005

Patrick R. Michaud wrote:
> At any rate, PmWiki can already handle the merge.  I just need
> the invocation details -- i.e., how does someone initiate the
> synchronization?

Some years ago I spent a few weeks analysing the possible interactions 
of part-time offline wikis, for a different wiki. For whatever that 
may be worth to PmWiki, here are the results (they answer more than 
the question above).

One of the first things that I wanted was a decentralised approach: if 
two people meet at a conference, they should be able to hook up 
their notebooks in a LAN, work on their respective local wikis (which 
would synchronise with each other), and later connect with the 
central server - which, after all, wouldn't be so central anymore; it 
would be just one of a multitude of peers that connect and disconnect.
I liked that particular scenario since it would also have the usual 
advantages of a decentralised system: a more distributed workload and 
better redundancy in case of server failures.
However, that scenario seemed exceedingly complicated until I 
observed that merging does the following:

1) Determine a common root where two branches started to diverge. (The 
central idea here is that it's irrelevant whether the root is the "real 
server root" or not - we just consolidate as far as we can locally, 
without asking the superserver.)
2) Compare the diffs from the common root to the endpoints of the two 
branches.
3) Merge these diffs into a new version.
4) In case of conflicts, simply place both versions in the resultant 
page and mark the conflict as such. Optionally, notify the author of the 
latest change by email, implicitly burdening him with the task of 
resolving the conflict (that encourages people to finish and feed back 
their changes as quickly as possible, reducing the overall presence of 
conflicts).
The conflict could be marked up with something like
   (:original <date/time> [<author>]:)
     original markup goes here
   (:changed <date/time> [<author>]:)
     conflicting version 1 goes here
   (:changed <date/time> [<author>]:)
     conflicting version 2 goes here
   (:end:)
The keywords obviously need improvement. Also, the above is written as 
if it were block markup, while it should really be inline: we don't 
want to repeat the entire paragraph just to highlight the differences in 
the changes to a single sentence. It's bad enough to have to read whole 
sentences to see changed words - though I don't think that the diff 
should be word-based either: the resulting changesets are usually too 
fine-grained to make much sense to a human.
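
As a rough illustration of steps 1) to 4), here is a minimal Python 
sketch. It is not PmWiki code: the names common_root and merge3, the 
parents mapping (revision id -> parent id) and the label arguments are 
all made up for this example. It also merges whole lines, which is 
coarser than the inline marking wished for above. Conflicts are wrapped 
in the placeholder keywords from the markup proposal.

```python
from difflib import SequenceMatcher


def common_root(parents, rev_a, rev_b):
    """Step 1: nearest revision that both branches descend from.

    `parents` is a hypothetical mapping of revision id -> parent id
    (None for the very first revision); each revision has one parent.
    """
    seen = set()
    while rev_a is not None:
        seen.add(rev_a)
        rev_a = parents.get(rev_a)
    while rev_b is not None:
        if rev_b in seen:
            return rev_b
        rev_b = parents.get(rev_b)
    return None


def _unchanged(base, other):
    """Step 2: map base line index -> index in `other` for lines the diff keeps."""
    kept = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            kept[block.a + k] = block.b + k
    return kept


def merge3(base, a, b, label_a="version 1", label_b="version 2"):
    """Steps 3 and 4: merge two line lists that diverged from `base`.

    Returns (merged_lines, number_of_conflicts).
    """
    kept_a, kept_b = _unchanged(base, a), _unchanged(base, b)
    # Base lines untouched by both sides split the texts into aligned chunks.
    anchors = [i for i in range(len(base)) if i in kept_a and i in kept_b]
    merged, conflicts = [], 0
    pos_base = pos_a = pos_b = 0
    for anchor in anchors + [None]:          # None = end of all three texts
        if anchor is None:
            end_base, end_a, end_b = len(base), len(a), len(b)
        else:
            end_base, end_a, end_b = anchor, kept_a[anchor], kept_b[anchor]
        old = base[pos_base:end_base]
        new_a, new_b = a[pos_a:end_a], b[pos_b:end_b]
        if new_a == new_b:                   # same change (or none) on both sides
            merged += new_a
        elif new_a == old:                   # only b changed this chunk
            merged += new_b
        elif new_b == old:                   # only a changed this chunk
            merged += new_a
        else:                                # both changed it differently
            conflicts += 1
            merged += ["(:original:)", *old,
                       f"(:changed {label_a}:)", *new_a,
                       f"(:changed {label_b}:)", *new_b,
                       "(:end:)"]
        if anchor is not None:
            merged.append(base[anchor])
            pos_base, pos_a, pos_b = end_base + 1, end_a + 1, end_b + 1
    return merged, conflicts
```

For example, merge3(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "c", "d"]) 
takes the changed second line from one side and the appended line from 
the other, and reports no conflict. Only when both sides change the 
same chunk differently does the conflict markup appear.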

Synchronisation happens when two wiki servers "meet". How they meet 
can vary quite a bit. For example:
* A road warrior may instruct his wiki to prepare a list of changes and 
send them by mail to a wiki of his choice.
* Road warriors may ask a wiki to send them updates on a regular basis 
(maybe only for particularly interesting pages).
* A wiki administrator may want to scan the LAN for wikis and have them 
autoconnect. (Not sure whether that's technically feasible.)
* A wiki admin could set up a wiki cluster: machines with well-known 
names and PmWiki URLs. The wikis in the cluster could then exchange 
data. Since the merge algorithm detailed above is robust against 
failures (simply re-run it later), the exact timing of updates can be 
tailored to what works best (immediately after each edit, hourly, or 
whatever).
* There might be other scenarios :-)
In other words, how synchronisation is initiated depends on usage - it's 
probably best to make that configurable, say by providing one function 
that prepares a changeset and another function that merges a changeset 
into the wiki pages.
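
The two functions could look roughly like this. A sketch only: 
Revision, prepare_changeset and apply_changeset are invented names, not 
existing PmWiki functions, and it assumes that every revision carries a 
globally unique id, so that receiving the same changeset twice is 
harmless (which is what makes "simply re-run it later" work). How the 
changeset travels (mail, HTTP, LAN) is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: str             # assumed globally unique, e.g. "<wiki name>-<counter>"
    parent: Optional[str]   # id of the revision this one was edited from
    page: str
    text: str


def prepare_changeset(store: Dict[str, Revision],
                      known_to_peer: Iterable[str]) -> List[Revision]:
    """Collect every revision the peer has not seen yet."""
    known = set(known_to_peer)
    return [rev for rev in store.values() if rev.rev_id not in known]


def apply_changeset(store: Dict[str, Revision],
                    changeset: Iterable[Revision]) -> int:
    """Add unseen revisions to `store`; return how many were new.

    Revisions already present are skipped, so a changeset can be
    applied again after a failed or interrupted run.
    """
    added = 0
    for rev in changeset:
        if rev.rev_id not in store:
            store[rev.rev_id] = rev
            added += 1
    return added
```

This only moves revisions between wikis. Pages whose newest revisions 
have diverged would still have to go through the merge steps 1) to 4) 
afterwards.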

One aspect that I haven't fully explored is the interaction with 
authorisation. Wiki sites probably need to authenticate to each other, 
and a wiki admin will probably want to decide which other wikis they 
trust well enough to accept their changes. Probably it's necessary to store 
the credentials that the original author provided, so that each site can 
double-check that the changes are reflected, too.

This opens up another issue: distributing changes in permission-related 
attributes. Probably that cannot be decentralised.

Regards,
Jo