[Pmwiki-users] Re: Re: Categories instead of hierarchies?

Fri Oct 29 08:21:11 CDT 2004

On Friday 29 October 2004 11:02, you wrote:
> On Fri, Oct 29, 2004 at 10:19:02AM +0530, mistyfire wrote:
> > Hello,
> > As the complexity and size of wiki grows a "multi-category page" seems a
> > lot more logical than hierarchies.
> > I have recently joined the mailing list and was wondering if saving the
> > native wiki pages in XML format would be a good idea.
> > [...]
>
> Interesting suggestion.  PmWiki 2 does allow changing the page storage
> mechanism being used, so developing a "save page in XML format" item
> would not be a big issue.
>
I understand the complexity involved in enabling the full features of XML.
My primary concern at present is just to let the native files be saved in XML
format. PmWiki currently uses the following format (along with some other
fields that I might have overlooked):
-------------------------------
version=pmwiki-1.0.11
newline=?
text=
time=1098279383
diff:1098279383:1098279383:=
author=
author:1098279383=
host:1098279383=127.0.0.1
name=Main.SideBar
host=127.0.0.1
agent=
rev=1
-------------------------------
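As a rough illustration of how simple the current layout is to read, here is a minimal Python sketch (not PmWiki's own code; the function name parse_page_file is made up for this example). It assumes every field sits on its own line and that the key ends at the first "=".

```python
def parse_page_file(raw: str) -> dict:
    """Split a key=value page file into a dict (hypothetical helper).

    Each line is split at the first '=' only, so keys such as
    'host:1098279383' keep their timestamp suffix.
    """
    fields = {}
    for line in raw.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


# The sample page file shown above, copied as-is.
sample = """version=pmwiki-1.0.11
newline=?
text=
time=1098279383
diff:1098279383:1098279383:=
author=
author:1098279383=
host:1098279383=127.0.0.1
name=Main.SideBar
host=127.0.0.1
agent=
rev=1"""

page = parse_page_file(sample)
print(page["name"], page["rev"])
```

Splitting at the first "=" only matters because the page text itself may contain "=" characters.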
Suggested conversion, with no additional advanced XML (XML/XSLT) or search
features/functions added to the present code:
-------------------------------
<version>pmwiki-1.0.11</version>
<newline id=""></newline>
<text>the text on this page....</text>
<time>1098279383</time>
<author ID="1098279383"></author>
<name>Main.SideBar</name>
<host>127.0.0.1</host>
<agent>Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040803 
Firefox/0.9.3</agent>
<rev>1</rev>
<diff past="1098090202" current="1098089007" diffid="2">
  <content>.....diff produced text...</content>
  <author ID="1098090202"></author>
  <host ID="1098090202">127.0.0.1</host>
</diff>
<diff past="1098090334" current="1098090202" diffid="1">
  <content>.....diff produced text...</content>
  <author ID="1098090334"></author>
  <host ID="1098090334">127.0.0.1</host>
</diff>
-------------------------------
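The conversion itself could be sketched roughly as below. This is only an illustration in Python under my own assumptions: the function name page_to_tagged is hypothetical, timestamped keys such as "author:1098279383" are written with an ID attribute, and the nested diff layout is left out for brevity. Values are escaped so that page text containing "<" or "&" does not break the tags.

```python
from xml.sax.saxutils import escape, quoteattr


def page_to_tagged(fields: dict) -> str:
    """Render key=value fields in the tagged layout (hypothetical helper)."""
    lines = []
    for key, value in fields.items():
        parts = key.split(":")
        tag = parts[0]
        if tag == "diff":
            continue  # the nested <diff> layout is omitted in this sketch
        if len(parts) > 1:
            # 'author:1098279383' becomes <author ID="1098279383">
            lines.append(f"<{tag} ID={quoteattr(parts[1])}>{escape(value)}</{tag}>")
        else:
            lines.append(f"<{tag}>{escape(value)}</{tag}>")
    return "\n".join(lines)


fields = {
    "version": "pmwiki-1.0.11",
    "text": "a < b",
    "name": "Main.SideBar",
    "author:1098279383": "",
    "diff:1098279383:1098279383:": "",
}
tagged = page_to_tagged(fields)
print(tagged)
```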
For now, the stored file would not be parsed using "XML parsing functions"
but just read as normal text, except that tags like <text></text> would
replace simple keys like "text=".
-------------------------------
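Reading such a file as normal text could look something like the following Python sketch. The helper name read_simple_tag is made up, and it assumes each simple field appears once, without attributes.

```python
import re


def read_simple_tag(raw: str, tag: str):
    """Return the text between <tag> and </tag>, or None if absent.

    Plain pattern matching only; no XML parser is involved.
    """
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", raw, re.DOTALL)
    return match.group(1) if match else None


stored = (
    "<version>pmwiki-1.0.11</version>\n"
    "<text>the text on this page....</text>\n"
    "<rev>1</rev>"
)
print(read_simple_tag(stored, "text"))
```

One caveat with this approach: if the page text itself contains "</text>", plain matching stops too early, so the text would have to be escaped when it is saved.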
> What is a non-trivial issue is maintaining the integrity of the
> many cross-referencing links that this proposal introduces.
> For example, you propose having every wikipage contain a set of
> <category> elements for each category the page belongs to, 
I guess category entries in a wiki page are already implemented; the only
change is the way they would be stored in the native files in the wiki.d
directory. E.g., if an author enters the following in a wiki page:

This movie not only gives the viewers..........
(:category XYZ,2004,Horror:)

It would be stored in the wiki.d directory as:
<text>
This movie not only gives the viewers..........
<belongsTo>
 <category>XYZ</category>
 <category>2004</category>
 <category>Horror</category>
</belongsTo>
</text>
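A minimal Python sketch of that step, assuming the directive is written exactly as in my example, "(:category A,B,C:)". The names extract_categories and belongs_to_element are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Matches the example directive, e.g. (:category XYZ,2004,Horror:)
CATEGORY_RE = re.compile(r"\(:category\s+(.*?):\)")


def extract_categories(text: str) -> list:
    """Collect category names from every directive, without duplicates."""
    categories = []
    for match in CATEGORY_RE.finditer(text):
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in categories:
                categories.append(name)
    return categories


def belongs_to_element(categories) -> str:
    """Build the <belongsTo> block in the layout shown above."""
    lines = ["<belongsTo>"]
    lines += [f" <category>{escape(c)}</category>" for c in categories]
    lines.append("</belongsTo>")
    return "\n".join(lines)


source = "This movie not only gives the viewers..........\n(:category XYZ,2004,Horror:)"
cats = extract_categories(source)
print(belongs_to_element(cats))
```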
> and then each category page (or a single common category page) has a set of
> <wikipage> elements for each page in each category.  Maintaining the
> referential integrity of these elements is certainly doable, but not
> trivial--adding or removing a page from one or more categories requires
> correctly rewriting the category page(s).  
Thanks for pointing out the integrity issue. This simple change should be
able to resolve it:
1. A category entry in a wiki page should automatically be entered in a
"Category-Page" ("CategoryPage.a.xml" - one page for each directory under
wiki.d). The format could be:
<list>
  <wikipage name="wikipage1">
    <category></category>
    <category></category>
  </wikipage>
  <wikipage name="wikipage2">
    <category></category>
    <category></category>
  </wikipage>
</list>
----------------------------------
2. A single CategoryList page (CategoryList.xml) should be maintained with a
list of all categories. The format could be:
<list>
<category></category>
<category></category>
<category></category>
</list>
----------------------------------
When saving a wiki page with a category entry, e.g.:
(:category XYZ,2004,Horror:)

1. Look up each category in CategoryList.xml.
   If it exists, do nothing; otherwise make an entry.
2. Pick the right CategoryPage.(a).xml page based on the first character of
   the wiki-page name.
   (i) Look for the entry for this wiki-page name.
       If it exists, delete all entries "within it" and make fresh entries
       based on the current page. Otherwise make a new entry.
----------------------------------
Any Entry once made in CategoryList.xml should not be removed.
----------------------------------
Adding or deleting an entry in any wiki page, e.g. changing
(:category XYZ,2004,Horror:)
to
(:category XYZ,2004,Comedy:)

should not jeopardize referential integrity, because all entries under that
wiki-page name in CategoryPage are refreshed each time the page is saved.
----------------------------------
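The save steps and the Horror-to-Comedy edit above can be sketched in Python as follows. This only models the bookkeeping in memory: category_list stands in for CategoryList.xml and category_pages for the per-letter CategoryPage files. Both names, the function save_categories, and the lower-casing of the first character are my own assumptions.

```python
def save_categories(page_name, categories, category_list, category_pages):
    """Update both indexes when a wiki page is saved (hypothetical helper)."""
    # Step 1: register unseen categories. Existing entries are never removed.
    for cat in categories:
        if cat not in category_list:
            category_list.append(cat)
    # Step 2: pick the index by the first character of the page name.
    index = category_pages.setdefault(page_name[0].lower(), {})
    # Step 2(i): assigning replaces any old entries for this page, which
    # covers both the "exists" and the "new entry" case.
    index[page_name] = list(categories)


category_list = []    # stands in for CategoryList.xml
category_pages = {}   # stands in for CategoryPage.(a).xml, keyed by letter

save_categories("MovieReview", ["XYZ", "2004", "Horror"],
                category_list, category_pages)
# The author later edits the directive from Horror to Comedy and saves again.
save_categories("MovieReview", ["XYZ", "2004", "Comedy"],
                category_list, category_pages)

print(category_list)
print(category_pages)
```

After the second save, "Horror" is still in the category list (entries there are never removed), while the page's own entry lists only its current categories.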
> Also, many PmWiki administrators 
> take advantage of filesystem operations (copy, rename, delete) to perform
> various page maintenance tasks--indeed, the ability to do this is one of
> the reasons PmWiki uses a flat file scheme--but these operations would no
> longer be available to administrators because they would result in
> internally inconsistent pages unless there's some way of rebuilding the
> associated indexes (more code overhead).
I think creating the two files, CategoryList.xml and (one or more)
CategoryPage.xml, should solve it.
There are no other separate index files to be maintained.
If these files get deleted or fresh ones need to be created, any fresh
save of a wiki page would create an entry.
A simple administrative script can be written which picks up one page file at
a time (read-only), checks for its entries in those two files, and updates
them. The code used during the normal save operation could be reused, except
that the wiki pages are opened read-only and not saved; only the
corresponding entries are made in the two category files.
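Such an administrative script might look roughly like this Python sketch. It assumes one file per page in a single directory, uses the first character of the file name to choose the index, and returns the two indexes as in-memory structures instead of writing the XML files. The name rebuild_indexes is hypothetical.

```python
import re
import tempfile
from pathlib import Path

CATEGORY_RE = re.compile(r"\(:category\s+(.*?):\)")


def rebuild_indexes(wiki_dir):
    """Scan page files read-only and rebuild both category indexes."""
    category_list = []
    category_pages = {}
    for path in sorted(Path(wiki_dir).iterdir()):
        if not path.is_file():
            continue
        # Pages are only read here, never written back.
        text = path.read_text(encoding="utf-8", errors="replace")
        cats = []
        for match in CATEGORY_RE.finditer(text):
            for name in match.group(1).split(","):
                name = name.strip()
                if name and name not in cats:
                    cats.append(name)
        if not cats:
            continue  # pages (and index files) without a directive are skipped
        for cat in cats:
            if cat not in category_list:
                category_list.append(cat)
        category_pages.setdefault(path.name[0].lower(), {})[path.name] = cats
    return category_list, category_pages


# Demo on a throwaway directory with made-up page files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Main.Alien").write_text("text (:category XYZ,2004,Horror:)")
    Path(tmp, "Main.Funny").write_text("text (:category 2004,Comedy:)")
    Path(tmp, "Main.SideBar").write_text("no categories here")
    rebuilt_list, rebuilt_pages = rebuild_indexes(tmp)

print(rebuilt_list)
print(rebuilt_pages)
```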
>
> Thus, moving to XML really doesn't make the process of maintaining
> category relationships any easier--the same operations still have to
> be performed by the PmWiki code (i.e., determining the applicable
> categories and links from the markup, storing them as XML elements,
> maintaining internal reference consistency among files).  
Since this two-XML-file system should solve the internal reference
consistency problem, moving to XML format could be helpful if it is
implemented at the development stage.
> But yes, 
> XML could potentially improve the ability to search and query for pages
> matching certain criteria.  
This capability could be implemented at any later stage as an advanced
feature.
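To give an idea of what such a query could look like once the category file is real XML, here is a small sketch. It is written in Python with the standard xml.etree.ElementTree module purely for illustration (it says nothing about what PHP offers), and the sample data and the function name pages_in_category are made up.

```python
import xml.etree.ElementTree as ET

# Made-up CategoryPage content in the format proposed earlier.
category_page = """<list>
  <wikipage name="wikipage1">
    <category>XYZ</category>
    <category>Horror</category>
  </wikipage>
  <wikipage name="wikipage2">
    <category>XYZ</category>
    <category>Comedy</category>
  </wikipage>
</list>"""


def pages_in_category(xml_text: str, category: str) -> list:
    """Return the names of all pages listed under the given category."""
    root = ET.fromstring(xml_text)
    return [
        page.get("name")
        for page in root.findall("wikipage")
        if any(c.text == category for c in page.findall("category"))
    ]


print(pages_in_category(category_page, "XYZ"))
```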
>
> Of course, this assumes that PHP already 
> has the built-in tools to do XML-based search/queries for us (my initial
> reading of the documentation is that it doesn't, but I could be wrong
> there).
If I am not mistaken, XML support has been added to PHP as of version 5.0.0
Beta 1 (29-Jun-2003).
>
> Beyond that, I'd be somewhat concerned about memory overheads associated
> with processing pages in XML -- many PHP installations run in a 8 megabyte
> sandbox, and it's possible that XML-storage related overheads could
> cause certain pages or situations to bump up against that limit.
For now, just storage in XML format can be implemented, with further
capabilities added as an "addon or cookbook" or in future releases.
>
> Given PHP's minimal support for XML operations (e.g., search/query), I
> don't see that XML-based storage gains a whole lot over PmWiki's
> present storage mechanism at this time, 
Storing pages in XML would lessen the burden in further releases when
advanced features are added (especially for storing complex analytical
research papers).
> except for possibly being able 
> to apply XSLT transformations in certain situations.  And XML/XSLT is
> probably *way* outside of the technical capabilities of the audiences
> that PmWiki targets.
>
> Thanks for the excellent suggestion, however -- I'll keep it in mind
> and keep watch to see how support for XML/XSLT improves in PHP over time.
> There's nothing that prevents us from adopting this approach in the future.
>
> Pm