[pmwiki-users] Hiearchical Groups Proposal.
Joachim Durchholz
jo at durchholz.org
Wed Oct 18 11:29:47 CDT 2006
Patrick R. Michaud wrote:
> On Wed, Oct 18, 2006 at 04:35:33PM +0200, Joachim Durchholz wrote:
>> The Editor wrote:
>>> Jo said:
>>>> Ideally, metadata would be pages, possibly with a special marker in the
>>>> page name so PmWiki knows that they don't get meta-metadata.
>>> Separate pages for groupheaders and footers (which should NOT be
>>> inheritable to subgroups). But no to passwords, etc, which should be
>>> inheritable (ie: store in the page).
>>
>> Say, if the current page is named foo.bar.baz. The relevant attribute
>> pages might be named foo.bar.baz.%Header, foo.bar.baz.%Footer, and
>> foo.bar.baz.%passwords. If these don't exist, PmWiki would have to look
>> in foo.bar.%Header, foo.bar.%Footer, and foo.bar.%passwords, then
>> foo.%Header, foo.%Footer, and foo.%passwords, and finally %Header,
>> %Footer, and %passwords.
>
> Actually, at least in the case of passwords, PmWiki would practically
> need to scan the entire path to the root and not just stop at the
> first one that is found. Some passwords can be blank or otherwise
> inherited from a parent page.
Scanning for foo.%* takes roughly the same effort as scanning for
foo.%passwords.
So if PmWiki needs to go further up the hierarchy, it should scan for
foo.%* anyway, and cache the results in case any additional
metainformation at that level is needed.
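To make the lookup concrete, here is a rough sketch. It is in Python rather than PmWiki's PHP, purely for illustration, and every name in it (`PAGES`, `scan_level`, `levels`, `first_found`, `all_found`) is made up for this mail; none of it is existing PmWiki code. It assumes the "%" naming scheme discussed above. Each level is scanned once and cached, so a header lookup can stop at the first hit while a password lookup can walk all the way to the root.

```python
from functools import lru_cache

# Hypothetical page store (name -> content); stands in for the wiki directory.
PAGES = {
    "%Header": "site header",
    "%passwords": "read=",
    "foo.%passwords": "edit=secret",
    "foo.bar.%Footer": "bar footer",
    "foo.bar.baz": "page text",
}


@lru_cache(maxsize=None)
def scan_level(prefix):
    """Collect every metadata page at one level (the foo.%* scan), cached."""
    lead = prefix + ".%" if prefix else "%"
    return {name[len(lead):]: text
            for name, text in PAGES.items() if name.startswith(lead)}


def levels(page):
    """'foo.bar.baz' -> ['foo.bar.baz', 'foo.bar', 'foo', ''] ('' is the root)."""
    parts = page.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), -1, -1)]


def first_found(page, kind):
    """Nearest metadata page of this kind, e.g. for headers and footers."""
    for prefix in levels(page):
        meta = scan_level(prefix)
        if kind in meta:
            return meta[kind]
    return None


def all_found(page, kind):
    """Every metadata page of this kind up to the root, e.g. for passwords."""
    return [(prefix, scan_level(prefix)[kind])
            for prefix in levels(page) if kind in scan_level(prefix)]
```

With this, `first_found("foo.bar.baz", "Footer")` stops at foo.bar, while `all_found("foo.bar.baz", "passwords")` returns the entries from foo and from the root, reusing the cached scans.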
>> E.g. PmWiki could store the actual page text in a file, and all the
>> other data (history and whatnot) in separate metadata files; that would
>> make the PmWiki files far easier to read and process using external
>> tools.
>
> On the other hand, it makes administrative overhead much worse,
> because one has to worry about keeping the files in sync.
> Or, if we're really speaking only of reading the pages (and not
> updating them), then I would vote to keep the markup text in
> the pagefile as it is now, for administrative reasons, but overload
> WritePage() such that it writes the text= attribute to separate
> .txt files for external tools to process.
I think we have a spectrum here.
1. Keep everything that relates to a page in a single file. GroupHeader
etc. could then go as attributes into the parent page.
2. As 1, but place a few often-needed things into their own files. The
current version of the page is a candidate. Others might be split out
from the "main file" as needed.
3. Split every kind of metadata out into a separate file.
Whatever the trade-offs dictate...
>> It would also help with page displays - the page read code would
>> be reduced to a simple file_get_contents(), instead of the current code
>> that has to identify the text, try to stop reading at end-of-text, and
>> convert line ends back to newline codes; this would simplify and
>> slightly speed up the page read code.
> [...] the file_get_contents() function is only available in
> PHP 4.3.0 or later (and thus far PmWiki still targets 4.1.x).
> Yes, this could be worked around by providing a custom
> file_get_contents() function for versions < 4.3.0, but it
> doesn't seem to offer enough benefit to be worth the trouble.
That's beside the point. Whether it's file_get_contents() or something
taken from Pear's Compat package, just reading a file is less work than
reading and decoding it.
There's an additional point, of course: decoding the text is a small
overhead compared to interpreting all the markup.
OTOH every bit that makes PmWiki faster would help :-)
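A small sketch of the two read paths being compared, again in Python only for illustration. The file layout is an assumption made for this example (one key=value attribute per line, with the text percent-encoded so it fits on a single line); it is a simplification and not PmWiki's exact page file format. The helper names are hypothetical.

```python
import os
import tempfile
from urllib.parse import quote, unquote


def read_plain(path):
    """Text stored alone in its own file: reading is a single call."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def write_encoded(path, text, attrs):
    """Assumed layout: key=value lines, text percent-encoded onto one line."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in attrs.items():
            f.write(f"{key}={value}\n")
        f.write("text=" + quote(text, safe="") + "\n")


def read_encoded(path):
    """Text stored as an attribute: find the line, then decode it."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.rstrip("\n").partition("=")
            if sep and key == "text":
                return unquote(value)
    return ""


def round_trip(text):
    """Store the same text both ways and read it back both ways."""
    with tempfile.TemporaryDirectory() as d:
        plain = os.path.join(d, "page.txt")
        combined = os.path.join(d, "page.attrs")
        with open(plain, "w", encoding="utf-8") as f:
            f.write(text)
        write_encoded(combined, text, {"version": "1", "author": "jo"})
        return read_plain(plain), read_encoded(combined)
```

Both paths return the same text; the second simply does more work per read, which is the whole of the point above.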
> [...] Displaying a page nearly always involves processing its
> metadata as well (if only to check permissions), so we aren't really
> avoiding the need to do some file parsing somewhere.
Right, I didn't see that.
> Also, for security reasons we have to encode/decode the contents of
> the markup text anyway,
Why is that?
> so the extra step of decoding the newlines isn't at all significant,
> as PmWiki is using PHP's urldecode() function for this (and it has to
> be done even if we don't decode the newlines).
Again: why is there a need to urldecode?
> And, on many systems, keeping the text in a separate file may
> actually slow things down a bit.
Granted. Opening two files instead of one will generate some overhead.
It may be best to keep just the data that's needed for a page view in
one file, and everything else in another one.
> For example, the slowness that
> I observe on the pmwiki.org server seems to be due to filesystem
> latency issues -- i.e., delays in being able to open files
> from whatever mass storage system my provider is using.
Latencies can be due to unindexed directories, I'd say.
If that's the case, you should probably move to a server that uses a
better file system ;-)
Regards,
Jo