[pmwiki-devel] Pagelist Caching
Patrick R. Michaud
pmichaud at pobox.com
Fri May 25 16:37:49 CDT 2007
On Fri, May 25, 2007 at 01:00:54PM -0700, Martin Fick wrote:
> You performed some tests with large pagelists to
> evaluate where most of the time is spent and concluded
> that a large portion is spent rendering the pagelist,
> but I am not sure that is a fair conclusion.
>
> Primarily, I think that you are considering A)
> pagelists that create large result sets but you may be
> ignoring B) the pagelists which must scan a large
> amount of pages but which only end up producing small
> pagelists!
I haven't ignored (B). It's just that for the examples recently
discussed on the mailing list, it was the size of the output set,
not the time needed to compute the pagelist, that was eating up
the majority of the time. But the pagelist caching algorithm in PmWiki
_definitely_ targets the (B) situation you describe.
Let's see an example where the pagelist cache is already
doing (B): http://www.pmwiki.org/wiki/Test/AuthList2 .
In this case, pagelist is looking for pages on the site
that have some sort of password associated with them.
Thus, in order to find these pages, it has to scan all
of the pages on the site (5731) to come up with the 80
pages that have a password of some sort on them.
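Case (B) is expensive because a filter like this can only be evaluated
by reading each page. Below is a minimal Python sketch of that full scan
(PmWiki itself is PHP; this is not its code). The Page type is
hypothetical, and treating any non-empty passwd* attribute as "has a
password" is an assumption made for illustration. The read counter plays
the role of readc in the traces below.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical in-memory stand-in for one stored wiki page."""
    name: str
    attrs: dict[str, str] = field(default_factory=dict)


def has_password(page: Page) -> bool:
    # Assumption: a page is "protected" if any passwd* attribute is non-empty.
    return any(k.startswith("passwd") and v for k, v in page.attrs.items())


def scan_for_passwords(pages: list[Page]) -> tuple[list[str], int]:
    """Full scan: every candidate page is read, whether or not it matches."""
    matches: list[str] = []
    readc = 0
    for page in pages:
        readc += 1
        if has_password(page):
            matches.append(page.name)
    return sorted(matches), readc


if __name__ == "__main__":
    # Synthetic site shaped like the example: 5731 pages, 80 of them protected.
    site = [Page(f"Main.Page{i:04d}", {"passwdread": "x"} if i < 80 else {})
            for i in range(5731)]
    found, readc = scan_for_passwords(site)
    print(len(found), readc)  # 80 5731
```

The cost grows with the number of pages scanned, not with the number of
matches, which is why a short result list can still take seconds to compute.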
We can see what happens during pagelist production via the
stopwatch output at the bottom of the page.
When computing the list from scratch, the system actively
scans all 5731 pages looking for those that have passwords.
Here's the relevant stopwatch trace -- I've added line numbers
to the stopwatch output to make it easier to reference them
in the description:
0: 00.00 00.00 config start
...
4: 00.09 00.09 FPLTemplate begin
5: 00.09 00.09 MakePageList pre
6: 00.09 00.09 PageListSources begin
7: 00.09 00.09 PageStore::ls begin wiki.d/{$FullName}
8: 00.15 00.12 PageStore::ls merge wiki.d/{$FullName}
9: 00.22 00.19 PageStore::ls end wiki.d/{$FullName}
10: 00.23 00.21 PageStore::ls begin $FarmD/wikilib.d/{$FullName}
11: 00.24 00.21 PageStore::ls merge $FarmD/wikilib.d/{$FullName}
12: 00.24 00.21 PageStore::ls end $FarmD/wikilib.d/{$FullName}
13: 00.27 00.23 PageListSources end count=5731
14: 00.27 00.24 PageListSort pre ret=4 order=name
15: 00.27 00.24 MakePageList items count=5731, filters=PageListPasswords
16: 02.84 02.31 MakePageList post count=80, readc=5731
17: 02.84 02.31 PageListCache begin save key=5a8da5720010ae125b59fa8e5c6022bc
18: 02.84 02.31 PageListCache end save
...
22: 02.85 02.32 MakePageList end
23: 03.23 02.68 MarkupToHTML begin
24: 04.05 03.48 MarkupToHTML end
25: 04.05 03.48 FPLTemplate end
It takes 0.18 wall-clock seconds to scan the pagestores for a
list of the 5731 pages on the site (lines 6-13), and then
an additional 2.57 seconds to read all 5731 of them and find the 80
that have passwords on them (line 16, readc=5731, count=80).
Pagelist then saves this list as key=5a8da5720010ae125b59fa8e5c6022bc,
so that it can be used later. The total time to scan all 5731
pages and produce the list of 80 is 2.76 seconds (lines 5 and 22),
and it then takes an additional 1.20 seconds to render the output
(lines 22 and 24). Total time for the pagelist is 3.96 seconds (lines 4 and 25).
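Each figure above is just the difference between the first (wall-clock)
timestamp column at two trace lines. A small Python sketch that parses
trace lines of the form shown above and reproduces those figures; it
assumes the "N: wall other label" layout seen here and ignores the second
column, whose meaning the post doesn't spell out. The "..." elisions are
left out of the sample.

```python
from __future__ import annotations

import re

# Cold-cache trace lines quoted above (elided lines omitted).
COLD_TRACE = """\
4: 00.09 00.09 FPLTemplate begin
5: 00.09 00.09 MakePageList pre
6: 00.09 00.09 PageListSources begin
13: 00.27 00.23 PageListSources end count=5731
15: 00.27 00.24 MakePageList items count=5731, filters=PageListPasswords
16: 02.84 02.31 MakePageList post count=80, readc=5731
22: 02.85 02.32 MakePageList end
24: 04.05 03.48 MarkupToHTML end
25: 04.05 03.48 FPLTemplate end
"""

# "N: <wall> <other> <label>"; lines that don't match (e.g. "...") are skipped.
LINE_RE = re.compile(r"^\s*(\d+):\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(.*)$")


def parse_trace(text: str) -> dict[int, float]:
    """Map each numbered trace line to its first (wall-clock) timestamp."""
    stamps: dict[int, float] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamps[int(m.group(1))] = float(m.group(2))
    return stamps


def elapsed(stamps: dict[int, float], start: int, end: int) -> float:
    # Round to the trace's two-decimal resolution to hide float noise.
    return round(stamps[end] - stamps[start], 2)


if __name__ == "__main__":
    stamps = parse_trace(COLD_TRACE)
    print("scan pagestores:", elapsed(stamps, 6, 13))   # 0.18
    print("read + filter:", elapsed(stamps, 15, 16))    # 2.57
    print("compute list:", elapsed(stamps, 5, 22))      # 2.76
    print("render output:", elapsed(stamps, 22, 24))    # 1.2
    print("whole pagelist:", elapsed(stamps, 4, 25))    # 3.96
```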
If we then do a page reload, the pagelist is able to reload
the list from the cache instead of having to rescan the 5731
pages (assuming nothing has invalidated the cache). Here's
the stopwatch trace:
0: 00.00 00.00 config start
...
4: 00.10 00.09 FPLTemplate begin
5: 00.10 00.09 MakePageList pre
6: 00.10 00.09 PageListCache begin load key=5a8da5720010ae125b59fa8e5c6022bc
7: 00.10 00.09 PageListCache end load
8: 00.10 00.09 PageListSources begin
9: 00.10 00.09 PageListSources end count=80
10: 00.10 00.09 PageListSort pre ret=4 order=name
11: 00.10 00.09 MakePageList items count=80, filters=
12: 00.11 00.10 MakePageList post count=80, readc=0
...
16: 00.11 00.10 MakePageList end
17: 00.45 00.42 MarkupToHTML begin
18: 01.22 01.18 MarkupToHTML end
19: 01.22 01.18 FPLTemplate end
Here the pagelist function detected that it had a valid pagelist
cache entry, so instead of scanning the 5731 pages it simply
loaded the final list of 80 directly from the cache (0.01 seconds,
lines 5-12). No page reads were involved (readc=0, line 12).
So, the cache reduced the time to compute the list of 80
pages from 2.76 seconds to 0.01 seconds.
This list of 80 pages then goes through the pagelist template
formatting (1.11 seconds, lines 16 and 18), for a total time of 1.12
seconds (lines 4 and 19) to produce the pagelist output from the cache.
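The save/load behaviour in the two traces can be modelled with a small
Python sketch (hypothetical; not PmWiki's implementation). Two things here
are assumptions: the cache key is an MD5 hex digest of the canonicalized
pagelist options (the key in the trace is 32 hex digits, which is
consistent with MD5, but the post doesn't say what is hashed), and any
page save invalidates every entry, since the post only says "assuming
nothing has invalidated the cache". A hit returns the stored list with
zero page reads, like readc=0 above.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable


class PageListCache:
    """Hypothetical pagelist result cache: options -> list of page names."""

    def __init__(self) -> None:
        self._entries: dict[str, list[str]] = {}

    @staticmethod
    def key_for(options: dict[str, str]) -> str:
        # Canonicalize (sorted keys) so equivalent requests share one key.
        blob = json.dumps(options, sort_keys=True)
        return hashlib.md5(blob.encode("utf-8")).hexdigest()

    def invalidate(self) -> None:
        # Assumed policy: a page save may change any result, so drop everything.
        self._entries.clear()

    def get_list(
        self,
        options: dict[str, str],
        pages: dict[str, dict[str, str]],
        keep: Callable[[dict[str, str]], bool],
    ) -> tuple[list[str], int]:
        """Return (matching page names, pages read). The options are assumed
        to fully determine `keep`, so only the options go into the key."""
        key = self.key_for(options)
        if key in self._entries:
            return list(self._entries[key]), 0  # hit: no page reads
        matches: list[str] = []
        readc = 0
        for name in sorted(pages):
            readc += 1  # miss: every candidate page is read
            if keep(pages[name]):
                matches.append(name)
        self._entries[key] = matches
        return list(matches), readc


def is_protected(attrs: dict[str, str]) -> bool:
    # Assumption: any non-empty passwd* attribute counts as a password.
    return any(k.startswith("passwd") and v for k, v in attrs.items())


if __name__ == "__main__":
    site = {f"Main.Page{i:04d}": ({"passwdread": "x"} if i < 80 else {})
            for i in range(5731)}
    opts = {"filter": "passwords", "order": "name"}  # hypothetical options
    cache = PageListCache()
    print(cache.get_list(opts, site, is_protected)[1])  # 5731 (cold)
    print(cache.get_list(opts, site, is_protected)[1])  # 0 (warm)
```

The cache stores only the final list of names, so a hit skips the scan
but still hands the full list to the template: rendering cost is unchanged.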
So yes, the pagelist cache is addressing exactly the situation
that you say it should, by eliminating the time required to scan
a large number of pages to produce a relatively short list.
It does not improve the speed of rendering the list, however.
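To make that split concrete, the two phases can be recomputed from the
wall-clock column of both traces: computing the list runs from
"MakePageList pre" to "MakePageList end", and rendering runs from
"MakePageList end" to "FPLTemplate end". The Phases helper is a
hypothetical Python sketch; the numbers are copied from the traces above.

```python
from typing import NamedTuple


class Phases(NamedTuple):
    """Wall-clock timestamps (seconds) at three points of one pagelist run."""
    list_begin: float    # "MakePageList pre"
    list_end: float      # "MakePageList end"
    template_end: float  # "FPLTemplate end"

    def compute(self) -> float:
        # Time spent producing the list of page names.
        return round(self.list_end - self.list_begin, 2)

    def render(self) -> float:
        # Time spent formatting that list through the template.
        return round(self.template_end - self.list_end, 2)


cold = Phases(list_begin=0.09, list_end=2.85, template_end=4.05)  # first trace
warm = Phases(list_begin=0.10, list_end=0.11, template_end=1.22)  # cached trace

if __name__ == "__main__":
    print(f"compute: {cold.compute():.2f}s -> {warm.compute():.2f}s")  # 2.76s -> 0.01s
    print(f"render:  {cold.render():.2f}s -> {warm.render():.2f}s")    # 1.20s -> 1.11s
```

The compute phase collapses on a cache hit while the render phase stays
at roughly the same cost, which is the distinction drawn above.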
Hope this is satisfactory...?
Pm