[pmwiki-users] Speedy search?
Patrick R. Michaud
pmichaud at pobox.com
Fri Feb 17 10:21:29 CST 2006
Karl wrote:
> I'll send it to you via PM.
Okay, I have it set up on my server. Here's an example timing
I'm seeing for the markup "(:pagelist group=Techlib fmt=dictindex:)":
00.00 00.00 MarkupToHTML begin
00.08 00.01 MakePageList begin
00.47 00.05 MakePageList scanning 1070 pages, readf=0
00.53 00.06 MakePageList sort
00.53 00.06 MakePageList end
06.48 01.05 MarkupToHTML end
06.61 01.06 MarkupToHTML begin
06.67 01.07 MarkupToHTML end
06.74 01.07 MarkupToHTML begin
06.74 01.07 MarkupToHTML end
07.01 01.08 now
The first column is wall-clock seconds, the second column
is CPU time used. You can see that creating the list of pages
itself is fairly quick -- it's finished 0.53 seconds into the request
(0.06 seconds of CPU time). Not bad for having to scan and
check access permissions on over 1000 pages.
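To make the two columns concrete, here is a minimal sketch of a timer that logs cumulative wall-clock and CPU seconds in the same layout. The function demo_stopwatch() is a made-up name for illustration and is not PmWiki's StopWatch(); it assumes getrusage() is available on the platform.

```php
<?php
// Hypothetical helper (not PmWiki's StopWatch): logs cumulative
// wall-clock and CPU seconds, both measured from the first call.
function demo_stopwatch($label) {
    static $start_wall = null;
    static $start_cpu = null;
    static $log = array();

    $wall = microtime(true);
    $ru = getrusage();   // user + system CPU time used by this process
    $cpu = $ru['ru_utime.tv_sec'] + $ru['ru_utime.tv_usec'] / 1e6
         + $ru['ru_stime.tv_sec'] + $ru['ru_stime.tv_usec'] / 1e6;

    if ($start_wall === null) {
        $start_wall = $wall;
        $start_cpu = $cpu;
    }
    $log[] = sprintf('%05.2f %05.2f %s',
                     $wall - $start_wall, $cpu - $start_cpu, $label);
    return $log;
}

demo_stopwatch('begin');
usleep(50000);   // waiting: wall-clock advances, CPU barely moves
demo_stopwatch('after sleep');
$x = 0;
for ($i = 0; $i < 200000; $i++) $x += sqrt($i);   // busy work: CPU advances
$log = demo_stopwatch('after loop');
echo implode("\n", $log), "\n";
```

The sleep shows up almost entirely in the first column, while the busy loop moves both columns. That is why a loaded server can inflate the wall-clock figures without changing the CPU figures much.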
What *is* taking forever is formatting the output -- the time between
"MakePageList end" and "MarkupToHTML end". So, adding a few stopwatch
points into dictindex.php and re-running the script, we get:
00.00 00.00 MarkupToHTML begin
00.12 00.00 FPLDictIndex start
00.12 00.00 MakePageList begin
00.52 00.03 MakePageList scanning 1070 pages, readf=0
00.58 00.05 MakePageList sort
00.58 00.05 MakePageList end
00.58 00.05 FPLDictIndex generate names
02.29 00.69 FPLDictIndex sort
02.47 00.72 FPLDictIndex format output
03.65 01.01 FPLDictIndex end
03.72 01.03 MarkupToHTML end
03.84 01.04 MarkupToHTML begin
03.93 01.05 MarkupToHTML end
03.94 01.05 MarkupToHTML begin
03.95 01.05 MarkupToHTML end
04.17 01.06 now
Because of varying system loads during testing, I generally look at the
CPU seconds instead of the wall clock seconds for comparison. Here we
can see that the "generate names" section of FPLDictIndex is eating up
the bulk of the time -- 0.64 seconds CPU time. That's a lot. A close
second is the format output section, at 0.29 seconds CPU time.
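Since each stopwatch line is cumulative, the cost of a stage is the CPU figure on the next line minus the CPU figure on its own line. A small sketch of that arithmetic, using the lines from the run above (stage_cpu_deltas() is a hypothetical helper, not part of PmWiki):

```php
<?php
// Hypothetical helper: turn cumulative stopwatch lines into per-stage
// CPU deltas (CPU time on the next line minus CPU time on this line).
function stage_cpu_deltas(array $lines) {
    $deltas = array();
    for ($i = 0; $i + 1 < count($lines); $i++) {
        // Each line is "wall cpu label"; the label may contain spaces.
        list(, $cpu, $label) = preg_split('/\s+/', trim($lines[$i]), 3);
        list(, $next_cpu)    = preg_split('/\s+/', trim($lines[$i + 1]), 3);
        $deltas[$label] = round((float)$next_cpu - (float)$cpu, 2);
    }
    return $deltas;
}

$lines = array(
    '00.58 00.05 FPLDictIndex generate names',
    '02.29 00.69 FPLDictIndex sort',
    '02.47 00.72 FPLDictIndex format output',
    '03.65 01.01 FPLDictIndex end',
);
foreach (stage_cpu_deltas($lines) as $label => $secs) {
    printf("%-32s %.2f\n", $label, $secs);
}
```

For this run that works out to 0.69 - 0.05 = 0.64 for generate names, 0.72 - 0.69 = 0.03 for the sort, and 1.01 - 0.72 = 0.29 for format output.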
The generate names section looks like:
StopWatch("FPLDictIndex generate names");
for($n=0;$n<count($matches);$n++)
  $matches[$n]['name'] = FmtPageName('$Name',$matches[$n]['pagename']);
$cmp = create_function('$x,$y',
  "return strcasecmp(\$x['name'],\$y['name']);");
Calls to FmtPageName can be really expensive, so let's try PageVar
instead:
for($n=0;$n<count($matches);$n++)
  $matches[$n]['name'] = PageVar($matches[$n]['pagename'], '$Name');
00.00 00.00 MarkupToHTML begin
00.01 00.00 FPLDictIndex start
00.01 00.00 MakePageList begin
00.09 00.04 MakePageList scanning 1070 pages, readf=0
00.10 00.05 MakePageList sort
00.10 00.05 MakePageList end
00.10 00.05 FPLDictIndex generate names
00.46 00.41 FPLDictIndex sort
00.48 00.43 FPLDictIndex format output
00.68 00.62 FPLDictIndex end
00.69 00.62 MarkupToHTML end
That's some improvement, but we're still eating up 0.36 seconds.
How about just computing the name directly?
for($n=0;$n<count($matches);$n++)
  $matches[$n]['name'] =
    preg_replace('/^[^.]*\\./', '', $matches[$n]['pagename']);
00.06 00.05 FPLDictIndex generate names
00.54 00.42 FPLDictIndex sort
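For reference, the direct computation can be tried on its own. This is only a sketch: demo_name_from_pagename() is a made-up wrapper around the same expression, and the page name "Techlib.Abacus" is invented for illustration.

```php
<?php
// Hypothetical wrapper around the expression used above: drop everything
// up to and including the first "." of a "Group.Name" pagename.
function demo_name_from_pagename($pagename) {
    return preg_replace('/^[^.]*\\./', '', $pagename);
}

echo demo_name_from_pagename('Techlib.Abacus'), "\n";   // Abacus
echo demo_name_from_pagename('Abacus'), "\n";           // no ".", so unchanged
```

Note that this only strips the group prefix; it does not consult any page data, which is why it needs no page reads.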
Not much improvement. Maybe if we try letting MakePageList do the
sort, and sort based on title instead of name...?
StopWatch("FPLDictIndex start");
$opt['order'] = 'title';
$matches = MakePageList($pagename, $opt);
00.00 00.00 MarkupToHTML begin
00.01 00.01 FPLDictIndex start
00.01 00.01 MakePageList begin
00.06 00.04 MakePageList scanning 1070 pages, readf=1
01.61 00.27 MakePageList sort
01.72 00.36 MakePageList end
01.73 00.36 FPLDictIndex format output
01.94 00.58 FPLDictIndex end
01.95 00.59 MarkupToHTML end
01.97 00.60 MarkupToHTML begin
01.98 00.62 MarkupToHTML end
01.99 00.62 MarkupToHTML begin
01.99 00.62 MarkupToHTML end
02.38 00.63 now
Now MakePageList takes longer (0.36 seconds versus 0.05 seconds),
but FPLDictIndex, which includes the increased time of MakePageList,
is a lot shorter (0.58 seconds versus 1.01 seconds). We've cut
the time almost in half.
I'm going to play with the "format output" section a bit to
see if I can make any improvements for it. But it looks to me
as though the slowness is in generating output results, and not
in the actual scanning/construction of the pagelist.
Pm