[pmwiki-users] An idea for extensible searching

Michael Tempest michael.tempest at ate-aerospace.com
Tue Jul 25 09:47:34 CDT 2006


Hi

There has been a lot of traffic on this list about (:pagelist:)
recently. Here is one idea for more extensible filtering.

I think this idea is practical and would be useful (or else I wouldn't
be posting it), but it has rough edges.
I'm willing to have a go at implementing it, but I'd like to draw on the
collective wisdom of the list before committing to code...

Currently, the basic syntax is like this (taken straight from
PmWiki.PageLists):
  (:pagelist group=abc fmt=def list=ghi order=jkl argument1 argument2 etc:)

I'd like to propose an extension that is more flexible but can be reduced to
the old form (which means it is backwards-compatible). Here is the "full"
version of the example above, which I will explain below:
  (:pagelist fmt=def order=jkl group=match:abc fullname=list:ghi
text=match:argument1 text=match:argument2 text=match:etc:)

Firstly, fmt and order have no effect on which pages are included in the
list, so I moved them to the front for clarity. The same would apply to
count.

The change is this: all search terms have the form "part=test:value".
(Or, in different words: aspect=operator:operand)
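To make the notation concrete, here is a minimal sketch of splitting a fully written term into its three components. It is in Python rather than PmWiki's PHP, purely for illustration, and the Term type and parse_term name are hypothetical. Only the first "=" and the first ":" after it are treated as separators, so a value may itself contain colons.

```python
from typing import NamedTuple


class Term(NamedTuple):
    part: str   # which aspect of the page: group, name, text, ctime, ...
    test: str   # which comparison to apply: match, list, from, youngerthan, ...
    value: str  # the operand handed to the test


def parse_term(term: str) -> Term:
    """Split a fully written 'part=test:value' term into its components."""
    part, eq, rest = term.partition("=")
    test, colon, value = rest.partition(":")
    if not (eq and colon and part and test):
        raise ValueError(f"not of the form part=test:value: {term!r}")
    return Term(part, test, value)
```

For example, parse_term("group=match:abc") gives Term(part="group", test="match", value="abc"), while a bare word such as "argument1" is rejected here; filling in defaults for the short forms is a separate step.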

The basic list of parts to be searched on is: group, name, fullname, time,
ctime, text, link, trail.
In terms of implementation, I would categorise these parts in terms of where
the data comes from.
* These come from the full names of pages: group, name, fullname
* These come from the page index file: time, link, text
* These come from the actual pages: ctime, trail

This categorisation makes optimising the search easier, because the search
terms applied to full names are generally quicker than the search terms
applied to the page index file, which are in turn quicker than those applied
to the actual page content.
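A rough sketch of how that categorisation could drive the search, assuming each term is a (part, test, value) tuple: rank each part by where its data comes from, sort the terms cheapest-first, and stop at the first failing term. The cost numbers and function names are hypothetical, and run_test stands in for whatever actually evaluates a term.

```python
# Hypothetical cost ranks for the three data sources:
# 0 = derived from the page's full name (no extra I/O),
# 1 = read from the page index file,
# 2 = requires reading the page file itself.
SOURCE_COST = {
    "group": 0, "name": 0, "fullname": 0,
    "time": 1, "link": 1, "text": 1,
    "ctime": 2, "trail": 2,
}


def order_terms(terms):
    """Sort (part, test, value) terms cheapest-first.

    sorted() is stable, so terms of equal cost keep their written order.
    Parts not in the table are assumed to need the page file (cost 2).
    """
    return sorted(terms, key=lambda term: SOURCE_COST.get(term[0], 2))


def page_passes(page, terms, run_test):
    """True if `page` satisfies every term.

    all() stops at the first failing term, so with cheap terms first a page
    rejected by its name is never looked up in the index or loaded.
    `run_test(page, term)` is a caller-supplied (hypothetical) evaluator.
    """
    return all(run_test(page, term) for term in order_terms(terms))
```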

The list of parts can be extended by defining custom functions for new
parts. Once again, the new "part" would have to fall into one of the
categories above. Here are some examples of custom "part" extensions:
* From the full name: extract a fixed portion of the name, like "2006-07-24"
from "MyGroup.Event-2006-07-24-Brooklyn"
* From the page itself: extract the history, the edit permission, the
title, or page variables
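Here is one possible shape for such an extension, sketched in Python: a small registry where each custom part records which data source it needs (so a search can still rank it by cost) together with the function that extracts its value. The registry, register_part and the "eventdate" part name are all hypothetical.

```python
import re

# Hypothetical registry: part name -> (data source, extractor function).
CUSTOM_PARTS = {}


def register_part(name, source):
    """Decorator that registers an extractor for a new searchable part."""
    def decorator(func):
        CUSTOM_PARTS[name] = (source, func)
        return func
    return decorator


@register_part("eventdate", source="fullname")
def event_date(fullname: str) -> str:
    """Extract a YYYY-MM-DD date embedded in a page name, or '' if none."""
    m = re.search(r"\d{4}-\d{2}-\d{2}", fullname)
    return m.group(0) if m else ""
```

With this in place, event_date("MyGroup.Event-2006-07-24-Brooklyn") returns "2006-07-24", and because the part is declared as coming from the full name it can be checked without opening the page.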

I think there are some truly generic "test" functions like a regex pattern
match, but there are others that only make sense for certain parts, like a
date/time test. So, here is a list of tests, and the parts they could be
applied to:
* list - predefined regex filter on page names - applies to fullname
* match - a sort-of-wildcard match - applies to fullname, group, name, link, 
text e.g. name=match:PageList*
* from - a page name specifier - applies to trail e.g. trail=from:RecentChanges
* youngerthan - does an "age" compare - applies to time and ctime e.g. 
ctime=youngerthan:7
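A minimal Python sketch of two pieces implied by that list: a table of which parts each test may be applied to, and a "youngerthan" test. It assumes "youngerthan:7" means "less than 7 days old" and that page times are Unix timestamps in seconds; the names APPLIES_TO, check_applicable and younger_than are hypothetical.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60

# Which parts each test may be applied to, following the list above.
APPLIES_TO = {
    "list": {"fullname"},
    "match": {"fullname", "group", "name", "link", "text"},
    "from": {"trail"},
    "youngerthan": {"time", "ctime"},
}


def check_applicable(part: str, test: str) -> None:
    """Reject combinations such as text=youngerthan:7 before searching."""
    if part not in APPLIES_TO.get(test, set()):
        raise ValueError(f"test {test!r} cannot be applied to part {part!r}")


def younger_than(timestamp: float, days: str, now: Optional[float] = None) -> bool:
    """True if `timestamp` (Unix seconds) is less than `days` days before `now`.

    float(days) raises ValueError for a non-numeric operand such as "abc".
    """
    if now is None:
        now = time.time()
    return now - timestamp < float(days) * SECONDS_PER_DAY
```

Checking applicability up front means a mistyped directive fails once, with a clear message, instead of silently matching nothing.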

Note that the "match" test corresponds to the current behaviour of "name",
"group" and text searches. That is, it would handle things like
"group=match:-PmWiki,-Site".
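A rough Python sketch of a "match" test for name-like parts, assuming comma-separated wildcard patterns where a leading "-" excludes. The rule used here (a value passes if it matches no excluded pattern and, when any inclusive patterns are given, at least one of them) is an assumption about the intended semantics, and match_test is a hypothetical name. It uses fnmatchcase, so it is case-sensitive and also accepts [seq] classes; PmWiki's own matching rules may differ in those details.

```python
from fnmatch import fnmatchcase


def match_test(value: str, spec: str) -> bool:
    """Wildcard match of `value` against a comma-separated pattern list.

    Patterns prefixed with '-' exclude; the rest include.  With no
    inclusive patterns, everything that is not excluded passes.
    """
    patterns = [p for p in spec.split(",") if p]
    include = [p for p in patterns if not p.startswith("-")]
    exclude = [p[1:] for p in patterns if p.startswith("-")]
    if any(fnmatchcase(value, p) for p in exclude):
        return False
    return not include or any(fnmatchcase(value, p) for p in include)
```

So match_test("Main", "-PmWiki,-Site") passes while match_test("Site", "-PmWiki,-Site") does not, and "PageList*" accepts any name starting with PageList.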

For backwards compatibility, there should be a default "test" for each
"part". Also, the default "part" is "text".
"list=ghi" would be transformed into "fullname=list:ghi". If no "list" test
is specified, then "fullname=list:default" is assumed.
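The backwards-compatibility rules can be sketched as a normalisation pass that rewrites old-style arguments into full (part, test, value) terms and sets fmt, order and count aside, since they do not filter pages. This is a Python illustration only; normalise and the DEFAULT_TEST table are hypothetical, with "match" assumed as the default test for every part not listed.

```python
PRESENTATION = {"fmt", "order", "count"}
KNOWN_TESTS = {"match", "list", "from", "youngerthan"}
# Hypothetical per-part defaults; every other part defaults to "match".
DEFAULT_TEST = {"time": "youngerthan", "ctime": "youngerthan", "trail": "from"}


def normalise(args):
    """Rewrite old-style pagelist arguments into part=test:value terms.

    Returns (presentation_options, terms), where terms are (part, test, value).
    """
    options, terms = {}, []
    for arg in args:
        key, eq, val = arg.partition("=")
        if not eq:
            terms.append(("text", "match", arg))     # bare word: text search
        elif key in PRESENTATION:
            options[key] = val                        # does not filter pages
        elif key == "list":
            terms.append(("fullname", "list", val))   # list=ghi
        else:
            test, colon, rest = val.partition(":")
            if colon and test in KNOWN_TESTS:
                terms.append((key, test, rest))       # already in full form
            else:
                terms.append((key, DEFAULT_TEST.get(key, "match"), val))
    if not any(p == "fullname" and t == "list" for p, t, _ in terms):
        terms.append(("fullname", "list", "default"))
    return options, terms
```

Fed the original example "group=abc fmt=def list=ghi order=jkl argument1 argument2 etc", this yields the options {"fmt": "def", "order": "jkl"} and the same filter terms as the "full" version of the directive.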

Here are some things I know I have not considered properly:
* Searching on trails seems clunky in this proposal. I don't like my
solution in this respect. Any better ideas?
* What should the "youngerthan" test do if applied to data that is obviously
not a time or date?
* I'm not sure how to generalise "inverse" checks like "group=-Site".
"ctime=youngerthan:-7" is just plain wrong. I would prefer a more flexible
way of combining terms with boolean algebra, e.g. "! ctime=youngerthan:7",
but that is a separate and complex issue. I do not want to go into that here.
* Could this be used to "merge" pagelist with attachlist, sensibly?
* The "a=b:c" notation is neither pretty nor particularly intuitive but it is 
the best idea I've had.

BTW - I've intentionally done nothing to the fmt, order and count parameters.
I think fmt is supposed to be completely orthogonal to the others, as fmt
defines presentation but the others define content. The order parameter
could be extended with something like "order=jkl order=mno", meaning first
order by jkl, then by mno - but that is a completely different discussion.
There is more than one way to handle "count", but I'd rather leave it alone
for now.
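For the multi-key "order=jkl order=mno" idea, a possible sketch in Python: sort by the least significant key first and the most significant key last, relying on the sort being stable. Pages are assumed to be dicts of field values, and the leading "-" for descending order is assumed by analogy with PmWiki's existing reversed-order form; multi_order is a hypothetical name.

```python
def multi_order(pages, keys):
    """Sort page records by several keys: keys[0] first, ties broken by keys[1], ...

    A leading '-' reverses that one key.  Python's sort is stable (also with
    reverse=True), so sorting by the least significant key first and the most
    significant key last gives the combined order, even with mixed directions.
    """
    result = list(pages)
    for key in reversed(keys):
        field = key.lstrip("-")
        result.sort(key=lambda page: page[field], reverse=key.startswith("-"))
    return result
```

For example, multi_order(pages, ["group", "-name"]) groups pages alphabetically by group and, within each group, lists names in reverse order.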

Finally - thanks to Pm (and others) for such delightful code. I feel quite
at home :-)

Have a great day

Michael  



