[pmwiki-users] New Recipe: Text Extract

Mon Jan 21 14:02:37 CST 2008

see http://www.pmwiki.org/wiki/Cookbook/TextExtract

A markup expression for extracting text lines from multiple pages
using regular expressions and wildcard pagename patterns.

Lines containing directives are skipped to prevent unpredictable results
(perhaps an option should allow including them as well?).

PmWiki's (:include PageName:) markup allows display of text from
another page, but it takes only a single source page and has no
search options. The pagelist directive is wonderful, but it is
difficult to use for output of specific text lines that match a
search. This 'extract' markup expression is very flexible, since all
criteria are specified within the markup, and I hope it can fill a
gap left by PmWiki's 'include' and 'pagelist' directives.

* Requirement: PmWiki 2.2.beta

* File: extract.php  (was grep.php)

* Markup syntax:
{(extract Pattern PageName [PageName2] [PageName3] [keyword=value] ...)}

Arguments:

* Pattern - display lines matching the regex pattern. Pattern must be
  the first argument. A single dot '.' as the pattern includes all of
  the page text.
* PageName - source page(s), given as a page name or group. The wiki
  wildcards * and ? are allowed, as is PageName#section.
* Options:
  prefix=link - display page link above extract
  prefix=STRING - display STRING above extract
  suffix=STRING - display STRING on the line below the page extract
  cut=PATTERN - do not display lines matching PATTERN
  lines=n - display first n lines
  lines=-n - display last n lines
  lines=n..m - display lines from n to m inclusive
  lines=n.. - display lines from line n to end
  snip=PATTERN - do not display text matching PATTERN; remove it from
    the line
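
A few usage sketches may make the arguments above more concrete. They
only combine the syntax and options listed here; the page names, group
names and search words are hypothetical placeholders, and the exact
output depends on the pages involved.

```
(:comment All page and group names below are made-up examples:)
{(extract . Main.HomePage lines=5)}
{(extract recipe Cookbook.* prefix=link)}
{(extract TODO Main.* cut=DONE lines=-3)}
{(extract wiki Main.HomePage#intro snip=draft)}
```

The first line uses '.' as the pattern to take all text from one page,
limited by lines=5. The second searches every page in a group through
the * wildcard and shows a page link above each extract. The third
drops matching lines that also match DONE and keeps the last three.
The fourth reads from a page section and removes the text 'draft' from
the lines it displays.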

I hope the rename from 'grep' to 'extract' is welcome.

Perhaps other markup expressions for text extraction can be added
later.

  ~Hans