[pmwiki-users] FW: cookbook "ShellTools" (was: Include specific lines of text on a page)

Sun Jan 20 16:21:57 CST 2008

(this was originally sent off-list, but it's getting confusing having part
of the conversation on-list and part off-list)

> Peter, I think nesting does not work well when processing multiple
> pages. That was the reason I included into the grep.php script an
> option of cut= and hide=, rather than trying to make these into
> separate MXPs (markup expressions, is this a usable acronym???).
> It is enough processing just to open every page in the file list once.
> So I propose to do the same for a tail= option, if that is useful:
> tail=n, n being an integer, would display the last n lines from every
> page which  matches the wildcard pattern.

How would you differentiate between these 2 calls which do very different
things:

tail -n 20 file1 file2 | grep 'mytext'

grep 'mytext' file1 file2 | tail -n 5

The first one takes the last 20 lines of 2 different files and then searches
for "mytext" within those 40 lines.  The second searches for "mytext" in 2
different files and takes the last 5 matching lines.  Clearly the order of
processing is going to be VERY important in this and I'm afraid it's going
to get very difficult to control this order of processing...
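
To make the difference concrete, here is a minimal sketch in PHP that runs both orders over the same in-memory "files".  The helpers tail_lines() and grep_lines() are hypothetical names made up for this illustration (they are not part of PmWiki or grep.php), and the sketch ignores the per-file headers and filename prefixes that the real tail and grep print when given several files.

```php
<?php
// Hypothetical helpers for illustration only (not PmWiki functions).

// Return the last $n lines of an array of lines.
function tail_lines(array $lines, int $n): array {
    return $n > 0 ? array_slice($lines, -$n) : [];
}

// Return only the lines that contain $needle.
function grep_lines(array $lines, string $needle): array {
    return array_values(array_filter($lines, function ($line) use ($needle) {
        return strpos($line, $needle) !== false;
    }));
}

$file1 = ['mytext a', 'other', 'other', 'mytext b'];
$file2 = ['mytext c', 'other', 'other', 'other'];

// Equivalent of: tail -n 2 file1 file2 | grep 'mytext'
$tail_then_grep = grep_lines(
    array_merge(tail_lines($file1, 2), tail_lines($file2, 2)),
    'mytext'
);
// $tail_then_grep is ['mytext b']

// Equivalent of: grep 'mytext' file1 file2 | tail -n 2
$grep_then_tail = tail_lines(
    grep_lines(array_merge($file1, $file2), 'mytext'),
    2
);
// $grep_then_tail is ['mytext b', 'mytext c']
```

Same tools, same input, different order, different result.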

Another option would be to implement something like this:

{(shell tail -n 20 file1 file2 | grep "mytext")}
{(shell grep "mytext" file1 file2 | tail -n 5)}

Obviously at this point we've moved into a whole, completely other world
where we are implementing a subset of a completely different language.
Powerful?  Yes.  But I'm not sure either of us is prepared to put in the
huge number of hours required to do something like that...

Pseudo-code:
Split by \n or ;
For each of these expressions
	Split by | for piping
		For each of these expressions
			Access the appropriate function, saving the output
			in an array and passing it on to the next function
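
The pseudo-code above could be sketched in PHP roughly as follows.  Everything here is an assumption for illustration: the $commands table, the run_script() name and the three toy commands are invented, there is no quoting support (arguments are split on whitespace), and nothing is wired into markupexpr.php.

```php
<?php
// Hypothetical command table: each command takes (array $args, array $input)
// and returns an array of output lines. None of these exist in PmWiki.
$commands = [
    'echo' => function (array $args, array $input) {
        return $args; // each argument becomes one output line
    },
    'grep' => function (array $args, array $input) {
        $needle = $args[0] ?? '';
        return array_values(array_filter($input, function ($line) use ($needle) {
            return $needle === '' || strpos($line, $needle) !== false;
        }));
    },
    'tail' => function (array $args, array $input) {
        $n = (int)($args[0] ?? 10);
        return $n > 0 ? array_slice($input, -$n) : [];
    },
];

function run_script(string $script, array $commands): array {
    $results = [];
    // Split by \n or ;
    foreach (preg_split('/[\n;]/', $script, -1, PREG_SPLIT_NO_EMPTY) as $expr) {
        $data = [];
        // Split by | for piping
        foreach (explode('|', $expr) as $stage) {
            $words = preg_split('/\s+/', trim($stage), -1, PREG_SPLIT_NO_EMPTY);
            if (!$words) continue;
            $name = array_shift($words);
            if (!isset($commands[$name])) {
                throw new InvalidArgumentException("unknown command: $name");
            }
            // Run this stage and save its output as input for the next one.
            $data = $commands[$name]($words, $data);
        }
        $results[] = $data;
    }
    return $results;
}

$out = run_script("echo a1 b2 a3 a4 | grep a | tail 2; echo x y | tail 1", $commands);
// $out is [['a3', 'a4'], ['y']]
```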

Hmmm...  Maybe it's not that difficult after all...  Of course it would not
(could not) be as robust as a real bash interpreter, but the basics might be
relatively easy to implement.  ParseArgs() unfortunately is going to be a
limiting factor because it's going to pick up any x=y anywhere in the line
and lump them all together into the $opt array - no way then to
differentiate which option was for which command...  It's a tool designed to
serve a single command and we would be implementing multiple commands within
the single call.
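
A small sketch of that problem, and of one possible way around it (parsing each pipe segment separately).  parse_opts() is a deliberately simplified, hypothetical stand-in written for this example; it is not ParseArgs() and does not reproduce its handling of quotes, flags or positional arguments.

```php
<?php
// Simplified, hypothetical stand-in for an option parser: collect every
// key=value token in a string into one array. Not the real ParseArgs().
function parse_opts(string $text): array {
    $opt = [];
    preg_match_all('/(\w+)=(\S+)/', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $opt[$m[1]] = $m[2]; // in this stand-in a later key overwrites an earlier one
    }
    return $opt;
}

$line = 'tail n=20 file1 file2 | grep "mytext" | tail n=5';

// Parsing the whole line at once: the two n= options collapse into one.
$flat = parse_opts($line);
// $flat is ['n' => '5']

// Parsing each pipe segment on its own keeps the options apart.
$per_stage = array_map('parse_opts', explode('|', $line));
// $per_stage is [['n' => '20'], [], ['n' => '5']]
```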

I suppose then you could look at a (!...!) markup or something to bypass
ParseArgs()...  Anyway, that's a different kettle of fish.

> But seeing this I think a lines=n option is more friendly and useful:
> lines=5 would display the first 5 lines from  every page, and
> lines=-5 would display the last 5 lines from every page matching.
> This would be similar to the (:include PageName lines=... :) option
> (but would work on multiple pages).

This implements and solves the immediate concern, but it takes away the
incredible power that comes from having a set of simple tools that the
user/administrator/author can then combine in ways far beyond the
toolsmith's plans and designs.  Basically if it's a one-tool-fits-all then
the person writing the tool would have to imagine each and every possible
combination and sequence and figure out how to do it - each new combination
would be a change to code instead of just a different application of the
existing tools.

I've spent a few minutes figuring out how ParseArgs() and markupexpr.php
work and now I've got a working {(tail ...)} which processes multiple files
and file patterns as well as handles the nesting situation.  I have not
tested the nesting when it has parentheses already in the text -- I'm afraid
it's going to cause problems.  In any event, the function is at the end of
this email...

It has 3 different options:

n=lines, where "lines" is the number of lines to pick up from the end of
the file (default 10).

prefix="string", in which the strings PAGENAME and PAGETITLE are meant to be
replaced automatically.  This prefix is output as a line at the beginning of
each set of lines coming from each file.  (Only PAGENAME is replaced so far;
the PAGETITLE replacement is still commented out in the code.)

pipe=0 or non-zero - if this option is set to anything non-zero then the
arguments are interpreted as a nested call instead of as a series of
filenames.

I've tested it on files containing both more and fewer lines than the n=x
value.  I've tested it to a single level of nesting.  As mentioned, I have
not tested it on text containing parentheses, which might confuse the
markupexpr.php parsing engine...

Anyway, obviously (tail...) is the simplest possible tool to implement
(except head) but it does show the nesting and stuff...

> Looking again at the (:include ..:) options I realise that in the
> grep.php script I lost the ability to source text not just from whole
> pages, but from specified sections. I think it would be useful to gain
> that too.

This would be a very powerful feature - I have no idea how it's implemented,
but I've been reading enough of your stuff in the forum to know that you are
very experienced in that field.  And this may be one of the places where it
makes a lot of sense to deviate from simple text-file-processing and do
something wiki-specific.

Strictly from a shell perspective I would implement something like that in
the following manner:

sed -n '/^#BEGINSECTIONA/,/^#ENDSECTIONA/p' file1 | grep "mytext" | tail -n 3

Or something like that.  (In this case -n tells sed not to print lines by
default, and p prints anything between /pattern1/ and /pattern2/.)  Sed
also has line number capability (sed -n '32,43p' file1 | ...).  It also has
text replacement capability:

grep "SUCCESS" file1 file2 | sed 's/SUCCESS//'

That line does something like what you were solving a couple of posts ago.
It first finds the lines containing the given text.  Then, in the matching
lines, it replaces a given string (in this case SUCCESS -- I don't have the
email in front of me for the specifics) with another string (in this case
an empty string).  In the shelltools cookbook as I envision it, that would
look like this:

{(sed find="SUCCESS" replace="" pipe=1 (grep "SUCCESS" file1 file2))}

If I wanted to find the first n lines or the last n lines there could be an
insertion of a (head...) or (tail...) at any level outside the (grep...).  I
don't know if I'm adequately showing how powerful these simple tools can be
when given the flexibility to order them and combine them in a freeform
way...
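
As a rough illustration of that kind of composition, here is a PHP sketch in which each tool is a plain function over an array of lines and the pipeline is just nested calls.  All four function names are hypothetical (they are not PmWiki functions and not the cookbook code), and sed_replace() replaces every occurrence on a line, which corresponds to sed's s///g rather than the plain s/// shown above.

```php
<?php
// Hypothetical line-based helpers for illustration only.

// Like sed -n '/start/,/end/p': keep lines from one matching $start
// through the next one matching $end, inclusive.
function sed_range(array $lines, string $start, string $end): array {
    $out = [];
    $inside = false;
    foreach ($lines as $line) {
        if (!$inside && preg_match($start, $line)) {
            $inside = true;
            $out[] = $line;
            continue; // the end pattern is only checked on later lines
        }
        if ($inside) {
            $out[] = $line;
            if (preg_match($end, $line)) $inside = false;
        }
    }
    return $out;
}

// Keep only the lines that contain $needle.
function grep_lines(array $lines, string $needle): array {
    return array_values(array_filter($lines, function ($line) use ($needle) {
        return strpos($line, $needle) !== false;
    }));
}

// Return the last $n lines.
function tail_lines(array $lines, int $n): array {
    return $n > 0 ? array_slice($lines, -$n) : [];
}

// Replace every occurrence of $find with $replace on each line.
function sed_replace(array $lines, string $find, string $replace): array {
    return array_map(function ($line) use ($find, $replace) {
        return str_replace($find, $replace, $line);
    }, $lines);
}

$page = [
    '#BEGINSECTIONA',
    'job1 SUCCESS',
    'job2 FAILED',
    'job3 SUCCESS',
    'job4 SUCCESS',
    '#ENDSECTIONA',
    'job5 SUCCESS',
];

// section -> grep SUCCESS -> tail 2 -> strip the word SUCCESS
$result = sed_replace(
    tail_lines(
        grep_lines(
            sed_range($page, '/^#BEGINSECTIONA/', '/^#ENDSECTIONA/'),
            'SUCCESS'
        ),
        2
    ),
    'SUCCESS',
    ''
);
// $result is ['job3 ', 'job4 ']
```

Moving tail_lines() inside or outside grep_lines() changes the result without touching any of the four functions.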

Just throwing around my $0.02...  Probably at some point in here I should
have thrown some things back to the list...  Oh, well...

-Peter

# {(tail OPTIONS PAGEPATTERN ...)}
# Options:
# n=number        - number of lines to print from end of file
# pipe=(anything) - if set to non-zero, take input from nested markup
#                   instead of files
# prefix="string" - string which will become the line leading each set of
#                   lines from the current file.  PAGENAME and PAGETITLE
#                   will be replaced with the appropriate values.
# PAGEPATTERN...  - source pages from PageName or Group.Pagename
#                   allowing wiki wildcards * and ?
#                   (multiple files/patterns allowed)
$MarkupExpr['tail'] = 'Tail($pagename, @$argp, @$args)'; 
function Tail($pagename, $opt, $filelist) 
{
	if (@$filelist[0]=='') return '';
	$n = (isset($opt['n'])) ? $opt['n'] : 10;
	$piped = isset($opt['pipe']) && @$opt['pipe'];
	$prefix = isset($opt['prefix']) ? $opt['prefix'] : '';
	$grp = PageVar($pagename, '$Group');
	if ($piped) {
		$textrows = explode("\n",$filelist[0]);
		if ($n > count($textrows)) {
			$newrows = $textrows;
		} else {
			$newrows = array_slice($textrows, count($textrows)-$n, $n);
		}
	} else {
		$sourcelist = array();
		foreach ($filelist as $filepat) {
			//check for group.name pattern
			if (strstr($filepat,'.')) 
				$pat = $filepat;
			else $pat = $grp.".".$filepat;
			//make preg pattern from wildcard pattern
			$prpat = GlobToPCRE(FixGlob($pat));
			//make list from preg name pattern
			$sourcelist = array_merge($sourcelist, ListPages("/$prpat[0]/"));
		}
		$newrows = array();
		//process each source page in turn
		foreach($sourcelist as $source) {
			if ($source==$pagename) continue;
			$page = RetrieveAuthPage($source, 'read', true);
			if ($page) {
				$m = 0;
				$text = $page['text'];
				$textrows = explode("\n",$text);
				if ($prefix != '') {
					$this_prefix = str_replace('PAGENAME', $source, $prefix);
					# How do I look up the title of a page programmatically?
					#$this_prefix = str_replace('PAGETITLE', $?title?, $prefix);
					$array_prefix = array($this_prefix);
				} else $array_prefix = array();
				if ($n > count($textrows)) {
					$newrows = array_merge($newrows, $array_prefix, $textrows);
				} else {
					$newrows = array_merge($newrows, $array_prefix,
						array_slice($textrows, count($textrows)-$n, $n));
				}
			}
		}
	}
	for ($i = 0; $i < count($newrows); $i++) {
		$newrows[$i] = "$i: $newrows[$i]";
		if (substr($newrows[$i], -2) != '\\\\') $newrows[$i] .= '\\\\';
	}
	return implode("\n", $newrows);
}