[pmwiki-users] TextExtract (Search recipe) update

ABClf languefrancaise at gmail.com
Wed Sep 9 15:46:49 CDT 2009


KWIC (keyword in context) format is mostly used to show what surrounds the
searched pattern: its purpose is to print, in a convenient way, the words that
enclose the queried pattern. It usually uses a fixed-width font, colours or
spaces the targeted word more than normal, and shows X glyphs before and X
glyphs after (glyphs or words, it does not matter).
(For a real one: http://www.someya-net.com/concordancer/index.html)
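
To make the layout concrete, here is a minimal sketch in Python (not PmWiki
code, and not how TextExtract works). The function name kwic and the width
parameter are my own assumptions for illustration: each match gets a fixed
number of glyphs of context on either side, and the left context is
right-justified so that the keyword always starts in the same column when
shown in a fixed-width font.

```python
import re

def kwic(text, pattern, width=30):
    """Return one KWIC line per match, keyword aligned in a fixed column.

    Hypothetical helper for illustration; `width` is the number of
    glyphs of context kept on each side of the match.
    """
    lines = []
    # Collapse whitespace so source line breaks do not disturb alignment.
    flat = " ".join(text.split())
    for m in re.finditer(re.escape(pattern), flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-justify the left context so every keyword starts at the
        # same column; brackets stand in for highlighting.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "The cat sat on the mat. A cat is a cat."
    for line in kwic(sample, "cat", width=10):
        print(line)
```

In a wiki page the brackets would be replaced by whatever highlighting the
recipe already applies, and the lines would go in a preformatted block.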

It is a visualisation tool and a search program, not as generic as what Oliver
mentioned in his post (How hard would it be to define the extent of
the extract not in paragraphs or lines but in words?).
I take the liberty of asking in case you create a new word-based excerpt unit:
would it then be possible, as an option, to show and style the result in KWIC
format (i.e. the pattern highlighted, which is already done, and centred,
ideally in a fixed-width font)?
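
As a rough sketch of what a word-based excerpt unit might do (again in
Python, under my own assumptions: the name word_windows, the words
parameter, and paragraphs separated by blank lines are all hypothetical),
the context is counted in words rather than glyphs and never crosses a
paragraph division:

```python
import re

def word_windows(text, term, words=5):
    """Return (before, match, after) tuples for each occurrence of `term`.

    Hypothetical helper: context is limited to `words` words on each
    side and is cut at paragraph boundaries.
    """
    results = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", text):
        tokens = para.split()
        for i, tok in enumerate(tokens):
            # Strip surrounding punctuation so "wiki," still matches "wiki".
            if tok.strip(".,;:!?()\"'").lower() == term.lower():
                before = tokens[max(0, i - words):i]
                after = tokens[i + 1:i + 1 + words]
                results.append((" ".join(before), tok, " ".join(after)))
    return results

if __name__ == "__main__":
    sample = "one two three wiki four\n\nfive wiki six seven"
    for before, match, after in word_windows(sample, "wiki", words=2):
        print(before, "|", match, "|", after)
```

Because each paragraph is tokenised separately, a match near the start of a
paragraph simply gets a shorter left context instead of borrowing words
from the preceding paragraph.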

Nothing more than a suggestion,
Gilles.

2009/9/9 Hans <design5 at softflow.co.uk>

> Wednesday, September 9, 2009, 2:43:49 PM, ABClf wrote:
>
> > Just a word to say that what would be nice too (from my own point
> > of view) is to get a KWIC format; then one can use TextExtract to
> > produce lexical analysis inside PmWiki.
>
> I am not familiar with KWIC.
> Is that the same as adding a word-count boundary around each found
> term? Like "show term within 25 words before and after".
> The number of words could be specified by the option parameter.
>
> And should such boundaries go beyond paragraphs, or stop at
> paragraph divisions? I think it would look better if they stop,
> otherwise a result may show just a few words from the preceding
> paragraph, which may make no sense at all. If we specify unit=para
> and say words=20 we would automatically stay within paragraphs,
> and just restrict the output a bit more.
>
> And what about sentence boundaries? It would be extremely hard to
> determine any of those. I don't think I'd like to go that way.
>
>
>  ~Hans
>
>


-- 
---------------------------------------
| A | de la langue française
| B | http://www.languefrancaise.net/
| C | languefrancaise at gmail.com
---------------------------------------