[Pmwiki-users] Re: Spamming technique
Sun May 23 20:12:50 CDT 2004
Sounds good, Christian, but how difficult would it be to write a PHP
script (or cookbook recipe) to do the same? Not all of us are
comfortable with bash scripts, and not all of us have shell access.
I like the idea of only seeing which ones have been updated since the
last time you checked, though. Sounds like a real time saver.
----- Original Message -----
From: "Christian Ridderström" <chr at home.se>
To: <pmwiki-users at pmichaud.com>
Sent: Sunday, May 23, 2004 2:18 PM
Subject: [Pmwiki-users] Re: Spamming technique
> On Sun, 23 May 2004, Crisses wrote:
> > Hey, has anyone tried to run a grep on their site in the wiki.d
> > to see all http:// requests in their pages? (Maybe I'll do that
> > to eyeball what comes up.) An initial "approved" file would be
> > easy to make from there.
> I just ran the following command (in bash):
> grep ^text= wiki.d/* | tr ? \\n | tr " " \\n | tr ']' \\n | \
> grep -i http: | sed -e " s/.*\(http.*\)/\\1/" | sort | \
> uniq > URIs.lst
> and it extracts a unique list of URIs starting with 'http:'. That
> was a rather long list (more than 400 URIs). Realizing that I will use
> this command again, I ended up putting it in a script, 'find-URIs.sh',
> that you can find here:
> In order for you to use this script, you have to modify the line
> dir0=~lyx/www/pmwiki # pmwiki/-directory
> so that $dir0 points to your wiki directory. Then you can execute the
> script through:
> ./find-URIs.sh +n > URIs.lst
> which puts the result in a file called 'URIs.lst'.
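
For readers who do not have the script, the extraction step described above can be sketched roughly as follows. This is not the contents of 'find-URIs.sh'; it is a shorter variant that assumes a grep supporting the -o option (GNU and BSD grep do). The function name extract_uris and the WIKI_DIR variable are made up for illustration. Unlike the quoted pipeline, it does not split on the page files' line-separator character, so a URI at the very end of a stored line may carry a trailing character.

```shell
#!/bin/sh
# Sketch: list the unique http: URIs found in PmWiki page files.
# extract_uris and WIKI_DIR are hypothetical names, not part of PmWiki.
WIKI_DIR="${WIKI_DIR:-wiki.d}"

extract_uris() {
    # Keep only the text= lines of the given page files, then print every
    # run of characters that starts with "http:" and stops at whitespace,
    # a closing bracket, a double quote, an angle bracket, or a pipe.
    grep -h '^text=' "$@" | grep -io 'http:[^] "<>|]*' | sort -u
}

# Run against the wiki directory only if it exists.
if [ -d "$WIKI_DIR" ]; then
    extract_uris "$WIKI_DIR"/* > URIs.lst
fi
```

Set WIKI_DIR to your own wiki.d directory before running; the result lands in 'URIs.lst', as in the quoted command.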
> Since I get so many URIs, I've put some valid URIs in a file,
> 'valid-URIs.lst', that I use to filter the result as follows:
> cat URIs.lst | grep -v -f valid-URIs.lst
> I still end up with about 200 links that I manually check (basically I
> just have to glance at them to see that they look reasonable).
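
One thing to watch with the filter above: grep -f reads each line of 'valid-URIs.lst' as a regular expression, so a dot in a host name matches any character. A minimal sketch of the same filter with literal matching is below; the function name filter_unapproved is made up for illustration. Note that an empty line in the approved file matches every URI and would hide the whole list.

```shell
#!/bin/sh
# Sketch: print the URIs from a list that match no entry in an approved list.
# filter_unapproved is a hypothetical helper name.
filter_unapproved() {
    # $1: file of extracted URIs
    # $2: file of approved entries, one per line
    # -F treats each approved entry as a literal substring, not a regex,
    # so "example.com" does not also match "exampleXcom".
    grep -v -F -f "$2" "$1"
}

# Example call:
#   filter_unapproved URIs.lst valid-URIs.lst
```

Because matching is by substring, an entry such as a bare host name approves every URI on that host.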
> What I've done now is to check 'URIs.lst' into my version control
> system, so that the next time I run the command to check for URIs, I
> simply see which URIs are new.
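
For anyone without a version control system, the "only show what is new" idea can be approximated by keeping the previous list as a plain file and comparing it with the current one. This is a swapped-in technique using comm, not the version control workflow described above; the function name new_uris and the file name 'URIs.old' are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: print URIs that appear in the current list but not in the previous one.
# new_uris is a hypothetical helper name.
new_uris() {
    # $1: previous URI list, $2: current URI list
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm needs both inputs sorted in the same collation order,
    # so sort both under the C locale first.
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # -13 suppresses lines unique to the old list and lines common to both,
    # leaving only lines unique to the current list.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Example call, assuming the last checked list was saved as URIs.old:
#   new_uris URIs.old URIs.lst
```

After reviewing the output, copy 'URIs.lst' over 'URIs.old' so the next run starts from the list you have already checked.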
> Oh, and I did find that another WikiSandbox-page had some bad links in
> Christian Ridderström