[pmwiki-users] Honeypots for Spam (was Spam status and suggestion for PmWiki.org)

Tue Oct 10 16:07:28 CDT 2006

> From: "Patrick R. Michaud" <pmichaud at pobox.com>
> On Tue, Oct 10, 2006 at 11:00:04AM -0400, Neil Herber wrote:
> >    I don't think that a "honeypot" (aka "spam trap" on mail servers) is a
> >    winning proposition for a wiki (see reasons below).
> > 
> >    I would not want to automate adding items posted on the trap page to the
> >    blocklist for a few reasons:
> >    1) a malicious individual would simply post a series of valid URLs,
> >    poisoning the blocklist.
> >    2) a lot of spam includes valid URLs
> 
> _If_ we were to implement a honeypot on pmwiki.org, then we wouldn't
> block approved urls, and any honeypot-based blocks would go to a
> separate Blocklist-Honeypot page to make it easy to distinguish
> the automatic items from the manual ones.
> 
> >    3) blocking the posting IP is, in my experience, the least effective
> >    blocking method and prone to overkill if it is a proxy address
> 
> Agreed, partially.  
> 
> While blocking an IP doesn't provide much in the way of long-term 
> protection against wikispam, it can limit the amount of damage a 
> single spambot does to a site.  From what I've seen, some spambots
> will hit multiple pages on a wiki from the same IP address -- so
> once the spambot hits a "honeypot" page, then it would at least
> be prevented from defacing other pages on the site in the same
> "session".  That would be a plus.
> 
> And if the IP addresses are maintained in a separate 
> Site.Blocklist-Honeypot page, then there's no real problem with 
> clearing that page from time to time to "restore" the IP addresses 
> to the active pool.
> 

A few more thoughts.  

Honeypots are often used as tools to gather information about sources of
attack.  Making use of that information to provide some realtime
response and protection to limit the scope of an attack seems like a
nice plus.  Either way, honeypots can be helpful.

If, as Pm suggests, a separate Site.Blocklist-Honeypot page were used to
halt an automated spam attack while it is still in progress, then it
might make sense to compromise on the IP blocking issue by recording the
full IP address in each Blocklist-Honeypot entry, rather than an address
range.  That addresses the prior concern about blocking whole ranges
assigned by cable providers, while still allowing an automated spam
attack to be blocked in progress.
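
To make the idea concrete, here is a rough sketch in Python (not PmWiki
code) of how full-address blocking from a honeypot page might behave.
The class name, the method names, and the "Main.SpamTrap" page name are
all made up for illustration, and the set of addresses is only a
stand-in for a Site.Blocklist-Honeypot page.

```python
class HoneypotBlocklist:
    """In-memory stand-in for a Site.Blocklist-Honeypot page (hypothetical)."""

    def __init__(self, honeypot_pages):
        # Names of the trap pages; any post to one of these triggers a block.
        self.honeypot_pages = set(honeypot_pages)
        # Full IP addresses only, never prefixes or ranges.
        self.blocked_ips = set()

    def record_edit(self, page, ip):
        """Return True if the edit may proceed, False if it is rejected."""
        if ip in self.blocked_ips:
            # Exact match on the whole address: neighbours in the same
            # provider range are not affected.
            return False
        if page in self.honeypot_pages:
            # The poster touched the trap, so block it for the rest of the run.
            self.blocked_ips.add(ip)
            return False
        return True

    def clear(self):
        """Empty the list, restoring the addresses to the active pool."""
        self.blocked_ips.clear()


blocklist = HoneypotBlocklist({"Main.SpamTrap"})
blocklist.record_edit("Main.SpamTrap", "203.0.113.7")   # trap hit, now blocked
blocklist.record_edit("Main.HomePage", "203.0.113.7")   # rejected
blocklist.record_edit("Main.HomePage", "203.0.113.8")   # neighbour still allowed
```

Because only exact addresses are stored, clearing the list from time to
time costs little: the worst case is that a bot gets one more shot at a
trap page before being blocked again.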

If we were to start using honeypots to deal with automated attacks, then
it would make sense to gather additional information, such as domains and
keywords, even if those are not automatically placed into a
blocklist.  By gathering that additional information in one place, we
could then take a look at what has been captured and make a judgment
about adding some word blocking (for example, after observing that the
Halloween and Prom Dresses attacks each left spam links pointing back
to the same domain, blogshots.nl).
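
As a sketch of the kind of review this would allow, the Python below
pulls link domains out of captured spam text and reports any domain that
shows up in more than one attack.  The function names, the URL pattern,
and the sample text and URL paths are assumptions made up for the
example; only the blogshots.nl domain and the two attack names come from
what was observed.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Deliberately simple pattern for http/https links in captured text.
URL_RE = re.compile(r"https?://[^\s\"'<>\]]+")


def domains_in(text):
    """Return the set of link domains found in text, without a leading 'www.'."""
    domains = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


def shared_domains(attacks):
    """Map each domain seen in two or more attacks to the attacks using it.

    attacks is a dict of attack name -> captured spam text.
    """
    seen = defaultdict(set)
    for name, text in attacks.items():
        for domain in domains_in(text):
            seen[domain].add(name)
    return {d: sorted(names) for d, names in seen.items() if len(names) > 1}


# Invented sample captures, for illustration only.
captured = {
    "Halloween": "costumes http://www.blogshots.nl/halloween http://example.com/a",
    "Prom Dresses": "dresses http://blogshots.nl/prom",
}
print(shared_domains(captured))
```

Nothing here adds anything to a blocklist automatically; it only puts the
overlap in front of a person who can then decide whether a domain or
keyword is worth blocking.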

FWIW, in my view, while all spam is bad, the worst of the worst are the
spam attacks that overwrite existing content on multiple pages within a
short period of time.  In an environment such as PmWiki.org, where
different people chip in to clean up these attacks, we end up missing
an opportunity to learn from them: each of us just restores the pages,
blocks an IP address (hopefully), and moves on, without knowing
anything about prior attacks that involved the same patterns.

Pico