Overcoming Spam on WordPress

By John Garner on Sunday, September 12, 2021
Summary: If you often find yourself trying to figure out how to stop the constant ebb and flow of spam that comes with allowing comments on your site, you may find the following helpful.

Anyone who manages a WordPress site and allows people to leave comments probably also faces the constant issue of sorting through comments to weed out the spam. I'm pretty sure most people have a system that attempts to combat this in various ways.

There is no silver bullet. If a rogue company uses actual people to post spam to your site, it is unlikely you can stop this on the face value of the comment posted. Preventing people from leaving a URL, either to their site or in the comment, will stop some types of comments.

But even with that, it will probably not catch and stop inappropriate comments or plain trolling.
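The URL restriction mentioned above can be sketched in a few lines. This is only an illustration in Python, not WordPress code: the function names and the deliberately simple pattern are assumptions, and a real site would apply the same idea through its comment settings or a plugin.

```python
import re

# Deliberately simple pattern (an assumption for illustration): it matches
# http(s):// links and bare "www." hosts, not every possible URL form.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def contains_url(text: str) -> bool:
    """Return True if the text appears to contain a link."""
    return URL_PATTERN.search(text) is not None


def should_hold_for_moderation(comment_body: str, author_website: str = "") -> bool:
    """Hold a comment when the body contains a link or a website was supplied."""
    return bool(author_website.strip()) or contains_url(comment_body)
```

A check like this only looks for links, so a link-free comment written by a person, spam or trolling included, passes straight through.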

These different types of comments and the associated tactics to stop them address different issues too:

  • Having spammy links on your website can hurt your search engine rankings.
  • Having inappropriate or hateful comments and trolling can pollute your site and create a poor impression for visitors, but know that deleting comments you simply don't agree with is the wrong way to go. Note: If comments push unwanted information or opinions but remain respectful, deleting them is akin to censorship. You can ask commenters to provide proof or relent, given that disinformation is rife and its modus operandi is to promote untruthful points of view without substantiating the arguments.
  • Lots of spam also means that, in most cases, it is going to be logged or inserted into the database that WordPress runs on. That is pollution, and in extreme cases it can slow your website down.
  • Some extreme anti-spam solutions prevent real people from posting comments.
  • Middle-of-the-road anti-spam solutions make it tedious for people to leave comments. Think CAPTCHA or reCAPTCHA, with a message telling you that you didn't correctly recognise and input the distorted characters. Or you are on your 3rd screen of reCAPTCHA images where you need to identify all the images with motorbikes, and you think: "Forget it, I can't be bothered to send them a message any more."

I've tried so many solutions over the years and they include:

  1. Solutions on the website that try to stop the spam using techniques that prevent automation (the vast majority of spam is automated because it takes the spammers less time and so costs less money).
  2. Solutions that use more advanced techniques than the previous category and stretch into reCAPTCHA territory. They frequently block actual people, for straightforward and easily explained reasons, but you'll probably end up losing comments.
  3. Solutions that have a minimal install on the site and offload the processing of comments, triaging them to approve, flag as spam, delete, etc.
  4. Solutions that sit as a net around your site and identify the IP addresses of known spammers to stop them from even getting to the site. The same technique is recommended in security principles to stop known bad apples before they can even try to infiltrate your site, and it is a principle behind a lot of firewall products (e.g. iptables).
  5. Types of Machine Learning solutions that learn to identify tactics and other characteristics of spammers' approaches and will actually adapt (but require input to do so). Some marketers will call this Artificial Intelligence or AI, the new buzzword in town, when in reality we couldn't afford it if it actually was artificial intelligence, but it sounds cool.
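To make category 4 a little more concrete, here is a minimal sketch of an IP blocklist check in Python. The networks listed are reserved documentation ranges standing in for a real feed of known spam sources, and the function name is hypothetical; products that work this way apply the check before a request ever reaches the site.

```python
import ipaddress

# Hypothetical blocklist built from reserved documentation ranges (RFC 5737),
# standing in for a real, regularly updated list of known spam sources.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are rejected rather than trusted.
        return True
    return any(address in network for network in BLOCKED_NETWORKS)
```

The weakness is the one that applies to any IP-based approach: a legitimate visitor who is later assigned a listed address gets blocked along with the spammer.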

I have used Akismet on many occasions and even paid for it on commercial sites, but it has not really grown much in anti-spam functionality over the years. Maybe this is too harsh a comment, but given all the other approaches mentioned, Akismet hasn't really adopted alternative ways to stop spam as far as I can tell. I would put it in category 3 above. At the moment, it is not the first solution I would recommend.

I used a pretty controversial (see linked article) plugin called WPSpamShield for a few years. It was more of a category 2 plugin, with some pretty caustic people behind it. Context: I work in a global consultancy company and frequently have to download big files, so I use a product called Internet Download Manager (IDM). To debug a situation, I was asked to download a file from Red Sand Marketing, the company behind the plugin. Their security systems switched the file from a small one of a few kilobytes to a humongous one. When I contacted them to ask what was going on, some guy told me, copying their 'lawyer' I guess, that I had tried to hack their site and that if I continued they would take me to court, all because I (well, IDM) had requested their file too many times. It was shocking. The guy stopped any further discussion, saying they had blacklisted me. I get the impression they went out of business though (should I say "shocking" too?).

I currently use a solution that falls into category 1 from the list above. At some point it may stop being as good at preventing spam, but at the moment it stops fully automated spam dead in its tracks. At the beginning, one or two comments that looked normal were caught, and the plugin author and I discussed them. The first one was clearly of inferior quality, but it also came from a blacklisted IP address. Trying to follow up with the author of the comment led nowhere, as if they weren't interested in talking 😉 The plugin author is probably correct in thinking it was actually a spammer trying to figure out why his script to automate spam wouldn't work on this site. But I think that now and again a person will happen to get an IP address that a spammer used, and their comment will incorrectly be considered spam because of the way it was submitted plus the blacklisted IP.

WP Armour Honeypot AntiSpam

Other than having purchased a lifetime deal of the WP Armour plugin, I have no relationship with the company. I currently recommend it to anybody who asks me about anti-spam solutions, and there is even a free version if you'd like to try it out. You can test it yourself via a simulation to check how it works and what people will see. The message that people see is not great, nor is it very self-explanatory. Granted, in 99% of cases only automated scripts and not people will see it, but it would be good to direct people to a means of alerting you if they have an issue. Bots are unlikely to complain.
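The plugin's name points to the honeypot technique. The sketch below shows the classic, generic variant in Python: the form carries a field that is hidden from people, so only a script filling in every field will populate it. The field name, the message wording, and the function names are all hypothetical, and none of this is taken from WP Armour's code. The rejection message includes the kind of contact hint suggested above.

```python
from typing import Mapping

# Hypothetical name of a form field that is hidden from human visitors.
HONEYPOT_FIELD = "website_confirm"

# Hypothetical wording; the point is to give a wrongly blocked person a way out.
CONTACT_HINT = "If you are a real person, please contact the site owner."


def is_probably_bot(form_data: Mapping[str, str]) -> bool:
    """Flag a submission when the hidden honeypot field was filled in."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


def handle_submission(form_data: Mapping[str, str]) -> str:
    """Return 'accepted', or a rejection message that includes a contact hint."""
    if is_probably_bot(form_data):
        return "Your comment could not be posted. " + CONTACT_HINT
    return "accepted"
```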

It would be nice if it had its own branded site and domain, and I would like to see some additional functions, like being able to approve a comment so that it goes back into the comment workflow. Some issues I see are more specific to my current mindset of tweaking website performance: I would like to be able to log comments in an external DB rather than in the same database WP uses. Just in case, I'd like to be able to check them without hitting the WP DB. I made the same request to the authors of WPActivityLog, asking for a simple external solution that logs could be sent to. They do have a place they send a copy to, but I wanted to avoid logs being added to the local DB at all. I know it complicates things a bit, but it would be great to see alternatives for these systems that write a lot of data to DBs: a way to avoid adding to the main WP DB.
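As a rough illustration of the external log idea, the sketch below writes blocked submissions to a separate SQLite file using Python's standard library. Neither WP Armour nor WPActivityLog is known to work this way; the table layout and function names are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def open_spam_log(path: str) -> sqlite3.Connection:
    """Open (or create) a standalone log database, separate from the site's own."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam_log ("
        "id INTEGER PRIMARY KEY, logged_at TEXT NOT NULL, "
        "ip TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def log_spam(conn: sqlite3.Connection, ip: str, body: str) -> int:
    """Store one blocked submission and return the id of the new row."""
    cursor = conn.execute(
        "INSERT INTO spam_log (logged_at, ip, body) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, body),
    )
    conn.commit()
    return cursor.lastrowid
```

Because the log lives in its own file, reviewing or purging it never touches the tables the site itself depends on.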

With WP Armour (you can see a screenshot of the log page below), I can set it to record none of what is submitted and considered spam. For months now, I have been reviewing multiple pages of 50 spam entries like the ones below, week after week, and have not seen one that was not spam. I am close to switching the logging off. If one person gets (I guess you could say) misdiagnosed, is it worth all the hassle of reviewing all these entries and having them all created in the DB just so that I can check them?

WP Armour HoneyPot AntiSpam spam

I use Cloudflare on a lot of my sites, which can provide the protection described in category 4 above. It is more of a security solution, but it does provide a certain type of protection against computers using IP addresses known to be used for nefarious deeds 😉 There is an option to prevent what is called "data scraping" too. I tried it and it blocked a tool I used to check a site for errors etc. I guess that means it works, but at that point I couldn't find a way to whitelist the tool.


