Overcoming Spam on WordPress

By John Garner on Sunday, September 12, 2021
Summary: If you allow comments on your site and are often trying to figure out how to stop the constant ebb and flow of spam, you may find the below helpful.

Anyone who manages a WordPress site and allows people to leave comments probably also faces the constant issue of sorting through comments to weed out the spam. I'm pretty sure most people have a system that attempts to combat this in various ways.

There is no silver bullet. If a rogue company uses actual people to post spam to your site, it is unlikely you can stop this on the face value of the comment posted. Preventing people from leaving a URL, either to their site or in the comment, will stop some types of comments.

But even with that, it will probably not catch and stop inappropriate comments or plain trolling.
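As a rough illustration of the "no URLs" idea, here is a minimal Python sketch. It is not WordPress code and not how any particular plugin works; the pattern and the function names are my own assumptions, and a real site would hook something similar into its comment handling.

```python
import re

# Hypothetical pattern: matches http(s):// links and bare "www." hosts.
# It is deliberately simple and will miss obfuscated links.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def contains_url(text: str) -> bool:
    """Return True if the text appears to contain a link."""
    return URL_PATTERN.search(text) is not None


def should_hold_comment(author_url: str, body: str) -> bool:
    """Hold a comment for moderation if the commenter supplied a
    website or if the comment body contains a link."""
    return bool(author_url.strip()) or contains_url(body)
```

For example, `should_hold_comment("", "Nice article")` lets the comment through, while any comment with a link in the body or a website filled in is held back. As noted, this does nothing against trolling or link-free spam.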

These different types of comments and the associated tactics to stop them address different issues too:

  • Having spammy links on your website can hurt your search engine rankings
  • Having inappropriate, hateful or trolling comments can pollute your site and create a poor impression for visitors, but know that deleting comments you don't agree with is the wrong way to go. Note: If comments push unwanted information or opinions but remain respectful, deleting them is akin to censorship. You can ask that commenters provide proof or relent, given that disinformation is rife and the modus operandi is to promote untruthful points of view without substantiating their arguments.
  • Lots of spam also means that, in most cases, it is going to be logged / inserted into the database that WordPress runs on. That is pollution in itself and, in extreme cases, can slow your website down.
  • Some extreme anti-spam solutions actually prevent actual people from posting comments
  • Middle-of-the-road anti-spam solutions make it tedious for people to leave comments. Think CAPTCHA or reCAPTCHA with a message telling you that you didn't recognise and input the correct distorted characters. Or you are on your 3rd screen of images from reCAPTCHA, where you need to identify all the images with motorbikes, and you think "forget it, I can't be bothered to send them a message any more".

I've tried so many solutions over the years and they include:

  1. Solutions on the website that try to stop the spam using techniques that prevent automation (the vast majority of spam is automated because it takes the spammers less time, so costs less money).
  2. Solutions that use more advanced techniques than the previous category and stretch into reCAPTCHA territory. They frequently block actual people for straightforward, easily explained reasons, so you'll probably end up losing comments.
  3. Solutions that have a minimal install on the site and offload processing of the comments through triage to approve, flag as spam or delete etc.
  4. Solutions that sit as a net around your site and identify the IP addresses of known spammers to stop them from even getting to the site. This same technique is recommended in security principles to stop known bad apples before they can even try to infiltrate your site and is a principle of a lot of firewall products (e.g. iptables).
  5. Types of Machine Learning solutions that learn to identify tactics and other characteristics of spammers' approaches and will actually adapt (but require input to do so). Some marketers will call this Artificial Intelligence or AI, the new buzzword in town, when in reality we couldn't afford it if it actually was artificial intelligence, but it sounds cool.
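To make category 1 a little more concrete, here is a minimal Python sketch of two common anti-automation tricks: a hidden "honeypot" form field that people never see (so only scripts fill it in) and a minimum time between showing the form and submitting it. The field name, the 3 second threshold and the function name are all assumptions for illustration; this is not the code of any plugin mentioned in this post.

```python
# Hypothetical name of a form field hidden from people with CSS.
HONEYPOT_FIELD = "website_confirm"

# Assumed threshold: people rarely submit a comment form this quickly.
MIN_SECONDS_TO_SUBMIT = 3.0


def looks_automated(form: dict, rendered_at: float, submitted_at: float) -> bool:
    """Return True if a submission shows signs of coming from a script.

    form         -- submitted field names mapped to their string values
    rendered_at  -- timestamp (seconds) when the form was shown
    submitted_at -- timestamp (seconds) when the form was posted
    """
    # A person cannot see the honeypot field, so any value is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Scripts tend to post the form almost instantly.
    return (submitted_at - rendered_at) < MIN_SECONDS_TO_SUBMIT
```

The appeal of this approach is that real visitors never have to solve anything. The weakness is that a spammer who studies the form can adapt the script, which is why such a solution may stop being effective over time.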

I have used Akismet on many occasions and even paid for it on commercial sites, but it has not really grown much in anti-spam functionality over the years. Maybe this is too harsh a comment, but with all the other approaches mentioned, Akismet hasn't really adopted alternative ways to stop spam as far as I can tell. I would put this in category 3 above. At the moment, it is not the first solution I would recommend.

I used a pretty controversial (see linked article) plugin called WPSpamShield for a few years. It was more of a category 2 plugin, with some pretty caustic people behind it. Context: I work in a global consultancy company and have to download big files frequently, so I use a product called Internet Download Manager (IDM). To debug a situation, I was asked to download a file from Red Sand Marketing, the company behind this plugin, and their security systems switched the download from a small file of a few kilobytes to a humongous one. When I contacted them to ask what was going on, some guy told me, copying their 'lawyer' I guess, that I had tried to hack their site and that if I continued they would take me to court. All because I (well, IDM) had requested their file too many times. It was shocking. The guy stopped any further discussion, saying they had blacklisted me. I get the impression they went out of business though (should I say "shocking" too?).

I currently use a solution that falls into category 1 from the list above. At some point it may stop being as good at preventing spam, but at the moment it stops the fully automated spam dead in its tracks. At the beginning, one or two comments that looked normal were caught, and the author and I discussed it. The first comment was clearly of inferior quality, but it was also from a blacklisted IP address. Trying to follow up with the author of the comment led nowhere, as if they weren't interested in talking 😉 The plugin author is probably correct in thinking it was actually a spammer trying to figure out why his script to automate spam wouldn't work on this site. But I think that now and again a person will happen to get an IP that a spammer used, and their comment will incorrectly be considered spam because of the way they submit it plus the blacklisted IP.

WP Armour Honeypot AntiSpam

Other than having purchased a lifetime deal for the WP Armour plugin, I have no relationship with the company. I currently recommend it to anybody that asks me about anti-spam solutions, and there is even a free version if you'd like to try it out. You can test it yourself via a simulation to check how it works and what people will see. The message that people see is not great, nor is it very self-explanatory. Granted, in 99% of cases only automated scripts and not people will see it, but it would be good to direct people to a means to alert you if they have an issue; bots are unlikely to complain.

It would be nice if it had its own branded site and domain, and I would like to see some additional functions, like being able to approve a comment so that it goes back into the comment workflow. Some issues I see are more specific to my current website performance tweaking mindset, such as being able to log comments in an external DB rather than the one WP uses. Just in case, I'd like to be able to check them without it hitting the WP DB. I made the same request to the authors of WPActivityLog: a simple external solution that logs could be sent to. They do have a place they send a copy to, but I wanted to avoid it being added to the local DB at all. I know it complicates things a bit, but it would be great to see alternatives for these systems that write a lot of data to DBs: a way to avoid adding to the main WP DB.
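To show what I mean by logging outside the main database, here is a minimal Python sketch that writes caught spam to its own SQLite file. This is only an illustration of the idea, not a feature of WP Armour or WPActivityLog; the table layout and function names are my own assumptions, and SQLite is simply a convenient stand-in for "some store that is not the WP DB".

```python
import sqlite3
from datetime import datetime, timezone


def open_spam_log(path: str) -> sqlite3.Connection:
    """Open (or create) a standalone log database, separate from the site DB."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam_log ("
        "id INTEGER PRIMARY KEY, logged_at TEXT, ip TEXT, reason TEXT, body TEXT)"
    )
    return conn


def log_spam(conn: sqlite3.Connection, ip: str, reason: str, body: str) -> None:
    """Record one blocked submission with a UTC timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO spam_log (logged_at, ip, reason, body) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), ip, reason, body),
        )


def count_spam(conn: sqlite3.Connection) -> int:
    """Return how many entries have been logged so far."""
    return conn.execute("SELECT COUNT(*) FROM spam_log").fetchone()[0]
```

With something like this, reviewing the caught comments means opening a separate file (for example `open_spam_log("spam_log.sqlite")`) and never touches the tables the site itself depends on.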

With WP Armour (you can see a screenshot of the log page below), I can set it to record none of what is submitted and considered as spam. For months now, I have been reviewing multiple pages of 50 spam entries like the below, week after week, and have not seen one that was not spam. I am close to switching the logging off. If one person gets (I guess you could say) misdiagnosed, is it worth all the hassle of reviewing all these entries, and having them all created in the DB, just so that I can check them?

WP Armour HoneyPot AntiSpam spam

I use Cloudflare on a lot of my sites, which can provide the protection described in category 4 above. It is more of a security solution, but it does provide a certain type of protection against computers using IP addresses known to be used for nefarious deeds 😉 There is an option to prevent what is called "data scraping" too. I tried it and it blocked a tool I used to check a site for errors etc. I guess that means it works, but I couldn't find a way to whitelist the tool at that point.
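For anyone curious what the category 4 check boils down to, here is a minimal Python sketch of testing a visitor's IP against a list of blocked networks. It is not how Cloudflare works internally, which I have no insight into. The addresses are reserved documentation ranges used as stand-ins, and the names are assumptions for illustration.

```python
import ipaddress

# Stand-in blocklist using reserved documentation ranges (RFC 5737),
# not real spammer addresses. A real list would come from a reputation feed.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Assumption: a malformed address is left for other checks to handle.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)
```

This also shows why a real person can occasionally be caught: the check only knows the address, so anyone who inherits an IP previously used by a spammer looks exactly the same to it.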

