From Chatbots to Reducing Society's Technical Debt

Summary: I discuss my experience with chatbots, contrasting older rules-based systems with newer GenAI (Generative AI) chatbots. We cannot dismiss the creative capabilities of GenAI-based chatbots, but these systems lack reliability, especially in customer-facing applications. Improvements in the way AI is structured could lead to a "software renaissance," potentially reducing society's technical debt.

The Chatbot years

When I began testing rules-based chatbots a few years ago, the approach was effective for me: the chatbots yielded the expected responses.

Unless the user's question aligned with a question-response pair I had established, however, the end-user's experience was average. During my tests, a good and appropriate response either happened infrequently or required a monumental amount of work. Meeting the variety of expectations and needs across these scenarios was very time-consuming.
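To make that limitation concrete, here is a minimal Python sketch of the kind of question-response matching described above. The rules, the fallback message and the function names are all invented for illustration; they are not taken from any particular chatbot product.

```python
import re

# Hypothetical question/response pairs; a real bot would need many more.
RULES = {
    "what are your opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
}
FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"


def normalise(text: str) -> str:
    """Lowercase the text, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def reply(question: str) -> str:
    """Return the stored response for a known question, else the fallback."""
    return RULES.get(normalise(question), FALLBACK)


if __name__ == "__main__":
    print(reply("What are your opening hours?"))  # matches a stored pair
    print(reply("When do you open?"))             # same intent, no match
```

The second question means the same thing as the first but falls through to the fallback, which is why covering real users' phrasing takes so many hand-written pairs.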

Later chatbot versions could use variables and branches to guide (or coerce) users along a specific path of fully formed questions towards the desired outcome. This reduced the complexity considerably, but forcing users to follow a set path was not a great experience or outcome for them.

I therefore gave up on chatbots, deciding the experience was still not great.

AI based Chatbots

Given how enthusiastic I am about AI, frequently writing articles and giving presentations about it, you might assume GenAI-based chatbots are now my every-other-day hobby. And with all the hype and companies investing in GenAI chatbots, this must be the solution I had been looking for. Well, actually, no. And I'm surprised so many companies are investing in customer-facing chatbots. Here's why:

Even though GenAI is capable of generating some amazing, diverse and often very creative responses in a chatbot scenario, I have not experienced reliability.

Reliable, predictable, and repeatable glory. I have not experienced that.

You can have home run after home run using GenAI-based chatbots (the statistical probability depends on the context, of course). But there will always be hallucinations, brain-fart moments and erroneous results. Depending on their complexity, the context and questions may even throw GenAI way off course and produce wrong, unexpected or even very bad results.

Is it that important, you may ask? Well, when I can't predict what a system is going to say, do, or not do, I would not consider it a viable solution for a company. Maybe for amusement: "you never know what you will get", "surprise!".

But for customer service, where people are seeking help or the resolution of an issue they have? I can't imagine how unpredictable, unreliable and unrepeatable output is going to be appropriate.

With complicated questions, or what are called adversarial questions, even ChatGPT, the best of the best, in its fine-tuned RLHF version, can't reach 60% accuracy on the TruthfulQA set of questions.

Accuracy of various LLMs on adversarial questions (TruthfulQA mc1)

Ref: https://synthedia.substack.com/p/gpt-4-is-better-than-gpt-35-here
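For readers unfamiliar with the mc1 score: each TruthfulQA question comes with several answer choices, only one of which is correct, and the model is counted as right when the choice it scores highest is that correct one. Below is a minimal sketch of the calculation, using invented scores and a hypothetical `mc1_accuracy` helper. It is not the benchmark's official code.

```python
def mc1_accuracy(items):
    """Fraction of questions where the top-scored choice is the correct one.

    items: list of (scores, correct_index) pairs, where scores holds one
    model score (for example a log-probability) per answer choice.
    """
    correct = 0
    for scores, correct_index in items:
        # The model's "answer" is the choice it scores highest.
        picked = max(range(len(scores)), key=lambda i: scores[i])
        if picked == correct_index:
            correct += 1
    return correct / len(items)


# Invented scores for three questions, purely for illustration.
sample = [
    ([-1.2, -0.4, -2.0], 1),  # top choice is index 1: correct
    ([-0.3, -0.9, -1.5], 2),  # top choice is index 0: wrong
    ([-2.1, -1.8, -0.7], 2),  # top choice is index 2: correct
]

if __name__ == "__main__":
    print(f"mc1 accuracy: {mc1_accuracy(sample):.2f}")
```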

These tests draw on certain functions that our brains may struggle with at first but that are achievable through reasoning. Those same functions are not currently available in the LLMs used by GenAI. Even the best LLMs, like the versions optimised with Reinforcement Learning from Human Feedback (RLHF), struggle with these tests.

Some people believe that GenAI's inherent creative streak, which people have grown to love, is the reason behind this. I suspect, however, that it has more to do with GenAI missing key functions when compared with the human brain and how it works.

Wouldn't a human brain do better?

I spoke about this in my recent presentation, "The future of AI in Enterprise", at HumanMade's AI event. It relates to what Professor Dietterich talks about in this video, and to Mahowald and Ivanova's paper (ref: https://arxiv.org/abs/2301.06627), which I refer to and explain using the table below from their paper:

Dissociating language and thought in large language models: a cognitive perspective

This comes down to, as Professor Dietterich explains, the functions GenAI is missing compared with (our current understanding of) how the human brain works: specifically, what we have that LLMs lack, as far as we know (given that OpenAI is no longer 'open' about the way GPT-4 is structured).

What's wrong with LLMs and what we should be building instead


I've tested Chatbase and the AI Engine WordPress plugin (with embeddings / Pinecone). Both give great results, but the current GPT-4 model is simply not 100% reliable, reasonable or capable of repeatable results.

These two systems work well. The LLMs they connect to, however, just don't give reliable results. The English-configured version provided better results than the French version I set up with both Chatbase and the AI Engine.

To start with, the French version using GPT-3.5 was constantly making up product or review page URLs, even though the setup forced it to consider only URLs from a specific set. This also happened, although less often, once I gained GPT-4 API access.
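One partial guard against invented URLs is to check every link in a generated answer against the approved set before showing it to the user. The sketch below is a hedged illustration of that idea, not a description of how Chatbase or the AI Engine work. The example.com URLs and the `find_unapproved_urls` helper are hypothetical.

```python
import re

# Hypothetical allowlist; in practice this would be the set of real
# product and review pages the chatbot is permitted to cite.
ALLOWED_URLS = {
    "https://example.com/products/widget",
    "https://example.com/reviews/widget",
}

# Deliberately simple pattern: a URL runs until the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def find_unapproved_urls(answer: str) -> list:
    """Return URLs in the answer that are not on the allowlist."""
    # Strip punctuation that often trails a URL at the end of a sentence.
    found = [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(answer)]
    return [url for url in found if url not in ALLOWED_URLS]


if __name__ == "__main__":
    answer = (
        "See https://example.com/products/widget and also "
        "https://example.com/products/gadget."
    )
    print(find_unapproved_urls(answer))
```

A check like this catches made-up links after the fact, but it does not make the model itself any more reliable: the answer still has to be rejected, regenerated or reviewed by a person.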

Chatbots may not be a practical solution for customer-facing interactions until we reduce output errors and improve reliability. While using these systems as an assistant-based solution in a company makes sense, employees must still double-check any output from such tools. It's crucial for employees who use them to understand 'what good looks like', the correct answer, and to be able to spot issues right away.

The software renaissance; reducing society's technical debt

An article from Sequoia describes where we are now as the first act of GenAI, as if we were part of a Shakespearean play, and suggests that act 2 will be more interesting. Even so, I feel that what the team over at SKVentures points to, a renaissance of the software world, is far more interesting. I see what Noel Tock is doing, creating some amazing images, and, as I discussed with Christian Ulstrup, AI could help reduce society's technical debt by creating far cheaper software. Granted, Sequoia may make far less money (but the opposite could also be true).

As discussed here, in the context of the good of open source, SKVentures suggests that AI could have a really positive impact on the cost of software. Cheaper and far more innovative software would reduce society's technical debt, and the cost of software would finally follow the same downward trend that we have seen with CPUs and other important hardware needed for the digital transformation of society in general:

Next Collapsing Tech Cost is Software Itself

It would be great if society could benefit from a software renaissance, one which benefits all of us, rather than being burdened by the technical debt SKVentures talks about.

Article written by John Garner
