From Chatbots to Reducing Society's Technical Debt

By    John Garner on  Saturday, September 23, 2023
Summary: I discuss my experience with chatbots, contrasting older rules-based systems with newer GenAI (General Artificial Intelligence) chatbots. We cannot dismiss the creative capabilities of GenAI-based chatbots, but these systems lack reliability, especially in customer-facing applications, and improvements in the way AI is structured could lead to a "software renaissance," potentially reducing society's technical debt.

The Chatbot years

The rules-based chatbot approach was effective for me when I began testing them a few years ago, yielding the expected responses.

Unless they aligned with the question response pair I established, the end-user's experience was average. During my tests, getting a good and appropriate response happened infrequently or required a monumental amount of work. Meeting such a variety of expectations and needs through these scenarios was very time-consuming.

Later chatbot versions had the capability to use variables and branches to guide / coerce users towards a specific path through fully formed questions to achieve the desired outcome. It reduced the complexity considerably, but forcing the user to follow a set path wasn't a great user experience or outcome for the end user.

I therefore gave up on chatbots, deciding the experience was still not great.

AI based Chatbots

So with me being so enthusiastic about AI, writing articles, and doing presentations about it frequently, GenAI based chatbots must now be my every other day hobby. And with all the hype and companies investing in GenAI chatbots this must be the solution I'd be looking for. Well, actually, no. And I'm surprised so many companies are investing in chatbots that are customer facing. Here's why:

Even though GenAI is capable of generating some amazing, diverse and often very creative responses in a chatbot scenario, I have not experienced reliability.

Reliable, predictable, and repeatable glory. I have not experienced that.

You can have (the statistical probability of it depends on the context of course), home run after home run using GenAI based Chatbots. But there will always be hallucinations, brain fart, error result moments. Depending on the complexity of the context and questions it may even throw GenAI way off course and result in wrong, unexpected or even very bad results.

Is it that important you may say. Well when I can't predict what some system is going to say, to do, or not do, I would not consider that as a viable solution for a company. Maybe for amusement, "you never know what you will get", "surprise !".

But for customer service when people are frequently seeking help, or for resolution of an issue they have? I can't imagine how an unpredictable, unreliable and unrepeatable output is going to be appropriate.

With complicated questions, or what are called adversarial questions, ChatGPT, the best of the best, even the fine tuned LHRF version can't get to a 60% accuracy in the TruthfulQA set of questions.

Accuracy of Various LLMs on adversarial questions (ThruthfulQA mc1)

Ref: https://synthedia.substack.com/p/gpt-4-is-better-than-gpt-35-here

These tests pull on certain functions that our brains may struggle with at first, but through reasoning are achievable, but these same functions we use are not currently available in LLMs used by GenAI. Even the best LLMs, like the optimised Reinforcement Learning from Human Feedback (RLHF) versions, struggle with these tests.

Some people believe that GenAI's inherent creative streak, which people have grown to love, is the reason behind this. I suspect however, this is more about GenAI missing key functions if we were comparing to the human brain and how it works.

Wouldn't a human brain do better?

I spoke about this in my recent The future of AI in Enterprise presentation at HumanMade's AI event, related to what professor Diettrich talks about in this video, one I refer to and explain from Mahowald and Ivanona's paper (ref: https://arxiv.org/abs/2301.06627) and the table below from their paper:

Dissociating language and thought in large language models: a cognitive perspective

This falls down to, as professor Dietterich explains, how GenAI is missing functions from (our current understanding of) how the human brain works. And specifically what we have that LLMs are missing, as far as we know (given that OpenAI is no longer 'open' about the way GPT-4 is structured).

What's wrong with LLMs and what we should be building instead
Issues with LLMs: what we should be building instead

I've tested Chatbase and the AI Engine WordPress plugin (with embeddings / Pinecone). Both give great results, but the current GPT-4 model is simply not 100% reliable, reasonable or capable of repeatable results.

These 2 systems work well. The current LLM models they are connecting to, however, just don't give reliable results. The English configured version provided better results than a French version I set up with both Chatbase and the AI Engine.

To start with the French version using GPT-3.5 was constantly making up product or review page URLs when the setup was based on forcing it to only consider URLs from a specific set of URLs. But this also happened (although less) when I gained GPT-4 API access too.

Chatbots may not be a practical solution for customer-facing interactions until we reduce output errors and improve reliability. While using these systems as an assistant-based solution in a company makes sense, employees must still double-check any output from such tools. It's crucial for employees who use them to understand 'what good looks like', the correct answer, and to be able to spot issues right away.

The software renaissance; reducing society's technical debt

But like the article from Sequoia that summarises what they describe where we are as the first act of GenAI, as if we were part of a Shakespearean play, and that act 2 will be more interesting, I still feel that what the team over at SKVentures points to as a renaissance of the software world is far more interesting. When I see what Noel Tock is doing creating some amazing images and as I discussed with Christian Ulstrup, AI could help reduce society's technical debt by creating far cheaper software. And that all sounds far more interesting. Granted, Sequoia may make far less money (but the opposite could also be true).

As discussed here, talking about the good of open-source, SKVentures suggests that AI could have a really positive impact on the cost of software, cheaper and far more innovative software would reduce society's technical debt and finally follow the same downward trend that we have seen with CPUs, and other important hardware we need in the digital transformation of society in general:

Next Collapsing Tech Cost is Software Itself

It would be great if society can benefit from a software renaissance, one which benefits all of us, rather than being burdened by the technical debt SKVentures talks about.

Article written by  John Garner

Leave a Reply

Your email address will not be published. Required fields are marked *

Recent Posts

Check out the most recent posts from the blog: 
Sunday, September 24, 2023
The reliability & accuracy of GenAI

I question the reliability and accuracy of Generative AI (GenAI) in enterprise scenarios, particularly when faced with adversarial questions, highlighting that current Large Language Models (LLMs) may be data-rich but lack in reasoning and causality. I would call for a more balanced approach to AI adoption in cases of assisting users, requiring supervision, and the need for better LLM models that can be trusted, learn, and reason.

Read More
Saturday, September 23, 2023
From Chatbots to Reducing Society's Technical Debt

I discuss my experience with chatbots, contrasting older rules-based systems with newer GenAI (General Artificial Intelligence) chatbots. We cannot dismiss the creative capabilities of GenAI-based chatbots, but these systems lack reliability, especially in customer-facing applications, and improvements in the way AI is structured could lead to a "software renaissance," potentially reducing society's technical debt.

Read More
Friday, June 16, 2023
The imbalance of power in the AI game: in search of the common good

The article discusses the contrasting debate on how AI safety is and should be managed, its impact on technical debt, and its societal implications.
It notes the Center for AI Safety's call for a worldwide focus on the risks of AI, and Meredith Whittaker's criticism that such warnings preserve the status quo, strengthening tech giants' dominance. The piece also highlights AI's potential to decrease societal and technical debt by making software production cheaper, simpler, and resulting in far more innovation. It provides examples of cost-effective open-source models that perform well and emphasizes the rapid pace of AI innovation. Last, the article emphasises the need for adaptive legislation to match the pace of AI innovation, empowering suitable government entities for oversight, defining appropriate scopes for legislation and regulation, addressing ethical issues and biases in AI, and promoting public engagement in AI regulatory decisions.

Read More
Thursday, June 1, 2023
Japan revises copyright laws for AI

Japan has made its ruling on the situation between Content creators and Businesses. Japanese companies that use AI have the freedom to use content for training purposes without the burden of copyright laws. This news about the copyright laws in Japan reported over at Technomancers is seen as Businesses: 1 / Content Creators: 0 The […]

Read More
crossmenuarrow-down