The Chatbot years
The rules-based chatbot approach was effective for me when I began testing them a few years ago, yielding the expected responses.
Unless they aligned with the question response pair I established, the end-user's experience was average. During my tests, getting a good and appropriate response happened infrequently or required a monumental amount of work. Meeting such a variety of expectations and needs through these scenarios was very time-consuming.
Later chatbot versions had the capability to use variables and branches to guide / coerce users towards a specific path through fully formed questions to achieve the desired outcome. It reduced the complexity considerably, but forcing the user to follow a set path wasn't a great user experience or outcome for the end user.
I therefore gave up on chatbots, deciding the experience was still not great.
AI based Chatbots
So with me being so enthusiastic about AI, writing articles, and doing presentations about it frequently, GenAI based chatbots must now be my every other day hobby. And with all the hype and companies investing in GenAI chatbots this must be the solution I'd be looking for. Well, actually, no. And I'm surprised so many companies are investing in chatbots that are customer facing. Here's why:
Even though GenAI is capable of generating some amazing, diverse and often very creative responses in a chatbot scenario, I have not experienced reliability.
Reliable, predictable, and repeatable glory. I have not experienced that.
You can have (the statistical probability of it depends on the context of course), home run after home run using GenAI based Chatbots. But there will always be hallucinations, brain fart, error result moments. Depending on the complexity of the context and questions it may even throw GenAI way off course and result in wrong, unexpected or even very bad results.
Is it that important you may say. Well when I can't predict what some system is going to say, to do, or not do, I would not consider that as a viable solution for a company. Maybe for amusement, "you never know what you will get", "surprise !".
But for customer service when people are frequently seeking help, or for resolution of an issue they have? I can't imagine how an unpredictable, unreliable and unrepeatable output is going to be appropriate.
With complicated questions, or what are called adversarial questions, ChatGPT, the best of the best, even the fine tuned LHRF version can't get to a 60% accuracy in the TruthfulQA set of questions.
These tests pull on certain functions that our brains may struggle with at first, but through reasoning are achievable, but these same functions we use are not currently available in LLMs used by GenAI. Even the best LLMs, like the optimised Reinforcement Learning from Human Feedback (RLHF) versions, struggle with these tests.
Some people believe that GenAI's inherent creative streak, which people have grown to love, is the reason behind this. I suspect however, this is more about GenAI missing key functions if we were comparing to the human brain and how it works.
Wouldn't a human brain do better?
I spoke about this in my recent The future of AI in Enterprise presentation at HumanMade's AI event, related to what professor Diettrich talks about in this video, one I refer to and explain from Mahowald and Ivanona's paper (ref: https://arxiv.org/abs/2301.06627) and the table below from their paper:
This falls down to, as professor Dietterich explains, how GenAI is missing functions from (our current understanding of) how the human brain works. And specifically what we have that LLMs are missing, as far as we know (given that OpenAI is no longer 'open' about the way GPT-4 is structured).
I've tested Chatbase and the AI Engine WordPress plugin (with embeddings / Pinecone). Both give great results, but the current GPT-4 model is simply not 100% reliable, reasonable or capable of repeatable results.
These 2 systems work well. The current LLM models they are connecting to, however, just don't give reliable results. The English configured version provided better results than a French version I set up with both Chatbase and the AI Engine.
To start with the French version using GPT-3.5 was constantly making up product or review page URLs when the setup was based on forcing it to only consider URLs from a specific set of URLs. But this also happened (although less) when I gained GPT-4 API access too.
Chatbots may not be a practical solution for customer-facing interactions until we reduce output errors and improve reliability. While using these systems as an assistant-based solution in a company makes sense, employees must still double-check any output from such tools. It's crucial for employees who use them to understand 'what good looks like', the correct answer, and to be able to spot issues right away.
The software renaissance; reducing society's technical debt
But like the article from Sequoia that summarises what they describe where we are as the first act of GenAI, as if we were part of a Shakespearean play, and that act 2 will be more interesting, I still feel that what the team over at SKVentures points to as a renaissance of the software world is far more interesting. When I see what Noel Tock is doing creating some amazing images and as I discussed with Christian Ulstrup, AI could help reduce society's technical debt by creating far cheaper software. And that all sounds far more interesting. Granted, Sequoia may make far less money (but the opposite could also be true).
As discussed here, talking about the good of open-source, SKVentures suggests that AI could have a really positive impact on the cost of software, cheaper and far more innovative software would reduce society's technical debt and finally follow the same downward trend that we have seen with CPUs, and other important hardware we need in the digital transformation of society in general:
It would be great if society can benefit from a software renaissance, one which benefits all of us, rather than being burdened by the technical debt SKVentures talks about.