The reliability & accuracy of GenAI

When you're used to working with enterprise-grade solutions that require 99.999% accuracy, reliability and performance, looking at GenAI was baffling at first. Why are so many clever people, experts even, promoting this tool as a game changer even for scenarios where complete accuracy is required, and where inaccuracy could lead to major issues and even devastating effects? Getting bad press over a company's online support system is damaging enough. But basing decisions on a system that cannot get above a 60% pass rate on adversarial questions, even with RLHF-optimised LLMs, is worrying.

You get the impression that a flawed model, the current LLM architecture, was put on astonishingly powerful hardware, fed an overwhelming amount of data (which, from a statistical and probabilistic perspective, makes it prone to mistakes), trained to accept what good looks like and how to sound authoritative, and then given a visual interface and animation that is pleasing to us (for its main function).

I was reminded of this when I was writing about my experience with chatbots over the years.

In the first versions of chatbots, you had to script many scenarios by hand so that every potential question mapped to the answer it needed.
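
As an aside, here is roughly what that looked like under the hood. This is a minimal hypothetical sketch in Python (the keywords and canned replies are invented for illustration): a hand-written lookup from expected phrases to scripted answers, with a fallback for everything else.

```python
# Minimal sketch of a first-generation, rule-based chatbot:
# every scenario is scripted by hand, and anything outside
# the script falls through to a generic fallback.
SCRIPTED_SCENARIOS = {
    "hours": "We are open Monday to Friday, 9am to 5pm.",
    "password": "Use the 'Forgot password' link on the login page.",
    "refund": "Refunds are processed within 5 working days.",
}

def answer(question: str) -> str:
    q = question.lower()
    for keyword, reply in SCRIPTED_SCENARIOS.items():
        if keyword in q:
            return reply
    return "Sorry, I didn't understand. Could you rephrase?"

print(answer("How do I reset my password?"))
# -> Use the 'Forgot password' link on the login page.
```

The obvious cost of this design is coverage: every new scenario means another hand-written entry.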

The second generation of chatbots tried to steer users into their own question-and-answer format rather than actually addressing their queries.

Now we have GenAI-based chatbots that are great at assisting us with certain things, and that are capable of being very creative. But as Gary Marcus argues, the 'distribution shift' issue, and more specifically GenAI's failures on basic cognitive tests, demonstrate its lack of proper reasoning and of a causality-based thought process.
He takes the example where data with a clear relationship structure is presented to GenAI, and LLMs cannot understand (do a simple reversal and put in context) the relationship between son and mother. You have all you need to infer the relationship from either side of two data points. As Gary says, when "you know Tom is Mary Lee's son, but can't figure out without special prompting that Mary Lee therefore is Tom's mother, you have no business running all the world's software." Ref: Owain Evans, the reversal curse: "LLMs trained on 'A is B' fail to learn 'B is A'".
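
To see how trivial this inference is for a symbolic system, here is a small illustrative sketch in Python (the relation names and the single inverse rule are my own simplification, not code from Marcus or Evans): one explicit rule recovers "B is A" from "A is B".

```python
# A symbolic system derives the inverse relation from one rule,
# the very inference the "reversal curse" says LLMs fail at.
# Simplification: we assume the parent in question is the mother.
INVERSE = {"son_of": "mother_of", "mother_of": "son_of"}

facts = {("Tom", "son_of", "Mary Lee")}

def infer_inverses(facts):
    inferred = set(facts)
    for subj, rel, obj in facts:
        if rel in INVERSE:
            inferred.add((obj, INVERSE[rel], subj))
    return inferred

for fact in sorted(infer_inverses(facts)):
    print(fact)
# ('Mary Lee', 'mother_of', 'Tom') is derived automatically.
```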

So it seems we have an AI model, known as Generative AI and based on an LLM, that is flawed in its structure but that, given both extremely powerful, high-performing hardware and unimaginable amounts of data, can guess the right answer to, say, a complex exam question, yet will get both fairly simple and adversarial questions wrong because they require reasoning.

I've met people who fit in that same category: good at reciting and learning things off by heart, even at recognising patterns, but struggling with strategy, new scenarios or change in general. So it's no wonder GenAI gets so much hype and can pull off the feat of being compared to human capabilities. Even Gary should see why the comparison can fool people. We all know someone like ChatGPT 🙂

On a tangential topic, these LLMs also require that we forget they are built on stolen data. I wouldn't be surprised if every single LLM hides troves of stolen data that it uses to generate its sparks of good-to-amazing content, and, now and again, complete nonsense and made-up things, all while giving you the impression there is no possible way the answer you just got is anything but perfect.

As we discover what GenAI can do, it's important to understand its limits and adapt how we use such tools, and more importantly, when and where we use them. Guardrails, limits and restrictions are required, but so are clear guidelines and disclosure about the fact that it is not reliable or accurate, and that in specific cases it is even less reliable.
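
To make that concrete, here is a hypothetical sketch in Python of what such a guardrail and disclosure layer could look like. The topic list, the wording, and the `llm_answer` callable are all assumptions for illustration, not any real product's API.

```python
# Hypothetical guardrail wrapper: refuse high-stakes topics
# outright, and attach a reliability disclosure to everything else.
HIGH_STAKES_TOPICS = ("medical dosage", "legal advice", "wire transfer")

DISCLOSURE = ("Note: this answer was generated by an LLM and may be "
              "inaccurate. Verify before acting on it.")

def guarded_answer(question: str, llm_answer) -> str:
    q = question.lower()
    if any(topic in q for topic in HIGH_STAKES_TOPICS):
        return "This topic requires a human expert; I can't answer it."
    return f"{llm_answer(question)}\n\n{DISCLOSURE}"

# Usage, with a stand-in for the real model call:
print(guarded_answer("What are your opening hours?", lambda q: "9am-5pm."))
```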

When Gary also says "If I say all odd numbers are prime, 1, 3, 5, and 7 may count in my favor, but at 9 the game is over.", it should remind us that GenAI may give us correct answers sometimes but not all the time, and that we need to be aiming for better models and systems. Better models and systems that can remember things, learn things, and be more accurate and reliable will enable us to move to a relationship of trust with such systems.
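
Gary's claim can even be checked in a few lines. A throwaway Python sketch, using a naive trial-division primality test, finds the first odd counterexample:

```python
# Check "all odd numbers are prime": find the first odd counterexample.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

for n in range(3, 20, 2):  # 3, 5, 7, 9, ...
    if not is_prime(n):
        print(f"Counterexample: {n} is odd but not prime")  # 9 = 3 x 3
        break
```

A model that only pattern-matches on the first few cases never reaches the counterexample; a system that reasons does.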

In the AI for Enterprise presentation I put together for the HumanMade AI Next Chapter event, one section covers the similarities with the structure of the human brain, and may explain what GenAI LLM models are theoretically lacking: what prevents them from being more accurate, from reasoning, and from building what Gary Marcus calls a cognitive model of the world.
Before talking about AGI and autonomous models, we first need models that are reliable and that we can trust. Models that can learn and reason.

