For those who make a career or hobby out of following the ins and outs of artificial intelligence (AI), the release of a new large language model (LLM) is met with feverish attention. A whole ecosystem of AI “benchmarks,” like MMLU or ARC, has emerged to try to answer the supposedly objective question: which model is best?
When the Chinese company DeepSeek launched its R1 model, there was an international furore. On these benchmarks, it performed competitively with top models from Silicon Valley giants like OpenAI and Anthropic — despite an apparently much lower cost of production. In contrast, the release of OpenAI’s recent GPT-5 was met with a collective meh — the consensus being that it had underperformed expectations.
These leaderboard dramas are entertaining, but they are also a distraction. It’s easy to get caught up in all this one-upmanship and spend far too long worrying about whether you’ve chosen the right model. For most people, that time is wasted — especially when comparing models in the same “weight class” (similar price point, size on disk, or openness). You’ll almost always get a better return from spending an hour using a model and improving your processes than from another hour comparing benchmark charts.
Of course, there are cases where frontier performance really matters. If you’re building an AI tool to accelerate scientific discovery, a 10% improvement on the relevant benchmark could make a real difference. But if you’re using an LLM to take the sting out of drafting an email to your annoying boss, the AI is probably good enough already — and that extra 10% isn’t going to change your life.
Focusing too much on what’s happening at the frontier distracts from more meaningful questions for most organisations. Am I using the right size of model? Could I get away with a cheaper one? Do I control the model, or can a third party modify or delete it? Where is it hosted, and what happens to the data I submit?
AI multipliers
To think about this more rigorously, it helps to borrow a concept from economics — the idea of a multiplier. The economic multiplier describes how one pound of government spending can generate more than a pound of economic growth by stimulating further investment and activity. During the financial crisis, politicians and economists argued endlessly over how big the multiplier was, and whether it varied across sectors.
We can think in similar terms about an AI model multiplier: if a model gets 10% better, what’s the impact on your specific use case or organisation? When AI models weren’t very capable, this multiplier was high. The leap from GPT-2 to GPT-3.5, for instance, suddenly made a wide range of tasks feasible — a huge boost for many businesses.
Different use cases, however, have very different multipliers. The most complex, multi-stage tasks — the kind our hypothetical AI scientist might tackle — have a high multiplier. But simple, well-understood tasks such as extracting data from text or writing boilerplate copy have a low multiplier; they’re already effectively solved. Average the multipliers across all your business activities, weighted by how much of your work each one represents, and you get your organisation-level multiplier.
By now, that multiplier is small for most organisations. The models can already write decent emails, generate ad copy, prioritise client requests, or identify parts on a conveyor belt. Doing those things slightly better rarely moves the needle much.
In fact, a multiplier can even be negative. If a new model offers trivial improvements while costing more, upgrading might actually make the organisation worse off.
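To make the arithmetic concrete, here’s a minimal sketch in Python. Every number in it is invented for illustration: the tasks, their share of your AI-assisted work, and the per-task multipliers would all have to come from your own organisation.

```python
# Illustrative only: tasks, weights, and multipliers are invented numbers,
# not measurements from any real organisation.

# Each task: (share of AI-assisted work, multiplier for that task).
# The multiplier answers: if the model improves, how strongly does the
# value of this task improve in response?
tasks = {
    "drafting emails":      (0.40, 0.1),  # effectively solved: low multiplier
    "boilerplate ad copy":  (0.30, 0.2),
    "multi-stage research": (0.10, 2.5),  # hard task: big gains from better models
    "data extraction":      (0.20, 0.1),
}

# Organisation-level multiplier: the weighted average across activities.
org_multiplier = sum(weight * m for weight, m in tasks.values())
print(f"Organisation-level multiplier: {org_multiplier:.2f}")

# A 10% model improvement translates into roughly this much extra value...
model_gain = 0.10
gross_benefit = model_gain * org_multiplier

# ...but upgrading may cost more. If the price rise outweighs the gain,
# the net effect of "upgrading" is negative.
extra_cost = 0.05  # e.g. the new model costs 5% more of current AI spend
net_benefit = gross_benefit - extra_cost
print(f"Gross benefit: {gross_benefit:+.3f}, net after cost: {net_benefit:+.3f}")
```

In this toy example, a 10% model improvement yields under a 4% gross benefit, and a modest price rise is enough to tip the net effect negative.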
We’re living through a kind of industrial revolution in knowledge work. But this time, the tools are so cheap and accessible that it’s as if a power loom were installed in every attic and spare bedroom, not just the big mills. If that happened, we wouldn’t waste time comparing which household had the slightly newer model — we’d be asking how this mass access to capability might transform the way society works, produces, and builds the future.
Expanding what AI models provide
If new model improvements matter less than we think, does that mean most organisations have already felt the full impact of AI? Is this the best we can hope for? Far from it. Most can expect significant further gains in the coming years — even if the underlying models barely improve.
That’s because there are many ways AI systems can become more useful without the models themselves getting “smarter.” Services can expand what models can do — by letting them access tools like web browsing or code execution. Over time, everything from retail platforms to law firms will reconfigure their systems to make them easier for customers’ AI tools to interact with. Taken together, these trends underpin the rise of “agentic AI” — systems that can act in the real world, not just describe it.
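To give a flavour of what “letting a model access tools” means in practice, here is a deliberately toy sketch. The tool names, the dispatch table, and the pretend model output are all invented; real systems use a model provider’s structured tool-calling interface rather than raw JSON parsing.

```python
# A toy sketch of the "tool use" loop behind agentic AI.
import json

def web_search(query: str) -> str:
    return f"[top results for '{query}']"    # placeholder tool

def run_code(source: str) -> str:
    return "[output of executing the code]"  # placeholder tool

TOOLS = {"web_search": web_search, "run_code": run_code}

def agent_step(model_output: str) -> str:
    """If the model asked to use a tool, run it and return the result."""
    request = json.loads(model_output)       # e.g. {"tool": ..., "arg": ...}
    tool = TOOLS[request["tool"]]
    return tool(request["arg"])

# Pretend the model decided a web search was needed:
print(agent_step('{"tool": "web_search", "arg": "Belfast shipping times"}'))
```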
Even before agents become widespread, there are plenty of organisations that have barely begun their AI adoption journey. Letting your employees type into a Copilot chat window is only the start. Building your own knowledge base and connecting it to an LLM to deliver context-specific answers is more powerful. But the deepest transformation happens behind the scenes — when processes themselves are redesigned around what AI makes possible.
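For that “knowledge base plus LLM” pattern, often called retrieval-augmented generation (RAG), a minimal sketch looks something like this. The documents, the naive keyword-overlap scoring, and the call_llm stub are all placeholders; a production system would use an embedding model and a vector store instead.

```python
# A toy sketch of retrieval-augmented generation (RAG): fetch the most
# relevant internal documents, then hand them to an LLM as context.

knowledge_base = [
    "Refunds are processed within 14 days of a returned item arriving.",
    "Support tickets marked urgent are triaged within one hour.",
    "Our warehouse in Belfast ships orders Monday to Friday.",
]

def score(question: str, document: str) -> int:
    """Count shared words: a crude stand-in for semantic similarity."""
    return len(set(question.lower().split()) & set(document.lower().split()))

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever model you actually use."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

def answer(question: str, top_k: int = 2) -> str:
    # Retrieve the k documents that best match the question...
    docs = sorted(knowledge_base, key=lambda d: score(question, d), reverse=True)
    context = docs[:top_k]
    # ...and ask the model to answer using only that context.
    prompt = (
        "Answer using only the context below.\n"
        "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer("How quickly are refunds processed?"))
```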
The sociologist Max Weber once observed that charismatic leaders occasionally steal the spotlight, but the real work of politics is “the strong and slow boring of hard boards.” AI is now entering its own “slow boring” phase: the era of patient, process-level transformation. The glamorous frontier breakthroughs will continue, but the meaningful progress will come from millions of teams doing the steady, unglamorous work of integration.
So next time a model release sparks another online leaderboard frenzy, remember: the future of AI isn’t being decided in Silicon Valley. It’s being decided in offices, factories, classrooms, and council buildings — wherever people are quietly figuring out how to make these tools work for them. The future, increasingly, is in your hands.
Donnacha Kirk, Deputy Director of AI Technology & Research Services, AI Collaboration Centre
Donnacha leads a team of Data Scientists and Applied Researchers at the AI Collaboration Centre, providing hands-on AI expertise to SMEs. His role focuses on helping businesses develop AI-driven products, strengthening SME connections with researchers at Ulster University and Queen’s University Belfast, and expanding AICC’s skills training offerings.
Donnacha holds a BA and MSc in Experimental & Theoretical Physics from Cambridge, followed by a PhD in Cosmology from UCL. His early career as an astrophysics researcher at UCL and Imperial College saw him pioneering machine learning applications on large-scale telescope and satellite data. As part of global collaborations such as the Dark Energy Survey (US Department of Energy) and the Euclid space satellite project (European Space Agency), his research focused on gravitational lensing and the study of dark matter and dark energy.
From 2017 to 2025, Donnacha transitioned into the commercial sector, leading data science teams across industries including finance, insurance, cybersecurity, and e-sports. His experience spans large legacy organisations and agile startups, providing a deep understanding of how businesses at different scales can harness AI and external expertise to drive innovation. Having worked through the rise of cloud computing and the explosion of AI tools, he is passionate about shaping how these technologies will transform the business landscape in Northern Ireland.
