Why I have changed my mind about AI and you should too


It’s time to rethink our relationship with AI
Flavio Coelho/Getty Images
It is undeniable that the launch of ChatGPT was a historically significant event, but is that because it was the first glorious step towards a superintelligent future or because it was the start of a world filled with AI snake-oil salespeople? I’ve long thought that large language models, the technology behind AI chatbots, are fascinating but flawed, putting me firmly in the snake-oil camp. But a week spent vibe coding has revealed something surprising: both the boosters and the sceptics are wrong.
First, I should explain. Vibe coding, if you aren’t familiar, is a term coined about a year ago by Andrej Karpathy, an AI researcher who co-founded and formerly worked at OpenAI. It refers to the process of developing software by “vibing” with an AI model, instructing it in plain language while letting it generate the actual code. Recently, I’ve seen people saying that the latest tools – Claude Code and ChatGPT Codex – have become surprisingly good at coding, such as in a piece in The New York Times titled “The A.I. disruption we’ve been waiting for has arrived”.
I decided to experiment with these tools, and I have been astonished by the results. In just a few short days, with only limited experience of coding, I have created personally useful apps like an audiobook picker that checks what is available at my local library, and a combined camera and teleprompter app that runs on my phone.
That might sound boring to you, and that is perfectly fine, for reasons I will explain later. What is important here is that this process has seen me engage more deeply with products like ChatGPT than I have before. Previously, I have tried minor experiments, been disgusted at generic writing, sycophancy or inaccurate search results, and bounced off. For these new coding projects, my extended use made me realise something I hadn’t before – the way LLMs have been productised produces a machine I am destined to hate.
Very few of us have been exposed to a “raw” LLM, by which I mean a statistical model that has been trained on a large collection of data to produce plausibly representative text. Instead, the majority of us are using technology that has been mediated through a process called reinforcement learning from human feedback (RLHF). AI companies use humans to rate the text produced by a raw LLM, rewarding answers that are perceived to be confident, useful and engaging while penalising harmful content or answers that are likely to discourage a majority of users from engaging with their products.
It is this RLHF process that produces the generic “chatbot voice” that you are probably familiar with. It is a process that bakes in the implicit values of the producer, from a general “move fast and break things” Silicon Valley attitude to the more specific Elon Musk-infused ideology of Grok, the controversial X chatbot.
Currently, it is very difficult to get a chatbot to express uncertainty, contradict the user or arrest forward momentum. This became most obvious to me when I encountered an unsolvable problem with my teleprompter. I had been trying to create an app that would overlay text on my existing camera app, assuming that would be easier than creating a camera from scratch, but the code ChatGPT was producing kept failing. It repeatedly suggested fixes, urging me forwards with the project. It was only later that I realised the intricacies of the Android operating system, which I won’t bore you with, meant making an all-in-one app would be much easier. As soon as I asked ChatGPT to produce this, it worked instantly.
Learning from this, I began instructing ChatGPT to constantly question both itself and me. I demanded vigilant scepticism. “Jacob wants the assistant to default to evidence-first analysis: avoid extrapolation, explicitly flag inference vs evidence, and prefer stating uncertainty or stopping when evidence is thin, unless the user asks for speculation,” is just one of the frameworks (generated by itself) that I have imposed on its memory. In other words, I built a model uniquely designed to work with my psychological profile, carefully unpicking OpenAI’s values and replacing them with my own.
It’s not perfect. It is very hard for an LLM to fight its RLHF training, and the default keeps seeping through. But what this means is that I now have a tool that serves as a somewhat-useful cognitive mirror. I didn’t use it to write this article, both because its writing style is still terribly turgid and because New Scientist, quite rightly, has strict rules against AI-generated copy, but I used it to think about this article. I asked my cognitive mirror to probe arguments and counterarguments, rejecting many of its conclusions as false or spurious. I extracted value, but it required caution and work, not letting the AI do the heavy lifting. Crucially, my brain remained fully engaged at all times.
This leads me to reinforce a conclusion I had already reached: engaging with someone else’s AI output is, in almost all cases, functionally useless. You can’t gain anything from AI-generated text that you couldn’t get more effectively by prompting an AI yourself. I also continue to reject the idea that AI is actually intelligent in any way – instead, I consider LLMs to be a cognitive aid, like a calculator or word processor. With this framing, as a private tool, not a world-conquering machine, I now see the benefit. For that reason, it is right that you shouldn’t care about my teleprompter app. What should excite you is the possibility of solving your own unique problems in your own unique way.
Here’s where our current AI paradigm introduces another issue. In my view, the best LLM would be one that runs on your own computer, with no connection to a private corporation. It should be treated as a dangerous, experimental tool that you have full control over. I’m reminded of the meme that software engineers keep a loaded gun next to their printer, in case it makes a noise they don’t recognise. Sadly, running your own cutting-edge LLM isn’t currently possible for a variety of reasons, not least that the AI boom is driving up prices of the very hardware you need.
I must also address the original sin of LLMs: potential copyright infringement. By design, this technology can only be built on data ingested at a large scale, essentially the entire textual record of humanity. It is undeniable that firms like OpenAI built their models by using copyrighted text without permission, though whether this was actually illegal is the subject of ongoing court cases. A private LLM would have the same issues, but I can see solutions, such as public sector models, effectively pardoned by governments and distributed freely for the benefit of all, not private corporations. I also remain concerned about the environmental impact of data centres, but again this could be partly mitigated by a wider distribution of LLMs running on our own machines.
I accept that some people reading this will accuse me of having sold out to the tech bros. All I can say to that is that I haven’t revised my long-held position on LLMs as a technology that is fascinating, dangerous and occasionally extraordinary.
What I have realised is that the main way we are engaging with the technology, via slick chatbots like ChatGPT, is where so much of the harm comes in and is allowed to pass out into the world. LLMs shouldn’t be settled and productised, forced into every part of our lives with a sparkling emoji that wants to be your friend. It would be much better if we used these tools mindfully, with increased friction and full awareness of and caution against the potential harm they can cause. Here, a useful metaphor rears its fanged head. I don’t want OpenAI’s snake oil. I want snakes.



