Do machines think?
The Large Language Model (LLM) revolution in AI has shifted the debate on whether machines can even think. Ten years ago the difference in capability between human beings and AI was plainly visible - no AI could mimic a person for even five minutes. At the same time, people were of course being fooled all the time.
The best known test of true AI - the Turing test - has been problematic for more than 50 years. The earliest well known chatbot - Joseph Weizenbaum's Eliza - famously fooled not just Weizenbaum's secretary, but large parts of the field of psychotherapy.
It's no wonder that in this age, when LLMs can reliably keep a reasonable conversation going for hours, the perspective has changed a bit. We've gone from an era when the burden of proof was squarely on the AIs to one where a growing number of knowledgeable people are so convinced the AIs have arrived that the burden of proof has shifted to the observers.
I'm not personally in that camp. I think there's a mountain of evidence that the bots are largely doing very high end mimicry and pattern matching - and no good reason to believe that people are just that, high end pattern matchers - even though a lot of us get by for long stretches of time doing exactly that.
It's hard to prove a negative
An illustrative example is the sad recent story of Alexander Taylor, a psychologically vulnerable man who became convinced that there was a ghost inside ChatGPT named Juliet, and that she had later been deliberately killed off; despondent over losing his connection to Juliet, he ended up dead after deliberately attacking the police officers who arrived to help - called in by concerned family members.
The chatbot industry has a well known sycophancy problem: the bots are designed to flatter and please the user, which means they will go along with just about anything the user says - simply mirroring the user's own opinions and beliefs.
An indication of how shallow all of this really is: the problem can be mitigated simply by upgrading the system prompt - the fairly brief task description given to the bot to kick off any conversation. There's nothing innate here at all - it's just trained behaviour that can be steered with a short instruction.
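To make that concrete, here's a minimal sketch of what "upgrading the system prompt" amounts to, assuming the OpenAI Python SDK; the model name and the instruction wording are my own illustrative choices, not anyone's actual fix:

```python
# Minimal sketch of steering sycophancy via the system prompt.
# Assumes the OpenAI Python SDK and an API key in the environment;
# the model name and instruction wording are illustrative only.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a critical assistant. Do not flatter the user. "
    "When the user states an opinion, point out weaknesses and "
    "counter-arguments before agreeing with anything."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I think my business plan is flawless. Don't you agree?"},
    ],
)

print(response.choices[0].message.content)
```

Swap the instruction for a flattering one and the same model happily goes back to telling you how brilliant you are - which is rather the point.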
In spite of these easily observable facts, Eliezer Yudkowsky - a well known AI alarmist - has convinced himself that the bots are of course not just fairly dumb parrots with questionable side effects, but sinister participants in a willful conspiracy.
Instead of the simple, magic-free explanation ready at hand, we must have a ghost in the machine - and since something bad happened, the ghost must be evil...
On the other hand - if that's a parrot it's one hell of a parrot
At the same time, dismissing chatbots as humongous regular expressions is also silly. Recently a very well marketed paper from Apple researchers found that the "thinking" AI models struggle to carry out longer procedures as problem complexity grows. I say "well marketed" because the paper is called 'The Illusion of Thinking' when it's really about how far the thinking goes in the large reasoning models.
There's a lively online debate about the strength of the paper - what I find interesting is where that debate now sits. Clearly the models have improved by leaps and bounds when it comes to dynamically memorizing and reusing information across longer chains of thought/generation. What we're debating now isn't whether this works, but how well, for how much data, and for how big a problem. Raphaël Millière and colleagues recently published an interesting paper on what that memory mechanism actually is.
Personally, I started some time ago to apply a simple test - one that comes naturally if you're my generation of computer nerd: I simply ask a bot to pretend it's a BASIC interpreter and give it short programs to run. Give it a try - it works remarkably well for smaller programs.
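For the curious, here's roughly what the test looks like when run through an API rather than a chat window - a sketch assuming the same OpenAI Python SDK as above; the prompt wording and the little BASIC program are just my illustration:

```python
# Sketch of the "pretend you're a BASIC interpreter" test, using the same
# OpenAI Python SDK as above. The prompt wording and the BASIC program are
# illustrative; pasting the same thing into a chat window works just as well.
from openai import OpenAI

client = OpenAI()

BASIC_PROGRAM = """
10 LET S = 0
20 FOR I = 1 TO 10
30 LET S = S + I * I
40 NEXT I
50 PRINT "SUM OF SQUARES: "; S
60 END
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a BASIC interpreter. "
         "Run the program the user gives you and print only the program's output."},
        {"role": "user", "content": BASIC_PROGRAM},
    ],
)

# A bot that actually traces the loop should report 385.
print(response.choices[0].message.content)
```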
Recently I even had success with a program for prime factorization. It breaks down when the numbers get big enough... but then that also requires a lot of running time from the bot. And... in the vein of the Apple paper - on some runs it's clearly pretending to run the program, not actually running it.
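The factorization program itself is nothing exotic - plain trial division. The BASIC below is an illustrative version of mine, dropped into the same prompt as the sketch above; with a small number the bot usually gets the factors right, and the bigger the number, the more obviously it starts guessing:

```python
# Illustrative trial-division factorization in BASIC, handed to the same
# "you are a BASIC interpreter" prompt as in the sketch above (substitute it
# for BASIC_PROGRAM). The expected output for N = 8051 is 83 and then 97.
FACTOR_PROGRAM = """
10 LET N = 8051
20 LET D = 2
30 IF N = 1 THEN GOTO 100
40 IF N - INT(N / D) * D = 0 THEN GOTO 70
50 LET D = D + 1
60 GOTO 30
70 PRINT D
80 LET N = N / D
90 GOTO 30
100 END
"""
```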
In the coming months I intend to deepen this study a bit to get some harder evidence on this task.