The Illusion of Thinking & Rebuttal
An overdue look at Apple's paper about the reasoning abilities of leading AI models.
Internet trends tend to be ephemeral. Every so often, every other post on my LinkedIn feed is about some new development that everyone seems to lose their mind over. But after a few days or weeks, the hype blows over, the discussions dissipate, and the army of experts and people who write short posts with bullet points, lots of emojis, and line spacing after each sentence moves on to talk about something else.
Recent examples of such high-flying, quickly vanishing trends from my own filter bubble are DeepSeek-R1, OpenAI’s Studio Ghibli images, and, it seems, Apple’s recent paper showing that reasoning models “collapse” when faced with problems above a certain complexity threshold. Even though the hype train left the station weeks ago for new destinations, I want to revisit the central claims in Apple’s paper, as they might help us better understand our common future with AI.
The “Illusion of Thinking” Paper
When Apple held its annual Worldwide Developers Conference (WWDC25) from June 9-13, major AI announcements were absent. Sure, some cool new AI features were announced for the Apple aficionados, but nothing we haven’t seen elsewhere or couldn’t expect. Siri is not turning into an emotionally attuned “super assistant” like OpenAI’s ChatGPT, and Apple is not throwing its hat in the ring to compete with Google and Microsoft as an “AI-first company”.
Apple brands itself on luxury consumer hardware, attention to detail, and high standards in aesthetics, privacy, and data security. Unlike other companies that are willing to charge head-first into AI like bulls in a china shop, Apple can’t afford to take this route. Mistakes such as the dysfunctional AI-generated news summaries that Apple was forced to roll back this year are much more costly for Apple than they would be for, say, Meta or Google, because of its trusted brand value.
Still, Apple's bundle of announcements at WWDC25 largely elicited a lukewarm response. People have come to expect only extraordinary things from Apple, but now it is sitting cautiously on the sidelines while its competitors push the boundaries of a new paradigm with AI. There is, however, also a chance that its passive, second-mover approach to AI will pay off.
In my view, Apple’s most important AI contribution came a few days prior to WWDC25 in the shape of the paper “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”. Competitors like Google and OpenAI are building their ambitious AI efforts on the idea that models can reason and that this reasoning can scale; only then can we truly have helpful AI agents that understand humans and accommodate our needs. Apple’s new paper throws a bucket of cold water on that assumption.
Apple’s internal research team - Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar - argues that so-called Large Reasoning Models (LRMs), including OpenAI’s o1 and o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking, cannot actually reason. It only seems like they can.
Below, I will summarize the paper’s key points and briefly address the rebuttals.