🍓 Playing with Strawberry: When machines can reason
My first time testing OpenAI's new model, o1 (aka Strawberry)
I spent some hours playing with OpenAI's new model, o1. This is their "next" model, nicknamed Strawberry. It is meant to have some reasoning abilities.
The trouble with ordinary large language models, like GPT-4 and Claude 3.5 Sonnet, is that they shoot from the hip. They extemporise like blustering politicians: they can sound good but be very wrong. o1 uses chain-of-thought reasoning and refines its strategies if the current one doesn't work. It's similar to how we might think through a difficult puzzle.
tl;dr: This is a holy shit moment for me.
I gave the model several tests. Here was one that was particularly challenging:
The first five terms of a sequence are -1, 28, -3, 136, 35
What is the 15th term?
I won't give away the answer. In the video below, the model explains its reasoning. It searches the strategy space for approaches to help unravel the solution. As each approach fails, it tries a new one.
The video shows one minute of its efforts. It didn't get the solution unaided. I gave it first one clue and then two more. Each clue narrowed the solution space a little. With the third clue, it quickly got the correct solution. If you work it out, please pop the answer and your workings in the comments!
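If you do have a go yourself, a quick way to vet a guess is to test a candidate closed-form formula against the five given terms before posting it. Here is a minimal Python sketch of that idea; the two candidate formulas shown are deliberately wrong illustrations, so nothing about the actual answer is given away.

```python
# Known terms a(1)..a(5) from the puzzle.
KNOWN_TERMS = [-1, 28, -3, 136, 35]

def matches_known_terms(formula):
    """Return True if formula(n) reproduces a(1)..a(5) exactly."""
    return all(formula(n) == term for n, term in enumerate(KNOWN_TERMS, start=1))

# Two illustrative (and intentionally wrong) guesses:
print(matches_known_terms(lambda n: n**3 - 2))        # False: gives -1, 6, 25, ...
print(matches_known_terms(lambda n: (-1)**n * n**2))  # False: gives -1, 4, -9, ...
```

A candidate that passes this check still isn't guaranteed to be the intended rule, of course, since infinitely many formulas fit any five terms; but a candidate that fails it can be discarded immediately.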
I set the same problem for Claude 3.5 Sonnet. It blithely spat out an answer with great confidence, but it was wrong.
Can Strawberry pass an Oxford entrance exam?
According to the QS rankings, Oxford University is the third highest-ranked university in the world. I set o1 some of Oxford's entrance papers, including the Thinking Skills Assessment (a paper set for the second most competitive undergraduate degree) and the History and Physics Aptitude Tests.1