🍓 Playing with Strawberry: When machines can reason
My first time testing OpenAI's new model, o1 (aka Strawberry)
I spent some hours playing with OpenAI's new model, o1. This is their "next" model, nicknamed Strawberry. It is meant to have some reasoning abilities.
The trouble with ordinary large language models, like GPT-4 and Claude 3.5 Sonnet, is that they shoot from the hip. They extemporise like blustering politicians: they can sound good but be very wrong. o1 uses chain-of-thought reasoning and refines its strategies if the current one doesn't work. It's similar to how we might think through a difficult puzzle.
tl;dr: This is a holy shit moment for me.
I gave the model several tests. Here was one that was particularly challenging:
The first five terms of a sequence are -1, 28, -3, 136, 35
What is the 15th term?
I won't give away the answer. In the video below, the model explains its reasoning. It searches the strategy space for approaches to help unravel the solution. As each approach fails, it tries a new one.
The video shows one minute of its efforts. It didn't get the solution unaided. I gave it first one clue and then two more. Each clue narrowed the solution space a little. With the third clue, it quickly got the correct solution. If you work it out, please pop the answer and your workings in the comments!
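If you do have a go yourself, a quick way to vet a guess is to test a candidate closed-form formula against the five given terms before posting it. Here is a minimal Python sketch of that idea; the two candidate formulas shown are deliberately wrong illustrations, so nothing about the actual answer is given away.

```python
# Known terms a(1)..a(5) from the puzzle.
KNOWN_TERMS = [-1, 28, -3, 136, 35]

def matches_known_terms(formula):
    """Return True if formula(n) reproduces a(1)..a(5) exactly."""
    return all(formula(n) == term for n, term in enumerate(KNOWN_TERMS, start=1))

# Two illustrative (and intentionally wrong) guesses:
print(matches_known_terms(lambda n: n**3 - 2))        # False: gives -1, 6, 25, ...
print(matches_known_terms(lambda n: (-1)**n * n**2))  # False: gives -1, 4, -9, ...
```

A candidate that passes this check still isn't guaranteed to be the intended rule, of course, since infinitely many formulas fit any five terms; but a candidate that fails it can be discarded immediately.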
I set the same problem for Claude 3.5 Sonnet. It blithely spat out an answer with great confidence, but it was wrong.
Can Strawberry pass an Oxford entrance exam?
According to the QS rankings, Oxford University is the third highest-ranked university in the world. I set o1 some of Oxford's entrance papers, including the Thinking Skills Assessment (a paper set for the second most competitive undergraduate degree) and the History and Physics Aptitude Tests.1