Exponential View

šŸ“ Playing with Strawberry: When machines can reason

My first time testing OpenAI’s new model, o1 (aka Strawberry)

Azeem Azhar
Sep 13, 2024

I spent some hours playing with OpenAI’s new model, o1. This is their “next” model, nicknamed Strawberry. It is meant to have some reasoning abilities.

The trouble with ordinary large language models, like GPT-4 and Claude Sonnet 3.5, is that they shoot from the hip. They extemporise like blustering politicians: they can sound good but be very wrong. o1 uses chain-of-thought reasoning and switches strategy when the current one doesn’t work. It’s similar to how we might think through a difficult puzzle.

tl;dr: This is a holy shit moment for me.

I gave the model several tests. Here was one that was particularly challenging:

The first five terms of a sequence are -1, 28, -3, 136, 35
What is the 15th term?

I won’t give away the answer. In the video below, the model explains its reasoning. It searches the strategy space for approaches to help unravel the solution. As given approaches fail, it tries a new one.

The video shows one minute of its efforts. It didn’t get the solution unaided. I gave it first one and then two more clues. Each clue narrowed the solution space a little. With the third clue, it quickly got the correct solution. If you work it out, please pop the answer and your workings in the comments!
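
To make the "try an approach, drop it when it fails, move on" pattern concrete, here is a toy Python sketch. It is only an illustration of the idea, not how o1 works internally, and it deliberately does not solve the puzzle. The three candidate strategies and all function names are my own assumptions; each one is simply tested against the five given terms.

```python
from fractions import Fraction

# The five terms from the puzzle above.
TERMS = [-1, 28, -3, 136, 35]


def is_arithmetic(terms):
    """Strategy 1: are consecutive differences constant?"""
    diffs = [b - a for a, b in zip(terms, terms[1:])]
    return len(set(diffs)) == 1


def is_geometric(terms):
    """Strategy 2: are consecutive ratios constant?"""
    if 0 in terms:
        return False
    ratios = [Fraction(b, a) for a, b in zip(terms, terms[1:])]
    return len(set(ratios)) == 1


def is_second_order_recurrence(terms):
    """Strategy 3: does a[n] = p*a[n-1] + q*a[n-2] hold?

    p and q are fitted exactly from the first four terms (Cramer's rule),
    then checked against every remaining term.
    """
    if len(terms) < 5:
        return False
    a0, a1, a2, a3 = (Fraction(t) for t in terms[:4])
    det = a1 * a1 - a0 * a2
    if det == 0:
        return False
    p = (a2 * a1 - a0 * a3) / det
    q = (a1 * a3 - a2 * a2) / det
    return all(
        terms[i] == p * terms[i - 1] + q * terms[i - 2]
        for i in range(4, len(terms))
    )


# Hypothetical, hand-picked strategy list; a real search space is far larger.
STRATEGIES = [
    ("arithmetic progression", is_arithmetic),
    ("geometric progression", is_geometric),
    ("second-order linear recurrence", is_second_order_recurrence),
]


def first_fit(terms):
    """Try each strategy in turn; return the first name that fits, else None."""
    for name, check in STRATEGIES:
        if check(terms):
            return name
    return None


if __name__ == "__main__":
    for name, check in STRATEGIES:
        print(f"{name}: {'fits' if check(TERMS) else 'fails'}")
```

All three of these simple hypotheses fail on the puzzle sequence, which is the point: the easy patterns get ruled out quickly, and the interesting work is in deciding what to try next.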

I set the problem for Claude Sonnet 3.5. It blithely spat out an answer with great confidence, but it was wrong.

Can Strawberry pass an Oxford entrance exam?

According to the QS rankings, Oxford University is the third highest-ranked university in the world. I set o1 some of Oxford’s entrance papers, including the thinking skills assessment (a paper set for the second most competitive undergraduate degree) and history and physics aptitude tests.
