OpenAI's o1 : The Next Game Changer in Reasoning and AI Evolution?

#ai #openai #chatgpt #news

OpenAI just recently released their next model, o1 (previously codenamed Strawberry). OpenAI stated that o1 is their new model that will be more reasonable while answering to user's questions.

We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

OpenAI

OpenAI has been doing the breakthrough since their first model in June 2020, ChaGPT 3. ChatGPT was great but lacked in a lot of things. Since then, OpenAI's been training newer model to improve what was lacking before.

OpenAI has recently launched ChatGPT o1. This release is a big step forward for their AI models, building upon the progress made with ChatGPT 4o. ChatGPT o1 more time to generate answers. This means it often creates responses that are more reasonable & precise, especially when handling complex questions or multistep problems. Also, it shines in coding tasks, showing better accuracy than earlier versions.

However, it's important to note that o1 is part of OpenAI's larger goal of creating human-like artificial intelligence. Yet, it's slower & costs more to use compared to GPT-4o. Right now, OpenAI calls this new version a “preview.” This term emphasizes that it's still in the early stages of development.

How Is It trained?

Although OpenAI has not said anything about it yet, but the training approach for o1 is fundamentally different from it's predecessors. According to OpenAI’s research lead, Jerry Tworek,

o1 has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it.

Learning to Reason with LLMs

According to the research report by OpenAI on o1, it ranks 89th percentile on competitive programming questions (conducted by codeforce), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (known as AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (known as GPQA).

Their large-scale reinforcement learning algorithm teaches the o1 how to think productively using its chain of thought in a highly data-efficient training process.

In the data above we can conclude that o1 performance smoothly improves with both train-time and test-time compute.

What Is Chain-of-Thought?

Similar to us humans, o1 may take it's time to think of an answer to complex questions. It uses the new method introduced by OpenAI called, chain of thoughts. It's an work of reinforcement learning which hone it's skill to develop a chain of thoughts about the question, not only that but if the answer is wrong somewhere in the step, it can fix the problem and move on. o1 can also break-down the complex steps into the simpler one's to make it more easy and find the solution. With these, the o1's ability to think improves a lot and result in better answers.

The above picture is the comparison of 4o and o1-preview in coding tasks. The expected output should be [1, 3, 5][2, 4, 6] of the same format.

But 4o failed to do the task, whereas the o1-preview succeeded in giving the right output.

So what's that chain of thought over there?

It's a very long chain of thoughts tbh. But here's some snapshot of it to see how it thinks and what it does in the background.

And you know the rest.

When Can I Use It?

For Plus and Team

ChatGPT Plus and Team users will be able to access o1 models in ChatGPT starting today (12th September 2024). Both o1-preview and o1-mini can be selected manually in the model picker, and at launch, weekly rate limits will be 30 messages for o1-preview and 50 for o1-mini. We are working to increase those rates and enable ChatGPT to automatically choose the right model for a given prompt.

For Business

ChatGPT Enterprise and Edu users will get access to both models beginning next week.

For Developers

Developers who qualify for API usage tier 5(opens in a new window) can start prototyping with both models in the API today with a rate limit of 20 RPM. We’re working to increase these limits after additional testing. The API for these models currently doesn't include function calling, streaming, support for system messages, and other features. To get started, check out the API documentation(opens in a new window).

What about free users?

Don't worry about that, OpenAI has not forgotten about free users.

We also are planning to bring o1-mini access to all ChatGPT Free users.

What's Next?

Well this is just the beginning of the era in AI. o1 is planned to add various new and more improved features including; browsing, file and image uploading.

So it's just the waiting game for us.

Keep Hacking!