In a nutshell: OpenAI has unveiled a new series of AI language models named the “o1,” specifically engineered to enhance reasoning capabilities, particularly for complex issues in science, coding, and mathematics. The company is so confident in these advancements that it has reset the model version counter to 1, starting anew after GPT-4o, and has notably moved away from the GPT branding.
The first model in the “o1” series, named “o1-preview,” is available in both ChatGPT and OpenAI’s API. Despite its preview status, the company promises regular updates and enhancements are part of the plan.
The “o1” models have been trained to enhance their problem-solving approach by spending more time analyzing issues before providing an answer. This method allows the models to experiment with various strategies, identify their own errors, and tackle complex tasks in a more systematic, human-like manner.
The results shared by OpenAI suggest a significant advancement with the new “o1” models. According to the company, these models perform at a level comparable to PhD students on challenging benchmarks in fields such as physics, chemistry, and biology.
For example, it achieved an 83 percent accuracy rate on a test qualifying students for the International Math Olympiad, a notable improvement over the 13 percent accuracy of GPT-4o.
Of course, AI benchmarks can sometimes be unreliable, so the true performance of the “o1” models will become clearer as more users test them in various scenarios.
Additionally, the new models seem to resolve some long-standing questions, such as the number of R’s in “strawberry,” finally putting the memes to rest. OpenAI also showcased a demo where the model successfully generated Python code for an arcade game, highlighting its advanced capabilities.
OpenAI o1 answers a famously tricky question for large language models. pic.twitter.com/5ZlQIOBWEd
– OpenAI (@OpenAI) September 12, 2024
OpenAI was previously reported to be working on a project codenamed “Strawberry” to develop models capable of tackling complex reasoning tasks. Given that the “o1” series seems to be the result of the Strawberry project, it’s amusing to think that the project’s name might have been inspired by the “strawberry” test.
In addition to enhancing reasoning capabilities, OpenAI also focused on strengthening defenses against “jailbreaking,” a technique used to bypass safety mechanisms. According to the company, the “o1-preview” scored 84 out of 100 in one of its most challenging jailbreaking tests, compared to only 22 for GPT-4o.
To make these models more accessible, especially for developers, OpenAI is also releasing a lighter “o1-mini” version designed for coding tasks.
Access to both “o1-mini” and “o1-preview” is now rolling out for paid ChatGPT Plus and Teams plans. While the advanced reasoning capabilities are currently opt-in with weekly usage limits, OpenAI is working to expand capacity and enable automatic model selection based on the complexity of the prompt.