
OpenAI’s New Frontier Models: O3 and O3-Mini
OpenAI has started inviting select users to test its latest reasoning models, called O3 and O3-Mini. These models build on O1 and O1-Mini, which were released in the first weeks of this month. Here is what they are and why they matter.
Why Are They Called O3?
The name “O3” may sound random, but there is a funny story behind it. According to OpenAI CEO Sam Altman, the company skipped “O2” to avoid trademark issues with the telecom company O2. He also jokingly admitted that OpenAI isn’t great at naming things! The announcement was made during the company’s “12 Days of OpenAI” livestream event.
Altman added that the new models would first be given to selected researchers for safety testing. The O3-Mini is expected to be available by the end of January 2025, while O3 will follow shortly.
What Makes O3 Special?
Altman described O3 as the start of a new era for AI. These models can handle complex tasks that require deep reasoning. He said, “For the last day of this event, we thought it would be fun to go from one frontier model to the next frontier model.”
To give you an example of how competitive the AI world is becoming, just yesterday, Google launched Gemini 2.0 Flash Thinking. This model shows its reasoning process using bullet points, which is something OpenAI’s models don’t do. The rivalry between OpenAI and Google is heating up, and both are pushing boundaries in advanced AI models.
Performance Highlights of O3
O3 backs up its claims with record-breaking results. Here are a few numbers that show how impressive it is:
1) Mastery in coding: O3 outperforms O1 by 22.8 percentage points on SWE-Bench Verified and achieves a Codeforces rating of 2727. This is even higher than the OpenAI Chief Scientist’s score of 2665.
2) Excellence in Mathematics and Science: It scored 96.7% on the AIME 2024 exam, missing only one question, and reached 87.7% on GPQA Diamond, well beyond the level of human experts.
3) New Benchmarks: O3 set records on some of the most difficult tests, such as FrontierMath by Epoch AI, where it solved 25.2% of the problems while no other model had exceeded 2%. On the ARC-AGI test, it tripled O1’s score, surpassing 85% accuracy. These results were verified live by the ARC Prize team.
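To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It rests on assumptions that are not stated above: that the AIME 2024 score covers 30 questions (the two 15-question exams combined), and that the SWE-Bench Verified scores are roughly 71.7% for O3 and 48.9% for O1, as reported in coverage of the announcement. The function names are made up for illustration.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Share of questions answered correctly, as a percentage."""
    return round(correct / total * 100, 1)


def percentage_point_gain(new_score: float, old_score: float) -> float:
    """Absolute difference between two percentage scores."""
    return round(new_score - old_score, 1)


def relative_gain_percent(new_score: float, old_score: float) -> float:
    """Improvement expressed relative to the old score."""
    return round((new_score - old_score) / old_score * 100, 1)


# Assumption: 30 AIME 2024 questions in total, one of them missed.
print(accuracy_percent(29, 30))

# Assumption: SWE-Bench Verified scores of 71.7% (O3) and 48.9% (O1).
print(percentage_point_gain(71.7, 48.9))
print(relative_gain_percent(71.7, 48.9))
```

Under these assumptions, missing one question out of 30 gives the 96.7% quoted above, and the 22.8 figure is a gap in percentage points. Measured relative to O1’s score, the same gap would be a noticeably larger percentage, which is why the two readings should not be mixed up.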
Safety First: What Is Deliberative Alignment?
OpenAI has always been focused on making AI safe and aligned with human values. They introduced a new technique called deliberative alignment for their O1 models, and they’re extending it to O3. This method teaches the model the safety rules themselves, so that it can reason about them and follow them while responding to queries.
Here’s an illustrative example: Imagine you’re using an AI to draft legal documents. In that case, the AI has to make sure its suggestions comply with strict legal requirements. Deliberative alignment helps the model recall the relevant rules and apply them on the spot instead of relying on pre-programmed responses. In this way, the AI becomes more reliable and less error-prone.
Deliberative alignment also reduces common issues like:
- Jailbreak attacks (where users trick the AI into doing something unsafe).
- Over-refusing harmless requests (e.g., rejecting a benign question because the model is overly cautious).
This approach improves on older methods like reinforcement learning from human feedback (RLHF) and creates models that can generalize better across different scenarios, including multilingual tasks.
What Does This Mean for AI?
The release of O3 and O3-Mini is a big deal. These models are setting new standards in coding, math, and reasoning. By opening up safety testing to the research community, OpenAI aims to make sure these tools are powerful yet responsible.
What excites me personally is seeing how these models could help solve real-world challenges. From complex scientific problems to better developer tools, the possibilities are huge. At the same time, I am glad that OpenAI takes safety seriously, because the more advanced AI gets, the more important it becomes to make sure it is used for good.
If you want to know more or request access, head to OpenAI’s website. Let’s see where this next leap in AI takes us!