OpenAI Announces o3: The Path to AGI Now Stands Unobstructed

After twelve days of streaming, OpenAI's grand finale has arrived. Sam Altman, returning amidst the Christmas atmosphere, unveiled the company's latest masterpiece: OpenAI o3.

GPT o3 Launch

This release once again pushes the boundaries of AI capabilities to new heights, reinforcing OpenAI's unshakeable position on the iron throne of AI advancement.

It brings to mind what an OpenAI researcher said before the release of o1: "On our path to AGI, there are no more obstacles in our way."

Why did OpenAI skip directly to o3 without releasing o2? The answer is quite simple: to avoid potential trademark conflicts with O2, the British telecommunications provider.

As soon as OpenAI's stream concluded, X erupted with excitement.

GPT o3 Launch

o3's capabilities represent a dimensional leap over all existing models. Let's examine its impressive benchmarks:

GPT o3 Launch

Software Engineering and Coding Excellence

The model underwent evaluation on two significant benchmarks:

SWE-Bench Verified (Software Engineering Examination): o3 achieved a remarkable 71.7% score, significantly outperforming o1.
Codeforces: o3 scored 2727, ranking 175th globally and surpassing 99.99% of human participants.

GPT o3 Launch

o1's coding capabilities were already extraordinary, and o3 has taken another giant leap toward the summit of AGI.

GPT o3 Launch

Mathematical and Scientific Prowess

In mathematical competitions and scientific examinations, o3 demonstrated exceptional capabilities:

AIEM 2024: Nearly perfect score, marking the first time an AI has achieved such a feat
GPQA Diamond (Doctoral-level Scientific Examination): Showed improvements, though not as dramatic as in mathematics and programming

The next mathematical benchmark is particularly interesting.

GPT o3 Launch

The FrontierMath Breakthrough

On FrontierMath, a benchmark developed by Epoch AI in collaboration with over 60 top mathematicians, o3 achieved a groundbreaking 25.2% success rate. This is particularly significant considering that previous models like GPT-4 and Gemini 1.5 Pro struggled to reach even 2% on this benchmark.

The ARC-AGI Achievement

Perhaps the most remarkable achievement is o3's performance on ARC-AGI, a benchmark designed in 2019 to test AI systems' abstract reasoning capabilities. The test requires AI to identify patterns and solve new problems, presented in grid format with ten possible colors per square, ranging from 1x1 to 30x30 grids.

It looks something like this:

GPT o3 Launch

Extremely difficult and abstract.

Here's the progression in scores:

GPT-2 (2019): 0%
GPT-3 (2020): 0%
GPT-4 (2023): 2%
GPT-4o (2024): 5%
o1-preview (2024): 21%
o1 (2024): 32%
o1 Pro (2024): ~50%
o3 (2024): 87.5%

GPT o3 Launch

It took five years to progress from 0% to 5%, but now, the jump from 5% to 87.5% happened in just six months. The human threshold score for this test is 85%.

The path to AGI truly stands unobstructed.

However, while o3 is powerful, it remains a future promise.

Join waitlist

OpenAI has developed three smaller o3 models based on the main o3.

GPT o3 Launch

Currently, o3-mini is expected to be publicly available by the end of January.

I'm increasingly excited about the AI industry's evolution in 2025. Inference models, agents, AI hardware, and world models - each of these areas promises to be more exciting than the transitional period of 2024.

2025 will truly be the year when AI reaches for the stars.