According to OpenAI, the o3 model scores well above its predecessor, o1, on several measures, including ones that test complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI model's ability to reason over extremely difficult mathematical and logic problems. OpenAI also says o3 is 20 percent better than o1 on a series of common programming tasks, an improvement that has surprised many in the field.
The unveiling of the o3 model by OpenAI and the Gemini 2.0 Flash Thinking model by Google highlights the intensifying competition in the AI research space. Both companies are striving to demonstrate their capabilities in pioneering advances in AI. This competition is crucial for OpenAI as it seeks to attract more investment and build a profitable business, while Google is keen to maintain its position at the forefront of AI research.
OpenAI has revealed two versions of the new model, o3 and o3-mini. The company has not yet made them publicly available, but it plans to invite outside testers to apply to evaluate them. This is part of OpenAI's commitment to transparency and collaboration in the development and testing of its AI models.
OpenAI has also unveiled a new method for AI safety, known as deliberative alignment. The technique involves training a model on a set of safety specifications and having it reason about the nature of a request as well as its own response. OpenAI says this makes the model harder to trick into misbehaving, because its reasoning process can root out attempts at mischief.
The o3 model scores much higher than its predecessor on several measures, OpenAI says, including ones that measure complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI model's ability to reason over extremely difficult mathematical and logic problems it is encountering for the first time.