OpenAI unveils o1, a model that can fact-check itself

ChatGPT maker OpenAI has announced its next major product release: a generative AI model code-named Strawberry, officially called OpenAI o1.

To be more precise, o1 is actually a collection of models. Two are available today in ChatGPT and via OpenAI’s API: o1-preview and o1-mini, a smaller, cheaper model. You’ll have to be subscribed to ChatGPT Plus or Team to see them in the ChatGPT client; Enterprise and Edu users will get access early next week.
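For developers, calling the new models looks much like any other chat completion. Here’s a minimal sketch using the official openai Python SDK (this assumes an OPENAI_API_KEY in your environment; the prompt is purely illustrative, and parameter support for the o1 models may differ from older ones):

```python
# Minimal sketch: querying o1-preview via the openai Python SDK.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, cheaper variant
    messages=[
        {"role": "user", "content": "How many prime numbers are there below 100?"},
    ],
)

print(response.choices[0].message.content)
```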

Note that the o1 chatbot experience is fairly barebones at present; unlike ChatGPT, o1 can’t browse the web or analyze files (yet). It’s rate-limited — weekly limits are currently 30 messages for o1-preview and 50 for o1-mini. And the o1 models are expensive. In the API, o1-preview is $15 per 1 million input tokens (3x the cost of GPT-4o) and $60 per 1 million output tokens (4x GPT-4o). (1 million tokens is equivalent to around 750,000 words.)
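To put those prices in perspective, here’s a quick back-of-the-envelope calculation. The helper function is hypothetical and uses only the launch prices quoted above, which may change:

```python
# Rough cost estimate for a single o1-preview API call at launch prices:
# $15 per 1M input tokens, $60 per 1M output tokens (as quoted above).
INPUT_PRICE_PER_TOKEN = 15 / 1_000_000   # USD
OUTPUT_PRICE_PER_TOKEN = 60 / 1_000_000  # USD

def o1_preview_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one o1-preview request (illustrative helper)."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# Example: a 2,000-token prompt that yields a 10,000-token response
# (o1's hidden "thinking" tokens are billed as output, so responses run long).
print(f"${o1_preview_cost(2_000, 10_000):.2f}")  # ≈ $0.63
```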

OpenAI says it plans to bring o1-mini access to all free users of ChatGPT, but hasn’t set a release date. We’ll hold the company to it.

o1 avoids some of the reasoning pitfalls that normally trip up generative AI models, at least according to OpenAI. That’s because o1 can effectively fact-check itself by spending more time considering all parts of a command or question.

OpenAI says that o1, originating from an internal company project known as Q*, is particularly adept at solving math and programming-related challenges. But what makes the text-only o1 “feel” qualitatively different from other generative AI models is its ability to “think” before responding to queries.

When given additional time to “think,” o1 can reason through a task holistically — planning ahead and performing a series of actions over an extended period of time that help it arrive at answers. This makes o1 well-suited for tasks that require synthesizing the results of multiple subtasks, like detecting privileged emails in an attorney’s inbox or brainstorming a product marketing strategy.

“o1 is trained with reinforcement learning,” which teaches the system through rewards and penalties, “to ‘think’ before responding via a private chain of thought,” said Noam Brown, a research scientist at OpenAI, in a series of posts on X. He added that OpenAI used a new optimization algorithm and training data set specifically tailored for the o1 models.

“The longer [o1] thinks, the better it does on reasoning tasks,” Brown said.

TechCrunch wasn’t offered the opportunity to test o1 before its debut; we aim to get our hands on it as soon as possible. But according to a person who did have access — Pablo Arredondo, VP at Thomson Reuters — o1 is better than OpenAI’s previous models (e.g. GPT-4o) at things like analyzing legal briefs and identifying solutions to problems in LSAT logic games.

“We saw it tackling more substantive, multi-faceted analysis,” Arredondo told TechCrunch. “Our automated testing also showed gains against a wide range of simple tasks.”

In a qualifying exam for the International Mathematics Olympiad (IMO), a high school math competition, o1 correctly solved 83% of problems, while GPT-4o solved only 13%, OpenAI claims. The company also claims that the model reached the 89th percentile of participants in Codeforces, the online competitive programming contest.

In general, o1 should perform better on problems in data analysis, science and coding, OpenAI says.

Now, there is a downside: o1 can be slower than other models, depending on the query. Arredondo tells us the model can take more than ten seconds to answer some questions. (Helpfully, the chatbot version of o1 shows its progress by displaying a label for the current subtask it’s performing.)

Given the unpredictable nature of generative AI models, o1 likely has other flaws and limitations (Brown admitted that o1 also trips up on games of tic-tac-toe, for example, and doesn’t answer as well as other models on factual knowledge questions). We’ll no doubt learn about these in time — and once we get a chance to test the model ourselves.

We’d be remiss if we didn’t point out that OpenAI is far from the only AI vendor investigating these types of reasoning methods to improve model factuality. Google DeepMind researchers recently published a study showing that model performance can be significantly improved without any additional tweaks, essentially by giving models more compute time and guidance to fulfill requests as they’re made.

OpenAI might be first out of the gate with o1. But assuming rivals soon follow suit with comparable models, the company’s real test will be making o1 widely available.
