OpenAI has rolled out GPT 5.4, a frontier model built to power agentic workflows and reduce factual mistakes, marking a sharper turn toward autonomous AI that can operate computers and work across apps with minimal hand-holding. The company says GPT 5.4 consolidates advances in reasoning, coding, and tool use, while delivering fewer hallucinations and more dependable results in everyday professional tasks.
What’s New in GPT 5.4: Agentic Control and Upgrades
The headline capability is deeper “agentic” control. GPT 5.4 can write and execute code to operate a computer, reacting to on-screen context and issuing mouse and keyboard inputs. In practice, an AI agent can navigate file systems, pull data from one application, transform it in another, and deliver the output without step-by-step prompting. That makes it useful for robotic process automation (RPA) workflows, QA checks on web dashboards, or triaging IT tickets.
On the OSWorld-Verified benchmark, which measures how well AI systems navigate desktop environments, GPT 5.4 scored 75%, up from 47.3% for GPT 5.2, a gain of 27.7 percentage points. OpenAI notes that the model also edges past the 72.4% average human score on the same tasks, underscoring how quickly models are moving toward reliable software operation under real-world constraints.
Factuality and Research Gains in GPT 5.4
OpenAI calls GPT 5.4 its “most factual model yet.” On a set of de-identified prompts that users had previously flagged for factual errors, the company reports that GPT 5.4’s individual claims are 33% less likely to be false, and its full responses 18% less likely to contain any errors, compared with GPT 5.2. The model also performs deeper web research, particularly on highly specific, multi-part questions, and retains more context across longer sessions.
These improvements track a broader industry push, championed by organizations such as NIST and the Partnership on AI, for rigorous evaluation and transparent reporting on model reliability. While external auditing remains an open challenge industry-wide, OpenAI’s published before-and-after figures give teams clearer signals on where GPT 5.4 outperforms its predecessors and where human review is still prudent.
New Chat Experience With “Thinking” Mode
Inside ChatGPT, the model appears as GPT 5.4 Thinking. A small but meaningful UX change lets users adjust an answer mid-generation, interrupting to clarify constraints or change direction without restarting the conversation. That saves time and tokens, and it better reflects how analysts and developers actually iterate. The feature is live on the web and Android, with iPhone support coming soon.
Better Output for Documents, Spreadsheets, and Excel
For professional output, GPT 5.4 improves the quality of AI-generated spreadsheets, documents, and presentations. In internal tests modeled on the work of a junior investment banking analyst, OpenAI says spreadsheets produced by the model achieved a mean success rate of 87.3% as judged by human raters, which suggests that formatting discipline, formula integrity, and narrative coherence are improving together.
OpenAI is also introducing ChatGPT for Excel, a dedicated tool that brings workbook data directly into model prompts. The company says it can run scenarios and generate outputs based on cells and formulas, which should help with tasks like sensitivity tables, expense categorization, KPI rollups, and quick what-if models without leaving Excel. Combined with the model’s agentic skills, it points toward end-to-end workflows that span data import, transformation, and reporting.
Availability and Model Lineup for the GPT 5.4 Release
GPT 5.4 Thinking is rolling out now to Plus, Pro, and Team subscribers, replacing GPT 5.2 Thinking, which moves to the Legacy Models section until it is removed on June 5. For developers, a GPT 5.4 Pro option is available via the API to Pro and Enterprise customers. OpenAI hasn’t said whether GPT 5.4 Thinking will reach the free tier.
The release arrives days after GPT 5.3 Instant, signaling a two-track strategy: a lighter, faster model for everyday tasks and a more capable flagship tuned for reasoning-heavy, tool-using work. It also lands amid competitive pressure as other labs pursue agentic systems that can plan, act, and self-correct across software environments.
Why It Matters: Impact on Autonomous AI and Work
GPT 5.4 pushes past chat into competent computer operation, a prerequisite for trustworthy autonomous agents. Its gains in factuality and document structure should reduce cleanup time, while mid-response steering brings the product closer to how people actually work. As enterprises weigh deployment, the practical question shifts from “Can it do the task?” to “How do we govern it?” Policy, logging, and human-in-the-loop checkpoints are becoming the decisive factors for production use.