Even OpenAI’s support assistant doesn’t have the basics of the product it supports down. In multiple exchanges reported by users, the app’s automated helper confidently pointed to a non-existent in-app bug reporting feature, then repeated the same advice by email. The episode underscores a stubborn fact about AI-powered customer service: no matter how sophisticated they are, systems that aren’t grounded in live product information can lead users astray at scale.
A Hallucinated Feature in the ChatGPT iPad App
The problem first surfaced for an iPad Pro user who could reliably crash the ChatGPT app by resizing its window. Seeking help, they turned to the in-app Help Center chat, which told them to report the bug via “Send Feedback” or “Report a Problem” in the app’s Account or Support menus. That path doesn’t exist. The only report option visible in a chat is for content moderation, not for filing bugs or viewing diagnostics.
After being corrected, the bot admitted it was wrong, but not before OpenAI’s automated email support dispensed the same advice. A human spokesperson later clarified that the ChatGPT app has no consumer-facing bug reporting portal of its own. Users can search a small selection of troubleshooting articles, leave thumbs-down feedback on answers, or submit vulnerability disclosures to a security-focused bug bounty program run with Bugcrowd, none of which covers day-to-day app bug reports.
In follow-up exchanges, the support assistant now appears to say there is no in-app bug report option and then offer to pass a description along to the developers. That would indicate OpenAI has modified its prompts or retrieval rules. Still, the opening hallucination is telling: when AI support doesn’t rely on authoritative, current product metadata, it generates friction rather than relief.
Why AI Support Bots Fail Without Reliable Grounding
Large language models are probabilistic systems: they predict what is likely to come next based on patterns they have learned, not what is verifiably true for a given product. Stanford HAI researchers have found that hallucination rates can range from the single digits to over 20 percent depending on domain and prompt design. In customer support, where users want precise answers that match the app build they’re running, that kind of variance is untenable.
Three failure modes tend to overlap in cases like this. First, stale knowledge: an LLM may have been trained on outdated product descriptions or community posts describing common mobile feedback flows, and generalized that pattern badly. Second, weak grounding: if the bot’s retrieval system doesn’t surface the one authoritative help article that says “no in-app bug reporting,” the model will hallucinate a plausible-sounding path. Third, incentives: assistants are too often tuned to maximize helpfulness and completeness, which biases them toward answering rather than saying “I don’t know.”
The paradox is sharper against the broader industry story. Generative AI has been hyped as a force multiplier for service operations, with McKinsey predicting that it could automate significant portions of work across customer care. Yet NIST’s AI risk guidance singles out hallucination and overconfidence as systemic risks that merit controls, particularly in user-facing settings.
What Effective AI Support Should Be and How It Works
Fixing mistakes in AI help desks is more engineering than magic. At a minimum, the grounding baseline should be this: the bot may only give answers retrieved from a canonical, versioned knowledge base tied to the user’s app build and platform. If a fact is not present, say, whether the iPadOS app supports in-app bug submission, the assistant should default to “I don’t know” and escalate to a human rather than make something up.
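As a rough illustration, here is a minimal Python sketch of that answer policy. The retriever, LLM client, relevance threshold and field names are hypothetical stand-ins, not anything OpenAI has described:

```python
from dataclasses import dataclass

# Hypothetical retrieval result type; real support stacks will differ.
@dataclass
class Snippet:
    text: str
    source_url: str
    score: float        # retrieval relevance score, 0..1
    app_version: str    # app build the article documents

MIN_SCORE = 0.75        # below this, treat the question as unanswered

def answer_or_escalate(question: str, app_version: str, retriever, llm) -> dict:
    """Answer only from retrieved, version-matched knowledge; otherwise escalate."""
    snippets = [
        s for s in retriever.search(question, top_k=5)
        if s.app_version == app_version and s.score >= MIN_SCORE
    ]
    if not snippets:
        # No grounded evidence: refuse rather than invent a menu path.
        return {
            "type": "escalation",
            "message": "I don't have a documented answer for this build. "
                       "I'm handing this to a human agent.",
        }
    context = "\n\n".join(f"[{s.source_url}] {s.text}" for s in snippets)
    draft = llm.complete(
        "Answer ONLY from the sources below. If they don't cover the question, "
        f"say you don't know.\n\nSources:\n{context}\n\nQuestion: {question}"
    )
    return {"type": "answer", "message": draft,
            "citations": [s.source_url for s in snippets]}
```

The key design choice is that refusal is the default: the model only ever sees version-matched sources, and an empty retrieval result short-circuits straight to a human handoff.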
Live product metadata matters, too. For mobile apps, that means exposing the app’s structured functionality (menus, toggles and so on) through an internal API the assistant can query. Combine that with retrieval-augmented generation, strict citation down to the exact help page used, and UI-aware guardrails that block the model from proposing options that don’t exist in the current interface.
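One way such a guardrail could work is sketched below, with an invented feature manifest and a deliberately rough menu-path matcher; neither reflects OpenAI’s actual internals:

```python
import re

# Hypothetical manifest of UI paths that actually exist, keyed by platform/build.
FEATURE_MANIFEST = {
    ("ipados", "1.2025.070"): {
        "Settings > Data Controls",
        "Conversation > Report (content moderation only)",
        "Help Center > Browse articles",
    },
}

# Rough matcher for "A > B" style navigation hints in a drafted answer.
MENU_PATH = re.compile(r"[A-Z][\w ]+(?: > [A-Z][\w ()]+)+")

def unavailable_ui_references(draft: str, platform: str, build: str) -> list[str]:
    """Menu-path-like spans in the draft that don't overlap any real path in this build."""
    available = FEATURE_MANIFEST.get((platform, build), set())
    spans = MENU_PATH.findall(draft)
    return [s for s in spans
            if not any(allowed in s or s in allowed for allowed in available)]

# Usage: if anything comes back, regenerate the answer or escalate instead of sending it.
draft = "Tap Settings > Report a Problem to file a bug."
bad = unavailable_ui_references(draft, "ipados", "1.2025.070")
if bad:
    print("Blocked, draft references UI this build does not have:", bad)
```

If the check flags anything, the orchestrator can regenerate the answer with the manifest in context or hand the conversation to a person.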
There is also an operational layer. Mature support organizations measure not only deflection rate but also the false-assurance rate: the share of interactions in which a confident bot gives an incorrect answer. Each week they review a sample of conversations, add missing knowledge articles, and run red-team prompts to stress-test hallucination hot spots. And when bugs do occur, an in-app “Report a Bug” entry point that captures device, app version and steps to reproduce can feed a triage queue without exposing security channels or overwhelming support staff.
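Computing that metric is straightforward once reviewers have labeled a sample of conversations; the schema and field names below are illustrative, not an actual review pipeline:

```python
from dataclasses import dataclass

# Hypothetical reviewer labels for one bot conversation.
@dataclass
class ReviewedConversation:
    bot_answered: bool     # bot gave a direct answer rather than escalating
    answer_correct: bool   # reviewer judged the answer factually correct
    confident_tone: bool   # no hedging or "I don't know" in the reply

def false_assurance_rate(sample: list[ReviewedConversation]) -> float:
    """Share of answered conversations where a confident reply was wrong."""
    answered = [c for c in sample if c.bot_answered]
    if not answered:
        return 0.0
    confidently_wrong = [c for c in answered
                         if c.confident_tone and not c.answer_correct]
    return len(confidently_wrong) / len(answered)

# Example weekly sample: 2 of the 4 answered conversations were confidently wrong.
weekly = [
    ReviewedConversation(True, False, True),
    ReviewedConversation(True, True, True),
    ReviewedConversation(False, False, False),
    ReviewedConversation(True, False, True),
    ReviewedConversation(True, True, False),
]
print(f"False-assurance rate: {false_assurance_rate(weekly):.0%}")  # 50%
```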
The Trust Gap in AI Customer Service and Support
Customer service is a trust engine. When the bot says a feature exists and it doesn’t, users don’t just waste time; they lose confidence in the brand’s ability to help when it counts. Industry surveys consistently find that a single failed support experience drives churn and channel switching. For a flagship AI company, the optics are harsher still: if your own assistant stumbles on your own app, what does that say about enterprise readiness?
To its credit, OpenAI appears to have fixed the support script to match reality. That is progress. But the takeaway is bigger than a single bug or a single app. Reliable AI assistance requires product-specific grounding, conservative answer policies, and effortless handoffs to humans. Otherwise, the world’s most advanced model can end up sending users hunting for buttons that simply aren’t there.