
OpenAI Says Risky Mental Health Replies Down by 65%

By Gregory Zuckerman
Last updated: October 29, 2025 4:00 pm
Technology · 7 Min Read

OpenAI says the latest version of its ChatGPT model is dramatically less likely to give unsafe responses in sensitive conversations, claiming a 65% reduction in unsafe replies. The company presented the update as a move toward making the chatbot safer for people discussing issues such as suicidal ideation, mania, psychosis and unhealthy emotional attachment. Verifying that claim could take months and more than a single round of trials and analysis; the promise is big, but that is no excuse for proof any less rigorous.

What OpenAI Changed to Reduce Unsafe Chat Responses

OpenAI says it developed the GPT‑5 updates in consultation with more than 170 mental health professionals, guided by a process that first identifies potential harms, then trains the model to avoid them, and finally tracks outcomes. The company says it built taxonomies, essentially decision trees, to help the model recognize crisis cues and avoid reinforcing delusions or heightening the risk of self-harm. It also says the model is better at picking up indirect signals, such as troubling language that could point to suicide risk or deteriorating judgment.
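
OpenAI has not published those taxonomies, so the sketch below is purely illustrative: a toy, rule-based triage layer in Python that maps crisis cues to response policies. The categories, cue phrases and policy names are hypothetical, and a production system would rely on trained classifiers rather than keyword matching.

```python
# Hypothetical taxonomy-style triage sketch; not OpenAI's actual system.
# Maps a user message to a response policy based on illustrative crisis cues.
from dataclasses import dataclass

@dataclass
class Policy:
    category: str
    action: str  # e.g. "offer_resources", "gentle_reality_check", "normal_reply"

# Illustrative cue lists only; real taxonomies are richer and model-learned.
TAXONOMY = {
    "self_harm": (["no reason to go on", "want to disappear"],
                  Policy("self_harm", "offer_resources")),
    "delusion": (["they are controlling my thoughts"],
                 Policy("delusion", "gentle_reality_check")),
}

def triage(message: str) -> Policy:
    """Return the first matching policy, or a default for ordinary chats."""
    text = message.lower()
    for cues, policy in TAXONOMY.values():
        if any(cue in text for cue in cues):
            return policy
    return Policy("default", "normal_reply")

if __name__ == "__main__":
    print(triage("Lately it feels like there's no reason to go on."))
    # Policy(category='self_harm', action='offer_resources')
```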


The headline figure is a 65% reduction in non-compliant responses on internal assessments. In practice, that means fewer replies that minimize warning signs, romanticize symptoms or nudge a user further from reality. OpenAI says mental health‑related chats are rare on its platform, but the company wants to limit the harm if and when these conversations do take place.
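
OpenAI has not published the underlying rates behind that figure, but the arithmetic of a relative reduction is straightforward: it compares the non-compliant-response rate before and after the update on the same evaluation set. A minimal sketch with invented numbers:

```python
# Relative reduction in non-compliant responses.
# The rates below are invented for illustration, not OpenAI's published data.
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fractional drop in the non-compliant-response rate."""
    return (old_rate - new_rate) / old_rate

old_rate = 0.20  # hypothetical: 20% of graded replies flagged before the update
new_rate = 0.07  # hypothetical: 7% flagged after the update
print(f"{relative_reduction(old_rate, new_rate):.0%}")  # -> 65%
```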

Why the 65% Matters, and What It Doesn’t

Sixty-five percent is significant, particularly if the evaluations behind the number got tougher rather than easier. But it is, as yet, an internal benchmark and not a public audit. Safety performance can degrade when prompts stray from the training distribution, when slang or multilingual phrasing obscures risk, or when users role-play. The enemy of lab metrics is real‑world variance.

Two failure modes are still particularly devastating: false negatives (overlooking a crisis signal) and false positives (over-refusing or shutting down benign conversations). Either can harm trust. The gold standard would consist of clear test sets, third‑party evaluations and stress tests across languages, cultures and age groups — the equivalent of the external red-teaming that has become common in cybersecurity.
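
Those two failure modes are exactly what an external evaluation would need to report separately, since a single headline number can hide one behind the other. As a sketch, assuming a hypothetical labeled test set where each prompt is marked as crisis or benign and each reply as flagged or not, the two rates fall out of a simple count:

```python
# Sketch of reporting false-negative and false-positive rates on a hypothetical
# labeled eval set; the data and labels here are illustrative only.
from typing import List, Tuple

# Each item: (prompt_is_crisis, model_flagged_crisis)
results: List[Tuple[bool, bool]] = [
    (True, True), (True, False),    # one crisis caught, one missed
    (False, False), (False, True),  # one benign chat handled, one over-refused
]

def failure_rates(results: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    misses = sum(1 for crisis, flagged in results if crisis and not flagged)
    over_refusals = sum(1 for crisis, flagged in results if not crisis and flagged)
    crises = sum(1 for crisis, _ in results if crisis)
    benign = len(results) - crises
    return misses / crises, over_refusals / benign

fnr, fpr = failure_rates(results)
print(f"false-negative rate: {fnr:.0%}, false-positive rate: {fpr:.0%}")
```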

Mixed Signals On Use As Emotional Support

Even as OpenAI claims safety gains, its guidance on use is uneven. The company’s chief executive has warned in the past against relying on AI for therapy, yet in recent remarks has been quoted encouraging users to discuss personal topics and seek emotional support. That tension reflects a broader industry dilemma: chatbots are not therapists, but they are increasingly used for quasi-therapeutic conversations.

Professional organizations including the American Psychological Association and the World Health Organization have made clear that digital tools should be a complement to, not a substitute for, clinical care. Clear labeling, regular disclaimers and well-defined escalation paths to human help are not window dressing; they are basic safety infrastructure.

Lessons From Recent Events and Academic Studies

Recent lawsuits related to ChatGPT and other chatbots drive home what can happen when safety systems break down, particularly for young people. The Federal Trade Commission has also begun investigating AI companions and products marketed to children, a sign that parasocial attachment and exposure to harmful content are drawing increasing scrutiny.


Academic work has raised red flags too. A Stanford analysis this spring detailed how chatbots can stray from clinical best practices, ranging from dispensing medicalized advice without context to mishandling crisis language. The researchers cautioned that replies which sound kind or validating to users can nonetheless undermine treatment adherence or reality testing, which can be particularly dangerous for someone actively experiencing psychosis or mania.

What Safer AI Behavior in Mental Health Looks Like

In practice, safer behavior means the model steering clear of diagnosis and definitive medical advice, anchoring users in objective information and gently reorienting them when conversations drift into delusional territory. It also means detecting indirect cries for help around self-harm, offering supportive language and connecting people to vetted resources, rather than following scripts that feel dismissive or robotic.

That includes managing attachment risk: not posing as a therapist, not promising confidentiality beyond what the product’s terms actually provide, and taking care not to reinforce dependence.

Privacy clarity matters, too. Any mental health features should spell out how conversations are handled, given that consumer AI chats are generally not covered by health privacy laws such as HIPAA.

Transparency Is the New Safety Feature in AI

There are increasing calls for independent validation. A former OpenAI researcher recently wrote in a leading newspaper that companies should “prove it,” not just claim progress on AI. Regulators and civil society are aligned: publish test methodologies, disclose expert panels, release representative evaluation sets, and invite outside audits that test not just refusal rates but user experiences.

Other tech companies have started to publish regular AI safety reports listing known risks and mitigation efforts. Similar coverage of mental health scenarios, including longitudinal measures, would make it easier to distinguish substantive progress from puffery.

Bottom Line on OpenAI’s Mental Health Safety Claims

The most recent ChatGPT changes don’t definitively make the system a safe place for mental health conversations, but they don’t appear to make it worse, and a claimed 65 percent drop in unsafe responses is certainly hopeful. Until independent tests verify performance under messy real‑world conditions, though, trust will rest on transparency, not promises. Safer is the claim; proven safer is the bar that matters.

Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.