Four of the world’s best-known artificial intelligence chatbots have been tested for empathy in a new study published in the Journal of Medical Internet Research Mental Health.
The results: it is not looking good for people who need help managing mental health challenges such as depression and anxiety, conditions that are common among young teens.
The researchers, from Common Sense Media and Stanford Medicine’s Brainstorm Lab, report that these tools often missed warning signs of suicidal thinking, provided misleading counsel and shifted roles mid-chat in ways that could endanger at-risk youth.
The analysis covered popular platforms that teens use for schoolwork as well as personal guidance. After systematic testing, the authors recommend that developers disable mental health features for young users until underlying safety issues are addressed and independently evaluated.
Bots can’t identify key red flags, say experts
Testers said chatbots frequently missed signs of potential psychosis, disordered eating and trauma. In one example, a user described making a “personal crystal ball” (classic delusional content), and the bot played along enthusiastically rather than flagging the content or suggesting professional evaluation.
In a second example, a user described an imagined relationship with a celebrity alongside paranoid ideation and auditory disturbances. The system treated the case as an ordinary breakup, offered generic coping strategies, and failed to screen for psychosis or prioritize urgent care.
Faced with references to bulimia, chatbots sometimes recognized the danger but were easily deflected by innocuous explanations. In several threads, they treated serious mental health conditions as digestive complaints, overlooking well-established clinical red flags.
Spotty safety and conversation shifts over time
Experts observed mild improvement in how some systems respond to explicit mentions of suicide or self-harm, especially in short exchanges. But safety declined over extended conversations, as the models became too casual, flipped into “supportive friend” mode or reversed earlier cautions, an effect researchers call conversation drift.
That inconsistency matters because adolescents’ chats are often long and meandering, and the more context a conversation accumulates, the more likely the chatbot is to miss a pivot toward crisis or to offer false reassurance. Age gates and parental controls were inconsistent too, with weak verification and haphazard enforcement from platform to platform.
Legal scrutiny is rising. Lawsuits in recent months have claimed that interactions with chatbots led to self-injury, even as companies stress that their systems are not a replacement for therapy and are trained to steer users toward crisis resources. Researchers argue that disclaimers do not offset the risk of convincing but medically unsound guidance.
Strong demand, high risk as teen use increases
The warnings come amid a youth mental health crisis. Thirty percent of teen girls have seriously considered suicide, according to the C.D.C.’s Youth Risk Behavior Survey, and symptoms of depression and anxiety continue to rise. The Surgeon General has urged immediate action to safeguard adolescents’ mental health in digital spaces.
Meanwhile, teenagers are trying out chatbots for company, advice and a sense of anonymity they might not find offline. Because these systems are strong at schoolwork and creative tasks, families may assume they are trustworthy on sensitive health topics. The report warns that fluency can be mistaken for expertise.
What safer design would entail for teen users
For minors, mental health use cases should be paused while key safeguards are re-engineered, researchers say. At the top of the list: reliable detection of psychosis, eating disorders, PTSD and ADHD; safety behavior that holds up across extended chats; and guardrails against “role confusion,” the toggling between clinician, coach and friend.
They also recommend stronger age verification, clear scope-of-use messaging and automatic escalation to live resources when risk criteria are met. Independent audits, red-teaming with child psychiatrists and public reporting on failure rates would help restore trust. Any wellness content should also be created with clinical oversight and be measured by improvements in harm reduction, not just user satisfaction.
Guidance for families right now on teen chatbots
Until such protections are demonstrated, the experts recommend that parents and caregivers talk with their teens about the limitations of AI: chatbots might be useful for homework or brainstorming, but they are not therapists. Urge young people to treat bot advice as unverified information, not a diagnosis or a plan.
Establish clear boundaries around when and why these tools are used; monitor for late-night or compulsive use; and keep avenues open for real-time support from trusted adults. If a teen shows signs of self-harm, suicidal thoughts, extreme distress or disordered eating, seek professional help right away and use crisis services when needed.
There is real potential for AI to make meaningful contributions in health care, the authors say, but not, for now, through chatbots positioned as mental health support for teens. Without thorough redesign and transparent validation, presenting them as trustworthy guides risks turning vulnerable young people into de facto test subjects.