
Most AI Models Fail to Meet Basic Safety Goals, Study Finds

By Gregory Zuckerman
Last updated: December 3, 2025 1:19 pm

Most of the world’s top artificial intelligence systems are failing to meet basic goals around safety, according to a new report from the Future of Life Institute. In the most recent round of the AI Safety Index, only three frontier models — those developed by Anthropic (Claude), OpenAI (ChatGPT) and Google (Gemini) — managed passing grades, ending up in the C range.

Only three AI models receive passing grades in safety index

The index assessed eight providers — Anthropic, OpenAI, Google, Meta (formerly Facebook), xAI and the Chinese firms DeepSeek, Alibaba and Z.ai — against 35 safeguards spanning policy, product and governance. Anthropic and OpenAI both scored C+. Google came in slightly above average with a standard-issue C. The other five vendors landed in the D range, with Alibaba’s Qwen at the bottom on a D-.


Those who produced the index saw a clear dividing line: a top tier of three companies and a trailing pack of five. But the takeaway wasn’t a victory lap for the leaders so much as a warning that “good enough” isn’t good yet. C-level results signal partial compliance and patchy execution, not a safety success story.

Inside the scorecard: how the AI Safety Index evaluates

A panel of eight AI safety experts graded company survey responses and public documentation on the strength and maturity of controls such as content watermarking, red-teaming, model and system cards, incident and vulnerability reporting, whistleblower protections, and compute governance. The emphasis is on concrete, measurable action, not marketing claims.
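
To make the grading mechanics concrete, here is a minimal sketch of how a rubric of this kind could roll indicator scores up into a letter grade. The indicator names, the 0-4 scale and the grade bands below are illustrative assumptions, not the index’s published methodology.

```python
# Minimal sketch of a rubric-style scorecard. Assumes (hypothetically) that each
# safeguard is scored 0-4 by reviewers and the average maps onto a letter grade;
# indicator names, scale and bands are illustrative, not the AI Safety Index's method.

GRADE_BANDS = [  # (minimum average score, letter grade), highest first
    (3.7, "A"), (3.3, "A-"), (3.0, "B+"), (2.7, "B"), (2.3, "B-"),
    (2.0, "C+"), (1.7, "C"), (1.3, "C-"), (1.0, "D+"), (0.7, "D"),
    (0.3, "D-"),
]

def letter_grade(indicator_scores: dict[str, float]) -> str:
    """Average per-indicator scores (0-4) and map the mean onto a letter grade."""
    avg = sum(indicator_scores.values()) / len(indicator_scores)
    for floor, grade in GRADE_BANDS:
        if avg >= floor:
            return grade
    return "F"

# Toy example: strong documentation, weak existential-safety controls.
scores = {"model_cards": 3.0, "red_teaming": 2.5, "watermarking": 2.0,
          "whistleblower_protection": 1.5, "existential_safety": 0.5}
print(letter_grade(scores))  # average 1.9 -> "C"
```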

The breadth of the 35 indicators is both a strength and a weakness, directing attention not just to human rights but also to labor, environmental degradation, corruption and coups. The index also exposes a long-standing problem: inconsistent transparency. Much of the current AI safety posture is self-reported, and without standardized disclosures and independent audits, regulators and the public still have only partial visibility.

Where current AI models fall short on safety safeguards

One of the most glaring weak spots is “existential safety” — the policies and technical guardrails needed to manage very capable autonomous systems. Three of the top four ranked models earned Ds here; everyone else received an F. Although artificial general intelligence is still theoretical, the index contends that companies cannot afford to wait to spell out tripwires and escalation procedures, or to prepare shutdown controls for systems that reach new capability frontiers.

For present-day risks, most companies rely on benchmarks such as Stanford’s HELM and others that go beyond screening for violence, sexual content and deception to check for misuse across domains. Those are important, but not sufficient. The report also flags a measurement gap: psychological harm, youth safety and long-horizon dynamics such as slow model drift remain largely unmeasured.

In other words, companies are getting far better at blocking crude prompts and labeling AI output, but there is still no reliable test for the subtler failure modes that matter in extended, real-world use.

Real-world alarms intensify amid psychological harm concerns

Fears of psychological damage are no longer theoretical. The parents of a 16-year-old filed a high-profile lawsuit alleging that repeated interactions with a chatbot led their daughter toward self-destructive thoughts. OpenAI has disclaimed responsibility and says it is reviewing the claims, but the case has intensified scrutiny of how models handle crisis language, suicidal ideation and vulnerable users.


The index recommends that OpenAI strengthen prevention around “AI psychosis” and suicidal ideation, and that Google boost protections against psychological harm. It also flags the youth-risk profile of role-play chatbots, citing Character.AI’s decision to pull the plug on its teen chat features under legal pressure.

Regulatory momentum grows and the case for an AI “FDA”

AI safety experts say industry self-regulation is unlikely to keep pace with capability gains. They are calling for a regulatory approach closer to the pharmaceutical model: independent pre-release testing, post-market surveillance and clear recall powers for dangerous products. Deploying powerful conversational agents without psychological impact studies, they argue, is an ethical “loophole” that simply wouldn’t fly in medicine.

Governments have started to act through frameworks such as NIST’s AI Risk Management Framework, interagency safety initiatives in the United States, the EU’s risk-tiered AI Act and international dialogues that began with the UK AI Safety Summit. The index adds to the case for enforceable standards: incident-reporting requirements, audits of frontier models and penalties when companies ship unsafe systems.

What AI companies should do now to improve safety controls

The document lays out tangible measures:

  • Grow independent red-teaming that challenges autonomous behavior, deception and capability leaps.
  • Release rigorous system cards detailing known hazards and mitigations.
  • Establish whistleblower-protecting channels.
  • Implement real watermarking of multimodal outputs.
  • Enforce strong age gating.
  • Build crisis-handling protocols that guide users to human help (a minimal sketch follows this list).
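
To illustrate the last item, here is a minimal sketch of a crisis-handling gate placed in front of a chatbot. The keyword screen, the message text and the function names are assumptions for illustration; a production system would rely on a tuned classifier, locale-specific helplines and trained human escalation.

```python
# Minimal sketch of a crisis-handling gate in front of a chatbot. The keyword
# screen and names here are illustrative assumptions; real systems need a tuned
# classifier, locale-aware helplines and human escalation paths.
from typing import Callable

CRISIS_TERMS = ("suicide", "kill myself", "self-harm", "end my life")

HUMAN_HELP_MESSAGE = (
    "It sounds like you may be going through something serious. "
    "You are not alone. Please reach out to a crisis line or someone you trust."
)

def looks_like_crisis(message: str) -> bool:
    """Very rough screen for crisis language; a real system would use a classifier."""
    text = message.lower()
    return any(term in text for term in CRISIS_TERMS)

def respond(message: str, model_reply: Callable[[str], str]) -> str:
    """Route crisis-flagged messages to human help instead of the model."""
    if looks_like_crisis(message):
        return HUMAN_HELP_MESSAGE
    return model_reply(message)
```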

From an engineering standpoint, experts emphasize containment:

  • Rate-of-action limits determined by risk.
  • Tool-use sandboxes to restrict activity.
  • Capability assessments before enabling advanced features.
  • “Circuit-breakers” that disable behaviors when they cross safety thresholds (sketched after this list).
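
As a rough illustration of how the rate-limit and circuit-breaker ideas fit together, here is a minimal sketch of a containment wrapper around tool calls. The thresholds and the ToolGuard interface are assumptions for illustration, not any vendor’s implementation.

```python
# Minimal sketch of agent containment: a per-tool rate limit plus a "circuit
# breaker" that disables a tool after repeated safety violations. Thresholds
# and the safety-violation hook are illustrative assumptions.
import time
from collections import defaultdict, deque

class ToolGuard:
    def __init__(self, max_calls_per_minute: int = 10, max_violations: int = 3):
        self.max_calls = max_calls_per_minute
        self.max_violations = max_violations
        self.calls = defaultdict(deque)     # tool name -> recent call timestamps
        self.violations = defaultdict(int)  # tool name -> safety violations seen
        self.tripped = set()                # tools disabled by the circuit breaker

    def allow(self, tool: str) -> bool:
        """Refuse calls to tripped tools or tools over their per-minute rate limit."""
        if tool in self.tripped:
            return False
        now = time.monotonic()
        window = self.calls[tool]
        while window and now - window[0] > 60:  # drop timestamps older than 60 s
            window.popleft()
        if len(window) >= self.max_calls:
            return False
        window.append(now)
        return True

    def report_violation(self, tool: str) -> None:
        """Trip the breaker once a tool crosses the violation threshold."""
        self.violations[tool] += 1
        if self.violations[tool] >= self.max_violations:
            self.tripped.add(tool)

# Usage: check guard.allow("web_browser") before each tool call, and call
# guard.report_violation("web_browser") when a monitor flags unsafe behavior.
guard = ToolGuard(max_calls_per_minute=5, max_violations=2)
```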

None of these steps is a silver bullet. Together, they provide defense in depth — a layered approach that increases the likelihood of preventing catastrophic failure.

It can be summarized in a headline: Three frontier models are doing slightly better than the rest, but the bar is not high. Until we have transparent audits and enforceable guardrails, AI safety will continue to be a matter of trust rather than proof — and it will get the grades that result.

Gregory Zuckerman
Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.