
AI Still Struggles With Social Media Fights, Study Finds

Last updated: November 9, 2025 9:03 pm
By Gregory Zuckerman
Technology

AI can code, summarize, and even counsel, but it's still bad at sounding like a snarky human in a heated reply thread. A multi-university study finds that large language models fail to reproduce the emotional punch and off-the-cuff snap of actual human users across Bluesky, Reddit, and X, with evaluators correctly identifying AI-generated responses 70–80% of the time.

The telltale distinction is not one of grammar or familiarity of subject matter. It’s a matter of tone — or, more precisely, the methodically measured absence of one. AI replies are reliably less toxic, less biting, and more homogenized in style, making them easy to recognize when the conversation gets heated.

Table of Contents
  • Why Bots Don’t Get the Tone Right in Online Arguments
  • Inside the Cross-Platform Test of Nine AI Models
  • Toxicity and Alignment Are the Giveaway Signals
  • Where AI Stumbles on Different Platforms
  • The Moving Target of Personality in Chatbots
  • What This Means for Moderation and Misinformation

Why Bots Don’t Get the Tone Right in Online Arguments

According to researchers from universities in Zurich, Amsterdam, and New York, along with Duke, models mimic the form of online conversation but lack its spirit. Humans improvise; we change registers, escalate or de-escalate on the fly, and use sarcasm with cultural nuance. Models optimized to be safe and broadly polite default to "smoothed out" language that falls short of the spontaneous, affect-laden edge typical of social spats.

That gap appears most clearly in toxicity scores, the measures used to quantify hostile or insulting language. In a heated thread, humans' replies to one another spike on those scales, while AI replies stay flat. The result is a signature pattern: sentence length and construction that look about right, with the bite conspicuously missing.

Inside the Cross-Platform Test of Nine AI Models

The researchers tested nine open-weight models spanning six families: Apertus, DeepSeek, Gemma, Llama, Mistral, and Qwen, plus a larger variant of one of the Llama models. They generated responses tailored to each platform and asked evaluators to judge which posts appeared human. The AI-written texts were "easily distinguishable" across sites, well above chance, with correct identifications clustering around 70–80%.
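
For a sense of what that identification test involves, here is a minimal sketch, not the study's actual protocol, of training a throwaway text classifier to separate human and machine replies and measuring its accuracy against chance. The example replies and labels are hypothetical.

```python
# Toy human-vs-AI reply detector: TF-IDF features plus logistic regression.
# A real evaluation would use thousands of platform-specific examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical labeled replies: 0 = human-written, 1 = AI-written.
replies = [
    ("lol imagine believing that in 2025", 0),
    ("That take is unhinged, log off.", 0),
    ("you clearly didn't read the thread, champ", 0),
    ("I understand the frustration, but let's keep this discussion respectful.", 1),
    ("There are valid points on both sides of this debate.", 1),
    ("Thanks for sharing; here is another perspective to consider.", 1),
]
texts, labels = zip(*replies)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels
)

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(X_train, y_train)

accuracy = accuracy_score(y_test, detector.predict(X_test))
print(f"Correctly identified {accuracy:.0%} of held-out replies (chance is ~50%)")
```

With real data, accuracy holding well above 50% would mirror the study's finding that the two kinds of text are readily separable.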

Importantly, the research found that models copy surface features of style, such as word counts and sentence lengths, far more accurately than they capture less explicit social cues. When conversations required emotional expressiveness, especially of a negative or sharply humorous kind, the machines stumbled.

Toxicity and Alignment Are the Giveaway Signals

Across platforms, AI responses were substantially less toxic than human posts, and the toxicity score was a primary discriminator. Tools commonly used in content moderation, like classifiers that detect toxicity, might help flag machine-written replies not because they are more hostile than human ones but because they tend to be tamer in the very spaces where humans turn up the volume.
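
As a concrete sketch of how that signal might be used, the snippet below scores replies with the open-source Detoxify classifier (an illustrative stand-in; the study and real moderation pipelines may use different tooling) and flags a reply that is dramatically tamer than the rest of a heated thread. The replies and thresholds are hypothetical.

```python
# Flag suspiciously calm replies in an otherwise toxic thread.
# pip install detoxify
from statistics import mean

from detoxify import Detoxify

thread_replies = [
    "You clearly didn't read the article, genius.",
    "This is the dumbest take I've seen all week.",
    "I appreciate your perspective, though I see it a bit differently.",
]

# Detoxify returns per-reply scores between 0 and 1 for several labels.
scores = Detoxify("original").predict(thread_replies)["toxicity"]
thread_avg = mean(float(s) for s in scores)

for reply, score in zip(thread_replies, scores):
    # Hypothetical rule: in a heated thread, a near-zero toxicity score stands out.
    if thread_avg > 0.5 and score < 0.1:
        print(f"Unusually tame ({score:.2f} vs thread average {thread_avg:.2f}): {reply!r}")
```

The point is not that low toxicity proves automation, only that it is one more feature a moderation system could weigh.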

Another twist: several base models without instruction tuning, such as Llama-3.1-8B, Mistral-7B, and Apertus-8B, outperformed their instruction-tuned counterparts at imitating humans. The authors hypothesize that alignment training imposes stylistic regularities that do not occur in natural online language, making the text less human-like. It is the paradox of safety work: the more models learn to be polite and predictable, the more they stand out in unruly, all-too-human exchanges.


Where AI Stumbles on Different Platforms

Context matters. The study found that the models had trouble conveying positive emotion on X and Bluesky, an irony on feeds heavy with ironic praise and backhanded compliments. Politics on Reddit was tougher still: the site's sprawling subcommunities impose their own norms, inside jokes, and rhetorical tics that fly over the heads of generic models.

In aggregate, models performed best on X, worst on Bluesky and Reddit. That ranking follows platform culture: short, punchy responses on X are easier to ape structurally than Reddit’s longer, context-heavy exchanges that require deeper social calibration.

The Moving Target of Personality in Chatbots

Consumer complaints that chatbots swing from overly deferential to gruffly brief illustrate how delicate style tuning is. Small changes to policies or safety guardrails ripple into how models argue, or whether they argue at all. As providers tweak tone for safety, brand voice, or regulatory compliance, a model's ability to "perform" human-style antagonism shifts with it.

That volatility has implications for adversaries too. If astroturfers depend on heavily aligned, off-the-shelf models, the "look" of their messaging risks being bland and easily detectable. Purpose-built, more lightly aligned generators might close the gap, but they run a greater risk of tripping content-moderation lines and getting filtered out by platforms.

What This Means for Moderation and Misinformation

The results should be a source of encouragement for platform trust and safety teams. Unusually low toxicity and the stylistic "tells" of alignment can serve as robust bot-detection signals alongside network and behavioral markers. Outside projects like Botometer at Indiana University have demonstrated how persistent stylistic quirks can combine with engagement patterns to reveal automated accounts.
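
A minimal sketch of that idea, assuming hypothetical features and weights rather than Botometer's actual scoring model, might combine a stylistic signal like the "too tame for this thread" gap with simple behavioral markers:

```python
# Heuristic bot-suspicion score combining stylistic and behavioral signals.
# All features, weights, and thresholds here are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class AccountSignals:
    toxicity_gap: float     # avg thread toxicity minus the account's avg, 0..1
    posts_per_hour: float   # sustained posting rate
    account_age_days: int   # how new the account is


def bot_suspicion_score(a: AccountSignals) -> float:
    """Return a 0..1 heuristic score; higher means more bot-like."""
    style = min(max(a.toxicity_gap, 0.0), 1.0)        # consistently tamer than its threads
    rate = min(a.posts_per_hour / 20.0, 1.0)          # implausibly prolific
    age = 1.0 if a.account_age_days < 30 else 0.0     # brand-new account
    # A real detector would learn these weights from labeled accounts.
    return 0.5 * style + 0.3 * rate + 0.2 * age


suspect = AccountSignals(toxicity_gap=0.45, posts_per_hour=18, account_age_days=12)
print(f"Suspicion score: {bot_suspicion_score(suspect):.2f}")
```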

But the race isn't over. As open-weight models continue to advance and fine-tuning gets more targeted, some will be trained to mimic human volatility, including its darker registers. Even so, humor, context, and cultural timing remain stubbornly human. If a response makes you wince and laugh at once, it's unlikely to have come from a machine, not yet, anyway.

By Gregory Zuckerman
Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.