Indeed, the AI assistant embedded in X was found spreading nonsense about the shooting at Bondi Beach: attaching inaccurate labels to videos of the event, wrongly identifying individuals, and conflating separate incidents.
The mistakes, magnified by widely shared clips, underscore the larger challenge of deploying generative AI during fast-moving crises.
What happened as Bondi Beach clips spread widely on X
As videos from the Bondi Beach episode circulated across X, Grok auto-replied under dozens of posts with confidently wrong answers. In one high-profile instance highlighted by Gizmodo, the bot insisted that a clip of a man neutralizing an attacker actually showed an old incident in which someone scaled a palm tree in a parking lot to trim branches, a claim that made no sense in context.
In another reply under separate footage, Grok claimed the video showed storm surge at Currumbin Beach, with waves sweeping cars out of a car park. The clip showed no such thing. The assistant also misidentified a photo of the civilian shooter, naming a different man and attributing the identification to his family and to outlets including the Times of Israel and CNN, sources that did not report what the bot claimed.
Making matters worse, Grok combined details from different events, folding parts of a university shooting in the United States into its account of Bondi Beach. The erroneous responses remained visible on the platform, underscoring how little moderation AI-generated content receives when it appears directly below viral posts.
Why chatbots struggle with fast-moving breaking news
Generative models are good at producing fluent text, not at verifying that it is true. When faced with vague or low-context video, they frequently pattern-match to familiar online tropes (palm-tree mishaps, storm footage, memeable stunts) and fill in the gaps with plausible-sounding details. This is the standard “hallucination” failure mode: confident answers with no basis.
Two factors make breaking news especially risky. First, details in fast-moving cases shift quickly, so any cached knowledge or scraped snippets can be outdated or wrong. Second, video understanding in these large language models remains brittle; frames without metadata are easily misread, and a model trained to be helpful tends to speculate rather than say “I don’t know.” These weaknesses have been documented repeatedly in incidents logged on the AI Incident Database and in assessments from media watchdogs, which find they surface most often when models are prompted about breaking news.
Layer platform design on top and the risk balloons. On X, Grok’s answers can appear directly in the replies beneath widely viewed posts, reading like a credible snippet of context. In the absence of clear disclaimers, citations, or explicit uncertainty, many users will interpret those responses as vetted information.
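To make “explicit uncertainty” concrete, here is a minimal sketch of a reply wrapper that prefers an explicit “I can’t verify this” over a confident guess. Everything in it is an assumption for illustration: the DraftReply type, the confidence threshold, and the abstention message are invented for this example and are not Grok’s or X’s actual code.

```python
from dataclasses import dataclass, field


@dataclass
class DraftReply:
    """A hypothetical model output before it is posted (illustrative only)."""
    text: str
    sources: list[str] = field(default_factory=list)  # URLs the model can actually cite
    confidence: float = 0.0                           # self-reported, 0.0 to 1.0


ABSTAIN_MESSAGE = (
    "I can't verify this yet. Details about breaking events change quickly, "
    "so please rely on official statements and established news outlets."
)


def answer_or_abstain(draft: DraftReply, min_confidence: float = 0.8) -> str:
    """Post the draft only if it carries citations and high self-reported confidence;
    otherwise return an explicit 'not enough information' reply."""
    if not draft.sources or draft.confidence < min_confidence:
        return ABSTAIN_MESSAGE
    citations = " ".join(f"[{url}]" for url in draft.sources)
    return f"{draft.text}\n\nSources: {citations}"


# A confident-sounding guess with no sources is suppressed, not published.
print(answer_or_abstain(DraftReply(text="The clip shows a 2019 palm-tree rescue.", confidence=0.9)))
```

The specific threshold matters less than the design choice it encodes: abstention is treated as a first-class output rather than a failure state.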
Why it matters for X and its users during crises
Misinformation during crises can feed panic, misdirect attention from victims and responders, and frustrate law enforcement. Repetition compounds the problem: in 63 out of 100 instances, a second exposure to a fake news item reinforced belief in it, even among people who knew it came from an untrustworthy source, and seeing a corrected version of a fabricated item still confirmed false beliefs in 45 out of 100 cases.
Because Grok and X share common ownership, accountability is intertwined. Regulatory regimes such as the EU’s Digital Services Act require very large platforms to mitigate systemic risks, including disinformation. When a platform-native AI product generates misinformation inside news threads, the response has to reach wider than model tweaks: it needs safeguards at the platform level.
How to minimize harm from AI replies on breaking news now
There are some straightforward fixes that could reduce error rates and limit the fallout; a rough sketch of how they might fit together follows the list:
- Pause or slow AI replies on developing violent events and public-safety incidents until a human moderator has approved them.
- Demand provenance: no assertions about people, places, or dates without citations a reader can check.
- Force uncertainty: if the model cannot confirm a claim, it should withhold judgment or state explicitly that it does not have enough information.
- Strengthen platform fact-checks: surface Community Notes or verified newsroom input before model answers appear.
- Enhance incident reporting: regularly publish transparency metrics about the accuracy of AI replies and the speed of takedowns.
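As a rough illustration only, the gate below shows how those safeguards might combine in a platform-side pipeline. The topic labels, function name, and actions are assumptions made for this sketch, not X’s actual systems.

```python
from enum import Enum, auto


class Action(Enum):
    PUBLISH = auto()           # safe to post immediately
    HOLD_FOR_REVIEW = auto()   # queue for a human moderator
    SUPPRESS = auto()          # do not post at all


# Illustrative topic labels; a real system would use a classifier, not string matching.
CRISIS_TOPICS = {"shooting", "terror attack", "natural disaster", "public safety"}


def gate_ai_reply(topic: str, has_citations: bool, model_uncertain: bool) -> Action:
    """Combine the safeguards above: pause replies on violent or public-safety events,
    demand provenance, and prefer silence over confident speculation."""
    if topic in CRISIS_TOPICS:
        if model_uncertain or not has_citations:
            return Action.SUPPRESS       # force uncertainty: say nothing rather than guess
        return Action.HOLD_FOR_REVIEW    # a human moderator approves before publishing
    if not has_citations:
        return Action.HOLD_FOR_REVIEW    # demand provenance outside crises too
    return Action.PUBLISH


# An uncited claim about a shooting never reaches the feed.
assert gate_ai_reply("shooting", has_citations=False, model_uncertain=False) is Action.SUPPRESS
```

The design choice it encodes is the one the list argues for: on crisis topics, silence or human review beats a fast, uncited reply.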
News organizations offer a model here. The Associated Press and the Poynter Institute recommend strong human oversight, including barring generative tools from producing unconfirmed breaking-news copy. The same discipline would benefit consumer chatbots that live in social feeds.
The bottom line: verification must come before speed
Grok’s Bondi Beach mistakes weren’t edge cases; they were the inevitable result of deploying an assertive text generator in the real-time news cycle. Until platforms and model creators design for verification first, then speed, treat AI replies in live news as unverified assertions rather than context. The fix has less to do with smarter prose and more with clearer guardrails.