“I spent the week testing Grokipedia, xAI’s new AI-built encyclopedia pitched as an anti-Wikipedia. It’s fast, slick, and confident—yet my sessions confirmed a simple truth: neither a crowdsourced wiki nor an AI-curated index can guarantee reliability. Each system solves one bias only to inherit another.”

What Grokipedia is and how it works in practice
In practice, Grokipedia looks less like an encyclopedia than a stripped-down search portal, and everything about it is expressly labeled “v0.1.” You enter a topic, scan a list of entries, and click through to a declarative summary, each stamped with a prominent notice that the content was written and “fact-checked” by Grok, xAI’s flagship model. A running counter in the corner showed more than 885,000 entries during my testing, a quick ramp for a new tool.
There are catches, however. Observers at The Verge found some pages quietly noting that they were adapted from Wikipedia under the Creative Commons Attribution-ShareAlike 4.0 license. In other words, the anti-Wikipedia draws on Wikipedia, then routes all future editorial decisions through an LLM. That is probably fine legally, but it complicates the pitch of a fresh-start alternative.
User feedback without editors and how reports are handled
Grokipedia lets users with X accounts flag statements via an “It’s wrong” button, in keeping with the Community Notes approach. Wikipedia relies instead on human editors, talk pages, and transparent revision histories. The first model is algorithm-first with crowd signals; the second is community-first with policy and provenance.
Neither model guarantees ground truth, but each fails differently: Grokipedia is prone to silent model drift, while Wikipedia is prone to editorial capture and inconsistent enforcement of sourcing rules. xAI, for its part, says Grok aims to maximize truth and objectivity.
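As far as I can tell, xAI hasn’t published how those “It’s wrong” flags are weighed. The Community Notes system it echoes is built on a “bridging” idea: a note surfaces only when raters who usually disagree both endorse it. Here is a toy sketch of that rule in Python; the names, thresholds, and data are hypothetical, not Grokipedia’s or X’s actual implementation.

    from collections import defaultdict

    def flag_surfaces(ratings, min_per_cluster=2):
        # Toy "bridging" rule loosely modeled on Community Notes: a flag is
        # surfaced only when raters from more than one viewpoint cluster
        # independently endorse it. Names and thresholds are made up.
        helpful_by_cluster = defaultdict(int)
        for cluster, endorsed in ratings:  # ratings: list of (rater_cluster, bool) pairs
            if endorsed:
                helpful_by_cluster[cluster] += 1
        supporting = [c for c, n in helpful_by_cluster.items() if n >= min_per_cluster]
        return len(supporting) >= 2

    # Endorsements from a single cluster are not enough on their own.
    print(flag_surfaces([("A", True), ("A", True), ("A", True)]))               # False
    print(flag_surfaces([("A", True), ("A", True), ("B", True), ("B", True)]))  # True

The production Community Notes algorithm is open source and considerably more elaborate, using matrix factorization rather than hard cluster counts, but the intent is the same: agreement across camps, not raw vote totals.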
The broader record on LLMs counsels caution. A recent EBU–BBC study indicates that roughly half of news-related chatbot answers contain major issues: hallucinations, outdated facts, or dubious sourcing. That is consistent with what newsrooms see daily: the AI will produce polished, confident copy even when the underlying facts are rotten.

There is also the issue of data. xAI has framed its access to public posts on X as an advantage. However, a study posted on arXiv not long ago found that training on high-engagement, low-quality social content was linked to a decline in output trust scores and a rise in dark traits like manipulativeness. Social platforms are full of signals, but they are also noisy, polarized, and, in many cases, tuned more for virality than for evidence.
How Grok models rank on hallucination and benchmark tests
Independent hallucination leaderboards hosted on GitHub show Grok models performing unevenly across tasks. On the widely watched summarization benchmark, for instance, Grok 2 rated as more hallucination-prone than several peers, while the newer Grok 4 sat around the 99th slot on the leaderboard when I checked, near OpenAI’s o4-mini-high and Microsoft’s Phi-4. These rankings move often, but the takeaway is steady: Grok is competitive, not infallible.
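For context, summarization-style leaderboards typically reduce each model to one number: feed it a fixed set of source documents, collect its summaries, and count how many a judge deems unsupported by the source. Here is a schematic sketch of that scoring loop, with a deliberately crude stand-in judge; real leaderboards use a trained detector model, and none of these names come from any specific project.

    def hallucination_rate(examples, judge):
        # Fraction of summaries the judge flags as unsupported by their source.
        # `examples` is a list of (source_text, model_summary) pairs.
        unsupported = sum(1 for source, summary in examples if not judge(source, summary))
        return unsupported / len(examples)

    def naive_judge(source, summary):
        # Placeholder judge: call a summary supported only if every word in it
        # also appears in the source document. Real judges are far subtler.
        return set(summary.lower().split()) <= set(source.lower().split())

    pairs = [
        ("The cat sat on the mat.", "The cat sat on the mat."),
        ("The cat sat on the mat.", "The dog sat on the mat."),
    ]
    print(hallucination_rate(pairs, naive_judge))  # 0.5

A model’s rank is just this rate sorted against everyone else’s, which is part of why small changes to the judge or the document set can reshuffle neighboring models.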
Bias and provenance: comparing Grokipedia and Wikipedia
Any knowledge base reflects its curators. Wikipedia’s volunteers have long grappled with systemic biases, including topic coverage gaps, demographic skews, and editorial wars. Grokipedia simply shifts the locus of bias to model training choices and prompt-time steering, plus the personal imprint of its most visible backer. In my tests, entries echoed priorities familiar from Elon Musk’s posts. I encountered a page on societal collapse that devoted heavy attention to falling birth rates and the compounding effects of mass immigration, a frame that isn’t prominent in Wikipedia’s long article.
The Grok entry itself repeatedly styled the system as unusually “truthful,” “rebellious,” and free of conventional political biases. That narrative coherence is branding; it isn’t validation. Wikipedia’s strengths are legibility and provenance: you can scrutinize citations, diffs, and talk-page debates, then trace a claim to the source. Its weaknesses are uneven coverage, disputes that simmer, and policies that can be gamed. Grokipedia’s strengths are speed and synthesis at scale; its weaknesses are opaque sourcing, model hallucinations, and the risk of importing platform and authorial biases without clear audit trails.
How to use both without getting burned by bias and errors
Treat either site as a map, not the territory. For breaking or contested topics, triangulate with primary documents, peer-reviewed research, and reputable newsroom reporting. On Wikipedia, check the talk page and the citations before accepting a claim. On Grokipedia, look for licensing notes, compare against multiple sources, and beware of definitive language without attribution.
The practical rule is simple: trust mechanisms, not vibes. Wikipedia’s mechanism is transparent human editorship; Grokipedia’s is model-mediated synthesis plus crowd flags. Until LLMs demonstrably cut hallucinations to near 0% and expose their sourcing with robust audits, they’re best considered intelligent aggregators—useful, quick, and occasionally brilliant, but never a substitute for verification.