Google has removed its AI-powered Overviews from some health-related searches following criticism over how the feature promoted questionable medical information, along with plenty of examples of the AI offering bad advice. The measured reversal illustrates how tenuous generative answers can be when nuance and clinical context count.
The decision was a response to reporting that pointed out that some of the AI-generated summaries displayed what were presented as “normal” numbers for liver function tests without adjusting for factors like age, sex, ethnicity, or lab methods. In subsequent tests, similar queries briefly pulled up AI summaries before they vanished as Google tweaked its results. The company has not commented on which specific removals it has made, saying only that it makes broad-based improvements, and adding that its in-house clinicians reviewed flagged examples and found many to be consistent with high-quality sources. AI Mode in Search, which users prompt directly, remains available to some users.
Why Google scaled back AI in health search results
The immediate concern is patient safety. There is also the risk that quoting a single “normal range” for liver panels lulls a patient into false reassurance when their values sit at the edge of a lab’s population-adjusted range, or causes undue alarm when methodology or demographics account for the difference. The British Liver Trust welcomed the takedown but cautioned that the broader dangers of AI-generated summaries of people’s health issues remain unaddressed.
The stakes are high: Google has said in the past that about 1 in 20 searches are health-related.
That volume amplifies any systematic error, even if most of the summaries draw from decent pages. Distilling complicated guidance into a couple of sentences on screen can obscure subtleties that clinicians consider crucial.
How a seemingly simple reference range can go wrong
Reference intervals for enzymes like ALT, AST, ALP, and gamma-GT differ by laboratory method, equipment manufacturer, age, and sex, and for some of these tests also by ethnicity and pregnancy. Professional bodies (for example, the Royal College of Pathologists) and guidance organizations (such as NICE) recommend using locally validated ranges and interpreting results in clinical context. A single, stand-alone number can lead to miscategorization.
Generative summaries often compress ranges: they average across sources and trim off qualifiers (“depends on the lab,” “higher in adolescents”) to produce a tidy-sounding answer that looks authoritative but isn’t safe.
In health, where outliers can be clinically significant, this summarization bias can misinform both lay readers and rushed clinicians searching on the fly.
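To make the failure mode concrete, here is a minimal sketch, using entirely hypothetical thresholds rather than real clinical values, of how one averaged, “compressed” range can flag a result that a lab’s own validated, demographic-aware range would call normal.

```python
# Minimal sketch with hypothetical numbers throughout; not clinical guidance.
# Shows how a single "compressed" range can misclassify a result that a
# lab's own validated, demographic-aware range would consider normal.

# A single range averaged across sources, the kind an AI summary might quote
COMPRESSED_ALT_RANGE = (7, 40)  # U/L, illustrative only

# Illustrative lab-specific ranges keyed by (sex, assay method);
# real labs print their own validated intervals on the test report
LAB_RANGES = {
    ("male", "assay_a"): (10, 50),
    ("female", "assay_a"): (7, 35),
    ("male", "assay_b"): (12, 55),
}

def classify(value, low, high):
    """Label a result relative to a reference interval."""
    if value < low:
        return "below range"
    if value > high:
        return "above range"
    return "within range"

result = 47  # U/L, hypothetical patient value

# The compressed range flags the result...
print("summary range:", classify(result, *COMPRESSED_ALT_RANGE))  # above range

# ...while the lab's own range for this patient and method does not.
low, high = LAB_RANGES[("male", "assay_b")]
print("lab range:    ", classify(result, low, high))              # within range
```

The gap between the two labels is exactly the kind of nuance a two-sentence summary strips out.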
How this fits into Google’s health content safety playbook
Google’s search quality rater guidelines place medical content in the “Your Money or Your Life” category, demanding high levels of experience, expertise, authoritativeness, and trustworthiness. The company has said that it suppresses AI Overviews for some sensitive topics and that it is constantly fine-tuning when they trigger. Quietly removing AI summaries for certain risky query classes, such as lab reference ranges, is consistent with that policy.
The company also leans on expert review. It says clinicians internally evaluate edge cases to inform adjustments, and it gives preference to citations from established authorities. Google already sources some medical information from organizations like the Mayo Clinic for condition panels in its Knowledge Graph, and it favors structured content written by clinicians over freeform narratives where accuracy is crucial.
What this change means for search users and publishers
After this change, looking up lab ranges surfaces traditional results and authoritative resources rather than a generated overview on top. That can reduce confusion and preserve the lab-specific nuance that AI summaries typically lop off. It also nudges people toward their own test report, where the applicable range is printed, and toward consulting their clinician.
For publishers, the change may modestly restore traffic to high-quality reference pages in sensitive categories. AI Overviews can hoover up clicks by answering inline; removing them for medical queries rebalances visibility toward peer-reviewed guidelines, clinical society pages, and patient charities with relevant expertise.
What to watch next as Google adjusts health AI results
The central questions now are the scope and durability of these measures. Will Google extend removals to other sensitive query classes, such as drug dosing, pregnancy risks, pediatric thresholds, and differential diagnoses, or tweak AI Overviews to carry stronger disclaimers and lab-specific logic? Public health groups and clinicians are expected to push for clearer guardrails, along with transparency about which query classes are excluded from generative answers.
Regulators in various regions have called on major platforms to address the systemic risks of health misinformation. In that context, a disciplined retreat is a matter of product safety and risk management. The practical advice is unchanged: use search to find trusted sources, read the lab’s own reference range, and treat AI summaries, when they appear, as pointers rather than prescriptions.