Over the past two years, enterprises have raced to bolt generative AI onto every conceivable workflow. The payoff, however, remains stubbornly elusive. A review by the MIT Media Lab concluded that most businesses are seeing little measurable return from their deployments, no matter how large the budget or how urgent the executive mandate to “do something with AI.”
One explanation manages to be both simple and enraging: workslop. As researchers at BetterUp Labs wrote in Harvard Business Review, workslop is polished-looking AI output that does not actually move work forward. It looks like finished work at first glance but generates more downstream labor than it saves.

What ‘workslop’ looks like inside modern teams
Operationally, workslop is the AI-authored email a manager has to redraft, the well-structured brief that omits important details, or the chatbot-generated summary that ignores edge cases.
BetterUp Labs estimates that employees spend nearly two hours (roughly 1 hour and 56 minutes) dealing with each instance of this content. The costs do not end with the author; peers and managers get drawn into the cleanup, compounding the drag on productivity.
The researchers tie this back to cognitive offloading: people hand over thinking to their tools. With workslop, that mental load is not absorbed by the machine; it lands on a coworker. According to their results, workslop most often passes between peers (in around 40 percent of cases), and in a significant minority of instances it flows top-down from leaders to their teams.
Scaled across a large organization, the math gets ugly. BetterUp Labs estimates that companies with more than 10,000 employees could lose millions of dollars each year to the churn of workslop. They also estimate that about 40 percent of AI-produced workplace output falls into this mold, a sobering figure if your leaders are using “prompts per day” as a measure of progress.
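To see how quickly those per-incident minutes compound, here is a back-of-the-envelope sketch. The headcount, incident rate, and loaded hourly cost are hypothetical placeholders chosen for illustration, not figures from the BetterUp research.

```python
# Back-of-the-envelope estimate of annual workslop cost for a large organization.
# Every input below is a hypothetical placeholder, not a figure from the research.

HEADCOUNT = 10_000            # employees in the organization
HOURS_PER_INCIDENT = 1.93     # roughly the 1h56m of rework per incident cited above
INCIDENTS_PER_MONTH = 2       # assumed workslop incidents each employee handles monthly
LOADED_HOURLY_COST = 50.0     # assumed fully loaded cost of an employee hour, in dollars

monthly_cost_per_employee = HOURS_PER_INCIDENT * INCIDENTS_PER_MONTH * LOADED_HOURLY_COST
annual_cost = monthly_cost_per_employee * HEADCOUNT * 12

# With these placeholders, the cleanup bill lands well into the millions per year.
print(f"Estimated annual cost of workslop cleanup: ${annual_cost:,.0f}")
```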
The ROI formula that executives fail to see
On paper, generative AI creates time. In practice, much of the time saved in drafting disappears into the black hole of verification, correction, and compliance with brand, legal, and regulatory requirements. Compliance review alone can turn a positive business case negative, especially in finance, healthcare, and the public sector, where small factual inaccuracies carry big financial consequences.
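One way to make the hidden formula explicit is to track net time per artifact rather than gross drafting time saved. The sketch below is illustrative only; the function name and the sample minutes are hypothetical.

```python
# Net value of one AI-drafted artifact: drafting minutes saved minus the downstream
# verification, correction, and compliance-review minutes it triggers.
# The sample numbers are hypothetical and purely illustrative.

def net_minutes_saved(draft_minutes_saved: float,
                      verification_minutes: float,
                      correction_minutes: float,
                      compliance_minutes: float) -> float:
    """Positive means the artifact created time; negative means it consumed time."""
    return draft_minutes_saved - (
        verification_minutes + correction_minutes + compliance_minutes
    )

# Example: 45 minutes saved drafting, followed by 20 minutes of review,
# 25 minutes of correction, and 15 minutes of compliance checking.
print(net_minutes_saved(45, 20, 25, 15))  # -> -15.0, a net loss despite the "time saved"
```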
There’s also error propagation. A subtly flawed AI-generated analysis can sprout an industry-trend slide deck, then a strategy memo, then an entire team’s quarterly plan, and each downstream fix is costlier than the last. Analysts at Gartner have warned for some time that most AI projects never actually get off the ground, let alone scale; every flaw that slips through a deliverable compounds into quality debt.

Meanwhile, surface-level metrics, above all the number of pilots or prompts, hide what matters: cycle time to a verified outcome, edit distance between AI output and final delivery, and the share of work that ships without rework. Measure those, and you can see immediately where value is dissipating.
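As a concrete example of the second metric, a text-similarity ratio can stand in for a true edit distance between the AI draft and the shipped version. The sketch below uses Python's standard difflib; the function name and sample strings are only illustrative.

```python
import difflib

def rework_ratio(ai_draft: str, final_version: str) -> float:
    """Rough share of the AI draft that had to change before delivery.

    0.0 means the draft shipped untouched; values near 1.0 mean it was
    effectively rewritten. difflib's similarity ratio is used here as a
    cheap stand-in for a true edit distance.
    """
    similarity = difflib.SequenceMatcher(None, ai_draft, final_version).ratio()
    return 1.0 - similarity

# Track this per artifact alongside rework minutes and first-pass acceptance rate.
draft = "Q3 revenue grew strongly across all regions."
final = "Q3 revenue grew 4% in EMEA and declined 2% in APAC."
print(f"Rework ratio: {rework_ratio(draft, final):.2f}")
```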
It’s context, not capability, where AI gets stuck
Most of these models are fluent but not situated. They do not magically grok a company’s definitions, risk thresholds, entitlements, or canonical data sources. Without a strong connection to trusted knowledge, via retrieval-augmented generation, well-governed data catalogs, and human review, workslop is inevitable.
That’s why “copilot everywhere” solutions often come up short. Tools that have been trained on general internet text fall flat when it comes to industry-specific nuance. Researchers at Stanford’s Institute for Human-Centered AI and similar academic organizations have repeatedly raised the alarm about hallucinations and brittleness when tasks require grounding in a domain. The gap is not one of prompts; it’s a matter of context.
Where AI does show promise, it is generally because the task is narrowly defined, with clear ground truth and objective success criteria: converting accepted KB articles into draft support responses, for instance, or auto-generating test cases from approved requirements. Softer domains such as strategy, legal interpretation, and customer promises have a way of producing workslop unless guardrails are tight.
Reducing workslop and surfacing real value
- Begin with use cases in which errors are inexpensive and verification is trivial. Prioritize narrow, high-volume, routine tasks with well-defined expected outputs. If a person ends up having to read every line, the automation bar has not been cleared.
- Instrument your deployments. Measure the edit distance between AI output and final content, rework minutes per artifact, and the percentage of AI drafts that survive first review. Publish these metrics internally and let teams kill or repair failing flows fast.
- Create explicit “AI no-go” zones. For anything involving regulatory interpretation, new customer commitments, or sensitive employee communications, opt for human-first drafting, with AI limited to helping with style and formatting.
- Ground models in your organization’s truth. Invest in retrieval pipelines over sanctioned sources with metadata, permissions, and provenance, and close the feedback loop so that reviewer corrections flow back into the knowledge base, not just into the document of the day (a minimal sketch follows this list).
- Clarify ownership. The person who submits AI-generated content remains responsible for its accuracy. Leaders should model this norm; nothing erodes trust faster than executive workslop.
- Train for collaboration, not substitution: when to use AI, how to frame verifiable prompts, and when to set the tool aside. Pair that with small, well-designed pilots and clear sunset criteria; scale only what demonstrably outperforms the baseline in do-no-harm testing.
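To illustrate the grounding idea from the retrieval bullet above, here is a minimal sketch of retrieval-augmented drafting against an approved knowledge base. The in-memory store, keyword scoring, and prompt format are simplified placeholders rather than a production pipeline, which would need a real index, permission checks, and provenance on every passage.

```python
# Minimal sketch of retrieval-augmented drafting over sanctioned sources.
# The in-memory knowledge base and keyword scoring are simplified placeholders.

APPROVED_SOURCES = [
    {"id": "kb-101", "text": "Refunds are available within 30 days of purchase.", "owner": "support"},
    {"id": "kb-207", "text": "Enterprise plans include a 99.9% uptime commitment.", "owner": "legal"},
]

def retrieve(query: str, top_k: int = 2) -> list:
    """Return the approved passages that share the most words with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        APPROVED_SOURCES,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(question: str) -> str:
    """Assemble a prompt that cites source IDs so reviewers can trace every claim."""
    passages = retrieve(question)
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the cited passages; say 'unknown' if they do not cover it.\n"
        f"{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("What is the refund window?"))
```

In a real deployment the same loop would also log which passages were cited, so that reviewer corrections can be written back to the knowledge base rather than lost in the final document.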
The bottom line: AI gains require systems that prevent workslop
Generative AI can speed up work, but only if the system around it (data, processes, metrics, and culture) keeps workslop from proliferating.
Until companies take context and verification as seriously as generation, the value story will remain something of a mirage: much activity, little effect.
