FindArticles
FindArticles © 2025. All Rights Reserved.

OpenAI Asks Contractors To Upload Real Past Work

By Gregory Zuckerman
Last updated: January 10, 2026 10:02 pm

OpenAI is reportedly asking third-party contractors applying for a gig to submit examples of “real, paid work on production-level projects” that they produced in previous or current jobs. The request signals how desperate the industry has become for high-quality training data and, depending on whom you ask, how slippery ethical standards are around confidentiality, copyright, and risk management.

What OpenAI Is Requesting From Contractors Now

Wired reports that OpenAI and data provider Handshake AI are telling contractors to describe the work they did in their jobs and to upload actual artifacts (Word documents, PDFs, PowerPoint files, spreadsheets, images, or code repositories) rather than summaries. The idea seems to be to fill the corpus with “real-world examples from a specific domain” that reflect everyday office work, such as project proposals, analysis decks, customer emails, and technical documentation.


According to contractors, they are instructed to remove proprietary and personally identifiable information from files before uploading them, and the instructions reference a ChatGPT-powered “Superstar Scrubbing” tool designed to assist with redaction. OpenAI did not respond to Wired’s request for comment.
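
To make the redaction step concrete, here is a minimal sketch of a pattern-based pass over plain text. It is not the tool described in the reporting, whose workings are not public; the function name `redact`, the placeholder labels, and the two patterns (email addresses and US-style phone numbers) are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration only. Real PII detection needs far
# broader coverage (names, addresses, account numbers, client identifiers).
PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each pattern match with its placeholder label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(label, text)
    return text


print(redact("Contact jane.doe@example.com or 555-123-4567."))
```

A pass like this only sees visible text and only catches what its patterns anticipate, so a project codename or a client's pricing model would pass straight through.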

Why Real-World Data Matters For Training AI Models

AI models today are starved for examples that capture the nuance of real workflows. Public web data falls short of that standard: it is too generic, uneven in quality, or missing the formatting, formulas, and context found in business documents. Internal memos, product specs, budget models, and process docs reveal how knowledge work actually gets done. That’s exactly the kind of signal models need to perform tasks such as drafting a market analysis, constructing a financial spreadsheet, or turning meeting notes into an actionable plan.

The data squeeze is real. Analysts such as those at Epoch AI have cautioned that labs could soon hit a data wall for high-quality text, which would push them toward partnerships, licensing deals for existing archives, or bespoke datasets. Gartner says more than 80 percent of business data is unstructured, meaning PDFs, slides, and emails. That material makes up the bulk of enterprise knowledge, yet it is out of reach for training on public web text. Paying contractors to curate realistic examples is one way to narrow that gap without outright raiding a customer’s private corpus.

The Legal and Ethical Tripwires in This Approach

This method is not without risk, even with redaction. Any lab that relies on contractors to decide what’s safe to upload is “placing themselves at great risk,” intellectual-property attorney Evan Brown told Wired. Confidentiality depends on context, and the potential for mishaps is high. Scrubbing tools can overlook subtle information buried in metadata, document revision histories, hidden spreadsheet tabs, or comments. “Just because you black out names and numbers does not remove protections under trade secrets or copyright if the underlying content is still identifiable,” he said.
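
The places where scrubbing can fall short are easy to illustrate. Modern Office files (.docx, .xlsx, .pptx) are ZIP packages of XML parts, so a short script can at least flag the parts that a text-only redaction pass never touches. The sketch below is an assumption-laden illustration, not any vendor's actual check: the function name `find_residual_risks` and the pattern table are hypothetical, and the part names follow the usual Office Open XML layout.

```python
import io
import re
import zipfile

# Parts inside an Office Open XML package that commonly carry information
# a visible-text redaction pass does not see.
RISKY_PART_PATTERNS = {
    "document properties": re.compile(r"^docProps/(core|app|custom)\.xml$"),
    "comments": re.compile(
        r"^(word/comments.*\.xml|xl/comments\d*\.xml|ppt/comments/.+\.xml)$"
    ),
    "embedded objects": re.compile(r"^(word|xl|ppt)/embeddings/.+"),
}

# A worksheet entry marked hidden or veryHidden in the workbook part.
HIDDEN_SHEET = re.compile(rb'<sheet\b[^>]*\bstate="(?:hidden|veryHidden)"')
# Tracked insertions or deletions left in a Word document body.
TRACKED_CHANGE = re.compile(rb"<w:(?:ins|del)\b")


def find_residual_risks(source):
    """Return sorted findings for one OOXML file (path or file-like object)."""
    findings = set()
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            for label, pattern in RISKY_PART_PATTERNS.items():
                if pattern.match(name):
                    findings.add(f"{label}: {name}")
            if name == "xl/workbook.xml" and HIDDEN_SHEET.search(package.read(name)):
                findings.add("hidden worksheet declared in xl/workbook.xml")
            if name == "word/document.xml" and TRACKED_CHANGE.search(package.read(name)):
                findings.add("tracked changes in word/document.xml")
    return sorted(findings)


# Usage with a tiny in-memory package standing in for a real spreadsheet.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("docProps/core.xml", "<coreProperties/>")
    z.writestr(
        "xl/workbook.xml",
        '<workbook><sheets><sheet name="Costs" sheetId="2" state="hidden"/>'
        "</sheets></workbook>",
    )
    z.writestr("xl/comments1.xml", "<comments/>")
for finding in find_residual_risks(demo):
    print(finding)
```

Nearly every Office file contains a document-properties part, so a finding here is a prompt for human review rather than proof of a leak. A clean result also proves little, since identifiable content can sit in the visible text itself.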

There’s a compliance angle, too. Many employment agreements and NDAs (and perhaps just intuition) say you cannot share work product outside your employer, no matter how many private fields are stripped. And copyright doesn’t go away when you remove PII: a report’s structure, phrasing, charts, and novel analysis can all still be protected. Recent lawsuits over AI training, brought by news publishers, authors, and image libraries, are a reminder that permission and provenance matter. Those cases concern large-scale dataset use, but the same principle applies to curated uploads.


A Broader Industry Pattern Emerging in AI Training

OpenAI is not the only one seeking better training material. Data vendors and AI labs increasingly employ managed workforces to generate, label, and evaluate complex examples, such as legal reasoning problems, spreadsheet modeling tasks, and software debugging questions, in hopes of improving performance on the tasks that drive enterprise interest. Companies like Scale AI and Surge AI have popularized expert labeling. Others license archives from publishers or arrange access to private repositories. The common thread is that synthetic or scraped text can’t substitute for task-grounded data that reflects actual jobs.

But the line between “representative” and “repurposed” is thin. If contractors recreate templates or paraphrase employer work too closely, models may still be fed protected expression. On the other hand, if the guidance is to invent fictional documents, the resulting dataset may leave out the messy details that make real office documents valuable for training. Striking that balance between authenticity and staying within the law is where things get tricky.

What Contractors and Businesses Should Do Now

For contractors, the safe course is not to upload anything produced under an employment or client agreement unless you own it and have secured the rights to repurpose it.

If you are asked for samples, consider creating short, made-up documents that display your abilities without drawing on previous work product, and strip away anything traceable to an employer, client, or project.

For AI companies, a defensible approach would involve an unambiguous prohibition on employer documents, automated and manual checks that artifacts contain no sensitive material, explicit permissions for any licensed content, and an audit trail that tracks provenance. Human-in-the-loop review needs to consider more than just text; it must also look at metadata and embedded objects. Legal teams will also want policies that hold up if regulators or plaintiffs ask how a dataset was obtained.
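
As a rough illustration of what a provenance audit trail could record, the sketch below ties each accepted artifact to a content hash, the contributor's stated rights basis, and a reviewer. Every name here (`ProvenanceRecord`, `make_record`, `to_log_line`, the allowed rights values) is hypothetical, and what makes a record legally sufficient is a question for counsel, not code.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical set of acceptable rights bases; employer work product is
# deliberately absent, mirroring an outright prohibition.
ALLOWED_RIGHTS_BASES = {"self_authored", "licensed", "public_domain"}


@dataclass(frozen=True)
class ProvenanceRecord:
    sha256: str          # fingerprint of the exact bytes that were accepted
    contributor_id: str  # who submitted the artifact
    rights_basis: str    # why the contributor says they may share it
    reviewed_by: str     # human reviewer who signed off
    recorded_at: str     # UTC timestamp in ISO 8601 form


def make_record(content, contributor_id, rights_basis, reviewed_by, now=None):
    """Build a record, refusing any rights basis outside the allowed set."""
    if rights_basis not in ALLOWED_RIGHTS_BASES:
        raise ValueError(f"rights basis not permitted: {rights_basis!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return ProvenanceRecord(
        sha256=hashlib.sha256(content).hexdigest(),
        contributor_id=contributor_id,
        rights_basis=rights_basis,
        reviewed_by=reviewed_by,
        recorded_at=timestamp,
    )


def to_log_line(record):
    """Serialize one record as a JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(b"sample proposal text", "contractor-17", "self_authored", "reviewer-3")
print(to_log_line(record))
```

Hashing the exact bytes means a later question about a specific file can be answered by recomputing the hash and looking it up, without keeping extra copies of the artifact in the log itself.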

The Bottom Line on Training Data and Ownership Risks

OpenAI’s reported request highlights a fundamental tension in AI development: to train on white-collar tasks, models need real-world, high-signal data, but the best examples are often someone else’s protected work. Until licensed pipelines become the industry norm, or better ways emerge to approximate real-world artifacts without borrowing expression, expect more friction where innovation, privacy, and IP meet.

Gregory Zuckerman is a veteran investigative journalist and financial writer with decades of experience covering global markets, investment strategies, and the business personalities shaping them. His writing blends deep reporting with narrative storytelling to uncover the hidden forces behind financial trends and innovations. Over the years, Gregory’s work has earned industry recognition for bringing clarity to complex financial topics, and he continues to focus on long-form journalism that explores hedge funds, private equity, and high-stakes investing.