OpenAI’s Sora has gone viral because it can create videos starring you (your face, your voice, your expressions) from nothing more than the short face and voice samples you upload. That wow factor raises a hard question: once you have handed over your likeness, how safe is it, and how much control do you actually retain?
- What Sora Collects, and Why It Needs Your Data
- Where Your Face Lives Within Sora’s Systems
- Deletion and the ChatGPT Link: What to Expect
- Can Your Own Picture Train the Model in Sora
- Real Risks Beyond the Hype of Biometric Video
- How to Protect Yourself If You Try Sora’s Cameo
- The Bottom Line on Sora, Biometrics, and Control

What Sora Collects, and Why It Needs Your Data
For Sora’s Cameo feature to work, OpenAI explains that it needs to save your face and voice data so the system can produce new, consistent clips of you. Think of it as a “faceprint” or “voiceprint”: a numerical representation generated from your samples. Those representations are what make Cameo work, but they can also be classified as biometric data, one of the most sensitive categories under recent privacy regulations.
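To make the “numerical representation” idea concrete, here is a toy sketch of what a faceprint looks like to a machine: a fixed-length vector of numbers that can be compared against new footage. The extract_embedding function below is a made-up stand-in, not any real face model and not how Sora works internally; it only illustrates the shape of the data such a system would store.

```python
# Illustrative only: a "faceprint" is a fixed-length vector of numbers.
# extract_embedding() is a toy stand-in, NOT a real face model or OpenAI code.
import numpy as np

def extract_embedding(image: np.ndarray, dim: int = 128) -> np.ndarray:
    """Toy stand-in for a face-embedding model: reduce an image to a dim-length vector."""
    flat = image.astype(np.float32).ravel()
    # Deterministic projection so the same image always maps to the same vector.
    rng = np.random.default_rng(seed=42)
    projection = rng.standard_normal((flat.size, dim))
    vec = flat @ projection
    return vec / np.linalg.norm(vec)  # unit-normalise, as embedding systems commonly do

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two faceprints (1.0 = effectively the same)."""
    return float(np.dot(a, b))

# Two captures of the same toy "face" produce near-identical vectors.
frame = np.random.randint(0, 255, size=(64, 64, 3))
print(similarity(extract_embedding(frame), extract_embedding(frame)))  # ~1.0
```

The point of the sketch: once such a vector exists, it identifies you as reliably as the source footage does, which is why privacy law treats it as biometric data.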

OpenAI’s general privacy policy says the company holds onto user data “for as long as it is needed to provide products and services, fulfill any applicable legal or reporting requirements, resolve disputes, and enforce agreements.” The company has also indicated there will be user-level controls: you decide who can see your Cameo, and you can revoke it. Those guardrails count, but they do not in themselves answer how long raw uploads, embeddings, and backups linger.
Where Your Face Lives Within Sora’s Systems
OpenAI has not publicly released a detailed, per-service data flow for Sora. Most production AI pipelines encrypt assets in transit and at rest, store them in cloud object storage, and expose them only to constrained services that compute and cache embeddings. The risk isn’t just theoretical: biometric databases are lucrative targets for attackers, and patchy breach-disclosure rules can keep incidents out of public view for a long time.
Security frameworks such as NIST guidelines and ISO standards advise strong key management, rigorous access logging, data minimization, and short retention windows for sensitive identifiers. The closer Sora sticks to those norms, and the more transparent OpenAI is about what happens to source files, embeddings, and derivatives, the more secure your data is in practice.
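To make those norms concrete, here is a minimal, hypothetical sketch of the pattern the standards describe: encrypt a biometric asset before it reaches object storage, and attach a retention deadline so it can be purged on schedule. The helper names and the 30-day window are assumptions for illustration, not a description of OpenAI’s actual pipeline.

```python
# Illustrative sketch only: at-rest encryption plus a retention deadline for a
# biometric asset. Names, layout, and the 30-day window are assumptions, not
# OpenAI's real architecture.
from cryptography.fernet import Fernet
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # hypothetical retention window for raw uploads

def store_biometric_asset(raw_bytes: bytes) -> dict:
    """Encrypt the asset and record when it must be purged."""
    key = Fernet.generate_key()             # in practice, keys live in a KMS/HSM
    ciphertext = Fernet(key).encrypt(raw_bytes)
    return {
        "ciphertext": ciphertext,            # what would go to object storage
        "key": key,                          # stored separately from the data
        "purge_after": datetime.now(timezone.utc) + RETENTION,
    }

def purge_if_expired(record: dict) -> bool:
    """Drop the ciphertext and key once the retention deadline passes."""
    if datetime.now(timezone.utc) >= record["purge_after"]:
        record["ciphertext"] = None
        record["key"] = None                 # destroying the key makes the data unrecoverable
        return True
    return False

record = store_biometric_asset(b"face-and-voice sample bytes")
print(purge_if_expired(record))  # False until the retention window has elapsed
```

The design choice worth noting is the last comment: if keys are destroyed on schedule, even forgotten copies of the encrypted blobs become useless, which is why key management features so prominently in those standards.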
Deletion and the ChatGPT Link: What to Expect
One surprise for early users: deleting your Sora presence appears to cascade across your broader OpenAI account, including ChatGPT and related API access, and you can no longer use the same email or phone number afterward. That tight coupling makes it easier for OpenAI to enforce identity and safety rules, but at the expense of users who want to remove their Cameo data without losing everything else.
Deletion clarity is especially important for biometric data. Best practice is a published retention schedule that tells you how fast the company will purge originals, embeddings, cached outputs, and backups after it learns that you revoked consent or deleted your account.
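For illustration, such a schedule could be as simple as a table mapping each artifact type to a purge window. The sketch below is hypothetical: the categories follow the paragraph above, but the windows are invented and are not OpenAI’s published policy.

```python
# Hypothetical machine-readable retention schedule: artifact type -> how long it
# may persist after a user revokes consent or deletes their account. Windows are
# invented for illustration only.
from datetime import datetime, timedelta, timezone

RETENTION_SCHEDULE = {
    "source_uploads": timedelta(days=0),   # raw face/voice clips: purge immediately
    "embeddings":     timedelta(days=7),   # faceprints and voiceprints
    "cached_outputs": timedelta(days=7),   # generated clips still sitting in caches
    "backups":        timedelta(days=30),  # encrypted backups rotate out last
}

def purge_deadline(artifact: str, revoked_at: datetime) -> datetime:
    """Latest moment the artifact may still exist after revocation."""
    return revoked_at + RETENTION_SCHEDULE[artifact]

revoked = datetime.now(timezone.utc)
for artifact in RETENTION_SCHEDULE:
    print(artifact, "must be gone by", purge_deadline(artifact, revoked))
```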

Can Your Own Picture Train the Model in Sora
Users want to know whether their face or voice is being used to train future systems. OpenAI offers opt-outs for some training in its products, but the nuances matter: service improvement, model training, and operational uses such as abuse detection can be treated differently. Under the GDPR and CPRA, processing biometric data generally requires explicit consent and a lawful basis, and regulators in the EU and UK have pressed companies to separate optional model training from core service delivery.
For deepfake-capable tools, the upcoming EU AI Act will also require clear labeling of synthetic media and more stringent risk management. That doesn’t answer every question about storage, but it does raise the cost of sloppy data practices.
Real Risks Beyond the Hype of Biometric Video
OpenAI acknowledges a nonzero chance of policy-violating outputs, including sexual deepfakes built from someone’s uploaded likeness. Such cases may be rare, but the harm when they happen is severe. History explains the caution: Clearview AI’s mass facial scraping drew data-protection enforcement in the European Union and UK, as well as litigation under Illinois’ Biometric Information Privacy Act, a reminder of how strictly the law treats biometrics.
External attacks aren’t the only concern. Even well-secured systems can suffer from insider misuse, over-broad internal access, or retention creep. Independent audits, external red-teaming, and public transparency reports, practices promoted by organizations such as the IAPP and civil-society watchdogs, are increasingly table stakes for systems that process biometrics.
How to Protect Yourself If You Try Sora’s Cameo
- Use the least data necessary. Upload only as many face and voice samples as you need for good quality, and keep children and bystanders out of your source clips.
- Lock down sharing. To the extent Cameo allows it, limit who can message you and generate videos with your likeness, check activity logs regularly, and revoke tokens or permissions quickly if anything looks off.
- Exercise your rights. Look for settings that let you opt out of model training, and request deletion of the originals, embeddings, and outputs tied to your identity. Depending on where you live, you may also be able to request access to your data and receive a portable copy.
- Plan for account coupling. If deleting Sora means losing your broader OpenAI account, export anything you need from ChatGPT or the API first, and decide whether you’re comfortable with that level of integration before you sign up.
The Bottom Line on Sora, Biometrics, and Control
Sora’s magic runs on biometric data that is entirely yours and extraordinarily sensitive. OpenAI has signalled controls (user consent, revocation, and policy enforcement), but the real measure of safety is service-specific transparency: retention timelines, training uses, auditability, and deletion that covers raw files and embeddings alike. Until that is spelled out, treat your face and voice like a password you can never change: share them sparingly, opt out where feasible, and keep the ability to pull the plug.
