OpenAI has released a new open-source safety prompt pack designed to keep AI systems from exposing minors to sexual content and harmful self-harm information, marking one of the most concrete attempts yet to operationalize teen protections across third-party apps built on large models.
- How the New Safety Prompt Pack Translates Youth Protections
- A Companion to Open-Weight Safety Tools from OpenAI
- Why these safeguards matter for teen safety and wellbeing
- What “good” responses look like in real-world practice
- Built with outside partners focused on youth safety
- Limits, law, and the road ahead for teen-focused AI safety
- What developers should do next to implement the prompt pack

How the New Safety Prompt Pack Translates Youth Protections

The pack translates high-level youth safety goals into step-by-step instructions that can be dropped directly into model system prompts. It covers four pressure points for teen users: sexual content and romantic role play, self-harm and suicide risk, hazardous viral challenges, and unhealthy body ideals. Each section contains recommended refusals, safer redirections, and age-tailored explanations that prioritize de-escalation over drama.
OpenAI says the prompts are a sturdier alternative to generic policy summaries that leave too much up to interpretation. Instead of vague rules like “avoid mature content,” developers get explicit patterns such as when to block, when to offer supportive language, and when to nudge users toward trusted adults or professional resources—without straying into clinical diagnosis or determinative advice.
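In practice, “drop-in instructions” means concatenating the pack’s sections into the system prompt a developer already sends to the model. The sketch below illustrates that pattern; the section names and rule text are invented for illustration and are not OpenAI’s actual wording or file layout.

```python
# Hypothetical sketch: assembling a system prompt from prompt-pack sections.
# Section keys and rule text are illustrative stand-ins, not the real pack.

BASE_INSTRUCTIONS = "You are a homework-help assistant for teens."

PACK_SECTIONS = {
    "sexual_content": (
        "Refuse any sexual or romantic role play involving a minor. "
        "Redirect to age-appropriate information about healthy boundaries."
    ),
    "self_harm": (
        "Never provide methods or graphic detail. Use calm, nonjudgmental "
        "language and encourage contacting a trusted adult or professional."
    ),
    "viral_challenges": (
        "Explain the risks of dangerous stunts and refuse to give instructions."
    ),
    "body_image": (
        "Give evidence-based guidance; avoid extreme regimens and "
        "appearance pressure."
    ),
}

def build_system_prompt(base: str, sections: dict) -> str:
    """Append each safety section to the base role as an explicit, named rule."""
    rules = "\n".join(f"- {name}: {text}" for name, text in sections.items())
    return f"{base}\n\nSafety rules (non-negotiable):\n{rules}"

print(build_system_prompt(BASE_INSTRUCTIONS, PACK_SECTIONS))
```

The point of the pattern is that each rule is explicit and named, so a developer can audit exactly which behaviors the model was instructed to follow, rather than relying on a vague “avoid mature content” directive.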
The company’s Under-18 principles, added to its model specification late last year, underpin the pack. They spell out stricter defaults for minors, narrower allowances for sensitive topics, and guidance for age staging (e.g., under 13 versus mid-teens) to prevent blanket filtering that frustrates older adolescents while still protecting younger users.
A Companion to Open-Weight Safety Tools from OpenAI
The prompts arrive alongside OpenAI’s open-weight model gpt-oss-safeguard, which reads a platform’s policy text and reasons about intent rather than relying solely on keyword lists. That matters for edge cases, like distinguishing educational discussion of puberty from sexualized content, or empathetic crisis support from content that could normalize self-harm.
By pairing the prompt pack’s operational rules with a policy-aware classifier, developers can create a two-layer defense: the base model is steered away from risky outputs, and a separate safety model double-checks both user inputs and model responses before anything reaches the screen. This “belt and suspenders” approach is becoming standard in safety-critical deployments.
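A minimal sketch of that two-layer shape is below. The `safety_check` function here is a keyword-matching stub standing in for a policy-aware classifier such as gpt-oss-safeguard, and `generate` stands in for the steered base model; both are assumptions for illustration, not real APIs.

```python
# Illustrative two-layer defense: a steered base model plus an independent
# safety check on both the user input and the draft output. The classifier
# is a stub; a real deployment would have a safety model reason over the
# platform's written policy text instead of matching keywords.

from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    reason: str

def safety_check(text: str, policy: str) -> Verdict:
    # Stub: keyword matching only for this sketch.
    flagged = "explicit" in text.lower()
    return Verdict(allowed=not flagged, reason="stub check against policy")

def generate(prompt: str) -> str:
    # Stand-in for the base model, already steered by safety system prompts.
    return "Here is some age-appropriate study help."

def respond(user_input: str, policy: str) -> str:
    if not safety_check(user_input, policy).allowed:   # layer 1: screen input
        return "I can't help with that, but I can suggest safer resources."
    draft = generate(user_input)
    if not safety_check(draft, policy).allowed:        # layer 2: screen output
        return "I can't share that response."
    return draft

print(respond("help me study for biology", policy="teen safety policy text"))
```

The design choice worth noting is that the output check runs even when the input looks benign, so a jailbreak that slips past the first layer still has to get its result past the second.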
Why these safeguards matter for teen safety and wellbeing
Teens use AI tools for schoolwork, health questions, and relationships—often after hours and alone. Public health bodies have warned that online environments can compound risk. The World Health Organization reports suicide as a leading cause of adolescent death globally, and child-safety organizations regularly log massive volumes of reports on online sexual exploitation.
The larger risk surface sits not with any single consumer chatbot but with the thousands of third-party apps and bots powered by foundation models. Safeguards frequently break in translation, either because policies are too abstract or because developers overcorrect and block legitimate, age-appropriate information. The prompt pack aims to establish a safety floor so useful content gets through while obviously harmful content does not.
What “good” responses look like in real-world practice
In testing scenarios shared by safety practitioners, a well-configured system should reject underage sexual role play and steer conversations to healthy boundaries, without shaming the user. For self-harm disclosures, the model should avoid graphic content and instead respond with calm, nonjudgmental language, encourage seeking help from a trusted adult or professional, and provide supportive coping suggestions that are general and safe.
For body image and diet topics, the prompts encourage age-appropriate, evidence-based guidance that avoids extreme regimens and appearance pressures. When faced with dangerous stunts or challenges, the system should clearly explain the risks and refuse to provide instructions.
Built with outside partners focused on youth safety
The developer pack was crafted with input from Common Sense Media, which has long rated digital experiences for kids and teens, and everyone.ai, a safety-focused research group. Child-safety and digital-wellbeing organizations, including the National Center for Missing & Exploited Children and UNICEF, have advocated for operational guidance that reflects how teens actually use tech, not just idealized scenarios.
Crucially, the materials are open-source. That enables audits, red-teaming, and adaptation for sector-specific contexts like education, youth mental health apps, and family devices. It also lets global researchers benchmark the prompts against regional norms and regulatory expectations.
Limits, law, and the road ahead for teen-focused AI safety
Prompts alone will not eliminate risk. Age assurance remains imperfect, adversarial users evolve quickly, and safety systems can overblock content teens need—like factual sex education or compassionate crisis support. Regulators are raising the bar: the EU’s Digital Services Act demands systemic risk mitigation for platforms, the UK’s Online Safety Act elevates child protection duties, and U.S. policymakers continue to push for stronger guardrails around youth data and design.
Best practice now looks like a stack: age-sensitive prompts, policy-aware safety models, server-side filters, transparent logging, and routine external audits. Independent evaluations from groups such as the Partnership on AI and academic labs can catch blind spots, especially for multilingual and culturally nuanced contexts.
What developers should do next to implement the prompt pack
Teams integrating the pack should start with clear age buckets, define escalation paths for high-risk disclosures, and run adversarial tests focused on underage sexual content and self-harm. Measure false positives and false negatives, not just aggregate block rates. Document decisions in model and safety cards so parents, educators, and auditors can see how the system behaves.
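Measuring false positives and false negatives separately, rather than one aggregate block rate, can be done with a simple confusion count over labeled adversarial test results. The record format below is an assumption for illustration.

```python
# Minimal sketch: confusion counts over an adversarial test set, so that
# overblocking (false positives) and misses (false negatives) are visible
# separately instead of being folded into one aggregate block rate.

def confusion_counts(results):
    """results: list of (should_block, did_block) boolean pairs."""
    fp = sum(1 for should, did in results if did and not should)  # overblocked benign
    fn = sum(1 for should, did in results if should and not did)  # harmful got through
    tp = sum(1 for should, did in results if should and did)
    tn = sum(1 for should, did in results if not should and not did)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Example: two harmful prompts (one missed) and two benign prompts
# (one overblocked).
results = [(True, True), (True, False), (False, False), (False, True)]
print(confusion_counts(results))  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1}
```

Tracking these counts per age bucket and per risk category (sexual content, self-harm, challenges, body image) shows whether a tuning change that reduces misses is quietly inflating overblocking for older teens.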
OpenAI frames the release as a floor, not a ceiling. Still, by turning abstract principles into drop-in instructions—and pairing them with a policy-aware safety model—the company is pushing the ecosystem toward a more consistent baseline for teen protections, exactly where the risks are highest and the margins for error are thinnest.