OpenAI is developing a model that can transform text and audio prompts into original music, The Information reports, in yet another signal of a deeper move into creative audio. Early descriptions indicate that the tool might eventually score videos on the fly or add instrumental backing to vocals, possible workflow solutions for creators, editors, and musicians.
It’s unclear when and how the product will be packaged, but the company is said to be weighing whether to make it a stand-alone app or fold it into existing products like ChatGPT or the Sora video model. One telling detail: OpenAI has reportedly brought in students from the Juilliard School to manually annotate scores, suggesting that the training mix will involve structured musical data rather than raw audio alone.
- What the new AI music tool can do for creators and editors
- Why training data matters for AI-generated music
- A crowded field, and clear benchmarks in music AI
- Technical hurdles and costs of high-fidelity audio AI
- Business model and creator impact for music tools
- What to watch as OpenAI expands into music creation

What the new AI music tool can do for creators and editors
The most practical use cases are easy to picture. Imagine ordering up “an 85 bpm warm lo-fi beat” to sit under a vlog, or asking for “nylon-string guitar in a bossa nova style” behind an existing vocal. The tool could also support style transfer, tempo-aware accompaniment, and adaptive scoring that matches the arc of a given video scene. If OpenAI combines text, audio, and perhaps MIDI prompts, creators could iterate rapidly: hum a melody, describe a vibe, refine with brief textual hints.
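To make that iteration loop concrete, here is a purely hypothetical request shape in Python. No OpenAI music API has been announced; every field name below is invented to illustrate how text, audio, and MIDI conditioning might travel together in one request.

```python
# Hypothetical sketch only: no such API exists. Illustrates combining a
# text brief, a hummed melody, and symbolic conditioning in one request.
request = {
    "prompt": "warm lo-fi beat, 85 bpm, vinyl crackle, mellow Rhodes",
    "melody_audio": "hummed_hook.wav",   # audio conditioning (the hum)
    "midi_sketch": "chords.mid",         # optional symbolic conditioning
    "edits": [
        "add nylon-string guitar, bossa nova feel",
        "leave the existing vocal untouched",
    ],
    "duration_seconds": 60,
}
print(request["prompt"])
```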
A more advanced system could also support stem-aware editing (isolating vocals, bass, and drums), key matching, and dynamic length adjustment (so the music ends on a cadence when the video cuts). These are exactly the pain points of non-musician creators who currently depend on stock libraries and manual edits.
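As a sketch of what dynamic length adjustment involves, the toy Python below picks a bar count so the music resolves on a downbeat near a video’s cut point. The function and numbers are illustrative assumptions, not taken from any shipping tool.

```python
# Minimal sketch: choose how many bars to render so the final downbeat
# lands close to the video's cut point. Assumes a fixed tempo and meter.
def bars_to_fit(cut_seconds: float, bpm: float, beats_per_bar: int = 4) -> int:
    bar_seconds = beats_per_bar * 60.0 / bpm
    return max(1, round(cut_seconds / bar_seconds))

bpm, cut = 85, 47.0                      # the 85 bpm beat; video cuts at 47 s
bars = bars_to_fit(cut, bpm)
cadence = bars * 4 * 60.0 / bpm
print(f"{bars} bars; cadence lands at {cadence:.1f} s (cut at {cut} s)")
# -> 17 bars; cadence lands at 48.0 s (cut at 47.0 s)
```

A production system would presumably also time-stretch or extend the closing phrase rather than simply rounding; the bar-grid arithmetic above is just the skeleton.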
Why training data matters for AI-generated music
OpenAI’s use of annotated scores points to the weight it gives symbolic structure (notes, chords, rhythm, and form) rather than learning from audio waveforms alone. Symbolic grounding tends to improve coherence, reduce looping artifacts, and help the model follow musical conventions over longer durations. In practice, combining symbolic data with paired recordings can yield more musical phrase timing, better instrument timbre, and smoother transitions.
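What symbolic grounding looks like in miniature: the sketch below serializes a two-bar score into discrete tokens, loosely in the style of REMI-like encodings from the music-generation literature. The token format is an assumption for illustration; nothing public describes OpenAI’s actual representation.

```python
# Illustrative tokenization of a two-bar score. Explicit bar and chord
# tokens give a model form and harmony to attend to, not just waveforms.
progression = [  # (bar, chord, melody MIDI pitch, duration in beats)
    (1, "Cmaj7", 64, 2), (1, "Cmaj7", 67, 2),
    (2, "Am7",   69, 2), (2, "Am7",   67, 2),
]

tokens, current_bar = [], 0
for bar, chord, pitch, dur in progression:
    if bar != current_bar:               # bar lines make form explicit
        tokens.append(f"<bar_{bar}>")
        current_bar = bar
    tokens += [f"<chord:{chord}>", f"<note:{pitch}>", f"<dur:{dur}>"]

print(" ".join(tokens))
# <bar_1> <chord:Cmaj7> <note:64> <dur:2> <chord:Cmaj7> <note:67> <dur:2> ...
```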
The data strategy also carries legal and licensing stakes. Rights holders have pushed back hard against music AI models trained on commercial catalogs, and the recording industry has already sued the leading music generators for unauthorized copying of sound recordings. Against that backdrop, curated datasets, licensing deals, and human-provided annotations do more than improve quality; they are risk management.
The stakes are high. According to the IFPI, the global recorded music market was worth approximately $28.6 billion in 2023, with streaming accounting for around 67% of that revenue, roughly $19 billion. Any music AI that scales to consumer or creator markets will intersect that value chain, from composition through production, distribution, and monetization.
A crowded field, and clear benchmarks in music AI
OpenAI is not first. Google’s MusicLM research and YouTube’s experimental Dream Track (built on the Lyria model) demonstrated plausible text-to-music generation and artist-style conditioning in controlled pilots. Meta’s open MusicGen and startup products from Suno and Udio have shown viral-ready song creation within minutes, complete with verses, choruses, and catchy hooks.

Those products established user expectations: fast generation, radio-like fidelity, and editability. They also set the legal guardrails. Startups have battled litigation and come under the microscope over training-data provenance, and platforms are experimenting with content credentials and watermarking to keep synthetic audio traceable. If OpenAI releases a music tool, provenance, opt-outs, and label partnerships will feature prominently.
Technical hurdles and costs of high-fidelity audio AI
High-fidelity audio generation is compute-heavy. One minute of stereo audio at 48 kHz is roughly 5.8 million samples, and generating it interactively, with low latency and editability, is not trivial. Models must manage long-range structure (song sections) alongside spiky transients (drum hits, pick attacks) while maintaining phase coherence so instruments don’t smear.
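The arithmetic in runnable form. The 50-tokens-per-second codec rate at the end is an assumed figure, in the range of published neural audio codecs, not a known property of any OpenAI system.

```python
# Back-of-envelope cost of raw audio vs. a compressed token stream.
sample_rate = 48_000          # samples per second, per channel
channels, seconds = 2, 60     # one minute of stereo

samples = sample_rate * channels * seconds
print(f"{samples:,} raw samples")        # 5,760,000

# Neural codecs shrink the sequence a model must generate. Assumed rate:
codec_tokens_per_sec = 50     # illustrative; real codecs vary
print(f"{codec_tokens_per_sec * seconds:,} codec tokens")  # 3,000
```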
The winning stack will likely combine hierarchical modeling (planning structure at a high level, then rendering with diffusion or autoregressive decoders) with tools for inpainting, melody conditioning, and source separation. If OpenAI weaves this into Sora or other multimodal systems, it could synchronize music to on-screen action using scene embeddings and give creators polished soundtracks without manually preparing cue sheets.
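For intuition, here is a toy two-stage pipeline: a stand-in planner maps a prompt and target duration to a section outline, and a stand-in renderer produces audio per section. It illustrates the hierarchical pattern only; a real system would replace both stand-ins with learned models, and nothing here reflects OpenAI’s actual design.

```python
# Toy hierarchical generation: plan structure first, then render sections.
import math
import struct
from dataclasses import dataclass

@dataclass
class Section:
    label: str        # e.g. "intro", "verse", "chorus"
    seconds: float
    intensity: float  # 0..1, stands in for arrangement density

def plan_structure(prompt: str, duration_s: float) -> list[Section]:
    # Stand-in planner: a fixed pop outline scaled to the target length.
    outline = [("intro", 0.10, 0.3), ("verse", 0.35, 0.5),
               ("chorus", 0.35, 0.9), ("outro", 0.20, 0.4)]
    return [Section(name, frac * duration_s, level)
            for name, frac, level in outline]

def render_section(sec: Section, sample_rate: int = 48_000) -> bytes:
    # Stand-in renderer: a 220 Hz tone whose loudness tracks "intensity".
    n = int(sec.seconds * sample_rate)
    amp = 32767 * sec.intensity
    return b"".join(
        struct.pack("<h", int(amp * math.sin(2 * math.pi * 220 * i / sample_rate)))
        for i in range(n))

def generate(prompt: str, duration_s: float) -> bytes:
    return b"".join(render_section(s) for s in plan_structure(prompt, duration_s))

pcm = generate("warm lo-fi beat", duration_s=8.0)
print(len(pcm) // 2, "mono samples")  # 8 s at 48 kHz -> 384000
```

In a Sora-style integration, scene embeddings would plausibly condition the planner stage, so section boundaries track cuts and action on screen.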
Business model and creator impact for music tools
Pricing will signal intent. A consumer-friendly tier could drive mass adoption among short-form video makers and podcasters, while a “pro” tier with stems, higher bitrates, commercial licensing, and DAW integrations could target studios. Revenue sharing or licensing pools, perhaps in collaboration with publishers and labels, could ease industry tensions and expand access to catalogs.
For musicians, the short-term value is speed: sketching arrangements, auditioning styles, and producing tidy demos. For non-musicians, it is access: a path from prompt to finished track without a detour through music theory. The risk, as ever, is displacement; the opportunity is new work for human creatives who direct and refine AI outputs.
What to watch as OpenAI expands into music creation
Key signals include whether OpenAI signs deals with the major labels and publishers, launches opt-in programs for artists, and ships watermarking or provenance metadata by default. Also watch for integrations with ChatGPT and Sora, DAW plugins, and finer-grained editing tools. If it delivers high-quality, legally sourced generation with controls that stay out of the way, it could quickly establish itself as the default soundbed generator for the internet.