AI’s demand for compute has ballooned, dispersing workloads across a variety of clouds, regions and specialized GPU providers. Storage, however, remains anchored to centralized clouds that were not built for the rigors of distributed compute. A startup founded by the team that built Uber’s data stack is betting on that gap with an AI-native storage network that follows workloads to wherever GPUs call home, and aims to take a chunk of Big Cloud’s business in the process.
What Distributed Storage Means for AI Today
Training multimodal models and serving real-time inference both require fast access to data near compute. Centralized storage becomes a bottleneck when models are scheduled across GPU providers such as CoreWeave, Together AI, and Lambda. The result is higher latency, unpredictable throughput and costly data movement, particularly when teams burst across regions or clouds.
The startup’s pitch is simple: a storage fabric that automatically places and replicates data close to active GPU clusters, reducing egress and round trips. It is designed to handle the high-file-count workloads common in generative systems (image shards, audio chunks, vector embeddings) rather than the large-object bias of traditional cloud buckets. The goal is to cut tail latency for training, inference, and agent-driven applications.
This follow-the-compute approach mirrors the way real-time schedulers place workloads today. If a model spins up in a new region or on a different GPU provider, the storage layer should react quickly rather than force teams to backhaul datasets across an entire continent.
Avoiding the Cloud Tax and the Latency Trap
Egress fees, often referred to as the industry’s cloud tax, have long punished teams that need to run workloads across multiple clouds. Regulators are paying attention: the UK’s communications regulator, Ofcom, cited egress charges and committed-spend discounts as barriers to switching, while the EU’s Data Act pushes providers toward easier portability. Market pressure is also rising from competitors such as Cloudflare’s R2 and Wasabi, which sell low- or zero-egress models.
But fees are only part of the pain. Long-haul data paths inject jitter that can ruin real-time responsiveness. Voice assistants streaming audio and video generation pipelines alike need to operate within sub-100ms round trips; anything beyond that starts to register as noticeable lag. By keeping data near GPUs, the startup hopes to hold tight median and p95 latencies without requiring teams to rewrite application logic.
One customer, Fal.ai, says egress used to account for the majority of its cloud bill. After adopting the distributed storage layer, it says it can scale out across providers while keeping a consistent view of its file system, without paying hefty data-movement fees. That pattern has become more common as teams work around GPU shortages by spreading across clouds.
How the Platform Works Across Clouds and Regions
The company runs its own network of data centers built to handle hot AI workloads. Data placement policies replicate and cache dataset subsets close to active training and inference jobs, then rebalance when compute shifts. Developers work against a single namespace rather than copying data across buckets and regions, as sketched below.
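To make that concrete, here is a minimal sketch of what a single namespace could look like from a developer’s seat, assuming an S3-compatible interface; the article does not specify the actual API, and the endpoint and bucket names below are hypothetical.

```python
# Sketch only: assumes an S3-compatible interface, which the article does not
# confirm. Endpoint and bucket names are hypothetical. Credentials are assumed
# to come from the environment.
import boto3

# One global endpoint instead of per-region buckets; the storage fabric decides
# where the bytes physically live and replicates them near active GPU clusters.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example-fabric.dev",  # hypothetical endpoint
)

# The same call works unchanged whether the job runs on CoreWeave, Lambda, or a
# hyperscaler region: no per-cloud bucket names, no manual sync steps.
obj = s3.get_object(Bucket="training-data", Key="shards/batch-00042.tar")
payload = obj["Body"].read()
print(f"fetched {len(payload)} bytes from the nearest replica")
```

The point of the pattern is that client code carries no region or provider awareness; placement is the fabric’s problem.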
Internally, the system is designed for high concurrency and the small-object performance profile that many traditional object stores struggle with. The architecture leans on parallel I/O, aggressive caching, and locality-aware routing to keep throughput steady when thousands of workers hammer the same dataset. The design goal is predictable performance at scale, not just theoretical peak bandwidth.
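That small-object profile matters because generative workloads tend to issue huge numbers of tiny reads rather than a few large streams. The sketch below, reusing the hypothetical client from above, shows the client-side shape of that traffic; it illustrates the workload, not the platform’s internals.

```python
# Illustrative only: fanning out many small reads in parallel, the access
# pattern a small-object-optimized store has to absorb. Names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3", endpoint_url="https://storage.example-fabric.dev")

def fetch(key: str) -> bytes:
    """Read one small object (an embedding shard, an audio chunk, etc.)."""
    return s3.get_object(Bucket="training-data", Key=key)["Body"].read()

# Thousands of workers issuing reads like this is what stretches tail latency;
# parallel I/O and caching are what keep p95 in check under this load.
keys = [f"embeddings/part-{i:05d}.npy" for i in range(1_000)]
with ThreadPoolExecutor(max_workers=64) as pool:
    chunks = list(pool.map(fetch, keys))
print(f"read {sum(len(c) for c in chunks)} bytes across {len(chunks)} objects")
```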
Crucially, it is multi-cloud out of the box. Instead of tethering data to a single hyperscaler, the platform exposes the same dataset to compute running in different regions and with different providers, with placement logic designed to minimize cross-cloud hops. That decoupling is the fundamental distinction from centralized cloud storage, which is tightly integrated with its own compute.
Who Is Using It, and How Are They Using It?
The company claims more than 4,000 customers, leaning toward generative AI startups building image, video and voice systems where latency effectively is the user experience. Real-time inference pipelines, retrieval-augmented generation, and continual fine-tuning all benefit from data that lives where the GPUs do, especially for teams chasing availability across spot markets.
Control over where data lives is another draw, especially for organizations in healthcare and finance. With residency requirements tightening and sensitivity growing over who controls training data, organizations want clear lines of custody. High-profile episodes, such as Salesforce limiting rivals’ access to Slack data, underscore how strategic data ownership has become in the AI age.
Growth, Footprint, and the Competitive Set
The startup, founded in late 2021, is growing eight times year over year, a pace that mirrors the scramble for GPU capacity. It already has facilities in Virginia, Chicago and San Jose, with plans to expand to London, Frankfurt and Singapore to serve core AI hubs across the US, Europe and Asia. The playbook looks less like a traditional cloud region rollout than a map of GPU availability.
The competitive environment is changing fast. Hyperscalers can shrink egress fees and push storage closer to accelerators. Challenger storage providers sell zero-egress models, while data platforms such as Databricks and Snowflake invest in cross-cloud sharing. To win, the startup has to deliver on price and performance while also offering enterprise-grade safeguards around durability, uptime and recoverability, areas where incumbents have deep track records.
The bet is clear: as compute becomes truly distributed, storage has to evolve from a central anchor into a fluid fabric. If this model consistently lowers egress costs, cuts latency and gives developers a single view of their data across clouds, it will squeeze some of Big Cloud’s most entrenched profit centers and give AI teams the consistency and agility they have lacked.