A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud, consolidating data lakes, analytics, CI/CD, and even a good chunk of ERP integration there. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. “Put copilots everywhere,” leadership said. “Start with maintenance, then procurement, then the call center, then engineering change orders.”
The first pilot went live quickly using a managed model endpoint and a retrieval layer in the same public cloud region as its data platform. It worked, and everyone cheered. Then invoices started arriving. Token usage, vector storage, accelerated compute, egress for integration flows, premium logging, premium guardrails. Meanwhile, a series of cloud service disruptions forced the team into uncomfortable conversations about blast radius, dependency chains, and what “high availability” really means when your application is a tapestry of managed services.
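A back-of-envelope model makes the shape of that bill easy to see. The sketch below is illustrative only: every rate, volume, and line item is an assumption for a hypothetical pilot, not the manufacturer’s actual pricing. The point is structural, not numerical: many small meters, one large total.

```python
# Back-of-envelope monthly cost model for a managed-cloud copilot pilot.
# All volumes and unit rates are illustrative assumptions, not real pricing.

MONTHLY_USAGE = {
    "input_tokens_millions": 900,    # retrieval-augmented prompts are token-heavy
    "output_tokens_millions": 120,
    "vector_storage_gb": 2_000,
    "gpu_hours": 1_500,              # accelerated compute for embedding/reranking
    "egress_gb": 8_000,              # integration flows back to ERP and plant systems
    "log_ingest_gb": 3_000,          # premium logging tier
    "guardrail_calls_millions": 40,  # safety screening on every request
}

UNIT_RATES_USD = {  # hypothetical list prices per unit above
    "input_tokens_millions": 3.00,
    "output_tokens_millions": 15.00,
    "vector_storage_gb": 0.25,
    "gpu_hours": 4.50,
    "egress_gb": 0.09,
    "log_ingest_gb": 0.50,
    "guardrail_calls_millions": 25.00,
}

def monthly_bill(usage: dict, rates: dict) -> dict:
    """Multiply each metered quantity by its unit rate, then total the lines."""
    lines = {item: qty * rates[item] for item, qty in usage.items()}
    lines["TOTAL"] = sum(lines.values())
    return lines

if __name__ == "__main__":
    for item, cost in monthly_bill(MONTHLY_USAGE, UNIT_RATES_USD).items():
        print(f"{item:>26}: ${cost:>12,.2f}")
```

No single meter looks alarming on its own, which is exactly why the total surprises: each service is priced as if it were the only one you were renting.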
The final straw wasn’t just cost or downtime; it was proximity. The most valuable AI use cases were the ones closest to the people who build and fix things, and those people work in manufacturing plants with strict network boundaries, tight latency constraints, and operational rhythms that don’t tolerate “the provider is investigating.” Within six months, the company began shifting its AI inference and retrieval workloads to a private cloud near its factories, while keeping model training bursts in the public cloud when it made sense. It wasn’t a retreat. It was a rebalancing.
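One way to picture that rebalancing is as a placement policy: latency-sensitive, plant-adjacent inference and retrieval stays on the near-factory private cloud, while bursty training rents public-cloud capacity and releases it. The sketch below is a hypothetical decision rule under those assumptions; the thresholds, field names, and workload examples are illustrative, not the company’s actual logic.

```python
# Hypothetical workload-placement policy for the hybrid rebalancing described above.
# Thresholds, field names, and example workloads are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    kind: str               # "inference", "retrieval", or "training"
    max_latency_ms: float   # latency budget the consumers of this workload tolerate
    plant_adjacent: bool    # consumed inside a factory's network boundary
    bursty: bool            # short, intense capacity spikes (e.g., training runs)

def place(w: Workload) -> str:
    """Return a target environment for the workload."""
    # Anything the plant floor depends on in real time stays close to the plant:
    # a managed-service outage shouldn't stop people from building and fixing things.
    if w.plant_adjacent and w.max_latency_ms <= 50:
        return "private-cloud (near-factory)"
    # Steady inference and retrieval over operational data also stays private.
    if w.kind in ("inference", "retrieval"):
        return "private-cloud (near-factory)"
    # Bursty training is the natural public-cloud tenant: rent GPUs, then let go.
    if w.kind == "training" and w.bursty:
        return "public-cloud (burst)"
    return "public-cloud"

if __name__ == "__main__":
    for w in [
        Workload("maintenance-copilot", "inference", 40, True, False),
        Workload("procurement-rag", "retrieval", 200, False, False),
        Workload("model-finetune", "training", 10_000, False, True),
    ]:
        print(f"{w.name:>20} -> {place(w)}")
```

The design choice worth noticing is that the rule sorts by workload shape, not by vendor loyalty: proximity and steadiness pull work private, elasticity pushes it public.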
