The Math Behind the GPU Hustle
Running a "Neocloud"—a specialized cloud provider renting out GPUs for AI training—seems like a modern gold rush. But the underlying economics are incredibly sensitive to utilization and scale.
If you buy a single Nvidia H100 for $25,000 (plus $5,000 in operational expenses over its life) and rent it out at a market rate of $2.30/hour, 100% utilization over the card's 4-year lifespan yields an impressive 28% compounded annual growth rate (CAGR). But if utilization drops below roughly 55%, the return falls to about the 10% you could expect from broad market index funds, and you would be better off buying those instead.
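The arithmetic can be sketched in a few lines. This is a deliberately crude model using only the figures above (all-in cost, hourly rate, lifespan); it ignores the timing of cash flows, taxes, and resale value:

```python
# Back-of-the-envelope return on a single H100, using the figures above:
# $25,000 card + $5,000 operating expenses, $2.30/hour, 4-year lifespan.
COST = 25_000 + 5_000          # all-in cost in dollars
RATE = 2.30                    # hourly rental rate in dollars
HOURS = 24 * 365 * 4           # hours in the 4-year lifespan

def cagr(utilization: float) -> float:
    """Compounded annual growth rate for a given utilization fraction."""
    revenue = RATE * HOURS * utilization
    return (revenue / COST) ** (1 / 4) - 1

print(f"100% utilization: {cagr(1.00):.1%}")  # ~28% CAGR
print(f" 55% utilization: {cagr(0.55):.1%}")  # ~10%, roughly index-fund territory
```

Note how nonlinear the sensitivity is: every point of idle time comes straight out of the return, which is why the rest of this piece is about utilization.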
Because compute is heavily commoditized, customers relentlessly seek the lowest prices. Without unique features, the only way to survive is through operational efficiency.
The Scheduling "Tetris"
Efficiency in a Neocloud largely comes down to scheduling. Unlike traditional cloud computing (SaaS/PaaS) where virtualization makes load balancing simple, "GPU as a Service" operates much closer to the metal.
Customers rarely want just one GPU. Even older models like GPT-3 require more VRAM than a single H100 provides, and trillion-parameter models demand massive parallelization. But distributing a workload across many GPUs over a network introduces severe communication bottlenecks:
- Using 8 GPUs in parallel is only about 77% as efficient as running on a single theoretical mega-card.
- Scaling to 512 GPUs drops efficiency to 74%.
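Another way to read those percentages is as GPU-equivalents of useful work. A tiny calculation using only the efficiency figures quoted above (the figures are taken from the text, not derived here):

```python
# Effective compute delivered by a parallel job, using the scaling
# efficiencies quoted in the text above.
scaling_efficiency = {1: 1.00, 8: 0.77, 512: 0.74}

def effective_gpus(n: int) -> float:
    """GPU-equivalents of useful work an n-GPU job actually delivers."""
    return n * scaling_efficiency[n]

print(effective_gpus(8))    # ~6.16: nearly 2 of 8 GPUs "lost" to communication
print(effective_gpus(512))  # ~379: over 130 GPUs' worth of compute lost
```

For the operator, that lost compute is capacity they paid for but cannot bill at full value.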
Because of these bottlenecks, customers demand their compute in specific geometric shapes—GPUs that physically share the same node and high-speed interconnects. Neocloud operators must play a high-stakes game of Tetris, reserving blocks of interconnected hardware for irregular demand peaks without leaving smaller fragments of the cluster idle.
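The Tetris problem is easiest to see in a toy placement model. This is a minimal sketch under the simplifying assumption that a job must fit entirely within one 8-GPU node (real schedulers also place multi-node blocks across high-speed interconnects; the names and numbers here are invented for illustration):

```python
# Minimal sketch of node-level fragmentation in a GPU cluster.
# A job must land on GPUs that share a node, so free GPUs scattered
# across nodes cannot be combined to serve it.
NODE_SIZE = 8
free_per_node = [3, 2, 3]  # free GPUs remaining on each of three nodes

def can_place(request: int) -> bool:
    """A request is placeable only if some single node has enough free GPUs."""
    return any(free >= request for free in free_per_node)

print(sum(free_per_node))  # 8 — a full node's worth of GPUs free in aggregate...
print(can_place(4))        # False — ...yet no node can host even a 4-GPU job
```

Those stranded fragments are the idle capacity that pushes utilization, and with it the CAGR, below break-even.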
Fragmentation and The Collateralization of Compute
The problem compounds when you factor in hardware diversity. A customer's codebase optimized for Nvidia's CUDA will not seamlessly run on AMD's ROCm or Google's TPUs. If a Neocloud stocks various hardware types to capture a broader market, they risk catastrophic underutilization of specific clusters.
Despite these risks, the sheer demand for compute has created a unique financial phenomenon: asset-backed loans using graphics cards as collateral. High-end GPUs have historically retained their aftermarket value so well that banks are willing to finance companies like xAI and emerging Neoclouds based largely on the residual value of the chips themselves.
As long as the hardware holds its value, the downside of the Neocloud business remains surprisingly limited.