GridScale debuts QuantumCache hybrid memory for AI models, raises $160M in Series C

Tech · 6 min read

QuantumCache orchestrates memory across GPU HBM, pooled DDR, and fast NVMe so that activation and parameter shards are served with fewer stalls. Its scheduler applies heuristics that adapt to request patterns and model topology, reducing effective latency and wasted hardware capacity in large-model inference.
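GridScale has not published the scheduler's internals, so the following is only a minimal sketch of how tiered placement of this kind could work, not QuantumCache's actual method. It assumes a simple greedy heuristic: rank shards by access rate per gigabyte and put each one in the fastest tier that still has room. The `Shard` type, the `place_shards` function, the tier capacities, and the sample shard names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    size_gb: float       # assumed to be positive
    hits_per_sec: float  # hypothetical observed access rate


# Tiers ordered fastest to slowest. Capacities (in GB) are illustrative only.
TIERS = [("hbm", 80.0), ("ddr", 512.0), ("nvme", 4096.0)]


def place_shards(shards, tiers=TIERS):
    """Greedy placement: the hottest shards per GB get the fastest tier with room."""
    free = {name: capacity for name, capacity in tiers}
    placement = {}
    # Access density (hits per second per GB) decides who gets scarce fast memory.
    ranked = sorted(shards, key=lambda s: s.hits_per_sec / s.size_gb, reverse=True)
    for shard in ranked:
        for name, _ in tiers:
            if shard.size_gb <= free[name]:
                free[name] -= shard.size_gb
                placement[shard.name] = name
                break
        else:
            # No tier had enough free capacity for this shard.
            raise ValueError(f"no tier can hold shard {shard.name!r}")
    return placement


demo = [
    Shard("attn_layer_0", 40.0, 900.0),
    Shard("attn_layer_1", 40.0, 850.0),
    Shard("mlp_layer_0", 60.0, 300.0),
    Shard("embeddings", 600.0, 5.0),
]
print(place_shards(demo))
```

In this toy run the two hot attention shards fill HBM, the cooler MLP shard lands in pooled DDR, and the large, rarely touched embedding table falls through to NVMe. A production scheduler would also need to re-rank shards as request patterns shift and account for the cost of moving data between tiers, which this sketch ignores.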

GridScale’s $160 million Series C, led by Summit Ridge with participation from hyperscale infrastructure investors, will be used to deploy hardware reference designs, partner with cloud providers, and extend the scheduler to support heterogeneous accelerator fleets.

Benchmarks presented by GridScale showed lower latency for batching-sensitive tasks and lower costs in scenarios where hosting the full model in memory was prohibitive. Enterprise customers with mixed workloads cited improved utilization and a lower total cost of ownership (TCO) for inference.