Alibaba Group Holding Limited's stock rose 1.19% to close at $167.05 after the company reported a technological breakthrough in AI infrastructure. Its latest compute pooling solution, Aegaeon, cut the number of Nvidia GPUs needed to serve a set of AI models by 82% in internal testing, placing Alibaba Cloud among the leaders in optimized AI deployment at scale.
Aegaeon allows a single Nvidia H20 GPU to serve up to seven large language models (LLMs) simultaneously. In internal testing, the system reduced the number of GPUs required from 1,192 to 213, a substantial gain in operational efficiency.
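The reported 82% figure follows directly from the two GPU counts above. As a quick check, the arithmetic can be reproduced in a few lines of Python; the function name `gpu_reduction` is chosen here for illustration and does not come from Alibaba's materials.

```python
def gpu_reduction(before: int, after: int) -> float:
    """Return the fractional reduction in GPU count (0.0 to 1.0)."""
    if before <= 0:
        raise ValueError("before must be a positive GPU count")
    return (before - after) / before


# GPU counts reported from Alibaba Cloud's internal testing.
reduction = gpu_reduction(1192, 213)
print(f"{reduction:.1%}")  # 82.1%
```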
Aegaeon achieves this through token-level auto-scaling: scheduling decisions are made as tokens are generated during inference, rather than once per request, so resources can be reallocated across concurrent AI workloads. This lets a GPU switch between models in real time, and the system reduces model-switching latency by 97%. The solution was beta-tested for over three months in Alibaba Cloud’s Bailian marketplace, where it served dozens of models with up to 72 billion parameters without any service degradation.
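To see why switching at token boundaries matters, and why it only pays off when switching is cheap, consider a toy simulation of one GPU shared by several models. This is a simplified sketch, not Aegaeon's scheduler: the round-robin policy, the `simulate` function, and the unit costs are all assumptions made for illustration.

```python
from collections import deque


def simulate(requests, quantum=1, switch_cost=1.0, token_cost=1.0):
    """Simulate one GPU serving one request per model (toy model, hypothetical).

    requests: dict mapping a model name to the number of tokens to generate.
    quantum: tokens generated per turn before the scheduler may switch models;
             1 gives token-level switching, None runs each request to completion.
    Returns (finish_times, loads), where loads counts model loads onto the GPU.
    """
    queue = deque(requests.items())
    clock, loads, loaded = 0.0, 0, None
    finish = {}
    while queue:
        model, remaining = queue.popleft()
        if model != loaded:
            # Pay the switching cost whenever a different model must be loaded.
            clock += switch_cost
            loads += 1
            loaded = model
        step = remaining if quantum is None else min(quantum, remaining)
        clock += step * token_cost
        remaining -= step
        if remaining > 0:
            queue.append((model, remaining))  # back of the line for another turn
        else:
            finish[model] = clock
    return finish, loads


jobs = {"model_a": 100, "model_b": 1}
print(simulate(jobs, quantum=None))  # request-level: model_b waits behind model_a
print(simulate(jobs, quantum=1))     # token-level: model_b finishes almost at once
```

In this toy setup the short request for `model_b` finishes at time 4 with token-level switching, compared with 103 when it must wait for `model_a` to complete, at the price of one extra model load. Raising `switch_cost` quickly erodes that advantage, which is why a large cut in switching latency is central to the approach.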
Alibaba Cloud’s research has shown that only a small number of AI models are used frequently in real-world scenarios. Its data indicated that 17.7% of GPUs were allocated to models that accounted for just 1.35% of total inference requests. Aegaeon addresses this inefficiency by pooling GPUs across models and scaling them dynamically, which keeps utilization consistent and prevents GPUs from sitting idle on underused models. The result is higher throughput and better overall hardware efficiency for enterprise applications.
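The scale of that mismatch is easier to see when the two percentages are turned into a per-GPU load. The calculation below is a rough illustration that assumes every request costs about the same to serve, which the source does not state; the helper name `relative_load` is hypothetical.

```python
def relative_load(request_share: float, gpu_share: float) -> float:
    """Requests handled per GPU, relative to the fleet-wide average (1.0)."""
    if gpu_share <= 0:
        raise ValueError("gpu_share must be positive")
    return request_share / gpu_share


# Shares reported by Alibaba Cloud, in percent.
cold_gpus, cold_requests = 17.7, 1.35

cold = relative_load(cold_requests, cold_gpus)             # rarely used models
hot = relative_load(100 - cold_requests, 100 - cold_gpus)  # everything else

print(f"cold-model GPUs carry {cold:.1%} of the average per-GPU load")
print(f"remaining GPUs carry {hot:.1%} of the average per-GPU load")
print(f"ratio: roughly {hot / cold:.0f}x")
```

Under that assumption, GPUs dedicated to rarely used models handle well under a tenth of the average per-GPU request load, which is the idle capacity that pooling is meant to reclaim.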
Alibaba Cloud co-authored a technical paper describing the system with researchers from Peking University, and the work was presented at the SOSP 2025 conference in South Korea. The paper argues that traditional methods of serving concurrent workloads on GPUs incur excessive costs, a finding that bears on China’s goal of modernizing its AI infrastructure under resource constraints.
Nvidia designed the H20 GPU specifically for AI inference in China, in compliance with U.S. export restrictions. However, recent scrutiny by Chinese regulators over potential security vulnerabilities in the chip has weakened its market position. Against this backdrop, Chinese companies such as Huawei and Cambricon are intensifying efforts to develop domestic GPU alternatives and reduce reliance on foreign technology. Nvidia’s CEO has acknowledged that the company’s market share for advanced AI chips in China has fallen to zero.
Alibaba’s work on Aegaeon strengthens its market position and aligns with national strategies promoting technological self-sufficiency. By reducing its dependence on U.S. chips, Alibaba is solidifying its foothold in China’s rapidly evolving AI ecosystem. The rise in Alibaba’s stock suggests growing investor confidence in the company’s technology-driven approach to cost savings and scalability.
