Cool Startup: Moore Threads Powers China’s Domestic GPU Ambitions

As U.S. export controls tighten around high-end silicon, China’s search for a domestic equivalent to Nvidia has shifted from a strategic goal to an existential necessity. While several contenders have emerged, Moore Threads has recently claimed the spotlight not just through engineering, but through one of the most explosive public debuts in the history of the Shanghai Stock Exchange.

The company’s thesis is that the future of computing requires a “Universal GPU,” a single architecture capable of handling the disparate demands of AI training, 3D rendering, and large-scale scientific simulation, independent of Western supply chains.

Background

  • Company: Moore Threads (摩尔线程)
  • Founded: 2020
  • HQ: Beijing, China
  • Employees: ~1,000
  • Funding/Valuation: Raised ~$1.1B in its Dec 2025 IPO; Market Cap ~$7.5B+ (post-IPO surge)
  • Product: Universal GPUs, MUSA software platform, and KUAE AI clusters

Moore Threads was founded by James Zhang, the former Global Vice President and China GM of Nvidia. Leveraging a team of veteran engineers from Nvidia, Microsoft, and Intel, the company moved at a breakneck pace, launching its first production-ready GPU architecture within 18 months of its founding. In late 2025, Moore Threads went public on Shanghai’s STAR Market (Ticker: 688795), where its shares surged over 400% on the first day, fueled by a frenzy for domestic AI hardware.

What Moore Threads Does

Moore Threads builds the hardware and software layers necessary to replace the Nvidia ecosystem in China. Their core innovation is MUSA (Moore Threads Unified System Architecture), a unified programming model designed to be the Chinese answer to Nvidia’s CUDA.

The Product Stack:

  • MTT S-Series (Gaming & Desktop): Consumer-grade GPUs like the MTT S80, which was the first domestic Chinese card to support DirectX 11.
  • MTT S4000 (AI Inference & Training): High-end accelerators for data centers, featuring 25 TFLOPS of FP32 performance and dedicated hardware for Large Language Models (LLMs).
  • KUAE Clusters: Large-scale “computing farms” that link tens of thousands of Moore Threads GPUs into a single fabric for training frontier AI models.
  • MUSIFY: A critical software tool that allows developers to port existing CUDA code to the MUSA platform with minimal manual rewriting, lowering the barrier for teams to switch away from Nvidia.
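To illustrate the kind of work a porting tool like MUSIFY automates, here is a deliberately minimal Python sketch of source-to-source translation: rewriting CUDA runtime identifiers to MUSA-style equivalents. The mapping table below is hypothetical and assumes MUSA mirrors CUDA's naming conventions; the real tool's rule set is far more extensive.

```python
import re

# Hypothetical rename table for illustration only; a real porting tool
# covers the full runtime API, kernel launch syntax, and library calls.
RENAMES = {
    "cuda_runtime.h": "musa_runtime.h",
    "cudaMalloc": "musaMalloc",
    "cudaMemcpy": "musaMemcpy",
    "cudaFree": "musaFree",
    "cudaDeviceSynchronize": "musaDeviceSynchronize",
}

def port_cuda_to_musa(source: str) -> str:
    """Naively rewrite CUDA identifiers to their assumed MUSA counterparts."""
    # Match longest keys first so overlapping names are not half-rewritten.
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(RENAMES, key=len, reverse=True))
    )
    return pattern.sub(lambda m: RENAMES[m.group(0)], source)

cuda_snippet = """#include <cuda_runtime.h>
float *d_buf;
cudaMalloc(&d_buf, 1024);
cudaFree(d_buf);
"""
print(port_cuda_to_musa(cuda_snippet))
```

The appeal of this approach is that teams keep their existing CUDA codebase as the source of truth and treat the MUSA build as a mechanical transformation, which is exactly what lowers the switching cost the article describes.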

Why It Matters: The Threat to Nvidia in China

Nvidia historically controlled over 90% of the Chinese AI chip market. However, a “perfect storm” of factors has positioned Moore Threads as a legitimate threat to that dominance within the region:

  1. The Regulatory Vacuum: US bans on the H100 and B200 chips have left Chinese tech giants (Alibaba, Tencent, ByteDance) with billions in unmet demand. Moore Threads is filling this void by providing hardware that, while currently a generation behind in raw power, is “good enough” and, crucially, available.
  2. Architectural Parity (Huagang Architecture): In late 2025, Moore Threads unveiled its “Huagang” architecture. The company claims its next-gen AI chip, Huashan, will offer compute density and memory bandwidth approaching Nvidia’s Blackwell (B200) levels, supporting clusters of up to 100,000 interconnected chips.
  3. The “Sovereign AI” Mandate: The Chinese government has increasingly mandated that local enterprises move toward 50% domestic chip utilization. Moore Threads’ status as a “national champion” gives it preferential access to state-backed data center projects.

Spec Comparison: The Domestic Challenger vs. The Global Standard

To help visualize where the newly announced Huashan GPU sits in the competitive landscape, here’s a comparison against Nvidia’s high-end Hopper and Blackwell architectures. Note that the Moore Threads figures are forward-looking claims rather than measured results, but they underscore the company’s ambitious targets.

Feature             | Moore Threads Huashan (2025) | Nvidia H200 (Hopper) | Nvidia B200 (Blackwell)
--------------------|------------------------------|----------------------|-------------------------
Architecture        | Huagang ("Flower Harbor")    | Hopper               | Blackwell
Chip Design         | Dual-Chiplet                 | Monolithic           | Multi-Die (Dual-Chip)
Memory Capacity     | Up to 256GB+ HBM             | 141GB HBM3e          | 192GB HBM3e
Memory Bandwidth    | ~8.0 TB/s (claimed)          | 4.8 TB/s             | 8.0 TB/s
Peak FP4 Compute    | ~18-20 PFLOPS                | N/A                  | 20 PFLOPS
Peak FP16/BF16      | ~1.0 PFLOPS                  | 0.99 PFLOPS          | 2.25 PFLOPS
Interconnect        | MTLink (1.3 TB/s)            | NVLink 4 (900 GB/s)  | NVLink 5 (1.8 TB/s)
Max Cluster Size    | 100,000 GPUs                 | 32,768+ GPUs         | 100,000+ GPUs
Software Stack      | MUSA 5.0 (Triton/TileLang)   | CUDA 12.x            | CUDA 12.x + Blackwell Libs

Key Takeaways from the Comparison:

Moore Threads is directly targeting several critical performance vectors. The Huashan GPU’s projected memory capacity (up to 256GB+ HBM) and memory bandwidth aim to address the growing needs of increasingly large language models. Furthermore, its claimed FP4 compute performance places it squarely against Nvidia’s cutting-edge Blackwell architecture, indicating a strategic focus on next-generation quantized AI workloads. The MTLink interconnect and support for massive cluster sizes demonstrate an ambition to build an entire, scalable AI infrastructure, not just standalone chips.
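A quick back-of-envelope roofline calculation shows why the table’s memory numbers matter as much as the headline PFLOPS. The sketch below uses only the claimed Huashan figures from the comparison above (claimed specs, not measured results) and a rough, commonly cited approximation that single-stream LLM decoding performs about 2 FLOPs per weight read.

```python
# Back-of-envelope roofline check using the (claimed) Huashan table figures.
PEAK_FP4_FLOPS = 20e15   # ~20 PFLOPS claimed peak FP4 compute
MEM_BW_BYTES = 8.0e12    # ~8.0 TB/s claimed HBM bandwidth

# Ridge point: the arithmetic intensity (FLOPs per byte moved) at which a
# workload stops being memory-bound and becomes compute-bound.
ridge = PEAK_FP4_FLOPS / MEM_BW_BYTES
print(f"ridge point: {ridge:.0f} FLOPs/byte")  # 2500

# Single-stream LLM decoding reads every weight once per token at roughly
# 2 FLOPs per parameter; with 4-bit (0.5-byte) weights that is ~4 FLOPs/byte,
# far below the ridge point, so decoding is heavily memory-bound.
decode_intensity = 2 / 0.5
attainable = min(PEAK_FP4_FLOPS, decode_intensity * MEM_BW_BYTES)
print(f"attainable decode throughput: {attainable / 1e12:.0f} TFLOPS")  # 32
```

The takeaway: for inference-style workloads, effective throughput is capped by bandwidth long before peak compute is reached, which is why the 8 TB/s and 256GB+ HBM claims, if delivered, would be the more consequential numbers.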

Challenges and Risks

  • The Manufacturing Bottleneck: Under US sanctions, Moore Threads is restricted from using advanced foundries like TSMC. They rely on domestic foundries like SMIC, which face lower yields (reportedly ~20% for 7nm) and higher costs.
  • Ecosystem Inertia: While MUSIFY helps, CUDA’s decade-long lead in libraries and developer familiarity remains a massive “moat” that is difficult to cross.
  • Competitive Crowding: Moore Threads is not alone. Domestic rivals like Biren Technology and MetaX are also racing for the same market share, leading to a war for talent and capital in the Beijing/Shanghai corridors.
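The yield figure in the manufacturing bullet is worth putting into numbers. The sketch below shows how a low yield inflates the cost of each functional die; the ~20% yield comes from the article’s reported figure, while the wafer cost and dies-per-wafer are hypothetical round numbers chosen purely to show the mechanics.

```python
# Illustrative cost-per-good-die arithmetic. The ~20% yield is the article's
# reported figure for domestic 7nm; wafer cost ($10,000) and dies per wafer
# (60) are hypothetical values used only to demonstrate the relationship.

def cost_per_good_die(wafer_cost: float, dies_per_wafer: int, yield_rate: float) -> float:
    """Amortize one wafer's cost over its functional dice."""
    good_dies = dies_per_wafer * yield_rate
    return wafer_cost / good_dies

# Same hypothetical wafer at a mature foundry's ~90% yield vs. ~20%:
mature = cost_per_good_die(10_000, 60, 0.90)
constrained = cost_per_good_die(10_000, 60, 0.20)
print(f"~90% yield: ${mature:,.0f} per good die")
print(f"~20% yield: ${constrained:,.0f} per good die")
print(f"cost multiplier: {constrained / mature:.1f}x")  # 4.5x
```

Because the multiplier is simply the ratio of yields, a 20% yield makes each good die roughly 4.5x more expensive than at 90%, regardless of the assumed wafer price, which is the structural cost disadvantage the bullet describes.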

Final Thoughts

Moore Threads is no longer just a startup; it is the cornerstone of China’s attempt to decouple from the global semiconductor status quo. By building a full-stack ecosystem spanning chips, the MUSA software platform, and KUAE cluster infrastructure, they are betting that the future of Chinese AI will not be written in CUDA, but in MUSA. If their next-generation Huashan chips deliver on their performance promises, the “threat to Nvidia” will shift from a theoretical possibility to a commercial reality across the world’s second-largest economy.
