Furiosa AI Unveils New AI Accelerator Server for Inference

In a world still largely governed by NVIDIA’s GPU dominance, Furiosa AI is pushing something different: a purpose-built inference appliance designed for existing data centers rather than massive power budgets. Furiosa is positioning its newly announced NXT RNGD Server as a more efficient alternative for enterprises that want local inference without blowing past their cooling and power constraints.

Background

  • Company: Furiosa AI
  • Founded: 2017
  • Founders: Baek Jun-ho, Jeehoon Kang, and Kim Han-joon
  • Funding: $246M
  • # of Employees: 142 (LinkedIn)
  • Product: AI accelerators (NPUs) for inference

What’s the Big Deal?

Furiosa’s NXT RNGD Server is built around eight RNGD cards, delivering about 4 petaFLOPS of FP8 compute (or, equivalently, roughly 4,000 TOPS of INT8). Alongside that raw throughput, the system emphasizes energy efficiency: it consumes ~3 kW, a stark contrast to high-end GPU systems that often demand 10 kW+ per node.

That efficiency means you can pack more inference throughput into existing racks without revamping your data center’s power or cooling setup. For example, Furiosa claims you can place five RNGD servers in a 15 kW rack that, in many cases, could otherwise host only a single GPU server of comparable scale.
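
To put the density claim in concrete terms, here is a quick back-of-envelope sketch in Python using only the figures quoted above (8 cards per server, ~4 PFLOPS FP8 and ~3 kW per server, a 15 kW rack budget); it is arithmetic, not a benchmark:

CARDS_PER_SERVER = 8
SERVER_FP8_PFLOPS = 4.0          # ~4 petaFLOPS of FP8 per 8-card server (vendor figure)
SERVER_POWER_KW = 3.0            # ~3 kW per server (vendor figure)
RACK_BUDGET_KW = 15.0            # example rack power budget from the article

per_card_pflops = SERVER_FP8_PFLOPS / CARDS_PER_SERVER      # ~0.5 PFLOPS FP8 per card
servers_per_rack = int(RACK_BUDGET_KW // SERVER_POWER_KW)   # 5 servers at ~3 kW each
rack_pflops = servers_per_rack * SERVER_FP8_PFLOPS          # ~20 PFLOPS per 15 kW rack

print(f"{per_card_pflops:.2f} PFLOPS/card, {servers_per_rack} servers/rack, "
      f"~{rack_pflops:.0f} PFLOPS of FP8 per 15 kW rack")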

The memory configuration is generous: 384 GB of HBM3 with 12 TB/s of aggregate bandwidth across the system. At the heart of the hardware is Furiosa’s Tensor Contraction Processor (TCP) architecture, which is intended to support the tensor contractions typical of deep learning workloads more directly, rather than relying solely on traditional matrix-multiply primitives.
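
One way to read those memory numbers: for autoregressive decoding, single-stream throughput is often bounded by how fast the weights can be streamed out of memory for each token. The sketch below is a rough, illustrative upper bound only; the 70B-parameter model size and 1-byte weights are assumptions, not Furiosa figures, and real throughput also depends on batching, KV-cache traffic, and parallelism overheads:

SYSTEM_HBM_TBPS = 12.0      # aggregate HBM3 bandwidth quoted for the 8-card system
MODEL_PARAMS = 70e9         # hypothetical 70B-parameter model (assumption)
BYTES_PER_PARAM = 1.0       # assume 8-bit (FP8/INT8) weights (assumption)

# Bandwidth-bound estimate: each generated token re-reads (roughly) every weight once.
model_bytes = MODEL_PARAMS * BYTES_PER_PARAM
tokens_per_s_bound = (SYSTEM_HBM_TBPS * 1e12) / model_bytes

print(f"~{tokens_per_s_bound:.0f} tokens/s upper bound for a single decode stream")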

The RNGD cards themselves are built on TSMC’s 5 nm process, with each card offering 48 GB of HBM3 and a modest 180 W TDP.

Real-World Use & Partners

Furiosa is making progress. They’ve already announced that LG AI Research is working with RNGD for inference on their EXAONE models. According to LG, RNGD delivers 2.25× better inference performance per watt compared to GPU setups, while meeting latency and throughput needs.

Interestingly, Furiosa also demoed OpenAI’s open-weight gpt-oss-120B, running the model across two RNGD cards using MXFP4 precision.
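
A quick footprint check shows why two 48 GB cards are plausible for a 120B-parameter model at MXFP4: block-scaled 4-bit weights work out to roughly 4.25 bits per parameter (assuming one shared 8-bit scale per 32-element block, per the OCP MX convention), leaving headroom for KV cache and activations. The numbers below are illustrative, not Furiosa’s published figures:

PARAMS = 120e9                   # approximate gpt-oss-120B parameter count
BITS_PER_PARAM = 4 + 8 / 32      # 4-bit values plus one 8-bit scale per 32-element block

weight_gb = PARAMS * BITS_PER_PARAM / 8 / 1e9   # ~64 GB of weights
hbm_gb = 2 * 48                                 # two RNGD cards at 48 GB HBM3 each

print(f"~{weight_gb:.0f} GB of MXFP4 weights vs {hbm_gb} GB of HBM3 across two cards")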

Software & Ecosystem

Hardware alone doesn’t win, so Furiosa backs RNGD with a full software stack: compiler, runtime, profiler, and serving layer. They aim for drop-in compatibility with existing tooling: the SDK supports OpenAI-style APIs, framework integration (PyTorch and others), a range of precisions (BF16, FP8, INT8, INT4, MXFP4), multi-chip scaling with tensor parallelism, and runtime optimization. They also emphasize secure boot, model encryption, containerization, Kubernetes support, and virtualization-friendly features such as SR-IOV.
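
In practice, “OpenAI-style APIs” usually means you can point the standard OpenAI client at a locally served endpoint. The minimal sketch below assumes an OpenAI-compatible server running on the box; the base URL and model name are hypothetical placeholders, not documented Furiosa values:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # hypothetical local serving endpoint
    api_key="not-needed-locally",          # many local servers ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",                # hypothetical model id on the local server
    messages=[{"role": "user", "content": "Summarize why on-prem inference matters."}],
    max_tokens=128,
)
print(response.choices[0].message.content)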

Strengths & Challenges

What they get right:

  • Efficiency: 3 kW versus 10 kW+ per node is a meaningful difference in real-world data centers, making deployment far more practical.
  • Density: More inference capacity per rack means lower infrastructure cost per throughput.
  • Local inference / sovereignty: For enterprises that can’t or won’t send all inference to the cloud, having a capable on-prem server helps.
  • Architectural divergence: By designing a tensor-contraction-native architecture rather than relying on GPUs’ conventional matmul-dominated designs, they may unlock optimizations that are harder to reach on legacy hardware.

Open Challenges:

  • Ecosystem adoption: Can they attract enough software and framework support to make developers comfortable?
  • Comparative latency & real-world edge cases: Efficiency is great, but latency, worst-case behavior, and tail performance (e.g. long context windows) will matter heavily.
  • Model compatibility / migration: Convincing users to port or optimize models (quantization, custom kernels) is always a barrier.
  • Maturity & reliability: As a newer entrant, they’ll need to prove robustness, tooling stability, debugging, and long-term support.

Why It Matters

We’re in a period where inference is becoming as critical as training. Many enterprises are realizing that sending everything to large cloud APIs is either too expensive, too slow, or too risky (data privacy, sovereignty, SLAs). Devices like RNGD push the possibility envelope: what if you could get high-end inference inside an existing data center, without doubling power budgets?

Furiosa AI is staking a claim: that the next wave of AI infrastructure won’t just be about bigger models and faster chips, but about efficient, deployable inference at scale. RNGD is their pitch at that frontier.
