The AI Factory is both the foundation of AI innovation and a critical bottleneck, enabling rapid progress while exposing the core limitations of existing infrastructure.
But what exactly is the AI Factory, and what does it encompass?
In this series of briefs, we’ll explore the AI Factory, the next-generation data center designed to support power-hungry GPU clusters in extreme high-density racks. We’ll start with an overview, then dive into power, advanced cooling, site selection, and permitting in future Data Center posts.
Defining the AI Factory
An AI Factory is a data center explicitly optimized to house dense GPU clusters and their required network fabric.
A typical high-density setup packs several multi-U GPU servers into a single rack, with each server housing eight or more GPUs. With each modern GPU (such as the Nvidia H100 or H200) drawing up to 700 watts, these racks draw a baseline of 30–45 kW. However, the latest rack-scale systems, like those based on the Nvidia GB200/GB300, can demand between 100 kW and 120 kW per rack, a massive increase over traditional server racks (10–15 kW).
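As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python; the servers-per-rack count and the overhead factor are assumptions for illustration, not measured values.

```python
# Back-of-the-envelope rack power estimate (illustrative figures only).

GPU_WATTS = 700          # per-GPU draw cited for H100/H200-class parts
GPUS_PER_SERVER = 8      # typical dense GPU server
SERVERS_PER_RACK = 4     # assumption for a multi-server, high-density rack
OVERHEAD_FACTOR = 1.35   # assumed CPUs, memory, NICs, fans, power-conversion losses

def rack_power_kw(gpu_watts=GPU_WATTS,
                  gpus_per_server=GPUS_PER_SERVER,
                  servers_per_rack=SERVERS_PER_RACK,
                  overhead=OVERHEAD_FACTOR) -> float:
    """Estimate total rack draw in kilowatts."""
    gpu_load_w = gpu_watts * gpus_per_server * servers_per_rack
    return gpu_load_w * overhead / 1000.0

print(f"Estimated rack draw: {rack_power_kw():.1f} kW")   # ~30 kW for this configuration
```

With four 8-GPU servers per rack and roughly 35% overhead for everything that is not a GPU, the estimate lands near 30 kW, the low end of the range above; denser servers or rack-scale systems push it far higher.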
Building an AI Factory from scratch, or retrofitting an existing data center to support these clusters, comes with severe constraints. Grid capacity and interconnection timelines are the biggest limitations, but others include procuring custom generators, highly specialized UPS systems, and liquid cooling plants sized for a colossal heat load that can exceed 200 MW across a single data center campus.
The Foundation: Power and Cooling Systems
Before we dive deeper, here is a high-level overview of the key systems within a data center.

Data Center Power Chain Flow
| Component Group | Power Flow | Role in the Data Center |
| --- | --- | --- |
| Utility Entrance | Utility → Transformer/Meter → Switchgear (Substation) | Receives, steps down, and manages the primary power feed. |
| Emergency Transfer | ATS connects to the UPS and the Generator | The Automatic Transfer Switch (ATS) monitors the utility and automatically switches the load from the utility source to the generator in case of a power outage. |
| Backup Sources | ATS connects to the UPS and the Generator | The Generator is the long-term backup power source. The UPS (Uninterruptible Power Supply) provides instantaneous bridge power until the generator starts. |
| Instantaneous Power & Conditioning | UPS → PDU/RPP | The UPS converts DC battery power to AC, conditions the power, and supplies the critical load instantaneously. |
| Bulk Distribution | UPS → PDU/RPP | The Power Distribution Unit (PDU) or Remote Power Panel (RPP) takes conditioned bulk power, steps down the voltage, and distributes it to the rows of racks. |
| Rack-Level Distribution | PDU/RPP → rPDU/Rack PDU | The Rack PDU (rPDU) takes the power feed and provides the final monitored outlets (C13/C19) into which the individual servers plug. |
The Redundant Power Chain
The data center power chain is a highly redundant, multi-stage system designed to ensure continuous, clean power reaches the mission-critical IT equipment. It begins at the Utility Entrance, where high-voltage power is received, stepped down by a Transformer, and managed by the main Switchgear at the facility’s substation.
This power then flows to the Automatic Transfer Switch (ATS), which constantly monitors the utility and serves as the emergency failover mechanism. If utility power fails, the ATS instantly commands the Generator to start while switching the load to the Uninterruptible Power Supply (UPS). The UPS provides crucial, instantaneous battery power to bridge the gap until the generator is stable and assumes the long-term load.
This conditioned power is then delivered to the Power Distribution Units (PDUs) or Remote Power Panels (RPPs), which handle bulk distribution and voltage step-down across the data hall. Finally, the power is delivered to the individual Rack PDUs (rPDUs), which sit inside the server cabinets, providing the final outlets to power the compute servers. This sequential process guarantees the servers receive reliable, high-quality power 24/7.
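To make the failover sequence concrete, the toy Python sketch below walks through the same steps in order. The class, method names, and timing are purely illustrative; in a real facility this logic lives in the ATS and switchgear controls, not in application code.

```python
# Minimal sketch of the failover sequence described above (illustrative only).

import time

class PowerChain:
    def __init__(self):
        self.source = "utility"          # utility | ups | generator
        self.generator_running = False

    def on_utility_failure(self, generator_start_s: float = 10.0):
        """Emulate the ATS response to a utility outage."""
        # 1. UPS batteries pick up the critical load instantly.
        self.source = "ups"
        print("Utility lost -> UPS carrying critical load")

        # 2. ATS commands the generator to start and waits for it to stabilize.
        print("Starting generator...")
        time.sleep(generator_start_s)    # stand-in for the real start/stabilize delay
        self.generator_running = True

        # 3. Load transfers to the generator for the duration of the outage.
        self.source = "generator"
        print("Generator stable -> load transferred from UPS to generator")

chain = PowerChain()
chain.on_utility_failure(generator_start_s=0.1)  # shortened delay for the demo
```

The key point the sketch captures is the ordering: the UPS bridges the gap instantly, and only once the generator is stable does it take over the long-term load.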
Cooling: The New Bottleneck
The extreme density of AI hardware makes traditional air cooling obsolete. High-density racks require liquid cooling, typically implemented through:
- Direct-to-Chip (DTC) Liquid Cooling: Coolant is pumped directly to cold plates mounted on the GPUs. This requires in-rack Coolant Distribution Units (CDUs) and large external Chillers and Cooling Towers to reject the heat.
- Immersion Cooling: Entire servers are submerged in a non-conductive dielectric fluid.
The cooling infrastructure for an AI Factory must manage a far greater thermal load than its legacy counterpart, driving a fundamental shift in data hall design and the overall cost structure.
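To give a sense of scale for a direct-to-chip loop, here is a rough flow-rate estimate using the standard heat-transfer relation Q = ṁ · c_p · ΔT. The rack heat load and coolant temperature rise are assumed values for illustration, not a design specification.

```python
# Rough coolant flow estimate for a direct-to-chip loop (water-like coolant assumed).
# Uses Q = m_dot * c_p * delta_T; numbers are illustrative, not a design spec.

RACK_HEAT_LOAD_W = 120_000   # e.g. a GB200/GB300-class rack near the top of its range
CP_WATER = 4186              # J/(kg*K), specific heat of water
DELTA_T = 10.0               # K, assumed coolant temperature rise across the rack

mass_flow_kg_s = RACK_HEAT_LOAD_W / (CP_WATER * DELTA_T)   # ~2.9 kg/s
volume_flow_lpm = mass_flow_kg_s * 60                      # ~1 kg ~= 1 L for water

print(f"Required coolant flow: ~{mass_flow_kg_s:.1f} kg/s (~{volume_flow_lpm:.0f} L/min)")
```

At roughly 2.9 kg/s (about 170 L/min) per 120 kW rack, it is easy to see why CDUs, facility water loops, and external heat rejection now dominate data hall design.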
Navigating the AI Factory Bottlenecks
Procuring a site, securing permits, planning the layout, and designing the power and cooling infrastructure all require major assumptions. Most importantly, the entire build must be future-proof for the next decade, with a clear path to scale from hundreds of megawatts (MW) to multiple gigawatts (GW).
Once a site is selected, the project must obtain permits and approvals from both the city and the surrounding community. Local resistance has become common, and data center projects increasingly face delays over noise concerns, massive water usage, traffic impact, and the optics of non-local operators consuming large amounts of regional power. After approvals are secured, developers can begin ordering equipment and tailoring designs for the target power envelope.
Long-lead equipment is often the biggest bottleneck, especially for systems engineered to support hundreds of megawatts. Supply chain constraints, limited global manufacturing capacity, and surging demand for heavy electrical infrastructure all contribute to extended timelines.
| System Component | Approximate Lead Time | Key Drivers for Long Lead Times |
| --- | --- | --- |
| Large Power Transformer (Utility Interconnect) | 52 – 130+ Weeks (1 to 2.5+ Years) | Custom engineering, copper shortages, size/weight/shipping complexity, manufacturing backlog. This is often the longest lead item. |
| Generators (1 MW+) | 72 – 104+ Weeks (1.4 to 2 Years) | High testing/compliance burden, engine manufacturing capacity, and massive demand for large-scale backup power. |
| Low/Medium Voltage Switchgear | 45 – 80 Weeks (Up to 1.5+ Years) | Custom configuration, global electrical component shortages (circuit breakers, relays), and manufacturing backlogs. |
| UPS Systems (Large Capacity) | 30 – 40 Weeks (7 to 10 Months) | Varies significantly by capacity and technology (rotary vs. static), component availability, and complex integration requirements. |
| Automatic Transfer Switch (ATS) | 45 – 80 Weeks (Up to 1.5+ Years) | Similar to switchgear, dependent on component availability and size/rating for a multi-megawatt system. |
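As a simple way to reason about these figures, the sketch below takes the ranges from the table (treated as illustrative planning numbers, not quotes) and flags the item that drives the procurement critical path when planning to the pessimistic end of each range.

```python
# Small sketch: given the (illustrative) lead-time ranges above, identify which
# long-lead item is likely to drive the procurement critical path.

lead_times_weeks = {                      # (low, high) in weeks, from the table above
    "Large power transformer": (52, 130),
    "Generators (1 MW+)": (72, 104),
    "LV/MV switchgear": (45, 80),
    "UPS systems (large capacity)": (30, 40),
    "Automatic transfer switch": (45, 80),
}

# Plan to the pessimistic end of each range.
critical_item, (_, worst_case) = max(lead_times_weeks.items(), key=lambda kv: kv[1][1])
print(f"Critical-path item: {critical_item} (~{worst_case} weeks, ~{worst_case/52:.1f} years)")

for item, (low, high) in sorted(lead_times_weeks.items(), key=lambda kv: -kv[1][1]):
    print(f"  order {item:<32} at least {high} weeks before the need-by date")
```

Run against these numbers, the transformer comes out on top at roughly 130 weeks (about 2.5 years), which is why it anchors the first takeaway below.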
Strategic Takeaways
- Transformers Drive the Critical Path: The utility-grade transformer that connects the site to the grid often carries a lead time of more than two years (in some cases, up to four years for large units). Developers try to place transformer and medium-voltage switchgear orders as early as possible, sometimes during permitting or even conceptual design.
- Early Utility Engagement is Essential: The grid interconnection process, including feasibility studies and multi-party agreements, can take three to seven years in constrained regions. This administrative process often dictates the overall construction schedule more than procurement or physical build-out.
- Procurement Strategy Matters: Large AI factory developers increasingly rely on framework agreements, locked manufacturing slots, prefabricated power rooms, and power skids to reduce risk. Many also diversify suppliers or shift to modular construction to avoid bottlenecks caused by single-vendor dependence.
Looking Ahead
The AI Factory is emerging as the new industrial plant of the twenty-first century. Unlike traditional facilities, it scales with compute demand that grows in exponential steps. Every advancement in model size or agentic capability increases the need for high-density racks, advanced liquid cooling, and multi-hundred-megawatt campuses. The bottlenecks are no longer algorithms or silicon, but permitting, power, and the slow pace of heavy-infrastructure manufacturing.
Future sections will examine advanced cooling strategies, high-voltage distribution architectures, critical grid challenges, modular construction approaches, and the economics behind the next generation of AI factories.
