
Elon Musk’s xAI built a 100,000-GPU supercluster in just 19 days – normally takes years


Crazy: Few would dispute that Elon Musk is driven. Despite his various detractors, the entrepreneur has built Tesla and SpaceX into major competitors, if not leaders, in their respective industries. That success has come alongside side ventures like Neuralink and the Twitter/X transition. Now, his xAI team has gotten an AI supercluster up and running in just a few weeks.

Elon Musk and his xAI team have seemingly done the impossible. The company built a supercluster of 100,000 Nvidia H200 GPUs in only 19 days. Nvidia CEO Jensen Huang called the feat “superhuman.” Huang shared the incredible story in an interview with the Tesla Owners Silicon Valley group on X.

According to Huang, constructing a supercomputer of this size would take most crews around four years – three years of planning and one year for shipping, installation, and operational setup. However, in less than three weeks, Musk and his team managed the entire process – from concept to full functionality. The xAI supercluster even completed its first AI training run shortly after it was powered up.

Huang was almost at a loss for words, stumbling over his opening before he could describe it.

“First of all, [stammers] some [stammers] 19 days is incredible … Do you know how many days 19 days is? It’s just a couple of weeks. And the mountain of technology, if you were ever to see it, is unbelievable … What they achieved is singular. Never been done before. A supercomputer [of comparable size] that you would build, would take, normally, three years to plan – and then they deliver the equipment, and it takes one year to get it all working.”

Huang conveyed his respect for Musk’s engineering expertise, noting the challenges of integrating Nvidia’s cutting-edge hardware.

“The number of wires that goes into one node … the back of a computer is all wires,” Huang remarked, noting that networking Nvidia equipment requires a different level of complexity than traditional hyper-scale data centers.

The project required installing the GPUs as well as permitting and building an entirely new “X factory,” equipped with the advanced cooling systems and power infrastructure needed to keep the 100,000 GPUs running seamlessly. The coordination between Musk’s engineers and Nvidia’s team was another monumental feat, ensuring that hardware and infrastructure were delivered, installed, and synchronized flawlessly.

“This level of integration has never been done before, and it may not be done again anytime soon,” Huang remarked.

The supercluster represents a massive leap in AI infrastructure, positioning xAI as a serious competitor in AI research and development. With that computational power now available, Musk’s teams could significantly accelerate projects involving advanced neural networks, deep learning, and natural language processing.




