Technology

OpenAI’s Expensive Supercomputer Is Built From Tens Of Thousands Of Chips

March 13, 2023

Microsoft Corp. agreed to build a massive, cutting-edge supercomputer for the artificial intelligence research startup OpenAI as part of its $1 billion investment in the company in 2019. The main problem: Microsoft didn't have the hardware OpenAI needed and wasn't sure it could build something that big in its Azure cloud service without it breaking.

OpenAI was trying to train an ever-larger set of AI models, which were ingesting bigger volumes of data and learning more and more parameters, the variables an AI system settles on through training and retraining. That meant OpenAI needed sustained access to powerful cloud computing services.

Besides working out how to link together tens of thousands of Nvidia Corp. A100 graphics chips, the workhorse for training AI models, Microsoft had to change how it positions servers on racks to prevent power outages. Scott Guthrie, the Microsoft executive vice president who oversees cloud and AI, wouldn't disclose the project's cost, but said "it's certainly larger" than several hundred million dollars.

"We developed a system design that was capable of reliable operation at a very large scale. That's what made ChatGPT possible," said Nidhi Chappell, general manager of Azure AI infrastructure at Microsoft.

ChatGPT is one model that came out of that work, she said, and there will be many more.

With this technology in place, OpenAI was able to release ChatGPT, the viral chatbot that attracted more than 1 million users within days of its public launch in November and is now being woven into the business models of other companies, from firms run by hedge fund founder Ken Griffin to food delivery service Instacart Inc. As generative AI tools like ChatGPT win over businesses and consumers, cloud providers such as Microsoft, Amazon.com Inc. and Alphabet Inc.'s Google will come under growing pressure to ensure their data centers can deliver the enormous computing power required.

Microsoft now uses the same set of resources it built for OpenAI to train and run its own large AI models, including the new Bing search bot introduced last month. It also sells the system to other customers. And the software giant is already at work on the next generation of the AI supercomputer, part of an expanded deal with OpenAI in which Microsoft added $10 billion to its investment.

The improvements are available to anyone who wants to train a large language model, Guthrie said. "We didn't build them a custom thing; it started off as a custom thing, but we always built it in a way to generalize it," he said in an interview. "That's definitely improved us as a cloud."

Training a large AI model requires a big cluster of interconnected graphics processors in one place, like the AI supercomputer Microsoft assembled. Once a model is in use, answering all the queries users pose, known as inference, requires a slightly different setup. Microsoft also deploys graphics chips for inference, but those processors, hundreds of thousands of them, are spread across the company's more than 60 data center regions. Microsoft said in a blog post on Monday that it is now adding Nvidia's latest graphics chip for AI workloads, the H100, along with the newest version of Nvidia's InfiniBand networking technology, to move data even faster.

Microsoft is gradually admitting more people from a waitlist to the new Bing, which is still in preview. Guthrie's team holds a daily meeting with the roughly 20 employees they've dubbed the "pit crew," after the mechanics who tune race cars mid-race. The group's job is to find fast ways to bring more computing capacity online and to fix problems as they crop up.

"Hey, anybody have a good idea, let's put it on the agenda today, let's discuss it and let's figure out, OK, can we save a few minutes here? Can we cut the time down a few days?" Guthrie said.

A public cloud depends on hundreds of different parts and pieces, from individual servers and pipes to the concrete for the buildings and assorted metals and minerals, and a delay or shortage in any one of them, however small, can derail everything. Recently the pit crew had to contend with a shortage of cable trays, the basket-like contraptions that hold the cables coming off the machines. So the team designed a new cable tray that Microsoft could manufacture itself or buy elsewhere. They've also worked on ways to squeeze as many servers as possible into existing data centers around the world, so they don't have to wait for new buildings, Guthrie said.

When OpenAI or Microsoft trains a large AI model, the work happens all at once. The job is divided across all the GPUs, and at certain points the units need to talk to each other to share the work they've done. For the AI supercomputer, Microsoft had to make sure the networking gear that handles communication among all the chips could shoulder that load, and it had to write software that makes the best use of the GPUs and the networking equipment. The company has now built software that can train models with trillions of parameters.
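The pattern described here, replicas computing in parallel and periodically exchanging what they've learned, can be illustrated with a toy data-parallel training loop. This is a minimal sketch under assumed simplifications (one scalar weight, a plain Python function standing in for the network's all-reduce collective); it is not Microsoft's or OpenAI's actual training software.

```python
# Toy data-parallel training: the model (one weight) is replicated across
# "workers", each computes a gradient on its own shard of the data, and an
# all-reduce step averages the gradients so every replica applies the same
# update. In a real system the all-reduce is the network-heavy operation.

def local_gradient(weight, shard):
    """Gradient of mean squared error for y = weight * x on one data shard."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the network collective that averages across workers."""
    return sum(values) / len(values)

def train_step(weight, shards, lr=0.01):
    grads = [local_gradient(weight, s) for s in shards]  # computed in parallel
    g = all_reduce_mean(grads)                           # one network exchange
    return weight - lr * g

# Toy data following y = 3x, split across 4 simulated GPUs
data = [(x, 3 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 2))  # converges to 3.0, the true slope
```

At scale, the same exchange is performed by collective-communication libraries over fast interconnects such as InfiniBand, which is why the networking hardware matters as much as the chips.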

Because all of the machines power up at once, Microsoft had to think about where they were placed and where the power supplies were located. Otherwise, Guthrie said, you end up with the data center version of turning on a microwave, a toaster and a vacuum cleaner at the same time in your kitchen.

The company also had to make sure it could cool all of those machines and chips, said Alistair Speirs, director of Azure global infrastructure. It uses evaporation in temperate climates and high-tech swamp coolers in hotter ones.

Microsoft will keep working on custom server and chip designs, and on ways to optimize its supply chain, in order to wring out whatever speed gains, efficiency improvements and cost savings it can.

The model that is currently stunning the world is built on the supercomputer Microsoft started creating a few years ago, Guthrie said. The new models will be built on a new supercomputer now in the works, one that is much larger and will enable even more sophistication, he added.

Author
Cathy Hills
Associate Editor

