Hardware and chip makers scramble to meet AI demands
Every industry today is exploring artificial intelligence (AI).
The enterprises that are forging ahead with the technology want to get models into production — now. They want to rapidly and efficiently train AI. And they want results ASAP.
But that all requires a tremendous amount of energy. Hardware serves as the foundation of every AI application, and it has its physical limitations.
With demand for improved memory and enhanced processors increasing dramatically as AI development accelerates, AI hardware and memory chip providers are scrambling to keep pace.
Demand for data and its quick transport is “just growing in an unrelenting fashion right now,” said Steve Woo, fellow and distinguished inventor with memory interface chip company Rambus. “The amount of data and the value of that data is growing. People care about both securing it and moving it very, very quickly from place to place.”
We need more memory, and we need it yesterday
He added drolly, “This is in stark contrast to human intelligence, which is carbon-based and operates without the need for any transistors.”
The growing challenge is providing that infrastructure with the memory and speed demanded by increasingly complex AI and machine learning (ML) workloads.
“AI/ML involves substantial computational power, data storage and other related tasks,” Fox said. “Furthermore, consumer demand for products leveraging AI technology is growing at an exponential rate.”
Indeed, the global AI market is forecast to grow to an astounding $390.9 billion by 2025, representing a CAGR of 55.6%. In tandem, the generative AI market is set to hit $51.8 billion by 2028, representing a CAGR of nearly 36% from 2023 ($11.3 billion).
In response, companies like Nvidia (whose valuation now sits at $1 trillion and whose CEO, Jensen Huang, has headlined nearly every large tech conference this year) and other AI hardware and semiconductor providers, including Intel, Broadcom and AMD, are clamoring for ever-faster memory.
As Woo put it, “it’s just this unrelenting demand for higher performance.”
Move fast to AI, don’t break things
In AI, a context window refers to the range of tokens a model can consider when generating responses to prompts, Fox explained. This can be likened to short-term memory in the human brain.
Current AI models’ context windows range from 4,000 tokens (approximately 3,000 words) to 32,000 tokens (roughly 24,000 words), and even as high as 100,000 tokens (about 75,000 words).
Consumers are increasingly demanding even larger context windows, which may necessitate retraining some models, Fox said. This is because context window size is determined during pretraining.
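As a rough illustration of the concept, a context window simply bounds how much recent input a model can attend to. The sketch below uses naive whitespace tokens for simplicity; real models use subword tokenizers, so actual token counts differ:

```python
# Illustrative sketch of a context window: the model can only "see"
# the most recent `window` tokens of the conversation. Whitespace
# splitting stands in for a real subword tokenizer here.

def fit_to_context(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens."""
    return tokens[-window:]

history = "the quick brown fox jumps over the lazy dog".split()
visible = fit_to_context(history, window=4)
print(visible)  # ['over', 'the', 'lazy', 'dog']
```

Everything outside the window is invisible to the model, which is why users who want the model to "remember" long documents push for larger windows.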
Furthermore, consumers are seeking faster inferencing, or less time for an AI model to respond to a user’s query or prompt. Techniques designed to accelerate inference, such as distillation (transferring knowledge from a large model to a smaller one), pruning (trimming redundant parameters) and quantization (reducing the numerical precision of model weights), will invariably require more computational resources, he said.
Lastly — and perhaps most significantly — is the matter of scale, Fox said. These technologies are expanding horizontally (that is, more generalized AI products) and vertically (more industry or company-specific applications or services).
Woo agreed that models just keep getting bigger, with some having parameters in the billions and trillions. And in training them, it’s all about throughput.
“The more examples I can push through, the faster I can train a model,” said Woo. “The faster you can deploy a model, the faster you can get to revenue.”
He pointed out that until four or five years ago, training involved a lot of human intervention around labeling, understanding and classifying data. Now with generative AI, neural networks are training neural networks.
“With that, you’re really limited by the speed of the hardware,” said Woo.
Challenges with memory
The primary job of a memory device is to store data, Woo explained. With models getting larger and larger, the memory industry is being asked to produce devices that can store more bits of information.
“But they can only go so big,” he said.
Storing more data means packing in more capacitors, and those capacitors must shrink to fit on fixed-size chips. Smaller capacitors hold fewer electrons, leaving less charge available to reliably represent each bit.
Also, once data is on the chip, it needs to be able to quickly move on and off. Another challenge is managing power and signal integrity (or reliability) of data transmission, said Woo. Technologies must be able to signal faster and rapidly communicate between memory and processors.
“The challenge is how do you find this balance between being able to store your data reliably, but be able to signal very, very quickly as well?” said Woo.
Rambus is working with memory manufacturers to understand what they’re capable of, he said, then design circuits that can go on the processor side “that can absorb this data much more quickly and push the data back out to the memory much more quickly.”
Recovering precious power
Along with this, the amount of energy it takes to move data from a memory chip to a processor is becoming extreme, Woo noted. In the highest-performance memories, about two-thirds of power is used to simply move data between chips. The other one-third is used to get data in and out of capacitors.
This is because connections are long (10 to 15 millimeters), he said. To tackle this problem and recover some of that power, memory providers are exploring the idea of stacking chips on top of each other rather than positioning them side by side. That can shorten connections to 100 or 200 microns.
“So you’re talking about a couple of orders of magnitude shorter connections, which of course makes the power so much better,” said Woo.
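The “couple of orders of magnitude” figure checks out directly from the lengths Woo cites: going from 10–15 mm side-by-side connections down to 100–200 microns of stacked connection is a 50x to 150x reduction:

```python
# Sanity-checking the connection-length claim. Side-by-side chip
# connections run 10-15 mm; stacking shortens them to 100-200 microns.
# All lengths below are in microns.

side_by_side_um = (10_000, 15_000)   # 10-15 mm
stacked_um = (100, 200)              # 100-200 microns

best_case = side_by_side_um[1] / stacked_um[0]   # 150x shorter
worst_case = side_by_side_um[0] / stacked_um[1]  # 50x shorter
print(f"{worst_case:.0f}x to {best_case:.0f}x shorter")
```

Since much of the interconnect power scales with wire length, a reduction of roughly two orders of magnitude in distance translates into substantial energy savings per bit moved.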
There’s much work to be done in the area and it’s not going to happen tomorrow, he emphasized, “but it’s one of the paths of evolution that people are looking at in the memory world.”
He also pointed out that companies like Samsung and SK Hynix are experimenting with placing processing directly into dynamic random-access memory (DRAM). The thinking is that this will provide the shortest connections because “you’re literally right next to where the data’s stored.”
Fox agreed that the development of more energy-efficient chips not only improves performance but helps mitigate significant energy consumption required to train and run ML models.
“Manufacturers are exploring new types of chips with entirely different architectures that are better suited for training and executing neural networks,” he said.
More of everything – more data, more AI
Looking ahead, demand will simply continue to grow in all areas, Woo said. “We’ll need more and it’ll be more of everything,” he said. “It’ll be more capacity, more bandwidth, more power efficiency.”
Chip providers will continue to explore traditional paths to shrink technologies, he said, as well as nontraditional paths including processing in memory and stacking.
Similarly, IEEE researchers identify several areas of opportunity for chip manufacturers, including the following:
- Nonvolatile memory (which can retain stored information even if power is removed)
- Workload-specific AI accelerators
- High-speed interconnected hardware
- High-bandwidth memory
- On-chip memory
- Networking chips
Importantly, “investing in research and development while building relationships with AI software providers will help chip manufacturers capture their share of these markets — if they can meet the coming demand,” IEEE researchers posit.
Woo agreed there are many opportunities and challenges ahead, and that there are both benefits and trade-offs in different systems and techniques.
Ultimately, he said, “what will eventually win in the market will be based on what the end consumers [of AI] really want.”