Texas at heart of Amazon's AI push in United States
Tech titan Amazon is working to step out of Nvidia's shadow with custom "Trainium" chips specially designed for machine learning, as billions of dollars are poured into artificial intelligence (AI).
Amazon subsidiary Annapurna Labs in Austin, Texas, was testing the longevity of its latest generation Trainium during a recent visit by AFP to the facility.
Texas is emerging as a US tech world El Dorado, luring investments with cheap energy, relaxed regulations, tax incentives and reasonably affordable real estate for massive data centers.
Amidst a deafening roar, UltraServers packed with 144 of the Trainium AI-accelerator chips were being put through their paces at Annapurna in a routine check prior to delivery.
After years of relying on suppliers for chips, the e-commerce powerhouse's Amazon Web Services (AWS) cloud computing unit began designing its own, acquiring Israeli startup Annapurna Labs in 2015.
First came Graviton and Inferentia chips in 2018, the former for general cloud computing and the latter for powering AI models.
The first Trainium debuted in 2020, followed by a second generation that touted a big boost in performance.
Trainium 3 chips put into action in December are touted as doubling the capabilities of the second generation despite being smaller than a credit card.
Kristopher King, head of the Annapurna lab in Austin, contended that the latest Trainium chips can cut the cost of developing and running generative AI models by as much as 40 percent compared to using graphics processing units (GPUs) that are now deemed the "gold standard" for AI.
- Failure not an option -
Along with pricing Trainium chips competitively, AWS is out to make reliability a selling point since data centers need to operate non-stop for long stretches at a time.
AI development requires hundreds of thousands of chips operating simultaneously for weeks, according to Annapurna head of engineering Mark Carroll.
"If there's a failure or unavailability during this phase you have to go back, or even start from scratch," Carroll said.
Unlike other major players in AI processors, AWS doesn't sell its chips.
Instead, AWS uses Trainium exclusively in its own data centers, leasing computing capabilities to customers.
AWS opted to customize its chips to harmonize them with its software, particularly its Bedrock platform, which lets customers choose from a wide range of competing AI models from Anthropic, OpenAI and other rivals, according to the lab.
Trainium is positioned as a cost-saving option in an AI market considered "supply constrained" because of insatiable appetite for high-performance GPUs from industry leader Nvidia and competitors such as AMD.
Even though Trainium 3 is only a few months old, Annapurna is already designing a new generation of the chip.
A launch date for Trainium 4 has yet to be disclosed, but Carroll says it will have six times the processing performance of its predecessor.
As Google, Microsoft, OpenAI, Meta and other tech rivals race to field ever-improved AI models, pressure is intense for chips to make the technology smarter, faster, cheaper and less power-hungry.
Nvidia began manufacturing its industry-leading Rubin graphics processing unit less than a year after the release of the then top-of-the-line Blackwell.
The first version of Trainium took about 18 months to create, while the second generation was readied in nine months, and Annapurna is "trying to maintain that pace," Carroll said.
G.Abbenevoli--RTC