New ClearML Report Reveals Cost and Governance Concerns Dominate as Nearly Half of Enterprises Waste Millions on Underutilized GPU Capacity
ClearML's State of AI Infrastructure Global Survey: Wasted GPU Capacity Costs Millions as 35% of Enterprises Struggle with Utilization, 44% Prioritize Infrastructure Flexibility, and Cost Control and Future-Proofing Top the 2025-2026 Infrastructure Priorities
SAN FRANCISCO, CA / ACCESS Newswire / December 16, 2025 / ClearML, the leading solution for GPU management and unleashing AI in the enterprise, today released its annual State of AI Infrastructure at Scale 2025-2026 report, revealing that enterprises are wasting millions in GPU capacity even as cost control and governance dominate infrastructure strategy. The comprehensive survey of IT leaders and AI infrastructure decision-makers at large enterprises and Fortune 1000 companies found that 35% rank increasing GPU and compute utilization as their top priority for the next 12-18 months, yet 44% of respondents admit to manually assigning workloads to GPUs or having no specific strategy for managing GPU utilization. The result is costly capacity going to waste while AI builders wait for access, slowing innovation and delaying business impact.
Pressures are converging from multiple directions. Cost control is the top workload management challenge cited by 53% of survey respondents and stands as the leading infrastructure planning priority for 2025-2026 (70%). Nearly one-third of organizations also list stronger governance controls across data, models, and compute as their top operational priority. Yet enterprises face a strategic dilemma that threatens their AI investments.
Nearly half of respondents (44%) rate flexibility and avoiding vendor lock-in as "very important" when selecting infrastructure solutions, and 63% report that proprietary dependencies have already directly delayed or constrained their ability to scale AI initiatives. As enterprises navigate the rapid evolution toward AI agents, with 89% planning deployments within six months, the combination of maximizing infrastructure ROI, enforcing governance, and preserving strategic flexibility has become a mission-critical requirement for success.
"Enterprises face critical AI infrastructure challenges from wasted and underutilized GPU capacity costing millions, paired with the rising need for flexibility, security and governance," said Moses Guttmann, CEO and Co-founder of ClearML. "With the survey showing the prioritization of infrastructure flexibility and cost control, enterprises need platforms that deliver both. ClearML's unified approach maximizes utilization across any type of hardware while preserving the flexibility to choose what's best for each workload, without bottlenecks and at enterprise-scale."
Key Findings Reshaping Enterprise AI Infrastructure Strategy:
1. The Operational-Technical Disconnect: Manual Workflows Undermine Advanced Capabilities
Despite large GPU optimization investments, operational bottlenecks persist: only 27% have implemented automated resource-sharing dashboards, while 23% rely on manual ticketing systems for compute provisioning. A notable 35% report that providing resource access to AI/ML teams remains "difficult" or "very difficult," and 31% still manually assign workloads to specific GPUs. Given the pressure on AI builders and IT teams to execute quickly, the research indicates that a significant share of enterprises will struggle to scale AI in both development and production.
2. Cost Concerns Persist Despite GPU Efficiency Advances
Cost control emerged as the overwhelming infrastructure priority, cited by 53% of respondents as the primary workload management challenge and by 70% as the top planning priority for 2025-2026. This persistent economic pressure exists alongside the rising need for GPU optimization solutions, with 35% of enterprise IT leaders identifying maximizing GPU efficiency across existing hardware as their top priority for the next 12-18 months. Better GPU utilization delivers immediate AI infrastructure ROI while eliminating the need to purchase additional hardware to compensate for poor resource management.
3. Flexibility and Future-Proofing AI Infrastructure Investments Become Mission-Critical
Almost half (44%) of survey respondents rated flexibility and avoiding vendor lock-in as "very important," with 63% reporting that proprietary dependencies have directly delayed or constrained AI scaling. This is driving organizations toward multi-cloud strategies (37%) and toward exploring all available hardware options, including training- or inference-specific chips. We conclude that an AI infrastructure control plane that can agnostically manage and observe diverse hardware will become critical to reducing the operational overhead on IT teams and cluster admins.
4. AI Agent Ambitions Outpace Organizational Readiness
While 89% of enterprise IT leaders plan AI agent implementations within six months, split between custom-built (49%) and off-the-shelf solutions (40%), most lack foundational capabilities for success. When asked what operational readiness gaps concern them most, enterprise IT leaders cite security and compliance (53%), insufficient internal expertise (46%), and credential propagation challenges (46%). These concerns underscore the critical need for transparency and control over resource access when launching AI agents at scale.
5. AI Sovereignty and Security Governance Emerge as Key Priorities
Full stack control is critical to achieving AI sovereignty. The ability to prove domestic provenance, development, and deployment of AI requires complete transparency into all aspects of the AI lifecycle. Nearly one-third of organizations identify enforcing stronger user policies, permissions, and governance controls across data, models, and compute resources as their top operational priority. The most pressing security concerns center on credential management: 58% worry about automatic propagation of sensitive credentials to compute nodes, while 38% cite credential sharing between users as a major vulnerability.
Methodology
ClearML surveyed AI infrastructure and IT leadership at global enterprises and Fortune 1000 organizations ranging from 2,000 to 10,000+ employees across North America, Europe, and Asia-Pacific. Respondents included Chief AI Officers, VPs of AI Infrastructure, IT Executives, and Directors of DevOps.
Access the Full Report
The complete State of AI Infrastructure at Scale 2025-2026 report is available for download at https://go.clear.ml/state-of-ai-infrastructure-report-25-26.
About ClearML
As the leading infrastructure platform for unleashing AI in organizations worldwide, ClearML is used by more than 2,100 customers to manage GPU clusters and optimize utilization, streamline AI/ML workflows, and deploy GenAI models effortlessly. ClearML is trusted by more than 300,000 forward-thinking AI builders and IT teams at leading Fortune 500 companies, enterprises, academia, public sector agencies, and innovative start-ups worldwide. To learn more, visit the company's website at https://clear.ml.
Media Contact
Noam Harel
Chief Marketing Officer
ClearML
[email protected]
SOURCE: ClearML, Inc.