Lenovo’s AI Inferencing Servers to enable more powerful AI use cases

“Our AI inferencing servers are purpose-built to help enterprises operationalize AI at scale - turning trained models into real-time, actionable intelligence while controlling cost, performance, and energy consumption,” says Sumir Bhatia, President, Asia Pacific, ISG at Lenovo.

At Lenovo Tech World at CES 2026 in Las Vegas, Lenovo unveiled a suite of purpose-built enterprise servers, solutions and services for AI inferencing workloads. The new products not only expand the Lenovo Hybrid AI Advantage portfolio but also aim to remove inferencing bottlenecks to power more AI use cases.

Organizations are expected to sharpen their focus on AI inference infrastructure, with Futurum estimating that the global AI inference infrastructure market will reach US$48.8 billion by 2030. AI inferencing marks a pivotal shift from training Large Language Models (LLMs) to leveraging fully trained models to analyze unseen data and make instant decisions.

In email comments to CRN Asia, Bhatia said inference optimization is where AI starts to unlock real business impact. He added that, according to IDC, AI inferencing can cost up to 15 times more than training over a model’s lifetime, making efficiency at inference a critical CIO priority.

“This is where Lenovo sees the greatest value for customers. Our AI inferencing servers are purpose-built to help enterprises operationalize AI at scale - turning trained models into real-time, actionable intelligence while controlling cost, performance, and energy consumption. With architectures optimized for GPU performance, memory bandwidth, networking, and latency, customers can significantly improve utilization and ROI compared to running inferencing on general-purpose infrastructure,” he explained.

Bhatia also pointed out that the impact is strongest in environments that demand continuous, low-latency decision-making - from enterprise data centers running large language models to distributed edge locations where AI must operate close to the data.

“By removing key bottlenecks that drive inferencing cost and complexity, Lenovo enables customers to scale AI responsibly and sustainably,” he added.

New AI Inferencing Servers

The new AI inferencing servers are designed to cover business workloads of all sizes, featuring state-of-the-art GPU, memory and networking capabilities.

These include the ThinkSystem SR675i, the ThinkSystem SR650i and the ThinkEdge SE455i.

“The portfolio is intentionally broad because inferencing is no longer limited to a few industries - it’s becoming foundational across enterprise operations. That said, certain sectors see immediate and outsized impact,” Bhatia explained.

Bhatia highlighted that in large data center environments, servers like the ThinkSystem SR675i are well suited for running full large language models at scale. The server supports use cases such as advanced simulation in manufacturing, AI-assisted diagnostics and treatment planning in healthcare, and real-time risk and fraud analysis in financial services.

Meanwhile, he said the ThinkSystem SR650i enables high-density GPU inferencing that fits easily into existing data centers, helping organizations scale without major infrastructure disruption.

“At the edge, the ThinkEdge SE455i brings inferencing directly to where data is created. This is critical for industries like retail, telecommunications, and industrial operations, where ultra-low latency and reliability matter. Use cases include real-time personalization, computer vision-driven quality inspection, and network optimization. Processing data locally not only accelerates insights but also improves security and operational resilience by reducing reliance on centralized cloud resources,” he said.

Meeting the server market demand

For Bhatia, the new servers also give Lenovo a stronger presence in the server market, particularly amid rising demand for servers that are both more powerful and more sustainable.

“The market is no longer choosing between power and sustainability. Enterprises now need both, especially as inferencing becomes the dominant AI workload. Lenovo’s inferencing servers are built for this shift. They are designed to deliver higher inferencing performance while improving performance per watt, which directly addresses rising energy costs and data center constraints. With support for Lenovo Neptune liquid cooling, customers can run dense, high-performance AI workloads more efficiently, making large-scale inferencing viable without compromising sustainability goals,” Bhatia explained.

Bhatia also pointed out that what strengthens Lenovo’s position is its end-to-end approach.

“Through the Lenovo Hybrid AI Factory and TruScale consumption models, customers are not just buying servers - they are getting validated infrastructure, services, and flexible scaling to move AI into production with confidence. As enterprises transition from AI pilots to always-on inferencing, this combination positions Lenovo as a long-term infrastructure partner, not just a hardware supplier,” he concluded.