Akamai takes AI inference to the edge with Nvidia-powered grid across 4,400 locations
Launches AI Grid orchestration in Inference Cloud to route AI workloads across edge, regional and core infrastructure based on cost, latency and performance.
Akamai Technologies has rolled out a distributed AI inference platform built on Nvidia infrastructure, enabling enterprises to run AI workloads across over 4,400 edge locations alongside regional and core data centres.
The company has introduced AI Grid intelligent orchestration as part of its Inference Cloud, positioning it as a control layer that routes AI workloads in real time across its global network.
The platform integrates Nvidia AI Grid reference architecture with Akamai’s edge infrastructure, combining compute, networking and software to support large-scale AI inference.
Akamai is deploying thousands of Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, creating a mix of edge nodes and high-density GPU clusters for different workload requirements.
The company is targeting a shift from centralised AI infrastructure towards a distributed model, where inference runs closer to the point of user interaction instead of relying on distant data centres.
At the centre of the platform is an orchestration engine that acts as a real-time broker for AI requests.
It evaluates latency, cost and performance parameters and routes workloads to the most efficient compute location.
The system optimises key metrics including cost per token, time-to-first-token and throughput, allowing enterprises to balance performance with infrastructure costs.
The company applies techniques including semantic caching and intelligent routing to reduce unnecessary GPU usage.
The platform directs latency-sensitive workloads to edge nodes while reserving high-performance GPU clusters for compute-intensive tasks.
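Akamai has not published the routing logic itself, but the behaviour described above can be sketched as a simple two-step decision: check a semantic cache first, then pick the cheapest tier that still meets the request's latency budget. The tier names, thresholds and cosine-similarity cache below are illustrative assumptions, not the actual Inference Cloud implementation.

```typescript
// Illustrative sketch only: tier names, thresholds and the cosine-similarity
// cache are assumptions, not Akamai's actual routing implementation.

interface Tier {
  name: "edge" | "regional" | "core";
  typicalLatencyMs: number;       // expected round trip plus time-to-first-token
  costPerMillionTokens: number;   // rough serving cost for this tier
}

interface InferenceRequest {
  prompt: string;
  embedding: number[];            // produced upstream by an embedding model
  maxLatencyMs: number;           // latency budget declared by the caller
}

const TIERS: Tier[] = [
  { name: "edge",     typicalLatencyMs: 40,  costPerMillionTokens: 12 },
  { name: "regional", typicalLatencyMs: 120, costPerMillionTokens: 6 },
  { name: "core",     typicalLatencyMs: 300, costPerMillionTokens: 3 },
];

// Semantic cache: reuse a previous answer when a new prompt is close enough
// in embedding space, avoiding a GPU call entirely.
const cache: { embedding: number[]; response: string }[] = [];

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function route(req: InferenceRequest): { decision: string; response?: string } {
  // 1. A semantic cache hit short-circuits GPU usage entirely.
  for (const entry of cache) {
    if (cosine(entry.embedding, req.embedding) > 0.95) {
      return { decision: "cache", response: entry.response };
    }
  }
  // 2. Otherwise pick the cheapest tier that still fits the latency budget;
  //    if nothing fits, fall back to the cheapest tier overall.
  const candidates = TIERS.filter(t => t.typicalLatencyMs <= req.maxLatencyMs);
  const chosen = [...(candidates.length ? candidates : TIERS)]
    .sort((a, b) => a.costPerMillionTokens - b.costPerMillionTokens)[0];
  return { decision: chosen.name };
}
```

Under this kind of scheme, a sub-50 millisecond interactive request would land on an edge node while a long batch summarisation job would drift towards a core GPU cluster, which matches the trade-off the company describes.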
Combines edge, core and GPU clusters for real-time AI
Akamai structures the platform as a continuum of compute across edge and core infrastructure.
At the edge, the company uses its network of over 4,400 locations to process AI requests closer to end users.
The platform supports low-latency inference for use cases such as AI agents, physical AI systems and real-time applications.
It uses serverless compute capabilities, including EdgeWorkers and Akamai Functions, along with caching to maintain consistent performance at the point of interaction.
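As a rough illustration of serving repeated inference results at the point of interaction, the sketch below follows the EdgeWorkers responseProvider pattern. The origin URL, the header used to carry the prompt and the in-memory map are hypothetical assumptions, and the code is a sketch rather than Akamai's reference implementation.

```typescript
// Sketch of an EdgeWorkers-style handler that serves cached inference results
// at the edge. The origin URL, prompt header and per-instance memo map are
// illustrative assumptions, not production code.
import { httpRequest } from 'http-request';
import { createResponse } from 'create-response';

// Hypothetical regional inference endpoint sitting behind the edge node.
const INFERENCE_ORIGIN = 'https://inference.example.com/v1/generate';

// Simple per-instance memo; a real deployment would use a shared cache.
const memo = new Map<string, string>();

export async function responseProvider(request: any) {
  const prompt = request.getHeader('x-prompt')?.[0] ?? '';

  // Serve repeat prompts without another round trip to the GPU tier.
  if (memo.has(prompt)) {
    return createResponse(200, { 'Content-Type': ['application/json'] }, memo.get(prompt)!);
  }

  // Forward the request to the regional or core inference service.
  const upstream = await httpRequest(INFERENCE_ORIGIN, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const body = await upstream.text();
  memo.set(prompt, body);

  return createResponse(200, { 'Content-Type': ['application/json'] }, body);
}
```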
At the core, Akamai runs multi-thousand GPU clusters powered by Nvidia Blackwell architecture to handle large language models, continuous post-training and multi-modal inference workloads that require sustained compute.
The infrastructure runs on Nvidia AI Enterprise software and uses Nvidia BlueField DPUs to accelerate networking and security functions across the platform.
The company says it is already seeing adoption across industries with latency-sensitive requirements.
Gaming companies are using the platform to deliver sub-50 millisecond AI-driven interactions. Financial institutions are deploying it for real-time fraud detection and personalised recommendations during user sessions.
Media companies are using the infrastructure for live transcoding and real-time dubbing, while retailers are deploying AI at point-of-sale and in-store environments.
Akamai said it has secured a $200 million, four-year agreement to deploy a multi-thousand GPU cluster in a metro-edge data centre.
The company positions the platform as a move beyond traditional AI factory models, which rely on centralised GPU clusters for both training and inference.