AWS launches Flexible Training Plans for inference endpoints in SageMaker AI

November 28, 2025

However, the auto-scaling nature of these inference endpoints might not be enough for several situations that enterprises may encounter, including workloads that require low latency and consistent high performance, critical testing and pre-production environments where resource availability must be guaranteed, and any situation where a slow scale-up time is not acceptable and could harm the application or business.

According to AWS, FTPs for inferencing workloads aim to address this by enabling enterprises to reserve instance types and required GPUs, since automatic scaling up doesn’t guarantee instant GPU availability due to high demand and limited supply.

FTPs support for SageMaker AI inference is available in US East (N. Virginia), US West (Oregon), and US East (Ohio), AWS said.