Amazon SageMaker AI now lets customers send small inference requests directly to async endpoints without uploading to S3 first, removing a network round-trip and simplifying code for payloads up to 128,000 bytes.

Amazon AI Blog · 14h ago · 2 min read

3 Key Points

1. What happened: Amazon SageMaker AI Async Inference now accepts a new Body parameter that carries the inference payload inline in the API request itself, so input data no longer has to be uploaded to Amazon S3 before each invocation. The feature supports payloads up to 128,000 bytes and is available in 31 commercial AWS Regions.
2. Why it matters: Previously, every async inference request took two steps (upload to S3, then invoke the endpoint), which added latency, complexity, and cost even for small payloads. The inline approach removes one network round-trip, eliminates the need to provision an input S3 bucket, and avoids per-request S3 PUT charges. It also simplifies error handling, because validation feedback comes back synchronously in the same API call.
3. What to watch: The feature works only for payloads up to 128,000 bytes; larger payloads (images, audio, multi-MB documents) still require the S3 InputLocation approach. Customers can adopt the new Body parameter immediately by updating to the latest Boto3 SDK, with no changes needed to existing async endpoints.
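
The choice between the two paths described above can be sketched in Python. This is a minimal illustration, not the official pattern: `build_async_request` and `MAX_INLINE_BYTES` are hypothetical names, the endpoint name and S3 URI are placeholders, and the sketch assumes that Body accepts raw bytes and that the 128,000-byte limit is inclusive.

```python
import json

# Inline payload limit stated in the summary; treated as inclusive (assumption).
MAX_INLINE_BYTES = 128_000


def build_async_request(endpoint_name, payload, s3_input_uri=None,
                        content_type="application/json"):
    """Build keyword arguments for an async invocation (hypothetical helper).

    Small payloads go inline via Body; larger ones fall back to InputLocation,
    which must point at an object the caller has already uploaded to S3.
    """
    body = json.dumps(payload).encode("utf-8")
    request = {"EndpointName": endpoint_name, "ContentType": content_type}

    if len(body) <= MAX_INLINE_BYTES:
        # Inline path: no S3 upload before the call.
        request["Body"] = body
    elif s3_input_uri is not None:
        # Fallback path: the payload is too large to send inline.
        request["InputLocation"] = s3_input_uri
    else:
        raise ValueError(
            f"payload is {len(body)} bytes, over the {MAX_INLINE_BYTES}-byte "
            "inline limit, and no S3 input URI was given"
        )
    return request


def invoke(endpoint_name, payload, s3_input_uri=None):
    """Send the request with Boto3 (needs AWS credentials and a recent SDK)."""
    import boto3  # imported lazily so the helper above runs without the SDK

    client = boto3.client("sagemaker-runtime")
    return client.invoke_endpoint_async(
        **build_async_request(endpoint_name, payload, s3_input_uri)
    )
```

Keeping the size check in a separate function means existing S3-based callers can pass their usual input URI and get the inline path automatically whenever the payload is small enough.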
