[AWSReInvent2025] Amazon S3 Performance: Architecture, Design, and Optimization for Data-Intensive Systems
Lecturer
Ian Heritage is a Senior Solutions Architect at Amazon Web Services, specializing in Amazon S3 and large-scale data storage architectures. With deep expertise in performance engineering and distributed systems, he helps organizations design and optimize their storage layers for high-throughput, low-latency applications, including machine learning training and real-time analytics. He is a prominent figure in the AWS storage community, known for his technical deep-dives into S3’s internal mechanics and best practices for performance at scale.
Abstract
This article explores the internal architecture and performance optimization strategies of Amazon S3, the industry-leading object storage service. It provides a detailed analysis of the differences between S3 General Purpose and the newly introduced S3 Express One Zone storage class, highlighting the architectural trade-offs between regional durability and sub-millisecond latency. The discussion covers advanced request management techniques, including prefix partitioning, request routing, and the role of the AWS Common Runtime (CRT) in maximizing throughput. By examining these technical foundations, the article offers practical guidance for architecting storage solutions that can handle millions of requests per second and petabytes of data for modern AI and analytics workloads.
S3 Storage Class Selection for High Performance
The performance of an S3-based application is fundamentally determined by the selection of the storage class. For over a decade, S3 General Purpose (Standard) has been the default choice, offering 99.999999999% (11 9s) of durability by replicating data across at least three Availability Zones. While this provides extreme reliability, the regional replication introduces a baseline latency that may be too high for certain “request-intensive” applications, such as machine learning model checkpoints or high-frequency trading logs.
To address these needs, AWS introduced S3 Express One Zone. This storage class is designed for workloads that require consistent, single-digit millisecond latency. By storing data within a single Availability Zone and utilizing a new, purpose-built architecture, Express One Zone can deliver up to 10x the performance of S3 Standard at a 50% lower request cost. This class is ideal for applications that perform frequent, small I/O operations where the overhead of regional replication would be the primary bottleneck. The choice between Standard and Express One Zone is thus a strategic decision between geographic durability and extreme performance.
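The trade-off described above can be captured in a small decision sketch. The helper below is hypothetical and its latency threshold is illustrative, not official AWS guidance; it simply encodes the rule of thumb that multi-AZ durability requirements point to S3 Standard, while single-digit millisecond latency targets point to S3 Express One Zone.

```python
# Hypothetical decision helper: choose an S3 storage class from workload
# requirements. The 10 ms threshold is illustrative only.

def choose_storage_class(p99_latency_ms: float, needs_multi_az: bool) -> str:
    """Pick an S3 storage class for a latency-sensitive workload."""
    if needs_multi_az:
        # Regional durability requires replication across Availability Zones.
        return "S3 Standard"
    if p99_latency_ms < 10:
        # Single-digit millisecond targets point to the single-AZ class.
        return "S3 Express One Zone"
    return "S3 Standard"

print(choose_storage_class(5, needs_multi_az=False))   # latency-critical checkpoints
print(choose_storage_class(50, needs_multi_az=True))   # durable, replicated reads
```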
Request Routing, Partitioning, and the Scale-Out Architecture
At its core, Amazon S3 is a massively distributed system that scales out to handle virtually unlimited throughput. The key to this scaling is “partitioning.” S3 automatically partitions buckets based on the object keys (names). Each partitioned prefix supports at least 3,500 PUT/COPY/POST/DELETE requests and 5,500 GET/HEAD requests per second. For many years, users were advised to use randomized prefixes to ensure even distribution across partitions.
Modern S3 architecture has evolved to handle this automatically, but understanding prefix design remains crucial for performance. When an application’s request rate increases, S3 detects the hot spot and splits the partition to handle the load. However, this process takes time. For workloads that burst from zero to millions of requests instantly, pre-partitioning or using a wide range of prefixes is still a best practice. By spreading data across multiple prefixes (e.g., bucket/prefix1/, bucket/prefix2/), an application can linearly scale its throughput to accommodate massive concurrency, limited only by the client’s network bandwidth and CPU.
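The scaling arithmetic is straightforward to sketch. In the snippet below, the per-prefix request rates come from the S3 documentation, while the hash-based prefix layout is an illustrative way to spread keys evenly; any deterministic sharding scheme would do.

```python
import hashlib

# Per-prefix request rates documented for S3; the key layout is illustrative.
PUTS_PER_PREFIX = 3_500
GETS_PER_PREFIX = 5_500

def shard_key(key: str, num_prefixes: int) -> str:
    """Assign an object key to one of num_prefixes hash-derived prefixes."""
    shard = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_prefixes
    return f"prefix{shard}/{key}"

def aggregate_get_throughput(num_prefixes: int) -> int:
    """Upper bound on GET/HEAD requests per second once partitions split."""
    return num_prefixes * GETS_PER_PREFIX

print(shard_key("train/shard-00042.tfrecord", 16))
print(aggregate_get_throughput(16))  # 88000: 16 prefixes at 5,500 GETs/s each
```

Note that this ceiling is reached only after S3 has split the partitions behind each prefix, which is why bursty workloads benefit from pre-partitioning.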
Client-Side Optimization with AWS CRT and SDKs
While the S3 service is designed for scale, the performance experienced by the end-user is often limited by the client-side implementation. To bridge this gap, AWS developed the Common Runtime (CRT) library. The CRT is a set of open-source, C-based libraries that implement high-performance networking best practices, such as automatic request retries, congestion control, and most importantly, multipart transfers.
# Conceptual example of parallel multipart transfers with the AWS SDK for
# Python (Boto3); installing boto3 with the CRT extra (pip install "boto3[crt]")
# lets the SDK route transfers through the CRT automatically.
import boto3
from boto3.s3.transfer import TransferConfig

# The transfer configuration controls automatic parallelization of large
# object transfers.
config = TransferConfig(use_threads=True, max_concurrency=10)
s3 = boto3.client('s3')
s3.upload_file('large_data.zip', 'my-bucket', 'data.zip', Config=config)
The CRT automatically breaks large objects into smaller parts and uploads or downloads them in parallel. This utilizes the full network capacity of the EC2 instance and mitigates the impact of single-path network congestion. For applications using the AWS CLI or SDKs for Java, Python, and C++, opting into the CRT-based clients can result in a significant throughput increase—often double or triple the speed of standard clients for large files. Additionally, the CRT handles the complexities of DNS load balancing and connection pooling, ensuring that requests are distributed efficiently across the S3 frontend fleet.
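The part-splitting logic behind multipart transfers can be illustrated with a few lines of arithmetic. In this sketch the 8 MiB part size mirrors common SDK defaults; S3 itself allows parts from 5 MiB to 5 GiB (the last part may be smaller) and at most 10,000 parts per upload.

```python
# Illustrative sketch of how a multipart transfer splits a large object
# into byte ranges that can be uploaded or downloaded in parallel.
PART_SIZE = 8 * 1024 * 1024  # 8 MiB, a common SDK default

def part_ranges(object_size: int, part_size: int = PART_SIZE):
    """Yield (start, end) byte offsets for each part, end exclusive."""
    for start in range(0, object_size, part_size):
        yield start, min(start + part_size, object_size)

size = 100 * 1024 * 1024  # a 100 MiB checkpoint file
parts = list(part_ranges(size))
print(len(parts))   # 13 parts: twelve full 8 MiB parts plus a 4 MiB tail
print(parts[-1])
```

Each range becomes an independent request, which is how the CRT saturates the instance’s network capacity instead of pushing one large object down a single connection.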
Case Study: Optimization for Machine Learning and Analytics
Machine learning training is a premier use case for S3 performance optimization. During the training of large language models (LLMs), hundreds or thousands of GPUs must simultaneously read training data and write model “checkpoints.” These checkpoints are multi-gigabyte files that must be saved quickly to avoid idling expensive compute resources. By combining S3 Express One Zone with the CRT-based client, researchers can achieve the throughput necessary to saturate the high-speed networking of P4 and P5 instances.
In analytics, the use of “Range Gets” is a critical optimization. Instead of downloading an entire 1GB Parquet file to read a few columns, an application can request specific byte ranges. This reduces the amount of data transferred and speeds up query execution. S3 is optimized to handle these range requests efficiently, and when combined with a partitioned data layout (e.g., partitioning by date or region), it enables sub-second query responses over petabytes of data. This architectural synergy between storage class, partitioning, and client-side logic is what allows S3 to serve as the foundation for the world’s largest data lakes.
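A ranged read maps directly onto the `Range` parameter of the S3 `GetObject` API. In the sketch below, `get_object` and its `Range` parameter are real Boto3/S3 features, while the bucket, key, and offsets are illustrative; a Parquet reader, for example, first fetches the footer at the end of the file before requesting individual column chunks.

```python
def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header for bytes start..end inclusive."""
    return f"bytes={start}-{end}"

def read_range(bucket: str, key: str, start: int, end: int) -> bytes:
    """Fetch only the requested byte range of an object."""
    import boto3  # requires the AWS SDK; local import keeps the sketch runnable without it
    s3 = boto3.client("s3")
    resp = s3.get_object(Bucket=bucket, Key=key, Range=range_header(start, end))
    return resp["Body"].read()

# e.g. the last 8 bytes of a 1 GiB Parquet file (footer length + magic):
print(range_header(2**30 - 8, 2**30 - 1))  # bytes=1073741816-1073741823
```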