Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Businesses across industries use Amazon S3 to store data for a wide range of use cases, including mobile apps, websites, data lakes, enterprise applications, and cloud-native applications. In this article, we walk you through some of the key points to consider while using Amazon S3. This checklist is based on our experience across several projects in which we used Amazon S3 to store and protect data.
Before getting started with the checklist, let's quickly understand the two resources we work with when storing data in Amazon S3: buckets and objects. A bucket is a container for objects, and an object is a file together with any metadata that describes that file.
Enabling Seamless Amazon S3 Bucket Configurations
1. Follow the naming rules while creating S3 buckets.
2. Choose an AWS Region that is geographically close to your users to optimize latency, minimize costs, or address regulatory requirements.
3. Enable versioning in a bucket if there is a need to retrieve and restore every version of every object stored in a particular bucket.
4. Versioning keeps all versions of an object in the same bucket, and it applies to every object in the bucket.
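Point 1's naming rules can be checked locally before any API call. Below is a minimal sketch of such a check; `is_valid_bucket_name` is an illustrative helper (not part of any AWS SDK), and the regex covers only the core rules: 3-63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit, and not formatted like an IP address.

```python
import re

# Core S3 bucket naming rules: 3-63 characters; lowercase letters,
# digits, dots, and hyphens; must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_valid_bucket_name(name: str) -> bool:
    """Illustrative check for the core S3 bucket naming rules."""
    if not _BUCKET_RE.match(name):
        return False
    if _IP_RE.match(name):   # names must not be formatted like IP addresses
        return False
    if ".." in name:         # no two adjacent periods
        return False
    return True

print(is_valid_bucket_name("my-app-logs-2023"))  # True
print(is_valid_bucket_name("MyBucket"))          # False: uppercase not allowed
print(is_valid_bucket_name("192.168.1.1"))       # False: IP-formatted
```

Catching an invalid name client-side gives a clearer error than the API's rejection and avoids a round trip.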
Server Access Logging
5. Enable server access logging in a bucket if there is a need to audit every request made to the bucket. Each record provides details about single access requests like requester, bucket name, request time, request action, response status, etc.
6. The target bucket used to store the logs should not have a default retention period configured.
7. Normal storage charges apply for the access logs stored in the bucket.
8. Configure lifecycle policies on the target bucket to expire old logs.
9. Enable encryption of data at rest and in transit.
10. Server-side encryption encrypts an object when it is saved to the bucket and decrypts it when it is downloaded.
11. Data in transit can be protected using Secure Sockets Layer/Transport Layer Security (SSL/TLS) or client-side encryption.
12. Block public access unless your requirements state otherwise. Use Amazon S3 Block Public Access to limit public access to S3 resources regardless of how they are created.
13. Implement least privilege access to reduce security risk.
14. Use IAM roles for applications and services that require access to S3 resources.
15. Enable lifecycle policies on a bucket for cost-effective storage of objects.
16. Define transition and expiration actions in the lifecycle policy. A transition action moves objects between storage classes such as S3 Standard and S3 Glacier. An expiration action deletes objects from S3.
17. A policy can be scoped to specific object prefixes with a defined time period. After that period, the transition action moves matching objects to the configured cost-efficient storage class, and the expiration action deletes them from the bucket.
18. For versioned buckets, separate rules can be set for current and previous object versions.
19. If multipart uploads are used in a bucket, define an abort-multipart-upload lifecycle rule (AbortIncompleteMultipartUpload) to minimize storage costs. This rule directs S3 to stop multipart uploads that do not complete within the specified number of days after initiation and to delete the associated parts from the bucket.
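The transition, expiration, and abort-multipart rules above can be expressed together in one lifecycle configuration document. Below is a sketch of that JSON structure; the `logs/` prefix and the day counts are illustrative choices, and applying it would go through an API call such as boto3's `put_bucket_lifecycle_configuration`.

```python
import json

# Illustrative lifecycle configuration: transition "logs/" objects to
# Glacier after 90 days, expire them after 365 days, and abort
# incomplete multipart uploads bucket-wide after 7 days.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        },
        {
            "ID": "abort-stale-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix applies bucket-wide
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
    ]
}

print(json.dumps(lifecycle, indent=2))
```

For a versioned bucket (point 18), the same rules can carry `NoncurrentVersionTransition` and `NoncurrentVersionExpiration` actions to manage previous object versions separately.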
CloudTrail / CloudWatch Metrics
20. Enable monitoring to maintain the reliability, security, availability, and performance of Amazon S3.
21. Enable CloudWatch request metrics for bucket activity such as PUT requests, DELETE requests, and errors.
22. Define CloudWatch alarms that fire when a certain type of activity occurs in a bucket, such as object deletion.
23. Enable AWS CloudTrail to retrieve a record of actions taken by a user, role, or AWS service in S3.
24. Configure AWS CloudTrail to log data events, which record object-level API activity for individual buckets.
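The object-level logging in point 24 is configured through CloudTrail event selectors. Below is a sketch of the selector document; the bucket ARN is a placeholder, and applying it would use an API call such as boto3's `put_event_selectors` on an existing trail.

```python
# Illustrative CloudTrail event selector that records object-level
# (data event) S3 API activity for a single bucket.
event_selectors = [
    {
        "ReadWriteType": "All",            # log both reads and writes
        "IncludeManagementEvents": True,
        "DataResources": [
            {
                "Type": "AWS::S3::Object",
                # The trailing "/" scopes logging to objects in this bucket.
                "Values": ["arn:aws:s3:::example-bucket/"],
            }
        ],
    }
]

print(event_selectors[0]["DataResources"][0]["Values"][0])
```

Data events are billed separately from management events, so scoping the ARN list to the buckets you actually need to audit keeps CloudTrail costs in check.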
Managing Amazon S3 Objects Effectively
Prefix for Object Name
25. Choose object key prefixes in such a way that read and write performance can scale.
26. S3 read and write performance can be increased by parallelizing requests across more prefixes in the bucket, since S3 request-rate limits apply per prefix.
27. Use the latest version of AWS SDKs to obtain the latest performance optimization features.
28. The SDK automates horizontal scaling of connections to achieve thousands of requests per second, using byte-range requests where appropriate.
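One common way to spread keys across prefixes for parallelism (points 25-26) is to derive a short, deterministic hash prefix from each key. The sketch below illustrates the idea; the 4-hex-character width and the key layout are illustrative choices, not an AWS requirement.

```python
import hashlib

def prefixed_key(key: str, width: int = 4) -> str:
    """Prepend a short, deterministic hash prefix so objects spread
    across multiple S3 prefixes for higher aggregate request rates."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{key}"

# The same logical key always maps to the same prefixed key, so it
# can be recomputed on read without a lookup table.
print(prefixed_key("2023/01/15/report.csv"))
```

Because the prefix is derived from the key itself, readers can reconstruct the full object key without any extra index, while writes and reads fan out across many prefixes.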
Timeout and Retry
29. Configure timeout and retry in AWS SDK for S3 requests.
30. Set shorter timeouts for requests and retry the slow ones. Given the large scale of Amazon S3, if the first request is slow, a retried request is likely to take a different path and succeed quickly.
31. Ideally, if the object to be uploaded is larger than 100 MB, use multipart upload to improve throughput.
32. Multipart upload lets you upload a single object as a set of parts. Each part can be uploaded in parallel, and if one part fails, it can be retried individually.
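The part-splitting in points 31-32 is pure byte arithmetic: each (part number, start, end) tuple below would become one part upload that can run in parallel and be retried on its own. The 16 MiB part size is an illustrative choice; S3 requires parts of at least 5 MiB except the last.

```python
def part_ranges(object_size: int, part_size: int = 16 * 1024 * 1024):
    """Split an object of object_size bytes into (part_number, start, end)
    tuples; end is inclusive, matching HTTP Range semantics."""
    parts = []
    number = 1
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1
        parts.append((number, start, end))
        number += 1
    return parts

# A 100 MiB object split into 16 MiB parts: six full parts plus a remainder.
print(len(part_ranges(100 * 1024 * 1024)))  # 7
```

In practice the high-level SDK transfer helpers do this splitting for you; the sketch just makes the retry unit explicit: a failed part re-uploads only its own byte range, not the whole object.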
Byte Range Fetch
33. Leverage byte-range fetch while getting only a specified portion of a file.
34. Concurrent connections to Amazon S3 can fetch different byte ranges from within the same object, achieving higher aggregate throughput than a single whole-object request.
35. If a file was uploaded using multipart upload, each part can be retrieved with a GET request by specifying its part number.
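The concurrent fetches in point 34 map directly to standard HTTP `Range` headers, one per connection. Below is a sketch that builds the header values for a given object size; the 8 MiB chunk size is an illustrative choice.

```python
def range_headers(object_size: int, chunk_size: int = 8 * 1024 * 1024):
    """Build the HTTP Range header values needed to fetch an object
    in chunk_size pieces over parallel connections."""
    headers = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # inclusive end
        headers.append(f"bytes={start}-{end}")
    return headers

# A 20 MiB object fetched in 8 MiB chunks needs three ranged GETs.
print(range_headers(20 * 1024 * 1024))
```

Each header value would be sent as the `Range` header of its own GET request (for example via boto3's `get_object(..., Range=...)`), and the responses reassembled in order by their starting offsets.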
We have shared some of the key points to consider while implementing Amazon S3. Additional security, audit, and performance best practices can be followed, such as using Amazon S3 Transfer Acceleration, multi-factor authentication, AWS Trusted Advisor, and S3 Inventory. From powering high-performance, scalable applications to providing rich storage options and strong security, Amazon S3 has emerged as the object storage service of choice for millions of applications at companies all around the world.
Associate Architect, RapidValue