Mastering AWS lifecycle configuration: How long is a year, anyway?

Michelle Crane | Last updated on September 4, 2024 | 6 minute read

“What I told you was true, from a certain point of view.”

– Obi-Wan Kenobi

At Rewind, we retain the current version of any particular object for as long as you are an active subscriber. Older versions are kept for 365 days. For example, if an object is set to expire on April 1, 2024, you might expect it to be gone the next day—but actually, that’s not the case!

This article explains why not, delving into the subtleties of AWS S3 lifecycle expiration actions.

Amazon S3

At Rewind, we use Amazon Simple Storage Service (S3) to store our customers’ data. Amazon is an industry leader in providing secure, performant, and scalable data storage. Amazon S3 includes the concept of objects (e.g., text files, images, etc.) and buckets, which are containers that hold objects. Every object is contained in a bucket; together, buckets and objects are called ‘resources’.

Rewind uses versioning-enabled buckets, which lets us store multiple versions of an object in the same bucket. AWS keeps one current version and any number of older, noncurrent, versions. For example, without versioning, I might save three copies of this article: ‘article-draft.txt’, ‘article-final.txt’, and ‘article-final-final.txt’. With versioning, I would just save ‘article.txt’ and still be able to access the older versions.

When I want to delete this document, I instruct S3 to delete it. AWS then places a ‘delete marker’ on the object. I can no longer access the current version of the document because it is marked as ‘deleted’. However, I still have access to the older versions. Using versions in this manner allows customers to update their objects without worrying about keeping track of changes. Rewind takes advantage of S3’s functionality that allows users to restore to older versions.
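To make the delete-marker behaviour concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the S3 API: the class `ToyVersionedBucket`, its methods, and the `v1`, `v2` style version IDs are all hypothetical names invented for illustration.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple


class ToyVersionedBucket:
    """In-memory stand-in for a versioning-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, body), newest last.
        # A body of None represents a delete marker.
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self._ids = count(1)

    def _push(self, key: str, body: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        # Every write adds a new current version; older ones become noncurrent.
        return self._push(key, body)

    def delete(self, key: str) -> str:
        # A plain delete removes nothing: it stacks a delete marker on top.
        return self._push(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]  # the current version, if any
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is None:
            raise KeyError(f"{key!r} is not readable (missing or delete marker)")
        return candidates[0][1]


bucket = ToyVersionedBucket()
draft = bucket.put("article.txt", b"first draft")
bucket.put("article.txt", b"final")
bucket.delete("article.txt")

try:
    bucket.get("article.txt")
except KeyError:
    print("current version is hidden by the delete marker")
print(bucket.get("article.txt", draft))  # the older version is still readable
```

The point of the model is that `delete` only appends: nothing is removed until something (in S3, a lifecycle rule or an explicit versioned delete) permanently removes a specific version.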

Lifecycle Policies

AWS offers multiple ways to help us manage our storage, including the ability to create lifecycle configurations at the bucket level. A configuration is a set of rules, which can be used to transition objects from one class of storage to another or to delete them.

We use AWS Lifecycle rules to handle the automatic deletion of outdated objects. With trillions of objects stored in S3, there is no way for Rewind to handle this manually; we must make use of the automation provided by AWS. You can see one of our lifecycle configurations below. After 90 days, we transition current versions to a slower (but more economical) tier. After 365 days, we delete noncurrent versions.

Lifecycle configuration
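For readers who prefer code to screenshots, a rule like this one could be expressed roughly as follows. This is a sketch, not our production configuration: the rule ID, the empty prefix filter, and the `STANDARD_IA` storage class are assumptions chosen for illustration (the article only says "slower but more economical"). The `apply_lifecycle` helper is hypothetical; it wraps boto3's `put_bucket_lifecycle_configuration` call and needs boto3 and AWS credentials to run.

```python
import json

# Assumption: STANDARD_IA stands in for the unnamed cheaper tier.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "example-retention-rule",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
            # Current versions move to a cheaper tier after 90 days.
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
            # Noncurrent versions are permanently deleted 365 days
            # after they become noncurrent.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
        }
    ]
}


def apply_lifecycle(bucket_name: str) -> None:
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=LIFECYCLE_CONFIGURATION,
    )


print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
```

Note that the key is named `NoncurrentDays`, not `Days`: the clock for noncurrent versions is separate from the one for current versions.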

When we talk about deleting objects in storage, we are not saying that we delete the current versions (the versions that customers see in their platforms). We keep the current version for as long as a customer is subscribed to our service. Instead, we are talking about deleting the older versions, which have been ‘overwritten’ by changes the customer has made.

In general, deletion from S3 happens in two phases. When deleting the current version, AWS first adds a delete marker to that version, making the current version noncurrent, and stops billing us for its storage. The second phase is that AWS sweeps through all of the objects, looking for delete markers and removing those objects. This asynchronous step starts at midnight on the day after the object is marked for deletion, and continues for some amount of time.  

We have noticed that this second phase can take days, or weeks, depending on how many objects have been marked for deletion.

For noncurrent versions, AWS uses the NoncurrentVersionExpiration action to permanently remove the older, noncurrent, versions of objects. In this case, a delete marker is not created.

Surprisingly Old Version

So far, so good. Why then, when I looked at one of our demo accounts, was I seeing a version that was over two years old?

Vault showing a version of P that is over two years old

The versions we show in the vault are the versions of objects stored in AWS S3. When we crack open S3, we see these same three versions of the object. This very old version is definitely in S3.

AWS S3 bucket showing same versions as the vault

Our 365-day retention is based on the Lifecycle rule on this S3 bucket. The rule clearly states that after 365 days, we permanently delete noncurrent versions.

Lifecycle rule showing that after 365 days all noncurrent versions are deleted

Next, we checked whether there was a delete marker on this object. Delete markers are visible in the AWS console, but we didn’t see any. In retrospect, this makes sense, since the delete marker would be on the current version, not the noncurrent version.

How about an expiry header? These are not visible in the console; we have to use the API to check these. Another red herring—expiry headers are only used on current versions as well.
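If you want to inspect an expiry header yourself, S3 returns it as the `x-amz-expiration` response header on GET and HEAD requests, and boto3 exposes the same string as the `Expiration` field of a `head_object` response. The sketch below only parses such a string; the helper name `parse_expiration` and the sample value are made up for illustration.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def parse_expiration(header: Optional[str]) -> Optional[Tuple[datetime, str]]:
    """Split an x-amz-expiration value into (expiry date, rule id)."""
    if not header:
        # No header at all: no expiration rule applies to this version.
        return None
    match = re.search(r'expiry-date="([^"]+)",\s*rule-id="([^"]*)"', header)
    if match is None:
        raise ValueError(f"unrecognised expiration header: {header!r}")
    return parsedate_to_datetime(match.group(1)), match.group(2)


# Made-up sample value in the header's documented format.
sample = 'expiry-date="Fri, 21 Dec 2012 00:00:00 GMT", rule-id="example-rule"'
expires_at, rule_id = parse_expiration(sample)
print(expires_at.date(), rule_id)

# A version with no header simply yields None.
print(parse_expiration(None))
```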

We were stumped! We had a lifecycle rule that clearly stated ‘365 days’ and we clearly had a version that was over 365 days old. We reached out to AWS support for some clarification. It turns out that everything is working as designed, but there was a subtlety in the wording that we were missing.

The lifecycle configuration states, “All other noncurrent versions are permanently deleted” after 365 days. Let’s look at our version dates again.

Versions and dates of objects in S3

We understood that the April 18 version of the object was ‘current’ and that the two older versions were ‘noncurrent’. Because the September 12 version was within 365 days, we didn’t expect it to disappear. But the January 31 version was older than 365 days, so we expected it to have been deleted already.

However, for noncurrent versions, the date that they become eligible for deletion has nothing to do with their last modified date. Instead, it is the number of days from when the object was overwritten.

From the AWS documentation:

When specifying the number of days in the NoncurrentVersionTransition and NoncurrentVersionExpiration actions in a Lifecycle configuration, note the following:

The value that you specify is the number of days from when the version of the object becomes noncurrent (that is, when the object is overwritten or deleted) that Amazon S3 will perform the action on the specified object or objects.
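Put differently, the date that matters for a noncurrent version is the date of its successor. A minimal sketch of that mapping, using the three version dates from our example, might look like this. The function name `noncurrent_since` is hypothetical, and the year 2024 for the April 18 version is inferred rather than stated.

```python
from datetime import date

# Version dates from the example, oldest first.
# Assumption: the April 18 version is from 2024 (the year is inferred).
last_modified = [date(2022, 1, 31), date(2023, 9, 12), date(2024, 4, 18)]


def noncurrent_since(dates):
    """Map each version's date to the date its successor replaced it.

    The newest version has no successor, so it maps to None (still current).
    """
    ordered = sorted(dates)
    successors = ordered[1:] + [None]
    return dict(zip(ordered, successors))


for modified, replaced in noncurrent_since(last_modified).items():
    status = f"noncurrent since {replaced}" if replaced else "current"
    print(f"{modified}: {status}")
```

The January 31, 2022 version maps to September 12, 2023, which is where its 365-day clock starts.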

In other words: the January 31, 2022 version became noncurrent when the next version was added, i.e., on September 12, 2023. It will be ready for deletion not on January 31, 2023, but on September 12, 2024.

This timeline explains how the versions of this object were added, and when the noncurrent versions can be deleted:

Timeline of object versions
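The arithmetic behind that date is worth spelling out, because 2024 is a leap year: 365 days after September 12, 2023 is actually September 11, 2024. The sketch below assumes S3 then rounds the result up to the next midnight UTC, as its lifecycle documentation describes. The function name `eligible_at` and the 15:30 UTC time of day are made up for illustration; the article gives only dates.

```python
from datetime import datetime, time, timedelta, timezone


def eligible_at(became_noncurrent: datetime, noncurrent_days: int) -> datetime:
    """Add the rule's days, then round up to the next midnight UTC.

    Expects a timezone-aware datetime. Simplification: a result that lands
    exactly on midnight is still pushed to the following midnight.
    """
    raw = became_noncurrent.astimezone(timezone.utc) + timedelta(days=noncurrent_days)
    next_day = raw.date() + timedelta(days=1)
    return datetime.combine(next_day, time.min, tzinfo=timezone.utc)


# Assumption: 15:30 UTC is a made-up time of day.
overwritten = datetime(2023, 9, 12, 15, 30, tzinfo=timezone.utc)
print(eligible_at(overwritten, 365))

# What we originally (and wrongly) expected: 365 days from last modified.
naive_guess = datetime(2022, 1, 31, tzinfo=timezone.utc) + timedelta(days=365)
print(naive_guess.date())
```

Under these assumptions the version becomes eligible at midnight UTC at the start of September 12, 2024. As noted earlier, the actual removal is asynchronous and can trail that date.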

Summary

We have learned several interesting lessons while researching this issue:

  • In versioning-enabled buckets, each object has a current version, and zero or more noncurrent versions.
  • It is important to keep the concept of current/noncurrent in mind when reading the AWS documentation. For instance, the documentation around deletion tends to focus on the deletion of current versions, but to learn about the deletion of noncurrent versions, we need to research the NoncurrentVersionExpiration action.
  • Deletion of current objects happens in two phases: first, the object is assigned a delete marker, and second, at some point in the future, the current version is deleted (marked unavailable).
  • Only current versions get expiry headers; noncurrent versions do not.
  • Although delete markers are visible in the AWS console, expiry headers are only available through an API call.
  • When using object age for a lifecycle rule affecting noncurrent versions, the age of the object is not calculated based on its last modified date. Instead, the age is based on when the object became noncurrent.

Conclusion

Understanding the subtleties of AWS S3 lifecycle configurations helps Rewind provide better service to our customers. The distinction between an object’s modification date and when it becomes noncurrent is subtle, yet significant. By understanding these nuances, you can have greater confidence in our automated processes for handling your data.

Michelle Crane
Michelle is enjoying her second career as a Software Development Manager at Rewind. Her first career was as an officer in the Royal Canadian Air Force, where her specialty was Logistics, specifically fourth-line air/sea movement control. After honing her leadership skills, she went back to school and completed an MSc and PhD in Computer Science at Queen’s University. She enjoys working with development teams focused on designing, creating and delivering complex, leading-edge software products that solve real-world customer problems.