As DevOps engineers, our #1 priority is to ensure our service remains available, and that Rewind customers can restore their data, around the clock. Serviceability, availability, and scalability are always top of mind and have a large influence on our software architecture. For this reason, we recently began reworking the infrastructure for one of our services to better fulfill these priorities.
Let’s dive into the issue, how we solved it, and the lessons learned along the way.
The Problem:
At Rewind, one of the services we operated ran within a single Amazon Elastic Compute Cloud (EC2) instance using docker-compose. For data persistence, we made use of Amazon’s Elastic Block Store (EBS), attaching an EBS volume to our EC2 instance. The volume was mounted by Docker and made available to our containerized service. While this worked well at a small scale, it was lacking in a couple of areas:
Serviceability
Whenever a security or system update was available, patching the host instance was a lengthy process. The compose stack had to be manually stopped, then we had to patch the instance and restart the stack. This was a tedious manual task that consumed engineering effort and impacted availability.
Availability
Another manual aspect of this solution was deployments. Deploying new code also required manually stopping the stack to update the Docker image. This prevented continuous deployment, and a maintenance window always had to be scheduled to deploy changes. As this was cutting into Rewind’s service availability, it was identified as an issue.
Scalability
Due to a limitation of EBS (our volume type did not support attachment to multiple instances), the volume could only be attached to a single EC2 instance at a time. This meant we could only vertically scale our instance size to handle spikes in traffic. Furthermore, we couldn’t scale our containers, as only one instance of our compose stack could run. This meant our system lacked redundancy.
The Solution:
So the question arose: how could we make our service more easily updatable and scalable, with less downtime? Note that our data could not be migrated, so our solution needed to keep supporting EBS.
ECS & REX-Ray
What we needed was a container orchestration tool that enabled autoscaling and simplified deployments. If you work in this space, you know there are many options: k8s, Docker Swarm, Rancher, etc. For us, Amazon Elastic Container Service (ECS) ticked most of our boxes.
With ECS, many of our problems disappeared. Servicing or vertically scaling the host instance was as simple as creating a new launch template with an upgraded AMI (Amazon Machine Image) or a different instance size. An autoscaling group would then spin up a new instance for us to deploy our code to. ECS also allowed for rolling or blue/green deployments, which reduced downtime. Lastly, an autoscaling policy could be applied to our containers to respond to jumps in traffic accordingly.
Unfortunately, the only limitation of ECS was that it didn’t natively support EBS volumes. This is where the Docker plugin REX-Ray comes in. As a vendor-agnostic storage orchestration engine, it allows for data persistence between container lifecycles. Below, we outline how we made use of REX-Ray and ECS, and deployed our changes in Terraform.
Terraforming
We’ll provide some Terraform code snippets as we work through the solution of how we mounted our EBS volumes to an ECS cluster. To see a full implementation, check out my repo! As you follow along, don’t forget to back up to GitHub as you go so you don’t lose your progress.
Launch Template
Start by creating a user data template to bootstrap the EC2 container instances. This will be used to install REX-Ray and configure the ECS agent. We called ours user_data.config.tpl:
#!/bin/bash
echo ECS_CLUSTER=${ECS_CLUSTER} >> /etc/ecs/ecs.config

# install the REX-Ray Docker volume plugin
docker plugin install rexray/ebs REXRAY_PREEMPT=true EBS_REGION=${EBS_REGION} --grant-all-permissions

# restart the ECS agent so the plugin is active and recognized once the agent starts
sudo systemctl restart ecs
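As a quick sanity check, once an instance has booted with this user data, you can list the Docker plugins on the host; rexray/ebs should show up as enabled:

docker plugin ls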
Then, pass the user data template (user_data.config.tpl) to a launch template to spin up the container instances:
data "template_file" "user_data" { template = file("user_data.config.tpl") vars = { ECS_CLUSTER = aws_ecs_cluster.ecs_cluster.name EBS_REGION = var.region } } resource "aws_launch_template" "ecs_launch_template" { name = "ecs-lt" image_id = "ami-028f238814e34dfdc" # amz linux 2 ecs-optimized image instance_type = "t2.micro" vpc_security_group_ids = [aws_security_group.ecs_sg.id] user_data = base64encode(data.template_file.user_data.rendered) iam_instance_profile { name = aws_iam_instance_profile.ecs_iam_ip.name } }
You’ll see an instance profile referenced at the end of the launch template above. In our file, we used the permissions suggested by the REX-Ray docs, but not all of them may be required for your specific use case. You can play around with it depending on your needs.
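For illustration, here’s a minimal sketch of such an instance profile. The role, policy, and resource names are our own inventions, and the EC2 actions are a subset of what the REX-Ray docs list; trim or extend them to match your setup:

resource "aws_iam_role" "ecs_iam_role" {
  name = "ecs-rexray-role"

  # Let EC2 instances assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Standard managed policy for ECS container instances.
resource "aws_iam_role_policy_attachment" "ecs_agent" {
  role       = aws_iam_role.ecs_iam_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

# EC2 actions REX-Ray needs to create, attach, and detach volumes.
resource "aws_iam_role_policy" "rexray_ebs" {
  name = "rexray-ebs"
  role = aws_iam_role.ecs_iam_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "ec2:AttachVolume",
        "ec2:CreateVolume",
        "ec2:CreateTags",
        "ec2:DeleteVolume",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeInstances",
        "ec2:DescribeTags",
        "ec2:DescribeVolumeAttribute",
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeVolumes",
        "ec2:DetachVolume",
        "ec2:ModifyVolumeAttribute"
      ]
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "ecs_iam_ip" {
  name = "ecs-iam-ip"
  role = aws_iam_role.ecs_iam_role.name
}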
EBS Volume
We created a standalone EBS volume in Terraform and referenced it in our ECS task definition. You can also ‘autoprovision’ the volume in the definition. This will create a volume if it doesn’t exist or mount the existing volume if it’s already been created. Be sure to set ‘scope’ to ‘shared’ to ensure persistence. Otherwise, the volume will be destroyed at the end of the container lifecycle.
resource "aws_ecs_task_definition" "ecs_td" { family = "ecs-task-definition" container_definitions = data.template_file.td_template.rendered volume { name = "ecs-ebs-volume" docker_volume_configuration { scope = "shared" autoprovision = true driver = "rexray/ebs" driver_opts = { volumetype = "gp2" size = 5 } } }
Once you’ve mounted the EBS volume, map the paths in the container definition. Ours is called task_definition.json.tpl:
[ { "essential": true, "memory": 100, "name": "${container_name}", "cpu": 1, "image": "${image}", "environment": [], "portMappings": [ { "hostPort": 80, "containerPort": ${container_port} } ], "mountPoints": [ { "sourceVolume": "${source_volume}", "containerPath": "${container_path}" } ] } ] data "template_file" "td_template" { template = file("task_definition.json.tpl") vars = { container_name = var.container_name image = var.image_url container_port = var.container_port source_volume = "ecs-ebs-volume" container_path = "/mnt/ecs-ebs-volume" } }
Checking the Volume Mount
Way to go! You’ve defined and mounted your EBS volume. Now take a moment to verify that everything went as planned. Check your AWS console to make sure a new instance appears with the attached volumes. Then, follow these steps to verify that your data persists between new tasks or instances:
SSH (or SSM as we Rewinders do!) into your ECS instance using your existing key pair.
ssh -i <identity-file> ec2-user@<instance's public dns>
Next, retrieve the ID of the running container with the command:
docker ps
And ‘exec’ into the container with the container ID you retrieved:
docker exec -it <container_id> /bin/sh
Finally, we’ll create a test file to make sure everything works. You can ‘echo’ anything to a file within the mounted path inside the container:
echo "test" > /mnt/ecs-ebs-vl/file.txt
Stop the running task so that ECS schedules a replacement (or terminate the instance and let the autoscaling group replace it). The new task should automatically mount the existing EBS volume, and the test file should still be there.
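A quick way to trigger that, assuming the AWS CLI is configured and your cluster is named ecs-cluster, is to stop the running task and let the service scheduler start a replacement:

aws ecs list-tasks --cluster ecs-cluster
aws ecs stop-task --cluster ecs-cluster --task <task-arn>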
Limitations
There are a few caveats to this approach.
- Docker volumes are only supported when using the EC2 launch type in ECS; Fargate is not supported at the time of writing.
- Unless the volume is mounted as read-only (see the snippet after this list), having multiple containers simultaneously write to the same volume may result in data inconsistency. To ensure data resiliency, your application must provide write ordering; alternatively, use a clustered filesystem such as GlusterFS.
- Only io1 and io2 volumes support EBS Multi-Attach, which allows a single volume to be attached to up to 16 Linux instances in the same Availability Zone. However, REX-Ray currently lacks support for these volume types, so autoscaling can only occur at the container level: only a single EC2 instance can mount the volume at a time. If and when that support arrives, note that there is still ongoing work in Terraform to enable Multi-Attach for io2 volumes through the AWS API. Our example uses the gp2 volume type.
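For the read-only case mentioned above, the mountPoints entry in the container definition accepts a readOnly flag. Here’s the block from our earlier task_definition.json.tpl with it set:

"mountPoints": [
  {
    "sourceVolume": "${source_volume}",
    "containerPath": "${container_path}",
    "readOnly": true
  }
]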
Conclusion
Any DevOps professional knows that serviceability, availability, and scalability are crucial to building an application that gives users a smooth and reliable experience. Using Amazon ECS simplifies the process of deploying containerized services and can be made to work with EBS with little effort.
If you’re interested in solving more problems like this, you should check out Rewind’s open positions to start your career in DevOps, Security, Engineering, and more.