How to Back Up EC2 Instances via Automatic Snapshots using AWS Lambda and CloudWatch

Written By Adam Keogh, Edited By Mae Semakula-Buuza
Tue 10 July 2018, in category Aws

AWS, Python

In this article, we’ll cover how to use AWS Lambda and Amazon CloudWatch to automatically backup your EC2 servers. Snapshots are a cheap way to back up your servers and contain all the information required to restore data to a new EBS volume.

AWS Lambda offers us the ability to execute code written in a language of our choice, so we will use Python to write a script that takes snapshots (and deletes older ones).

We will then make a rule on CloudWatch which uses a cron schedule to execute this function every night.

Creating IAM Role

Before making a lambda function, it’s necessary to create an IAM (Identity and Access Management) role. This will define what permissions our lambda function has. From the AWS Management Console (you’ll need IAM permissions associated with your account in order to do this), navigate to Services -> IAM -> Roles -> Create Role.

When selecting what service will use this role, choose AWS Lambda. It will then invite you to attach one or more permissions policies to the role, but we will be making our own custom policy, so select Create Policy.

We want our function to be able to write logs to CloudWatch, describe EC2 resources, create and delete snapshots, and create and delete snapshot tags.

You can make this policy using the visual editor or by providing a JSON – below is a JSON policy you can use.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:*",
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": "ec2:Describe*",
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateSnapshot",
                    "ec2:DeleteSnapshot",
                    "ec2:CreateTags",
                    "ec2:DeleteTags",
                    "ec2:ModifySnapshotAttribute"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }
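If you prefer to script this step rather than click through the console, here is a minimal boto3 sketch that creates the role and attaches the policy above. The role and policy names are just examples, and the policy JSON is assumed to be saved locally as snapshot_policy.json.

    import json
    import boto3

    iam = boto3.client('iam')

    # Trust policy that lets the Lambda service assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }]
    }

    role = iam.create_role(
        RoleName='lambda-ec2-snapshot-role',          # example name
        AssumeRolePolicyDocument=json.dumps(trust_policy)
    )
    print(role['Role']['Arn'])

    # snapshot_policy.json is assumed to contain the JSON policy shown above.
    with open('snapshot_policy.json') as f:
        policy = iam.create_policy(
            PolicyName='lambda-ec2-snapshot-policy',  # example name
            PolicyDocument=f.read()
        )

    iam.attach_role_policy(
        RoleName='lambda-ec2-snapshot-role',
        PolicyArn=policy['Policy']['Arn']
    )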

Tagging

To label which EC2 instances we want to snapshot, we will use tags: give each instance you want backed up a tag with the key auto_snapshot and the value true, which is what the Lambda function filters on. A boto3 sketch for adding the tag follows the screenshot below.

Tagging EC2 Instances
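A minimal sketch for tagging an instance programmatically, assuming the instance ID below is replaced with your own:

    import boto3

    ec2 = boto3.client('ec2')

    # Tag the instances you want backed up with auto_snapshot = true.
    # The instance ID here is a placeholder.
    ec2.create_tags(
        Resources=['i-0123456789abcdef0'],
        Tags=[{'Key': 'auto_snapshot', 'Value': 'true'}]
    )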

Lambda

Now, navigate to the AWS Lambda Management Console and select Create Function -> Author from Scratch. Name your function, choose Python 3.6 as the runtime, and for the role select Choose an Existing Role (then pick the role we made earlier).
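If you would rather create the function from code than the console, a rough boto3 sketch is below. The function name, handler file name and role ARN are placeholders, and the 59-second timeout matches the Basic Settings change described later in this article.

    import io
    import zipfile
    import boto3

    lambda_client = boto3.client('lambda')

    # Package the handler file (lambda_function.py, a placeholder name) into an in-memory zip.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.write('lambda_function.py')
    buf.seek(0)

    lambda_client.create_function(
        FunctionName='auto_instance_snapshotting',
        Runtime='python3.6',
        Role='arn:aws:iam::123456789012:role/lambda-ec2-snapshot-role',  # placeholder role ARN
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': buf.read()},
        Timeout=59,  # see Basic Settings below
    )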

Code

Boto3 is the Amazon Web Services SDK for Python; it allows us to write code that interacts with AWS services. See the docs here.

Below is the function code; paste it into the inline code editor of your Lambda function.

import boto3
import datetime


today = datetime.date.today()
today_string = today.strftime('%Y/%m/%d')
delete_after_days = 2  # Delete snapshots after this many days

# On the run after Monday (Tuesday at ~1am), keep snapshots longer,
# since Friday's snapshots are only 2 'working' days old:
if today.weekday() == 1:
    delete_after_days = delete_after_days + 2

deletion_date = today - datetime.timedelta(days=delete_after_days)
deletion_date_string = deletion_date.strftime('%Y/%m/%d')


ec2 = boto3.client('ec2')
regions = ec2.describe_regions().get('Regions', [])
all_regions = [region['RegionName'] for region in regions]

def lambda_handler(event, context):
    snapshot_counter = 0
    snap_size_counter = 0
    deletion_counter = 0
    deleted_size_counter = 0

    for region_name in all_regions:
        print('Instances in EC2 Region {0}:'.format(region_name))
        ec2 = boto3.resource('ec2', region_name=region_name)

        # We only want to look through instances with the following tag key/value pair: auto_snapshot : true
        instances = ec2.instances.filter(
            Filters=[
                {'Name': 'tag:auto_snapshot', 'Values': ['true']}
            ]
        )

        volume_ids = []
        for i in instances.all():

            for tag in i.tags:  # Get the name of the instance
                if tag['Key'] == 'Name':
                    name = tag['Value']

            print('Found tagged instance \'{1}\', id: {0}, state: {2}'.format(i.id, name, i.state['Name']))

            vols = i.volumes.all()  # Iterate through each instance's volumes
            for v in vols:
                print('{0} is attached to volume {1}, proceeding to snapshot'.format(name, v.id))
                volume_ids.append(v.id)  # append (not extend) so the id is stored as one string
                snapshot = v.create_snapshot(
                    Description='AutoSnapshot of {0}, on volume {1} - Created {2}'.format(name, v.id, today_string),
                )
                snapshot.create_tags(  # Add the following tags to the new snapshot
                    Tags=[
                        {
                            'Key': 'auto_snap',
                            'Value': 'true'
                        },
                        {
                            'Key': 'volume',
                            'Value': v.id
                        },
                        {
                            'Key': 'CreatedOn',
                            'Value': today_string
                        },
                        {
                            'Key': 'Name',
                            'Value': '{} autosnap'.format(name)
                        }
                    ]
                )
                print('Snapshot completed')
                snapshot_counter += 1
                snap_size_counter += snapshot.volume_size

                # Now iterate through snapshots which were made by autosnap
                snapshots = ec2.snapshots.filter(
                    Filters=[
                        {'Name': 'tag:auto_snap', 'Values': ['true']}
                    ]
                )

                print('Checking for out of date snapshots for instance {0}...'.format(name))
                for snap in snapshots:
                    can_delete = False
                    for tag in snap.tags:  # Use these if statements to get each snapshot's
                                           # created on date, name and auto_snap tag
                        if tag['Key'] == 'CreatedOn':
                            created_on_string = tag['Value']
                        if tag['Key'] == 'auto_snap':
                            if tag['Value'] == 'true':
                                can_delete = True
                        if tag['Key'] == 'Name':
                            snap_name = tag['Value']  # keep the snapshot's name separate from the instance's
                    created_on = datetime.datetime.strptime(created_on_string, '%Y/%m/%d').date()

                    if created_on <= deletion_date and can_delete:
                        print('Snapshot id {0}, ({1}) from {2} is {3} or more days old... deleting'.format(snap.id, snap_name, created_on_string, delete_after_days))
                        deleted_size_counter += snap.volume_size
                        snap.delete()
                        deletion_counter += 1

    print('Made {0} snapshots totalling {1} GB'.format(snapshot_counter, snap_size_counter))
    print('Deleted {0} snapshots totalling {1} GB'.format(deletion_counter, deleted_size_counter))
    return

Every time this function is run, it will iterate through every region, find instances tagged with auto_snapshot : true, create a snapshot of each of their attached volumes, tag those snapshots, delete any auto_snap snapshots older than the retention period, and print a summary of what it did.

There is one final thing to change on this page. Under Basic Settings, the default timeout for Lambda functions is 3 seconds; we need to increase this (I used 59 seconds), otherwise the function will time out before it finishes taking the snapshots.

Basic settings

Click Save to save the function, and then use the Test button to verify everything is working correctly.
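The handler ignores the contents of the incoming event, so any test event (even an empty one) will do. If you would rather trigger a test run from your own machine instead of the console, a small boto3 sketch, assuming the function is named auto_instance_snapshotting:

    import boto3

    lambda_client = boto3.client('lambda')

    # The handler ignores the incoming event, so an empty payload is fine.
    response = lambda_client.invoke(
        FunctionName='auto_instance_snapshotting',  # assuming this is what you named the function
        InvocationType='RequestResponse',
        Payload=b'{}'
    )
    print(response['StatusCode'])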

CloudWatch Rule

Now that our Lambda function is prepared, we need to automate it using a CloudWatch rule. Navigate to the CloudWatch Management Console (Services -> CloudWatch -> Rules) and select Create Rule.

Below are the settings I used to create the rule; this cron schedule runs the function every day at 12:05am UTC.

Create rule

Press the Configure Details button and then on the next page select Create Rule.
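For reference, the same rule can be created with boto3 rather than the console. This is only a sketch; the rule name is arbitrary and the function ARN is a placeholder you would replace with your own.

    import boto3

    events = boto3.client('events')
    lambda_client = boto3.client('lambda')

    function_arn = 'arn:aws:lambda:eu-west-1:123456789012:function:auto_instance_snapshotting'  # placeholder

    # Run every day at 00:05 UTC (fields: minutes, hours, day-of-month, month, day-of-week, year).
    rule = events.put_rule(
        Name='nightly-ec2-snapshots',   # example rule name
        ScheduleExpression='cron(5 0 * * ? *)',
        State='ENABLED'
    )

    # Point the rule at the Lambda function...
    events.put_targets(
        Rule='nightly-ec2-snapshots',
        Targets=[{'Id': 'auto-snapshot-lambda', 'Arn': function_arn}]
    )

    # ...and allow CloudWatch Events to invoke it.
    lambda_client.add_permission(
        FunctionName='auto_instance_snapshotting',
        StatementId='allow-cloudwatch-nightly-rule',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule['RuleArn']
    )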

Logging

Every time this Lambda function runs, it writes its output to a log group in CloudWatch Logs. On the AWS Management Console, navigate to Services -> CloudWatch -> Logs.

The log group for the function we have just made, auto_instance_snapshotting, is shown below.

Logging

Opening this log group gives a series of log streams, each one from a different time the function was run. I'll open the most recent one; it shows all logs and print statements generated by our Lambda function. If you tested your function earlier, you can check the logs here to confirm it ran successfully.
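You can also pull these logs programmatically. Lambda writes to a log group named /aws/lambda/<function name>, so assuming the function is called auto_instance_snapshotting, a small boto3 sketch:

    import boto3

    logs = boto3.client('logs')

    # Fetch recent log events from the function's log group.
    response = logs.filter_log_events(
        logGroupName='/aws/lambda/auto_instance_snapshotting',
        limit=50
    )
    for event in response['events']:
        print(event['message'], end='')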

Alarms

A nice feature of CloudWatch is alarms. An alarm can monitor a metric and send email notifications if a user-defined threshold is breached. In this case, I have configured an alarm to email me if this function errors.

This can be done via Services -> CloudWatch -> Alarms -> Create Alarm. You will be prompted to select a metric, find the metric by clicking on Lambda Metrics -> By Function Name -> Your Function Name (Metric Name: Errors).

Select Metrics for Alarm

And on the following page, use the below settings to set up the email notification.

Define alarm settings
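The same alarm can be created with boto3 if you prefer. Note that email delivery goes through SNS, so this sketch assumes you already have an SNS topic with your email address subscribed; the topic ARN below is a placeholder.

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    # Placeholder ARN - an SNS topic your email address is subscribed to.
    sns_topic_arn = 'arn:aws:sns:eu-west-1:123456789012:lambda-error-alerts'

    # Alarm whenever the function reports one or more errors in a 5-minute period.
    cloudwatch.put_metric_alarm(
        AlarmName='auto_instance_snapshotting-errors',
        Namespace='AWS/Lambda',
        MetricName='Errors',
        Dimensions=[{'Name': 'FunctionName', 'Value': 'auto_instance_snapshotting'}],
        Statistic='Sum',
        Period=300,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[sns_topic_arn]
    )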

Points to Consider

Related Links

https://aws.amazon.com/premiumsupport/knowledge-center/ebs-snapshot-billing/