Save Money by Scheduling Your AWS EC2 Instances Using AWS Lambda and CloudWatch

Written By Adam Keogh, Edited By Mae Semakula-Buuza
Tue 25 September 2018, in category Aws

AWS, Python


Background

In our office we use several AWS services, but the vast majority of our spend comes from EC2 instance costs. We also found that, from time to time, servers were left running overnight when nobody was using them, or occasionally left on from a Friday through the weekend or a bank holiday weekend. If you only use a server for 8 hours a day, that's ~40 billing hours a week, yet a server left on from Friday evening will run for ~60 hours over the weekend. I began looking for solutions to this problem and was surprised to find a third-party service called Skeddly, whose two main features are scheduling when EC2 instances are started and stopped, and scheduling automatic backups (which, as I discussed in my other blog post, can be accomplished with AWS Lambda). Amazon also offers an Instance Scheduler tool, but after using it for a couple of weeks I found it overly complex, at least for my use case, and thought I could do everything I needed with a simple Lambda function.
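To put rough numbers on that, here is a small sketch of the arithmetic. It counts running hours only (not prices), and the 8-hour, Monday-to-Friday working week and the example Friday-to-Monday dates are illustrative assumptions.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7  # an always-on instance bills for every hour


def hours_between(start: datetime, end: datetime) -> float:
    """Number of hours between two datetimes."""
    return (end - start).total_seconds() / 3600


# Assumption: the server is only needed 8 hours a day, Monday to Friday
office_hours_per_week = 8 * 5
saved_fraction = 1 - office_hours_per_week / HOURS_PER_WEEK

# Example weekend: left on at 19:00 on Friday 28 September 2018,
# not switched off until 08:00 on Monday 1 October 2018
weekend_hours = hours_between(datetime(2018, 9, 28, 19, 0),
                              datetime(2018, 10, 1, 8, 0))

print('Office hours per week: {0}'.format(office_hours_per_week))
print('Share of an always-on week saved: {0:.1f}%'.format(saved_fraction * 100))
print('Hours running over the weekend: {0:.0f}'.format(weekend_hours))
```

This prints 40 hours, 76.2% and 61 hours: roughly three quarters of an always-on instance's hours go unused, and one forgotten weekend costs about 61 hours.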

Creating an IAM Role

Just as in my previous blog on Lambdas, we will need to create an IAM (Identity and Access Management) role which defines the permissions our lambda function has. From the AWS Management Console (you’ll need IAM permissions associated with your account in order to do this), navigate to Services -> IAM -> Roles -> Create Role.

When selecting what service will use this role, choose AWS Lambda. It will then invite you to attach one or more permissions policies to the role, but we will be making our own custom policy, so select Create Policy.
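Choosing AWS Lambda here is what gives the role a trust policy allowing the Lambda service to assume it. If you would rather script this step, a minimal boto3 sketch might look like the following; the role name `ec2-scheduler-role` is a hypothetical placeholder, and `boto3` is imported inside the function so the policy document itself can be inspected without AWS credentials.

```python
import json

# Trust policy: allows the Lambda service to assume this role
LAMBDA_TRUST_POLICY = {
    'Version': '2012-10-17',
    'Statement': [
        {
            'Effect': 'Allow',
            'Principal': {'Service': 'lambda.amazonaws.com'},
            'Action': 'sts:AssumeRole',
        }
    ],
}


def create_lambda_role(role_name='ec2-scheduler-role'):  # hypothetical name
    """Create an IAM role that Lambda can assume and return its ARN.

    Requires AWS credentials with IAM permissions.
    """
    import boto3  # imported lazily so the policy above can be used without boto3

    iam = boto3.client('iam')
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    return response['Role']['Arn']
```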

We want our function to be able to:
- Read different types of information from EC2 (so we'll give it full describe, i.e. read, permissions)
- Start and stop EC2 instances
- Write to CloudWatch Logs

You can create this policy using the visual editor or by providing a JSON document; below is a JSON policy you can use.

  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": "logs:*",
              "Resource": "*"
          },
          {
              "Effect": "Allow",
              "Action": "ec2:Describe*",
              "Resource": "*"
          },
          {
              "Effect": "Allow",
              "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances"
              ],
              "Resource": [
                  "*"
              ]
          }
      ]
  }
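The same policy can also be created and attached with boto3. This is a sketch, not part of the original walkthrough: `attach_scheduler_policy` and the default policy name are hypothetical, and the role is assumed to exist already.

```python
import json

# The same permissions as the JSON policy above
SCHEDULER_POLICY = {
    'Version': '2012-10-17',
    'Statement': [
        {'Effect': 'Allow', 'Action': 'logs:*', 'Resource': '*'},
        {'Effect': 'Allow', 'Action': 'ec2:Describe*', 'Resource': '*'},
        {'Effect': 'Allow',
         'Action': ['ec2:StartInstances', 'ec2:StopInstances'],
         'Resource': ['*']},
    ],
}


def attach_scheduler_policy(role_name, policy_name='ec2-scheduler-policy'):
    """Create the managed policy and attach it to an existing role.

    role_name and the default policy_name are hypothetical placeholders.
    """
    import boto3  # lazy import: only needed when talking to AWS

    iam = boto3.client('iam')
    policy_arn = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(SCHEDULER_POLICY),
    )['Policy']['Arn']
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn
```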

Tagging

Some of our EC2 instances require 100% uptime, for example the server that hosts this blog, but most of our servers only need to be on during UK office hours.

I'm going to tag the instances I want on this schedule with 'uk-office-hours'. Simply select an instance you want to auto-schedule, then give it a tag with Key 'Schedule' and Value 'uk-office-hours' (see the screenshot below).

Screenshot: Tagging EC2 Instances
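Tagging can also be scripted. The sketch below uses boto3's `create_tags`; the helper names are hypothetical, and the tag key must match the `tag:Schedule` filter used in the Lambda code.

```python
SCHEDULE_TAG_KEY = 'Schedule'  # must match the 'tag:Schedule' filter


def schedule_tags(schedule_name):
    """Build the tag list that marks an instance with a schedule."""
    return [{'Key': SCHEDULE_TAG_KEY, 'Value': schedule_name}]


def tag_instances(instance_ids, schedule_name='uk-office-hours', region_name=None):
    """Apply the Schedule tag to the given EC2 instance IDs.

    region_name=None falls back to the default region in your AWS config.
    """
    import boto3  # lazy import so schedule_tags() can be used without boto3

    ec2 = boto3.client('ec2', region_name=region_name)
    ec2.create_tags(Resources=list(instance_ids), Tags=schedule_tags(schedule_name))
```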

Lambda

Now navigate to the AWS Lambda Management Console and select Create Function > Author from Scratch. Name your function and choose Python 3.6 as the runtime; finally, under Role, select Choose an Existing Role and pick the role we made earlier.

Code

Below is the function code; paste it into the inline code editor of your Lambda function.

import datetime

import boto3

ec2 = boto3.client('ec2')
regions = ec2.describe_regions().get('Regions',[] )
all_regions = [region['RegionName'] for region in regions]

# NB - RULES FOR DEFINING SCHEDULES
# NEW SCHEDULES MUST FOLLOW THE FORMAT OF OTHER SCHEDULES IN THE DICTIONARY BELOW
# Start & stop times are 24-hour UTC times (Lambda's clock runs on UTC),
# e.g. 'start_time': '09:00', 'stop_time': '19:00'.
# The power_on and power_off settings control whether this Lambda function
# may automatically power the server on and/or off. For example:
# 'power_on': 'disabled' and 'power_off': 'enabled' means that the instance is
# automatically stopped at its stop time, but not automatically started at its start time

SCHEDULES = {
    'uk-office-hours': {
        'start_time': '08:00',
        'stop_time': '19:00',
        'power_on': 'disabled',
        'power_off': 'enabled',
    },
}

def time_in_range(start, end, x):
    """Returns true if x is in the time range (start, end)
       Where start, end, x are datetime objects"""
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end


def lambda_handler(event, context):
    """This function iterates through the schedules defined above,
       finding instances tagged with that schedule.
       It will then power on / off each instance if certain conditions
       are met"""

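    # Lambda's clock is UTC, but ask for UTC explicitly so the schedule
    # comparison doesn't depend on the environment's time zone
    now = datetime.datetime.now(datetime.timezone.utc).time()
    print('Current time is {0}'.format(now))
    # For each schedule, let's look at instances with that schedule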
    for schedule in SCHEDULES:

        # The below lines get the time inputted into the schedules dict
        # and convert it to a python datetime object
        start_time = SCHEDULES[schedule]['start_time']
        start_time = datetime.datetime.strptime(start_time, '%H:%M').time()

        stop_time = SCHEDULES[schedule]['stop_time']
        stop_time = datetime.datetime.strptime(stop_time, '%H:%M').time()
        print('Looking at tagged instances for schedule {0}, Start Time: {1}, '
              'Stop Time: {2}'.format(schedule, start_time, stop_time))

        for region_name in all_regions:
            print('Instances in EC2 Region {0}:'.format(region_name))
            ec2 = boto3.resource('ec2', region_name=region_name)

            instances = ec2.instances.filter(
                Filters=[
                    {'Name': 'tag:Schedule', 'Values': [schedule]}
                ]
            )
            for instance in instances.all():
                # Fall back to the instance ID if there is no 'Name' tag,
                # so a name from a previous instance is never reused
                instance_name = instance.id
                for tag in instance.tags or []:
                    if tag['Key'] == 'Name':  # Get the name of the instance
                        instance_name = tag['Value']

                instance_state = instance.state['Name']  # and its state

                print('Found tagged instance \'{1}\', id: {0}, schedule: {3}, '
                      'state: {2}'.format(instance.id, instance_name, instance_state, schedule))


                if time_in_range(start=start_time, end=stop_time, x=now):
                    # When the current time lies within our schedule
                    # we may need to start instances, but won't need to stop any
                    if (SCHEDULES[schedule]['power_on'] == 'enabled' and
                            instance_state == 'stopped'):
                        instance.start()
                        print('Instance {0} started'.format(instance_name))
                    else:
                        print('Instance {0} not started, because:'.format(instance_name))
                        print(' power_on = {0} and state = {1}'
                              .format(SCHEDULES[schedule]['power_on'], instance_state))

                else:
                    # Likewise, when the current time is outside the schedule
                    # we may need to stop instances, but won't need to start any
                    if (SCHEDULES[schedule]['power_off'] == 'enabled' and
                            instance_state == 'running'):
                        instance.stop()
                        print('Instance {0} stopped'.format(instance_name))
                    else:
                        print('Instance {0} not stopped, because:'.format(instance_name))
                        print(' power_off = {0} and state = {1}'
                              .format(SCHEDULES[schedule]['power_off'], instance_state))


    return

The 'Rules for Defining Schedules' comment block at the top of the code explains how schedules work, and the SCHEDULES dictionary below it defines our uk-office-hours schedule.
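Because every schedule must follow the same format, a small check can catch typos before you paste the code into Lambda. The validator below is a hypothetical helper, not part of the original function, and the `overnight-batch` schedule is an invented example of a window that wraps past midnight.

```python
import datetime

REQUIRED_KEYS = {'start_time', 'stop_time', 'power_on', 'power_off'}
SWITCH_VALUES = {'enabled', 'disabled'}


def validate_schedules(schedules):
    """Return a list of problems found in a SCHEDULES-style dictionary."""
    problems = []
    for name, settings in schedules.items():
        missing = REQUIRED_KEYS - set(settings)
        if missing:
            problems.append('{0}: missing {1}'.format(name, sorted(missing)))
            continue
        for key in ('start_time', 'stop_time'):
            try:
                # Same parsing as the Lambda function: 24-hour HH:MM
                datetime.datetime.strptime(settings[key], '%H:%M')
            except ValueError:
                problems.append('{0}: bad {1} {2!r}'.format(name, key, settings[key]))
        for key in ('power_on', 'power_off'):
            if settings[key] not in SWITCH_VALUES:
                problems.append('{0}: {1} must be enabled or disabled'.format(name, key))
    return problems


SCHEDULES = {
    'uk-office-hours': {'start_time': '08:00', 'stop_time': '19:00',
                        'power_on': 'disabled', 'power_off': 'enabled'},
    # Hypothetical schedule whose window wraps past midnight
    'overnight-batch': {'start_time': '22:00', 'stop_time': '06:00',
                        'power_on': 'enabled', 'power_off': 'enabled'},
}

print(validate_schedules(SCHEDULES))  # prints []
```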

When this function runs, it loops through the user-defined schedules and, in every EC2 region, looks for servers tagged with each schedule. For each tagged server it checks whether the current time is inside or outside the schedule's hours, then takes action if warranted: for example, if it finds a server that is running but scheduled to be off, and power_off is enabled, it stops the server.
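The per-instance decision can be separated from the AWS calls, which makes it easy to check without touching any servers. This is a refactoring sketch rather than the code above; `decide_action` is a hypothetical name, and it mirrors the handler's rules: start a stopped instance inside the window only when power_on is enabled, and stop a running instance outside it only when power_off is enabled.

```python
import datetime


def time_in_range(start, end, x):
    """True if x is in [start, end]; wraps past midnight when start > end."""
    if start <= end:
        return start <= x <= end
    return start <= x or x <= end


def decide_action(settings, state, now):
    """Return 'start', 'stop' or None for one instance.

    settings is one entry of a SCHEDULES-style dict, state is the EC2 state
    name (e.g. 'running'), and now is a datetime.time in UTC.
    """
    start = datetime.datetime.strptime(settings['start_time'], '%H:%M').time()
    stop = datetime.datetime.strptime(settings['stop_time'], '%H:%M').time()
    if time_in_range(start, stop, now):
        if settings['power_on'] == 'enabled' and state == 'stopped':
            return 'start'
    elif settings['power_off'] == 'enabled' and state == 'running':
        return 'stop'
    return None


office = {'start_time': '08:00', 'stop_time': '19:00',
          'power_on': 'disabled', 'power_off': 'enabled'}
print(decide_action(office, 'running', datetime.time(20, 0)))  # stop
print(decide_action(office, 'stopped', datetime.time(10, 0)))  # None (power_on disabled)
```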

When you have finished pasting the code, there is one more change to make on this page. Under Basic Settings, the default timeout for Lambda functions is 3 seconds; increase it (I used 59 seconds), otherwise the function will time out before it has checked every region. Click Save to save the function. Once it is saved, press the Test button to verify that it runs correctly (you might want to tinker with the schedule between tests while you do this).
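The timeout can also be changed from a script with boto3's `update_function_configuration`. In this sketch the helper names are hypothetical and the function name you pass is your own; 59 seconds is the value used above, and Lambda currently accepts timeouts from 1 to 900 seconds.

```python
LAMBDA_MAX_TIMEOUT = 900  # seconds (15 minutes)


def timeout_settings(function_name, seconds=59):
    """Build the arguments for update_function_configuration."""
    if not 1 <= seconds <= LAMBDA_MAX_TIMEOUT:
        raise ValueError('timeout must be between 1 and {0} seconds'.format(LAMBDA_MAX_TIMEOUT))
    return {'FunctionName': function_name, 'Timeout': seconds}


def apply_timeout(function_name, seconds=59):
    """Set the timeout on a deployed function (needs AWS credentials)."""
    import boto3  # lazy import: only needed when calling AWS

    client = boto3.client('lambda')
    return client.update_function_configuration(**timeout_settings(function_name, seconds))
```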

CloudWatch Rule

Now that our Lambda function is prepared, we need to automate it using a CloudWatch rule. In the AWS Management Console, navigate to Services -> CloudWatch -> Rules and select Create Rule. The settings I used for the rule are shown below; that cron schedule runs the function every 2 hours, Monday to Friday, though you may want to change it to suit your needs.

Screenshot: Cron Schedule
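The exact cron settings are only in the screenshot. As an assumption, one CloudWatch Events expression matching "every 2 hours, Monday to Friday" is `cron(0 0/2 ? * MON-FRI *)`, evaluated in UTC. The sketch below creates such a rule with boto3, grants CloudWatch Events permission to invoke the function, and adds the function as the rule's target; the rule name, function name and ARN are whatever you pass in.

```python
# Assumed expression: minute 0 of every 2nd hour (UTC), Monday to Friday.
# Fields: minutes hours day-of-month month day-of-week year
SCHEDULE_EXPRESSION = 'cron(0 0/2 ? * MON-FRI *)'


def schedule_function(rule_name, function_name, function_arn):
    """Create a scheduled rule that invokes the Lambda function."""
    import boto3  # lazy import: only needed when calling AWS

    events = boto3.client('events')
    lambda_client = boto3.client('lambda')

    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=SCHEDULE_EXPRESSION,
        State='ENABLED',
    )['RuleArn']

    # Allow CloudWatch Events to call the function, for this rule only
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId='{0}-invoke'.format(rule_name),
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule_arn,
    )

    events.put_targets(Rule=rule_name, Targets=[{'Id': '1', 'Arn': function_arn}])
    return rule_arn
```

With a 2-hour interval an instance can keep running for up to roughly one interval after its stop time, so a shorter interval trades a few more invocations for tighter control.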

Logging

Every time this Lambda function runs, its print output is written to CloudWatch Logs, in a log group named after the function (/aws/lambda/ followed by the function name). In the AWS Management Console, navigate to Services -> CloudWatch -> Logs.

Opening the relevant log group for our function shows a series of log streams, each holding the output of one or more runs of the function. We can use these logs to confirm that our servers were turned on or off at the correct time, or to investigate a misbehaving Lambda.
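Rather than opening streams one by one, the start/stop messages can be pulled out with boto3's `filter_log_events` paginator. This is a sketch with hypothetical helper names; it relies on the handler's print format, where actual actions are logged as "Instance <name> started" or "Instance <name> stopped".

```python
def log_group_name(function_name):
    """Lambda writes to a log group named after the function."""
    return '/aws/lambda/{0}'.format(function_name)


def is_action_message(message):
    """True for 'Instance X started/stopped' lines, not the 'not ...' ones."""
    text = message.strip()
    return text.startswith('Instance ') and text.endswith((' started', ' stopped'))


def recent_actions(function_name, start_time_ms):
    """Yield start/stop log lines since start_time_ms (epoch milliseconds)."""
    import boto3  # lazy import: only needed when calling AWS

    logs = boto3.client('logs')
    paginator = logs.get_paginator('filter_log_events')
    pages = paginator.paginate(
        logGroupName=log_group_name(function_name),
        startTime=start_time_ms,
    )
    for page in pages:
        for event in page['events']:
            if is_action_message(event['message']):
                yield event['message'].strip()
```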

Points to Consider