
Automated Attached EBS Volume Backup Solution


There are as many backup solutions on Amazon as there are applications. I have launched many EC2 instances in AWS, and the storage configuration varies from one instance to the next: instance-store, single EBS volumes with an ext3 filesystem, multiple EBS volumes with an XFS filesystem on an LVM volume group (called a "virtual group" in the script) and logical volume, and so on. I needed a common, reliable method to take consistent daily snapshots of my storage volumes. (Solution: create-ebs-snapshots.sh)

UPDATE: This script was updated on 2013 May 22 to account for operating systems that rename block devices internally (e.g. the device that AWS ec2-describe-volumes reports as sda or xvda appears as xvde or xvde1 when viewing `df` from the server command line). My brief research turned up several potential causes (Xen, udev, etc.). Why it happens matters less to me than being able to work around it without requiring the user to make kernel modifications. I therefore added a detection step: after stripping any trailing digits from the root device name, the script checks whether the name ends with the letter “a”. If it does not, the script calculates an offset by subtracting ASCII character codes (e.g. if the name ends in “e” then the offset is 4, because “e” is 4 letters after “a”). Later, if the script cannot find a device under the name reported by ec2-describe-volumes, it shifts the final letter of that name by the offset, so “xvda” becomes “xvde”, and it applies the same shift to the other devices. So far this works for all of my CentOS EC2 instances that were affected by this block device renaming, wherever it came from!
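The offset arithmetic described above can be sketched in isolation as follows. This is only an illustration under my reading of the update note: the function names device_letter_offset and shift_device_letter are hypothetical and do not appear in the script, which performs the same steps inline with expr and awk.

```shell
# Sketch only: both function names are hypothetical, not taken from the script.

# Print how many letters the device's final letter sits after "a".
# Trailing partition digits are stripped first (/dev/xvde1 -> /dev/xvde).
device_letter_offset() {
    local device last code
    device=$(printf '%s' "$1" | sed 's/[0-9]*$//')
    last=${device: -1}
    code=$(printf '%d' "'$last")      # ASCII code of the final letter
    echo $(( code - 97 ))             # 97 is the ASCII code of "a"
}

# Print the device name with its final letter moved forward by the offset.
shift_device_letter() {
    local device=$1 offset=$2 last code shifted
    last=${device: -1}
    code=$(printf '%d' "'$last")
    # Convert the shifted code back to a character via an octal escape
    shifted=$(printf "\\$(printf '%03o' $(( code + offset )))")
    echo "${device%?}${shifted}"
}
```

For example, `device_letter_offset /dev/xvde1` prints 4, and `shift_device_letter /dev/xvdb 4` prints /dev/xvdf, which matches the xvdb to xvdf shift in the CentOS sample output further down.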

Assumptions

  1. The environment is Linux; this is a bash script and will not work on other operating systems.
  2. Environment variables (path definitions) are defined in /etc/profile.
  3. Instance is EBS backed (it would probably run on instance-store instances but it would ignore the root volume).

Dependencies

  1. ec2-api-tools
  2. cloud-utils
  3. pvdisplay / lvdisplay (lvm2)
  4. xfs_freeze (xfsprogs)
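Before the first run it may be worth confirming that these tools are actually reachable. The sketch below is one way to do that; missing_commands is a hypothetical helper, not part of create-ebs-snapshots.sh, and it only checks the PATH (the script itself calls pvdisplay, lvdisplay and xfs_freeze by absolute path).

```shell
# Sketch only: missing_commands is a hypothetical helper.
# Print each named command that cannot be found on the PATH and
# return non-zero if at least one is missing.
missing_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return $missing
}
```

A call such as `missing_commands ec2-describe-volumes ec2-create-snapshot ec2-delete-snapshot ec2metadata xfs_freeze` lists whichever of those commands still need to be installed.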

Approach

So, I outlined the workflow for this script as follows:

  1. Identify this instance in Amazon EC2.
  2. Fetch a list of all attached EBS volumes.
  3. Identify volume type for each volume (isolated, XFS, virtual groups).
  4. Identify the root volume (in some cases, snapping the root volume could cause the instance to crash in an irrecoverable manner).
  5. Identify existing snapshots related to the volumes identified in step #2.
  6. Mark the existing snapshots for deletion (we don’t want to actually delete them until new snapshots are complete in case the instance crashes during the snapshot process).
  7. Create new snapshots for each volume.
  8. Delete old snapshots (marked in step #6).
  9. Confirm process is complete and display any recorded errors.
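The ordering of steps #6 to #8 (old snapshots are only removed once a new one has been requested) can be sketched as below. This is a simplified illustration, not the script's implementation: rotate_snapshots is a hypothetical function, and CREATE_CMD and DELETE_CMD are variables I introduce so the flow can be dry-run with a stand-in such as echo. Unlike the script, the sketch refuses to delete anything when the create call fails.

```shell
# Sketch only: rotate_snapshots, CREATE_CMD and DELETE_CMD are hypothetical names.
CREATE_CMD=${CREATE_CMD:-ec2-create-snapshot}
DELETE_CMD=${DELETE_CMD:-ec2-delete-snapshot}

# rotate_snapshots VOLUME_ID [OLD_SNAPSHOT_ID...]
# Request a new snapshot first; delete the old ones only if that request succeeded.
rotate_snapshots() {
    local volume=$1 old
    shift
    if ! $CREATE_CMD "$volume"; then
        echo "Snapshot of $volume failed; keeping old snapshots: $*" >&2
        return 1
    fi
    for old in "$@"; do
        $DELETE_CMD "$old" || echo "Could not delete $old" >&2
    done
}
```

Setting `CREATE_CMD="echo create"` and `DELETE_CMD="echo delete"` before calling `rotate_snapshots vol-1 snap-a snap-b` shows the order of operations without touching AWS.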

This general outline is straightforward (as a process, not necessarily as a script), but because configurations vary I had to handle step #7 differently depending on the storage configuration(s) identified in step #3. These are the three simplified processes I follow, based on configuration:

  1. Isolated – create regular snapshot.
  2. XFS – Freeze mount point, snapshot, thaw mount point.
  3. Virtual group – Freeze all mounts in each volume group, snapshot, thaw all mount points.
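The freeze, snapshot, thaw sequence for a single mount can be sketched as follows. It is a minimal illustration under stated assumptions: snapshot_frozen is a hypothetical function, and FREEZE_CMD and SNAPSHOT_CMD are variables I introduce (defaulting to the tools the script uses) so the sequence can be tried with stand-ins. The sketch handles one mount path; for a volume group with several mounts, every mount would need to be frozen before any snapshot is requested.

```shell
# Sketch only: snapshot_frozen, FREEZE_CMD and SNAPSHOT_CMD are hypothetical names.
FREEZE_CMD=${FREEZE_CMD:-xfs_freeze}
SNAPSHOT_CMD=${SNAPSHOT_CMD:-ec2-create-snapshot}

# snapshot_frozen MOUNT_PATH VOLUME_ID [VOLUME_ID...]
# Freeze the mount, request a snapshot of each volume, then always thaw.
snapshot_frozen() {
    local mount_path=$1 status=0 volume
    shift
    $FREEZE_CMD -f "$mount_path" || return 1
    for volume in "$@"; do
        $SNAPSHOT_CMD "$volume" || status=1
    done
    # Thaw even if a snapshot request failed, so the mount is never left frozen
    $FREEZE_CMD -u "$mount_path" || status=1
    return $status
}
```

Note that both the freeze (-f) and the thaw (-u) are given the mount path, such as /mnt, rather than the device name.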

I also added an option, enabled or disabled with the allow_root_freeze variable in the script, to allow freezing the root volume (default="no").

There were some specific challenges that I ran into multiple times, depending on which AMI I selected and on how Amazon changed mount configurations over time (or perhaps depending on the version of Linux you choose):

  1. Identifying the root volume was not always straightforward – you’ll see the different approaches I use to handle this.
  2. Identifying attached devices (two patterns: /sd_ and /xvd_). I first get the root device and then do a string test to detect which pattern is in use.
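The string test from the second challenge can be sketched on its own like this. The helper name to_local_device_name is hypothetical; the script does the equivalent inline with the use_xd_device_prefix flag and a sed substitution.

```shell
# Sketch only: to_local_device_name is a hypothetical helper.
# to_local_device_name REPORTED_DEVICE ROOT_DEVICE
# If the root device uses the xvd pattern but the name reported by
# ec2-describe-volumes uses sd, rewrite the reported name to xvd.
to_local_device_name() {
    local reported=$1 root_device=$2
    if [[ "$root_device" == *"xvd"* && "$reported" != *"xvd"* ]]; then
        echo "$reported" | sed 's|/sd|/xvd|'
    else
        echo "$reported"
    fi
}
```

With a root device of /dev/xvda, `to_local_device_name /dev/sdf /dev/xvda` prints /dev/xvdf; with a root device of /dev/sda the reported name is printed unchanged.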

Solution

#!/bin/bash

# EC2 environment variables and JAVA_HOME must already be defined
source /etc/profile

# Configuration
allow_root_freeze="no" # if set to "yes" then root XFS partitions will still be frozen and backed up.

# Dependencies:
#   ec2-api-tools
#   ec2-metadata
#   cloud-utils
#   pvdisplay (for volume group tests)
#   lvdisplay (for volume group tests)
#   xfs_freeze (for XFS filesystems to get consistent snapshots)

# Identify API commands and dependency paths
CMD_LVDISPLAY=/sbin/lvdisplay
CMD_METADATA=ec2metadata
CMD_PVDISPLAY=/sbin/pvdisplay
CMD_SNAPSHOT_CREATE=ec2-create-snapshot
CMD_SNAPSHOT_DESCRIBE=ec2-describe-snapshots
CMD_SNAPSHOT_DELETE=ec2-delete-snapshot
CMD_TAG_CREATE=ec2-create-tags
CMD_TAG_DESCRIBE=ec2-describe-tags
CMD_VOLUME_DESCRIBE=ec2-describe-volumes
CMD_XFS_FREEZE=/usr/sbin/xfs_freeze

# Functions
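function contains() {
    # Echo "y" (return 0) if the last argument equals any earlier argument, otherwise echo "n" (return 1)
    local n=$#
    local value=${!n}
    local i
    for ((i=1; i < n; i++)); do
        if [ "${!i}" == "${value}" ]; then
            echo "y"
            return 0
        fi
    done
    echo "n"
    return 1
}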

# Fetch EC2 metadata to get current instance id
ec2_instance_id=$($CMD_METADATA --instance-id | awk '{print $2}')

if [ -z "$ec2_instance_id" ]; then
    ec2_instance_id=$($CMD_METADATA --instance-id | awk '{print $1}')
fi

if [ -z "$ec2_instance_id" ]; then
    echo -e "Unable to determine EC2 instance id, please check this code before continuing in order to prevent corruption or deletion of unrelated snapshots.\r"
    exit 1
fi

if [ "$ec2_instance_id" == "Command" ]; then
    echo -e "You must first set EC2_PRIVATE_KEY and EC2_CERT environment variables (in /etc/profile) in order for this script to work.\r"
    exit 1
else
    echo -e "Identified instance as $ec2_instance_id.\r"
fi

# Fetch list of all volumes attached to this instance
volume_list=$($CMD_VOLUME_DESCRIBE | grep "${ec2_instance_id}" | awk '{print $2}')

# Identify root device
root_mount_point=$(mountpoint -d /)
use_xd_device_prefix="no"
for file in $(find /dev)
do
    device_major=$(stat --printf="%t" "$file")
    device_minor=$(stat --printf="%T" "$file")
    if [ "$device_major:$device_minor"  == "$root_mount_point" ]; then
        root_device="$file"
        break;
    else # Try decimal comparison
        device_major=$(printf "%d\n" "0x$device_major")
        device_minor=$(printf "%d\n" "0x$device_minor")
        if [ "$device_major:$device_minor"  == "$root_mount_point" ]; then
            root_device="$file"
            break;
        fi
    fi
done
if [ "$root_device" == "" ]; then
    root_device=$(readlink -f "/dev/root")
fi
if [[ "$root_device" == *"xvd"* ]]; then
    use_xd_device_prefix="yes"
fi
root_device=$(echo "$root_device" | sed 's/[0-9]*//g')
echo -e "Identified root device as $root_device.\r"

root_device_last_character=${root_device#${root_device%?}}

use_block_device_offset="no"
if [[ "$root_device_last_character" != "a" ]]; then
    use_block_device_offset="yes"
    root_device_last_character=$(echo "$root_device_last_character" | awk '{print tolower($0)}')
    ascii_block_device_last_character=$(printf "%d\n" "'$root_device_last_character")
    block_device_offset=`expr $ascii_block_device_last_character - 97`
    echo -e "  Detected that root device is offset (to ___$root_device_last_character) and will shift additional devices $block_device_offset places accordingly [if not found in /dev]."
fi

# Determine which volumes are isolated, XFS, and LVM groups (they need to be backed up differently)
ignored_volumes=()
isolated_volumes=()
xfs_volumes=()
virtual_group_volumes=()
virtual_group_names=()
virtual_groups=()
root_volume="%ROOT%"
skipped_root_volume=""
for ec2_volume in ${volume_list[@]}
do
    volume_virtual_group=""
    volume_mount_point=$($CMD_VOLUME_DESCRIBE $ec2_volume | grep "ATTACHMENT" | awk '{print $4}')
    if [ "$use_xd_device_prefix" == "yes" ] && [[ "$volume_mount_point" != *"xvd"* ]]; then
        volume_mount_point=$(echo "$volume_mount_point" | sed 's/\/sd/\/xvd/g')
    fi
    volume_filesystem=$(df -T | grep "${volume_mount_point}" | awk 'NF == 1 {printf($1); next}; {print}' | awk '{print $2}')
    if [ -z "$volume_filesystem" ]; then
        # Test if volume is part of an LVM/VG
        volume_virtual_group=$($CMD_PVDISPLAY $volume_mount_point | grep "VG Name" | awk '{print $3}')
        if [ -z "$volume_virtual_group" ] && [ "$use_block_device_offset" == "yes" ]; then
            volume_mount_point_last_letter=${volume_mount_point#${volume_mount_point%?}}
            ascii_volume_mount_point_last_letter=$(printf "%d\n" "'$volume_mount_point_last_letter")
            ascii_offset_volume_mount_point_last_letter=`expr $ascii_volume_mount_point_last_letter + $block_device_offset`
            offset_volume_mount_point_last_letter=$(awk -v char=$ascii_offset_volume_mount_point_last_letter 'BEGIN { printf "%c\n", char; exit }')
            volume_mount_point_prefix=$(echo "${volume_mount_point%?}")
            volume_mount_point=$volume_mount_point_prefix$offset_volume_mount_point_last_letter
            echo -e "   * Testing offset volume mount point, $volume_mount_point ..."
            volume_virtual_group=$($CMD_PVDISPLAY $volume_mount_point | grep "VG Name" | awk '{print $3}')
        fi
        if [ -z "$volume_virtual_group" ]; then
            # Test if volume is root device
            if [ "$volume_mount_point" == "$root_device" ] || [ -n "$(file -s $volume_mount_point | grep "rootfs")" ]; then
                volume_filesystem=$(df -T | grep '/$' | awk 'NF == 1 {printf($1); next}; {print}' | awk '{print $2}')
            else
                echo -e "   Ignoring volume $ec2_volume attached to $volume_mount_point, could not determine filesystem.\r"
                ignored_volumes=( "${ignored_volumes[@]}" "$ec2_volume" )
            fi
        fi
        if [ "$volume_virtual_group" != "" ]; then
            echo -e "   Identified volume $ec2_volume attached to $volume_mount_point part of virtual group \"$volume_virtual_group\".\r"
            virtual_group_volumes=( "${virtual_group_volumes[@]}" "$ec2_volume" )
            virtual_group_names=( "$volume_virtual_group" )
            if [ $(contains "${virtual_groups[@]}" "$volume_virtual_group") == "n" ]; then
                virtual_groups=( "${virtual_groups[@]}" "$volume_virtual_group" )
            fi
        fi
    fi
    if [ "$volume_virtual_group" == "" ] && [ "$volume_filesystem" != "" ]; then
        echo -e "   Identified isolated volume $ec2_volume attached to $volume_mount_point with filesystem type $volume_filesystem.\r"
        if [ "$volume_filesystem" == "xfs" ]; then
            xfs_volumes=( "${xfs_volumes[@]}" "$ec2_volume" )
        else
            isolated_volumes=( "${isolated_volumes[@]}" "$ec2_volume" )
        fi
    fi
    if [ "$volume_mount_point" == "$root_device" ] || [ -n "$(file -s $volume_mount_point | grep "rootfs")" ]; then
        root_volume="$ec2_volume"
        echo -e "   Identified volume $ec2_volume attached to $volume_mount_point as root device with filesystem type $volume_filesystem.\r"
    fi
done

# Identify old snapshots for deletion (do not delete until new snapshots are in progress)
echo -e "Identifying old snapshots for deletion...\r"
i=0
for ec2_volume in ${volume_list[@]}
do
    volume_snapshots=( $($CMD_SNAPSHOT_DESCRIBE | grep "SNAPSHOT" | grep "${ec2_volume}" | awk '{ print $2 }') )
    for volume_snapshot in ${volume_snapshots[@]}
    do
        volume_id=$($CMD_SNAPSHOT_DESCRIBE $volume_snapshot | grep "SNAPSHOT" | awk '{print $3}')
        volume_name=$($CMD_TAG_DESCRIBE --filter "resource-id=$volume_id" --filter "key=Name" | cut -f5)
        if [ "$volume_name" != "" ]; then
            snapshots[i]=$volume_snapshot
            echo -e "   Found and marked snapshot $volume_snapshot for deletion.\r"
            snapshot_label="PENDING: $volume_name"
            echo -e "   Labeling snapshot $volume_snapshot as \"$snapshot_label\""
            $CMD_TAG_CREATE $volume_snapshot --tag "Name=$snapshot_label"
            let i+=1
        else
            echo -e "  Error detecting volume associated with snapshot $volume_snapshot, will not relabel or attempt to delete."
        fi
    done
done

# Initiate isolated volume snapshots
echo -e "Creating snapshots of isolated volumes...\r"
i=0
pids=()
for ec2_volume in ${isolated_volumes[@]}
do
    new_snapshot=$($CMD_SNAPSHOT_CREATE $ec2_volume &)
    volume_snapshot=$(echo $new_snapshot | awk '{print $2}')
    volume_id=$(echo $new_snapshot | awk '{print $3}')
    volume_name=`$CMD_TAG_DESCRIBE --filter "resource-id=$volume_id" --filter "key=Name" | cut -f5`
    current_date=$(date +%Y-%m-%d)
    snapshot_label="$current_date: $volume_name"
    echo -e "   Labeling snapshot $volume_snapshot as \"$snapshot_label\""
    $CMD_TAG_CREATE $volume_snapshot --tag "Name=$snapshot_label"
    pids[i]=$!
    let i+=1
done
for pid in ${pids[@]}
do
    wait $pid
done

# Initiate XFS volume snapshots
if [ ${#xfs_volumes[@]} != 0 ]; then
    echo -e "Creating snapshots of XFS volumes (mounts frozen during snapshot initiation)...\r"
    i=0
    pids=()
    mount_points=()
    for ec2_volume in ${xfs_volumes[@]}
    do
        if [ "$ec2_volume" == "$root_volume" ] && [ "$allow_root_freeze" != "yes" ]; then
            echo -e "  Skipping volume $ec2_volume because it is a root volume and will cause the system to hang indefinitely...\r"
            skipped_root_volume=$root_volume
        else
            volume_mount_point=$($CMD_VOLUME_DESCRIBE $ec2_volume | grep "ATTACHMENT" | awk '{print $4}')
            volume_mount_path=$(df -T | grep "$volume_mount_point" | awk 'NF == 1 {printf($1); next}; {print}' | awk '{print $7}')
            if [ -n "$volume_mount_path" ]; then
                # Record the mount path (not the device name) because xfs_freeze -u operates on mount paths
                mount_points=( "${mount_points[@]}" "$volume_mount_path" )
                echo -e "   Freezing mount at $volume_mount_path...\r"
                $CMD_XFS_FREEZE -f $volume_mount_path
                new_snapshot=$($CMD_SNAPSHOT_CREATE $ec2_volume &)
                volume_snapshot=$(echo $new_snapshot | awk '{print $2}')
                volume_id=$(echo $new_snapshot | awk '{print $3}')
                volume_name=`$CMD_TAG_DESCRIBE --filter "resource-id=$volume_id" --filter "key=Name" | cut -f5`
                current_date=$(date +%Y-%m-%d)
                snapshot_label="$current_date: (XFS) $volume_name"
                echo -e "   Labeling snapshot $volume_snapshot as \"$snapshot_label\"\r"
                $CMD_TAG_CREATE $volume_snapshot --tag "Name=$snapshot_label"
                pids[i]=$!
                let i+=1
            else
                echo -e "  Error freezing mount point for volume $ec2_volume because mount path was not found, skipping..."
                ignored_volumes=( "${ignored_volumes[@]}" "$ec2_volume" )
            fi
        fi
    done
    for pid in ${pids[@]}
    do
        wait $pid
    done
    for volume_mount_path in ${mount_points[@]}
    do
        echo -e "   Thawing mount at $volume_mount_path...\r"
        $CMD_XFS_FREEZE -u $volume_mount_path
    done
else
    echo -e "  No XFS volumes (${#xfs_volumes[@]}) were detected.\r"
fi

# Initiate virtual group volume snapshots
if [ ${#virtual_group_volumes[@]} != 0 ] && [ ${#virtual_groups[@]} != 0 ]; then
    echo -e "Creating snapshots of virtual group volumes (mounts frozen during snapshot initiation)...\r"
    i=0
    pids=()
    mount_points=()
    echo -e
    for virtual_group in ${virtual_groups[@]}
    do
        virtual_group_mount_point=$($CMD_LVDISPLAY $virtual_group | grep "LV Path" | awk '{print $3}')
        if [ -d "$virtual_group_mount_point" ]; then
            virtual_group_mount_pointer=$(mountpoint -d "$virtual_group_mount_point")
        else
            virtual_group_mount_pointer=$(mountpoint -x "$virtual_group_mount_point")
        fi
        if [ "$virtual_group_mount_pointer" == "" ]; then
            if [ -L "$virtual_group_mount_point" ]; then
                virtual_group_device_path=$(readlink -f $virtual_group_mount_point)
            fi
        else
            for file in $(find /dev)
            do
                device_major=$(stat --printf="%t" "$file")
                device_minor=$(stat --printf="%T" "$file")
                if [ "$device_major:$device_minor"  == "$virtual_group_mount_pointer" ]; then
                    virtual_group_device_path="$file"
                    break;
                else # Try decimal comparison
                    device_major=$(printf "%d\n" "0x$device_major")
                    device_minor=$(printf "%d\n" "0x$device_minor")
                    if [ "$device_major:$device_minor"  == "$virtual_group_mount_pointer" ]; then
                        virtual_group_device_path="$file"
                        break;
                    fi
                fi
            done
        fi
        virtual_group_mount_path=$(df -a | grep "$virtual_group_device_path" | awk '{print $6}')
        # Try reverse lookup in /dev/mapper directly
        if [ "$virtual_group_mount_path" == "" ]; then
            for file in $(find /dev/mapper)
            do
                target=$(readlink -f "$file")
                if [ "$target" == "$virtual_group_device_path" ]; then
                    virtual_group_device_path="$file"
                    break;
                fi
            done
            virtual_group_mount_path=$(df -a | grep "$virtual_group_device_path" | awk '{print $6}')
        fi
        if [ "$virtual_group_mount_path" != "" ] && [ "$virtual_group_device_path" != "" ]; then
            mount_points=( "${mount_points[@]}" "$virtual_group_mount_path" )
            echo -e "   Freezing mount at $virtual_group_mount_path for virtual group \"$virtual_group\"...\r"
            $CMD_XFS_FREEZE -f "$virtual_group_mount_path"
        else
            echo -e "  Error: Could not identify mountpoint to freeze, proceeding without freezing. This may not be consistent!\r"
        fi
    done
    for ec2_volume in ${virtual_group_volumes[@]}
    do
        # This check is probably irrelevant since I don't believe you could have a root partition as part of a volume group without a custom kernel
        if [ "$ec2_volume" == "$root_volume" ] && [ "$allow_root_freeze" != "yes" ]; then
            echo -e "  Skipping volume $ec2_volume because it is a root volume and will cause the system to hang indefinitely...\r"
            skipped_root_volume=$root_volume
        else
            new_snapshot=$($CMD_SNAPSHOT_CREATE $ec2_volume &)
            volume_snapshot=$(echo $new_snapshot | awk '{print $2}')
            volume_id=$(echo $new_snapshot | awk '{print $3}')
            volume_name=`$CMD_TAG_DESCRIBE --filter "resource-id=$volume_id" --filter "key=Name" | cut -f5`
            current_date=$(date +%Y-%m-%d)
            virtual_group_name=${virtual_group_names[@]}
            snapshot_label="$current_date: (VG - $virtual_group_name) $volume_name"
            echo -e "   Labeling snapshot $volume_snapshot as \"$snapshot_label\""
            $CMD_TAG_CREATE $volume_snapshot --tag "Name=$snapshot_label"
            pids[i]=$!
            let i+=1
        fi
    done
    for pid in ${pids[@]}
    do
        wait $pid
    done
    for virtual_group_mount_path in ${mount_points[@]}
    do
        echo -e "   Thawing mount at $virtual_group_mount_path...\r"
        $CMD_XFS_FREEZE -u "$virtual_group_mount_path"
    done
else
    echo -e "  No virtual volumes (${#virtual_group_volumes[@]}) and no virtual groups (${#virtual_groups[@]}) were detected.\r"
fi

# Snapshots initiated, delete old snapshots
echo -e "Snapshots initiated, delete old snapshots...\r"
for snapshot in ${snapshots[@]}
do
    # The delete error appears to be written to stderr, so capture both streams and echo them back
    result=$($CMD_SNAPSHOT_DELETE "$snapshot" 2>&1)
    [ -n "$result" ] && echo "$result"
    if [[ "$result" == *"InvalidSnapshot.InUse"* ]]; then
        echo -e "  Error: Could not delete snapshot $snapshot because it is currently in use by an AMI, renaming accordingly."
        volume_id=$($CMD_SNAPSHOT_DESCRIBE "$snapshot" | grep "SNAPSHOT" | awk '{print $3}')
        volume_name=`$CMD_TAG_DESCRIBE --filter "resource-id=$volume_id" --filter "key=Name" | cut -f5`
        current_date=$(date +%Y-%m-%d)
        snapshot_label="$current_date: [AMI] $volume_name"
        echo -e "   Labeling snapshot $snapshot as \"$snapshot_label\""
        $CMD_TAG_CREATE $snapshot --tag "Name=$snapshot_label"
    fi
done

# All done!
echo -e 'Backup complete, deprecated snapshots removed, and new snapshots labeled as "YYYY-MM-DD: [(type)] Volume Id".\r'
if [ "$skipped_root_volume" != "" ]; then
    echo -e "The root volume $skipped_root_volume was skipped because it is an XFS volume and would cause the system to hang indefinitely.\r"
fi
if [ ${#ignored_volumes[@]} != 0 ]; then
    echo -e "The following volumes were ignored due to unknown filesystem (to prevent corruption) or unknown mount path (could not freeze XFS mount):\r"
    echo "${ignored_volumes[@]}"
fi
exit 0;

Result (Sample Output)

Identified instance as i-#######d.
Identified root device as /dev/sda1.
   Identified isolated volume vol-#######0 attached to /dev/sda1 with filesystem type xfs.
   Identified volume vol-#######0 attached to /dev/sda1 as root device with filesystem type xfs.
   Identified isolated volume vol-#######2 attached to /dev/sdf with filesystem type ext4.
Identifying old snapshots for deletion...
   Found and marked snapshot snap-#######0 for deletion.
   Labeling snapshot snap-#######0 as "PENDING: Some EBS Volume Root"
TAG	snapshot	snap-#######0	Name	PENDING: Some EBS Volume Root
   Found and marked snapshot snap-#######4 for deletion.
   Labeling snapshot snap-#######4 as "PENDING: Some EBS Volume Data"
TAG	snapshot	snap-#######4	Name	PENDING: Some EBS Volume Data
   Found and marked snapshot snap-#######2 for deletion.
   Labeling snapshot snap-#######2 as "PENDING: Some Other EBS Volume"
TAG	snapshot	snap-#######2	Name	PENDING: Some Other EBS Volume
Creating snapshots of isolated volumes...
   Labeling snapshot snap-#######8 as "2012-12-01: Some Other EBS Volume"
TAG	snapshot	snap-#######8	Name	2012-12-01: Some Other EBS Volume
Creating snapshots of XFS volumes (mounts frozen during snapshot initiation)...
  Skipping volume vol-#######0 because it is a root volume and will cause the system to hang indefinitely...
  No virtual volumes (0) and no virtual groups (0) were detected.
Snapshots initiated, delete old snapshots...
Client.InvalidSnapshot.InUse: The snapshot snap-#######2 is currently in use by ami-#######a
Client.InvalidSnapshot.InUse: The snapshot snap-#######4 is currently in use by ami-#######b
Backup complete, deprecated snapshots removed, and new snapshots labeled as "YYYY-MM-DD: [(type)] Volume Id".
The root volume vol-#######0 was skipped because it is an XFS volume and would cause the system to hang indefinitely.

Result (Sample Output) – Excerpt of Virtual Groups with XFS Freezing

Creating snapshots of virtual group volumes (mounts frozen during snapshot initiation)...
   Freezing mount at /mnt for virtual group "virtualgroup"...
   Labeling snapshot snap-#######f as "2012-12-01: (VG - virtualgroup) Some Server EBS Volume 0"
TAG	snapshot	snap-#######f	Name	2012-12-01: (VG - virtualgroup) Some Server EBS Volume 0
   Thawing mount at /mnt...
Snapshots initiated, delete old snapshots...

Result (Sample Output) – Block Device Mount Point Rewrite in CentOS

Identified instance as i-#######5.
Identified root device as /dev/xvde.
  Detected that root device is offset (to ___e) and will shift additional devices 4 places accordingly [if not found in /dev].
  Failed to read physical volume "/dev/xvda"
   * Testing offset volume mount point, /dev/xvde ...
  Failed to read physical volume "/dev/xvde"
   Identified isolated volume vol-#######d attached to /dev/xvde with filesystem type ext4.
   Identified volume vol-#######d attached to /dev/xvde as root device with filesystem type ext4.
  Failed to read physical volume "/dev/xvdb"
   * Testing offset volume mount point, /dev/xvdf ...
   Identified volume vol-#######8 attached to /dev/xvdf part of virtual group "virtualgroup".
Identifying old snapshots for deletion...
   Found and marked snapshot snap-#######a for deletion.
   Labeling snapshot snap-#######a as "PENDING: Some EBS Volume Root"
TAG    snapshot    snap-#######a    Name    PENDING: Some EBS Volume Root
   Found and marked snapshot snap-#######9 for deletion.
   Labeling snapshot snap-#######9 as "PENDING: Some Other EBS Volume"
TAG    snapshot    snap-#######9    Name    PENDING: Some Other EBS Volume
Creating snapshots of isolated volumes...
   Labeling snapshot snap-#######e as "2013-05-22: Some EBS Volume Root"
TAG    snapshot    snap-#######e    Name    2013-05-22: Some EBS Volume Root
  No XFS volumes (0) were detected.
Creating snapshots of virtual group volumes (mounts frozen during snapshot initiation)...
   Freezing mount at /mnt for virtual group "virtualgroup"...
   Labeling snapshot snap-#######2 as "2013-05-22: (VG - virtualgroup) Some Other EBS Volume"
TAG    snapshot    snap-#######2    Name    2013-05-22: (VG - virtualgroup) Some Other EBS Volume
   Thawing mount at /mnt...
Snapshots initiated, delete old snapshots...
Backup complete, deprecated snapshots removed, and new snapshots labeled as "YYYY-MM-DD: [(type)] Volume Id".

Development

Please tag me or comment if you reuse this script. I always appreciate feedback or improvement suggestions. Most of all I would really enjoy seeing a Windows version of this script that utilizes VSS to quiesce the filesystem to prevent corruption of, for example, a volume that contains an active MS SQL database store.

The above script runs on Linux volumes (with root freezing disabled) and live database stores with minimal impact and without causing outages (this could vary in different AMIs).

Disclaimer

Please note that I take no responsibility for your use of this script. I use it personally and I have tested it within the confines of my own environments (all Ubuntu EC2 instances, multiple sizes, multiple EBS configurations), but that in no way implies that it will work in yours. As with any application, test it first!

Snapshots tend to take a while depending on the size and how often you execute them. If you snapshot 4 TB across 4 EBS volumes it might take 24 hours the first time, but EBS snapshots are incremental, so later runs only store the blocks that changed since the previous snapshot. If you automate this script to run as a daily cron job (which is the configuration I’ve implemented, with output to email or logging software such as Sentry) then it will likely complete within seconds or under one minute. My point is, don’t be discouraged the first time – run it again and see how much faster it is.


About christopherjcoleman

Independent IT Consultant. Cloud Expert. United States Navy Veteran. Dedicated. Focused. Driven. I make companies better by developing applications to meet specific business needs on reliable, cost-efficient cloud infrastructure. If the right solution doesn't exist then create it. I have achieved my greatest accomplishments because someone else told me "it's not possible; there is no way to do it" - and now there is.

Discussion

4 thoughts on “Automated Attached EBS Volume Backup Solution”

  1. UPDATE: My recent installations indicate no need for installation of SunJava 6 SDK or setting the JAVA_HOME directory in /etc/profile. All dependencies can now be installed via apt-get using readily available repositories (no more downloads or custom installations). In Ubuntu, these dependencies reside in the “multiverse” repositories.

    Posted by christopherjcoleman | 2013-03-04, 15:36
  2. Excellent Work!! Applied and working… Congratulations.

    Posted by Alex Trejo (@TrejooAlex) | 2013-05-19, 02:49
  3. UPDATE: This script was recently updated (2013 May 22) to account for operating systems that shift the mount points internally (e.g. sda or xvda, as reported by AWS ec2-describe-volumes as the mount point, becomes xvde or xvde1 when viewing `df` from the server command line). In my brief research it was mentioned that there are multiple potential causes – XEN, Udev, etc. Regardless of that fact, why it happens to me is not necessarily important but I needed to be able to work around that without requiring the user to implement kernel modifications. I therefore added a “detection” by determining if the root mount (after stripped of any ending numbers on the mount point to account for virtualization) ends with the letter “a”. If not then I calculate the offset (e.g. if it ends in “e” then the offset is 4 because “e” is 4 letters after “a”) by subtracting the ASCII character codes. Later, if it cannot identify the mount point as reported by ec2-describe-volumes then it shifts the mount point by adjusting the ending letter (“a” in “xvda”) by the offset – thus “xvda” becomes “xvde” and applies this to the other mounts as well. So far this now works for all of my CentOS EC2 instances that were vulnerable to this block device rewriting … wherever it came from!

    Posted by christopherjcoleman | 2013-05-22, 00:42
  4. Hi Christopher,

    there’s a bug when trying to thaw frozen XFS volumes ( $CMD_XFS_FREEZE -u $volume_mount_path): it’s trying to unfreeze the device (ie. /dev/xvdg) and not the mount path (ie. /mnt/mymountpoint).

    Also I’m not convinced the snapshot naming is working as it should, it leaves “PENDING” tags behind, but i haven’t got to the bottom of this yet.

    i’ll have a crack at fixing the ‘unfreeze’ bug myself, but if you get there first please let me know.

    otherwise a very useful script. thanks a lot.

    cheers
    andy

    Posted by andy ryan | 2016-11-01, 14:18
