I’ve been asked a couple of times how Cinder’s Incremental Backup works and what I actually mean when I say we need to rework Ceph’s Backup driver to support Cinder’s Incremental Backup. So I’ll try to explain both in this post.
Incremental Backup in Cinder was designed with the Swift back-end in mind, so the reference implementation is versatile enough to handle size limitations on the back-end, and new drivers can relatively easily extend this implementation to support incremental backups.
In general incremental backup is quite straightforward, although it has some peculiarities. If you are only interested in how the driver does the incremental backup and not in the whole flow, you may want to skip to the Backup Driver section.
This is a diagram of the incremental backup workflow that is later explained in more detail:
1. Client backup request
2. Run all checks, reserve quota and look for latest backup
3. Change volume status and create backup DB entry from the Cinder API Node
4. RPC call to Cinder Volume Node to do the backup
5. Return backup information to client
6. Check that volume and backup are in correct status
7. Call source volume’s driver to start the backup
8. Attach source volume from volume back-end for reading
9. Call backup driver to do the backup with attached volume
10. Detect changed chunks since last backup
11. Save backup extents in back-end storage
12. Change volume and backup status in DB
When a backup request comes in to the Cinder API Node with the optional incremental argument set to True, the API extracts all arguments from the request and calls the backup creation section of the API for a series of checks: policy, source volume availability, quota for number of backups and total gigabytes, and status of the backup service on the volume’s host. Right now the backup service is tightly coupled to Cinder Volume, so the backup service must be enabled on the node hosting the volume.
Once these checks pass, quota is reserved and we search for the latest backup of the source volume, confirming it is “available”, and set it as the parent for the new backup. The volume status is changed to “backing-up” and a DB entry is created for the new backup in “creating” state.
Only then is the backup service on the Cinder Volume Node called to do the backup using an RPC call, and we finish by committing the quota and returning the backup information to the REST API caller.
Of course, during this process we’ve had data serialization and deserialization calls, response rendering, WSGI calls, AMQP messaging, and a lot more, but let’s abstract ourselves from all that since it is common to all Cinder APIs.
When the Backup Service on the Cinder Volume Node receives the request from the API Node to create the backup, it checks that the source volume and destination backup statuses are correct and that the volume driver is initialized before calling the Volume Driver’s backup method, passing it the Backup Driver that will be used to do the backup.
Once the backup is completed, it returns volume status to “available” and sets backup’s status to “available” as well.
Even though it might seem counterintuitive, it is the volume driver that actually starts the backup process. The reason behind this is that there are some pre and post backup steps related to the source volume that need to be taken care of.
Using the brick library, the Volume Driver attaches the source volume, taking multipath configuration into consideration, and opens the volume as a file before calling the backup driver with it. After the backup driver has finished, the source volume is detached.
This is where the magic happens, and this is also where we find 3 different kinds of drivers based on their incremental backup capabilities:
- Drivers that have no incremental backup: IBM’s TSM
- Drivers that implement incremental backup only when the source volume is stored on the same kind of back-end: Ceph
- Drivers that are capable of incremental backups regardless of where the source volume resides: Swift, NFS
For now I’ll only explain the third kind of drivers, those that inherit from ChunkedDriver and can do backups from any Volume Back-end, since this is the general feature introduced in Kilo.
Now we are at the backup driver with a file-like object that gives us access to the source volume, and we are ready to do an incremental backup.
The first thing we do is retrieve the SHA file of the parent backup that was set at the Cinder API Node. You can think of this SHA-256 information as a list of fingerprints for all the chunks of data in the volume. To get this list we run a hash function over each block of data and get a list of 256-bit results that are stored as the fingerprints of those blocks. This allows us to compare the stored values from the parent’s backup with the fingerprints of the current volume, thus identifying which chunks have changed.
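As a minimal sketch of this fingerprinting idea (the block size here is illustrative, not Cinder’s actual configured value), hashing a chunk block by block could look like:

```python
import hashlib

# Illustrative SHA block size; the real value comes from driver configuration.
SHA_BLOCK_SIZE = 32 * 1024


def chunk_fingerprints(chunk, block_size=SHA_BLOCK_SIZE):
    """Return the SHA-256 digest of every fixed-size block in a data chunk."""
    return [hashlib.sha256(chunk[off:off + block_size]).digest()
            for off in range(0, len(chunk), block_size)]


# Two chunks that differ only in their second block produce fingerprint
# lists that differ only at index 1, so only that block needs backing up.
chunk_a = bytes(2 * SHA_BLOCK_SIZE)
chunk_b = bytes(SHA_BLOCK_SIZE) + b"\x01" + bytes(SHA_BLOCK_SIZE - 1)
fp_a = chunk_fingerprints(chunk_a)
fp_b = chunk_fingerprints(chunk_b)
```

Because each 256-bit digest stands in for a whole block, comparing two volumes only requires comparing these short lists, not the data itself.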
Once we have retrieved this SHA information from the backup back-end, we have to confirm that the size of the volume hasn’t changed since the last backup.
Now that we have the parent’s fingerprints we’ll go through all the chunks of data in the volume:
- Reading the data
- Calculating the list of SHA-256 digests for the read data chunk. It is a list because the SHA block size is smaller than the chunk size, which gives us finer granularity when saving changes.
- Comparing the SHA fingerprints of the read data with the parent backup’s fingerprints; when we find a change we mark it as the start of the changes and advance until we find the next SHA block that hasn’t changed, and then we:
  - Compress the changed data if compression is enabled and compressing it effectively reduces its size.
  - Create a new extent with this changed data in the backup back-end as an object.
  - Add this new object’s name, offset, length, md5, compression algorithm and object id to the backup metadata.
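The comparison step above can be sketched as follows (a simplified illustration of the technique, not Cinder’s actual code; the function name is hypothetical):

```python
def changed_extents(parent_shas, current_shas, block_size):
    """Compare per-block fingerprints against the parent backup's and
    merge runs of consecutive changed blocks into (offset, length) extents."""
    extents = []
    start = None
    for i, (old, new) in enumerate(zip(parent_shas, current_shas)):
        if old != new:
            if start is None:
                start = i  # first changed block of a new run
        elif start is not None:
            # An unchanged block closes the current run of changes.
            extents.append((start * block_size, (i - start) * block_size))
            start = None
    if start is not None:  # run of changes reaching the end of the data
        extents.append((start * block_size,
                        (len(current_shas) - start) * block_size))
    return extents


# Blocks 1 and 2 changed, so with a block size of 4 we get a single
# extent at offset 4 with length 8.
extents = changed_extents([b"a", b"b", b"c", b"d"],
                          [b"a", b"x", b"y", b"d"], block_size=4)
# -> [(4, 8)]
```

Each resulting extent is what gets compressed (if worthwhile), stored as an object in the backup back-end, and recorded in the backup metadata.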
Once we have finished backing up all the changes, we save the SHA information for this new backup as a specific object in the backup back-end.
And as the last step we retrieve the source volume metadata so we can save it as a separate object on the backup back-end, together with the backup metadata that contains the extent information.
Ceph Incremental Backup
As mentioned before, Ceph’s Backup Driver can only do incremental backups when the volume is also stored on a Ceph back-end. The back-ends do not have to be the same cluster: it works both within a single cluster and between different clusters.
With the Ceph driver you can’t choose whether you want a full or an incremental backup; it is automatic. If the origin volume is stored on a Ceph back-end, then the backup will be incremental if a full backup already exists, and full if this is the first backup of the volume. If the volume is not stored on a Ceph back-end, all backups will be full backups.
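The driver’s implicit decision boils down to something like this (a deliberately tiny sketch of the logic just described, not the driver’s real function):

```python
def backup_mode(volume_on_ceph, has_base_backup):
    """Ceph driver's implicit choice: incremental only when the source
    volume lives on a Ceph back-end and a base backup already exists."""
    if volume_on_ceph and has_base_backup:
        return "incremental"
    return "full"
```

So the very first backup of a Ceph-hosted volume is always full, and every later one is incremental, with no user-facing switch either way.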
Backups are stored as RBD images and the incremental backup process is really simple. First, we make sure that we have a base backup on the destination cluster, and create one if we don’t.
Then we get the latest backup snapshot from the source volume (yes, the source volume), because we’ll be using snapshots on the source volume to calculate the differential between backups.
Next we create a new backup snapshot of the source volume so we can do a differential between the two snapshots, or back up the whole snapshot if this is the first backup of the volume.
The differential is calculated by the Ceph cluster using the “export-diff” feature instead of being calculated on the Cinder Volume Node like the standard incremental backup feature does, and then a new snapshot with this delta is created on the backup using the “import-diff” feature. You can read more about the manual process in this post
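Conceptually, the driver pipes the output of `rbd export-diff` straight into `rbd import-diff`. A rough sketch of that pipeline (image and snapshot names are illustrative, and helper names are hypothetical) could look like:

```python
import subprocess


def diff_commands(src_image, from_snap, to_snap, dst_image):
    """Build the rbd command pair: export the delta between two snapshots
    of the source image to stdout, and import it into the backup image."""
    export_cmd = ["rbd", "export-diff", "--from-snap", from_snap,
                  "%s@%s" % (src_image, to_snap), "-"]
    import_cmd = ["rbd", "import-diff", "-", dst_image]
    return export_cmd, import_cmd


def transfer_diff(src_image, from_snap, to_snap, dst_image):
    """Pipe export-diff straight into import-diff so the delta never
    touches local disk (requires access to both Ceph clusters)."""
    export_cmd, import_cmd = diff_commands(src_image, from_snap,
                                           to_snap, dst_image)
    exporter = subprocess.Popen(export_cmd, stdout=subprocess.PIPE)
    subprocess.check_call(import_cmd, stdin=exporter.stdout)
    exporter.stdout.close()
    exporter.wait()
```

The key point is that the cluster itself computes the delta between the two snapshots, so the Cinder Volume Node never has to read and hash the whole volume.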
Once the new snapshot is created on the backup back-end we can delete the old backup snapshot on the source volume and leave only the new backup snapshot we just created.
Finally we store the Backup metadata as a Ceph rados object.
I hope I have been able to successfully explain what I mean when I say that Ceph’s Backup driver doesn’t support Cinder’s Incremental Backup introduced in Kilo and needs to be reworked. In any case, the short version is: Ceph’s driver needs rework because incremental backups happen automatically rather than at your request, and you can’t do incremental backups from non-Ceph back-ends.
But even with this limitation, Ceph’s incremental backup is highly efficient and, in general, it is not an issue that you cannot do incremental backups from other back-ends, because if you are using Ceph for your backups, you most certainly are using it for your volumes as well.
If you’ve read Cinder’s Incremental Backup specs you’ve probably noticed a nice difference between the implementation and the original spec: incremental backups can be performed with respect to an incremental backup, not only with respect to a full backup as the spec says.
Cinder Backup Drivers Incremental Capabilities:

| Driver | Incremental capability |
| --- | --- |
| TSM | None |
| Ceph | Same back-end type |
| Swift, NFS | Any back-end type |