Cinder’s Ceph Replication Sneak Peek

 

Have you been dying to try out the volume replication functionality in OpenStack, but you didn’t have any enterprise-level storage with replication features lying around to play with? Then you are in luck, because thanks to Ceph’s new RBD mirroring functionality and Jon Bernard’s work on Cinder, you can now have the full replication experience on commodity hardware, and I’m going to show you how to get a preview of what’s about to come to Cinder.


RBD mirroring

I’m sure by now there’s no need to explain the power of Ceph or how it has changed IT infrastructures everywhere, but you may have missed the new mirroring feature introduced back in April, which adds support for mirroring (asynchronous replication) of RBD images across clusters. If you missed it, you may want to go ahead and have a look at the official documentation or Sébastien Han’s blog post on the subject.

The short version is that RBD images can now be asynchronously mirrored between two Ceph clusters. The capability that makes this possible is the RBD journaling image feature: by leveraging the journal, the mirroring can ensure crash-consistent replication between clusters.

RBD mirroring offers a good level of granularity: even though it is configured on a per-pool basis, you can decide whether to automatically mirror all images within the pool or only specific images. This means that you could have your Ceph cluster replicated to another cluster using automatic pool mirroring even without the new Ceph replication functionality in Cinder; but that’s not the approach taken by Cinder, since it would be too restrictive and that’s not the Cinder way ;-).
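
For reference, this is roughly how the two modes are selected with the rbd command, using a pool called volumes -the one used throughout this post- as an example:

user@localhost:$ sudo rbd mirror pool enable volumes pool

user@localhost:$ sudo rbd mirror pool enable volumes image

The first form automatically mirrors every image in the pool, while the second -the one relevant for Cinder- only mirrors images that are explicitly enabled for mirroring.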

Mirroring is configured using rbd commands, and the rbd-mirror daemon is responsible for pulling image updates from the peer cluster -thanks to the journaling- and applying them to the image in its own cluster, performing the cross-cluster replication.

One interesting thing to know is that mirroring doesn’t need to be enabled both ways: mirroring only from your primary cluster to your secondary cluster is enough for failover to work, but both directions will be necessary for the failback, so in this post we will be enabling it both ways.

Replication in Cinder

Initial volume replication was introduced as v1, but in the Liberty cycle v2 was implemented and it was then improved in the following release. It is this replication v2.1 that is being implemented by this new patch.

This replication version covers only one very specific use case, as mentioned in the specifications: a catastrophic event has occurred on the primary backend storage device and we want to fail over to the volumes available on the replicated backend.

The replication mechanism depends on the driver’s implementation, so there will be drivers that replicate all created volumes and drivers that use a per-volume policy and require a volume type to tell them which volumes to replicate. This means that you have to know your backend driver and its features, since Cinder doesn’t enforce one behavior or the other.

There is no automatic failover, since the use case is Disaster Recovery; the failover must be triggered manually when the primary backend is gone.
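
As a preview of what we’ll be doing later in the tests, the failover would be triggered with something along these lines, using the host and backend id of our later setup:

user@localhost:$ cinder failover-host devstack@ceph --backend_id ceph-secondary

The --backend_id argument can be omitted when there is only one replication target configured, which is what we’ll do in the tests.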

Volumes that were attached/in-use are a special case and will require that either the tenants or the administrator detach and reattach them, since the instances’ attached volumes are still addressing the primary backend, which is no longer available.

There’s also the possibility to freeze a backend that is in failover mode, preventing the creation, deletion, and extension of volumes and snapshots until the admin decides that the system is stable again and re-enables those operations using the thaw command. Freezing the management plane will not stop R/W I/O from being performed on resources that already exist.
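
Freezing and thawing are done against the backend host as well; a quick sketch using the backend we’ll have later in the tests:

user@localhost:$ cinder freeze-host devstack@ceph

user@localhost:$ cinder thaw-host devstack@ceph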

Ceph Replication in Cinder

To test Ceph’s replication in Cinder there are only a couple of things that you will need:

  1. Cinder, Nova, and Keystone services. These will usually be a part of a full OpenStack deployment.
  2. Two different Ceph clusters.
  3. Configure Ceph clusters in mirrored mode and to mirror the pool used by Cinder.
  4. Copy cluster keys to the Cinder Volume node.
  5. Configure Ceph driver in Cinder to use replication.

The RBD driver in Cinder could have taken care of the third step, but we believe that would violate the separation of concerns between the infrastructure deployment and its usage by Cinder. Not to mention that it would introduce unnecessary restrictions to deployments -since the driver would probably not be as flexible as Ceph’s tools- and avoidable complexity to the RBD driver. So it is the system administrator’s responsibility to properly set up the Ceph clusters.

While the RBD driver will not enable the mirroring between the two pools and expects it to be set up by the system administrator, it will enable journaling and image mirroring on replicated volumes, since we have to explicitly tell rbd-mirror which volumes should be mirrored.
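
What the driver does for each replicated volume is roughly equivalent to running the following rbd commands by hand, where volume-<id> is a placeholder for the actual volume name (the driver really does this through the Python bindings):

user@localhost:$ sudo rbd feature enable volumes/volume-<id> journaling

user@localhost:$ sudo rbd mirror image enable volumes/volume-<id>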

For these tests I’ve decided to use Ubuntu instead of my usual Fedora deployment, since it’s the operating system used at the gate, and in my last test setup people ran into problems when testing things under Ubuntu; I wanted to avoid that situation if possible, so all instructions are meant to be run under Ubuntu Xenial.

Jon Bernard has created not only the patch implementing the feature, but also some Replication Notes that were of great help when automating my deployment.

Since Jon Bernard is on paternity leave and I was familiar with his work, I will be taking care of following up on the reviews and updating his patch.

ATTENTION: The following steps are only here to illustrate what needs to be done, for those interested in the details; the whole process has been automated, so you don’t need to do any of this and can skip to the Automated deployment section if you are not interested in the deployment side of things.

Cinder services

We’ll be testing the RBD driver updated with the replication functionality using DevStack and pulling the RBD replication patch directly from Gerrit. Under normal circumstances you would be able to try it with any deployment, because you would only need to pull the rbd.py file from the patch under review to get the new functionality, but unfortunately there were some other issues in Cinder 1, 2, 3 that required fixing for the Ceph replication patch to work.

We’ll be using this configuration to deploy DevStack with 2 different backends, LVM and Ceph. DevStack will not be provisioning either of our two Ceph clusters; we’ll be doing that ourselves.

Ceph clusters

There are multiple ways to deploy your Ceph clusters: you can do it manually -using the ceph-deploy tool or doing all the steps yourself-, you can use ceph-ansible or TripleO, and DevStack can even deploy one of the clusters when stacking.

There are multiple benefits to keeping your Ceph clusters out of your DevStack deployment instead of letting DevStack deploy one for you: there’s no need to recreate the primary Ceph cluster every time you run stack.sh, and since the Ceph cluster nodes have 2 network interfaces and the Ceph communications are done only on the 10.0.1.x network, you can easily simulate losing the network connection just by bringing this interface down on one of the Ceph clusters.

In this explanation I will be using ceph-deploy in a custom script, but you can use any other method to deploy the 2 clusters; it’s all good as long as you end up with 2 clusters that have connectivity between them and with the Cinder Volume service and the Nova computes.

The script we’ll be using to do the deployment is quite simple, thanks to the usage of ceph-deploy, and will be run on each node to create both clusters, receiving the node’s binding IP address as an argument. This script will also install and run the rbd-mirror daemon, but will not link both pools, since we need to wait until both of them are created; we’ll be doing the linking from the DevStack node.

#!/bin/bash

my_ip=$1
hostname=`hostname -s`

cd /root

echo "Installing ceph-deploy from pip to avoid existing package conflict"
apt-get update
apt-get install -y python-pip
pip install ceph-deploy

echo "Creating cluster"
ceph-deploy new $hostname --public-network $my_ip/24 || exit 1
echo -e "\nosd pool default size = 1\nosd crush chooseleaf type = 0\n" >> "ceph.conf"

echo "Installing from upstream repository"
echo -e "\n[myrepo]\nbaseurl = http://gitbuilder.ceph.com/ceph-deb-xenial-x86_64-basic/ref/master\ngpgkey = https://download.ceph.com/keys/autobuild.asc\ndefault = True" >> .cephdeploy.conf
ceph-deploy install $hostname || exit 2

echo "Deploying Ceph monitor"
ceph-deploy mon create-initial || exit 3

echo "Creating OSD on /dev/vdb"
ceph-deploy osd create $hostname:/dev/vdb || exit 3

echo "Creating default volumes pool"
ceph osd pool create volumes 100

echo "Deploying Ceph MDS"
ceph-deploy mds create $hostname || exit 4

echo "Health should be OK"
ceph -s

echo "Put Ceph to autostart"
systemctl enable ceph.target
systemctl enable ceph-mon.target
systemctl enable ceph-osd.target

echo "Installing ceph-mirror package and run it as a service"
apt-get install -y rbd-mirror || exit 5
systemctl enable ceph-rbd-mirror@admin
systemctl start ceph-rbd-mirror@admin

Configure Ceph clusters

As mentioned earlier, the system administrator is expected to set up the mirroring between the clusters, and the first step in this process is installing rbd-mirror and running it, which we already did in the previous step when the script ran:

user@localhost:$ sudo apt-get install -y rbd-mirror

user@localhost:$ sudo systemctl enable ceph-rbd-mirror@admin

user@localhost:$ sudo systemctl start ceph-rbd-mirror@admin

Next we need to tell the rbd-mirror in each of the clusters about its own cluster and the user:group to use:

user@localhost:$ sudo rbd-mirror --setuser ceph --setgroup ceph

Now we have to configure the volumes pool to use per-image mirroring (in both clusters as well):

user@localhost:$ sudo rbd mirror pool enable volumes image

Since each of the rbd-mirror services will be talking with the other cluster, it needs to know about that cluster’s configuration and have a key to access it. That’s why we now need to copy the configuration and key from one cluster to the other. Since /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring already exist on the clusters, we will copy these files using other names: on the primary we’ll copy the files from the secondary as ceph-secondary.conf and ceph-secondary.client.admin.keyring, and on the secondary we’ll do the same, calling them ceph-primary.conf and ceph-primary.client.admin.keyring.

Once we have copied the files, remember to set their owner:group to ceph:ceph -since that’s what we told rbd-mirror to use- and the file mode to 0600.
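
One way to do this -run from the primary node, and assuming root SSH access between the nodes like in this setup- would be:

user@localhost:$ sudo scp ceph-secondary:/etc/ceph/ceph.conf /etc/ceph/ceph-secondary.conf

user@localhost:$ sudo scp ceph-secondary:/etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph-secondary.client.admin.keyring

user@localhost:$ sudo chown ceph:ceph /etc/ceph/ceph-secondary.conf /etc/ceph/ceph-secondary.client.admin.keyring

user@localhost:$ sudo chmod 0600 /etc/ceph/ceph-secondary.conf /etc/ceph/ceph-secondary.client.admin.keyring

The equivalent commands, with ceph-primary instead of ceph-secondary, would be run on the secondary node.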

The last step is to configure the peers of the pools so they know where they should be replicating, primary to secondary, and secondary to primary.

NOTE: It is very important to know that Ceph only works with host names, not IPs, so we need to add the other cluster’s node name to our /etc/hosts file; in our case we need to add ceph-secondary to the primary node’s hosts file and ceph-primary to the secondary’s. If we don’t do this we won’t be able to pair the nodes, since rbd-mirror won’t be able to locate the other cluster.
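
In this deployment that means entries along these lines on the 10.0.1.x Ceph network (the secondary’s address is shown only as an example; use the addresses your own nodes actually have):

On ceph-primary, in /etc/hosts:
10.0.1.12   ceph-secondary

On ceph-secondary, in /etc/hosts:
10.0.1.11   ceph-primary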

On the primary cluster we run:

user@localhost:$ sudo rbd mirror pool peer add volumes client.admin@ceph-secondary

And on the secondary cluster we run:

user@localhost:$ sudo rbd mirror pool peer add volumes client.admin@ceph-primary

NOTE: For mirroring to work, pools in both clusters need to have the same name.

Copy cluster keys

Just like we needed the mirroring Ceph clusters’ host names in each of our clusters, we’ll also need them in our DevStack deployment, and the same goes for the Ceph configuration and keyring files.

We have to copy these files and make them readable by the user that will be running DevStack, in our case the vagrant user. For convenience we won’t be renaming the files from the primary node, as those are the defaults for the RBD driver in Cinder.
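
As a rough picture, after this step the DevStack node would have something like this under /etc/ceph (the exact keyring names will depend on the Ceph users you end up using for Cinder):

/etc/ceph/ceph.conf                                primary cluster, default name kept
/etc/ceph/ceph.client.admin.keyring                primary cluster
/etc/ceph/ceph-secondary.conf                      secondary cluster
/etc/ceph/ceph-secondary.client.admin.keyring      secondary cluster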

Configure Ceph driver in Cinder to use replication

The configuration in Cinder is easy and straightforward; we need 2 things:

  • Add replication_device configuration option to the driver’s section.
  • Create a specific volume type that enables replication.

The Cinder configuration for replication_device should go in the specific Ceph driver section in /etc/cinder/cinder.conf, defining the following:

  • The backend_id, which is the host name of the secondary server.
  • The conf file, which will have the configuration for the secondary ceph cluster. There’s no need to define it if the default of /etc/ceph/{backend_id}.conf is valid.
  • The user of the secondary cluster that the Cinder RBD driver should use to communicate. Defaults to the same user as the primary.

This is an example of the configuration:

[ceph]
replication_device = backend_id:ceph-secondary, conf:/etc/ceph/ceph-secondary.conf, user:cinder
rbd_max_clone_depth = 5
rbd_flatten_volume_from_snapshot = False
rbd_secret_uuid = 6842adea-8feb-43ac-98d5-0eb89b19849b
rbd_user = cinder
rbd_pool = volumes
rbd_ceph_conf =
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph

NOTE: I would recommend having the same user and credentials in both clusters, otherwise we’ll have problems when connecting from Nova. In our case we handle this in local.sh by copying the cinder user from the primary to the secondary.
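
Just as a hedged sketch -not necessarily how local.sh does it- the cinder user could be copied from the primary to the secondary, using the DevStack node that has both cluster configurations, with something like:

user@localhost:$ sudo ceph auth get client.cinder -o /tmp/client.cinder.keyring

user@localhost:$ sudo ceph --cluster ceph-secondary auth import -i /tmp/client.cinder.keyring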

Multiple replication backends can be defined by adding more replication_device entries to the configuration.
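
For example, a hypothetical second target called ceph-tertiary would be added by simply repeating the option:

replication_device = backend_id:ceph-secondary, conf:/etc/ceph/ceph-secondary.conf, user:cinder
replication_device = backend_id:ceph-tertiary, conf:/etc/ceph/ceph-tertiary.conf, user:cinder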

Assuming that our backend name for the Ceph driver is ceph, creating the volume type would be like this:

user@localhost:$ cinder type-create replicated

user@localhost:$ cinder type-key replicated set volume_backend_name=ceph

user@localhost:$ cinder type-key replicated set replication_enabled='<is> True'

Configure Nova

We don’t want Nova to use Ceph for ephemeral disks, because if we did we would need another DevStack deployment to test the contents of the attached volume after the failover. So we need to manually configure rbd_user and rbd_secret_uuid in nova.conf, since DevStack doesn’t do it when we set REMOTE_CEPH=True and ENABLE_CEPH_NOVA=False like we are doing.

In our automated deployment we handle this in local.conf, but it could also be added manually to the [libvirt] section of nova.conf. The rbd_user we’ll be using is cinder, and the rbd_secret_uuid is created by DevStack; we can see it using the virsh command:

user@localhost:$ sudo virsh secret-list
UUID Usage
--------------------------------------------------------------------------------
0fb7d33c-d87b-4edf-a89a-76e0412a103e ceph client.cinder secret
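
With that information, doing it manually would mean adding something like this to the [libvirt] section of nova.conf, using the UUID returned by virsh above:

[libvirt]
rbd_user = cinder
rbd_secret_uuid = 0fb7d33c-d87b-4edf-a89a-76e0412a103e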

Automated deployment

Since the process is quite long and all these manual steps can be easily automated, that’s what I did: automate them in such a way that just a couple of lines will get me what I want:

  • One VM with my primary Ceph cluster
  • One VM with my secondary Ceph cluster
  • One VM with my all-in-one DevStack
  • A volume type for replicated volumes
  • Everything configured and running

To achieve this I used Vagrant with the libvirt plugin, and some simple bash scripts. If you are using VirtualBox or other hypervisor you can easily adapt the Vagrantfile.

The Vagrantfile uses the local hypervisor and online configuration files by default, but this can be changed using the USE_LOCAL_DATA and REMOTE_PROVIDER variables.

The steps are these:

  • Create a directory to hold our vagrant box
  • Download vagrant configuration file
  • Bring the VMs up with the basic provisioning
  • Copy Ceph clusters ssh keys around
  • Deploy devstack

Which can be achieved with:

user@localhost:$ mkdir replication

user@localhost:$ cd replication

user@localhost:$ curl -O http://gorka.eguileor.com/files/cinder/rbd_replication/Vagrantfile

user@localhost:$ vagrant up

user@localhost:$ vagrant ssh -c '/home/vagrant/copy_ceph_credentials.sh' -- -l root

user@localhost:$ vagrant ssh -c 'cd devstack && ./stack.sh'

Since some of these steps take a long time and I don’t like to pay much attention to the process, I usually run all this with a one-liner:

user@localhost:$ mkdir replication; cd replication; curl -O http://gorka.eguileor.com/files/cinder/rbd_replication/Vagrantfile; vagrant up | tee r.log && vagrant ssh -c 'sudo ~/copy_ceph_credentials.sh; cd devstack && ./stack.sh' | tee -a r.log

I could have done all this using Ansible, but I didn’t want to add more requirements for the tests.

What we get

The default VM is devstack, so once the deployment has finished you will be able to access your 3 VMs with:

user@localhost:$ vagrant ssh

user@localhost:$ vagrant ssh ceph-primary

user@localhost:$ vagrant ssh ceph-secondary

Or from within any of the nodes you can do:

user@localhost:$ sudo ssh devstack

user@localhost:$ sudo ssh ceph-primary

user@localhost:$ sudo ssh ceph-secondary

And you can easily query your clusters from the devstack VM:

user@localhost:$ sudo rbd mirror pool info volumes

user@localhost:$ sudo rbd --cluster ceph-secondary mirror pool info volumes

NOTE: The cluster name provided in --cluster must match a configuration file in /etc/ceph, as this parameter is only used to locate the file and has nothing to do with the actual cluster name specified within the configuration file. So both clusters will actually have the default “ceph” name, even though we use ceph-secondary to run the commands from the devstack node.

Once the deployment has completed you’ll have your clusters mirroring and the replicated volume type already available:

user@localhost:$ vagrant ssh

user@localhost:$ source ~/devstack/openrc admin admin

user@localhost:$ cinder type-list
+--------------------------------------+------------+-------------+-----------+
| ID | Name | Description | Is_Public |
+--------------------------------------+------------+-------------+-----------+
| 4f30dfdc-ef93-4a63-92a6-37adf1e053dc | replicated | - | True |
| a97e694a-8fa0-4eb3-9303-0240af374092 | lvm | - | True |
| f78397c8-2725-48df-a734-d7b4eef86edf | ceph | - | True |
+--------------------------------------+------------+-------------+-----------+

And you can unstack.sh your deployment and stack.sh it again: each time, the volumes pool on the secondary Ceph cluster will be removed -just like the primary’s- and a new one will be created, the mirroring links will be set up from the primary to the secondary and vice versa, and the replicated volume type will be created. All this thanks to the custom local.sh script.

Tests

OK, we have deployed everything and we have our 3 VMs, so it’s time to run some tests.

The tests in the following sections assume you are already connected to the DevStack node and have set the right environment variables to run OpenStack commands:

user@localhost:$ vagrant ssh

user@localhost:$ source ~/devstack/openrc admin admin

1. Sanity Checks

1.1 – Service

The most basic sanity check in Cinder is to confirm that the Cinder Volume service is up and running, since a misconfiguration will make it report as being down, in which case we’ll have to go to /opt/stack/logs/c-vol.log to see what’s wrong.

user@localhost:$ cinder service-list
+------------------+---------------+------+---------+-------+----------------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+------------------+---------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | devstack | nova | enabled | up | 2016-09-21T14:18:52.000000 | - |
| cinder-scheduler | devstack | nova | enabled | up | 2016-09-21T14:18:59.000000 | - |
| cinder-volume | devstack@ceph | nova | enabled | up | 2016-09-21T14:18:56.000000 | - |
| cinder-volume | devstack@lvm | nova | enabled | up | 2016-09-21T14:18:59.000000 | - |
+------------------+---------------+------+---------+-------+----------------------------+-----------------+

1.2 – Create Ceph volume

Now we want to check that we can create a normal Ceph volume, confirming that Cinder is properly accessing the primary cluster not only by looking at the volume status, but also by checking the primary’s volumes pool.

user@localhost:$ cinder create --volume-type ceph --name normal-ceph 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:19:23.000000 |
| description | None |
| encrypted | False |
| id | 4dab45a2-34d6-497d-a5e3-40dd34015264 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | normal-ceph |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | ceph |
+--------------------------------+--------------------------------------+

user@localhost:$ cinder list
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+
| 4dab45a2-34d6-497d-a5e3-40dd34015264 | available | normal-ceph | 1 | ceph | false | |
+--------------------------------------+-----------+-------------+------+-------------+----------+-------------+

user@localhost:$ sudo rbd ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-4dab45a2-34d6-497d-a5e3-40dd34015264 1024M 2

1.3 – Delete Ceph volume

To finish the sanity check we want to confirm that deletion of Ceph volumes also works in our deployment.

user@localhost:$ cinder delete normal-ceph
Request to delete volume normal-ceph has been accepted.

user@localhost:$ cinder list
+----+--------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+----+--------+------+------+-------------+----------+-------------+
+----+--------+------+------+-------------+----------+-------------+

user@localhost:$ sudo rbd ls -l volumes

user@localhost:$

Ok, so at least the basic access to Ceph is working correctly and we can continue with the replication tests.

2. Testing Replication

2.1 – Create replicated volume

We’ll now create a replicated volume and check that it gets created first on the primary cluster and after a little bit it’s also available on the secondary cluster.

Please be patient with the replication, as it may not be instantaneous.

user@localhost:$ cinder create --volume-type replicated --name replicated-ceph 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:27:08.000000 |
| description | None |
| encrypted | False |
| id | e44311dd-8977-4728-a773-0695deca00fc |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | replicated-ceph |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | replicated |
+--------------------------------+--------------------------------------+


user@localhost:$ sudo rbd ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-e44311dd-8977-4728-a773-0695deca00fc 1024M 2


user@localhost:$ sudo rbd --cluster ceph-secondary ls volumes


user@localhost:$ sudo rbd mirror image status volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
volume-e44311dd-8977-4728-a773-0695deca00fc:
global_id: a2d50987-76f9-48a7-b879-3ec19b0a09cf
state: down+unknown
description: status not found
last_update: 1970-01-01 03:00:00


user@localhost:$ sudo rbd --cluster ceph-secondary mirror image status volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
rbd: error opening image volume-e44311dd-8977-4728-a773-0695deca00fc: (2) No such file or directory

user@localhost:$ sudo rbd mirror image status volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
volume-e44311dd-8977-4728-a773-0695deca00fc:
global_id: a2d50987-76f9-48a7-b879-3ec19b0a09cf
state: up+stopped
description: remote image is non-primary or local image is primary
last_update: 2016-09-21 17:27:36


user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-e44311dd-8977-4728-a773-0695deca00fc 1024M 2 excl


user@localhost:$ sudo rbd --cluster ceph-secondary mirror image status volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
volume-e44311dd-8977-4728-a773-0695deca00fc:
global_id: a2d50987-76f9-48a7-b879-3ec19b0a09cf
state: up+replaying
description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
last_update: 2016-09-21 17:30:06


user@localhost:$ cinder list
+--------------------------------------+-----------+-----------------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+-----------------+------+-------------+----------+-------------+
| e44311dd-8977-4728-a773-0695deca00fc | available | replicated-ceph | 1 | replicated | false | |
+--------------------------------------+-----------+-----------------+------+-------------+----------+-------------+


user@localhost:$ cinder show replicated-ceph
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:27:08.000000 |
| description | None |
| encrypted | False |
| id | e44311dd-8977-4728-a773-0695deca00fc |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | replicated-ceph |
| os-vol-host-attr:host | devstack@ceph#ceph |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | enabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | available |
| updated_at | 2016-09-21T14:27:09.000000 |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | replicated |
+--------------------------------+--------------------------------------+

NOTE: replication_status should be ignored in the response from the create request, as this will be updated after the driver has completed the volume creation. We can see the updated value in the response to the show command.

When checking the image status of the volume being created we’ve seen 3 different results in the state and description fields:

When the volume is not replicated yet:

  state:       down+unknown
  description: status not found

When the volume has been replicated and we are checking the primary:

  state:       up+stopped
  description: remote image is non-primary or local image is primary

When the volume has been replicated and we are checking a replica:

  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0

If we had checked the non-replicated volume we created during our sanity check, we would have seen that while it also reports down+unknown, just like the soon-to-be-replicated volume did, it does not show any global_id:

user@localhost:$ sudo rbd mirror image status volumes/volume-4dab45a2-34d6-497d-a5e3-40dd34015264
volume-4dab45a2-34d6-497d-a5e3-40dd34015264:
global_id:
state: down+unknown
description: status not found
last_update: 1970-01-01 03:00:00

And if we hadn’t deleted it and had checked the info of both volumes, we’d have seen that the non-replicated volume does not have journaling enabled -although this will depend on how we deployed our cluster- and that the replicated volume has additional key/value pairs related to mirroring.

user@localhost:$ sudo rbd info volumes/volume-4dab45a2-34d6-497d-a5e3-40dd34015264
rbd image 'volume-4dab45a2-34d6-497d-a5e3-40dd34015264':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.10331190cde7
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:


user@localhost:$ sudo rbd info volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
rbd image 'volume-e44311dd-8977-4728-a773-0695deca00fc':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105522221a70
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
flags:
journal: 105522221a70
mirroring state: enabled
mirroring global id: a2d50987-76f9-48a7-b879-3ec19b0a09cf
mirroring primary: true


user@localhost:$ sudo rbd --cluster ceph-secondary info volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
rbd image 'volume-e44311dd-8977-4728-a773-0695deca00fc':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.101941b71efb
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
flags:
journal: 101941b71efb
mirroring state: enabled
mirroring global id: a2d50987-76f9-48a7-b879-3ec19b0a09cf
mirroring primary: false

2.2 – Create replicated snapshot

Now we’ll go ahead and create a snapshot of the replicated volume and confirm that this snapshot will also be available on the secondary Ceph cluster.

user@localhost:$ cinder snapshot-create --name replicated-snapshot replicated-ceph
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| created_at | 2016-09-21T14:35:51.232338 |
| description | None |
| id | 6aca1779-619f-4222-86f2-037d50d932d3 |
| metadata | {} |
| name | replicated-snapshot |
| size | 1 |
| status | creating |
| updated_at | None |
| volume_id | e44311dd-8977-4728-a773-0695deca00fc |
+-------------+--------------------------------------+


user@localhost:$ sudo rbd snap ls volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
SNAPID NAME SIZE
10 snapshot-6aca1779-619f-4222-86f2-037d50d932d3 1024 MB


user@localhost:$ sudo rbd --cluster ceph-secondary snap ls volumes/volume-e44311dd-8977-4728-a773-0695deca00fc
SNAPID NAME SIZE
10 snapshot-6aca1779-619f-4222-86f2-037d50d932d3 1024 MB


user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+---------------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+-----------+---------------------+------+
| 6aca1779-619f-4222-86f2-037d50d932d3 | e44311dd-8977-4728-a773-0695deca00fc | available | replicated-snapshot | 1 |
+--------------------------------------+--------------------------------------+-----------+---------------------+------+

2.3 – Delete replicated snapshot & volume

Now that we’ve seen how new volumes and snapshots are replicated we want to confirm that deletion works as well, right? What’s more basic than that?

First we’ll delete the snapshot and confirm that it gets deleted on both Ceph clusters:

user@localhost:$ cinder snapshot-delete replicated-snapshot

user@localhost:$ sudo rbd snap ls volumes/volume-e44311dd-8977-4728-a773-0695deca00fc

user@localhost:$ sudo rbd --cluster ceph-secondary snap ls volumes/volume-e44311dd-8977-4728-a773-0695deca00fc

user@localhost:$

Now that there is no snapshot on the volume we’ll proceed to delete the replicated volume.

user@localhost:$ cinder delete replicated-ceph
Request to delete volume replicated-ceph has been accepted.


user@localhost:$ sudo rbd ls -l volumes


user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-e44311dd-8977-4728-a773-0695deca00fc 1024M 2


user@localhost:$ sudo rbd --cluster ceph-secondary ls volumes

user@localhost:$

2.4 – Setup resources for failover

To test that RBD replication is actually working properly within the Cinder service we’ll want to test the failover mechanism, so we’ll create a couple of resources to test that everything works as expected:

  • Non replicated volume
  • Replicated available volume with snapshot
  • Replicated in-use volume with data written to it (the numbers 1 to 100)

So we’ll set them up first:

user@localhost:$ cinder create --volume-type ceph --name normal-ceph 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:45:01.000000 |
| description | None |
| encrypted | False |
| id | 1466c553-35ee-419a-9d5c-cbe36c22aed0 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | normal-ceph |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | ceph |
+--------------------------------+--------------------------------------+


user@localhost:$ cinder create --volume-type replicated --name replicated-ceph-snapshot 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:47:35.000000 |
| description | None |
| encrypted | False |
| id | 4950f13d-e95d-41ea-85ae-80f0ac22aa0e |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | replicated-ceph-snapshot |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | replicated |
+--------------------------------+--------------------------------------+


user@localhost:$ cinder snapshot-create replicated-ceph-snapshot --name replicated-snapshot
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| created_at | 2016-09-21T14:47:48.931379 |
| description | None |
| id | c3d947e6-d2d6-47fb-9a54-41a6baf40e63 |
| metadata | {} |
| name | replicated-snapshot |
| size | 1 |
| status | creating |
| updated_at | None |
| volume_id | 4950f13d-e95d-41ea-85ae-80f0ac22aa0e |
+-------------+--------------------------------------+


user@localhost:$ nova keypair-add --pub-key ~/.ssh/id_rsa.pub mykey


user@localhost:$ nova keypair-list
+-------+------+-------------------------------------------------+
| Name | Type | Fingerprint |
+-------+------+-------------------------------------------------+
| mykey | ssh | dd:3b:b8:2e:85:04:06:e9:ab:ff:a8:0a:c0:04:6e:d6 |
+-------+------+-------------------------------------------------+


user@localhost:$ nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
WARNING: Command secgroup-add-rule is deprecated and will be removed after Nova 15.0.0 is released. Use python-neutronclient or python-openstackclient instead.
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range | Source Group |
+-------------+-----------+---------+-----------+--------------+
| tcp | 22 | 22 | 0.0.0.0/0 | |
+-------------+-----------+---------+-----------+--------------+


user@localhost:$ nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
WARNING: Command secgroup-add-rule is deprecated and will be removed after Nova 15.0.0 is released. Use python-neutronclient or python-openstackclient instead.
+-------------+-----------+---------+-----------+--------------+
| IP Protocol | From Port | To Port | IP Range | Source Group |
+-------------+-----------+---------+-----------+--------------+
| icmp | -1 | -1 | 0.0.0.0/0 | |
+-------------+-----------+---------+-----------+--------------+


user@localhost:$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --key-name mykey --security-groups default --nic net-name=private myvm
+--------------------------------------+----------------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | myvm |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-SRV-ATTR:kernel_id | 10fcc280-2b42-4e5f-9e5d-3a0b0c220767 |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | c0e20554-cd2c-40e8-a0da-6f749d72f126 |
| OS-EXT-SRV-ATTR:reservation_id | r-n0zl5kiu |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | vMepeoN8PY6h |
| config_drive | |
| created | 2016-09-21T14:49:09Z |
| description | - |
| flavor | m1.nano (42) |
| hostId | |
| host_status | |
| id | 84b97e11-955a-4568-8e49-163e5477acf6 |
| image | cirros-0.3.4-x86_64-uec (a335801e-26fe-4ed9-8079-06230121e78a) |
| key_name | mykey |
| locked | False |
| metadata | {} |
| name | myvm |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tags | [] |
| tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| updated | 2016-09-21T14:49:09Z |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
+--------------------------------------+----------------------------------------------------------------+


user@localhost:$ cinder create --volume-type replicated --name ceph-attached 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T14:50:07.000000 |
| description | None |
| encrypted | False |
| id | 890313d7-c72b-4f97-b0c7-defd3641bfd4 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | ceph-attached |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | replicated |
+--------------------------------+--------------------------------------+


user@localhost:$ nova list
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+
| 84b97e11-955a-4568-8e49-163e5477acf6 | myvm | ACTIVE | - | Running | private=10.0.0.4, fde7:d51b:1893:0:f816:3eff:fe4e:818e |
+--------------------------------------+------+--------+------------+-------------+--------------------------------------------------------+


user@localhost:$ nova volume-attach myvm 890313d7-c72b-4f97-b0c7-defd3641bfd4
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | 890313d7-c72b-4f97-b0c7-defd3641bfd4 |
| serverId | 84b97e11-955a-4568-8e49-163e5477acf6 |
| volumeId | 890313d7-c72b-4f97-b0c7-defd3641bfd4 |
+----------+--------------------------------------+


user@localhost:$ ssh -o StrictHostKeychecking=no cirros@10.0.0.4 "sudo su - -c 'seq 1 100 > /dev/vdb; head -c 292 /dev/vdb'"
Warning: Permanently added '10.0.0.4' (RSA) to the list of known hosts.
1
2
...
99
100


user@localhost:$ cinder list
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+
| 1466c553-35ee-419a-9d5c-cbe36c22aed0 | available | normal-ceph | 1 | ceph | false | |
| 4950f13d-e95d-41ea-85ae-80f0ac22aa0e | available | replicated-ceph-snapshot | 1 | replicated | false | |
| 890313d7-c72b-4f97-b0c7-defd3641bfd4 | in-use | ceph-attached | 1 | replicated | false | 84b97e11-955a-4568-8e49-163e5477acf6 |
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+


user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+---------------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+-----------+---------------------+------+
| c3d947e6-d2d6-47fb-9a54-41a6baf40e63 | 4950f13d-e95d-41ea-85ae-80f0ac22aa0e | available | replicated-snapshot | 1 |
+--------------------------------------+--------------------------------------+-----------+---------------------+------+


user@localhost:$ sudo rbd ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-1466c553-35ee-419a-9d5c-cbe36c22aed0 1024M 2
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e 1024M 2
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e@snapshot-c3d947e6-d2d6-47fb-9a54-41a6baf40e63 1024M 2 yes
volume-890313d7-c72b-4f97-b0c7-defd3641bfd4 1024M 2 excl


user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e 1024M 2 excl
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e@snapshot-c3d947e6-d2d6-47fb-9a54-41a6baf40e63 1024M 2 yes
volume-890313d7-c72b-4f97-b0c7-defd3641bfd4 1024M 2 excl

Everything looks fine, we have 3 volumes and 1 snapshot on the primary and 2 volumes and 1 snapshot on the secondary.

2.5 – Failover

There are two failover scenarios contemplated by the driver: when the primary cluster is still accessible, and when a connection between the primary and secondary Ceph clusters is no longer possible. In the first case a clean promotion of the secondary cluster is possible and that’s how the failover will be done; in the second case only a forced promotion is possible.

Since the main use case for replication v2.1 is the Smoking Hole scenario, we’ll be testing the failover when there’s no connection between the clusters and a forced promotion is the only available path. To achieve this we’ll be shutting down the network interface that links the clusters; since this network interface is also used to communicate with the DevStack node, we will bring it down on the primary cluster.

NOTE: Currently there is a bug in the RBD mirror daemon that causes force-promoted volumes to remain in read-only mode until the daemon is restarted, so we’ll need a workaround for the tests until this gets fixed.

So let’s bring the network down on the primary cluster.

user@localhost:$ sudo ssh ceph-primary ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:7a:af:7c
inet addr:192.168.121.241 Bcast:192.168.121.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe7a:af7c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:115697 errors:0 dropped:0 overruns:0 frame:0
TX packets:35908 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:168262393 (168.2 MB) TX bytes:2675665 (2.6 MB)

eth1 Link encap:Ethernet HWaddr 52:54:00:09:45:41
inet addr:10.0.1.11 Bcast:10.0.1.255 Mask:255.255.255.0
inet6 addr: fe80::5054:ff:fe09:4541/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:75007 errors:0 dropped:0 overruns:0 frame:0
TX packets:59203 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:64542899 (64.5 MB) TX bytes:70810095 (70.8 MB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:38973 errors:0 dropped:0 overruns:0 frame:0
TX packets:38973 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:6757038 (6.7 MB) TX bytes:6757038 (6.7 MB)


user@localhost:$ sudo ssh 192.168.121.241 'ifconfig eth1 down'


user@localhost:$ ping ceph-primary -c 1 -w 2
PING ceph-primary (10.0.1.11) 56(84) bytes of data.

--- ceph-primary ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1006ms

NOTE: We’ll still be able to access the primary cluster using eth0 IP -192.168.121.241- if we want to bring the interface back up.

The manual failover is performed using the cinder failover-host command, passing as argument the host of the service we are failing over.

user@localhost:$ cinder service-list --binary cinder-volume --withreplication
+---------------+---------------+------+---------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+---------------+---------------+------+---------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-volume | devstack@ceph | nova | enabled | up | 2016-09-21T15:11:03.000000 | enabled | - | False | - |
| cinder-volume | devstack@lvm | nova | enabled | up | 2016-09-21T15:11:09.000000 | disabled | - | False | - |
+---------------+---------------+------+---------+-------+----------------------------+--------------------+-------------------+--------+-----------------+


user@localhost:$ cinder failover-host devstack@ceph

user@localhost:$ cinder service-list --binary cinder-volume --withreplication
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-volume | devstack@ceph | nova | disabled | up | 2016-09-21T15:13:54.000000 | failing-over | ceph-secondary | False | failed-over |
| cinder-volume | devstack@lvm | nova | enabled | up | 2016-09-21T15:13:49.000000 | disabled | - | False | - |
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+


user@localhost:$ cinder service-list --binary cinder-volume --withreplication
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Replication Status | Active Backend ID | Frozen | Disabled Reason |
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+
| cinder-volume | devstack@ceph | nova | disabled | up | 2016-09-21T15:13:54.000000 | failed-over | ceph-secondary | False | failed-over |
| cinder-volume | devstack@lvm | nova | enabled | up | 2016-09-21T15:13:49.000000 | disabled | - | False | - |
+---------------+---------------+------+----------+-------+----------------------------+--------------------+-------------------+--------+-----------------+

NOTE: Since the driver tries a clean failover first, there will be a delay of around 30 seconds before the state goes from failing-over to failed-over.

After issuing the failover-host command we can see that the service got disabled, and will stay like that until we re-enable it.

We can check the progress of the failover in the c-vol screen window, and once it has been completed we’ll see something similar to this:

2016-... DEBUG cinder.utils [req-...] Failed attempt 3 from (pid=22483) _print_stop /opt/stack/cinder/cinder/utils.py:795
2016-... DEBUG cinder.utils [req-...] Have been at this for 36.022 seconds from (pid=22483) _print_stop /opt/stack/cinder/cinder/utils.py:797
2016-... DEBUG cinder.volume.drivers.rbd [req-...] Failed to demote {'volume': 'volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e', 'error': VolumeBackendAPIException(u'Bad or unexpected response from the storage volume backend API: Error connecting to ceph cluster.',)}(volume)s with error: Bad or unexpected response from the storage volume backend API: Error connecting to ceph cluster.. from (pid=22483) _demote_volumes /opt/stack/cinder/cinder/volume/drivers/rbd.py:1076
2016-... DEBUG cinder.volume.drivers.rbd [req-...] Skipping failover for non replicated volume volume-1466c553-35ee-419a-9d5c-cbe36c22aed0 with status: available from (pid=22483) _failover_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1040
2016-... DEBUG cinder.volume.drivers.rbd [req-...] connecting to ceph-secondary (timeout=2). from (pid=22483) _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:421
2016-... DEBUG cinder.volume.drivers.rbd [req-...] connecting to ceph-secondary (timeout=2). from (pid=22483) _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:421
2016-... INFO cinder.volume.drivers.rbd [req-...] RBD driver failover completed.

If we look at the volumes we can see that the non-replicated normal-ceph volume is now in error state, because it is not available on the secondary cluster.

user@localhost:$ cinder list
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+
| 1466c553-35ee-419a-9d5c-cbe36c22aed0 | error | normal-ceph | 1 | ceph | false | |
| 4950f13d-e95d-41ea-85ae-80f0ac22aa0e | available | replicated-ceph-snapshot | 1 | replicated | false | |
| 890313d7-c72b-4f97-b0c7-defd3641bfd4 | in-use | ceph-attached | 1 | replicated | false | 84b97e11-955a-4568-8e49-163e5477acf6 |
+--------------------------------------+-----------+--------------------------+------+-------------+----------+--------------------------------------+

And we can check the DB for some internal replication information, where we can see the original status of the non-replicated volume:

user@localhost:$ mysql cinder -e "select display_name, status, replication_status, replication_driver_data,replication_extended_status from volumes where not deleted;"
+--------------------------+-----------+--------------------+--------------------------------------------------------+---------------------------------------------------------------------------------+
| display_name | status | replication_status | replication_driver_data | replication_extended_status |
+--------------------------+-----------+--------------------+--------------------------------------------------------+---------------------------------------------------------------------------------+
| normal-ceph | error | disabled | {"status":"available","replication_status":"disabled"} | NULL |
| replicated-ceph-snapshot | available | failed-over | {"had_journaling":false} | {"name":"ceph-secondary","conf":"/etc/ceph/ceph-secondary.conf","user":"admin"} |
| ceph-attached | in-use | failed-over | {"had_journaling":false} | {"name":"ceph-secondary","conf":"/etc/ceph/ceph-secondary.conf","user":"admin"} |
+--------------------------+-----------+--------------------+--------------------------------------------------------+---------------------------------------------------------------------------------+

For the rbd-mirror workaround mentioned earlier, first we’ll confirm that the rbd-mirror daemon is still watching the volume and keeping it read-only, then we’ll restart the daemon, and finally we’ll confirm that the volume is no longer being watched by the daemon.

user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e 1024M 2 excl
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e@snapshot-c3d947e6-d2d6-47fb-9a54-41a6baf40e63 1024M 2 yes
volume-890313d7-c72b-4f97-b0c7-defd3641bfd4 1024M 2 excl


user@localhost:$ sudo rbd --cluster ceph-secondary status volumes/volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e
Watchers:
watcher=10.0.1.12:0/3468660901 client.4121 cookie=139940840521952


user@localhost:$ sudo rbd --cluster ceph-secondary status volumes/volume-890313d7-c72b-4f97-b0c7-defd3641bfd4
Watchers:
watcher=10.0.1.12:0/3468660901 client.4121 cookie=139940840635408


user@localhost:$ sudo ssh ceph-secondary 'systemctl kill -s6 ceph-rbd-mirror@admin'


user@localhost:$ sudo rbd --cluster ceph-secondary status volumes/volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e
Watchers: none


user@localhost:$ sudo ssh ceph-secondary 'systemctl start ceph-rbd-mirror@admin'


user@localhost:$ sudo rbd --cluster ceph-secondary status volumes/volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e
Watchers: none


user@localhost:$ sudo rbd --cluster ceph-secondary status volumes/volume-890313d7-c72b-4f97-b0c7-defd3641bfd4
Watchers: none

2.6 – Freeze functionality

According to the specs and the devref, the freeze functionality should prevent deletes and extends, but as I realized during these tests, this is not working, so I have opened a bug.

For the time being only operations going through the scheduler will be prevented, so it’s basically the same as disabling the service. I don’t see much point in testing this here, since the driver has nothing to do with it; it’s all handled by the scheduler.

2.7 – Deleting failed-over volume & snapshot

Let’s confirm we can delete replicated snapshots and volumes once we have failed over.

user@localhost:$ cinder snapshot-delete replicated-snapshot

user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-4950f13d-e95d-41ea-85ae-80f0ac22aa0e 1024M 2
volume-890313d7-c72b-4f97-b0c7-defd3641bfd4 1024M 2 excl

user@localhost:$ cinder delete replicated-ceph-snapshot

user@localhost:$ sudo rbd --cluster ceph-secondary ls -l volumes
NAME SIZE PARENT FMT PROT LOCK
volume-890313d7-c72b-4f97-b0c7-defd3641bfd4 1024M 2 excl

2.8 – Reattaching volume

As mentioned earlier, attached volumes are a special case that requires manually detaching and reattaching them, or at least that’s the procedure mentioned in the spec and what seems logical to me. Unfortunately Nova -as far as I know- lacks the ability to detach a volume, at least an RBD volume, from an instance when the backend is not accessible. So we’ll just attach the volume to the same instance again (though we could create a new instance and attach it to that one) after changing the status and attach_status of the volume.

In a real Smoking Hole situation we probably wouldn’t have to detach the volume, since our computes would most likely be dead as well.

After attaching the volume from the secondary Ceph cluster we’ll check the contents to confirm that the data we wrote on the primary is also there.

user@localhost:$ cinder reset-state ceph-attached --state available --attach-status detached


user@localhost:$ nova volume-attach myvm 890313d7-c72b-4f97-b0c7-defd3641bfd4
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdc |
| id | 890313d7-c72b-4f97-b0c7-defd3641bfd4 |
| serverId | 84b97e11-955a-4568-8e49-163e5477acf6 |
| volumeId | 890313d7-c72b-4f97-b0c7-defd3641bfd4 |
+----------+--------------------------------------+


user@localhost:$ ssh -o StrictHostKeyChecking=no cirros@10.0.0.4 "sudo su - -c 'head -c 292 /dev/vdc'"
1
2
...
99
100

2.9 – Creating volumes

Now we’ll confirm that after enabling the service we can still create volumes, both normal and replicated.

NOTE: It is the system administrator’s responsibility to know if allowing the creation of volumes after a failover is a good idea or not, since non-replicated volumes will not be available on the primary cluster if we ever failback.

user@localhost:$ cinder service-enable devstack@ceph cinder-volume
+---------------+---------------+---------+
| Host | Binary | Status |
+---------------+---------------+---------+
| devstack@ceph | cinder-volume | enabled |
+---------------+---------------+---------+


user@localhost:$ cinder create --volume-type ceph --name failedover-normal 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T18:29:28.000000 |
| description | None |
| encrypted | False |
| id | c39a8c16-210e-45ff-8453-20f527f9fbd1 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | failedover-normal |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | ceph |
+--------------------------------+--------------------------------------+


user@localhost:$ sudo rbd --cluster ceph-secondary mirror image status volumes/volume-c39a8c16-210e-45ff-8453-20f527f9fbd1
volume-c39a8c16-210e-45ff-8453-20f527f9fbd1:
global_id:
state: down+unknown
description: status not found
last_update: 1970-01-01 03:00:00


user@localhost:$ cinder create --volume-type replicated --name failedover-replicated 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-09-21T18:34:21.000000 |
| description | None |
| encrypted | False |
| id | 0e8c732a-0a4a-4d8f-bd67-b337a2e26954 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | failedover-replicated |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 8b6e0a34d91a446d8719acdf4145a9d4 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | 0358f5a11feb4585856694a076bd86b0 |
| volume_type | replicated |
+--------------------------------+--------------------------------------+


user@localhost:$ sudo rbd --cluster ceph-secondary mirror image status volumes/volume-0e8c732a-0a4a-4d8f-bd67-b337a2e26954
volume-0e8c732a-0a4a-4d8f-bd67-b337a2e26954:
global_id: 14348cf5-3b4b-4ec0-8fb7-7b5fc9836ba9
state: down+unknown
description: status not found
last_update: 1970-01-01 03:00:00

As expected, the non-replicated volume has no global_id while the replicated one does, and both show up as down+unknown since they are not currently being replicated anywhere.

Future work

You are probably thinking that there’s something missing in these tests: where’s the failback? Well, we don’t have a failback mechanism in Cinder yet; we have specs for the failback, but no implementation, so I cannot test it.

After playing around with replication I see there are a couple of places in the feature that will require further work, but what is already there seems to work fine.


Picture: “Peeking” by niXerKG is licensed under CC BY-NC 2.0


Manual validation of Cinder A/A patches

 

At the Cinder Midcycle I agreed to create some sort of document explaining the manual tests I’ve been doing to validate the work on Cinder’s Active-Active High Availability -as a starting point for other testers and for the automation of the tests- and writing a blog post was the most convenient way for me to do so. So here it is.

checklist

Scope

The Active-Active High Availability work in Cinder is made up of a good number of specs and patches; most of them have not yet been merged and some have not even been created, yet we are at a point where we can start testing things to catch bugs and performance bottlenecks as soon as possible.

We have merged in master -the Newton cycle- most of the DLM work and all of the patches that form the foundation needed for the new job distribution and cleanup mechanisms, but we decided not to include in this cycle any patches that change the way we do the job distribution or the cleanup, since those also affect non-clustered deployments; we wanted to be really sure we are not introducing any bugs in normal deployments.

The scope of the tests I’m going to be discussing in this post is limited to the job distribution and cleanup mechanisms using the Tooz library with local file locks instead of a DLM. This way we’ll be able to follow a classic crawl-walk-run approach where we first test these mechanisms with the DLM variable removed from the equation, together with the potential configuration and communication issues it brings. Later we’ll add the DLM as well as simulated connection failures.
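
For reference, the local file locks we’ll be relying on map to a cinder.conf option along these lines -the exact value here is just an example, and DevStack’s defaults should already leave you with an equivalent setup:

[coordination]
# Tooz coordination backend; a file:// URL means plain local file locks,
# no DLM involved.
backend_url = file://$state_path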

Since the two mechanisms to test are highly intertwined we’ll be testing them both at the same time.

The explanation provided in this post is not only useful for testing existing code, it’s also interesting for driver maintainers, as they can start testing their drivers to confirm they are ready for the Active-Active work. It is true that this would be a very basic check -since proper tests require a DLM and services deployed on different hosts- but it would allow them to get familiar with the feature and catch the most obvious issues.

It is important that all driver maintainers are able to start working on this at the same time to ensure fair treatment for all; otherwise driver maintainers working on the A/A feature itself would have an unfair advantage.

Deployment

For the initial tests we want to keep things as simple as possible, so we’ll have only 1 cinder-api service -so no HAProxy configuration is needed- and only 1 scheduler (it’s really easy to add more, but it would make debugging harder, so only 1 for now). We won’t be using a DLM, as we said earlier, and we’ll also be using local storage with the LVM driver -yes, you read that right- to simplify our deployment. For this we’ll use an all-in-one DevStack deployment, as it reduces configuration requirements since we don’t need services deployed on one host -or VM- to communicate with another host -or VM-. I know this sounds counter-intuitive, but it’s good enough for now, as you’ll see, and in the near future we’ll expand this configuration to do more realistic tests.

To run 2 cinder-volume services in a clustered configuration under the same DevStack deployment all you really need is to pull the latest patch in the HA A/A series from Gerrit and then run both services with the same cluster and different host configuration options. But since we are going to perform some additional tests, it’s worth configuring a little bit more.

Our DevStack configuration will do these things:

  • Download the Cinder code from the latest patch in the A/A series (at the time of this writing that’s the first patch of "Add remaining operations to cluster"; it used to be the 6th patch of "Scheduler’s Cosmetic Changes").
  • Set the over-subscription ratio to 10.0 (since we won’t really be writing anything in most cases).
  • Configure the host parameter instead of using the default value.
  • Create 2 LVM backends of 5GB each
  • Set backends to use thin provisioning

So first we must edit the local.conf and make sure we have included these lines:

# Retrieve Cinder code from gerrit's A/A work
CINDER_REPO=https://review.openstack.org/p/openstack/cinder
CINDER_BRANCH=refs/changes/68/355968/1

# 5GB LVM backends
VOLUME_BACKING_FILE_SIZE=5125M

# 2 backends
CINDER_ENABLED_BACKENDS=${CINDER_ENABLED_BACKENDS:-lvm:lvmdriver-1,lvm:lvmdriver-2}

[[post-config|$CINDER_CONF]]
[DEFAULT]
# Don't use default host name
host = host1
[lvmdriver-1]
lvm_type = thin
lvm_max_over_subscription_ratio = 10.0
[lvmdriver-2]
lvm_type = thin
lvm_max_over_subscription_ratio = 10.0

For reference, this is the local.conf file I use. If you want to use other storage backends you just need to adapt the above configuration to your backend driver.

We didn’t configure the cluster option on purpose; don’t worry, we’ll do it later, after we’ve run some tests.

You’ll need to create 2 extra configuration files, both with the cluster option -same value in both- and only one of them with the host option.
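
We’ll create them with a couple of echo commands below; the resulting files look like this:

# /etc/cinder/host1.conf
[DEFAULT]
cluster = mycluster

# /etc/cinder/host2.conf
[DEFAULT]
cluster = mycluster
host = host2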

These are the commands I ran after updating my VM and making sure it had git installed. They basically clone devstack, download my devstack configuration, deploy DevStack, create the 2 configuration files for later, create a new screen window with logging, leave ready the command I’ll need to run to start the second service, and attach to the stack screen session.

user@localhost:$ git clone https://git.openstack.org/openstack-dev/devstack

user@localhost:$ cd devstack

user@localhost:$ curl -o local.conf http://gorka.eguileor.com/files/cinder/manual_ha_aa_local.conf

user@localhost:$ ./stack.sh

user@localhost:$ echo -e "[DEFAULT]\ncluster = mycluster" > /etc/cinder/host1.conf

user@localhost:$ echo -e "[DEFAULT]\ncluster = mycluster\nhost = host2" > /etc/cinder/host2.conf

user@localhost:$ screen -S stack -X screen -t c-vol2

user@localhost:$ screen -S stack -p c-vol2 -X logfile /opt/stack/logs/c-vol2.log

user@localhost:$ screen -S stack -p c-vol2 -X log on

user@localhost:$ touch /opt/stack/logs/c-vol2.log

user@localhost:$ screen -S stack -p c-vol2 -X stuff $'/usr/bin/cinder-volume --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/host2.conf & echo $! >/opt/stack/status/stack/c-vol2.pid; fg || echo "c-vol failed to start" | tee "/opt/stack/status/stack/c-vol2.failure"'

user@localhost:$ screen -x stack -p 0

Code changes

There are some cases where Cinder may perform operations too fast for us to act on them -be it to check that something has happened or to make Cinder do something- so we’ll make some small modifications to Cinder’s code to introduce delays that give us some leeway.

Required changes to the code are:

cinder/volume/utils.py

def introduce_delay(seconds, operation='-', resource_id='-'):
    # Sleep one second at a time so the delay shows up in the logs and can be
    # interrupted quickly; time and LOG are already available in this module.
    for __ in range(seconds):
        time.sleep(1)
        LOG.debug('Delaying %(op)s operation on %(id)s.',
                  {'op': operation, 'id': resource_id})

And then we need to add calls to it from cinder/volume/flows/manager/create_volume.py and cinder/volume/manager.py in the create volume, delete volume, create snapshot and delete snapshot operations, so that we have a 30-second delay before actually performing each operation; in the case of a volume creation from an image, the delay should go right after we have changed the status to "downloading".
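
As an illustration of the pattern -this is just a standalone sketch, not the actual diff from my patch mentioned below, and the delete_volume function here is only a stand-in- a call site would look roughly like this:

import logging
import time

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)


def introduce_delay(seconds, operation='-', resource_id='-'):
    # Same helper as above: sleep a second at a time so the delay is visible
    # in the logs and easy to interrupt.
    for __ in range(seconds):
        time.sleep(1)
        LOG.debug('Delaying %(op)s operation on %(id)s.',
                  {'op': operation, 'id': resource_id})


def delete_volume(volume_id):
    # Stand-in for the manager's delete path: the delay goes right before the
    # real work, giving us ~30 seconds to kill the service while the workers
    # entry for this operation is still in the DB.
    introduce_delay(30, operation='delete volume', resource_id=volume_id)
    LOG.debug('Driver deletion of %s would happen here.', volume_id)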

We can do these code changes manually or we can just change the CINDER_BRANCH inside local.conf to point to a patch that I specifically created to introduce these delays for my manual tests.

CINDER_BRANCH=refs/changes/69/353069/3

If you are using my configuration you are already pointing to that patch.

1. Non clustered tests

My recommendation is to split the screen session so we can see multiple windows at the same time, as it will allow us to execute commands and follow the flow in the API, SCH, and VOL services.

We all have our preferences, but when I’m working on these tests I usually have the screen session horizontally split into at least 5 regions -command line, c-api, c-sch, c-vol, c-vol2- and I tend to reorder my windows so the Cinder windows come first, in the order I just listed them, going from 0 to 4, with the c-back window as number 5, a mysql connection as number 6, and a vim editor as number 7.
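
In case you’re not used to splitting screen, these are the stock key bindings that cover that layout:

  • Ctrl+a S – split the current region horizontally
  • Ctrl+a Tab – move the focus to the next region
  • Ctrl+a " – pick which window to display in the focused region
  • Ctrl+a X – remove the current region
  • Ctrl+a Q – remove all regions except the current one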

The reason why we didn’t add the cluster configuration when deploying DevStack is because we wanted to check the non clustered deployment first.

1.0 – Sanity checks

The first thing we should do, now that we have a DevStack running and before doing any tests, is run some basic checks that will serve as a reference for the sanity checks once we run 2 services in the same cluster:

  • Check that there are no clusters:
user@localhost:$ cinder --os-volume-api-version 3.11 cluster-list --detail
+------+--------+-------+--------+-----------+----------------+----------------+-----------------+------------+------------+
| Name | Binary | State | Status | Num Hosts | Num Down Hosts | Last Heartbeat | Disabled Reason | Created At | Updated at |
+------+--------+-------+--------+-----------+----------------+----------------+-----------------+------------+------------+
+------+--------+-------+--------+-----------+----------------+----------------+-----------------+------------+------------+

  • Check services and notice that the Cluster field is empty for all services:
user@localhost:$ cinder --os-volume-api-version 3.11 service-list
+------------------+-------------------+------+---------+-------+----------------------------+---------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason |
+------------------+-------------------+------+---------+-------+----------------------------+---------+-----------------+
| cinder-backup | host1 | nova | enabled | up | 2016-08-08T17:33:26.000000 | - | - |
| cinder-scheduler | host1 | nova | enabled | up | 2016-08-08T17:33:23.000000 | - | - |
| cinder-volume | host1@lvmdriver-1 | nova | enabled | up | 2016-08-08T17:33:30.000000 | - | - |
| cinder-volume | host1@lvmdriver-2 | nova | enabled | up | 2016-08-08T17:33:31.000000 | - | - |
+------------------+-------------------+------+---------+-------+----------------------------+---------+-----------------+

  • Check there’s no RabbitMQ cluster queue:
user@localhost:$ sudo rabbitmqctl list_queues name | grep cinder-volume.mycluster

user@localhost:$
  • Check that the workers table is empty:
user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

1.1 – Creation

The most basic thing we need to test is that we are able to create a volume and that we are creating the workers table entry:

user@localhost:$ cinder create --name mydisk 1; sleep 3; mysql cinder -e 'select * from workers;'
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-08T17:52:47.000000 |
| description | None |
| encrypted | False |
| id | 16fcca48-8729-44ab-b024-ddd5cfd458a4 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | mydisk |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 2e00c4d79a5f49708438f8d3761a6d3d |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | f78b297498774851a758b08385e39b77 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-08 17:52:47 | 2016-08-08 17:52:48 | NULL | 0 | 2 | Volume | 16fcca48-8729-44ab-b024-ddd5cfd458a4 | creating | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+

user@localhost:$

We can see that we have a new entry in the workers table for the volume that is being created, and that this operation is being performed by service #3.
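
If you want to double-check which service that ID corresponds to, a quick query against the services table will tell you (adjust the ID to whatever your deployment reports):

user@localhost:$ mysql cinder -e 'select id, host, binary from services where id = 3;'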

It is important that once the volume has been created we check that the workers table is empty.

user@localhost:$ cinder list
+--------------------------------------+-----------+--------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------+------+-------------+----------+-------------+
| 16fcca48-8729-44ab-b024-ddd5cfd458a4 | available | mydisk | 1 | lvmdriver-1 | false | |
+--------------------------------------+-----------+--------+------+-------------+----------+-------------+

user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

1.2 – Deletion

Now we proceed to delete the newly created volume and make sure that we also have the workers DB entry while the operation is ongoing, and that it is removed once the operation has completed.

user@localhost:$ cinder delete mydisk; sleep 3; mysql cinder -e 'select * from workers;'
Request to delete volume mydisk has been accepted.
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-08 18:26:16 | 2016-08-08 18:26:16 | NULL | 0 | 3 | Volume | 16fcca48-8729-44ab-b024-ddd5cfd458a4 | deleting | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+

user@localhost:$ cinder list
+----+--------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+----+--------+------+------+-------------+----------+-------------+
+----+--------+------+------+-------------+----------+-------------+

user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

1.3 – Cleanup

We are going to test that basic cleanup works when the node dies. For this we’ll create a volume that we’ll attach to a VM, create another volume and start creating a snapshot of it, start creating a volume from an image, start creating a plain volume, and start deleting another volume.

So in the end we’ll have the following cleanable volume statuses:

  • “in-use”
  • “creating”
  • “deleting”
  • “downloading”

Snapshot:

  • “creating”

The sequence of commands and results would look like this:

user@localhost:$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec myvm
+--------------------------------------+----------------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | myvm |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000001 |
| OS-EXT-SRV-ATTR:kernel_id | 4c1a9ce2-a78e-43ec-99e3-5b532359d62c |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | 7becc8ae-0153-44be-9b81-9c94e5c7849a |
| OS-EXT-SRV-ATTR:reservation_id | r-qy2z7bpp |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | Xk7g2tbzVNjg |
| config_drive | |
| created | 2016-08-09T10:43:23Z |
| description | - |
| flavor | m1.nano (42) |
| hostId | |
| host_status | |
| id | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
| image | cirros-0.3.4-x86_64-uec (432c9a2b-8ed2-4957-8d12-063217f26a3f) |
| key_name | - |
| locked | False |
| metadata | {} |
| name | myvm |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tags | [] |
| tenant_id | 83e1beb749d74956b664ef58c001af29 |
| updated | 2016-08-09T10:43:23Z |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
+--------------------------------------+----------------------------------------------------------------+


user@localhost:$ cinder create --name attached 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T10:43:31.000000 |
| description | None |
| encrypted | False |
| id | a9102b47-37ff-4fd2-a76c-44e50c00e1fd |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | attached |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+


user@localhost:$ nova volume-attach myvm a9102b47-37ff-4fd2-a76c-44e50c00e1fd
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | a9102b47-37ff-4fd2-a76c-44e50c00e1fd |
| serverId | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
| volumeId | a9102b47-37ff-4fd2-a76c-44e50c00e1fd |
+----------+--------------------------------------+


user@localhost:$ cinder create --name deleting_vol 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T12:25:06.000000 |
| description | None |
| encrypted | False |
| id | 36fcb60b-83fc-420b-94cb-1f8f7979ea9d |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | deleting_vol |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+


user@localhost:$ cinder create --name snapshot_vol 1
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T12:11:33.000000 |
| description | None |
| encrypted | False |
| id | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | snapshot_vol |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+


user@localhost:$ mysql cinder -e 'select * from workers;'


user@localhost:$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| 36fcb60b-83fc-420b-94cb-1f8f7979ea9d | available | deleting_vol | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+


user@localhost:$ cinder create --name downloading --image-id cirros-0.3.4-x86_64-uec 1; cinder create --name creating 1; cinder snapshot-create snapshot_vol --name creating_snap; cinder delete deleting_vol; sleep 3; kill -9 -- -`cat /opt/stack/status/stack/c-vol.pid`
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T12:26:06.000000 |
| description | None |
| encrypted | False |
| id | 58d2c5aa-9334-46a1-9246-0bc893196454 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | downloading |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T12:26:08.000000 |
| description | None |
| encrypted | False |
| id | a7443e99-b87a-4e0a-bb44-6b63bdef477b |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | creating |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| created_at | 2016-08-09T12:26:10.353392 |
| description | None |
| id | acc8b408-2148-4de7-9774-ccb123650244 |
| metadata | {} |
| name | creating_snap |
| size | 1 |
| status | creating |
| updated_at | None |
| volume_id | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 |
+-------------+--------------------------------------+
Request to delete volume deleting_vol has been accepted.


user@localhost:$ mysql cinder -e 'select * from workers;'
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+-------------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+-------------+------------+
| 2016-08-09 12:26:06 | 2016-08-09 12:26:08 | NULL | 0 | 45 | Volume | 58d2c5aa-9334-46a1-9246-0bc893196454 | downloading | 3 |
| 2016-08-09 12:26:08 | 2016-08-09 12:26:09 | NULL | 0 | 46 | Volume | a7443e99-b87a-4e0a-bb44-6b63bdef477b | creating | 3 |
| 2016-08-09 12:26:10 | 2016-08-09 12:26:10 | NULL | 0 | 47 | Snapshot | acc8b408-2148-4de7-9774-ccb123650244 | creating | 3 |
| 2016-08-09 12:26:11 | 2016-08-09 12:26:11 | NULL | 0 | 48 | Volume | 36fcb60b-83fc-420b-94cb-1f8f7979ea9d | deleting | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+-------------+------------+


user@localhost:$ cinder list
+--------------------------------------+-------------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-------------+--------------+------+-------------+----------+--------------------------------------+
| 36fcb60b-83fc-420b-94cb-1f8f7979ea9d | deleting | deleting_vol | 1 | lvmdriver-1 | false | |
| 58d2c5aa-9334-46a1-9246-0bc893196454 | downloading | downloading | 1 | lvmdriver-1 | false | |
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| a7443e99-b87a-4e0a-bb44-6b63bdef477b | creating | creating | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-------------+--------------+------+-------------+----------+--------------------------------------+


user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+----------+---------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+----------+---------------+------+
| acc8b408-2148-4de7-9774-ccb123650244 | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | creating | creating_snap | 1 |
+--------------------------------------+--------------------------------------+----------+---------------+------+

We can see how we didn’t have any entries in the workers table before we executed all the operations and killed the service, and that at the end we have the expected entries, and that they match the status of the volumes and snapshots. This part is crucial, because if they don’t match they will not be cleaned up.

This would be a simulation of a service that is abruptly interrupted in the middle of some operations and that will need to recover on the next restart.

Before we check the restart of the service we want to remove the iSCSI target to make sure that it gets recreated on service start. Take notice that you need to replace the UUID in the command with the UUID of the volume that was attached to the instance:

user@localhost:$ sudo tgt-admin --force --delete iqn.2010-10.org.openstack:volume-c632fd5d-bd05-4eda-a146-796136376ece

user@localhost:$ sudo tgtadm --lld iscsi --mode target --op show

user@localhost:$

And now we can restart the c-vol service we just killed -going to the c-vol window, pressing Ctrl+p, and hitting enter- and check that the service actually does what we expect it to do, which is reclaim the workers entries -updating the updated_at field, since the service_id is already its own- and perform the cleanup. Since most cleanups just set the status field to "error" we won’t see those entries in the workers table anymore, and only the delete operation remains until it is completed.

user@localhost:$
user@localhost:$ mysql cinder -e 'select * from workers;'
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-09 12:26:11 | 2016-08-09 12:32:52 | NULL | 0 | 48 | Volume | 36fcb60b-83fc-420b-94cb-1f8f7979ea9d | deleting | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+


user@localhost:$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 36fcb60b-83fc-420b-94cb-1f8f7979ea9d | deleting | deleting_vol | 1 | lvmdriver-1 | false | |
| 58d2c5aa-9334-46a1-9246-0bc893196454 | error | downloading | 1 | lvmdriver-1 | false | |
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | creating | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+


user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+--------+---------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+--------+---------------+------+
| acc8b408-2148-4de7-9774-ccb123650244 | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | error | creating_snap | 1 |
+--------------------------------------+--------------------------------------+--------+---------------+------+


user@localhost:$ sudo tgtadm --lld iscsi --mode target --op show
Target 1: iqn.2010-10.org.openstack:volume-a9102b47-37ff-4fd2-a76c-44e50c00e1fd
System information:
Driver: iscsi
State: ready
I_T nexus information:
I_T nexus: 2
Initiator: iqn.1994-05.com.redhat:d434849ec720 alias: localhost
Connection: 0
IP Address: 192.168.121.80
LUN information:
LUN: 0
Type: controller
SCSI ID: IET 00010000
SCSI SN: beaf10
Size: 0 MB, Block size: 1
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
SWP: No
Thin-provisioning: No
Backing store type: null
Backing store path: None
Backing store flags:
LUN: 1
Type: disk
SCSI ID: IET 00010001
SCSI SN: beaf11
Size: 1074 MB, Block size: 512
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
SWP: No
Thin-provisioning: No
Backing store type: rdwr
Backing store path: /dev/stack-volumes-lvmdriver-1/volume-a9102b47-37ff-4fd2-a76c-44e50c00e1fd
Backing store flags:
Account information:
MoYuxFwQJQvaWJNmz47H
ACL information:
ALL

And after 30 seconds or so the volume named “deleting_vol” will finish deleting and we won’t have the workers table entry anymore:

user@localhost:$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 58d2c5aa-9334-46a1-9246-0bc893196454 | error | downloading | 1 | lvmdriver-1 | false | |
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | creating | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+

user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

2. Clustered tests

One thing to remember -as it will help us know where we can expect operations to go- is that clustered operations are scheduled using round-robin. So if we create 2 volumes, one will be created by each service, and if we do an attach, each service will perform one part of the attachment, since we have the reservation and the connection initiation.

It’s important to remember that we already have some resources -volumes and snapshots- in our backend, and we should check the contents of the DB for the volumes to confirm that they don’t belong to any cluster.

user@localhost:$ mysql cinder -e 'select display_name, id, status, host, cluster_name from volumes where not deleted;'
+--------------+--------------------------------------+-----------+-------------------------------+--------------+
| display_name | id | status | host | cluster_name |
+--------------+--------------------------------------+-----------+-------------------------------+--------------+
| downloading | 58d2c5aa-9334-46a1-9246-0bc893196454 | error | host1@lvmdriver-1#lvmdriver-1 | NULL |
| snapshot_vol | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | host1@lvmdriver-1#lvmdriver-1 | NULL |
| creating | a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | host1@lvmdriver-1#lvmdriver-1 | NULL |
| attached | a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | host1@lvmdriver-1#lvmdriver-1 | NULL |
+--------------+--------------------------------------+-----------+-------------------------------+--------------+

Now we have to stop the c-vol service by pressing Ctrl+c in the c-vol window and then press Ctrl+p to bring back the command that started the service, so we can modify it and add --config-file /etc/cinder/host1.conf right after --config-file /etc/cinder/cinder.conf before running it. With this we are effectively starting the service in the cluster. The command would look like this:

user@localhost:$ /usr/bin/cinder-volume --config-file /etc/cinder/cinder.conf --config-file /etc/cinder/host1.conf & echo $! >/opt/stack/status/stack/c-vol.pid; fg || echo "c-vol failed to start" | tee "/opt/stack/status/stack/c-vol.failure"

Now we go to the c-vol2 window and run the command that is already written there.

2.0 – Sanity checks

We now have 2 services running in the same cluster.

  • Check cluster status
user@localhost:$ cinder --os-volume-api-version 3.11 cluster-list --detail
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+
| Name | Binary | State | Status | Num Hosts | Num Down Hosts | Last Heartbeat | Disabled Reason | Created At | Updated at |
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+
| mycluster@lvmdriver-1 | cinder-volume | up | enabled | 2 | 0 | 2016-08-09T13:42:20.000000 | - | 2016-08-09T13:41:13.000000 | |
| mycluster@lvmdriver-2 | cinder-volume | up | enabled | 2 | 0 | 2016-08-09T13:42:20.000000 | - | 2016-08-09T13:41:13.000000 | |
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+

  • Check services status:
user@localhost:$ cinder --os-volume-api-version 3.11 service-list
+------------------+-------------------+------+---------+-------+----------------------------+-----------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Cluster | Disabled Reason |
+------------------+-------------------+------+---------+-------+----------------------------+-----------------------+-----------------+
| cinder-backup | host1 | nova | enabled | up | 2016-08-09T13:42:45.000000 | - | - |
| cinder-scheduler | host1 | nova | enabled | up | 2016-08-09T13:42:47.000000 | - | - |
| cinder-volume | host1@lvmdriver-1 | nova | enabled | up | 2016-08-09T13:42:50.000000 | mycluster@lvmdriver-1 | - |
| cinder-volume | host1@lvmdriver-2 | nova | enabled | up | 2016-08-09T13:42:50.000000 | mycluster@lvmdriver-2 | - |
| cinder-volume | host2@lvmdriver-1 | nova | enabled | up | 2016-08-09T13:42:46.000000 | mycluster@lvmdriver-1 | - |
| cinder-volume | host2@lvmdriver-2 | nova | enabled | up | 2016-08-09T13:42:46.000000 | mycluster@lvmdriver-2 | - |
+------------------+-------------------+------+---------+-------+----------------------------+-----------------------+-----------------+

  • Check RabbitMQ cluster queue:
user@localhost:$ sudo rabbitmqctl list_queues name | grep cinder-volume.mycluster
cinder-volume.mycluster@lvmdriver-2
cinder-volume.mycluster@lvmdriver-1

  • Check existing volumes were moved to the cluster when the service was started
user@localhost:$ mysql cinder -e 'select display_name, id, status, host, cluster_name from volumes where not deleted;'
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+
| display_name | id | status | host | cluster_name |
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+
| downloading | 58d2c5aa-9334-46a1-9246-0bc893196454 | error | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
| snapshot_vol | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
| creating | a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
| attached | a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+

2.1 – Volume creation

The most basic thing we need to test is that we are able to create a volume in a cluster and that we are creating the workers table entry. To see that we are really sending it to the cluster we’ll just create 2 volumes instead of one. It is useful to have c-vol and c-vol2 windows open to see in the logs how each one is processing one of the creations.

user@localhost:$ cinder create --name host1 1; cinder create --name host2 1; sleep 3; mysql cinder -e 'select * from workers;'
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T14:04:15.000000 |
| description | None |
| encrypted | False |
| id | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | host1 |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2016-08-09T14:04:16.000000 |
| description | None |
| encrypted | False |
| id | 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d |
| metadata | {} |
| migration_status | None |
| multiattach | False |
| name | host2 |
| os-vol-host-attr:host | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | 83e1beb749d74956b664ef58c001af29 |
| replication_status | disabled |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| updated_at | None |
| user_id | c21ca8dae0644e52afe624a518e5e8f2 |
| volume_type | lvmdriver-1 |
+--------------------------------+--------------------------------------+
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-09 14:04:15 | 2016-08-09 14:04:15 | NULL | 0 | 53 | Volume | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 | creating | 3 |
| 2016-08-09 14:04:16 | 2016-08-09 14:04:17 | NULL | 0 | 54 | Volume | 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d | creating | 5 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+

We can see that we have 2 new entries in the workers table for the volumes we are creating, and service 3 and service 5 are performing these operations.

It is important that once the volumes have been created we check that the workers table is empty and both volumes are in "available" status.

user@localhost:$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d | available | host2 | 1 | lvmdriver-1 | false | |
| 58d2c5aa-9334-46a1-9246-0bc893196454 | error | downloading | 1 | lvmdriver-1 | false | |
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 | available | host1 | 1 | lvmdriver-1 | false | |
| a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | creating | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+

user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

If we check the DB contents for the new volumes we’ll get a surprise:

user@localhost:$ mysql cinder -e 'select display_name, id, status, host, cluster_name from volumes where display_name in ("host1", "host2");'
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+
| display_name | id | status | host | cluster_name |
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+
| host2 | 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d | available | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
| host1 | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 | available | host1@lvmdriver-1#lvmdriver-1 | mycluster@lvmdriver-1#lvmdriver-1 |
+--------------+--------------------------------------+-----------+-------------------------------+-----------------------------------+

As you can see both have the host field set to "host1@lvmdriver-1#lvmdriver-1" even though host2 was created by host2. You may think this is a mistake, but it’s not; it’s what we can expect, since the scheduler doesn’t know which host from the cluster will be taking the job, it just assigns one host that is up.

This will not be an issue for the cleanup, and it’s only relevant for operations that are not cluster aware yet, as those will all be going to the same host instead of being distributed among all the hosts, as we’ll see in the next section.

2.2 – Snapshot creation

Snapshot creation is not yet cluster aware, so it will still use the host DB field to direct the job. This will change soon, but it’s a good opportunity to illustrate what I meant in the previous test. By creating 2 snapshots from the volumes we created in the previous section we’ll see how they are both handled by "host1".

user@localhost:$ cinder snapshot-create host1 --name host1_snap; cinder snapshot-create host2 --name host2_snap; sleep 3; mysql cinder -e 'select * from workers;'
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| created_at | 2016-08-09T14:21:28.497009 |
| description | None |
| id | 7d0923dd-c666-41df-ab12-2887e6a04bc3 |
| metadata | {} |
| name | host1_snap |
| size | 1 |
| status | creating |
| updated_at | None |
| volume_id | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 |
+-------------+--------------------------------------+
+-------------+--------------------------------------+
| Property | Value |
+-------------+--------------------------------------+
| created_at | 2016-08-09T14:21:30.478635 |
| description | None |
| id | 6d2124b8-2cdd-48ad-b525-27db18470587 |
| metadata | {} |
| name | host2_snap |
| size | 1 |
| status | creating |
| updated_at | None |
| volume_id | 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d |
+-------------+--------------------------------------+
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-09 14:21:28 | 2016-08-09 14:21:28 | NULL | 0 | 55 | Snapshot | 7d0923dd-c666-41df-ab12-2887e6a04bc3 | creating | 3 |
| 2016-08-09 14:21:30 | 2016-08-09 14:21:30 | NULL | 0 | 56 | Snapshot | 6d2124b8-2cdd-48ad-b525-27db18470587 | creating | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+

We can see that we have 2 new entries in the workers table for the snapshots we are creating, and both are being executed by service 3 ("host1").

As usual, we check the results:

user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+---------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+-----------+---------------+------+
| 6d2124b8-2cdd-48ad-b525-27db18470587 | 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d | available | host2_snap | 1 |
| 7d0923dd-c666-41df-ab12-2887e6a04bc3 | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 | available | host1_snap | 1 |
| acc8b408-2148-4de7-9774-ccb123650244 | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | error | creating_snap | 1 |
+--------------------------------------+--------------------------------------+-----------+---------------+------+


user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

2.3 – Deletion

Volume, Snapshot, Consistency Group, and Consistency Group Snapshot deletions are cluster aware, so even though both snapshots were created in “host1”, deletion will be spread between the 2 hosts, as we can see in the logs.

user@localhost:$ cinder snapshot-delete host1_snap; cinder snapshot-delete host2_snap; cinder delete downloading; sleep 3; mysql cinder -e 'select * from workers;'
Request to delete volume downloading has been accepted.
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| created_at | updated_at | deleted_at | deleted | id | resource_type | resource_id | status | service_id |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+
| 2016-08-09 15:40:53 | 2016-08-09 15:40:53 | NULL | 0 | 59 | Volume | 58d2c5aa-9334-46a1-9246-0bc893196454 | deleting | 3 |
+---------------------+---------------------+------------+---------+----+---------------+--------------------------------------+----------+------------+


user@localhost:$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+
| 271aac9a-e9c2-4a89-87d2-c6fd13d81a5d | available | host2 | 1 | lvmdriver-1 | false | |
| 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | available | snapshot_vol | 1 | lvmdriver-1 | false | |
| a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 | available | host1 | 1 | lvmdriver-1 | false | |
| a7443e99-b87a-4e0a-bb44-6b63bdef477b | error | creating | 1 | lvmdriver-1 | false | |
| a9102b47-37ff-4fd2-a76c-44e50c00e1fd | in-use | attached | 1 | lvmdriver-1 | false | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
+--------------------------------------+-----------+--------------+------+-------------+----------+--------------------------------------+


user@localhost:$ cinder snapshot-list
+--------------------------------------+--------------------------------------+--------+---------------+------+
| ID | Volume ID | Status | Name | Size |
+--------------------------------------+--------------------------------------+--------+---------------+------+
| acc8b408-2148-4de7-9774-ccb123650244 | 6a12f169-c6a7-4de5-85f2-c8259cbd6924 | error | creating_snap | 1 |
+--------------------------------------+--------------------------------------+--------+---------------+------+


user@localhost:$ mysql cinder -e 'select * from workers;'

user@localhost:$

If you are wondering why we don’t have workers table entries for the snapshot deletions, that’s because snapshot deletion is not cleanable in the existing code. So it’s something we’ll probably want to add, but it’s not specific to the High Availability Active-Active work.

2.4 – Attach volume

We can attach one of the newly created volumes to our existing VM and see in the logs how it’s handled by the 2 services.

user@localhost:$ nova volume-attach myvm a592ff26-d70c-4a0e-92a3-ad3f5b8ac599
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdc |
| id | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 |
| serverId | 74fa7147-a0c5-4b17-b9b3-ce4ebfbea911 |
| volumeId | a592ff26-d70c-4a0e-92a3-ad3f5b8ac599 |
+----------+--------------------------------------+

2.5 – Detach volume

Now we’ll stop the "host1" service to confirm that it doesn’t matter: even though the host field is set to "host1", "host2" can handle the detach operation as well, since it’s in the same cluster.

Go to the c-vol window, stop the service with Ctrl+c, and then run the detach of the volume we attached in the previous step while looking at the c-vol2 log:

user@localhost:$ nova volume-detach myvm a592ff26-d70c-4a0e-92a3-ad3f5b8ac599

After a little while we can see how the stopped service is no longer considered to be alive in the cluster and is reported as a down host:

user@localhost:$ cinder --os-volume-api-version 3.11 cluster-list --detail
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+
| Name | Binary | State | Status | Num Hosts | Num Down Hosts | Last Heartbeat | Disabled Reason | Created At | Updated at |
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+
| mycluster@lvmdriver-1 | cinder-volume | up | enabled | 2 | 1 | 2016-08-09T15:52:55.000000 | - | 2016-08-09T13:41:13.000000 | |
| mycluster@lvmdriver-2 | cinder-volume | up | enabled | 2 | 1 | 2016-08-09T15:52:55.000000 | - | 2016-08-09T13:41:13.000000 | |
+-----------------------+---------------+-------+---------+-----------+----------------+----------------------------+-----------------+----------------------------+------------+
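
With “host1” down, the detach itself is just the usual Nova command -a minimal sketch using the server and volume from the attach step above- and you should see the operation show up in the c-vol2 log:

user@localhost:$ nova volume-detach myvm a592ff26-d70c-4a0e-92a3-ad3f5b8ac599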

Future manual tests

These are just some manual tests to illustrate how things work and how we can test them, and I think they give a good sample of different scenarios. I have tested more cases -and more cluster aware operations than the ones included here- and have manually debugged the code to ensure that specific paths are followed in some corner cases, so these should be a good start and can serve as a guide for testing the other cluster aware operations. More tests can easily be run in this setup, and we can also add another cinder-volume service outside the cluster to confirm that clustered and non-clustered services can peacefully coexist.

I’ll be doing more tests on all operations that are already cluster aware, current and new, and even more once new features are added, like triggering service cleanups from the API.

Current cluster aware operations


– Manage existing
– Get pools
– Create volume
– Retype
– Migrate volume to host
– Create consistency group
– Update service capabilities
– Delete volume
– Delete snapshot
– Delete consistency group
– Delete consistency group snapshot
– Attach volume
– Detach volume
– Initialize connection
– Terminate connection
– Remove export

With the latest patches all operations in the Cinder volume service are cluster aware.

On Automatic Tests

I believe that to be reasonably certain that the Active-Active implementation is reliable we’ll need some sort of Fault Injection mechanism for Cinder, one that -unlike Tempest- is not intended for real deployments but designed specifically for testing deployments.

The reason for this is that you cannot automatically create a real life workload, make it fail, and then check the results without really knowing which specific part of the code was running at the moment the failure occurred. Some failures can be simulated externally, but simulating others presents its own challenges.

Again we’ll take the crawl-walk-run approach: beginning with manual tests, then adding some kind of automated tests, then multi-node CI jobs, and finally -hopefully- introducing the Fault Injection mechanism to add additional tests.


Picture: “Checklist” by [Claire Dela Cruz](https://pixabay.com/en/users/ellisedelacruz-2310550/) is licensed under CC0 1.0


Cinder Active-Active HA – Newton mid-cycle

 

Last week the OpenStack Cinder mid-cycle sprint took place in Fort Collins, and on the first day we discussed the Active-Active HA effort that’s been going on for a while now and the plans for the future. This is a summary of that session.

Just like in previous mid-cycles the Cinder community did its best to accommodate remote attendees and make them feel included in the sessions with hangouts, live video streaming, IRC pings as reminders, and even moving the microphone around the room so they could hear better, and it’s an effort that I for one appreciate.

The video above includes 2 sessions: the first one is the discussion about the mascot, and the second is the one related to Active-Active, which begins 23 minutes and 34 seconds into the video, so you can skip ahead manually or jump straight to that point.

We had a clear agenda on the Etherpad prior to the meeting, and even if we didn’t follow the order closely, we did cover everything in it and some more:

  • Status of patches
  • Status of testing
  • Help with reviewing
  • Set goals for Newton: DB changes

Getting the code in

During the Austin summit we reached an agreement related to the job distribution for clusters, and the patches were changed accordingly, so we are now ready to start merging them; but people are having problems reviewing patches because there is too much to process.

Even though the Cinder A-A blueprint has links to all the specs and patches, it’s sometimes hard to understand what each patch in the series is trying to accomplish in the grand scheme of things.

This is understandable, because with the new job distribution approach patches from different specs are now interleaved in the chain of patches. That’s why I try to keep the specs up to date with any change or additional information I deem relevant while coding the feature, and why one must pay attention to the commit message on each review to see which of the specs the patch relates to.

But this may not be enough, and maybe even after reading the specs there are things in the patches that aren’t clear. That would mean that either the specs lack information or are not clear enough and need more work, or that the problem lies in the patches and they require more or better comments, or some refactoring to make them easier to follow. In any case, we agreed that reviewers should not be coy about pointing out anything that is unclear, and I will make the appropriate changes to the specs or the code to fix it.

There were a couple of controversial patches at the beginning of the chain that would delay the merging of the rest until an agreement could be reached, so Dulek suggested removing them from the chain and reworking the other patches as necessary -if this wasn’t too much work- to move things along. Since this mostly meant adding some more code to a couple of patches, it seemed reasonable, and we agreed to separate them from the chain.

We want to review and merge as many patches as possible in Newton, but we want to keep the risk of disruption low, so a list of low risk patches that can be reviewed and merged in N is necessary to focus our efforts.

Testing

Quoting Bruce Eckel: “If it’s not tested, it’s broken.” So we are going to need some multinode automated tests at the gate, and the Tempest patch for this is under review; but that won’t be enough, because Tempest is meant for API validation, not for negative tests, and much less for the kind of tests required to verify the correctness of an Active-Active cluster on failure.

To have proper tests we’ll need a fault injection mechanism, but that’s a big effort that probably won’t be ready in time for the tech preview release of Active-Active HA, so for now we’ll go with manual testing, forcing corner case error conditions as well as normal error and non-error conditions.

We agreed that the Tempest tests together with this manual testing will be enough for the tech preview, but the manual process we follow will need to be properly documented so people can see which test cases are being checked -managing their expectations accordingly and allowing them to suggest new tests- and so it can serve as a starting point for the automated tests.

In order to coordinate testing, resolve doubts, and create focused reviews, Scottda suggested creating a Working Group that will meet weekly on IRC, and we all voted in favor.

Drivers

We discussed multiple topics related to drivers and Active-Active configurations. The first was that driver maintainers need to get their drivers ready if they don’t want to be at a disadvantage for not supporting the feature, and they can already start testing their own drivers today with the existing patches, although there is no documentation yet to help them in the process.

Szymon pointed out that there is an issue with drivers inheriting from the same base class when that base class uses external locks: changing the base class to use distributed locks changes all the inheriting drivers at the same time, and that’s not ideal, since some drivers may not want or need them. The case presented was the RemoteFS driver and the drivers that inherit from it.

The proposed approach is to create 2 different classes: one that leaves all drivers as they are, using local locks, and another that uses the Tooz abstraction layer for drivers that want distributed locks. I suggested an alternative: change the base class to use the Tooz abstraction layer -which by default uses local file locks, equivalent to the ones the drivers are using today- and move the Tooz configuration options from the DEFAULT section to the driver specific sections. That way we can have a multibackend deployment where one driver uses local locks and the other uses distributed locks.
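
To make the idea more concrete, here is a purely illustrative sketch of what a multibackend cinder.conf could end up looking like with that suggestion. The per-backend placement of the Tooz backend_url option is hypothetical -it’s just the proposal, not something that exists in Cinder today:

[DEFAULT]
enabled_backends = lvmdriver-1,lvmdriver-2

[lvmdriver-1]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-1
# Hypothetical per-backend option: keep the Tooz default local file locks
backend_url = file://$state_path

[lvmdriver-2]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-2
# Hypothetical per-backend option: use distributed locks through ZooKeeper
backend_url = zookeeper://127.0.0.1:2181

This way drivers would keep their current local-lock behavior by default, and only the backends that actually need cross-node coordination would pay the price of a distributed lock manager.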

Review sprint

On Friday, after all the sessions, the Cinder team decided to give a big push to Active-Active patches, so we all synchronized -those in Fort Collins and those attending remotely- for a reviewing sprint where I would be updating patches and replying to gerrit and IRC comments as they were going through the patches. And it was a great success, as we were able to go through the 12 patches -many of considerable size- that we wanted to get merged in Newton.

Some of the patches haven’t merged yet, but they are in the process of merging and being verified by Zuul at this moment.

Actions

During the meeting we agreed upon the following actions:

  • Remove the 2 controversial patches from the chain and update other patches ➞ Done
  • Create a list of low risk patches that can merge in N ➞ First patch in the series that shouldn’t merge is https://review.openstack.org/303021
  • Create a document with manual tests
  • Look into fault injection
  • Pick a time for the Work Group’s weekly meeting ➞ Will be held on #openstack-cinder on Tuesdays at 16:00 UTC, as announced on the dev mailing list

Other

Session notes were taken in Etherpad as usual and they are split according to the day they were held, but they are all linked in the mid-cycle Cinder Etherpad and are available for reference.

All videos from the mid-cycle are also available at the openstack-cinder YouTube channel.