If you work on an OpenStack project that is in the process of adopting Versioned Objects, or has recently adopted them like we have in Cinder, you've probably done some code reviews and even implemented a couple of Versioned Objects yourself, and you probably think you know your way around them pretty well. But there may be gaps in that knowledge that you are not aware of, especially if you are taking Nova's or Cinder's code as reference; at least I know I had some gaps. If you aren't a seasoned Versioned Object user like the Nova people are, I recommend you keep reading.
Context
In Cinder, for the past couple of releases, we've been working on refactoring our code to stop using ORM objects and dictionaries and switch to Versioned Objects in order to make Rolling Upgrades a reality (which they are now, by the way).
Lately I've been working on making the remaining services in Cinder Active-Active, and in the process I've had the opportunity to work with all the cool new toys like Microversions and Versioned Objects for Rolling Upgrades. After so many design sessions, summit talks, spec reviews, and code reviews I came to believe that I knew all I needed to know about them to avoid any surprise when working on my patches. But then reality came knocking at the door and slapped me with an Exception out of nowhere that left me wondering what had hit me.
Reality knocks
I had just completed my Job Cleaning patches. I had diligently tested them in my environment, forcing corner cases and checking that the code behaved as expected, and tempest tests were also happy with the patches, so I was feeling good when I pushed them for review. But then a shout from a grenade CI job caught me by surprise, and I could barely understand what it was all about:
ERROR cinder.api.middleware.fault File "/opt/stack/new/cinder/cinder/objects/base.py", line 433, in serialize_entity
ERROR cinder.api.middleware.fault entity = entity.obj_to_primitive(backport_ver)
ERROR cinder.api.middleware.fault File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 548, in obj_to_primitive
ERROR cinder.api.middleware.fault version_manifest)
ERROR cinder.api.middleware.fault File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 520, in obj_make_compatible_from_manifest
ERROR cinder.api.middleware.fault return self.obj_make_compatible(primitive, target_version)
ERROR cinder.api.middleware.fault File "/opt/stack/new/cinder/cinder/objects/volume.py", line 230, in obj_make_compatible
ERROR cinder.api.middleware.fault super(Volume, self).obj_make_compatible(primitive, target_version)
ERROR cinder.api.middleware.fault File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 507, in obj_make_compatible
ERROR cinder.api.middleware.fault self._obj_make_obj_compatible(primitive, target_version, key)
ERROR cinder.api.middleware.fault File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 459, in _obj_make_obj_compatible
ERROR cinder.api.middleware.fault relationship_map = self._obj_relationship_for(field, target_version)
ERROR cinder.api.middleware.fault File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 436, in _obj_relationship_for
ERROR cinder.api.middleware.fault reason='No rule for %s' % field)
ERROR cinder.api.middleware.fault ObjectActionError: Object action obj_make_compatible failed because: No rule for volume_type
How could it be complaining about volume_type? That wasn't a new field I had introduced in the Volume object. In fact, all I had done was make Volume objects cleanable, which, besides a version bump required to track that other services had the cleanable capability, didn't require adding any new fields. So why was I getting this error? And why hadn't I gotten it during my tests?
The problem
After looking at our Versioned Objects code in Cinder and looking at Nova's code I still couldn't figure out what was wrong, so I had to dig deeper into our RPC serialization mechanism and into Oslo's Versioned Objects code. And that's when I learned about manifests and object relationship maps.
As it turns out, the issue was not easily triggered: it needed a Versioned Object with an ObjectField set, and the service sending this object in an RPC message had to have a newer object version than the service receiving the message (whose version was pinned), so serialization had to make the object compatible. It was during this backporting process that the exception happened.
In both Nova and Cinder you can find the same kind of compatibility code in the Versioned Objects, and it looks like this:
def obj_make_compatible(self, primitive, target_version):
    super(Instance, self).obj_make_compatible(primitive, target_version)
    target_version = versionutils.convert_version_to_tuple(target_version)
    if target_version < (2, 1) and 'services' in primitive:
        del primitive['services']
This code is from nova/objects/instance.py for the Instance object, and when you read it, it makes perfect sense; I mean, you make the object compatible and then you remove fields that didn't exist in prior versions, right?
Well, this is only right in the case of Nova, because they have a completely different mechanism than Cinder for making objects compatible. And yes, you've guessed it right, it's because they have a Conductor!
When you look at the Instance object you see that it includes other Versioned Objects using ObjectField, as in the flavor and vcpu_model fields, which means that each version of an Instance object goes with a specific version of the Flavor and VirtCPUModel versioned objects. So Instance v1.1 may go with Flavor v1.0 and VirtCPUModel v1.1, and Instance v1.2 may go with Flavor v1.1 and VirtCPUModel v1.1. So when you are backporting an object you need to know how these versions match one another.
In Nova, the RPC service serializes the object in its current version as is, without modifications, and it's the receiver who checks whether the received object is in the right version (one it can understand). If it's not, it generates the relationship manifest with the latest supported versions of all fields that are related Versioned Objects, using the obj_tree_get_versions method, and then asks the Conductor to backport the received primitive object using that manifest.
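To make that flow concrete, here is a toy simulation of the manifest-building step. This is illustrative only: the REGISTRY structure is invented for this sketch, while the real obj_tree_get_versions in oslo.versionedobjects walks the actual registered object classes.

```python
# Toy registry standing in for the real object registry (invented data):
# object name -> (current version, {field name: related object name})
REGISTRY = {
    'Instance': ('2.1', {'flavor': 'Flavor', 'vcpu_model': 'VirtCPUModel'}),
    'Flavor': ('1.1', {}),
    'VirtCPUModel': ('1.1', {}),
}


def obj_tree_get_versions(objname, manifest=None):
    """Collect the latest version of objname and of every object it
    references, recursively, producing the version manifest that the
    receiver hands to the Conductor for the backport."""
    manifest = manifest if manifest is not None else {}
    version, obj_fields = REGISTRY[objname]
    manifest[objname] = version
    for related in obj_fields.values():
        if related not in manifest:
            obj_tree_get_versions(related, manifest)
    return manifest


print(obj_tree_get_versions('Instance'))
# {'Instance': '2.1', 'Flavor': '1.1', 'VirtCPUModel': '1.1'}
```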
But Cinder doesn't have a Conductor, so what we do is backport the object on serialization: we send the right object version in the RPC message by using the pinned version.
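The idea can be sketched like this. Everything here is hypothetical except the oslo.versionedobjects pieces it mimics: PINNED_VERSIONS and the Volume stand-in class are made up for the example, while obj_to_primitive(target_version=...) is the real oslo call and the versioned_object.* keys match the real primitive format.

```python
# Pinned versions the receiving services are known to understand,
# e.g. derived from the RPC version pin (invented data for the sketch).
PINNED_VERSIONS = {'Volume': '1.3'}


def serialize_entity(entity):
    """Backport on serialization: put the pinned version on the wire."""
    pinned = PINNED_VERSIONS.get(type(entity).__name__)
    if pinned is not None and pinned != entity.VERSION:
        return entity.obj_to_primitive(target_version=pinned)
    return entity.obj_to_primitive()


class Volume(object):
    """Stand-in for a real Versioned Object, mimicking the primitive
    format that oslo.versionedobjects produces."""
    VERSION = '1.6'

    def obj_to_primitive(self, target_version=None, version_manifest=None):
        return {'versioned_object.name': type(self).__name__,
                'versioned_object.version': target_version or self.VERSION}


print(serialize_entity(Volume())['versioned_object.version'])  # 1.3
```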
And what mechanism does Cinder have for matching related objects? None.
Yes, you read that right, we don't have one yet. But don't fret, as I explained before this issue is not likely to happen, and we'll have it sorted out in no time.
Solution
The solution is quite easy (although it took me a while to get there): all I had to do to fix this was create a relationship map in the Volume Versioned Object to let the Oslo library know the relationships between the Volume object's versions and the other objects' versions, like this:
obj_relationships = {
    'volume_type': (('1.3', '1.0'),),
    'volume_attachment': (('1.3', '1.0'),),
    'consistencygroup': (('1.3', '1.2'),),
    'snapshots': (('1.3', '1.1'),),
}
With that I'm telling Oslo that when backporting a Volume object to version 1.3 or later it needs to use version 1.0 for the volume_type and volume_attachment fields, version 1.2 for the consistencygroup field, and version 1.1 for the snapshots field.
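The way such a rule is consumed can be sketched as follows. This is my own simplification, not the actual lookup code in oslo.versionedobjects: pick the last entry whose parent version does not exceed the backport target, and fail when there is none, which is the same exception path behind the "No rule for volume_type" error in the traceback above.

```python
def relationship_for(rules, target_version):
    """Return the sub-object version to use when backporting the parent
    object to target_version; rules are assumed sorted ascending by
    parent version, as in the obj_relationships map."""
    as_tuple = lambda ver: tuple(int(part) for part in ver.split('.'))
    matching = [obj_ver for parent_ver, obj_ver in rules
                if as_tuple(parent_ver) <= as_tuple(target_version)]
    if not matching:
        raise ValueError('No rule for target %s' % target_version)
    return matching[-1]


# Backporting a Volume to 1.4: the ('1.3', '1.1') rule still applies,
# so the snapshots field goes as version 1.1.
snapshot_rules = (('1.3', '1.1'),)
print(relationship_for(snapshot_rules, '1.4'))  # 1.1
```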
As I was writing this map I could hear in my head the voice of my lazy self saying: "this is a pita", "will we have to do this for every object?", "we need to avoid this manual process somehow!", "someone needs to create a mechanism so we can get rid of this manual process"…
So as soon as the grenade CI job confirmed that everything was good, I started looking at automating this process.
Relationship mapping mechanism
As I've mentioned before, in Cinder we backport objects on serialization, so we keep a history dictionary that groups the versions of all objects together; this lets us track when versions get added to Cinder and how they relate to other objects' versions. We use this history to pin services and to know which version of each object they can accept.
So the simplest solution is to use this history to automatically build the relationship mapping of objects, and all we’ll have to do is make sure that our history accurately tracks all our changes.
Based on this general idea of using the Versioned Object history I came up with a couple of solutions. One was to make a change in the serialization process, check whether Oslo was going to perform a backport, and in that case create the manifest and pass it to the obj_to_primitive method. The other solution, the one I implemented in a patch, was replacing the static obj_relationships dictionary with a property that dynamically builds the map when Oslo asks for it to do a backport.
Since the map is the same for all objects of a class, we use a class attribute to store the map once it gets computed the first time, so we don't have to create it every time we backport an object of that same class.
It's important to note that this dynamic generation of the relationship map will only be triggered during rolling upgrades, and, as I mentioned, only once per Versioned Object class.
The code itself is quite short; it's mostly comments:
@property
def obj_relationships(self):
    """Dictionary with the map of versioned object versions.

    To make objects that have other objects as fields compatible with an
    older version, oslo versioned objects uses either a manifest passed
    to obj_to_primitive or the object's obj_relationships mapping.

    The obj_relationships mapping is a dictionary where each related
    versioned object (or list of versioned objects) has an iterable with
    mappings, in the form of tuples, of this object's version to the
    related object's version.

    The property defined here replaces the standard Versioned Object
    dictionary of the same name and dynamically builds the relationships
    using the existing object history stored in OBJ_VERSIONS.

    The mapping will only be built once per class, and it will only be
    used during rolling upgrades.

    Instead of creating all possible mappings we only map when the
    related object has changed, because oslo versioned objects' method
    `_get_subobject_version` understands that even if we have changed
    our version, the previous mapping with the related object is still
    valid.
    """
    cls = type(self)
    if not hasattr(cls, '_obj_relationships'):
        # Get all relationships to other Versioned Objects or lists of them
        vo_cls = (fields.ObjectField, fields.ListOfObjectsField)
        relationships = {field: {} for field in cls.fields
                         if isinstance(cls.fields[field], vo_cls)}
        # Build the version map storing the latest object version for each
        # field.
        if relationships:
            # We use this to avoid unnecessary relationship duplicates
            last_version = {}
            my_name = cls.obj_name()
            for version in OBJ_VERSIONS.versions:
                my_version = OBJ_VERSIONS[version].get(my_name)
                # If this object didn't exist in this version, skip it
                if not my_version:
                    continue
                for field in relationships:
                    obj_class = cls.fields[field].objname
                    # If this field didn't exist in this version we'll
                    # set it to None, which is what's expected.
                    obj_version = OBJ_VERSIONS[version].get(obj_class)
                    # NOTE(geguileo): If version of related object hasn't
                    # changed we don't need a new entry, as previous entry
                    # will be used. We use 0 as sentinel because None is
                    # a valid value.
                    if obj_version != last_version.get(field, 0):
                        relationships[field][my_version] = obj_version
                        last_version[field] = obj_version
        # Transform the version map to expected format
        for field in relationships:
            relationships[field] = tuple(sorted(
                relationships[field].items(),
                key=lambda x: versionutils.convert_version_to_tuple(x[0])))
        cls._obj_relationships = relationships
    return cls._obj_relationships
Instead of just adding this property to our CinderObject base Versioned Object class, I decided to create a mix-in class so our lists of Versioned Objects could also benefit from it, allowing us to stop manually creating the child_versions dictionary in all the Versioned Object lists, as they can also use the obj_relationships map when they don't have the child_versions dictionary.
Keeping our History straight
Since this whole mechanism is based on the premise that our history is accurate, I had a look at our current history and realized that we kept forgetting to bump our lists' versions when we bumped the version of the object they contain. And that is a problem, because it means that when we pass a list of objects in an RPC message, each of those objects will get unnecessarily backported due to the version mismatch.
So I created another patch that links each list's version to the version of the object it contains, and bumps the version of those linked list objects whenever the source object gets a new version added to the history.
It's probably easier to see with an example. Let's say we have made changes to the Volume Versioned Object, which is linked with the VolumeList Versioned Object, so we bump the VERSION of the Volume class in cinder/objects/volume.py from '1.2' to '1.3', and go to cinder/objects/base.py and add OBJ_VERSIONS.add('1.4', {'Volume': '1.3'}) to add that version bump to our history. Then the add method for the history will automatically add the 'VolumeList': '1.3' entry as well.
In the case of the Backup object we have two linked objects, BackupList and BackupImport, that will be added whenever we add a new Backup version.
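A minimal sketch of the idea, with invented names (VersionHistory, LINKED_OBJECTS) rather than Cinder's actual implementation: the add method expands every changed object with its linked objects at the same version before recording the history entry.

```python
# Which objects are version-linked to which (illustration data only).
LINKED_OBJECTS = {'Volume': ('VolumeList',),
                  'Backup': ('BackupList', 'BackupImport')}


class VersionHistory(object):
    def __init__(self):
        self.versions = []   # ordered history version numbers
        self.history = {}    # history version -> {object name: version}
        self._current = {}

    def add(self, hist_version, changes):
        # Give every linked object the same version as its source object,
        # unless the caller already set it explicitly.
        for obj, version in list(changes.items()):
            for linked in LINKED_OBJECTS.get(obj, ()):
                changes.setdefault(linked, version)
        self._current.update(changes)
        self.versions.append(hist_version)
        self.history[hist_version] = dict(self._current)


OBJ_VERSIONS = VersionHistory()
OBJ_VERSIONS.add('1.4', {'Volume': '1.3'})
print(OBJ_VERSIONS.history['1.4'])
# {'Volume': '1.3', 'VolumeList': '1.3'}
```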
This will automate our list version bumping, removing the chance of missing a list bump and making sure we don't do unnecessary backports.
Picture: “2” by Michael Holler is licensed under CC BY-NC 2.0