Learning something new about Oslo Versioned Objects


If you work on an OpenStack project that is adopting Versioned Objects, or has recently adopted them like we have in Cinder, you’ve probably done some code reviews and even implemented a couple of Versioned Objects yourself, and you probably think you know your way around them pretty well. But there may be gaps in that knowledge that you are not aware of, especially if you are taking Nova’s or Cinder’s code as reference. I know I had some. Unless you are a seasoned Versioned Objects user like the Nova people are, I recommend you keep reading.

Context

In Cinder for the past couple of releases we’ve been working on refactoring our code to stop using ORM objects and dictionaries and switch to Versioned Objects in order to make Rolling Upgrades a reality, which they are now by the way.

Lately I’ve been working on making the remaining services in Cinder Active-Active, and in the process I’ve had the opportunity to work with all the cool new toys, like Microversions and Versioned Objects for Rolling Upgrades. After so many design sessions, summit talks, spec reviews, and code reviews I came to believe that I knew all I needed to know about them to avoid any surprises when working on my patches. But then reality came knocking at the door and slapped me with an Exception out of nowhere that left me wondering what had hit me.

Reality knocks

I had just completed my Job Cleaning patches. I had diligently tested them in my environment, forcing corner cases and checking that the code behaved as expected, and tempest tests were also happy with the patches, so I was feeling good when I pushed them for review. But then a shout from a grenade CI job caught me by surprise, and I could barely understand what it was all about:

ERROR cinder.api.middleware.fault   File "/opt/stack/new/cinder/cinder/objects/base.py", line 433, in serialize_entity
ERROR cinder.api.middleware.fault     entity = entity.obj_to_primitive(backport_ver)
ERROR cinder.api.middleware.fault   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 548, in obj_to_primitive
ERROR cinder.api.middleware.fault     version_manifest)
ERROR cinder.api.middleware.fault   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 520, in obj_make_compatible_from_manifest
ERROR cinder.api.middleware.fault     return self.obj_make_compatible(primitive, target_version)
ERROR cinder.api.middleware.fault   File "/opt/stack/new/cinder/cinder/objects/volume.py", line 230, in obj_make_compatible
ERROR cinder.api.middleware.fault     super(Volume, self).obj_make_compatible(primitive, target_version)
ERROR cinder.api.middleware.fault   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 507, in obj_make_compatible
ERROR cinder.api.middleware.fault     self._obj_make_obj_compatible(primitive, target_version, key)
ERROR cinder.api.middleware.fault   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 459, in _obj_make_obj_compatible
ERROR cinder.api.middleware.fault     relationship_map = self._obj_relationship_for(field, target_version)
ERROR cinder.api.middleware.fault   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 436, in _obj_relationship_for
ERROR cinder.api.middleware.fault     reason='No rule for %s' % field)
ERROR cinder.api.middleware.fault ObjectActionError: Object action obj_make_compatible failed because: No rule for volume_type

How could it be complaining about volume_type? That wasn’t a new field I had introduced in the Volume object. In fact, all I had done was make Volume objects cleanable, which required a version bump (to track whether other services had the cleanable capability) but no new fields. So why was I getting this error? And why wasn’t I getting it during my tests?

The problem

After looking at our Versioned Objects code in Cinder and at Nova’s code I still couldn’t figure out what was wrong, so I had to dig deeper into our RPC serialization mechanism and into Oslo’s Versioned Objects code. And that’s when I learned about manifests and object relationship maps.

As it turns out, the issue was not easily triggered because it needed two things at once: a Versioned Object with an ObjectField set, and a service sending this object in an RPC message with a newer object version than the service receiving it (so the version was pinned and the serialization had to make the object compatible). And it was during this compatibilization process that the exception happened.

In both Nova and Cinder you can find the same kind of compatibilization code in the Versioned Objects and it looks like this:

def obj_make_compatible(self, primitive, target_version):
    super(Instance, self).obj_make_compatible(primitive, target_version)
    target_version = versionutils.convert_version_to_tuple(target_version)
    if target_version < (2, 1) and 'services' in primitive:
        del primitive['services']

This code is from nova/objects/instance.py for the Instance objects, and when you read it, it makes perfect sense; I mean, you make the object compatible and then you remove fields that didn’t exist in prior versions, right?

Well, this is only right in the case of Nova, because they have a completely different mechanism to make objects compatible than Cinder does. And yes, you’ve guessed it right, it’s because they have a Conductor!

When you look at the Instance object you see that it includes other Versioned Objects using ObjectField like in the flavor and vcpu_model fields, which means that each version of an Instance object has a specific version of the Flavor and VirtCPUModel versioned objects. So Instance v1.1 may go with Flavor v1.0 and VirtCPUModel v1.1, and v1.2 of Instance may go with Flavor v1.1 and VirtCPUModel v1.1. So when you are backporting an object you need to know how these versions match one with another.
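To make the pairing concrete, here is a minimal sketch that writes the example above down as a lookup table. The version numbers are the illustrative ones from the previous paragraph, not Nova’s real history, and the table and function names are hypothetical:

```python
# Illustrative pairing only: these are the example numbers from the text,
# not Nova's real version history.
INSTANCE_SUBOBJECT_VERSIONS = {
    '1.1': {'flavor': '1.0', 'vcpu_model': '1.1'},
    '1.2': {'flavor': '1.1', 'vcpu_model': '1.1'},
}


def subobject_versions(instance_version):
    """Return the sub-object versions that go with an Instance version."""
    return INSTANCE_SUBOBJECT_VERSIONS[instance_version]


# Backporting an Instance to 1.1 means its flavor must go back to 1.0.
print(subobject_versions('1.1')['flavor'])
```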

In Nova, the RPC service serializes the object in its current version as is, without modifications, and it’s the receiver who checks whether the received object is in a version it can understand. If it’s not, the receiver uses the obj_tree_get_versions method to generate the relationship manifest with the latest supported versions for all fields that are related Versioned Objects, and then asks the Conductor to backport the received primitive object using that manifest.

But Cinder doesn’t have a Conductor, so what we do is backport the object on serialization, so that we send the right object version in the RPC message using the pinned version.
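A rough sketch of the idea of backporting on serialization, in plain Python and without Oslo. Everything here is hypothetical (ToyVolume, PinnedSerializer, the new_field rule); it only illustrates the flow, it is not Cinder’s serializer:

```python
def version_tuple(version):
    """Turn '1.3' into (1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


class ToyVolume(object):
    """Hypothetical stand-in for a versioned object."""
    VERSION = '1.4'

    def __init__(self, **values):
        self.values = values

    def to_primitive(self, target_version=None):
        data = dict(self.values)
        version = target_version or self.VERSION
        # Hypothetical rule: pretend 'new_field' was added in 1.4.
        if version_tuple(version) < (1, 4):
            data.pop('new_field', None)
        return {'name': 'ToyVolume', 'version': version, 'data': data}


class PinnedSerializer(object):
    """Sender-side serializer that backports to the pinned version."""

    def __init__(self, pinned_versions):
        # object name -> newest version every receiver understands
        self.pinned_versions = pinned_versions

    def serialize_entity(self, entity):
        pinned = self.pinned_versions.get(type(entity).__name__)
        backport_to = None
        if pinned and version_tuple(pinned) < version_tuple(entity.VERSION):
            backport_to = pinned
        return entity.to_primitive(backport_to)


serializer = PinnedSerializer({'ToyVolume': '1.3'})
primitive = serializer.serialize_entity(ToyVolume(id='v1', new_field='x'))
print(primitive['version'])
```

The receiver never sees a version it cannot understand, which is why the sender needs to know how to downgrade every nested object on its own.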

And what mechanism does Cinder have for matching related objects? None.

Yes, you read that right, we don’t have one yet. But don’t fret: like I explained before, this issue is not likely to happen, and we’ll have this sorted out in no time.

Solution

The solution is quite easy (although it took me a while to get there). All I had to do to fix this was create a relationship map in the Volume Versioned Object to let the Oslo library know the relationships between the Volume object’s versions and the other objects’ versions, like this:

    obj_relationships = {
        'volume_type': (('1.3', '1.0'),),
        'volume_attachment': (('1.3', '1.0'),),
        'consistencygroup': (('1.3', '1.2'),),
        'snapshots': (('1.3', '1.1'),),
    }

With that I’m telling Oslo that when backporting a Volume object to version 1.3 or later it needs to use version 1.0 for the volume_type and volume_attachment fields, version 1.2 for the consistencygroup field, and version 1.1 for the snapshots field.
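As a simplified illustration of how such a map can be read, the sketch below picks, for a given target version of the parent, the entry with the highest parent version that is not newer than the target. This is my own plain-Python reading of the description above, not Oslo’s `_get_subobject_version` code, and the helper names are hypothetical:

```python
def version_tuple(version):
    """Turn '1.3' into (1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def related_version(relationships, field, target_version):
    """Pick the related object's version for a parent target version.

    Simplified illustration, not Oslo's code: use the entry with the
    highest parent version that is not newer than the target.
    """
    target = version_tuple(target_version)
    chosen = None
    entries = sorted(relationships[field],
                     key=lambda pair: version_tuple(pair[0]))
    for parent_version, child_version in entries:
        if version_tuple(parent_version) <= target:
            chosen = child_version
    if chosen is None:
        # The target predates every rule we have for this field.
        raise ValueError('No rule for %s at version %s'
                         % (field, target_version))
    return chosen


obj_relationships = {
    'volume_type': (('1.3', '1.0'),),
    'volume_attachment': (('1.3', '1.0'),),
    'consistencygroup': (('1.3', '1.2'),),
    'snapshots': (('1.3', '1.1'),),
}

print(related_version(obj_relationships, 'snapshots', '1.3'))  # 1.1
```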

As I was writing this map I could hear in my head the voice of my lazy self saying: “this is a pita”, “will we have to do this for every object?”, “we need to avoid this manual process somehow!”, “someone needs to create a mechanism so we can get rid of this manual process”…

So as soon as the grenade CI job confirmed that everything was good, I started looking at automating this process.

Relationship mapping mechanism

As I’ve mentioned before, in Cinder we backport objects on serialization. To do that we keep a history dictionary that groups the versions of all objects together, so we can track when versions get added to Cinder and how they relate to other objects’ versions. We use this history to pin services and to know which version of each object they can accept.
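A minimal sketch of what such a history could look like, assuming each entry is a full snapshot of object name to version. The class name ToyHistory is hypothetical and this is not Cinder’s actual implementation; it only mirrors the `versions` attribute and `add` method used elsewhere in this post:

```python
class ToyHistory(dict):
    """Hypothetical history: history version -> {object name: version}."""

    def __init__(self):
        super(ToyHistory, self).__init__()
        # History versions in the order they were added.
        self.versions = []

    def add(self, version, updates):
        # Each entry starts as a copy of the previous one and then applies
        # only the objects that changed in this entry.
        snapshot = dict(self[self.versions[-1]]) if self.versions else {}
        snapshot.update(updates)
        self[version] = snapshot
        self.versions.append(version)


history = ToyHistory()
history.add('1.0', {'Volume': '1.0', 'VolumeType': '1.0'})
history.add('1.1', {'Volume': '1.1'})

# A service pinned to history version '1.0' accepts Volume 1.0.
print(history['1.0']['Volume'])
```

Because every entry is a complete snapshot, looking up any history version tells you which versions of all objects belong together.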

So the simplest solution is to use this history to automatically build the relationship mapping of objects, and all we’ll have to do is make sure that our history accurately tracks all our changes.

Based on this general idea of using the Versioned Object history I came up with a couple of solutions. One was to change the serialization process to check whether Oslo was going to perform a backport and, in that case, create the manifest and pass it to the obj_to_primitive method. The other, the one I implemented in a patch, was to replace the static obj_relationships dictionary with a property that dynamically builds the map when Oslo asks for it to do a backport.

Since the map is the same for all objects, we will use a class attribute to store the map once it gets computed the first time, so we don’t have to create it every time we do a backport of an object of that same class.

It’s important to see that this dynamic generation of the relationships map will only be triggered during rolling upgrades, and as I mentioned, only once per Versioned Object class.

The code itself is quite short, it’s mostly comments:

@property
def obj_relationships(self):
    """Dictionary with the map of versioned objects versions.

    To make objects that have other objects as fields compatible to an
    older version oslo versioned objects uses either a manifest passed to
    obj_to_primitive or the object's obj_relationships mapping.

    The obj_relationships mapping is a dictionary where each related versioned
    object (or list of version objects) has an iterable with a mapping, in
    the form of tuple, of this object's version to the related object's
    version.

    Property defined here replaces standard Versioned Object dictionary
    with the same name and dynamically builds the relationships using
    existing object history stored in OBJ_VERSIONS.

    Mapping will only be built once per class and this will only be used
    during rolling upgrades.

    Instead of creating all possible mappings we only map when the related
    object has changed, because oslo versioned object method
    `_get_subobject_version` understands that even if we have changed our
    version, the previous mapping with related object is still valid.
    """
    cls = type(self)
    if not hasattr(cls, '_obj_relationships'):
        # Get all relationships to other Versioned Objects or lists of them
        vo_cls = (fields.ObjectField, fields.ListOfObjectsField)
        relationships = {field: {} for field in cls.fields
                         if isinstance(cls.fields[field], vo_cls)}
        # Build the version map storing the latest object version for each
        # field.
        if relationships:
            # We use this to avoid unnecessary relationship duplicates
            last_version = {}
            my_name = cls.obj_name()
            for version in OBJ_VERSIONS.versions:
                my_version = OBJ_VERSIONS[version].get(my_name)
                # If this object didn't exist in this version, skip it
                if not my_version:
                    continue
                for field in relationships:
                    obj_class = cls.fields[field].objname
                    # If this field didn't exist in this version we'll
                    # set it to None, which is what's expected.
                    obj_version = OBJ_VERSIONS[version].get(obj_class)
                    # NOTE(geguileo): If version of related object hasn't
                    # changed we don't need a new entry, as previous entry
                    # will be used.  We use 0 as sentinel because None is
                    # a valid value.
                    if obj_version != last_version.get(field, 0):
                        relationships[field][my_version] = obj_version
                        last_version[field] = obj_version

        # Transform the version map to expected format
        for field in relationships:
            relationships[field] = tuple(sorted(
                relationships[field].items(),
                key=lambda x: versionutils.convert_version_to_tuple(x[0])))

        cls._obj_relationships = relationships

    return cls._obj_relationships
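To see what this produces, here is the same loop reduced to a standalone function and run against a made-up history. The function name, its arguments, and the history contents are all hypothetical; only the algorithm follows the property above:

```python
def version_tuple(version):
    """Turn '1.3' into (1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def build_relationships(history, my_name, related):
    """Build {field: ((my_version, related_version), ...)} from a history.

    history: list of {object name: version} dicts, oldest first.
    related: {field name: related object name}.
    """
    relationships = {field: {} for field in related}
    last_version = {}
    for entry in history:
        my_version = entry.get(my_name)
        if not my_version:
            continue  # this object did not exist yet
        for field, obj_name in related.items():
            obj_version = entry.get(obj_name)
            # Only record an entry when the related version changes.
            if obj_version != last_version.get(field, 0):
                relationships[field][my_version] = obj_version
                last_version[field] = obj_version
    return {field: tuple(sorted(mapping.items(),
                                key=lambda item: version_tuple(item[0])))
            for field, mapping in relationships.items()}


# Made-up history for illustration only.
history = [
    {'Volume': '1.0', 'VolumeType': '1.0'},
    {'Volume': '1.1', 'VolumeType': '1.0'},
    {'Volume': '1.2', 'VolumeType': '1.1'},
    {'Volume': '1.3', 'VolumeType': '1.1'},
]
result = build_relationships(history, 'Volume', {'volume_type': 'VolumeType'})
print(result)  # {'volume_type': (('1.0', '1.0'), ('1.2', '1.1'))}
```

Volume 1.1 and 1.3 get no entry of their own because VolumeType did not change in those history entries.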

Instead of just adding this property to our CinderObject base Versioned Object class, I decided to create a mix-in class so our Lists of Versioned Objects could also benefit from it. Lists can use the obj_relationships map when they don’t have a child_versions dictionary, so this lets us stop manually creating child_versions in all the Versioned Object Lists.

Keeping our History straight

Since this whole mechanism is based on the premise that our history is accurate, I had a look at our current history and realized that we kept forgetting to bump a List’s version when we bumped the version of the object it contains. And that is a problem, because it means that when we pass a list of objects in an RPC message, each of those objects will get unnecessarily backported due to the version mismatch.

So I created another patch that links each List’s version to the version of the object it contains, and that updates the version of those linked list objects whenever a new version of the link source object gets added to the history.

It’s probably easier to see with an example. Let’s say we have made changes to the Volume Versioned Object, which is linked with the VolumeList Versioned Object. We bump the VERSION of the Volume class in cinder/objects/volume.py from '1.2' to '1.3', then go to cinder/objects/base.py and add OBJ_VERSIONS.add('1.4', {'Volume': '1.3'}) to record that version bump in our history. The add method for the history will then automatically add the 'VolumeList': '1.3' entry as well.

In the case of the Backup object we have 2 linked objects BackupList and BackupImport that will be added whenever we add a new Backup version.
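The linking idea can be sketched like this. LinkedHistory and its links argument are hypothetical names, and the snapshot-per-entry storage is an assumption; the real patch may differ in details such as validation:

```python
class LinkedHistory(dict):
    """Hypothetical history whose add() also bumps linked objects."""

    def __init__(self, links):
        super(LinkedHistory, self).__init__()
        self.versions = []
        # source object name -> names that must share its version
        self.links = links

    def add(self, version, updates):
        updates = dict(updates)
        # Copy the source object's version to every object linked to it.
        for name, obj_version in list(updates.items()):
            for linked in self.links.get(name, ()):
                updates[linked] = obj_version
        snapshot = dict(self[self.versions[-1]]) if self.versions else {}
        snapshot.update(updates)
        self[version] = snapshot
        self.versions.append(version)


OBJ_VERSIONS = LinkedHistory({
    'Volume': ['VolumeList'],
    'Backup': ['BackupList', 'BackupImport'],
})
OBJ_VERSIONS.add('1.3', {'Volume': '1.2', 'Backup': '1.0'})
OBJ_VERSIONS.add('1.4', {'Volume': '1.3'})
print(OBJ_VERSIONS['1.4']['VolumeList'])  # 1.3
```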

This will automate our list version bumping, removing the chance of missing a list bump and making sure we don’t do unnecessary backports.


Picture: “2” by Michael Holler is licensed under CC BY-NC 2.0