So, it’s been over a year since my last post? Well, I didn’t give an SLA when I created this blog, and I didn’t give a defined set of topics that would be covered either. Anyhow, this is going to be a little higher level than my previous programming posts.
The Python Object Model
At the core of the Python object model are two types: type and object. They have a curious relation: type is derived from object, and object is an instance of type. In fact, type is also an instance of type. To look through the object model, we are going to walk through the process of creating a new class step by step, by invoking the type method (Python class definitions are just syntactic sugar for this):
- We’ll invoke T = type(“T”, (object,), {}), creating a new type named “T”, derived from object, and with no new methods
- type is an object, so we will invoke the __call__ operator. Operators are always looked up on the object’s class, which in this case is type itself.
- Therefore, we will be invoking __getattribute__ on type for __call__. type defines a __call__ method, so __getattribute__ doesn’t have to do any complex lookups.
- __call__ is a function, and all functions are descriptors, so it is time for an interlude…
Descriptors
Descriptors are also objects. The most well known type of descriptor is that returned by property (property is actually a class!), but classmethod and staticmethod also return descriptors. Descriptors come in two types:
- Non-data descriptors, which implement the __get__ method, and which are only invoked if a member of the object’s class, and are overriden by members on the object’s instance dictionary
- Data descriptors, which implement the __set__ method, override accesses to the instance dictionary
If a descriptor is looked up, then the method corresponding to the action will be invoked
- __getattribute__ -> __get__
- __setattr__ -> __set__
- __delattr__ -> __del__
So, a function is a descriptor. It’s irrelevant to this case, but it is actually a non-data descriptor. So…
- __get__returns a bound method (Ever wondered where those were created? Now we know!), which fills in the self parameter for calls
- Its now time to call the bound method. Now, as functions (and bound methods) are actually objects, we could follow the above again. However, it is easy to see how this could result in infinite recursion; fortunately, this doesn’t happen: Python recognizes that we are calling a function, and we don’t have to repeat the above
- Now we begin standard class construction: we will invoke type.__new__(type, (object,), {}) followed by a do-nothing type.__init__ method.
Constructing a new class is pretty simple: We create a new object with its instance dictionary set to the members dictionary passed as type’s third argument.
Making an instance of an object
Admittedly, we made an instance of an object above; but the case of creating a new type is rather different from the normal (because type is its own type). So, lets create a new instance of the type T we created above
- We’ll do O = T()
- T is again an object, so we will invoke __call__ again. Again, T is specifically a type, so we will be invoking type.__call__(T)
- type.__call__(T) will look up T.__new__. T doesn’t define a new method, so we will look in the next type in the method resolution order. Since T singly inherits (from object), we will be looking up __new__ in object.
- object.__new__(T) will create a new instance of T, and set it up as a normal Python object (as you might expect from object.__new__(T))
- type.__call__(T) will next invoke object.__init__(T), for the same reasons. Object’s __init__ method does nothing, so nothing of interest happens here
- type.__call__(T) will return the newly created object
Summarizing it all
We can see that, from a couple of rules, we can derive the Python object system:
- We can separate lookups into two types: operator lookups and normal lookups
- Operator lookups always look up on the type of the object
- Normal lookups look up first on the object, and then on the type
- Lookups always involve invoking the __getattribute__ operator
- If a data descriptor is found during a lookup, it is always invoked
- If a non-data descriptor is found during a lookup, it is only invoked if it was found somewhere but the instance dictionary of the object being interrogated
At its core, the Python object model is incredibly simple (excluding some complexity around descriptors, which is, in my opinion, justified because it makes using it more inituitive). It is, however, somewhat hard to inspect from outside; the core workings of it are not particularly brilliantly documented and are implemented in C, and some of the interactions are hard to identify without re-implementing it yourself (In fact, much of this was discerned by reimplementing it in Lua)
