Thoughts on the Python object model

So, it’s been over a year since my last post? Well, I didn’t give an SLA when I created this blog, and I didn’t give a defined set of topics that would be covered either. Anyhow, this is going to be a little higher level than my previous programming posts.

The Python Object Model

At the core of the Python object model are two types: type and object. They have a curious relation: type is derived from object, and object is an instance of type. In fact, type is also an instance of type. To look through the object model, we are going to walk through the process of creating a new class step by step, by calling type directly (Python class definitions are just syntactic sugar for this):

  • We’ll invoke T = type("T", (object,), {}), creating a new type named "T", derived from object, and with no new methods
  • type is an object, so we will invoke the __call__ operator. Operators are always looked up on the object’s class, which in this case is type itself.
  • Therefore, we will be invoking __getattribute__ on type for __call__. type defines a __call__ method, so __getattribute__ doesn’t have to do any complex lookups.
  • __call__ is a function, and all functions are descriptors, so it is time for an interlude…
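The relationships and the first steps above can be checked directly. This is a minimal sketch assuming CPython 3; the names T2 and T3 are made up here purely for comparison.

```python
# type and object refer to each other
assert issubclass(type, object)    # type is derived from object
assert isinstance(object, type)    # object is an instance of type
assert isinstance(type, type)      # type is an instance of itself

# Three-argument type() builds a class, just as a class statement does
T = type("T", (object,), {})

class T2(object):   # hypothetical name, for comparison only
    pass

assert type(T) is type and type(T2) is type
assert T.__bases__ == T2.__bases__ == (object,)

# The call operator is looked up on the class of type, which is type itself,
# so the call above is roughly equivalent to this explicit form
T3 = type(type).__call__(type, "T", (object,), {})
```

T and T3 are two distinct classes that happen to share a name; the point is only that both spellings go through the same __call__.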

Descriptors

Descriptors are also objects. The most well known type of descriptor is that returned by property (property is actually a class!), but classmethod and staticmethod also return descriptors. Descriptors come in two types:

  • Non-data descriptors, which implement only the __get__ method; they are only invoked if found as a member of the object’s class, and are overridden by entries in the object’s instance dictionary
  • Data descriptors, which also implement the __set__ (or __delete__) method, and which take precedence over the instance dictionary
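To make the difference concrete, here is a small sketch. The classes NonData, Data and Holder are invented for illustration; the instance dictionary is written to directly so that both kinds of descriptor face a competing entry.

```python
class NonData:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Data:
    """Data descriptor: defines __set__ as well as __get__."""
    def __get__(self, obj, objtype=None):
        return "from data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class Holder:
    nd = NonData()
    d = Data()

h = Holder()
# Write straight into the instance dictionary, bypassing __setattr__
h.__dict__["nd"] = "from instance dict"
h.__dict__["d"] = "from instance dict"

nd_result = h.nd   # the instance dictionary wins over the non-data descriptor
d_result = h.d     # the data descriptor wins over the instance dictionary
```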

If a descriptor is found during a lookup, then the method corresponding to the action will be invoked:

  • __getattribute__ -> __get__
  • __setattr__ -> __set__
  • __delattr__ -> __delete__
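A quick way to see this dispatch is a descriptor that records which hook was called. Logged and Thing are hypothetical names; note that the deletion hook on a descriptor is __delete__, which is unrelated to the finalizer __del__.

```python
class Logged:
    """Hypothetical data descriptor that records which hook was invoked."""
    def __init__(self):
        self.calls = []
    def __get__(self, obj, objtype=None):
        self.calls.append("__get__")
        return 42
    def __set__(self, obj, value):
        self.calls.append("__set__")
    def __delete__(self, obj):
        self.calls.append("__delete__")

class Thing:
    attr = Logged()

t = Thing()
t.attr          # attribute read
t.attr = 1      # attribute assignment
del t.attr      # attribute deletion

# Fetch the descriptor itself from the class dictionary, without invoking it
calls = Thing.__dict__["attr"].calls
```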

So, a function is a descriptor. It’s irrelevant to this case, but it is actually a non-data descriptor. So…

  • __get__ returns a bound method (Ever wondered where those were created? Now we know!), which fills in the self parameter for calls
  • It’s now time to call the bound method. Now, as functions (and bound methods) are actually objects, we could follow the above again. However, it is easy to see how this could result in infinite recursion; fortunately, this doesn’t happen: Python recognizes that we are calling a function, and we don’t have to repeat the above
  • Now we begin standard class construction: we will invoke type.__new__(type, "T", (object,), {}) followed by a do-nothing type.__init__ method.
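The bound method step can be reproduced by hand. In this sketch greet and T are illustrative names only; calling __get__ on a plain function is what attribute lookup does for us when the function lives on a class.

```python
def greet(self):
    return "hello from " + type(self).__name__

class T:
    pass

t = T()

# Invoke the function's descriptor hook manually
bound = greet.__get__(t, T)
result = bound()              # self is already filled in

# The ordinary route produces the same kind of object
T.greet = greet
same = t.greet.__func__ is greet and t.greet.__self__ is t
```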

Constructing a new class is pretty simple: we create a new type object whose dictionary is populated from the members dictionary passed as type’s third argument.
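Spelled out by hand, the construction looks roughly like this. The members used here (answer, double) are made up for the example, and the last two lines rely on CPython copying the dictionary rather than keeping a reference to it.

```python
members = {"answer": 42, "double": lambda self, x: 2 * x}

# type.__new__ takes the metaclass, the name, the bases and the members
T = type.__new__(type, "T", (object,), members)
type.__init__(T, "T", (object,), members)   # does nothing of interest

assert T.__dict__["answer"] == 42
assert T().double(21) == 42

# In CPython the members are copied, so later changes do not leak in
members["late"] = True
copied = not hasattr(T, "late")
```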

Making an instance of an object

Admittedly, we made an instance of an object above; but the case of creating a new type is rather different from the normal (because type is its own type). So, let’s create a new instance of the type T we created above:

  • We’ll do O = T()
  • T is again an object, so we will invoke __call__ again. Again, T is specifically a type, so we will be invoking type.__call__(T)
  • type.__call__(T) will look up T.__new__. T doesn’t define a __new__ method, so we will look in the next type in the method resolution order. Since T singly inherits (from object), we will be looking up __new__ in object.
  • object.__new__(T) will create a new instance of T, and set it up as a normal Python object (as you might expect from object.__new__(T))
  • type.__call__(T) will next invoke __init__ on the new instance; T doesn’t define one, so for the same reasons this resolves to object.__init__. Object’s __init__ method does nothing, so nothing of interest happens here
  • type.__call__(T) will return the newly created object
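The steps above can be approximated in pure Python. call_type is a hypothetical stand-in for type.__call__, not the real implementation (which lives in C), and it is only exercised here without constructor arguments.

```python
def call_type(cls, *args, **kwargs):
    """Hypothetical pure-Python approximation of type.__call__."""
    # Found via the method resolution order; for T this is object.__new__
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        # __init__ is looked up on the type of the new object
        type(obj).__init__(obj, *args, **kwargs)
    return obj

T = type("T", (object,), {})
O = call_type(T)   # behaves like O = T()
```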

Summarizing it all

We can see that, from a couple of rules, we can derive the Python object system:

  • We can separate lookups into two types: operator lookups and normal lookups
  • Operator lookups always look up on the type of the object
  • Normal lookups look up first on the object, and then on the type
  • Lookups always involve invoking the __getattribute__ operator
  • If a data descriptor is found during a lookup, it is always invoked
  • If a non-data descriptor is found during a lookup, it is only invoked if it was found somewhere other than the instance dictionary of the object being interrogated, and only if that instance dictionary has no entry of the same name
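The split between operator lookups and normal lookups is easy to observe. In this sketch Box is an invented class; an instance-level __len__ is visible to a normal attribute lookup but ignored by len().

```python
class Box:
    def __len__(self):
        return 1

b = Box()
b.__len__ = lambda: 99     # stored in the instance dictionary

normal = b.__len__()       # normal lookup: the instance entry shadows the
                           # function on the class (a non-data descriptor)
operator = len(b)          # operator lookup: goes straight to type(b)
```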

At its core, the Python object model is incredibly simple (excluding some complexity around descriptors, which is, in my opinion, justified because it makes using it more intuitive). It is, however, somewhat hard to inspect from outside; the core workings of it are not particularly brilliantly documented and are implemented in C, and some of the interactions are hard to identify without re-implementing it yourself (in fact, much of this was discerned by reimplementing it in Lua).

Posted in Languages, Python

The releng-0.5 branch… is open

EForge is a project of mine – a project to build a better project management system. The aim is quite simple: Combine the best features of systems like Trac, RedMine and SourceForge, with the best features of systems like GitHub and Gitorious.

It’s not a simple process by a long shot. There is a long road ahead of us.

But we just moved one small – but oh so very important – step closer. EForge is moving to its first release, 0.5.

What is EForge today?

Posted in EForge

The magical Futex

It’s rare that computing creates something as elegant as the Futex: a simple system on top of which all the important synchronization primitives can be built, which has minimal overhead, and which is screamingly fast.

It’s even better when they work everywhere at the same efficiency – that is, whether placed in the process’ local memory, or a shared memory segment. And it’s such a shame that nobody has implemented them outside of Linux.

Posted in Operating Systems

I believe in UDI

UDI is the Uniform Driver Interface. It provides a standard, high performance interface for operating system device drivers, and is standardised at all the important levels: both API and ABI. This means you can take a driver, whether or not you have the source for it, and use it with any UDI supporting system. There is only one problem.

And that is that, unfortunately, UDI has been almost completely ignored.

Posted in Operating Systems

The sorry state of Unicode in C

Once upon a time…

Things were nice and simple; all characters were the same size, and that size was 8 bits. Things were easy to handle, for the most part. Much of C’s string handling is rooted in this era, with functions like isalpha inexorably tied to the English language.

Of course, since then the world has changed.

Things got ugly: Multi-byte character sets

Now, these aren’t all bad; many can be treated, for most purposes, like the one byte per character sets that preceded them. They maintain the invariant that no byte of a multi-byte character can be interpreted as a single byte character. This does of course make your encoding take more space, but makes legacy programs work.

Some encodings, however, are not so kind. A good example of this is Shift-JIS. Shift-JIS reuses the single byte character space in the second bytes of multi-byte characters. This means that, if you do strchr(s, 'e'), you might not actually find an e.
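The effect is easy to reproduce. The sketch below uses Python rather than C, with bytes.find standing in for strchr (an analogy, not the same function), and relies on Python’s built-in shift_jis codec. It searches the katakana block for a character whose second byte happens to be the ASCII code for e.

```python
# Find a katakana character whose Shift-JIS trail byte equals ASCII "e"
target = ord("e")
ch = next(
    c for c in map(chr, range(0x30A1, 0x30F7))
    if c.encode("shift_jis")[-1] == target
)

text = ch * 3                      # contains no letter "e" at all
raw = text.encode("shift_jis")

byte_hit = raw.find(b"e")          # byte-wise search, in the spirit of strchr
char_hit = text.find("e")          # character-aware search
```

The byte-wise search reports a match inside the first character, while the character-aware search correctly finds nothing.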

Now things are getting really messy, because C doesn’t provide standard functions for dealing with these character sets. The other problem is that dealing with the thousands of character sets in existence is rather complex.

Posted in Uncategorized