mikeash.com: just this guy, you know?

Posted at 2009-03-13 14:13
Friday Q&A 2009-03-13: Intro to the Objective-C Runtime
by Mike Ash  

Welcome back to another Friday Q&A, on another Friday the 13th. This week I'm going to take Oliver Mooney's suggestion and talk about the Objective-C runtime, how it works, and what it can do for you.

Many Cocoa programmers are only vaguely aware of the Objective-C runtime. They know it's there (and some don't even know this!), that it's important, and you can't run Objective-C without it, but that's about where it stops.

Today I want to run through exactly how Objective-C works at the runtime level and what kinds of things you can do with it.

(Note: I'll be talking only about Apple's runtime on 10.5 and later. The runtime on 10.4 and earlier is missing many APIs, instead forcing direct structure access, and the runtimes for GNU and Cocotron are different beasts entirely.)

Objects
In Objective-C we work with objects all the time, but what is an object? Well, let's take a look and construct something that will tell us about them.

First, we know that objects are referred to using pointers, like NSObject *. And we know that we create them using the +alloc method. The documentation for that just says that it calls +allocWithZone:. Following the chain of documentation a bit further, we discover NSDefaultMallocZone and see that they're just allocated using malloc. Easy!

But what do they look like when they're allocated? Let's find out:

    #import <Foundation/Foundation.h>
    #import <malloc/malloc.h> // declares malloc_size
    
    @interface A : NSObject { @public int a; } @end
    @implementation A @end
    @interface B : A { @public int b; } @end
    @implementation B @end
    @interface C : B { @public int c; } @end
    @implementation C @end
    
    int main(int argc, char **argv)
    {
        [NSAutoreleasePool new];
        
        C *obj = [[C alloc] init];
        obj->a = 0xaaaaaaaa;
        obj->b = 0xbbbbbbbb;
        obj->c = 0xcccccccc;
        
        NSData *objData = [NSData dataWithBytes:obj length:malloc_size(obj)];
        NSLog(@"Object contains %@", objData);
        
        return 0;
    }
We construct a class hierarchy that just has some instance variables, then we put obvious values into each ivar. Then we extract the data in nice printable form using malloc_size to get the right length, and use NSData to print a nice hex representation. Here's what we get:
    2009-01-27 15:58:04.904 a.out[22090:10b] Object contains <20300000 aaaaaaaa bbbbbbbb cccccccc>
We can see here that the object's ivars just get laid out sequentially in memory. First you have A's ivar, then B's, then C's. Easy!

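But what's this 20300000 thing at the beginning? Well, it comes before A's ivar, so it must be NSObject's. (This output is from a 32-bit process, so that first field is four bytes long; in a 64-bit process it would be eight.) Let's look at NSObject's definition: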

    /***********	Base class		***********/
    
    @interface NSObject  {
        Class	isa;
    }
Sure enough, there's another ivar. But what's this Class business? If we tell Xcode to take us to the definition we find ourselves in /usr/include/objc/objc.h which contains:
    typedef struct objc_class *Class;
And following it further we get to /usr/include/objc/runtime.h which contains:
    struct objc_class {
        Class isa;
    
    #if !__OBJC2__
        Class super_class                                        OBJC2_UNAVAILABLE;
        const char *name                                         OBJC2_UNAVAILABLE;
        long version                                             OBJC2_UNAVAILABLE;
        long info                                                OBJC2_UNAVAILABLE;
        long instance_size                                       OBJC2_UNAVAILABLE;
        struct objc_ivar_list *ivars                             OBJC2_UNAVAILABLE;
        struct objc_method_list **methodLists                    OBJC2_UNAVAILABLE;
        struct objc_cache *cache                                 OBJC2_UNAVAILABLE;
        struct objc_protocol_list *protocols                     OBJC2_UNAVAILABLE;
    #endif
    
    } OBJC2_UNAVAILABLE;
So a Class is a pointer to a structure which... starts with another Class.

Let's look at another root class, NSProxy:

    @interface NSProxy  {
        Class	isa;
    }
It's there too. Let's look in one more place, the definition of id, the Objective-C type for "any object":
    typedef struct objc_object {
        Class isa;
    } *id;
There it is again. Clearly every single Objective-C object must start with Class isa, even class objects. But what is it?

As the name and type imply, the isa ivar indicates what class a particular object is. Every Objective-C object must begin with an isa pointer, otherwise the runtime won't know how to work with it. Everything about a particular object's type is wrapped up in that one little pointer. The remainder of an object is basically just a big blob and as far as the runtime is concerned, it is irrelevant. It's up to the individual classes to give that blob meaning.
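To make that concrete, here is a minimal sketch that follows isa from an instance to its class, and then from the class object to its own class (which is called a metaclass). It uses object_getClass, which returns the class that an object's isa refers to. The helper name IsaChainLooksRight is made up for illustration, and the sketch assumes you hand it an ordinary instance rather than a class object.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <stdio.h>

// Hypothetical helper: prints where isa leads from an instance and from its
// class, and returns YES if the second hop lands on a metaclass.
static BOOL IsaChainLooksRight(id obj)
{
    Class cls = object_getClass(obj);   // what the instance's isa refers to
    Class meta = object_getClass(cls);  // what the class object's isa refers to

    printf("instance -> %s (metaclass: %s)\n",
           class_getName(cls), class_isMetaClass(cls) ? "yes" : "no");
    printf("class    -> %s (metaclass: %s)\n",
           class_getName(meta), class_isMetaClass(meta) ? "yes" : "no");

    return !class_isMetaClass(cls) && class_isMetaClass(meta);
}
```

Calling it with something like [[NSObject alloc] init] should return YES. Only class_isMetaClass tells the two hops apart, since a class object is itself just another object with an isa.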

Classes
What exactly do classes contain, then? The "unavailable" structure members give a good clue. (They're there for compatibility with the pre-Leopard runtime, and you shouldn't use them if you're targeting Leopard, but they still tell us what kind of information is there.) First comes the isa, which allows a class to act like an object as well. There's a pointer to the superclass, giving the proper class hierarchy. Some other basic information about the class follows. At the end is the really interesting stuff. There's a list of instance variables, a list of methods, and a list of protocols. All of this stuff is accessible at runtime, and can be modified at runtime too.
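As a rough illustration of reading that information through the runtime functions rather than the struct, here is a sketch that prints a class's superclass, instance size, ivars, and instance methods. The Sample class and the DumpClass and OrUnknown helpers are made up for this example; the runtime calls themselves (class_copyIvarList, class_copyMethodList, and friends) come from runtime.h.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <stdio.h>
#import <stdlib.h>

// Hypothetical sample class, defined only so there is something to inspect.
@interface Sample : NSObject { int value; }
- (int)value;
@end
@implementation Sample
- (int)value { return value; }
@end

// Some runtime getters can return NULL; printf needs a real string.
static const char *OrUnknown(const char *s)
{
    return s ? s : "?";
}

// Prints the ivars and instance methods declared directly on cls (inherited
// ones are not included) and returns how many ivars it found.
static unsigned int DumpClass(Class cls)
{
    unsigned int i, ivarCount = 0, methodCount = 0;
    Class superclass = class_getSuperclass(cls);

    printf("%s : %s, instance size %zu\n", class_getName(cls),
           superclass ? class_getName(superclass) : "(none)",
           class_getInstanceSize(cls));

    Ivar *ivars = class_copyIvarList(cls, &ivarCount);
    for (i = 0; i < ivarCount; i++)
        printf("  ivar %s, offset %td, type %s\n",
               OrUnknown(ivar_getName(ivars[i])),
               ivar_getOffset(ivars[i]),
               OrUnknown(ivar_getTypeEncoding(ivars[i])));
    free(ivars); // the copy functions return malloc'd arrays that we own

    Method *methods = class_copyMethodList(cls, &methodCount);
    for (i = 0; i < methodCount; i++)
        printf("  method %s, types %s\n",
               sel_getName(method_getName(methods[i])),
               OrUnknown(method_getTypeEncoding(methods[i])));
    free(methods);

    return ivarCount;
}
```

DumpClass([Sample class]) should list the one ivar and the one method declared above. To see inherited members as well, you would call it again on class_getSuperclass(cls) until that returns Nil.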

I skipped right over the cache member because it's not really useful for runtime manipulation, but it's an interesting exposure of an implementation detail. Every time you send a message ([foo bar]) the runtime has to look up the actual code to invoke by rummaging through the list of methods in the target object's class and, if it isn't found there, in each superclass in turn. However, methods are stored in big linear lists by default, so this is really slow. The cache is just a hash table mapping selectors to code. The first time you send a message you'll get a slow, time-consuming lookup, but the result is put in the hash table. Subsequent calls will find the entry in the hash table, making the process go much faster.
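You can ask the runtime to do that selector-to-code lookup for you. The sketch below, with a made-up helper name LengthViaIMP, uses class_getMethodImplementation to fetch the function pointer (the IMP) that sending -length would end up calling, then calls it directly. It doesn't expose the cache itself, which stays an implementation detail; it only shows that a method boils down to a C function found by selector.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Hypothetical helper: looks up the code behind -length for this string's
// class and calls it as a plain C function.
static NSUInteger LengthViaIMP(NSString *string)
{
    SEL selector = @selector(length);
    IMP imp = class_getMethodImplementation(object_getClass(string), selector);

    // An IMP is a generic function pointer; cast it to the method's real
    // signature (self, _cmd, then any arguments) before calling it.
    NSUInteger (*lengthFunction)(id, SEL) = (NSUInteger (*)(id, SEL))imp;
    return lengthFunction(string, selector);
}
```

LengthViaIMP(@"hello") should give the same answer as [@"hello" length].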

Looking at the rest of runtime.h you'll see a lot of functions for accessing and manipulating these properties. Each function is prefixed with what it operates on. General runtime functions start with objc_, functions that operate on a class start with class_, and so forth. For example, you can call class_getInstanceMethod to get information about a particular method, like the argument/return types. Or you can call class_addMethod to add a new method to an existing class at runtime. You can even create a whole new class at runtime by using objc_allocateClassPair.
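Here is a minimal sketch of the last two of those, building a brand new class at runtime and giving it a method. The names AnswerIMP, MakeAnswerClass, CallAnswer, and the -answer selector are all invented for the example. It assumes the class name you pass in isn't already taken.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// The code for our new method. Every Objective-C method is a C function
// whose first two arguments are self and _cmd.
static int AnswerIMP(id self, SEL _cmd)
{
    return 42;
}

// Creates and registers a new NSObject subclass with one method, -answer.
// Returns Nil if the class could not be created (e.g. the name is in use).
static Class MakeAnswerClass(const char *name)
{
    Class cls = objc_allocateClassPair([NSObject class], name, 0);
    if (cls == Nil)
        return Nil;

    // "i@:" is the type encoding: returns int, takes id (self) and SEL (_cmd).
    class_addMethod(cls, sel_registerName("answer"), (IMP)AnswerIMP, "i@:");

    // The class can't be instantiated until it has been registered.
    objc_registerClassPair(cls);
    return cls;
}

// Calls -answer through the runtime, since the compiler has never seen a
// declaration for it.
static int CallAnswer(id obj)
{
    SEL selector = sel_registerName("answer");
    int (*fn)(id, SEL) = (int (*)(id, SEL))
        class_getMethodImplementation(object_getClass(obj), selector);
    return fn(obj, selector);
}
```

After Class cls = MakeAnswerClass("RuntimeMadeAnswer"), an instance created with [[cls alloc] init] behaves like any other object, and CallAnswer on it should return 42.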

Practical Applications
There are tons of useful things that can be done with this kind of runtime meta-information, but here are some ideas.

  1. Automatic ivar/method searches. Apple's Key-Value Coding does this kind of thing already: you give it a name, and it looks up a method or ivar based on that name and does some stuff with it. You can do that kind of thing yourself, in case you need to look up an ivar based on a name or something of the sort.
  2. Automatically register/invoke subclasses. Using objc_getClassList you can get a list of all classes currently known to the runtime, and by tracing out the class hierarchy, you can identify which ones subclass a given class. This can let you write subclasses to handle specialized data formats or other such situations and let the superclass look them up without having to tediously register every subclass manually.
  3. Automatically call a method on every class. This can be useful for custom unit testing frameworks and the like. Similar to #2, but look for a method being implemented rather than a particular class hierarchy.
  4. Override methods at runtime. The runtime provides a complete set of tools for re-pointing methods to custom implementations so that you can change what classes do without touching their source code.
  5. Automatically deallocate synthesized properties. The @synthesize keyword is handy for making the compiler generate setters/getters but it still forces you to write cleanup code in -dealloc. By reading meta-information about the class's properties, you can write code that will go through and clean up all synthesized properties automatically instead of having to write code for each case.
  6. Bridging. By dynamically generating classes at runtime, and by looking up the necessary properties on demand, you can create a bridge between Objective-C and another (sufficiently dynamic) language.
  7. Much more. Don't feel limited to the above, come up with your own ideas!
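
As a starting point for idea #2, here is a rough sketch of finding every registered subclass of a given class with objc_getClassList. The helper names ClassDescendsFrom and PrintSubclassesOf are made up. It walks the hierarchy with class_getSuperclass rather than sending messages, on the assumption that not every class in the list can safely be messaged. The __unsafe_unretained in the cast assumes a recent compiler; it is only there so the code also builds under ARC.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <stdio.h>
#import <stdlib.h>

// Returns YES if cls is a strict descendant of ancestor, found by walking
// superclass pointers without messaging either class.
static BOOL ClassDescendsFrom(Class cls, Class ancestor)
{
    Class current;
    for (current = class_getSuperclass(cls); current != Nil;
         current = class_getSuperclass(current))
        if (current == ancestor)
            return YES;
    return NO;
}

// Prints the name of every registered subclass of ancestor and returns
// how many were found.
static unsigned int PrintSubclassesOf(Class ancestor)
{
    unsigned int found = 0;
    int i, count;
    int capacity = objc_getClassList(NULL, 0); // ask how many classes exist
    if (capacity <= 0)
        return 0;

    Class *classes = (__unsafe_unretained Class *)malloc(sizeof(Class) * capacity);
    if (classes == NULL)
        return 0;

    // The return value is the total number of registered classes, which may
    // have grown since the first call; only `capacity` entries were filled in.
    count = objc_getClassList(classes, capacity);
    if (count > capacity)
        count = capacity;

    for (i = 0; i < count; i++) {
        if (ClassDescendsFrom(classes[i], ancestor)) {
            printf("%s\n", class_getName(classes[i]));
            found++;
        }
    }

    free(classes);
    return found;
}
```

A superclass could run something like this once, say the first time it needs a handler, and build its own registry from the results instead of printing them.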

Wrapping Up
Objective-C is a powerful language and the comprehensive runtime API is an extremely useful part of it. While it may be a bit ugly groveling around in all that C code, it's really not that difficult to work with, and it's well worth the power it provides.

That's it for this week's Friday Q&A. Please send in your suggestions, either by posting them below or by e-mail (tell me if you don't want me to use your name). Friday Q&A runs on your suggestions so please write in!

Have a favorite use of the ObjC runtime? Something you dislike about it? Have a tip to share? Post it all below.


Comments:

Nice post Mike! I would be lax in my duties if I did not mention that while the Cocotron runtime is a different implementation, the goal is to be compatible with Apple's API. The function-based interface of ObjC 2.0 makes this all nicer and easier, as we can be a little different under the hood if needed.

If anyone is interested, I have an implementation of "#5, Automatically deallocating synthesized properties."

http://vgable.com/blog/2008/12/20/automatically-freeing-every-property/

In retrospect I was probably overly conservative with pointers, but I've yet to use pointer-properties.
Thanks for the note about Cocotron. Your web site sure has a lot of stuff on it.
So where is version 2 of the runtime storing all the data about the class, if not in the objc_class struct?
The 2.0 runtime is still storing everything in the objc_class struct. This is a necessity in 32-bit mode, as old code which directly manipulates these structs still has to work, so all of the data still has to be in the same place. In 64-bit mode it could be moved around, but I'll bet that it's still there. They've simply removed it from the definition to make all of this stuff "private".
It seems to me that the #if !__OBJC2__ ... #endif would erase all those declarations in the objc_class struct if compiled with the version 2 runtime. Am I missing something?
What you're missing is that they're just declarations. If you remove them, the data is still there, the compiler just won't let you access it by name anymore. The declarations in the public headers are not related to the actual structures created and used by the runtime in its implementation.
"The declarations in the public headers are not related to the actual structures created and used by the runtime in its implementation."

Doesn't this mean that the struct available in runtime.h is a different size than whatever the runtime is using internally? Won't this break pointer arithmetic for struct objc_class* types? Is that just something that nothing outside of the runtime itself should ever need to do?
Yes to both. But it's not an issue, because you'll never have an array of struct objc_class. The semantics are like ObjC object pointers, and should be treated as opaque references rather than something you can do arithmetic with. You can't do pointer arithmetic on NSObject * either, and it doesn't matter for the same reasons.
Cool, thanks!
Mike, thanks for writing this article! It answered a lot of questions I had. Apple seem to have done a fair bit of work getting the runtime to be more accessible to other languages. The provision of functions to interface with runtime structures, and BridgeSupport coupled with libffi, makes interfacing much easier than in previous OS X incarnations.
nice post!!!!
specially about bridging!!
wow. explains a lot. thanks for sharing, looking forward for more articles from you.
