mikeash.com: just this guy, you know?

Posted at 2009-05-23 00:37
Tags: fridayqna objectivec
Friday Q&A 2009-05-22: Objective-C Class Loading and Initialization
by Mike Ash  

Welcome back to another cromulent Friday Q&A. After taking a few weeks off I intend to resume the regular schedule. We'll see how far that intention takes me, but I'm hopeful. This week I'm going to take Daniel Jalkut's suggestion to discuss class loading and initialization in Objective-C.

How classes actually get loaded into memory in Objective-C isn't something that you, the programmer, need to worry about most of the time. It's a bunch of complicated stuff that's handled by the runtime linker and is long done before your code ever starts to run.

For most classes, that's all you need to worry about. But some classes need to do more, and actually run some code in order to perform some kind of setup. A class may need to initialize a global table, cache values from user defaults, or do any number of other tasks.

The Objective-C runtime uses two methods to provide this functionality: +initialize and +load.

+load
+load is invoked as the class is actually loaded, if it implements the method. This happens very early on. If you implement +load in an application or in a framework that an application links to, +load will run before main(). If you implement +load in a loadable bundle, then it runs during the bundle loading process.

Using +load can be tricky because it runs so early. Some classes are necessarily loaded before others, so you can't be sure that your other classes have had +load invoked yet. Worse, C++ static initializers in your app (or framework or plugin) won't have run yet, so any code you call that relies on them will likely crash. The good news is that frameworks you link to are guaranteed to be fully loaded by this point, so it's safe to use framework classes. Your superclasses are also guaranteed to be fully loaded, so they are safe to use as well. Keep in mind that there's (usually) no autorelease pool present at loading time, so you'll need to wrap your code in one if you're calling into Objective-C stuff.
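
As a minimal sketch of those rules, here is a +load that only touches a linked framework (Foundation) and supplies its own autorelease pool. The class name MALoadExample and the gLoadMessage variable are made up for illustration, and the @autoreleasepool syntax is newer than this article; older code would create and drain an NSAutoreleasePool instead.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical class, used only to illustrate +load.
@interface MALoadExample : NSObject
+ (NSString *)loadMessage;
@end

// Plain C static: constant-initialized, so it is valid before +load runs.
static NSString *gLoadMessage = nil;

@implementation MALoadExample

+ (void)load
{
    // No autorelease pool is guaranteed to exist this early, so supply one.
    @autoreleasepool {
        // Foundation is a framework we link against, so it is already loaded.
        gLoadMessage = [[NSString alloc] initWithFormat:@"%@ loaded",
                        NSStringFromClass(self)];
    }
}

+ (NSString *)loadMessage
{
    return gLoadMessage;
}

@end
```

By the time main() starts, [MALoadExample loadMessage] should already return the string built in +load.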

An interesting feature of +load is that it's special-cased by the runtime to be invoked in categories which implement it as well as the main class. This means that if you implement +load in a class and in a category on that class, both will be called. This probably goes against everything you know about how categories work, but that's because +load is not a normal method. This feature means that +load is an excellent place to do evil things like method swizzling.
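
To make the category behavior concrete, here is a small sketch with a hypothetical class MAWidget and a hypothetical category Extras, each implementing +load. The counter and the helper function exist only for the demonstration.

```objectivec
#import <Foundation/Foundation.h>

// Counts +load invocations. A plain C static is safe to touch from +load.
static int gWidgetLoadCalls = 0;

// Hypothetical class and category, for illustration only.
@interface MAWidget : NSObject
@end

@implementation MAWidget
+ (void)load
{
    gWidgetLoadCalls++;   // the class's own +load
}
@end

@interface MAWidget (Extras)
@end

@implementation MAWidget (Extras)
+ (void)load
{
    gWidgetLoadCalls++;   // does NOT replace the class's +load; both run
}
@end

// Helper so other code can inspect the counter.
int MAWidgetLoadCallCount(void)
{
    return gWidgetLoadCalls;
}
```

With any other method, the category's implementation would replace the one in the class. Here both run, with the class's +load going first and the category's after it.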

+initialize
The +initialize method is invoked in a more sane environment and is usually a better place to put code than +load. +initialize is interesting because it's invoked lazily and may not be invoked at all. When a class first loads, +initialize is not called. When a message is sent to a class, the runtime first checks to see if +initialize has been called yet. If not, it calls it before proceeding with the message send. Conceptually, you can think of it as working like this:

    id objc_msgSend(id self, SEL _cmd, ...)
    {
        if(!self->class->initialized)
            [self->class initialize];
        ...send the message...
    }

It is of course considerably more complex than that due to thread safety and many other fun things, but that's the basic idea. The runtime sends +initialize once per class, the first time a message is sent to that class. Like +load, +initialize is always sent to all of a class's superclasses before it's sent to the class itself.
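
The laziness and the superclass-first ordering can be observed with a sketch like the following. MABase, MADerived, and the logging helpers are hypothetical names invented for this example.

```objectivec
#import <Foundation/Foundation.h>

// Records the order in which +initialize runs.
static NSMutableArray *gInitLog = nil;

static void MARecordInitialize(NSString *name)
{
    if (gInitLog == nil)
        gInitLog = [[NSMutableArray alloc] init];
    [gInitLog addObject:name];
}

// Hypothetical two-level hierarchy, for illustration only.
@interface MABase : NSObject
@end

@implementation MABase
+ (void)initialize
{
    if (self == [MABase class])
        MARecordInitialize(@"MABase");
}
@end

@interface MADerived : MABase
@end

@implementation MADerived
+ (void)initialize
{
    if (self == [MADerived class])
        MARecordInitialize(@"MADerived");
}
@end

// Returns what has been logged so far (nil if nothing was initialized yet).
NSArray *MAInitializeLog(void)
{
    return gInitLog;
}

// Sends the first message to MADerived, which triggers +initialize.
void MATouchDerived(void)
{
    (void)[MADerived description];
}
```

Before MATouchDerived() is called the log should be empty, since nothing has messaged either class. Afterward it should contain MABase followed by MADerived.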

This makes +initialize safer to use because it's usually called in a much more forgiving environment. Obviously the environment depends on exactly when that first message send happens, but it's virtually certain to at least be after your call to NSApplicationMain().

Because +initialize runs lazily, it's obviously not a good place to put code to register a class that otherwise wouldn't get used. For example, NSValueTransformer or NSURLProtocol subclasses can't use +initialize to register themselves with their superclasses, because you set up a chicken-and-egg situation: the class won't receive a message until it's registered, and it won't register itself until it receives a message.
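
Registration of this kind is one of the cases where +load fits. The sketch below registers a value transformer from +load; the class MAUppercaseTransformer and the name @"MAUppercase" are hypothetical. It relies on Foundation being a linked framework and wraps the work in an autorelease pool.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical transformer that uppercases strings, for illustration only.
@interface MAUppercaseTransformer : NSValueTransformer
@end

@implementation MAUppercaseTransformer

+ (void)load
{
    // Register eagerly. Nothing would ever message this class otherwise,
    // so +initialize would never get a chance to do the registration.
    @autoreleasepool {
        NSValueTransformer *transformer = [[self alloc] init];
        [NSValueTransformer setValueTransformer:transformer
                                        forName:@"MAUppercase"];
    }
}

+ (Class)transformedValueClass
{
    return [NSString class];
}

+ (BOOL)allowsReverseTransformation
{
    return NO;
}

- (id)transformedValue:(id)value
{
    return [value uppercaseString];
}

@end
```

Any code that later asks for [NSValueTransformer valueTransformerForName:@"MAUppercase"] should get the registered instance without ever having named the class.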

This makes it a good place to do virtually everything else as far as class loading goes, though. The fact that it runs in a much more forgiving environment means you can be much freer with the code you write, and the fact that it runs lazily means that you don't waste resources setting your class up until your class actually gets used.

There's one more trick to +initialize. In my pseudocode above I wrote that the runtime does [self->class initialize]. This implies that normal Objective-C dispatch semantics apply, and that if the class doesn't implement it, the superclass's +initialize will run instead. That's exactly what happens. Because of this, you should always write your +initialize method to look like this:

    + (void)initialize
    {
        if(self == [WhateverClass class])
        {
            ...perform initialization...
        }
    }
Without that extra check, your initializations could run twice if you ever have a subclass that doesn't implement its own +initialize method. This is not just a theoretical concern, even if you don't write any subclasses. Apple's Key-Value Observing creates dynamic subclasses which don't override +initialize.
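
Here is a small sketch of that double invocation. MAParent and MAChild are hypothetical classes, and the two counters exist only to show the difference between guarded and unguarded code.

```objectivec
#import <Foundation/Foundation.h>

static int gUnguardedCalls = 0;   // incremented on every +initialize call
static int gGuardedCalls = 0;     // incremented only behind the class check

// Hypothetical classes, for illustration only.
@interface MAParent : NSObject
@end

@implementation MAParent
+ (void)initialize
{
    gUnguardedCalls++;            // runs for MAParent and again for MAChild

    if (self == [MAParent class])
    {
        gGuardedCalls++;          // runs for MAParent only
    }
}
@end

@interface MAChild : MAParent
@end

@implementation MAChild
// Deliberately no +initialize here, so MAParent's is inherited.
@end

// Sends the first message to MAChild, initializing MAParent and then MAChild.
void MATriggerChildInitialize(void)
{
    (void)[MAChild description];
}

int MAUnguardedCallCount(void) { return gUnguardedCalls; }
int MAGuardedCallCount(void)   { return gGuardedCalls; }
```

After MATriggerChildInitialize() the unguarded counter should be 2 and the guarded counter 1, because the runtime sends +initialize to MAChild and normal dispatch lands in MAParent's implementation.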

Conclusion
Objective-C offers two ways to automatically run class-setup code. The +load method is guaranteed to run very early, as soon as a class is loaded, and is useful for code that must also run very early. This also makes it dangerous, as that early environment is not a very friendly place to run code.

The +initialize method is much nicer for most setup tasks, because it runs lazily and in a nice environment. You can do pretty much anything you want from here, as long as it doesn't need to happen until some external entity messages your class.

That wraps up Friday Q&A for this week. Come back next week for another exciting edition. As always, e-mail your suggestions or post them below. Without your valuable contribution of ideas, Friday Q&A can't operate, so send yours in today!


Comments:

Thanks for the article. I have been wondering about +initialize. I can't find a corresponding de-initialize. If I allocate objects, where do they get cleaned up?

For example:

-------

static NSDateFormatter *dateFormatter = nil;
static NSCalendar *gregorian = nil;

@implementation WhatEver

+ (void)initialize {
    if(self != [WhatEver class]) { return; }
    dateFormatter = [[NSDateFormatter alloc] init];
    [dateFormatter setDateFormat:@"MMM dd, yyyy"];
    gregorian = [[NSCalendar alloc] initWithCalendarIdentifier:NSGregorianCalendar];
}

@end
Steven: The memory is reclaimed by the application terminating ;) If you really need to do some cleanup (remove cache file or something), register an atexit() function, or subscribe to the applicationWillTerminate notification.

Mike: I love your Q&A's! Didn't know about +load, thanks :)

Back when I "coded" in C++, I wrote a StaticInitialization class that one could inherit from to get the equivalent of +load (it used a macro to define a static int being initialized by a function call). It had a function called require() that would throw an exception if the required class hadn't been +load'ed, which would be caught by that global initialize function and the throwing initializer would be put on a queue to be checked again once the require()'d class was +load'ed. Phew! It actually worked, too!
Hm, come to think of it, I even have the source here: http://tr.im/static_h http://tr.im/static_cpp
Thanks Joachim.
Will functions whose names start with initialize or load also get called during loading? I'm seeing strange behaviour where a function named initializeX got called on its own.

Thanks
There is no such prefix-matching behavior, no. I'd suggest spending some quality time with the debugger to figure out what's calling these.
I'm slightly confused here... + (void) initialize is a class method, so how can you access the self variable?
In a class method, self refers to the class.
Ah, I see. You learn something new every day someone teaches you something you didn't already know. Thanks man
FWIW, I've got a category on NSDate (NSDate+Helper) and at some point one of the contributors added code to use a shared, static NSCalendar object on the category, instantiated in the +load method, by calling [NSCalendar currentCalendar]. This worked fine on Debug configuration (iOS 5.1, Xcode 4.3.1, unsure if earlier versions are affected), but in Release configuration, +load must be getting called earlier than expected, perhaps before there is a currentCalendar, so it was always nil! Devil of a thing to debug, dug it up by sticking some NSLog statements in there to figure out what was going on. Certainly would appreciate any ideas on how it might be approached, for now I just removed the use of a static calendar.

More info here if anyone wants to look at the code:
https://github.com/billymeltdown/nsdate-helper/issues/7

Cheers!
The +load method can be dangerous like that. You really can't call out to any other Objective-C classes, because they may rely on +load too, and yours might run first.

My recommendation would be to use dispatch_once to lazily initialize your static variable. The additional overhead is insignificant and you completely avoid problems like this.
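
A rough sketch of that approach, assuming a category on NSDate. The category name MAHelperSketch and the method ma_sharedCalendar are made up here and are not taken from the NSDate+Helper project.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical category, for illustration only.
@interface NSDate (MAHelperSketch)
+ (NSCalendar *)ma_sharedCalendar;
@end

@implementation NSDate (MAHelperSketch)

+ (NSCalendar *)ma_sharedCalendar
{
    static NSCalendar *calendar = nil;
    static dispatch_once_t onceToken;

    // The block runs exactly once, on first use, long after +load time.
    dispatch_once(&onceToken, ^{
        // -copy returns an owned object with or without ARC.
        calendar = [[NSCalendar currentCalendar] copy];
    });
    return calendar;
}

@end
```

Callers use [NSDate ma_sharedCalendar] wherever they previously read the static variable, so the calendar is created on first use rather than at load time.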
Ah, very nice idea, didn't know that was out there. Thanks, Mike!
Any thought about using +initialize to create a singleton with something like this?

static WhateverClass *sharedInstance;

+ (void)initialize
{
   if(self == [WhateverClass class])
   {
       sharedInstance = [[self alloc] init];
   }
}

It would mean we can get rid of all these ugly dispatch_once! It looks too good to be true!
@Mike Ash:
"You really can't call out to any other Objective-C classes, because they may rely on +load too, and yours might run first. "

However, that seems to contradict what you said in your article above: "The good news is that frameworks you link to are guaranteed to be fully loaded by this point, so it's safe to use framework classes." Since Billy Gray's category is in his framework, and it links against Foundation framework, shouldn't that mean NSCalendar's currentCalendar stuff (which is in Foundation) should have already been "+load"ed?
@bob

That just points to NSCalendar not setting currentCalendar via +load.
Hey, a quick update (improvement?) to this with dispatch_once here might be good, per http://stackoverflow.com/a/11555869/297472

+ (void) initialize {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // <do stuff here once, Johnny-Dangerously style>
    });
}

Great work btw.
./scc
@scottcorscadden

There is no need to add dispatch_once(). The runtime already provides that protection, at least since iOS 6, and I presume the same goes for Mac OS X. You can put a breakpoint in +initialize and see it in the stack trace.

(I recall testing iOS 5 and found that it had acquired a pthread_mutex lock before calling +initialize)
@mikeash Do you have any recommendation for code that needs to run in a sane environment like +initialize, but needs not to run lazily?

My case: A class that needs to register itself to some other class from the same library, so in +load, I can't be sure that this other class is loaded yet, and I can't use +initialize because the class might never get messages then.
Good question! If it's reasonable, I'd put an explicit "setup" call somewhere convenient, like your app delegate. Implicit can be nice, but no need to overdo it. Absent that, I'd put a small amount of code in +load that then uses dispatch_async or CFRunLoopPerformBlock to schedule the real code for execution once the main runloop starts up. It should be safe to use these APIs from +load since they're lower level, and frameworks that you link to are guaranteed to be initialized first.
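
A minimal sketch of the dispatch_async variant, assuming a hypothetical class MAPlugin whose registerWithHost method stands in for the real registration code:

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical class, for illustration only.
@interface MAPlugin : NSObject
+ (void)registerWithHost;
+ (BOOL)isRegistered;
@end

static BOOL gPluginRegistered = NO;

@implementation MAPlugin

+ (void)load
{
    // Keep +load tiny: only queue the real work. The block runs later,
    // once the main queue is being serviced by the main run loop.
    dispatch_async(dispatch_get_main_queue(), ^{
        [MAPlugin registerWithHost];
    });
}

+ (void)registerWithHost
{
    // By now every class in the image has been loaded, so it is safe to
    // talk to other classes from the same library here.
    gPluginRegistered = YES;
}

+ (BOOL)isRegistered
{
    return gPluginRegistered;
}

@end
```

Nothing is registered until the main run loop gets a chance to run, so code that executes before that point (including other +load methods) can't rely on the registration having happened.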
