mikeash.com: just this guy, you know?

Posted at 2011-09-02 16:30
Friday Q&A 2011-09-02: Let's Build NSAutoreleasePool
by Mike Ash  

It's that time again: time for more programming craziness. Dallas Brown suggested that I talk about how NSAutoreleasePool works behind the scenes. I decided that the best way to do that would be to simply reimplement it, and that is what I'll discuss today.

High Level Overview
The ball gets rolling when the autorelease message is sent to an object. autorelease is implemented on NSObject, and just calls through to [NSAutoreleasePool addObject: self]. This is a class method, which then needs to track down the right instance to talk to.

NSAutoreleasePool instances are stored in a per-thread stack. When a new pool is created, it gets pushed onto the top of the stack. When a pool is destroyed, it's popped off the stack. When the NSAutoreleasePool class method needs to look up the current pool, it grabs the stack for the current thread and grabs the pool at the top.

Once the right pool is found, the addObject: instance method is used to add the object to the pool. When an object is added to the pool, it's just added to a list of objects kept by the pool.

When a pool is destroyed, it goes through this list of objects and sends release to each one. This is just about all there is to it. There is one additional small complication: if a pool is destroyed which is not at the top of the stack of pools, it also destroys the other pools which sit above it. In short, NSAutoreleasePool instances nest, and if you fail to destroy an inner one, the outer one will take care of it when it gets destroyed.
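
The overview above can be modeled in a few lines of plain C. This is only a sketch under stated assumptions, not how Cocoa implements anything: `Object`, `Pool`, `pool_push`, `object_autorelease`, and `pool_pop` are hypothetical names, a bare counter stands in for the retain count, and a single global stack stands in for the per-thread one. Nested destruction of pools that are not on top is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Objective-C object: just a retain count. */
typedef struct { int retain_count; } Object;

enum { MAX_OBJECTS = 16, MAX_POOLS = 8 };

/* A pool is nothing more than a list of objects awaiting a release. */
typedef struct {
    Object *objects[MAX_OBJECTS];
    size_t count;
} Pool;

/* One stack of pools; the real thing keeps one of these per thread. */
Pool *pool_stack[MAX_POOLS];
size_t pool_depth = 0;

/* Creating a pool pushes it onto the top of the stack. */
void pool_push(Pool *pool)
{
    assert(pool_depth < MAX_POOLS);
    pool->count = 0;
    pool_stack[pool_depth++] = pool;
}

/* Autoreleasing an object appends it to whichever pool is on top. */
Object *object_autorelease(Object *object)
{
    assert(pool_depth > 0);
    Pool *top = pool_stack[pool_depth - 1];
    assert(top->count < MAX_OBJECTS);
    top->objects[top->count++] = object;
    return object;
}

/* Destroying the top pool sends one release per recorded entry. */
void pool_pop(void)
{
    assert(pool_depth > 0);
    Pool *top = pool_stack[--pool_depth];
    for (size_t i = 0; i < top->count; i++)
        top->objects[i]->retain_count--;
    top->count = 0;
}
```

Pushing an outer and an inner pool, autoreleasing one object into each, and then popping twice releases the inner pool's object first and the outer pool's object second.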

Garbage Collection
NSAutoreleasePool exists under garbage collection and is even slightly functional. If you use the drain message rather than the release message, destroying a pool under garbage collection signals to the collector that this might be a good time to run a collection cycle. Aside from this, however, NSAutoreleasePool does nothing under garbage collection and isn't very interesting to consider there. For this article I will ignore garbage collection and concentrate on traditional memory management.

10.7 and ARC
10.7 got a major internal overhaul of autorelease pools, and Apple is in the middle of introducing a whole new automatic reference counting system. Although the details have changed a great deal, the concepts of how autorelease pools work have not, so everything here is still valid.

Interface
My version of an autorelease pool class will be called MAAutoreleasePool. If you'd like to look at the code in its entirety, it's available on GitHub.

This class has class and instance methods for addObject:, as well as a CFMutableArray to hold the list of autoreleased objects. CFMutableArray is used instead of NSMutableArray because the automatic retain and release behavior of NSMutableArray will interfere with what we're trying to do here. There's also the possibility that NSMutableArray would use autorelease internally, which would really mess things up. CFMutableArray can be configured not to do any memory management of its contents, which is what we're after here.

The interface then looks like this:

    @interface MAAutoreleasePool : NSObject
    {
        CFMutableArrayRef _objects;
    }

    + (void)addObject: (id)object;

    - (void)addObject: (id)object;

    @end

Additionally, to match the official implementation, there needs to be a helper method on NSObject. I called this ma_autorelease to distinguish it from the real thing and avoid name clashes:

    @interface NSObject (MAAutoreleasePool)

    - (id)ma_autorelease;

    @end

As I mentioned above, the implementation of this is just a cover on the class method:

    @implementation NSObject (MAAutoreleasePool)

    - (id)ma_autorelease
    {
        [MAAutoreleasePool addObject: self];
        return self;
    }

    @end

Pool Stack
Autorelease pools are kept in a stack. Each thread has its own stack of pools, which is used to determine which pool to put an object in when it gets autoreleased.

To encapsulate the management of the stack, I wrote a private method, +_threadPoolStack, which returns the current thread's stack of MAAutoreleasePool instances, creating it if necessary. Like each pool's list of contained objects, the pool stack is a CFMutableArray in order to prevent NSMutableArray's automatic memory management from screwing things up.

The simplest way to handle thread-local storage in Cocoa is with the threadDictionary method on NSThread. This method returns an NSMutableDictionary which is unique to the current thread, and automatically destroyed when the thread terminates.

The first thing this method does, then, is fetch that dictionary and declare a unique key to associate with the pool stack:

    + (CFMutableArrayRef)_threadPoolStack
    {
        NSMutableDictionary *threadDictionary = [[NSThread currentThread] threadDictionary];

        NSString *key = @"MAAutoreleasePool thread-local pool stack";

Next, it fetches the stack (as a CFMutableArray) from that dictionary:

        CFMutableArrayRef array = (CFMutableArrayRef)[threadDictionary objectForKey: key];

The first time this method runs on any given thread, the stack won't exist yet. In that case, it needs to be created and stored in the dictionary:

        if(!array)
        {
            array = CFArrayCreateMutable(NULL, 0, NULL);
            [threadDictionary setObject: (id)array forKey: key];
            CFRelease(array);
        }

Finally, with the array either retrieved or newly created, it's returned:

        return array;
    }
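
The same "fetch it, create it on first use" pattern can be sketched without Cocoa using C11 thread-local storage. This is an illustration under assumptions: `PoolStack` and `thread_pool_stack` are hypothetical names, and the stack is a fixed-size array rather than a CFMutableArray. Unlike threadDictionary, this sketch never frees the stack when the thread exits; a pthread key created with a destructor would be a closer analogue of that behavior.

```c
#include <stddef.h>
#include <stdlib.h>

enum { MAX_POOLS = 32 };

/* Hypothetical fixed-capacity stack of opaque pool pointers. */
typedef struct {
    void *pools[MAX_POOLS];
    size_t depth;
} PoolStack;

/* Each thread sees its own copy of this pointer, initially NULL. */
static _Thread_local PoolStack *thread_stack = NULL;

/* Return the calling thread's pool stack, creating it on first use. */
PoolStack *thread_pool_stack(void)
{
    if (thread_stack == NULL) {
        /* calloc zeroes the struct, so the new stack starts empty. */
        thread_stack = calloc(1, sizeof *thread_stack);
        if (thread_stack == NULL)
            abort(); /* out of memory: nothing sensible to do in a sketch */
    }
    return thread_stack;
}
```

Calling `thread_pool_stack()` repeatedly on one thread returns the same pointer each time, while a different thread would lazily get a stack of its own.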

Now that this is in place, the other methods can be implemented. The +addObject: method essentially just calls the above and then forwards the addObject: to the pool at the top of the stack. I added a bit of paranoia to make sure that the stack isn't empty, and to print an error if it is:

    + (void)addObject: (id)object
    {
        CFArrayRef stack = [self _threadPoolStack];
        CFIndex count = CFArrayGetCount(stack);
        if(count == 0)
        {
            fprintf(stderr, "Object of class %s autoreleased with no pool, leaking\n", class_getName(object_getClass(object)));
        }
        else
        {
            MAAutoreleasePool *pool = (id)CFArrayGetValueAtIndex(stack, count - 1);
            [pool addObject: object];
        }
    }

Instance Methods
The -init method for this class is really simple. Initialize the _objects array, add self to the top of the pool stack, and return:

    - (id)init
    {
        if((self = [super init]))
        {
            _objects = CFArrayCreateMutable(NULL, 0, NULL);
            CFArrayAppendValue([[self class] _threadPoolStack], self);
        }
        return self;
    }

The -addObject: method is even simpler: just add the given object to the end of _objects:

    - (void)addObject: (id)object
    {
        CFArrayAppendValue(_objects, object);
    }

The -dealloc method gets more interesting, due to the need to nest pools. It starts out easily enough: iterate through the _objects array and send a release to everything in it:

    - (void)dealloc
    {
        if(_objects)
        {
            for(id object in (id)_objects)
                [object release];
            CFRelease(_objects);
        }

Next, it removes self from the pool stack. It also needs to remove any pools which sit above it in the pool stack. Because this process iterates backwards and must be able to modify the stack while iterating, it uses a manual index-based loop:

        CFMutableArrayRef stack = [[self class] _threadPoolStack];
        CFIndex index = CFArrayGetCount(stack);
        while(index-- > 0)
        {
            MAAutoreleasePool *pool = (id)CFArrayGetValueAtIndex(stack, index);

If pool is self, then the loop is done. All that needs to be done is to remove the entry from the stack and then break out of the loop. All pools in the stack below the current one are left alone:

            if(pool == self)
            {
                CFArrayRemoveValueAtIndex(stack, index);
                break;
            }

If it's some other pool, then it needs to be destroyed. This is done by simply sending a release to it. Everything else that needs to happen will be done automatically:

            else
            {
                [pool release];
            }
        }

This may be a little hard to understand at first, and it took me a little while to realize how this code needed to be written. When the [pool release] line is hit, pool is necessarily at the top of the stack. When it's released, it will be deallocated (it's not legal to retain autorelease pools). When it gets deallocated, it will call into its own -dealloc, which enters this same loop again. That inner loop immediately hits the pool == self condition, removing the pool from the stack and exiting. Control then returns to the outer loop, which moves on to the next pool down. Thus, all pools on the stack above the one currently being destroyed also get destroyed and removed from the stack.

The loop is now done, and all that's left to do is call through to super:

        [super dealloc];
    }

The class is now complete!
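
To see the re-entrant behavior of the -dealloc loop in isolation, here is a small plain-C model of it. It is a sketch, not the code above: `Pool`, `pool_create`, and `pool_destroy` are hypothetical names, the pools hold no objects, and a `destroyed` flag stands in for deallocation. Because every pool above self has already removed itself by the time self is found, shrinking the stack to `index` has the same effect as removing the entry at `index`.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_POOLS = 8 };

/* Hypothetical pool: only tracks whether it has been destroyed. */
typedef struct { int destroyed; } Pool;

Pool *pool_stack[MAX_POOLS];
size_t pool_depth = 0;

/* Creating a pool pushes it onto the top of the stack. */
void pool_create(Pool *pool)
{
    assert(pool_depth < MAX_POOLS);
    pool->destroyed = 0;
    pool_stack[pool_depth++] = pool;
}

/* Destroy self along with every pool that sits above it. */
void pool_destroy(Pool *self)
{
    assert(!self->destroyed);
    self->destroyed = 1;

    /* Walk down from the top of the stack, as -dealloc does. */
    size_t index = pool_depth;
    while (index-- > 0) {
        Pool *pool = pool_stack[index];
        if (pool == self) {
            /* Found ourselves: drop our entry and stop. */
            pool_depth = index;
            break;
        }
        /* A pool above us is still alive. Destroying it re-enters this
           function, where it finds itself on top and removes itself. */
        pool_destroy(pool);
    }
}
```

With three pools a, b, and c created in that order, destroying b also destroys c and leaves only a on the stack.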

Lessons
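There are some good lessons to be learned from this exercise. Most important is that NSAutoreleasePool is a pretty straightforward class without much in the way of hidden gotchas. There's nothing complex going on behind the scenes. People new to Cocoa memory management often imagine that autorelease is much more complex than it really is, and ask questions like "How can I tell if an object has been autoreleased?" or "What happens if I autorelease an object twice?" From the implementation above, we can now see the answers: the object itself keeps no record of having been autoreleased, so there is nothing to check, and autoreleasing an object twice simply adds it to the pool's list twice, so it receives two release messages when the pool is destroyed.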
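
The "autorelease twice" question is easy to check against a plain-C model of a single pool. This is a sketch with hypothetical names (`Object`, `Pool`, `pool_add`, `pool_drain`), using a bare counter in place of a real retain count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object. Note there is no "autoreleased"
   flag: the object keeps no record of the pools it sits in. */
typedef struct { int retain_count; } Object;

enum { MAX_ENTRIES = 16 };

typedef struct {
    Object *entries[MAX_ENTRIES];
    size_t count;
} Pool;

/* Autorelease: record one pending release. The object is untouched. */
void pool_add(Pool *pool, Object *object)
{
    assert(pool->count < MAX_ENTRIES);
    pool->entries[pool->count++] = object;
}

/* Destroy: send one release per entry and report how many were sent. */
size_t pool_drain(Pool *pool)
{
    size_t sent = pool->count;
    for (size_t i = 0; i < pool->count; i++)
        pool->entries[i]->retain_count--;
    pool->count = 0;
    return sent;
}
```

Adding the same object twice leaves its count unchanged until the pool is drained, at which point it drops by two. That is fine if the object was retained twice, and an over-release if it was not.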

Conclusion
That wraps up today's exploration of Cocoa internals. Now you know approximately how NSAutoreleasePool gets its job done. The implementation specifics vary (especially on Lion), but the basic ideas are the same. By knowing how memory management internals work, you can write better and less error-prone code.

Unless you're completely new to this blog, you probably already know that Friday Q&A is driven by reader submissions. On that note, if you have a topic that you'd like to see covered, please send it in!


Comments:

Thank you for this great explanation. As a programmer trying to understand Cocoa, this explanation is really all you need to know. It is how NSAutoreleasePool conceptually works.

Of course, in reality, Apple has optimized the hell out of this class. AFAIR, it doesn't really create a new instance of the array. There is one shared CFMutableArrayRef per thread, and it just puts a marker on that when a new autorelease pool starts, and when you release a pool, releases down to that marker.

I think it even plays tricks to avoid creating a new object for every pool, but I'm not sure anymore. After all, when you release an outer pool, all the inner pools get released as well, and it couldn't do that by returning the same singleton per thread from alloc and init each time.
Lion details – not important for conceptual understanding, but fun:

In Lion, autorelease pools are implemented in the ObjC runtime (in objc-arr.mm). A single stack is maintained per thread, and manipulated through “pool tokens”. A pool token is simply a pointer to the top of an autorelease pool. (A NULL entry is kept as a boundary sentinel for verification.)

The @autoreleasepool keyword generates calls to objc_autoreleasePoolPush() and objc_autoreleasePoolPop(), which use pool tokens directly. I presume an NSAutoreleasePool is now a simple wrapper for a pool token.

Another interesting optimization for ARC is runtime elision of autoreleases. An ARC function which semantically returns an autoreleased value can do so using objc_autoreleaseReturnValue(). If the calling function immediately calls objc_retainAutoreleasedReturnValue(id obj) – checked by looking for a specific instruction sequence – the object isn’t autoreleased, but instead stashed in a per-thread global, which causes objc_retainAutoreleasedReturnValue(id obj) to not retain.
Does this work under GCD queues: [[NSThread currentThread] threadDictionary]?

I know it's not the point of your post, but if that won't work on GCD queues then the semantics of it are quite different from NSAutoreleasePool.
Yes, NSThread's threadDictionary works fine on GCD queues. Remember that, from a plain multithreading standpoint, GCD queues aren't special in any way. They're just regular pthreads running regular code. Anything that works on a pthread works on a GCD queue, because you are on a pthread. The only trick is that you need to leave the thread as you found it, because code you don't own will run on it after you're done.

All pthreads are implicitly NSThreads as well. NSThread will create an instance on demand the first time you ask for one on a thread it doesn't know about.
Thanks Mike! I know most of that, but wasn't sure that the threadDictionary would be valid for threads not started by NSThread. This will be super handy to know when I need thread local storage in the future!
This is great! In this implementation, I think it might be more appropriate to release inner pools before releasing the objects in _objects.

It would be legal for an object 'owned' by an inner pool to send a message (or retain via perform... side effects) to a parent object 'owned' by an outer pool (that it might only have a weak reference to).
Well... thanks Mike, my favourite interview question, out the window.

This has always been a great "I never actually thought about that" / "separate the men from the boys" problem, which requires even the most experienced devs to think a little.
Jason: That's an interesting point. I think that, strictly speaking, ordering of releases is not guaranteed to have any relation to ordering of autoreleases, so this code is correct. Mostly I would expect people not to rely on the ordering, however across pool boundaries is probably a likely spot for people to do so, whether accidentally or on purpose. If one were doing it for real, it probably would be a good idea to make your change.

Terrence: Hey, at least you'll still know if they like to broaden their horizons and are capable of remembering stuff. Some advanced warning: if you liked to quiz people on how they'd implement reference counting, you may want to get that out of your system soon.
What motivated the use of a CFMutableArray over an NSPointerArray?
I shy away from the Leopard collection classes in most cases because of their general unavailability on iOS. It doesn't really matter in this case, of course, but it's a habit.
You state that nothing is changed with ARC; however, can you confirm if this is also true under iOS 5? We are subclassing NSAutoreleasePool and overriding the functions to use C arrays as the collection class. This has a MASSIVE improvement in performance, and worked well under iOS 4.2. With iOS 5 the overridden addObject is no longer being called so our 'enhanced' memory management is now broken. A change in the method signature perhaps?
Conceptually, as far as "autorelease pools are arrays of stuff that get released later", everything is the same. But the details have changed a great deal. In particular, there's not really an NSAutoreleasePool class anymore. The class still exists, but is now a thin wrapper around deeper runtime functionality. That functionality is, I think, directly accessed by the -autorelease method. In short, there's just enough left that you can still use NSAutoreleasePool objects in your code (when not using ARC), but you can't subclass or override things anymore and expect it to work.

I'd suggest ditching the custom class, at least on iOS 5. The new autorelease pool implementation is extremely fast, probably very difficult to improve upon, and I'd wager faster than what you have.

In particular, it uses a single array for the entire thread. Individual pools are just pointers into that array, and when popping a pool, that pointer is just used as a fence to know when to stop. Page-granular allocation and clever recycling means that no copying is needed when expanding the array (which doesn't need to be contiguous). For details, check out AutoreleasePoolPage and related code here: http://opensource.apple.com/source/objc4/objc4-493.9/runtime/objc-arr.mm
Thanks very much for your response Mike. I hadn't been able to find any information on what the change was under the covers and knowing that they're no longer storing everything in a Dictionary makes me very happy!! We overrode the autorelease function on a few of our base class objects to put themselves directly on our autorelease pools and as you said found it to be slower than using IOS5 autorelease pools. Looks like the NSAutoreleasePools are definitely the way to go now.
