Posted at 2009-09-18 17:23
Tags: fridayqna gcd performance
Friday Q&A 2009-09-18: Intro to Grand Central Dispatch, Part IV: Odds and Ends
by Mike Ash  

It's that time of the week again. Over the past three weeks I've introduced you to the major pieces of Grand Central Dispatch, an exciting new API for parallel processing and event handling in Snow Leopard. The first week I covered basic concepts and dispatch queues. The second week I discussed how to use dispatch queues for parallel processing on multi-core computers. The third week I covered GCD's event handling system. This week I'm going to cover various odds and ends which I didn't get to before: dispatch queue suspension and targeting, semaphores, and one-time initialization.

As with the previous weeks, I will assume that you've read all of the previous articles before reading this one, and are thus familiar with all aspects of GCD discussed up to this point. If you have not already read those articles, please do so now.

Dispatch Queue Suspension
Dispatch queues can be suspended and resumed at will. To suspend, use the dispatch_suspend function, and to resume, use dispatch_resume. These work pretty much the way you'd expect them to. Note that they also work on dispatch sources.

One caveat to dispatch queue suspension is that suspension is block-granular. In other words, suspending a queue does not suspend the currently executing block. Instead what happens is that the block is allowed to run to completion, and then no further blocks are allowed to run until the queue (or source) is resumed.

One final note, straight from the man page: you must resume a queue/source before destroying it if you have previously suspended it.
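
To make the mechanics concrete, here's a minimal sketch; the queue label and block contents are made up for illustration:

    dispatch_queue_t queue = dispatch_queue_create("com.example.suspendable", NULL);
    
    dispatch_suspend(queue);
    
    // This block is queued, but won't start until the queue is resumed.
    dispatch_async(queue, ^{
        NSLog(@"finally running");
    });
    
    dispatch_resume(queue);
    
    // Suspension nests: balance every suspend with a resume, and make sure
    // the queue isn't suspended when you release your last reference.
    dispatch_release(queue);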

Dispatch Queue Targeting
All custom dispatch queues have the concept of a target queue. In essence, a custom dispatch queue doesn't actually execute any work, but passes work to its target queue for execution. Normally the target queue of a custom queue is the default-priority global queue.

The target queue of a custom queue can be set by using the dispatch_set_target_queue function. You can pass it any other dispatch queue, even another custom queue, so long as you never create a cycle. This function can be used to set the priority of a custom queue by simply setting its target queue to a different global queue. If you set your custom queue's target to be the low priority global queue, all work on your custom queue will execute with low priority, and the same with the high priority global queue.
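
For example, here's a sketch of pushing a custom queue's work down to low priority (the label is made up):

    dispatch_queue_t queue = dispatch_queue_create("com.example.backgroundwork", NULL);
    
    // Everything submitted to 'queue' will now execute at low priority.
    dispatch_set_target_queue(queue,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0));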

Another potential use of this is to target a custom queue to the main queue. This will cause all blocks submitted to that custom queue to run on the main thread. The advantage of doing this instead of simply using the main queue directly is that your custom queue can be independently suspended and resumed, and could potentially be retargeted onto a global queue afterwards, although you'll have to be careful to ensure that all blocks which run after that point can tolerate being away from the main thread!
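
A quick sketch of that arrangement, with hypothetical names:

    dispatch_queue_t uiQueue = dispatch_queue_create("com.example.uiqueue", NULL);
    dispatch_set_target_queue(uiQueue, dispatch_get_main_queue());
    
    // Runs on the main thread, but can be held back independently
    // with dispatch_suspend(uiQueue).
    dispatch_async(uiQueue, ^{
        [statusField setStringValue: @"Done!"]; // statusField is hypothetical
    });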

Yet another potential use is to target custom queues at other custom queues. This forces multiple queues to be serialized with respect to each other, and essentially creates a group of queues which can all be suspended and resumed together by suspending/resuming the queue they target. To see how this could be used, imagine an application which scans a set of directories and loads the files within. To avoid disk contention, you want to make sure that only one file-loading task is active for each physical disk, while still reading files from physically separate disks simultaneously. To make this happen, all you have to do is build a dispatch queue structure which mirrors your disk structure.

First, you would scan the system and find the disks, and create a custom dispatch queue for each one. Then you would scan the filesystems and create a custom queue for each one of those as well, pointing their target queues at the queue of the appropriate disk. Finally, each directory scanner can have its own queue as well, pointing the target at the filesystem on which the directory resides. The directory scanners can enumerate their directories and submit a block for each file directly to their own queue. Due to how the system is set up, this will inherently serialize all access to each physical disk while allowing parallel access to separate disks, all without any manual intervention beyond the initial queue setup process.
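
Here's a sketch of that setup, flattened to a single disk, filesystem, and scanner for brevity (all of the names are made up):

    // One queue per physical disk: the ultimate serialization point.
    dispatch_queue_t diskQueue = dispatch_queue_create("com.example.disk0", NULL);
    
    // One queue per filesystem, funneling into its disk's queue.
    dispatch_queue_t fsQueue = dispatch_queue_create("com.example.fs.root", NULL);
    dispatch_set_target_queue(fsQueue, diskQueue);
    
    // One queue per directory scanner, funneling into its filesystem's queue.
    dispatch_queue_t scanQueue = dispatch_queue_create("com.example.scanner", NULL);
    dispatch_set_target_queue(scanQueue, fsQueue);
    
    // Every file load ultimately lands on the disk's queue, so only one
    // block touches that disk at a time.
    for(NSString *path in filePaths) // filePaths is hypothetical
        dispatch_async(scanQueue, ^{
            NSData *data = [NSData dataWithContentsOfFile: path];
            // ...hand the data off for processing...
        });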

Semaphores
Dispatch semaphores work just like any other semaphore; if you've used semaphores in other multithreading systems, these will look entirely familiar.

A semaphore is basically an integer which has an initial count and supports two operations: signal and wait. When a semaphore is signaled, the count is incremented. When a thread waits on a semaphore, it will block, if necessary, until the count is greater than 0, and then decrement the count.

Dispatch semaphores are created using dispatch_semaphore_create, signaled using dispatch_semaphore_signal, and waited on using dispatch_semaphore_wait. The man page for these functions shows two good examples of how to use semaphores, one involving synchronizing work completion and one involving controlling access to a finite resource. Rather than make up my own, inferior examples, I encourage you to simply read the man page to see some potential uses for these objects.
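
Still, just to show the bare mechanics, here's a minimal completion-signaling sketch (the background work is hypothetical):

    // The count starts at 0, so the wait below blocks until the signal arrives.
    dispatch_semaphore_t sem = dispatch_semaphore_create(0);
    
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        // ...do some background work...
        dispatch_semaphore_signal(sem); // increment the count
    });
    
    // Decrement the count, blocking until it's greater than zero.
    dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER);
    dispatch_release(sem);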

One-Time Initialization
GCD also has support for one-time initialization which, if you're familiar with pthreads, is pretty much the same as the pthread_once call. The main advantage of the GCD approach is that it uses blocks instead of function pointers, allowing for a more natural flow of code.

A major use of this is for lazily initializing singletons or other shared data in a thread-safe manner. The typical singleton initialization technique looks like this, for singletons that need to be thread safe:

    + (id)sharedWhatever
    {
        static Whatever *whatever = nil;
        @synchronized([Whatever class])
        {
            if(!whatever)
                whatever = [[Whatever alloc] init];
        }
        return whatever;
    }
This works, but it's expensive: every call to +sharedWhatever takes a lock, even though the lock is really only needed once. There are fancier ways to approach this, using things like double-checked locking or atomic operations, but they're difficult and extremely error-prone.

Using GCD you can rewrite the above method using dispatch_once like so:

    + (id)sharedWhatever
    {
        static dispatch_once_t pred;
        static Whatever *whatever = nil;
        dispatch_once(&pred, ^{
            whatever = [[Whatever alloc] init];
        });
        return whatever;
    }
This is actually slightly simpler than the @synchronized method, and GCD ensures the check is fast. It guarantees that the code in the block will have run before any thread can pass beyond the call to dispatch_once, but doesn't force code to take the synchronization hit every time it uses this function. In fact, if you look at the header where this function is declared, you'll discover that the current implementation is actually a macro which performs the initial test inline, meaning that you don't even incur function call overhead, much less synchronization overhead, in the common case.

Conclusion
That wraps up this series on Grand Central Dispatch. This week you saw how to suspend, resume, and retarget dispatch queues, and some uses for these facilities. You also saw how to use dispatch semaphores and one-time initialization facilities. In previous weeks you saw how to manage dispatch objects, how to create/access and use the different types of dispatch queues available for different tasks, strategies for taking advantage of multi-core systems, and how to monitor for events using GCD's events system. Now you have the complete picture of how GCD operates and how to use it, so go out there and write some great new software with it!

Come back next week for another exciting edition of Friday Q&A. As always, if you have a suggestion for a topic to cover, please post it in the comments or e-mail it directly to me.

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

For dispatch_once, I'm curious whether the use of an initial inline test constitutes double-checked locking, or if there are any protections against that.
Mike - a question about blocks (kind of tangential, as this series, which has been very informative, has been about GCD, but GCD does use blocks extensively…):

If you know….

Is code using blocks (not GCD, just blocks) runnable on previous versions of OS X? Is there a dependency on Snow Leopard's version of the C run-time or something?

I'm wondering, because knowing how lambdas are implemented in C++0x (as function objects), it seems like blocks shouldn't require tooo much in the way of run-time support?
Axel Andersson: A fine question, and one I asked myself as well when I saw that macro. You'll find the answer to that question in a comment in the source code which implements the one-time initialization functionality:

http://libdispatch.macosforge.org/trac/browser/trunk/src/once.c

Scroll down to where it reads "The next barrier must be long and strong" and go from there.

The summary is that raw double-checked locking has problems because you need memory barriers to make sure all threads see changes happen in the right order. Normally you need both read and write barriers. What the dispatch guys figured out was that you don't need read barriers if the write side ensures that enough cycles occur between the initialization and the signal that it's impossible for any reader to have performed an out-of-order read that sees the one but not the other.

In other words, modern CPUs perform out-of-order reads only in close temporal proximity. By waiting a few cycles before writing to the shared flag, they ensure that any CPU which sees the shared flag as being set will also have seen all initialization take place. This allows them to get away with only a write barrier, and no read barrier.

Stuart Dootson: Blocks don't work on previous versions of OS X by default, because they need a runtime library. It's not substantial, but it is required. Don't let C++0x lambdas trick you; Apple's blocks are a lot more capable in ways that require additional runtime support. (Namely, they can persist beyond the scope that created them, and they need to manage memory in order to make that happen.)

There is a third-party blocks runtime (built using Apple's source code) and associated compilers available called PLBlocks. More details on that here:

http://www.mikeash.com/?page=pyblog/friday-qa-2009-08-14-practical-blocks.html
With NSObject we have some methods like

- performSelector:withObject:afterDelay:
- performSelector:withObject:afterDelay:inModes:
- performSelectorInBackground:withObject:
...

Are these methods implemented with GCD too?
The first two are not. They are documented as setting up a timer in the current runloop, which GCD can't currently do. Until and unless GCD becomes the foundation for CFRunLoop, these won't involve GCD.

The last one almost could be, except that it's documented as creating a new thread. Code using that method could expect to own the thread it executes on and might do things that GCD doesn't allow (you're not allowed to mess with your thread's priority, kill the thread, etc. while executing on a GCD thread), so Apple couldn't really change that one either.
I'm curious about dispatch_barrier and the target queue. What happens when I dispatch_barrier blocks onto a queue that targets a concurrent queue? Will it block the execution of blocks submitted to the target queue too? Or, if I dispatch_barrier blocks onto the concurrent queue itself, will it block the execution of blocks in the child queue?
