mikeash.com: just this guy, you know?

Posted at 2009-09-11 15:28
Tags: fridayqna gcd performance
Friday Q&A 2009-09-11: Intro to Grand Central Dispatch, Part III: Dispatch Sources
by Mike Ash  

Welcome back to another Friday Q&A. This week I continue the discussion of Grand Central Dispatch from the past two weeks, which mainly focused on dispatch queues. This week I'm going to examine dispatch sources: how they work, and how to use them.

Note that I assume you've already read the first two posts in this series. The first post is particularly important; the second one less so. If you have not read them, go read them now.

Before I go any further, there's been some great news this week: GCD has been open sourced! This is a very nice move on Apple's part. The source is relatively clean and very interesting to read through.

What Are Dispatch Sources
In short, a dispatch source is an object which monitors for some type of event. When the event occurs, it automatically schedules a block for execution on a dispatch queue.

That's kind of vague. What kind of events are we talking about?

Here is the full list of events supported by GCD as of Mac OS X 10.6.0:

  1. Mach port send right state changes.
  2. Mach port receive right state changes.
  3. External process state change.
  4. File descriptor ready for read.
  5. File descriptor ready for write.
  6. Filesystem node event.
  7. POSIX signal.
  8. Custom timer.
  9. Custom event.
That's a lot of useful stuff. It's basically everything kqueue supports, plus Mach ports, plus built-in support for timers (instead of having to build your own using kqueue's timeout parameter), plus custom events.

Custom Events
Most of these events are pretty much self-explanatory, but you may be wondering what a custom event is. In short, this is an event which you signal yourself by calling the dispatch_source_merge_data function.

This is a bit of an odd name for a function that signals an event. It's named this way because GCD automatically coalesces multiple events that happen before the event handler has a chance to run. You can "merge" data into the dispatch source as many times as you want, and if the dispatch queue was busy for this whole period, GCD will only invoke the event handler once.

Two types of custom events are available: DISPATCH_SOURCE_TYPE_DATA_ADD and DISPATCH_SOURCE_TYPE_DATA_OR. A custom event source has an unsigned long data attribute, and you also pass an unsigned long to dispatch_source_merge_data. When using the _ADD variant, events are coalesced by adding all of the numbers together. When using the _OR variant, events are coalesced by doing a bitwise OR. When the event handler executes, it can access the current value using dispatch_source_get_data, and the data is then reset to 0.
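
The following is a minimal single-threaded model in plain C that makes the two merge rules concrete. It is not how GCD implements dispatch sources. The `coalescing_source` type and the `source_merge` and `source_take` functions are hypothetical stand-ins for `dispatch_source_merge_data` and for what an event handler sees through `dispatch_source_get_data`. Real sources also perform the merge atomically across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a custom dispatch source (not GCD's implementation). */
typedef enum { MERGE_ADD, MERGE_OR } merge_kind;

typedef struct {
    merge_kind kind;
    unsigned long pending; /* value accumulated since the handler last ran */
} coalescing_source;

/* Stand-in for dispatch_source_merge_data: fold a new value into `pending`. */
static void source_merge(coalescing_source *s, unsigned long value)
{
    assert(s != NULL);
    if (s->kind == MERGE_ADD)
        s->pending += value;   /* like DISPATCH_SOURCE_TYPE_DATA_ADD */
    else
        s->pending |= value;   /* like DISPATCH_SOURCE_TYPE_DATA_OR */
}

/* Stand-in for one run of the event handler: return the coalesced value
   (what dispatch_source_get_data would report) and reset it to 0. */
static unsigned long source_take(coalescing_source *s)
{
    assert(s != NULL);
    unsigned long data = s->pending;
    s->pending = 0;
    return data;
}
```

Under the _ADD rule, three merges of 1 that arrive while the queue is busy collapse into one handler run that sees 3. Under the _OR rule, merging 0x1, 0x4, and 0x1 yields a single run that sees 0x5.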

Let's look at a scenario where this could be useful. Imagine some asynchronous code performing some work that needs to update a progress bar. Since the main thread is just another dispatch queue to GCD, we can push the GUI work onto the main queue. However, there may be a lot of events, and we don't want to make redundant updates to the GUI; it's much better to coalesce all of the changes as much as possible if the main thread is busy with other work.

Dispatch sources are perfect for this, using the DISPATCH_SOURCE_TYPE_DATA_ADD type. We can merge the amount of work done, and then the main thread code can find out how much work has been performed since the last event, and update the progress indicator by that amount.

Enough talk, here's some code:

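    dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_DATA_ADD, 0, 0, dispatch_get_main_queue());
    dispatch_source_set_event_handler(source, ^{
        // runs on the main queue; the data is the number of work units merged since the last run
        [progressIndicator incrementBy:dispatch_source_get_data(source)];
    });
    dispatch_resume(source);
    
    // dispatch_apply blocks its caller until every iteration finishes, so run it
    // off the main thread; otherwise the main queue can't service the source until the end
    dispatch_async(globalQueue, ^{
        dispatch_apply([array count], globalQueue, ^(size_t index) {
            // do some work on data at index
            dispatch_source_merge_data(source, 1);
        });
    });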
(I want to make one note about this code, about something that stymied me to no end when I first started working with dispatch sources. It bothered me enough that I'm going to put it in bold. Dispatch sources always start out suspended! You must resume them after creating them if you want events to be delivered!)

Assuming you've configured the progress indicator with the correct minimum and maximum values, this will all work perfectly. The data will be processed in parallel. As each chunk of data finishes, it signals the dispatch source and adds 1 to the dispatch source data, which we treat as the number of work units completed. The event handler increments the progress indicator by the number of work units that have been completed since the last time it ran. If the main thread is idle and work units complete slowly, the event handler will be called for every work unit completion, giving real-time results. If the main thread is busy or work units complete quickly, completion events will be coalesced and the progress indicator will be updated only once each time the main thread becomes available to process it.

At this point you may be thinking: this all sounds great, but what if I don't want my events to be coalesced? Sometimes you just want every signal to cause an action, without any smarts going on behind the scenes. This is actually really easy; you just need to think a bit outside the box. If you want every signal to cause an action, use dispatch_async instead of a dispatch source. That's what it does, after all: it schedules a block to be executed on the queue in question. In fact, for custom events, the only reason to use a dispatch source instead of dispatch_async is to take advantage of coalescing.

Built-In Events
That's how to use a custom event. How about a built-in event? Let's look at an example of reading from standard input using GCD:

    dispatch_queue_t globalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_source_t stdinSource = dispatch_source_create(DISPATCH_SOURCE_TYPE_READ,
                                                           STDIN_FILENO,
                                                           0,
                                                           globalQueue);
    dispatch_source_set_event_handler(stdinSource, ^{
        char buf[1024];
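        ssize_t len = read(STDIN_FILENO, buf, sizeof(buf));
        if (len > 0)
            NSLog(@"Got data from stdin: %.*s", (int)len, buf);
        else if (len == 0)
            dispatch_source_cancel(stdinSource);  // end of file: stop watching, or the handler keeps firing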
    });
    dispatch_resume(stdinSource);
This is pretty easy! Since we used a global queue, the handler automatically runs in the background, in parallel with the rest of the application, meaning an automatic parallel speed boost if the application is doing anything else at the time the event comes in.

This also has a nice benefit over the standard UNIX way of doing things in that there's no need to write a loop. With typical calls to read, you always have to be wary because it can return less data than requested, and can also suffer from transient "errors" like EINTR (interrupted system call). With GCD, you can just bail out in those cases and not do anything. If you leave unread data on the file descriptor, GCD will just invoke your handler a second time.
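
For contrast, the traditional blocking approach needs roughly the following loop. It is plain POSIX C with no GCD involved, and `read_fully` is a hypothetical helper name. It retries on EINTR and keeps reading after short reads until it has the requested byte count or hits end-of-file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Keep calling read() until `count` bytes arrive, end-of-file is reached,
   or a real error occurs. Returns the number of bytes read, or -1 on error. */
static ssize_t read_fully(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, (char *)buf + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: just retry */
            return -1;           /* genuine error */
        }
        if (n == 0)
            break;               /* end of file: return what we have */
        total += (size_t)n;      /* short read: loop for the rest */
    }
    return (ssize_t)total;
}
```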

For standard input it's not a problem, but for other file descriptors you need to consider how to clean up once you're done reading from (or writing to) the descriptor. You must not close the descriptor while the dispatch source is still active. If another file descriptor is created (perhaps from another thread) and happens to get the same number, your dispatch source will suddenly be reading from (or writing to) something it shouldn't be. This will not be fun to debug.
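
The descriptor reuse described above is easy to reproduce in plain POSIX C, because open returns the lowest-numbered descriptor not currently open. The hypothetical helper below (no GCD involved) closes a descriptor and then opens another file. In a single-threaded program, the new descriptor gets the old number, which is exactly the number a still-active dispatch source would keep watching.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a freshly opened descriptor reuses the number that was just
   closed, 0 if it does not, and -1 if open fails. */
static int descriptor_number_reused(void)
{
    int stale = open("/dev/null", O_RDONLY);
    if (stale < 0)
        return -1;
    close(stale);            /* imagine a dispatch source still watching `stale` */
    int fresh = open("/dev/null", O_RDONLY);
    if (fresh < 0)
        return -1;
    close(fresh);
    return fresh == stale;   /* same number, different file */
}
```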

The way to properly implement cleanup is to use dispatch_source_set_cancel_handler and give it a block which closes your file descriptor. You can then use dispatch_source_cancel to cancel the dispatch source, causing the handler to be invoked and the file descriptor to be closed.

Using other dispatch source types is much the same. In general, you give the identifier of the source (Mach port, file descriptor, process ID, etc.) as the dispatch source handle. The mask argument is unused for many source types, but for some, such as DISPATCH_SOURCE_TYPE_PROC (and DISPATCH_SOURCE_TYPE_VNODE for filesystem events), it indicates which kinds of events you're interested in receiving. Then just provide a handler, resume the source, and off you go. These dispatch sources also provide source-specific data, which can be accessed using the dispatch_source_get_data function. For example, file descriptor sources give the approximate number of bytes available on the descriptor as the dispatch source data. Process sources give a mask of the events which occurred since the handler last ran. For a complete listing of what the data means for each type of source, see the dispatch_source_create man page.

Timers
Timer events are a bit different. They don't use the handle/mask arguments, but instead use a separate function, dispatch_source_set_timer, to configure the timer. This function takes three separate parameters to control when the timer fires:

The start parameter controls when the timer first fires. This parameter is of type dispatch_time_t, which is an opaque type that you can't manipulate directly. The functions dispatch_time and dispatch_walltime can be used to create them, and the constants DISPATCH_TIME_NOW and DISPATCH_TIME_FOREVER can be used if those are the values you're after.

The interval argument is an integer giving the time between firings in nanoseconds. The leeway argument, also in nanoseconds, is an interesting one. It tells the system how much precision you want on your timer firing. Timers are never guaranteed to be 100% precise, but this argument lets you tell the system how hard you want it to try. If you want a timer to fire every 5 seconds and be as exact as possible, you would pass 0. On the other hand, consider a periodic task like checking for new e-mail. You want to check every 10 minutes, but this doesn't have to be exact. You might pass a leeway of 60 seconds, telling the system that you'll accept the timer running up to 60 seconds later than the scheduled time.
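
Both the interval and the leeway are plain nanosecond counts. The following small C sketch converts the e-mail example's values. `MY_NSEC_PER_SEC` and `timer_settings_from_seconds` are hypothetical names chosen so the code builds without the GCD headers; real code would use `NSEC_PER_SEC` from <dispatch/time.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the conversion factor; GCD code would use NSEC_PER_SEC. */
#define MY_NSEC_PER_SEC 1000000000ull

typedef struct {
    uint64_t interval_ns; /* time between firings */
    uint64_t leeway_ns;   /* how late each firing may run */
} timer_settings;

/* Hypothetical helper: express seconds in the nanosecond units that
   dispatch_source_set_timer expects for its interval and leeway. */
static timer_settings timer_settings_from_seconds(uint64_t interval_s, uint64_t leeway_s)
{
    timer_settings t = { interval_s * MY_NSEC_PER_SEC, leeway_s * MY_NSEC_PER_SEC };
    return t;
}
```

With real GCD, the e-mail timer would then be configured along the lines of `dispatch_source_set_timer(timer, dispatch_time(DISPATCH_TIME_NOW, 600 * NSEC_PER_SEC), 600 * NSEC_PER_SEC, 60 * NSEC_PER_SEC)`, where `timer` is assumed to be a source created with DISPATCH_SOURCE_TYPE_TIMER.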

What's the point of this? In short, reduced power consumption. It's more energy efficient if the OS can let the CPU sleep for as long as possible, and then accomplish a bunch of things at once when it wakes up, rather than cycling between sleep and wake constantly to accomplish tasks in a spread-out manner. By giving a large leeway to your timer, you allow the system to lump your timer with other actions in order to group tasks together like this.
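
A toy model shows why leeway helps. Treat each pending firing as a window that runs from its deadline to its deadline plus leeway. Suppose the system wakes as late as a window allows and runs every firing whose window contains that moment. The hypothetical `count_wakeups` function counts the wakeups this greedy strategy needs; processing windows in order of their end gives the minimum possible count. This is an illustration only, not how the kernel or GCD actually schedules timers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A pending firing that may run any time in [deadline, deadline + leeway]. */
typedef struct {
    unsigned long deadline;
    unsigned long leeway;
} firing;

static int by_window_end(const void *a, const void *b)
{
    const firing *x = a, *y = b;
    unsigned long ex = x->deadline + x->leeway, ey = y->deadline + y->leeway;
    return (ex > ey) - (ex < ey);
}

/* Count wakeups when each wakeup happens at the latest moment the earliest
   uncovered window allows. Sorts `f` in place. */
static size_t count_wakeups(firing *f, size_t n)
{
    qsort(f, n, sizeof *f, by_window_end);
    size_t wakeups = 0;
    unsigned long last_wake = 0;
    for (size_t i = 0; i < n; i++) {
        /* Not covered by the previous wakeup: schedule a new, late one. */
        if (wakeups == 0 || f[i].deadline > last_wake) {
            last_wake = f[i].deadline + f[i].leeway;
            wakeups++;
        }
    }
    return wakeups;
}
```

Three firings 10 time units apart need three separate wakeups with zero leeway. If each firing allows 25 units of leeway, a single wakeup covers all three.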

Conclusion
Now you know how to use GCD's dispatch source facilities to monitor file descriptors, run timers, coalesce custom events, and other similar activities. Because dispatch sources are fully integrated with dispatch queues, you can use any dispatch queue you have available. You can have a dispatch source run its handler on the main thread, in parallel on one of the global queues, or serialized with respect to a particular module of your program by using a custom queue.

That's it for this week. Come back next week as I wrap up the discussion of Grand Central Dispatch and talk about how to suspend, resume, and retarget dispatch queues, how to use dispatch semaphores, and how to use GCD's one-time initialization facility. As always, if you have a suggestion for a topic to cover for a future Friday Q&A, please post it in the comments or e-mail it directly to me.


Comments:

I'm wondering if we're looking at a new basis for CFRunLoop here…
I don't know if you mean currently or in the future. Currently, we can see from the source release that CFRunLoop is not implemented on top of GCD, and that GCD has special code to integrate the GCD main queue with CFRunLoop.

For the future, I don't think CFRunLoop will become a thin wrapper around GCD, because the feature sets aren't quite identical. The main problem I see is the existence of modes. A single CFRunLoop can be put into different modes which have different sources installed. Any given source can be installed on one or more mode. As far as I can tell, GCD has no real equivalent to this. Perhaps something could be done using target queues? I could certainly be wrong, but they don't quite seem like they would fit together.
The "CFRunLoop replacement" occurred to me as well when I first saw this, but I'm not sure it would necessarily make it easier to build a Cocoa application to simply replace all of the "performSelectorOnMainThread:" calls with dispatch_async calls pushing tasks on the main thread's queue. A queue seems like a much more clever way of working with the main thread, and I can see how people at NeXT would see dispatch and say "damn I wish we had that in 1988!", but given the current state of things, it seems like it'd be hard to replace CFRunLoop with a serial queue without completely ripping Cocoa apart, and again, the only gain you get is turning a run loop source into a dispatch source, which may run faster but won't be any easier to code or read.

OTOH, dispatch queues seem to be targeted squarely at new developers coming to Mac OS from Java and Windows who are completely turned-off by the whole run loop concept -- just based on my conversations with Blackberry and Windows developers coming from those environments, they find threading complicated, but an easier concept to understand than registering functions and delegates as "run loop sources" and all of that, even if run loops are much cleaner solutions given the way C works.

I personally can't wait to finish my special alertPanel class that runs --

panelWithTitle:(NSString *)title informationalText:(NSString *)text onError:(void (^)(void))errorBlock onButtonClicked:(void (^)(int))buttonBlock
I think the main gains with retargeting CFRunLoop would be efficiency and commonality. Aside from speed, there would also presumably be some benefit to having a single common code base instead of two disparate but similar ones.

The capabilities don't match too closely, though. Thinking about this more, it's a lot more than just modes. A major example: CFRunLoop is reentrant, and can be manually cycled. GCD has no such facilities at this time.

I completely disagree with your assessment of the target of GCD. I don't find run loops to be cleaner than GCD at all. Quite the opposite, in fact: I find GCD to be astonishingly clean in every respect, whereas every time I have to deal with runloops and runloop sources it's always extremely painful. GCD is a game-changing system-integrated multiprocessing and event processing library. That you see this as being something to lure developers over from other platforms which no "real" Mac programmer will use does not make any sense to me.
I was indeed referring to the future. But I think it would be possible to implement the CFRunLoop API on top of GCD (and perhaps deprecate certain functionality).
kqueue(2) does now support Mach ports via EVFILT_MACHPORT.
"I completely disagree with your assessment of the target of GCD. I don't find run loops to be cleaner than GCD at all. Quite the opposite, in fact: I find GCD to be astonishingly clean in every respect, whereas every time I have to deal with runloops and runloop sources it's always extremely painful."


Really? I always thought the reason they invented runloops is because (1) it's more efficient on a uniprocessor (like, say, a NeXT cube) and (2) because C allows threads to share data like crazy, the RunLoop gives an application developer a way of running several processes at once while never having to share data between threads. The libraries do a lot of threading, but this is all hidden from the client through the run loop source mechanism: NSApplications are like actor objects in the actor model and the event queue is their mailbox. I didn't get it for ages, but once I did it struck me as a really clever solution.

A complete pain to explain to anyone, though. I was thinking GCD was just about the best thing for concurrency in a practical platform until someone showed me future variables in C++0x. Not out yet, but I was suddenly filled with envy again...
Sure, runloops are clever, great for what they do, etc. They're also a royal pain to work with. What modes should I add this timer to? What's the best way to signal a custom event? Is it safe to add this timer to a runloop for another thread? When do the various cycling methods/functions stop? How can I stop one early?

Runloops primarily exist to multiplex inputs and timers. They are, in essence, a select() loop wrapped up nicely, except using mach ports instead of file descriptors. There's really nothing to do with multithreading there, except for the fact that mach ports make for a good way to do inter-thread messaging, and thus you can use runloops to multiplex these inter-thread messages.
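
As a rough illustration of that description, here is the heart of a select() loop in plain C. It blocks until one of several descriptors has data, then reports which one. `wait_for_readable` is a hypothetical helper name. CFRunLoop itself waits on Mach ports rather than file descriptors, and layers timers and modes on top.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until one of `fds` is readable; return its index, or -1 on error. */
static int wait_for_readable(const int *fds, size_t count)
{
    fd_set readable;
    FD_ZERO(&readable);
    int max_fd = -1;
    for (size_t i = 0; i < count; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    /* NULL timeout: wait indefinitely, as an idle loop would. */
    if (select(max_fd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;
    for (size_t i = 0; i < count; i++)
        if (FD_ISSET(fds[i], &readable))
            return (int)i;
    return -1;
}
```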

As for C++0x futures, I had not heard of these before, but looked them up and I was not impressed. Like so many things in C++, this just does not need to be a language feature. Looks like a very thin wrapper around a lock and a condition variable that really ought to be a set of library functions, not Yet Another Language feature. C++ is already an order of magnitude larger than a reasonable language, and they just keep piling things on.

In Objective-C you can implement transparent futures (for objects only, obviously) by writing a fairly simple NSProxy subclass. Spin off the future, crunch in the background, and if any code actually tries to query your proxy then it blocks and waits for completion. Other code never knows that it has a proxy, not the real thing. It would of course be trivial to implement these lame-brained explicit futures from C++0x as well, if you wanted to.
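
The following is a minimal sketch of the "set of library functions" approach described above: an explicit future in C built from nothing more than a POSIX mutex and condition variable. The `future` type and its functions are hypothetical names, and the value is a plain `long` for simplicity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical explicit future: a value, a ready flag, a lock, and a condition. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready_cond;
    bool ready;
    long value;
} future;

static void future_init(future *f)
{
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->ready_cond, NULL);
    f->ready = false;
    f->value = 0;
}

/* Producer side: publish the result and wake every waiting consumer. */
static void future_set(future *f, long value)
{
    pthread_mutex_lock(&f->lock);
    f->value = value;
    f->ready = true;
    pthread_cond_broadcast(&f->ready_cond);
    pthread_mutex_unlock(&f->lock);
}

/* Consumer side: block until the value is available, then return it. */
static long future_get(future *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->ready) /* loop guards against spurious wakeups */
        pthread_cond_wait(&f->ready_cond, &f->lock);
    long value = f->value;
    pthread_mutex_unlock(&f->lock);
    return value;
}

static void future_destroy(future *f)
{
    pthread_cond_destroy(&f->ready_cond);
    pthread_mutex_destroy(&f->lock);
}
```

A producer thread calls future_set once the value is computed. Any thread that calls future_get before then blocks until the value is available.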
That's a really brilliant idea -- of course the problem I can see is that for any gain you're getting in using lightweight tasks, you might be losing by using Obj-C objects, and the concomitant heap allocations and calls to objc_msgSend(). Even without GCD, you can implement Scheme-style delay/force using NSProxy as you describe, which is essentially what futures are, they just use static analysis to automagically decide when to turn a delay into a force.

I guess this just goes to show that lazy evaluation != multithreading.
Yes, the proxy solution is somewhat heavyweight, but I think the magnitudes are fairly different. Even GCD's cheapness is still going to be considerably more expensive, at least in the worst case, than a few ObjC message sends and object allocations. In general, though, such an approach will only pay off if the value being promised is relatively expensive to compute.

I also have to wonder how often you encounter a case where you pass such a proxy into unknown code that won't use it for a while. It seems to me that in most cases, you'll control all the code and an explicit future would be simpler and faster.
If I wanted to read standard output instead of standard input, do I just change STDIN_FILENO to STDOUT_FILENO?
You can't read from standard out, only write to it. If you wanted to write to it, you could create a dispatch source for STDOUT_FILENO so you know when you can write more data, but there's no way to read from it.
No way, this post is almost 5 years old and I only find out now, after reading this article, that GCD got open sourced. Thanks a lot Mike for all the work you invested into this blog, I enjoyed every article I read on this site.
Although veiled by generalities, the better part of Apple's documentation on GCD and threading programming describes how GCD was used to make AVAssetReader work in conjunction with an AVSampleBufferDisplayLayer as it does [see the startReading method of AVAssetReader and the requestMediaDataWhenReadyOnQueue method of AVSampleBufferDisplayLayer], and, in particular, wherever it describes how to replace the traditional thread implementation of the producer (enqueue)/consumer (dequeue) model for acquiring and displaying frame data from a media file with the GCD equivalent [see Changing Producer-Consumer Implementations and Migrating Away From Threads, in general, in Apple's Concurrency Programming Guide].

Were I using my own, custom AVAssetReader-like implementation, I would be able to read a media file via a dispatch source as shown in this post, and then periodically read frames from the file using a timer (as also shown in this post). As noted, a timer suspends a thread until it fires, which would be necessary to allow reading from other media files in cases where the number of files being read exceeds the number of files that can be read concurrently (that's about 16 on the best iPhone).

Without reinventing the wheel, though, how would one suspend a thread like the one spawned by the startReading method of AVAssetReader, so that other threads so spawned have a little processor time of their own?

As it is, I can start reading from virtually any number of AVAssetReaders, having put each in a separate thread, but calls to their corresponding AVAssetReaderTrackOutputs' copyNextSampleBuffer method are ignored beyond the 16 or so thread limit.

Knowing how to engineer with GCD—versus just how to program with it—is the only way to make it truly useful to you or anyone else. In the past five years since you wrote this post, could you engineer a solution for the above-stated problem? If so, evolve us, who are still merely programmers, please.
