mikeash.com: Friday Q&A 2009-09-25: GCD Practicum - Comments

Karl Peng - 2016-10-15 15:21:12
It seems Ken Thomases's solution still submits a great many tasks to the disk at once, which effectively makes the disk the part that dispatches the work. Imagine reading 1,000 files of 10 MB each at the same time: you can still end up with a large number of tasks in flight.

Pitiphong Phongpattranont - 2014-05-15 14:19:20
Does anyone know the maximum number of threads GCD will spawn, and where that limit is set?

Thomas - 2010-02-24 10:49:15
Maybe we should ask Apple to make the following function public:

    #define DISPATCH_QUEUE_WIDTH_ACTIVE_CPUS        -1
    #define DISPATCH_QUEUE_WIDTH_MAX_PHYSICAL_CPUS  -2
    #define DISPATCH_QUEUE_WIDTH_MAX_LOGICAL_CPUS   -3

    void dispatch_queue_set_width(dispatch_queue_t dq, long width);

This function is currently implemented but private. With it, we could limit the number of jobs a queue runs at once, avoiding all the issues mentioned above.
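Until something like that is public, a counting semaphore approximates a fixed queue width using only public API, much as the article's own job-throttling code does. A minimal sketch (the function name and the gating scheme are illustrative, not Apple API):

    #include <dispatch/dispatch.h>

    // Gate how many blocks from one workload run concurrently: wait for a
    // slot before dispatching, signal when the job finishes. A semaphore
    // created with dispatch_semaphore_create(width) admits `width` jobs
    // at once.
    static void DispatchWithWidth(dispatch_queue_t queue,
                                  dispatch_semaphore_t widthGate,
                                  dispatch_block_t block)
    {
        dispatch_semaphore_wait(widthGate, DISPATCH_TIME_FOREVER);
        dispatch_async(queue, ^{
            block();
            dispatch_semaphore_signal(widthGate);
        });
    }

Note that the wait happens on the submitting thread, so this throttles the producer as well as the queue, which is the same behavior the article gets from its job-count semaphore.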
mikeash - 2010-02-13 20:47:03
That's certainly a very interesting technique, and it will clearly help a lot. However, I don't think it's the "GCD for IO" that I and other commenters are after. As far as I know, this solution will still spawn multiple jobs for a single spinning disk, and still cause disk contention because of it. Disk-based dispatch sources just get handed off to kqueue, and at that point queue priority and contention never enter into it. Obviously it works pretty well in this case, but what would be really nice is a system that automatically limits you to one disk-based job at a time on a single disk, to two on a two-disk RAID 0, to a reasonable but large number on an SSD, and so on.

Still, a very nice technique, and thanks for posting it. It's very useful in its own right.

Ken Thomases - 2010-02-13 19:34:13
This is a late comment to an old post, but...

I'm surprised that you bemoan the lack of "a sort of GCD for IO", since GCD has a solution for IO which I know you're familiar with: dispatch sources.

Basically, GCD's philosophy is to never block. All threads should be able to advance on their current task at all times. So, one should use asynchronous, non-blocking IO techniques whenever possible. Dispatch sources provide a means for doing asynchronous, non-blocking IO that (obviously) integrates well with GCD.

I fixed your imagegcd2.m by implementing two utility functions:

    void WithDataFromFile(NSString *filePath, dispatch_queue_t queue, void (^block)(NSData *data));
    void AfterWritingDataToFile(NSData *data, NSString *filePath, dispatch_queue_t queue, void (^block)(BOOL success));

The first creates a dispatch source to read from the given file path, accumulating an NSData. When it reaches EOF, it submits the provided block to the specified queue and passes in the data object. In the event of an error, nil is passed in.

The second function creates a dispatch source to write the given data object to the given path. When it's done, it submits the block to the queue, passing a success flag.

Both are pretty straightforward, and were based on the sample code that Apple provides in the Concurrency Programming Guide. I'm not posting them because they're a touch too large for a blog comment, but I can if you like.

The read source is scheduled on the low-priority queue, and the write source on the high-priority queue. My reasoning is that reading acquires work and memory load, while writing eliminates work and relieves memory load.

Where your code has:

    NSData *data = /* ... */;
    if(data) /* ... */

I've replaced it with:

    dispatch_retain(group);
    dispatch_group_enter(group);
    WithDataFromFile(fullPath, globalQueue, ^(NSData *data){
        if(data) WithAutoreleasePool(^{
            NSData *thumbnailData = ThumbnailDataForData(data);
            if(thumbnailData)
            {
                NSString *thumbnailName = [NSString stringWithFormat: @"%d.jpg", OSAtomicIncrement32(&count)];
                NSString *thumbnailPath = [destination stringByAppendingPathComponent: thumbnailName];
                AfterWritingDataToFile(thumbnailData, thumbnailPath, globalQueue, ^(BOOL success){
                    dispatch_group_leave(group);
                    dispatch_release(group);
                });
            }
        });
    });
    // Note: as written, if data is nil or the thumbnail can't be created,
    // dispatch_group_leave/dispatch_release are never called; a complete
    // version would balance the enter/retain on those paths as well.
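Since the helpers themselves were never posted, here is a rough sketch of what a read-side helper along these lines might look like, following the dispatch-source pattern from the Concurrency Programming Guide that Ken cites. It is an illustration only; his actual implementation may differ, error handling is minimal, and the memory management is pre-ARC:

    #include <fcntl.h>
    #include <unistd.h>
    #import <Foundation/Foundation.h>

    // Reads filePath asynchronously via a DISPATCH_SOURCE_TYPE_READ source,
    // then submits `block` to `queue` with the accumulated data, or nil on
    // error. The caller's queue is assumed to outlive the operation.
    void WithDataFromFile(NSString *filePath, dispatch_queue_t queue,
                          void (^block)(NSData *data))
    {
        int fd = open([filePath fileSystemRepresentation], O_RDONLY | O_NONBLOCK);
        if(fd < 0)
        {
            dispatch_async(queue, ^{ block(nil); });
            return;
        }

        NSMutableData *accumulated = [[NSMutableData alloc] init];
        dispatch_queue_t readerQueue =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);
        dispatch_source_t source =
            dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, fd, 0, readerQueue);

        dispatch_source_set_event_handler(source, ^{
            char buf[65536];
            ssize_t n = read(fd, buf, sizeof(buf));
            if(n > 0)
            {
                [accumulated appendBytes: buf length: n];
            }
            else // EOF (n == 0) or read error (n < 0): stop reading
            {
                NSData *result = (n == 0) ? accumulated : nil;
                dispatch_async(queue, ^{ block(result); });
                dispatch_source_cancel(source);
            }
        });
        dispatch_source_set_cancel_handler(source, ^{
            // The cancel handler runs last, so it can safely clean up.
            close(fd);
            [accumulated release];
            dispatch_release(source);
        });
        dispatch_resume(source);
    }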
kl - 2009-11-06 10:21:12
There's definitely a need for an I/O GCD.

For SSDs, a serial I/O queue is not very good. These drives handle parallel access fine, and I expect them to get even better at it in the future, since flash is easy to parallelize.

I'm tempted to write my own dispatch_* wrappers that take a file path and an amount of memory as a hint :)

Alex C. - 2009-10-23 03:57:03
Hey Mike, I believe the problem you describe above (20 units, demand for 25) is related to the dining philosophers problem ( http://en.wikipedia.org/wiki/Dining_philosophers_problem ). That page shows an example solution in Pascal using a "monitor", and it seems to me that it could make sense to do something similar in ObjC using a singleton with some domain-specific knowledge - maybe you could load N files at a time and then parse them in parallel. I'm not really sure it would yield any improvement, except maybe in providing a cleaner API.

mikeash - 2009-09-30 22:55:26
It's purely for convenience. There's no performance gain (indeed, there's a minor loss), but I find it easier to write and read. Blocks allow you to treat code as objects and modify/wrap/pass around code like you do anything else. It's a little strange at first, but once you get used to it you can use it for lots of interesting little conveniences like this.

For some more discussion of non-GCD ways to take advantage of blocks in your code, you might be interested in this article I wrote last winter, which covers some ideas:

http://mikeash.com/?page=pyblog/friday-qa-2008-12-26.html
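For context, the WithAutoreleasePool helper under discussion is just a few lines of glue; a sketch of the idea, in the pre-ARC spelling of the article's era:

    #import <Foundation/Foundation.h>

    // Runs `block` inside a fresh autorelease pool, so temporary objects
    // created on a GCD worker thread are reclaimed promptly instead of
    // accumulating. (Today, @autoreleasepool { ... } does the same job.)
    void WithAutoreleasePool(void (^block)(void))
    {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        block();
        [pool release];
    }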
Robby Weinberg - 2009-09-30 21:33:15
Hello,

I am new to multi-threading, and I must say I have learned a TON from these articles. They have helped me understand GCD tremendously.

One question I have is why you made a WithAutoreleasePool() function that takes a block of code and wraps it between creating an NSAutoreleasePool and releasing it. Is there a reason for this? I find it makes the code harder to read; is it supposed to make it easier to read, or is it for performance, or neither?

Thanks again,
Robby

Thomas - 2009-09-28 08:51:37
The NSOperationSample sample code has the exact same issue.

mikeash - 2009-09-27 15:12:12
I just realized that using a semaphore to represent memory could be potentially disastrous.

Imagine you decide to make 20 "units" (whatever one count of the semaphore represents) of memory available, and you have 5 jobs which each need 5 units to run. Because the only way to grab those units is to wait on the semaphore sequentially, you have a potential for deadlock: if all the jobs manage to grab 4 units at the same time, then none can grab the 5th it needs to proceed. You'd have to use non-blocking waits and complicated logic to back off and try again. Not good.

A better way would be a sort of custom semaphore, using a lock and a condition variable to wrap a counter, which would allow a thread to decrement it by its full request atomically.

One problem is that, while this will help with the problem of bringing your machine to a halt, it will still give you poor performance on most systems if you don't explicitly serialize your IO. Limiting the number of jobs in flight is good, but if they're still fighting over the hard drive when they start up, your performance is not going to be as good as it could be.
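A minimal sketch of the custom semaphore described above, in plain pthreads (the names are illustrative): a mutex-guarded counter whose acquire operation takes the whole request atomically, so jobs can never deadlock holding partial allocations.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        long            available;
    } resource_gate_t;

    void gate_init(resource_gate_t *g, long units)
    {
        pthread_mutex_init(&g->lock, NULL);
        pthread_cond_init(&g->cond, NULL);
        g->available = units;
    }

    void gate_acquire(resource_gate_t *g, long units)
    {
        pthread_mutex_lock(&g->lock);
        // Block until the full request can be satisfied in one step;
        // no job ever holds a partial allocation while waiting.
        while(g->available < units)
            pthread_cond_wait(&g->cond, &g->lock);
        g->available -= units;
        pthread_mutex_unlock(&g->lock);
    }

    void gate_release(resource_gate_t *g, long units)
    {
        pthread_mutex_lock(&g->lock);
        g->available += units;
        // Wake all waiters; each rechecks whether its request now fits.
        pthread_cond_broadcast(&g->cond);
        pthread_mutex_unlock(&g->lock);
    }

One caveat: a large request can be starved by a steady stream of small ones, since any waiter whose request fits gets to proceed; a fairer version would queue requests in order.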
mikeash - 2009-09-26 14:33:21
natevw: Using the semaphore to represent memory could be inefficient, but it all depends on how granular it is. Using the semaphore to represent bytes? Bad. Using it to represent megabytes? Should be fine.

Ultimately, though, I don't see a need for this. You never need more than cpuCount * 2 jobs in flight at once. The only reason to limit based on memory rather than CPU is if you expect even that small number of jobs to run out of memory, which is definitely not the case here. I suppose if you had really big jobs then it might be worthwhile, but since it's extremely difficult to know the memory state of the system as a whole, probably not.

charles: Ultimately, your "IO flag" is simply a way of saying to GCD, "don't start a bunch of new jobs just because this one blocked". Well, that's exactly what using a serial queue for all IO does.

As for why GCD spawns more threads: it has no idea what the effect of doing so will be. Maybe your jobs are already swapped in, won't allocate more memory, and thus will run along maxing out the CPU just fine. Extra jobs could even deallocate memory, for all it knows. There's just no way it can look at the state of the system and conclude, "The system is swapping a lot, thus I should refrain from spawning more jobs."
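For concreteness, the serialize-the-IO pattern mentioned above looks like this: reads funnel through one private serial queue (so at most one job touches the disk at a time), and the CPU-bound work fans back out to the concurrent global queue. A minimal sketch; the function and queue names are illustrative:

    #import <Foundation/Foundation.h>

    // One private serial queue for all disk reads.
    static dispatch_queue_t IOQueue(void)
    {
        static dispatch_queue_t queue;
        static dispatch_once_t once;
        dispatch_once(&once, ^{
            // NULL attributes = serial queue.
            queue = dispatch_queue_create("com.example.io", NULL);
        });
        return queue;
    }

    void ReadThenProcess(NSString *path, void (^process)(NSData *data))
    {
        dispatch_async(IOQueue(), ^{
            // Serial section: only one outstanding read, no disk contention.
            NSData *data = [NSData dataWithContentsOfFile: path];
            if(!data)
                return;
            // Concurrent section: processing scales across cores while the
            // next read proceeds on the IO queue.
            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                process(data);
            });
        });
    }

Because a blocked read only stalls the serial queue, GCD has no reason to spin up extra worker threads on its behalf, which is exactly the effect the hypothetical "IO flag" would ask for.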
charles - 2009-09-26 06:35:52
Thanks Mike, very interesting read.

Like natevw, I find it a little disappointing that we have to limit concurrency based on cpuCount, when GCD is supposed to abstract that away from us. GCD is a nice abstraction, but like any technology, it can be a leaky one...

Like you say in the comments, the problem seems to be that there's no way for GCD to know that a certain thread is going to block on IO. So maybe one option for "GCD 2.0" would be to let GCD users add a hint to the dispatch calls saying that the code will likely block on IO at some point. A simple flag like this could not describe to GCD all the subtleties of IO, but it might be enough to let GCD make the right decision and avoid most of the cases you describe.

In addition, there is one thing GCD/OS X seems like it should be better at, and that is avoiding swap hell. Why does GCD even start more threads when the OS is struggling with memory? Since GCD never makes any promises about when it will run your code, couldn't it simply stop creating more threads? Is there any reason for GCD to think that creating more threads could resolve the memory swapping, such that it should just keep happily going? I'm really curious what I'm missing here. Thanks for your insights :-)

natevw - 2009-09-26 01:12:26
It seems a shame to limit concurrency based on cpuCount, since that was what GCD was supposed to eliminate.

Since memory is the problem resource after dealing with the IO contention, why not have your semaphore represent a memory cap in KB or MB? Would it be inefficient to call dispatch_semaphore_wait repeatedly (in proportion to the amount of memory you estimate the image needing) before dispatching the operation?

mikeash - 2009-09-25 16:12:58
I think what's really needed, at some point in the future, is a sort of GCD for IO. Just as GCD frees me from having to think about how many worker threads to spawn, GCD/IO would free me from having to think about which files are best read concurrently and which are best read serially.

It's impossible for GCD as it stands right now to deal with these things properly, because it can't predict the future. There's no way for GCD to know that a certain thread is going to block on IO, and no way to know whether creating more worker threads is a good idea when it does. With the addition of more explicit IO facilities, it certainly could. For now, at least it provides more than enough infrastructure for you to solve the problem yourself with a fairly small amount of additional work.

Steve Madsen - 2009-09-25 16:00:34
Another fantastic post, Mike.

@kusmi: I think that's a slippery slope. For that to be useful, GCD (and, by extension, the kernel) would need to profile every thread to know its specific impact on the system.

It would be a net loss for GCD to throttle a queue whose jobs are purely CPU-bound just because another process on the system is hitting the disk or using a lot of memory.

kusmi - 2009-09-25 15:34:20
Interesting article. This leads me to a question: wouldn't it be the job of GCD itself to prevent such runaways? E.g., could a future GCD version also take IO and memory characteristics into account when controlling the number of worker threads?

Jesse - 2009-09-25 13:58:57
Great article, Mike. GCD is a nice tool to have in the toolbox, for sure. The syntax can get fairly jarring, though, but I guess that's C for ya. I really hope blocks are incorporated into a future revision of the C spec.

Stuart Dootson - 2009-09-25 13:27:20
Nice article, Mike. It illustrates nicely that while GCD has abstracted away a lot of the nitty-gritty of multithreading (the difficult stuff that caused programs to lock up, crash, or misbehave), there is still significant effort required to make programs scale well, and the problems you're going to face aren't necessarily the obvious ones…