mikeash.com: just this guy, you know?

Posted at 2010-01-22 16:57
Tags: bridging corefoundation fridayqna
Friday Q&A 2010-01-22: Toll Free Bridging Internals
by Mike Ash  

It's been a week, and once again, it's time for a Friday Q&A. For this week's edition, I'm going to talk about how toll-free bridging works, a topic suggested by Jonathan Mitchell.

What It Is
I hope that everyone reading this already knows what toll-free bridging is, but if you don't, here's a summary.

Toll-free bridging, or TFB for short, is a mechanism which allows certain Objective-C classes to be interchangeable with certain CoreFoundation classes. For example, NSString and CFString are bridged, which means that you can treat any NSString as if it were a CFString and vice versa. Example:

    CFStringRef cfStr = SomeFunctionThatReturnsCFString();
    NSUInteger length = [(NSString *)cfStr length];
    
    NSString *nsStr = [self someString];
    CFIndex cfLength = CFStringGetLength((CFStringRef)nsStr);
Most (but not all!) classes which exist in both Cocoa and CoreFoundation are bridged in this way. A bridged class will mention the bridging in its documentation.

Bridging From CF to ObjC
The way that classes are bridged from CoreFoundation to Objective-C (how a CFString can act like an NSString) is fairly straightforward.

Every bridged class is actually a class cluster, which means that the public class is abstract, and core functionality is implemented in private subclasses. The CoreFoundation class is given a memory layout that matches one of these private subclasses, which is built just for the job of being the Objective-C counterpart to the CoreFoundation class. Other Objective-C classes may also exist independent of this. From the outside, they all look and work the same, because they all share the same interface.

To put it concretely, look at NSString. NSString is an abstract class. Every time you create one, you actually get an instance of one of its subclasses.

One of those subclasses is NSCFString. This is the direct counterpart to CFString. The first field of a CFString is an isa pointer which points to the NSCFString class, which allows it to function as an Objective-C object.

NSCFString implements methods to work properly as an NSString. There are two ways that it can do this. One way is to implement every method as a stub which just calls through to its CoreFoundation counterpart. Another way is to implement every method to match what its CoreFoundation counterpart does. In reality, the code is probably a mix of the two.

For this direction, the mechanism of bridging is so simple it's almost not there at all. CFString objects just happen to be instances of NSCFString, which is a subclass of NSString, and which implements the methods needed to act like one. Many of those implementations just happen to call through to CoreFoundation to get their work done.
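To make the layout idea concrete, here's a toy model in plain C. It is only a sketch under stated assumptions, not Apple's actual structures: ToyClass, ToyCFString, ToyNSCFString, and toy_object_get_class are all made-up names, and the real class and string structures hold far more than this. The one thing it illustrates is that putting the isa pointer first lets generic object code find the class without knowing anything else about the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Objective-C class. The real runtime's class
   structure holds much more (superclass, method tables, and so on). */
typedef struct ToyClass {
    const char *name;
} ToyClass;

/* Toy "CFString": the isa pointer comes first, then the CF-side fields. */
typedef struct ToyCFString {
    const ToyClass *isa;   /* first field, so the struct can pose as an object */
    size_t length;
    const char *bytes;
} ToyCFString;

/* The private subclass that toy CF strings claim to be instances of. */
static const ToyClass ToyNSCFString = { "NSCFString" };

/* Generic code only needs to know that an object starts with an isa.
   A pointer to a struct also points at its first member, so this read is
   valid for any object laid out this way. */
const ToyClass *toy_object_get_class(const void *obj) {
    assert(obj != NULL);
    return *(const ToyClass *const *)obj;
}
```

Given a ToyCFString whose isa is set to &ToyNSCFString, toy_object_get_class hands back that class even though the function has never heard of ToyCFString.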

Bridging from ObjC to CF
Bridging in the opposite direction gets a bit more complicated. This is because any given instance of a TFB Objective-C class could be an instance of any number of classes, even custom classes created within the application. Just write a subclass of NSString and you have such a custom class. And yet these custom classes still work transparently with CoreFoundation function calls. You can call CFStringGetLength on an instance of your custom NSString subclass and it will invoke your -length method and return the result to the caller.

As it turns out, there's no particular magic to make this work. It's just pure brute force. The implementation of CFStringGetLength looks like this:

    CFIndex CFStringGetLength(CFStringRef str) {
        CF_OBJC_FUNCDISPATCH0(__kCFStringTypeID, CFIndex, str, "length");
    
        __CFAssertIsString(str);
        return __CFStrLength(str);
    }
The first line is an ugly macro that hides the secret to how TFB works on this side of things. It checks the isa of the object to see if it matches NSCFString. If it doesn't, then it's not a "real" CFString, but just some other Objective-C class. In that case, the CoreFoundation code doesn't know how to look up the length, so it just sends the length message to the object and returns the result. This is how custom subclasses work. If it is a "real" CFString, then it simply calls __CFStrLength which does the actual work of looking up the length of the string within the CFString structure, and returns that value.

In short: every CoreFoundation function for a TFB class first checks to see if the object being passed in is a "real" CoreFoundation object or a pure Objective-C class. If it's pure Objective-C, it simply calls through to the Objective-C side, and it's done. Otherwise, it proceeds normally. This is why I said it's pure brute force: every single function call has one of these checks at the top in order to make TFB work.
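Here's a rough sketch of that check in plain C. It is a toy model, not the real macro: every Toy name is hypothetical, and a single function pointer stands in for sending the length message through the Objective-C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyClass ToyClass;

/* Every object, native or not, starts with an isa pointer. */
typedef struct ToyObject {
    const ToyClass *isa;
} ToyObject;

/* Toy class: a name plus one "method" slot standing in for -length. */
struct ToyClass {
    const char *name;
    long (*length)(const ToyObject *self);
};

/* Native string layout: isa first, then the length stored inline. */
typedef struct ToyCFString {
    ToyObject base;
    long length;
} ToyCFString;

/* A custom "NSString subclass" that stores its data differently. */
typedef struct ToyCustomString {
    ToyObject base;
    const char *text;
} ToyCustomString;

/* Plays the role of __CFStrLength: reads the native structure directly. */
static long toy_str_length(const ToyCFString *str) {
    return str->length;
}

static long native_length(const ToyObject *self) {
    return toy_str_length((const ToyCFString *)self);
}

static long custom_length(const ToyObject *self) {
    return (long)strlen(((const ToyCustomString *)self)->text);
}

static const ToyClass ToyNSCFString = { "NSCFString", native_length };
static const ToyClass ToyCustomStringClass = { "MyString", custom_length };

/* Stand-in for CFStringGetLength. */
long ToyStringGetLength(const ToyObject *str) {
    assert(str != NULL && str->isa != NULL);
    if (str->isa != &ToyNSCFString) {
        /* Not a "real" string: bounce the call to the object's own method. */
        return str->isa->length(str);
    }
    /* A "real" string: skip dispatch and read the field. */
    return toy_str_length((const ToyCFString *)str);
}
```

Calling ToyStringGetLength on a ToyCFString reads the stored length directly, while calling it on a ToyCustomString ends up in custom_length, which is the same shape as a custom NSString subclass having its -length invoked by CFStringGetLength.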

This implementation has an interesting consequence. Consider for a moment what would happen if you messed up and passed, say, a CFArray to CFStringGetLength. The isa check would show that it's not an NSCFString, so it would go for the Objective-C dispatch. The end result is that you get an error like this:

    -[NSCFArray length]: unrecognized selector sent to instance 0x100108e50
That's an Objective-C error coming from pure CoreFoundation code!
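The failure mode can be imitated with a toy selector lookup in C. This is a sketch under assumptions, not how the Objective-C runtime is actually built: ToyMethod, ToyClass, and toy_msg_send are invented for illustration, and the real runtime raises an exception rather than returning an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical method table entry: selector name plus implementation. */
typedef struct ToyMethod {
    const char *selector;
    long (*imp)(const void *self);
} ToyMethod;

typedef struct ToyClass {
    const char *name;
    const ToyMethod *methods;
    size_t method_count;
} ToyClass;

typedef struct ToyObject {
    const ToyClass *isa;
} ToyObject;

/* The toy array class responds to "count" but not to "length". */
static long array_count(const void *self) {
    (void)self;
    return 0;
}

static const ToyMethod array_methods[] = { { "count", array_count } };
static const ToyClass ToyNSCFArray = { "NSCFArray", array_methods, 1 };

/* Looks up the selector on the object's class. Returns 0 and stores the
   result on success. Returns -1 and writes a message into err otherwise. */
int toy_msg_send(const ToyObject *obj, const char *selector,
                 long *result, char *err, size_t err_size) {
    assert(obj != NULL && obj->isa != NULL);
    const ToyClass *cls = obj->isa;
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].selector, selector) == 0) {
            *result = cls->methods[i].imp(obj);
            return 0;
        }
    }
    snprintf(err, err_size,
             "-[%s %s]: unrecognized selector sent to instance %p",
             cls->name, selector, (const void *)obj);
    return -1;
}
```

Sending "length" to an object whose isa is &ToyNSCFArray fails the lookup and produces a message of the same form as the one above, because by that point the string function has already handed control to the messaging side.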

Bridging Basic Behavior
That's how classes which are explicitly bridged work. But there's one more interesting aspect to TFB: basic behaviors shared by all objects are also bridged for all classes. In essence, NSObject is bridged to CFType. As one of the most common examples, it's possible to CFRetain any Objective-C object, and retain any CoreFoundation object. Just like the other bridging, if you've overridden retain in your Objective-C code, CFRetain will call that override. This works not only for memory management, but for any CFType function, like CFCopyDescription, and for any NSObject method, like performSelector:withObject:afterDelay:.

For the bridging to Objective-C, the first field of any CoreFoundation object points to an Objective-C class. For bridged classes it points to the Objective-C counterpart class, and for non-bridged classes it points to a special __NSCFType class. All of these classes are subclasses of NSObject (most of them indirectly), so naturally they inherit all of their behavior. For methods which map to CoreFoundation counterparts, these classes simply override them and call through to the CoreFoundation side as necessary.

For bridging to CoreFoundation, the mechanism is just like the specific bridging. The first line of CFRetain and all the other CFType functions checks to see if the object is a "real" CoreFoundation object or if it's some random Objective-C class. If it's a "real" CF object, then it does its normal job. Otherwise, it dispatches through to Objective-C and lets that side of things handle all the work.
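Both directions of the basic bridging can be sketched in one small C model. Treat it as an illustration only: the Toy names are made up, and the is_cf_backed flag is my assumption standing in for however the real code decides that an object is a "real" CoreFoundation object.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyClass ToyClass;

typedef struct ToyObject {
    const ToyClass *isa;
} ToyObject;

struct ToyClass {
    const char *name;
    int is_cf_backed;                       /* assumption: a simple flag */
    ToyObject *(*retain)(ToyObject *self);  /* stand-in for -retain */
};

/* A "real" CF object keeps its retain count inline. */
typedef struct ToyCFObject {
    ToyObject base;
    unsigned retain_count;
} ToyCFObject;

/* A custom Objective-C-style object with its own retain override. */
typedef struct ToyCustomObject {
    ToyObject base;
    unsigned override_calls;
} ToyCustomObject;

ToyObject *ToyRetain(ToyObject *obj);

/* The bridge class's -retain just calls through to the CF side. */
static ToyObject *cf_backed_retain(ToyObject *self) {
    return ToyRetain(self);
}

static ToyObject *custom_retain(ToyObject *self) {
    ((ToyCustomObject *)self)->override_calls++;
    return self;
}

static const ToyClass ToyNSCFType = { "__NSCFType", 1, cf_backed_retain };
static const ToyClass ToyCustomClass = { "MyObject", 0, custom_retain };

/* Stand-in for CFRetain: check first, then either dispatch or do the work. */
ToyObject *ToyRetain(ToyObject *obj) {
    assert(obj != NULL && obj->isa != NULL);
    if (!obj->isa->is_cf_backed) {
        return obj->isa->retain(obj);   /* let the override handle it */
    }
    ((ToyCFObject *)obj)->retain_count++;
    return obj;
}
```

ToyRetain on a ToyCustomObject lands in custom_retain, and invoking the retain slot on a ToyCFObject lands back in ToyRetain, which mirrors CFRetain calling an overridden retain and retain calling through to CoreFoundation.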

Creating Bridged Classes
I hope the title of this section didn't get anybody's hopes up, because the simple answer to this is: you can't. Now that we know how bridging works, it should be obvious why. You can't bridge an existing, unbridged CoreFoundation class because it requires massive cooperation on the CoreFoundation side. Every single function call needs to have a line at the top which checks the class of the object being passed in and dispatches to Objective-C if necessary, and you can't add that if it's not already there. And you can't create a new bridged CoreFoundation class because you can't create new CoreFoundation classes, period. That's a capability that Apple keeps for itself, and doesn't expose to the outside world. (And really, would you want to pepper class checks into every function you write? Just write a pure Objective-C class, it's simpler and prettier.)

Conclusion
Now you know the basics of how toll-free bridging works. If you're interested in the deeper technical details of just what the dispatching code looks like and how it works, check out ridiculous_fish's article on bridging.

That brings this week's edition to a close, but come back next week for yet another one. Friday Q&A is, of course, driven by user submissions. If you have an idea you would like to see covered in this space, send it in!


Comments:

On creating bridged classes, you sort of can, you just make the ObjC the primitive and the C side a wrapper for ObjC method dispatch.

From the outside, it looks the same. There's a CF-level function interface, there's an ObjC class, the types are interchangeable, and you can subclass the ObjC class and pass it to the C function interface.

Why would you do this? Well, there isn't a lot of reason, but sometimes people have files that they want to be pure C.
You can also optimise what Ken suggests by putting the implementation code into a set of C implementation functions, calling those from the ObjC methods, and calling them directly from the C API when you know that the object you've been given is of the class you expect.

Obviously this kind of thing is only for use in extreme cases :-)
I don't consider either of those to be bridging in the sense used by CoreFoundation. If you write an Objective-C class with a C wrapper, that's just an Objective-C class with a C wrapper. If you implement the class in C, that's just a C library with an Objective-C wrapper which then has a C wrapper too. Not to say that it isn't sometimes useful to do that, it's just not bridging.
What's the overhead of the macro CF_OBJC_FUNCDISPATCH0? I would have expected calling the CF C functions a little more efficient, particularly on a resource constrained device like the iPhone.

Perhaps I'll do a few micro-benchmarks and publish the results on my blog (which is in serious need of updating anyhow). My thought was that when building a reusable iPhone framework using CF/NS classes, one would be better off using the CF* equivalents.

Cheers,

Stu
The macro looks up the appropriate isa for the given CFType (which I believe is a single array lookup) and then compares it with the object's isa, so the overhead is basically two loads from memory. Not much. Using CF functions is faster than ObjC methods if the object is a "real" CF object; obviously it's slower if it's actually a pure Objective-C class and the call has to be bounced back to that side. Whether the overhead matters is another question entirely. As with any programming task, favor writing clear code over writing fast code unless speed is proven to matter in the real world. Just because you're on an iPhone doesn't make objc_msgSend automatically horrible. Remember that NeXTSTEP originally ran on 25MHz 68030s, a CPU roughly 10x slower than the iPhone's ARM, and it did everything with Objective-C.
Absolutely @mikeash - how far our hardware has come. I'm not one to prematurely optimize code (especially if I don't have numbers to back it up), but I thought it an interesting micro-benchmark exercise. I'm a firm believer that understanding the mechanics of your "hardware" helps you to make it perform better.

Cheers and thanks for the informative post,

Stu
This stuff is definitely still interesting and good to know even if you shouldn't be thinking about it all the time. Perhaps you could add some CF benchmarks to this stuff:

http://mikeash.com/?page=pyblog/performance-comparisons-of-common-operations-iphone-edition.html
Great idea, I'll take a look at the test suite and add an additional comparison between equivalent Obj-C / CF. I'll post the results here when I give it a go.
G'Day again Mike - ended up adding the benchmarks as you suggested. I posted some information on my blog, and the results are certainly interesting - tested 2nd and 3rd gen iPhones. My results show the IMP call is faster than the C++ virtual method call, as you originally had expected.

Blog post here: http://bit.ly/5pxjgk
Very nice. I like that you were able to benchmark two different types of hardware too.

It looks like the difference between the ObjC and CF calls is approximately equal to the cost of a message send, which is what I'd expect to see.

Note that in the particular case of iterating over stuff, the fast enumeration for/in syntax is almost certain to be the winner. It uses ObjC messaging to talk to the object, but returns objects in bulk and can return interior pointers, which means that the number of calls/messages will be much smaller than the number of objects in the array.

Of course, there's a lot more stuff out there than just iteration, and no equivalent to fast enumeration for the rest.
Indeed, the ObjC / CF is pretty much a non-issue on newer hardware. Certainly not worth the effort.

I was incredibly impressed by the IMP call on the Cortex CPU...wow.

I've implemented the enumerator code before to use the for/in syntax (a simple state machine). It would be nice if Apple introduced a 'yield' keyword like other languages, to save us having to write the state machine code all the time. Regardless, it's fairly easy to write and the performance pretty close to the metal.
Folks, this whole discussion is pointless regarding the iPhone: If you look at the Darwin sources, you'll see that the iPhone's CoreFoundation classes are actually implemented in ObjC.

That's right: CFArray and the likes are ObjC classes on the iPhone.
I do not know what percentage of CF is implemented in ObjC or using the Cocoa APIs. What I found is that if you decide to toll-free bridge between them, you should expect penalties only when going from Cocoa to CF, not the opposite, and that using only pure CF calls is still faster than using their Cocoa equivalents. This is what I found:

Fastest: CFArray accessed with CFArrayGetValueAtIndex
Faster: CFArray accessed with -objectAtIndex
Reference: NSArray accessed with -objectAtIndex
Slower: NSArray accessed with CFArrayGetValueAtIndex

This is consistent among several devices and OS, although actual differences may vary.

Joan Lluch-Zorrilla

btw. Just let me point out that some of your site's challenge-response questions are not fair to people outside the US or to those who do not have English (especially US English) as their primary language. Consider this one: "Type the word "humour", but with American spelling". That question definitely cannot be solved in a fair way by all humans.

Are there actually people in the world who are sufficiently literate in English to be able to make an intelligent post to this site, but who are incapable of googling for "humour american spelling"?
Mike,

Well, that was my point with the "humour" word thing. I certainly could guess the right spelling before searching, but I *had* to google it to know for sure. I am not a native English speaker; English is actually my fourth language, and I learned the little I know in a non-English-speaking country where only British spelling was taught. But yes, you are right that anyone should be able to solve that challenge after all, if she/he is able to write some English. I think I should #define UIColour UIColor to avoid compiler errors in my code (sorry, just joking).

Thanks and keep with your excellent blog which definitely has taught me a lot.
I can certainly see how it's a little harder for you to fill in the blank, but you can google for it and solve it without much trouble. If it were a question that a non-native speaker simply couldn't answer at all that would be one thing, but as it is, it just makes it slightly harder.

Your #define is amusing. Probably a bad idea, sadly. Although Apple has not been above the occasional #define to fix misspellings in their public APIs.
