mikeash.com: just this guy, you know?

Posted at 2009-02-20 20:40
Tags: cocoa fridayqna ipc performance
Friday Q&A 2009-02-20: The Good and Bad of Distributed Objects
by Mike Ash  

Welcome back to another Friday Q&A. This week I'm going to take Erik's (no last name given) suggestion from my interprocess communication post and expand a bit on Distributed Objects, what makes it so cool, and the problems that it has.

Overview
I'm not going to take a detailed look at the basics of Distributed Objects, as there are plenty of resources out there for that. But for those readers who are unfamiliar with Distributed Objects, I'll give a quick look at what it is.

Distributed Objects (DO) is a Cocoa API for interprocess communication built on automatic proxying of messages, so that remote objects look and act mostly like local objects. The primary interface to the DO system is the NSConnection class. Vending an object with DO is as easy as this:

    // Create a connection and vend theObject under a well-known name.
    NSConnection *connection = [NSConnection connectionWithReceivePort:[NSPort port] sendPort:nil];
    [connection setRootObject:theObject];
    [connection registerName:@"com.example.whatever"];

And accessing that vended object from another process is as easy as this:

    // Look up the vended object by name; host:nil means the local machine.
    id theObject = (id)[NSConnection rootProxyForConnectionWithRegisteredName:@"com.example.whatever" host:nil];
    [theObject someMethod];

From this point, theObject acts as though it were local. You can send messages to it just like any other object. You can pass it local objects as arguments or get remote objects back as return values, and DO takes care of proxying or serializing the objects as appropriate. It's very cool, and there's literally nothing else that needs to be done other than the above code if you just need basic functionality.

The Good
I think this should be pretty clear from the above description, but let's quickly compare it to other IPC mechanisms:

  1. It's easy. Just a few lines of code to set up a fully functional connection.
  2. It's transparent. For the most part, remote objects can be passed around just like local objects. This means that little of your code needs to be DO-aware.
  3. It's flexible. DO can be used over Mach ports or sockets. It can be used to communicate between threads or between processes. It's reasonably configurable.
  4. It's robust. Because DO works the same way as Objective-C messages, you don't have the same problems you might have trying to support two different protocol versions. Of course it's not always rosy, but if your changes involve implementing different methods, it's easy to check for what's going on using respondsToSelector: and the like, rather than having to give up.
All of this makes DO a very useful facility.
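
The version check from point 4 could look roughly like this. This is only a sketch: the `ItemServer` protocol, its two methods, and the `FetchItems` helper are hypothetical names made up for illustration. Keep in mind that, as far as I know, respondsToSelector: on a distant object is itself forwarded to the remote side, so the check costs a round trip.

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol; these method names are illustrative only.
@protocol ItemServer
- (NSArray *)fetchItems;                                     // present in every version
- (NSArray *)fetchItemsWithOptions:(NSDictionary *)options;  // added in a later version
@end

// Hypothetical helper: server is a distant object obtained from NSConnection.
static NSArray *FetchItems(id server, NSDictionary *options)
{
    // Prefer the newer method when the remote side implements it.
    if ([server respondsToSelector:@selector(fetchItemsWithOptions:)])
        return [server fetchItemsWithOptions:options];

    // Otherwise fall back to the method every version supports.
    return [server fetchItems];
}
```

The nice part is that this is exactly the code you would write for a local object whose class might be older than you expect.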

The Bad
So if DO is so great, why am I not a bigger fan? Part of it is simply because DO can't completely abstract away the fact that it's running over a transport layer and talking to a remote process, and part of it is because DO itself is just not as good as it could be.

One leaky abstraction is primitive types. Everything that gets used as an argument or returned as a value must either be serialized (i.e. copied across the connection) or proxied with a "distant object". For objects, this is fine. Objects that want to be copied can implement NSCoding, and all other objects can use Objective-C's built-in message capturing facilities to proxy all requests across the connection.

For primitives, things get harder. For scalars and even structs, the Objective-C runtime provides enough type information that these can be copied across. But once you hit pointers, things fall apart. Imagine trying to proxy a call like this:

    [array getObjects:objarray range:NSMakeRange(5, 13)];
It basically can't be done. DO would have to somehow know that the length of the objarray parameter is determined by the length of the range being passed in, and copy only that much memory across the connection. It would also have to know that this is a return-by-reference only, and that it shouldn't be trying to serialize or proxy the contents of objarray across the connection (it could be filled with junk, and an attempt to proxy that junk would crash). Yes, DO could special-case this particular method, but it won't be able to deal with arbitrary such methods.

DO does have some interesting language-level facilities to help with this. You can specify a pointer parameter as being in, out, or inout so that it knows which way to serialize or proxy. But this only works with pointers to single objects. For arrays, it just can't cope.
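
For readers who haven't seen these qualifiers, here is roughly what they look like in a protocol declaration. The `Lookup` protocol, its methods, and the `ConnectToLookupServer` function are hypothetical; the qualifiers themselves (`in`, `out`, `inout`, `bycopy`, `byref`, `oneway`) are part of the language, and setProtocolForProxy: is an NSDistantObject method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol for a vended object.
@protocol Lookup

// out: the pointer is only used to pass a result back to the caller,
// so nothing needs to be sent to the server for it.
- (BOOL)getTitle:(out NSString **)title forKey:(NSString *)key;

// bycopy: ask for the dictionary to be serialized across the connection
// rather than proxied.
- (void)setSettings:(in bycopy NSDictionary *)settings;

// oneway: the caller does not wait for a reply.
- (oneway void)ping;

@end

// Hypothetical client-side setup, reusing the registered name from earlier.
static id ConnectToLookupServer(void)
{
    id proxy = [NSConnection rootProxyForConnectionWithRegisteredName:@"com.example.whatever" host:nil];

    // Telling the proxy which protocol the remote object implements lets it
    // look up method signatures locally.
    [proxy setProtocolForProxy:@protocol(Lookup)];
    return proxy;
}
```

Note that every qualified pointer here refers to a single object. There is still no way to say "this points to n items, where n comes from that other argument".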

Another leaky abstraction is that the process that you're talking to could disappear at any moment. Objective-C just isn't set up to deal with this very well:

    id obj = [self method];
    [obj thing1];
    [obj thing2];
In normal Objective-C, obj is either broken from the start (in which case you'll crash) or it remains valid throughout the method. But if obj is a distant object, suddenly things are not so clear. The remote process could disappear (or freeze up and time out) in between the call to thing1 and thing2.

When that happens, DO deals with it by throwing an exception. Surprise!

Most Objective-C code is not exception safe. Although there's no particular reason that exceptions can't be used more frequently the way they are in languages like Java, convention is that exceptions in Objective-C are only used to indicate a programming error. In order to really robustly use DO, you need to write your code such that it can handle an exception being thrown by any interaction with a distant object. Worse yet, this includes Cocoa code, meaning that you essentially cannot pass a distant object to any Cocoa code. (Ever wonder what happens when an NSSet sends -hash to your object, and -hash throws an exception? Odds are fair that it leaves the NSSet in an inconsistent state that will lead to a crash.)

This requirement for all code touching distant objects to be exception safe is tough, and greatly limits the places in which DO can be practically used. The promise is that remote objects look like local objects, and they mostly do, but this one (absolutely necessary) detail means that they can't be used like local objects at all.
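
To make the burden concrete, here is a minimal sketch of what a defensive call site ends up looking like. The `Thing` protocol and the `DoBothThings` helper are hypothetical, and the five-second timeouts are arbitrary. The exception names mentioned in the comment are real Foundation constants, but which one you actually get depends on how the connection fails.

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol declaring the two methods from the example above.
@protocol Thing
- (void)thing1;
- (void)thing2;
@end

// Hypothetical helper: obj is assumed to be a distant object (NSDistantObject).
// Returns NO instead of letting a DO exception escape to the caller.
static BOOL DoBothThings(id obj)
{
    @try {
        // Without explicit timeouts, a frozen remote process can block the call
        // for a very long time. connectionForProxy is an NSDistantObject method.
        NSConnection *connection = [obj connectionForProxy];
        [connection setRequestTimeout:5.0];
        [connection setReplyTimeout:5.0];

        [obj thing1];
        [obj thing2];
        return YES;
    }
    @catch (NSException *exception) {
        // Likely candidates include NSPortTimeoutException,
        // NSInvalidSendPortException, and NSObjectInaccessibleException.
        NSLog(@"Remote call failed: %@ (%@)", [exception name], [exception reason]);
        return NO;
    }
}
```

This only protects code you wrote yourself. It does nothing for the case where the distant object has been handed to code that isn't expecting exceptions.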

Lastly, DO is not very modular or extensible. Ideally, DO would be a fully modular system. You'd have the DO system which would sit on top of some kind of interchangeable transport class. Customizing the transport class (for example, to make it use encryption, or talk to a serial port, or use avian carriers) would simply be a matter of subclassing a public abstract class and implementing a documented set of primitive methods.

The reality is not so simple. The classes that DO uses internally are fairly tightly coupled, and there's a lot of legacy cruft. Implementing a custom NSPort subclass that works with NSConnection is so difficult that I'm only aware of one working example (Secure Distributed Objects, which appears to be a dead project now). This pretty much sinks the idea of using DO for any serious network communication, since DO doesn't encrypt the transport and it's not practical to add encryption to it.

Conclusion
Distributed Objects is a very cool system that has many uses. Unfortunately, due to both the constraints under which it works and some poor design decisions, it's not as useful as it could be. It can still be handy for doing IPC and it's a great tool to have in Cocoa, but it falls short of being a no-brainer way to talk to other processes.

That wraps up this week's Friday Q&A. Check back next week for another exciting installment. In the meantime, keep those ideas coming. Friday Q&A is powered by your submissions, so don't be shy. Post your topic ideas in the comments below or e-mail them directly to me. (Yes, I link my real e-mail address directly on the web! How can you refuse that!) If I use your suggestion then I will use your name unless you tell me otherwise.

Love Distributed Objects? Think your pet RPC mechanism is better? Fire away.


Comments:

I love the idea of DO, but the problems rearing up can be huge. It's much easier to use DO for quick communications than to actually hang on to a remote object.

You forgot one thing, too. DO uses NSCoding to send objects, but AFAIK it doesn't support keyed coding. Which means you need to keep supporting the much more easy-to-break unkeyed system, which is all but legacy now.

I wish DO+GC resulted in dropped objects becoming nil, or that you could set something so that return values became nil when the connection disappears. Throwing an exception is one of the more inconvenient choices for signaling this error.
Having dropped objects become nil is problematic for various reasons. First, there's no way to find all references to an object and reliably nil them out (not even under GC, zeroing weak references are a special case and only work in heap memory). Second, having the contents of a Cocoa collections class which doesn't allow nil to suddenly become nil would be a bad thing.

A more benign failure mode would be useful, certainly. That's actually something that you could add yourself with a little NSProxy subclass. The bad news is that it would require an explicit check and instantiation of your proxy class in all code that could obtain a distant object.
Are there any potential problems developers should be aware of when using DO between processes of which one is running under GC and another's memory is managed manually? There have been some posts on mailing lists about some problems potentially related to DO & GC. I haven't noticed any problems with this setup myself yet, though.

In my own Cocoa projects I have noticed that DO's transparency disappears pretty quickly in practice when one starts to build exception-catching, operation-retrying, non-blocking, timing-out proxies around objects vended over DO. However, it still seems like a good choice for IPC between two Cocoa processes.
I've never used DO and GC together so this is purely theory, but I don't think there would be a problem. Memory management stays internal to each process. Proxies get created in each process, and those proxies will live or die by the memory management system in use in that process. The major problem with GC is that I don't believe that circular references which cross over the DO boundary will be collected, because the collector can't trace the connections in the remote process.
I'm a little confused about your hypothetical of calling -hash on an NSSet with DO and having that set be left in an inconsistent state. Could you please elaborate on this example?
It's not calling hash on an NSSet, it's NSSet calling hash on your object, and this call throws an exception because the DO connection died. Imagine code structured like this:

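@implementation NSSet

// Hypothetical internals, for illustration only.
- (void)doSomeThing
{
    [self startTheThing];                  // may leave the set temporarily inconsistent
    [self findBucket: [yourObject hash]];  // -hash can throw if the DO connection died
    [self endTheThing];                    // never runs if -hash throws
}

@end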

If startTheThing puts the set into an inconsistent state, and endTheThing completes the work and puts the set back where it was, throwing an exception inside hash will cause problems.
Oh okay, thanks for clarifying, I understand better now.
Is it possible that distributed objects (with a small d, small o; ie the concept, not Apple's implementation) is simply a bad abstraction?

There are a number of abstractions that seemed cool from an engineering standpoint (RPC, networked VM, publish-and-subscribe, DDE then OLE) but which basically never took off. In at least some of these cases there has been massive company pressure behind them, but no real traction among users or developers. These things seem to be what I'll call "bad abstractions". They're trying to solve a problem by reducing it to an already solved problem, but the gap between the reality and the metaphor is just too large to be straddled.

In the 90s there was this whole industry that grew up around the idea that computers would talk to servers via RPC and distributed objects, and it didn't happen. Not because talking to servers was a bad idea but, as far as I can tell, because talking to servers works better with network-aware constructs, which make the errors specific to networks more visible but also easier to handle appropriately. And so: HTTP.
Similarly with RPC, which is still done, of course, but with things like REST and SOAP, not the classic Sun-style RPC.

We saw a similar thing with presentation where, rather than the "one GUI everywhere" idea of Java (and some competitors), the solution that won didn't pretend to be something it wasn't. HTML wasn't powerful, but it did what it did in a way that matched the task to be solved. It didn't create a UI layer that sorta pretended to be Mac or Windows while not actually working like either.

The point is that Apple has probably looked at this history and concluded that, regardless of what people might claim, distributed objects is a loser technology, and there's no point in trying to make it work better. If you want to do anything serious over the network, you're probably going to be a lot happier using network-appropriate primitives rather than trying to hide the existence of the network.
