mikeash.com: just this guy, you know?

Posted at 2010-03-12 18:51
Friday Q&A 2010-03-12: Subclassing Class Clusters
by Mike Ash  

Welcome to another chewy edition of Friday Q&A. This week, Gwendal Roué has suggested talking about the techniques of subclassing class clusters.

Abstract Classes
To subclass a class cluster, you need to know what it is, and to understand class clusters you must first understand the concept of abstract classes. It's an easy concept, though.

An abstract class is a class which is not fully functional on its own. It must be subclassed, and the subclass must fill out the missing functionality.

An abstract class is not necessarily an empty shell. It can still contain a lot of functionality all on its own, but it's not complete without a subclass to fill in the holes.
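
As a rough illustration of that idea, here is a minimal sketch of an abstract class and a concrete subclass. The names MyAbstractRect and MyUnitSquare are hypothetical and exist only for this example. The holes are -width and -height, and -area is functionality the abstract class can provide by itself once a subclass fills them in.

```objc
#import <Foundation/Foundation.h>

// Hypothetical abstract class. -width and -height are the holes that a
// subclass must fill in; -area is functionality built on top of them.
@interface MyAbstractRect : NSObject
- (double)width;
- (double)height;
- (double)area;
@end

@implementation MyAbstractRect

- (double)width
{
    // No sensible default exists, so fail loudly if a subclass forgot this.
    [self doesNotRecognizeSelector: _cmd];
    return 0;
}

- (double)height
{
    [self doesNotRecognizeSelector: _cmd];
    return 0;
}

- (double)area
{
    // Works for any subclass that supplies -width and -height.
    return [self width] * [self height];
}

@end

// A concrete subclass only has to supply the missing pieces.
@interface MyUnitSquare : MyAbstractRect
@end

@implementation MyUnitSquare
- (double)width  { return 1.0; }
- (double)height { return 1.0; }
@end
```

Sending -area to a MyUnitSquare works, while sending it to a bare MyAbstractRect raises an exception from -doesNotRecognizeSelector:.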

Class Clusters
A class cluster is a hierarchy of classes capped off by a public abstract class. The public class provides an interface and a lot of auxiliary functionality, and then core functionality is implemented by private subclasses. The public class then provides creation methods which return instances of the private subclasses, so that the public class can be used without knowledge of those subclasses.

Take NSArray as an example. It's an abstract class which requires its subclasses to provide implementations of the count and objectAtIndex: methods. It then provides a bunch of methods built on top of those two, such as indexOfObject:, objectEnumerator, makeObjectsPerformSelector:, and many more.

The core functionality is then implemented in private subclasses such as NSCFArray. The NSArray creation methods such as +arrayWithObjects: or -initWithContentsOfFile: then produce instances of those private subclasses.

From the outside, the cluster nature of NSArray is not readily apparent most of the time. It usually makes itself known if you start introspecting the classes of objects, and confuses programmers when they create an NSArray and then start getting messages about an NSCFArray. Other than that, NSArray mostly looks and acts like any other class.
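
One way to see this for yourself is to ask an array for its class. The helper below is a small sketch (MyDescribeArrayClass is a made-up name, not part of Foundation) that reports the runtime class alongside an isKindOfClass: check. The private class name it reports is an implementation detail and varies between system versions, so no particular output is shown here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reports the runtime class of an array next to an
// isKindOfClass: check against the public cluster class.
static NSString *MyDescribeArrayClass(NSArray *array)
{
    return [NSString stringWithFormat: @"runtime class %@, isKindOfClass NSArray: %@",
            NSStringFromClass([array class]),
            [array isKindOfClass: [NSArray class]] ? @"YES" : @"NO"];
}
```

Calling NSLog(@"%@", MyDescribeArrayClass([NSArray arrayWithObjects: @"a", @"b", nil])) should show a private subclass name, while the isKindOfClass: check still succeeds.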

There is one place where the cluster nature is hugely important, and that's if you subclass the public class yourself.

Subclassing
Subclassing a class cluster (which means subclassing an abstract class) is completely different from subclassing a normal class.

When subclassing a normal class, your superclass provides full functionality for whatever it does. A subclass with an empty implementation is completely valid in this case, and will behave just like the superclass. You can then add methods to your implementation to add new functionality or override existing functionality.

When subclassing a class cluster, your superclass does not provide full functionality. It provides a lot of ancillary functionality, but you must provide the core yourself. This means that an empty subclass is not valid. There is a minimum set of methods that you must implement.

In class cluster terminology, those methods that you must implement are called primitive methods. How do you find them? There are two easy ways.

The first way is to crack open the documentation for the cluster class and search it for the word "primitive". The docs will tell you which methods you have to override.

The second way is to open the header for the cluster class. Primitive methods are always found in the class's main @interface block. Additional methods provided by the cluster are always found in categories.

Watch out when looking at cluster classes which are themselves subclasses of another cluster class. The result inherits all primitive methods, and you must implement both sets. For example, NSMutableArray has five primitive methods of its own plus the two from NSArray. If you subclass NSMutableArray, you must provide implementations for all seven.

Techniques
Now you know what to implement, but how? There are three main ways.

First, you can simply provide your own implementation of the primitive methods, implementing them all from scratch. For example, imagine you're writing a specialized array optimized for holding two elements:

    @interface MyPairArray : NSArray
    {
        id _objs[2];
    }
    
    - (id)initWithFirst: (id)first second: (id)second;
    
    @end
    
    @implementation MyPairArray
    
    - (id)initWithFirst: (id)first second: (id)second
    {
        if((self = [self init]))
        {
            _objs[0] = [first retain];
            _objs[1] = [second retain];
        }
        return self;
    }
    
    - (void)dealloc
    {
        [_objs[0] release];
        [_objs[1] release];
        [super dealloc];
    }
    
    - (NSUInteger)count
    {
        return 2;
    }
    
    - (id)objectAtIndex: (NSUInteger)index
    {
        if(index >= 2)
            [NSException raise: NSRangeException format: @"Index (%lu) out of bounds", (unsigned long)index];
        return _objs[index];
    }
    
    @end
Precisely how you implement the primitives depends, of course, on precisely what you want them to do.
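
To show how far that freedom goes, here is a second from-scratch sketch in which the array stores no objects at all and computes each element on demand. MyRangeArray is a hypothetical class invented for this example. It only has scalar instance variables, so it needs no -dealloc and should compile the same way with or without ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical immutable array holding the integers
// start, start + 1, ..., start + count - 1 as NSNumbers.
@interface MyRangeArray : NSArray
{
    NSUInteger _start;
    NSUInteger _count;
}

- (id)initWithStart: (NSUInteger)start count: (NSUInteger)count;

@end

@implementation MyRangeArray

- (id)initWithStart: (NSUInteger)start count: (NSUInteger)count
{
    if((self = [self init]))
    {
        _start = start;
        _count = count;
    }
    return self;
}

// primitive method 1
- (NSUInteger)count
{
    return _count;
}

// primitive method 2: nothing is stored, the element is built on request
- (id)objectAtIndex: (NSUInteger)index
{
    if(index >= _count)
        [NSException raise: NSRangeException format: @"Index (%lu) out of bounds", (unsigned long)index];
    return [NSNumber numberWithUnsignedInteger: _start + index];
}

@end
```

With just those two methods in place, inherited methods such as -lastObject, -containsObject:, and -indexOfObject: should work on a MyRangeArray, because NSArray builds them on top of the primitives.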

Second, you can keep a working instance around, obtained from the public API, and pass your calls through to it:

    @interface MySpecialArray : NSArray
    {
        NSArray *_realArray;
    }
    
    - (id)initWithArray: (NSArray *)array;
    
    @end
    
    @implementation MySpecialArray
    
    - (id)initWithArray: (NSArray *)array
    {
        if((self = [self init]))
        {
            _realArray = [array copy];
        }
        return self;
    }
    
    - (void)dealloc
    {
        [_realArray release];
        [super dealloc];
    }
    
    - (NSUInteger)count
    {
        return [_realArray count];
    }
    
    - (id)objectAtIndex: (NSUInteger)index
    {
        id obj = [_realArray objectAtIndex: index];
        // do some processing with obj
        return obj;
    }
    
    // maybe implement more methods here
    
    @end
This technique allows you to reuse the existing implementations of the primitive methods, and then add more functionality.
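
The same pass-through approach extends to a mutable cluster, where all seven primitives (two from NSArray, five from NSMutableArray) need forwarding. The following is a sketch only: MyCountingMutableArray and its -mutationCount method are hypothetical, and only -init is overridden, so creation methods that go through other initializers such as -initWithCapacity: are not covered. The -dealloc is compiled only without ARC, which assumes a compiler that supports __has_feature (clang does).

```objc
#import <Foundation/Foundation.h>

// Hypothetical subclass: forwards every primitive to a real NSMutableArray
// and counts how many mutating primitive calls it has seen.
@interface MyCountingMutableArray : NSMutableArray
{
    NSMutableArray *_realArray;
    NSUInteger _mutationCount;
}
- (NSUInteger)mutationCount;
@end

@implementation MyCountingMutableArray

- (id)init
{
    if((self = [super init]))
        _realArray = [[NSMutableArray alloc] init];
    return self;
}

#if !__has_feature(objc_arc)
- (void)dealloc
{
    [_realArray release];
    [super dealloc];
}
#endif

- (NSUInteger)mutationCount { return _mutationCount; }

// The two primitives inherited from NSArray
- (NSUInteger)count { return [_realArray count]; }
- (id)objectAtIndex: (NSUInteger)index { return [_realArray objectAtIndex: index]; }

// The five primitives added by NSMutableArray
- (void)insertObject: (id)obj atIndex: (NSUInteger)index
{
    _mutationCount++;
    [_realArray insertObject: obj atIndex: index];
}

- (void)removeObjectAtIndex: (NSUInteger)index
{
    _mutationCount++;
    [_realArray removeObjectAtIndex: index];
}

- (void)addObject: (id)obj
{
    _mutationCount++;
    [_realArray addObject: obj];
}

- (void)removeLastObject
{
    _mutationCount++;
    [_realArray removeLastObject];
}

- (void)replaceObjectAtIndex: (NSUInteger)index withObject: (id)obj
{
    _mutationCount++;
    [_realArray replaceObjectAtIndex: index withObject: obj];
}

@end
```

Inherited convenience methods may call several primitives per invocation, so the counter reflects primitive calls rather than high-level operations.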

The third technique is to simply add a category to the cluster class instead of subclassing it. People often subclass simply to add new methods, and not to modify existing functionality. In Objective-C, you can add new methods in a category:

    @interface NSArray (FirstObjectAdditions)
    
    - (id)my_firstObject;
    
    @end
    
    @implementation NSArray (FirstObjectAdditions)
    
    - (id)my_firstObject
    {
        return [self count] ? [self objectAtIndex: 0] : nil;
    }
    
    @end
(The method is prefixed to prevent a conflict if Apple should ever add a firstObject method.)

Conclusion
Class clusters are different from normal classes, but are easy to subclass once you understand the differences and what they mean. You're required to implement the class cluster's primitive methods, which you can do by providing a from-scratch implementation, or by passing through to another instance. Finally, if your only purpose in subclassing is to add new methods, create a category instead.

That's it for this week. Come back in seven days for another crunchy post. Until then, keep your ideas coming. Friday Q&A is driven by reader ideas, so if you have a topic that you would like to see covered here, send it in!


Comments:

Are you sure you meant to do:

if ((self = [self init])) { ... }

Shouldn't it be [super init]?
No, because I'm not implementing init, so I don't need super. In this particular case, both are equivalent, because I don't override init, but if I did, I would want it to be called.

See http://www.mikeash.com/pyblog/the-how-and-why-of-cocoa-initializers.html
Is it true that there are no compiler-enforced abstract classes in Obj-C? That's a shame if so.
There aren't. I've never found it to be a problem. The typical pattern is to call out the necessary overrides in the docs/header, and implement the methods to log an error so it's easy to find at runtime:

- (void)somePrimitiveMethod
{
    NSLog(@"-[AbstractClass somePrimitiveMethod] called, this should never happen. Did you forget to implement -[%@ somePrimitiveMethod]?", [self class]);
    [self doesNotRecognizeSelector: _cmd];
}

This works well enough.
There’s not much compiler-enforced anything in Objective-C, and the idea doesn’t fit the nature of the language. How would you go about doing this in the compiler? Object creation doesn’t involve a special operator, and you need to be able to refer to the class object for things like isKindOfClass: checks.

If you really wanted, you could override +alloc to refuse to allocate instances of an abstract class, but there’s little advantage to having a runtime failure in +alloc rather than slightly later when you try to actually use the object (or in the initializer).
For those wondering... Class clusters are more commonly known as the Abstract Factory design pattern. Not sure why Cocoa has to use different terminology.
Trevor: for the simple reason that Cocoa’s usage predates the GoF book and the habit of Patterns With Names in Capital Letters.

Incidentally, there’s another problem with compiler checking of virtual classes: it’s normal and correct to use the standard allocation pattern with a class cluster, as in [[NSArray alloc] initWithSomething]. In this case, a subclass (or proxy) could validly be returned by +alloc, -initWithSomething, or both, and this could change between system versions.
From what I can tell, "class cluster" describes much more than Abstract Factory, although it involves an Abstract Factory. Abstract Factory is simply a class which instantiates objects of other, hidden classes. Class clusters involve that, but also have the notion of explicitly allowing subclasses, with required primitive methods and a great deal of concrete functionality implemented in the base class that's built on top of the primitives. I don't see any of those aspects in the descriptions of Abstract Factory that I've found.
There are two things that a compiler can enforce with abstract classes: definition and instantiation. While I agree that detecting instantiation of abstract classes wouldn't make a lot of sense in Obj C, it would be possible to treat an incomplete implementation as an error. It would be fairly easy to tag abstract classes and methods (e.g. @abstract) and then refuse to compile if a concrete subclass does not implement all of the abstract methods from its superclasses. Personally, I am a fan of letting the compiler do this kind of work for me, rather than scouring the documentation or blowing up at runtime.
Alex J: I don't think your plan is as simple as you think. What if I want to subclass an abstract class to create another abstract class by not overriding all of the necessary methods, like NSMutableString does? You could allow that with extra keywords and such, but it adds complication.

Worse, I think, is the fact that method implementations cannot all be seen by the compiler at compile time. What if your class is concrete by virtue of a category that the compiler doesn't know about? What if it's concrete because you add the necessary methods at runtime? Again, you can fix this with more extra keywords, but that's more complication. Although ObjC 2.0 is getting away from it to some extent, ObjC's underlying philosophy is to be a minimal set of extensions to C, so I think a complicated system of abstract class keywords wouldn't be worth it.
Great write-up as usual. I'm more curious though about your take on creating class clusters; specifically, when it's appropriate or not, how to safely patch the init chain when instantiating a subclass, etc.
I don't think I've ever created one or seen a reason to. Class clusters are usually most useful for an API provider, since they make it easy to change the implementation of something behind the scenes without breaking client code, and provide specialized implementations for different uses. When you're writing both sides, you can just access stuff more directly, or use a more explicit factory pattern.

For the implementation, I'd probably do something like:

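    @implementation MyAbstractClass
    
    + (id)allocWithZone: (NSZone *)zone
    {
        // hand out the factory placeholder only when the abstract class itself is allocated
        if(self == [MyAbstractClass class])
            return [MyAbstractClassInitFactory allocWithZone: zone];
        else
            return [super allocWithZone: zone];
    }
    
    @end
    
    @implementation MyAbstractClassInitFactory
    
    - (id)init
    {
        // the placeholder is discarded and replaced by a concrete instance
        [self release];
        return [[MyConcreteClassThatHandlesInit alloc] init];
    }
    
    - (id)initWithSomeObject: (id)obj
    {
        [self release];
        return [[MyConcreteClassThatHandlesSomeObject alloc] initWithSomeObject: obj];
    }
    
    // etc.
    
    @end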
