mikeash.com: just this guy, you know?

Posted at 2011-02-18 16:20
Tags: c99 fridayqna
Friday Q&A 2011-02-18: Compound Literals
by Mike Ash  

We're back to our regular schedule at last. For today's edition, I'm taking a break from the usual reader-driven format to discuss a topic of my own choosing: compound literals in C99.

Compound literals are a relatively unknown feature of C. Introduced as part of the C99 standard, which was published in 1999, they've been around for a while, but for a language that dates to the early 1970s, they're a recent addition.

C99 added a lot of useful features to the language that modern Mac and iOS programmers tend to take for granted. Many of these existed as compiler extensions beforehand. Simple features like // comments, the long long type, and the ability to mix variable declarations and code are all new in C99. Compound literals are much less well known than these features, but are equally standard and can be handy, and this is why I want to talk about them today.

Compound Literal Basics
Compound literals provide a way to write values of arbitrary data types in code. An expression like "hello" is a string literal; it is an array of char that decays to char * in most expressions. A compound literal is simply a different kind of expression that has whatever type you're after. For example, it's possible to create an expression which produces the same C string as the string literal, but with explicit character-by-character construction:

    (char []){ 'h', 'e', 'l', 'l', 'o', '\0' }
This is not particularly useful, of course. However, more useful things can be done with the syntax as well:
    (NSSize){ 1, 2 }
This is equivalent to NSMakeSize(1, 2) but without the need for an external function. Similar syntax will work for any type, even custom-defined structs.
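
To illustrate the custom struct case in plain C, here is a minimal sketch. The Size struct and the size_area and example_area functions are invented for this example and are not part of Cocoa or any other library.

```c
#include <assert.h>

/* Hypothetical struct, defined only for this example. */
typedef struct {
    int width;
    int height;
} Size;

/* Takes its argument by value, so a compound literal can be passed directly. */
int size_area(Size s)
{
    return s.width * s.height;
}

/* Builds the struct inline, with no named temporary and no "make" function. */
int example_area(void)
{
    int area = size_area((Size){ 3, 4 });
    assert(area == 12);
    return area;
}
```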

Compound literal syntax closely matches variable initialization syntax. For example:

    NSSize s = { 1, 2 };
    (NSSize){ 1, 2 }; // same value
    
    int x[] = { 3, 4, 5 };
    (int []){ 3, 4, 5 }; // same
And in general, if a variable is declared with an initializer, then a compound literal with the same type and value can be written by sticking the type in parentheses and placing the initializer immediately after it:
    Type name = { val };
    (Type){ val };
There is one exception to this rule. Primitive types (like int) don't require {} to be initialized, but {} is still required to create a compound literal. It is not the same to write (int)3 and (int){ 3 }, although they act similarly in many cases. The former simply takes the integer constant 3 and uselessly casts it to int, whereas the latter is essentially a variable declaration with no name.
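
The difference shows up as soon as you need an address. The sketch below is plain C with made-up function names (double_in_place, example_lvalue); it assumes nothing beyond a C99 compiler.

```c
#include <assert.h>

/* Doubles the int that p points to; needs a real object to point at. */
void double_in_place(int *p)
{
    *p *= 2;
}

int example_lvalue(void)
{
    /* (int){ 3 } is an unnamed int object, so its address can be taken. */
    int *p = &(int){ 3 };
    double_in_place(p);
    assert(*p == 6);

    /* By contrast, &(int)3 does not compile: a cast yields a plain value,
       not an object, so there is nothing to take the address of. */
    return *p;
}
```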

Basic Tricks
The ability to create custom struct values is probably the most useful obvious application of compound literals. Although Cocoa takes care of its most common types with NSMakeRect and friends, there are still places to put compound literals to good use.

For example, a CGRect is really just a CGPoint and a CGSize. CGRectMake takes four discrete numbers, but sometimes it's more convenient to just deal with those two elements. Compound literals let you do that inline:

    [layer setFrame: (CGRect){ origin, size }];
The ability to create array literals can also be useful. For example, this creates a string containing a copyright symbol:
    [NSString stringWithCharacters: (unichar []){ 0x00a9 } length: 1]
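
The array form works the same way outside of Cocoa. As a rough sketch, assuming a hypothetical sum_ints helper that is not part of any library, an array can be built right in the argument list:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up count ints starting at values. */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

int example_sum(void)
{
    /* The unnamed array lives until the end of the enclosing block. */
    int total = sum_ints((int []){ 3, 4, 5 }, 3);
    assert(total == 12);
    return total;
}
```
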
Scope
A compound literal is essentially an anonymous variable declaration and initialization. As such, it follows the same scoping rules as regular variables. For example, this is perfectly legal:
    int *ptr;
    ptr = (int []){ 42 };
    NSLog(@"%d", *ptr);
The compound literal is still in scope when the NSLog executes, so it is legal to dereference the pointer. This, however, is not legal:
    int *ptr;
    do {
        ptr = (int []){ 42 };
    } while(0);
    NSLog(@"%d", *ptr);
The compound literal's lifetime is tied to the scope of the do/while loop, and it no longer exists afterwards. The NSLog statement may print junk or crash.
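
One way to stay on the safe side is to copy the value out before the enclosing block ends. Separately, the C99 standard gives compound literals written outside any function static storage duration, so pointers to those never dangle. A small sketch of both, with names (file_scope_ptr, example_lifetime) made up for illustration:

```c
#include <assert.h>

/* Outside any function, a compound literal has static storage duration,
   so this pointer stays valid for the whole run of the program. */
int *file_scope_ptr = (int []){ 42 };

int example_lifetime(void)
{
    int copy;
    do {
        /* This literal lives only until the closing brace of the loop body... */
        int *inner = (int []){ 42 };
        /* ...so copy the value out instead of keeping the pointer. */
        copy = *inner;
    } while (0);

    assert(copy == 42);
    assert(*file_scope_ptr == 42);
    return copy + *file_scope_ptr;
}
```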

Mutability
One really unintuitive thing about compound literals is that, unless you declare their type as const, they produce mutable values. The following is perfectly legal, albeit completely pointless, code:

    (int){ 0 } = 42;
Less uselessly, this fact means that you can take the address of compound literals, and it's safe to pass them to code which will modify the pointed-to value. There are a lot of functions which take a pointer as a parameter purely to allow passing different data types, but ultimately they just want a primitive. Using compound literals, you can pass that primitive value inline instead of having to create a temporary variable.

For example, a common operation on sockets is to set the SO_REUSEADDR option. This tells the OS to free up the socket's port for use as soon as the socket is closed, instead of the default behavior of waiting a few minutes first. This option is set using setsockopt. It can be used to set various parameters which need different argument types, so it simply takes a void * and a length. This is how it's normally used to set this option:

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
Using compound literals, we can make it look more natural:
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){ 1 }, sizeof(int));
This is a fairly minor aesthetic thing, but it is a bit cleaner and more readable. It'll also clutter up the debugger with one fewer local variable.
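
The same approach should work with any standard function that takes its input through a void *. As a sketch, bsearch expects a pointer to the key; the helper names compare_ints and index_of below are assumptions for this example, not part of the standard library:

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback in the form bsearch expects. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns the index of key in a sorted array, or -1 if it is absent. */
int index_of(const int *sorted, size_t count, int key)
{
    assert(sorted != NULL);

    /* bsearch wants a pointer to the key, so build one inline. */
    const int *hit = bsearch(&(int){ key }, sorted, count,
                             sizeof(int), compare_ints);
    return hit ? (int)(hit - sorted) : -1;
}
```

With a sorted array { 1, 3, 5, 7 }, index_of(data, 4, 5) returns 2, and searching for 4 returns -1.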

Another place where this comes in handy is writing methods which take an NSError ** parameter to pass error information to the caller. By convention, it's legal to pass NULL as the pointer to indicate that the caller doesn't care about the error. This means that at every place where an error can occur, the pointer must be checked. This gets a bit tedious:

    - (BOOL)doWithError: (NSError **)error
    {
        if(fail1)
        {
            if(error)
                *error = [NSError ...];
            return NO;
        }
        if(fail2)
        {
            if(error)
                *error = [NSError ...];
            return NO;
        }
        if(fail3)
        {
            if(error)
                *error = [NSError ...];
            return NO;
        }
        
        return YES;
    }
By using a compound literal to create some local storage, you can ensure that the error pointer is always valid, and thus eliminate the constant checks:
    - (BOOL)doWithError: (NSError **)error
    {
        error = error ? error : &(NSError *){ nil };
        
        if(fail1)
        {
            *error = [NSError ...];
            return NO;
        }
        if(fail2)
        {
            *error = [NSError ...];
            return NO;
        }
        if(fail3)
        {
            *error = [NSError ...];
            return NO;
        }
        
        return YES;
    }
This costs some efficiency, because it creates error objects unnecessarily if the parameter is NULL, but that generally wouldn't matter, and the result is somewhat more readable. It also allows the method to call other error-returning methods in a natural way and make use of the result before returning the error to the caller:
    - (BOOL)doWithError: (NSError **)error
    {
        error = error ? error : &(NSError *){ nil };
        
        BOOL success = [obj doWithError: error];
        if(!success)
        {
            // don't bail out if we can work around it
            if(![[*error domain] isEqual: CanWorkAroundDomain])
                return NO;
        }
        
        if(fail1)
        {
            *error = [NSError ...];
            return NO;
        }
        
        return YES;
    }
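
The trick is not specific to NSError. Here is a plain C sketch of the same idea with an optional int error code; parse_digit and the ERR_ constants are hypothetical and exist only for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes for this example. */
enum { ERR_NONE = 0, ERR_EMPTY = 1, ERR_NOT_DIGIT = 2 };

/* Parses a single decimal digit. Returns 1 on success and 0 on failure.
   error may be NULL if the caller does not care about the reason. */
int parse_digit(const char *text, int *value, int *error)
{
    assert(value != NULL);

    /* Point error at throwaway storage when the caller passed NULL.
       The literal sits in the function's outermost block, so it stays
       valid until the function returns. */
    error = error ? error : &(int){ ERR_NONE };

    if (text == NULL || text[0] == '\0') {
        *error = ERR_EMPTY;
        return 0;
    }
    if (text[0] < '0' || text[0] > '9') {
        *error = ERR_NOT_DIGIT;
        return 0;
    }

    *value = text[0] - '0';
    return 1;
}
```
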
Vararg Macros
I discussed using compound literals and macros a bit in my post on C macros, but it's useful enough that it bears repeating. By using a compound literal to create an array, you can easily create a macro which takes variable arguments and then does something useful with them. As an example, this macro makes it simpler to create NSArray objects:
    #define ARRAY(...) [NSArray \
                         arrayWithObjects: (id []){ __VA_ARGS__ } \
                         count: sizeof((id []){ __VA_ARGS__ }) / sizeof(id)]
By using the id [] syntax with compound literals, and by using sizeof on the resulting array, you can create macros which do useful things with an arbitrary number of arguments.
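
The same construction carries over to plain C. A minimal sketch, assuming a hypothetical max_of_ints helper and a MAX_OF macro that are not part of any library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the largest of count ints (count > 0). */
int max_of_ints(const int *values, size_t count)
{
    assert(count > 0);
    int best = values[0];
    for (size_t i = 1; i < count; i++)
        if (values[i] > best)
            best = values[i];
    return best;
}

/* Wraps the arguments in an int array and computes the count with sizeof.
   sizeof does not evaluate its operand here, so each argument runs once. */
#define MAX_OF(...) \
    max_of_ints((int []){ __VA_ARGS__ }, \
                sizeof((int []){ __VA_ARGS__ }) / sizeof(int))
```

MAX_OF(3, 9, 4) evaluates to 9. An empty argument list would produce an empty initializer, which C99 does not allow, so the macro needs at least one argument.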

Conclusion
Compound literals are a nice trick to simplify and clarify code. They are not universally applicable, and you must take care not to use them in situations where they hurt more than they help. However, they are a nice tool to have in your bag of tricks, and they help make C a little more useful and generic.

That's it for this time. I hope to be back to my regular schedule now, so look for another post in two weeks. Until then keep sending your ideas. Friday Q&A is (usually!) driven by reader suggestions, so if you have a topic that you would like to see covered here, send it to me.


Comments:

I think you missed my favourite trick:

someView.frame = (CGRect){ .size = someSize };

Any member you don't specify is zero-ed out for you.
It's amazing how you manage to take a subject I think I know everything about, and teach me something new about it, over and over again. Every time I tell myself I should read more docs and specs :P Is that your secret to knowing every piece of programming related knowledge in the universe?

(this time it was the mutability of the literals, that they're actually like anonymous variables. Thanks!)
In this example

if(!error)
    error = &(NSError *){ nil };

wouldn't the literal become invalid after the scope of the if clause is left, thus resulting in a dangling pointer just like in the "Scope" case above?
nevyn: I do read a lot, that is true. I think some of it is having a somewhat perverted mind when it comes to programming languages. I discovered the mutability of compound literals by just seeing if it worked or not, and being somewhat surprised when it did.

Johannes Fortmann: If statements don't actually create a new scope unless you also include explicit {} after it. It's safe to take the address of compound literals (and make blocks without copying them) inside a bare if.
Another great trick in the vein of Jonathan's:

int myarray[3] = { [1] = 3 }; //results in {0, 3, 0}

beautiful when you're declaring something at file scope and using enum constants for indices, because it means no more counting the lines and then comparing that to the enum values
Regarding scoping I was just going to ask the same thing. Comparing the assembly output between the two cases shows no differences (gcc -std=c99 -O0) so it looks to me either both create a new scope or neither one does.

Then again, I tested with 'struct T { int i; }' rather than NSError*, maybe there's a difference there somewhere?
Jared: That is nice. It should be especially nice for writing state transition tables. I have a chunk of hairy code that could be made significantly more clear using that.

Tommi: The subject of if statements and scopes came up on this blog in the comments to this post:

http://www.mikeash.com/pyblog/friday-qa-2010-01-15-stack-and-heap-objects-in-objective-c.html

I had originally stated that the if statement had its own scope, and the comments said otherwise. However, now that I actually look it up, I believe you (and my original position) are right. From my copy of a draft C standard:

A selection statement is a block whose scope is a strict subset of the scope of its enclosing block. Each associated substatement is also a block whose scope is a strict subset of the scope of the selection statement.

The term "selection statement" covers if, if/else, and switch.

Although it seems that gcc and clang will let you get away with this construct, to be completely safe you should probably write it using the conditional operator instead:

error = error ? error : &(NSError *){ nil };

Or using the gcc extension that lets you omit the middle:

error = error ?: &(NSError *){ nil };
Clearly the next step is to support this:

error ?= &(NSError *){ nil };

But yeah, I love the field initialization syntax that Jonathan and Jared brought up.
I think there's a good reason to use functions like NSMakeRect(): They make the code very readable. Compound literals are generally used very little. If I had to debug your code before reading this post, I'd have had no clue what this fancy syntax does. I wouldn't even have any idea where to look for the specific documentation that tells me about compound literals. If I see "NSMakeRect" somewhere, I can just click on it and Xcode shows me the documentation.

In an ideal world, every developer should know every detail about every language or API they use. But in practice, I keep forgetting stuff I don't use regularly.
That is exactly why I wrote this post. People don't know about this syntax, despite the fact that it's useful and has been a standardized part of the language for over a decade. Now, a few more people know about it than before.

I have seen this argument pop up before and to me it sounds like an excuse not to learn. If we can't use things that most people don't use regularly, then there is no room for change or growth.
Jakob:

Well, if I see

    [layer setFrame: (CGRect){ origin, size }];

then it's pretty clear that a rectangle is being constructed. If I'm not sure, then a quick click on setFrame: in Xcode will tell me.
Mike, I have deep respect for you as a developer.

As a potential influencer of C-writing youth, however, this post is a bull in a china shop.


setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){ 1 }, sizeof(int));


is neither "cleaner" nor "more readable" than the two-step alternative. The two-step example tells a maintainer what '1' means to the called function! And sure, reuseaddr might be an easy-to-remember case, but what about o_nonblock?

Your example with the NSError initializer creates a function that no-ops silently (and probably leads to a memory leak) instead of failing conspicuously. If it's supposed to output an NSError but the incoming output location is NULL, you could scarcely do worse than creating a local, fake output address.

Compound literals have a place in the world, but if I were interviewing someone and they tried to tell me how great this compound literal style is, and referenced these examples, I would consider it a mercy to yell at them before asking them to get the hell out of my office.
Would &(int){ YES } be more to your liking? I see no difference between declaring a temporary variable to hold the value and doing so for any function or method which takes an int by value. Do you declare temporary variables in all of those cases as well?

Regarding NSError, you seem to have deeply misunderstood how the whole system works. It is the standard Cocoa convention that you can pass NULL to an NSError ** parameter if you don't care about the particulars of the error. Success or failure is still signaled by use of the return value. If the caller passes NULL, you are still supposed to indicate success or failure, and you are not supposed to fail just because the caller passed NULL. I have no idea why you think there would be a memory leak here. I can't even begin to correct it without knowing why you think there would be, but suffice it to say that there is not.

No offense, but if you were interviewing me and you started going on about silent failures and memory leaks in a situation where they clearly do not apply, I would consider it a stroke of luck to not get the job.
I don't like the &(int){ 1 }, but &(int){ YES } is just as readable as the two step code. Slightly more so, actually, since I know even if the called function changes the value I don't care.

(And yeah, there's no leak. Obviously.)
I love how someone can present credentials before lecturing someone, only to end up being completely wrong. I'd love to know where that leak could come from.
I just discovered that the compiler accepts

(CGRect){ x, y, w, h}

instead of

(CGRect){{x, y}, {w, h}}

So you can "flatten" the struct.
Hopefully they behave the same.

int *ptr;
do {
    ptr = (int []){ 42 };
} while(0);
NSLog(@"%d", *ptr);

This neither crashes nor prints a junk value in Xcode 5. It just prints 42. How can it work like this?
