mikeash.com: just this guy, you know?

Posted at 2013-05-31 13:46
Tags: c fridayqna quiz
Friday Q&A 2013-05-31: C Quiz
by Mike Ash  

I thought I'd mix things up a bit today and give my readers a quiz. The C language is perhaps the most popular computer language in existence, but it's also quite odd, and because of that often poorly understood. I'd like to give you a quiz to see how much you know about some of the odd but useful corners of the language.

Questions
Here are the questions. The answers will follow. Try to answer all of the questions yourself before reading the answers. Try to do it from memory alone first, then check the answer using your compiler, the language spec, or whatever else you want to use. Keep in mind that answers found by testing your compiler may only reflect your environment, and will not necessarily be generally correct.

  1. What is the type of the character literal 'a'?
  2. What is the type of the expression a == b?
  3. What is the value of the expression 1 == 1? How about 0 == 1?
  4. What is the value of the expression 42 || 0?
  5. What is the value of the expression -1 < 1?
  6. What is the value of the expression -1 < 1U?
  7. Given a local variable declared as char a[10], what is the value of sizeof(a)?
  8. Given a function declared as void f(char a[10]), what is the value of sizeof(a) within the function body?
  9. What is the value of UINT_MAX + 1?
  10. What is the value of INT_MAX + 1?
  11. What is the type of NULL?
  12. What is sizeof for types char, short, int, long, and long long?
  13. What is the format specifier for printing an int using printf?
  14. What is the format specifier for printing a short using printf?
  15. What is the format specifier for printing a double using printf?
  16. What is the format specifier for printing a float using printf?
  17. What is the value of the expression *(char *)NULL?
  18. What is the type of the string literal "abcde"?
  19. What is the value of sizeof("abcde")?
  20. What is the result of executing free(NULL)?
  21. What is the result of executing realloc(NULL, sizeof(int))?

Intermission
Let's give a little space for people whose eyes drifted down while reading the last few questions.

A little more space than that, I think.

Just another paragraph or two.

Did you try to figure out the answers yourself? You should give it a shot before you continue on.

Last chance! Answers follow.

1. What is the type of the character literal 'a'?
This one's easy. It's a character literal, so its type is char. Right?

Nope. The type is int. I'm not entirely sure why, but that's what the standard says. The value is still what you'd expect, so there's little practical consequence to this. Generally, you'd only notice if you tried to use sizeof on one.
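You can see this by comparing sizeof on the literal with sizeof(int). This is a minimal sketch, and the helper names are made up for illustration. Compile it as C: C++ gives 'a' the type char.

```c
#include <stdio.h>

/* Hypothetical helper: in C, 'a' has type int, so this returns sizeof(int),
   not sizeof(char), which is always 1. */
size_t char_literal_size(void) {
    return sizeof('a');
}

/* The value is unaffected: 'a' still compares equal to a char holding 'a'. */
int char_literal_matches(void) {
    char c = 'a';
    return c == 'a';
}
```

On Apple platforms, where int is 4 bytes, char_literal_size() returns 4 even though sizeof(char) is 1.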

2. What is the type of the expression a == b?
The natural response for someone coming from a saner language would be BOOL, or bool, or maybe even _Bool, which has been C's official built-in boolean type since the C99 standard. However, this is not the case: comparison expressions always have type int.

3. What is the value of the expression 1 == 1? How about 0 == 1?
Although the type is always int, at least some sanity prevails: the value is always 1 when the comparison is true, and 0 when false. The answers here are, respectively, 1 and 0.
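The sketch below checks both claims. A comparison has type int even when its operands are _Bool, and its value is exactly 1 or 0. It assumes a C11 compiler for _Generic. HAS_TYPE_INT and the function names are hypothetical.

```c
#include <stdbool.h>

/* Yields 1 if the expression has type int, 0 otherwise (C11 _Generic). */
#define HAS_TYPE_INT(expr) _Generic((expr), int: 1, default: 0)

/* Both operands are _Bool, yet the comparison itself has type int. */
int comparison_is_int(void) {
    bool a = true, b = false;
    return HAS_TYPE_INT(a == b);
}

/* A true comparison is exactly 1 and a false one is exactly 0. */
int true_comparison(void)  { return 1 == 1; }
int false_comparison(void) { return 0 == 1; }
```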

4. What is the value of the expression 42 || 0?
The || operator is just like == in that it always yields 1 or 0, and the same is true of all the other logical and comparison operators. Since at least one of the operands here is non-zero, the value of the expression is 1. This can be an unpleasant surprise if you're used to languages where the logical or returns the first true value it sees, which in this case would be 42.
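Here is a short sketch of the difference in standard C. The first_nonzero helper is hypothetical. It shows one way to get the "first true value" behavior of languages such as Python or JavaScript by using the conditional operator instead of ||.

```c
/* || and && always yield int 0 or 1, never one of the operands. */
int or_result(void)  { return 42 || 0; }   /* 1, not 42 */
int and_result(void) { return 42 && 7; }   /* 1, not 7 */

/* Hypothetical helper: return a if it is non-zero, otherwise b. */
int first_nonzero(int a, int b) { return a ? a : b; }
```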

5. What is the value of the expression -1 < 1?
This is as obvious as it looks: the value is 1, because the comparison is true. This question is really only here to set up the next one.

6. What is the value of the expression -1 < 1U?
This should be the same as before... but it's not. The type of -1 is int, while the type of 1U is unsigned int. To make the comparison, the usual arithmetic conversions convert -1 to unsigned int. Conversion to an unsigned type is done modulo UINT_MAX + 1, so -1 becomes UINT_MAX. That value is larger than 1, so the comparison is false, and the expression yields 0.
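The following sketch shows the comparison and the converted value. It also shows one way to compare a signed and an unsigned value by their mathematical values. The helper signed_less_than_unsigned is hypothetical. GCC and Clang can also warn about mixed comparisons like this one through -Wsign-compare.

```c
#include <limits.h>

/* The int operand is converted to unsigned int before comparing. */
int mixed_compare(void) { return -1 < 1U; }                   /* 0 */

/* What -1 turns into: conversion is modulo UINT_MAX + 1. */
unsigned int converted_minus_one(void) { return (unsigned int)-1; }

/* Hypothetical helper: handle a negative s before any conversion happens,
   so the result matches ordinary arithmetic. */
int signed_less_than_unsigned(int s, unsigned int u) {
    return s < 0 || (unsigned int)s < u;
}
```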

7. Given a local variable declared as char a[10], what is the value of sizeof(a)?
The answer is pretty simple: a is an array of 10 chars, and the value of sizeof(a) is 10. You might be tempted to answer 10 * sizeof(char), which is technically correct, but redundant. By definition, sizeof(char) is always 1. Although the C standard talks about "bytes", it has its own peculiar definition of the word, which corresponds to the char type. Even on an exotic system where char is 32 bits, sizeof(char) is still 1. If you want to know how big a char is in absolute terms, the CHAR_BIT macro will tell you how many bits it contains.
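Here is a minimal sketch of these facts, with hypothetical function names. The last function shows the common sizeof(b) / sizeof(b[0]) idiom for counting elements. It only works because b is a real array at that point.

```c
#include <limits.h>
#include <stddef.h>

/* A local array: sizeof reports the whole array. */
size_t local_array_size(void) {
    char a[10];
    return sizeof(a);                  /* 10 elements of size 1 each */
}

/* sizeof(char) is 1 by definition, whatever CHAR_BIT is. */
size_t char_size(void) { return sizeof(char); }

/* Bits per char: at least 8, and exactly 8 on mainstream platforms. */
int bits_per_char(void) { return CHAR_BIT; }

/* Element count idiom; valid only where b is a real array, not a pointer. */
size_t int_array_count(void) {
    int b[10];
    return sizeof(b) / sizeof(b[0]);   /* 10 */
}
```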

8. Given a function declared as void f(char a[10]), what is the value of sizeof(a) within the function body?
This is the same as the previous question, so the answer is 10, right?

You surely know better than that by now.

This is a seriously weird corner of C. Function parameters declared as array types are invisibly converted to pointer types. These two function prototypes are exactly the same:

    void f(char a[10]);
    void f(char *a);

The array notation is basically just a way for the programmer to document the code a bit. We can read this as saying that f takes an array of 10 chars. To the compiler, it just says that f takes a pointer to char.

Accordingly, sizeof(a) is whatever the size of a pointer is on your system. On Apple platforms, it'll be 4 or 8 depending on whether you're running in 32-bit mode or 64-bit mode.
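The sketch below shows the decay. It also shows one way to keep the size: declare the parameter as a pointer to the whole array, so that sizeof(*a) sees the array type. The function names are hypothetical. GCC and Clang typically warn about the sizeof in param_size, which is rather the point.

```c
#include <stddef.h>

/* The parameter is really char *a; the [10] is ignored by the compiler. */
size_t param_size(char a[10]) { return sizeof(a); }

/* A pointer to the whole array keeps the type: *a is char[10]. */
size_t array_pointer_size(char (*a)[10]) { return sizeof(*a); }
```

With char buf[10], param_size(buf) returns sizeof(char *), while array_pointer_size(&buf) returns 10. The trade-off is that the pointer-to-array version only accepts arrays of exactly 10 chars.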

9. What is the value of UINT_MAX + 1?
In a language full of pitfalls and undefined behavior the moment you step outside the rules, the behavior of unsigned types is comfortingly well-specified. All results are computed modulo the largest representable number plus one. Thus, UINT_MAX + 1 simply produces zero.
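The wraparound works in both directions, as this small sketch with hypothetical function names shows.

```c
#include <limits.h>

/* Unsigned arithmetic is done modulo UINT_MAX + 1. */
unsigned int wrap_up(void)   { return UINT_MAX + 1; }   /* 0 */
unsigned int wrap_down(void) { return 0U - 1; }         /* UINT_MAX */
```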

10. What is the value of INT_MAX + 1?
Our joy is short-lived. You might think this would also be zero, or that it would wrap around to INT_MIN, or that it would clamp to INT_MAX. All of these are possible. The value can also be 42, or a randomly generated number. Or executing the statement could cause your web browser to visit zombo.com, reboot your computer, or erase everything in ~. Signed integer overflow is undefined behavior, which means the compiler is allowed to do basically whatever it wants at this point.
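The overflow itself is the problem, so any check has to happen before the addition. This is a minimal portable sketch, and add_would_overflow and checked_add are hypothetical names. GCC and Clang also provide __builtin_add_overflow for the same job, though it is a compiler extension rather than standard C.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: would a + b overflow int? The limits are compared
   before adding, so no signed overflow ever occurs. */
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* Add only when safe; on overflow, return false and leave *out untouched. */
bool checked_add(int a, int b, int *out) {
    if (add_would_overflow(a, b)) return false;
    *out = a + b;
    return true;
}
```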

11. What is the type of NULL?
It's a pointer type, so it must be some kind of pointer. Maybe void *?

It can be void *. It can also be int or another integer type! The standard says that NULL is a null pointer constant, which in turn is "An integer constant expression with the value 0, or such an expression cast to type void *...."

Because of this, you have to be careful when passing NULL (or nil) into a variadic function, since the type isn't well defined unless you cast it. This is technically incorrect:

    printf("%p", NULL);

That said, modern compilers are generally good about ensuring that this is safe. Clang, for example, uses a compiler built-in for NULL which behaves appropriately here.
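For the printf call above, the portable fix is printf("%p", (void *)NULL). The issue matters most for functions that take a variable-length list terminated by a null pointer, such as POSIX execl(). The sketch below is a hypothetical function that uses that convention. Callers should pass (char *)NULL, so that the sentinel really has the type va_arg reads.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic function: counts its string arguments up to a
   terminating null pointer, the same convention execl() uses. */
int count_strings(const char *first, ...) {
    int count = 0;
    va_list ap;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        count++;
    va_end(ap);
    return count;
}
```

A call like count_strings("a", "b", "c", (char *)NULL) returns 3. Passing a bare NULL instead may happen to work, but if NULL is defined as a plain 0, the callee reads a pointer where the caller passed an int.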

12. What is sizeof for types char, short, int, long, and long long?
The answer for char is, of course, 1. The answers for the rest are, "it depends".

It would be perfectly legal for a C implementation to make char be a 64-bit quantity, and have the size of every data type on this list be 1. I'm not sure if this has ever been done, but there are some environments, such as digital signal processors, where char is a 32-bit quantity, and sizeof everything up to int is 1.

However, while the "it depends" answer is technically correct, it's also useful to know the actual quantities for the systems we use. For Apple platforms, the numbers are:

char: 1
short: 2
int: 4
long: 4 in 32-bit code, 8 in 64-bit code
long long: 8

Note that you should still use the uintXX_t types from stdint.h when precise sizes are important in your code, rather than relying on the above quantities. You'll be glad you did when Apple's next big platform suddenly has weird sizes for everything.
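The guarantees the standard does make are about minimum ranges. This sketch states them as C11 compile-time checks, along with the exact widths of the stdint.h types. print_type_sizes is a hypothetical helper that shows the actual sizes on whatever platform compiles it. The exact-width types are optional in the standard, so the last two checks assume a platform that provides them, as Apple's do.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Minimum ranges guaranteed by the standard on every implementation. */
_Static_assert(INT_MAX >= 32767, "int covers at least a 16-bit range");
_Static_assert(LONG_MAX >= 2147483647L, "long covers at least a 32-bit range");
_Static_assert(LLONG_MAX >= 9223372036854775807LL, "long long covers at least 64 bits");

/* Each type's range includes the range of the type before it. */
_Static_assert(SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX && LONG_MAX <= LLONG_MAX,
               "ranges are ordered");

/* Exact-width types have exactly the stated width, with no padding bits. */
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is exactly 64 bits");

/* Print the sizes for the current platform. */
void print_type_sizes(void) {
    printf("char %zu, short %zu, int %zu, long %zu, long long %zu\n",
           sizeof(char), sizeof(short), sizeof(int),
           sizeof(long), sizeof(long long));
}
```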

13. What is the format specifier for printing an int using printf?
There's no trickery here: it's either %d or %i, depending entirely on your preference.

14. What is the format specifier for printing a short using printf?
You use an h modifier to indicate that the argument is a short, e.g. %hd. However, this is unnecessary: you can simply use the format specifier for int. Types smaller than int are promoted to int when passed as a variable argument to a variadic function like printf. That means that %d works not only for int, but also short and char.

(Technically, %d may not be valid for char. If char is unsigned, and if sizeof(int) == 1, then a char will be promoted to an unsigned int instead of an int. Then, if the value of the char can't be represented in a signed int, attempting to treat it as one is invalid. This is unlikely to be a concern on any platform you encounter.)
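Here is a small sketch of the promotion in action. It uses snprintf so the results can be inspected. format_small_ints is a hypothetical helper, and %hhd is the C99 modifier for a char-sized argument.

```c
#include <stdio.h>

/* Hypothetical helper: the short and the signed char are both promoted to
   int as variadic arguments, so %d prints them correctly. %hd and %hhd
   convert the promoted value back to short and signed char before printing. */
int format_small_ints(char *buf, size_t size, short s, signed char c) {
    return snprintf(buf, size, "%d %hd %d %hhd", s, s, c, c);
}
```

With s = -5 and c = 65, the buffer holds "-5 -5 65 65". The plain and modified specifiers print the same thing because the values fit in their original types.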

15. What is the format specifier for printing a double using printf?
Another easy one: %f is the most common, and printf also supports %F, %e, %E, %g, %G, %a, and %A. See the printf documentation for an explanation of what each variant does.

16. What is the format specifier for printing a float using printf?
This is another trick question due to argument promotion. When used as a variable argument, float is promoted to double, so all of the format specifiers for double also work for float.
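The sketch below prints a float with several double specifiers. format_float is a hypothetical helper. The promotion only applies to printf-style variadic arguments. scanf receives pointers, which are not promoted, so in scanf %f reads a float and %lf reads a double.

```c
#include <stdio.h>

/* Hypothetical helper: f is promoted to double when passed to snprintf,
   so the double specifiers apply to it unchanged. */
int format_float(char *buf, size_t size, float f) {
    return snprintf(buf, size, "%.2f %e %g", f, f, f);
}
```

For 1.5f, the buffer holds "1.50 1.500000e+00 1.5".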

17. What is the value of the expression *(char *)NULL?
A common response to this would be "it crashes", which is certainly one possible outcome. However, this is yet another instance of undefined behavior: you're simply not allowed to dereference NULL, and when you try, the compiler is allowed to do anything. In particular, if the compiler can figure out at compile time that you're trying to dereference NULL, it's free to do things like assume that code can never actually execute (because it would be illegal) and optimize out branches accordingly. Don't write the above to intentionally cause a crash; call abort() instead.
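This sketch contrasts a pattern that lets the optimizer delete a null check with the safe ordering. It also shows abort() as the defined way to stop the program. All three function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Risky ordering: p is dereferenced first, so the compiler may assume p is
   non-null and remove the later check entirely. */
int read_then_check(const int *p) {
    int value = *p;
    if (p == NULL) return -1;   /* may be optimized away */
    return value;
}

/* Safe ordering: check, then dereference. */
int check_then_read(const int *p) {
    if (p == NULL) return -1;
    return *p;
}

/* Hypothetical helper for crashing on purpose: abort() is well defined. */
_Noreturn void die(const char *msg) {
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}
```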

18. What is the type of the string literal "abcde"?
String literals are char arrays of the appropriate size, so this particular literal has type char[6]. Note the extra array element for the terminating NUL character.

Modifying the contents of a string literal is undefined behavior, but the type is not const char[6]. This is a holdover from the days when C had no const keyword. You're just supposed to know that you can't modify them.
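One way to observe the type is to take the literal's address. &"abcde" is a pointer to char[6], and with default GCC and Clang settings in C mode it converts to char (*)[6] without a cast. If the element type were const, that conversion would draw a diagnostic. The function names are hypothetical. GCC's -Wwrite-strings option deliberately gives literals const-qualified types to catch accidental writes.

```c
#include <stddef.h>

/* &"abcde" points to the whole literal, an array of 6 plain chars. */
size_t literal_array_size(void) {
    char (*p)[6] = &"abcde";
    return sizeof(*p);              /* 6, including the terminating NUL */
}

/* Writing through p above would compile but is undefined behavior.
   Pointing at literals through const char * lets the compiler reject writes. */
const char *greeting(void) { return "abcde"; }
```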

19. What is the value of sizeof("abcde")?
This should be obvious given the previous answer: 6. Again, note the extra element for the terminating NUL. It's easy to count five characters in the string and mistakenly think that the size is 5. Also note that, while string literals are usually treated as char *, they are arrays, not pointers, and so sizeof returns the size of the array, not the size of a pointer.
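Here is a sketch of the three sizes involved, with hypothetical function names. sizeof on the literal counts the NUL, strlen does not, and sizeof on a pointer measures only the pointer.

```c
#include <string.h>

size_t literal_sizeof(void) { return sizeof("abcde"); }  /* 6: includes NUL */
size_t literal_strlen(void) { return strlen("abcde"); }  /* 5: stops at NUL */

/* Once the literal is stored in a pointer, sizeof measures the pointer. */
size_t pointer_sizeof(void) {
    const char *s = "abcde";
    return sizeof(s);
}
```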

20. What is the result of executing free(NULL)?
This is my favorite question of the bunch, because it's so poorly known despite being so simple. I've encountered many smart people who I'd have no qualms describing as "C experts" who get this wrong.

The answer is: nothing. free(NULL) is defined as a no-op by the standard. Every time you see an if guard on a free call, it can be removed:

    if(ptr != NULL) // unnecessary!
        free(ptr);

Of course, there's nothing wrong with leaving the check for clarity if you prefer.

Virtually everyone I've talked to about this thought that the check is necessary and free(NULL) would crash or otherwise invoke undefined behavior. I'm not sure where that idea came from, but it's fascinating.
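Here is a sketch of cleanup code that relies on this guarantee. struct record and record_destroy are hypothetical. Resetting the pointers afterward is optional, but it makes a repeated destroy call harmless too, since that call then becomes free(NULL).

```c
#include <stdlib.h>

/* Hypothetical struct with two optional heap buffers. */
struct record {
    char *name;
    int *values;
};

/* No NULL checks needed: free(NULL) does nothing. */
void record_destroy(struct record *r) {
    free(r->name);
    free(r->values);
    r->name = NULL;     /* optional: a second destroy call becomes free(NULL) */
    r->values = NULL;
}
```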

21. What is the result of executing realloc(NULL, sizeof(int))?
realloc is, conceptually, a malloc, memcpy, and a free. Much like free(NULL) is a no-op, realloc on NULL just allocates some fresh memory. The above is exactly equivalent to malloc(sizeof(int)). This can be convenient when you have a dynamically-resized buffer. You can start the pointer out as NULL and use the same realloc for the initial allocation as for resizes.
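Here is a minimal sketch of that pattern, with a hypothetical struct int_buffer. The buffer starts out empty with a NULL data pointer, and the same realloc call handles both the first allocation and later growth. The result goes into a temporary first, because a failed realloc returns NULL and leaves the original block allocated.

```c
#include <stdlib.h>

/* Hypothetical growable int buffer; initialize as {NULL, 0, 0}. */
struct int_buffer {
    int *data;
    size_t count;
    size_t capacity;
};

/* Append one value, doubling capacity when full. The first growth calls
   realloc(NULL, ...), which behaves exactly like malloc. Returns 0 on
   success, or -1 if allocation fails, leaving the buffer unchanged. */
int int_buffer_push(struct int_buffer *b, int value) {
    if (b->count == b->capacity) {
        size_t new_capacity = b->capacity ? b->capacity * 2 : 4;
        int *new_data = realloc(b->data, new_capacity * sizeof *new_data);
        if (new_data == NULL) return -1;   /* old block is still valid */
        b->data = new_data;
        b->capacity = new_capacity;
    }
    b->data[b->count++] = value;
    return 0;
}
```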

Conclusion
That's it for today. I hope you know a little more about C than you did before.

Friday Q&A is driven by reader suggestions. If you have something you'd like to see covered here, send it in!


Comments:

On 17. Another thing that the compiler can do is to dereference the memory at location 0; thus working like any other memory pointer. I've only seen this on a few very small embedded processors. (small as in less than 1K of RAM.)

On 20. I've seen at least one implementation where calling free(NULL) does in fact fail in strange ways. The point here is to make sure you know how well your tools follow the standard. (It was a bug that did get fixed, but not before I had to ship, so if() checks abounded.)

The reason for #1 is the same as for many oddities about C types: the type system started as a hack to make B programs portable to platforms where types were needed. (In B, every value was a word that could be treated as an integer or a pointer.) The goal was to require minimal changes for existing B code, hence the various "implicit int" rules and free casting between integer and pointer types. Character literals are ints because character literals existed in B.

This is also why "auto" exists: in B, in the absence of types, every variable had to be introduced with a storage qualifier.

Regarding free(), I’ve always felt the misconception might be more common among old Mac programmers because DisposePtr() wasn’t null-safe.

Oh, yeah: B was designed for 36-bit systems, hence the “convenient” syntax for octal numbers that plagues us to this day.

You're the first person to explain the existence of auto to me; thanks very much for that. I knew it was there but never could figure out why.

Tadpol: It's worth noting that the numeric value of a null pointer does not, in fact, have to be 0 -- the only rule is that when an integral 0 is converted to a pointer, you get a null pointer. That is, this code might not leave 0 in u.i:

    union {
      intptr_t i;
      void *p;
    } u;
    u.p = 0; // or NULL
    printf("%" PRIdPTR "\n", u.i);


This is one way in which a system that has to be able to access address 0 can still have a null pointer representation that's distinct from all valid pointers. On Apple systems, though, the representation of a null pointer is 0, presumably because most CPUs have instructions that special-case 0 anyway.

Re 20: the same also applies for delete in C++... I removed a bunch of 'if(ptr) delete ptr' from a project at work recently. It's just unnecessary clutter.

And if you overload new in C++, it's easy enough to remove the 'if(!ptr)' checks as well -- just have the new operator call abort() instead, since if you are really out of memory on today's machines, that's probably one of the sanest ways to handle it. (On iOS, I would do something different; this was more for desktop machines.)

Regarding free(NULL), I remember using a C compiler for the Macintosh in the late 80s/early 90s (I'm pretty sure it was ThinkC but I could be wrong) that did very strange things if you freed a null pointer...

On #1, at least on Mac C compilers, constants like 'ABCD', 'XYZ' and 'MN' have always been ints, too, and 4-character literals were extremely common in the Carbon and Classic days.

On 19: It turns out that the parens are not necessary in sizeof("abcde"); sizeof "abcde" works just as well.

sizeof is a unary operator; the parens are only necessary if the argument is a type.

Jordan: Please note that using a union like you propose is actually another case of undefined behavior according to the C standard.

You can't write one element and then read another. This is a common misconception of the use of unions. The safe way to do this would be to use memcpy.

Nikolai: Yes, of course you're right. Both Clang and GCC will DWYM in this case, but you would never want to use this in production code. reinterpret_cast in C++ might have been a more compliant choice. (IIRC, behavior there is merely implementation-defined and not fully undefined.)

With regards to free(NULL) being commonly thought to crash or do something else undefined, I wonder if this might be because calling free() twice on a non-NULL pointer causes undefined behaviour. Since free() doesn't NULL out the pointer passed in, I wonder how many bugs might be out there because someone wrote if (ptr) free(ptr) incorrectly thinking they were defending against double free().

Nikolai/Jordan: I believe the union type punning behavior was defined in a C99 update, and in C11 proper.

The issue was reported in Defect Report #283 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm), and C99 Technical Corrigendum 3 was issued with the following footnote:

"If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation. "

C99 TC3: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1235.pdf

(17) On an ARM Cortex-M3 processor, I use *(uint32_t *) 0 to access the initial stack pointer. This is almost the same as *(char *) NULL. It's completely legal with the IAR compiler.

Rainer: char constants like that are ints with the value implementation-defined (meaning that the Standard isn't going to tell you how it's handled, but the documentation for the compiler has to). MacOS files (before Mac OSX) had two four-character literals assigned as file and document type (I don't remember the details for applications), which made this very convenient.

@Ricky Bennett
That looks weird. Does dereferencing 0 on ARM mean dereferencing the beginning of stack?

How does it compare to this?

    int main() { // assuming constant size for argv[0]
      int a;
      int stack_addr = (int)&a;

      // ...
    }

Looks like this is the Microsoft C version, and not ANSII. There are some differences in Microsoft and ANSII.

For instance, a char is never an int. C will convert it dynamically when used, but it is 8 bits and the int is system dependent in C.

Ditto for short. Short is system dependent. It causes some confusion when compilers are implemented. ANSII tried to straighten some of these discrepancies out, but there are issues that make it very difficult (legacy code, differing standards, etc.)

Char also becomes more convoluted when using modern syntax, due to the extended character sets.
I haven't done much programming in recent years, but I realize the issues that will arise. ASCII, which is the 8 bit set is not international.

BOOL has the value of 0 or -1 and is an int, but only accepts these values. Basically 0 or non zero.

Signed to unsigned comparison is also implementation dependent. Best to avoid this. Your compiler should emit a warning.

There are several other areas that are implementation dependent. For example, free(x) may result in garbage collection immediately, or set a marker which will ultimately result in garbage collection on most operating systems. The difference means that on some systems the data will still be present although you have nulled the pointer. Using free a second time on that pointer will result in the same as free(null). The result will depend a lot on the compiler, the library and the OS. There are lots of corners in any language that can drive you bonkers, and C, because it is so close to the machine, has greater power to wreak havoc when misused.

OldETC: The quiz is on ANSI/ISO standard C. It doesn't appear to have anything in it from C99. Any differences between what Microsoft C does and the standard requires are Microsoft's fault.

A character constant is indeed an int in C. It's a char in C++, but not in C. A char must have at least 8 bits and is permitted to have more (although there are very few systems any more where that matters).

short and int are indeed implementation-defined, the idea being that C should run efficiently on various architectures. This does cause confusion, and people need to be careful when transferring data values around. I am unaware that ANSI ever did try to straighten that out, since standardization of data type sizes would make C less useful for systems programming. In C99, there are standard definitions you can #include and use for fixed-length integral types.

Char does indeed become more of a problem when going beyond 7-bit ASCII. You'd have no problem storing a UTF-8 string in an array of char, but the standard C library functions won't handle it well. There's the wchar_t type for wide characters, but that's not normally 32 bits, and a Unicode character won't necessarily fit in 16 bits. C's string handling is sufficiently primitive that it wouldn't be hard to make it work just as well with UTF-8.

BOOL is not a type in C. It isn't possible to restrict the values that can be assigned to an int.

Signed to unsigned comparison is perfectly well defined: convert to unsigned. It can be confusing, and can easily cause surprising results, so it isn't a good idea.

It really doesn't matter what free(x) does; all that's necessary is to reclaim the memory somehow at some time. Doing a double free is undefined behavior, and so is using data after it has been freed. (If you want to make sure it isn't used later, write zeros to it and then free it.) Assigning NULL to a pointer when you free it means that double frees are harmless, will bite you more certainly if you try to use that pointer again, and can be a good idea.

C does have a lot of dark corners in the language, and it really helps to know what the limits of defined behavior are.

1. My guess is that to allow all possible 256 values and still have space for an extra one meaning "nothing there", the type had to be int; otherwise the EOF symbol would have had to hijack an otherwise very valid character slot.

a) I got them all right. I suppose it helped to have been a member of X3J11.

b) I'm impressed by the quality of the comments, with the exception of the one from OldETC, which is largely wrong, starting with the first sentence (there is no ANSII, and these answers apply to standard C and to Microsoft's implementation of it).

I'm always impressed with the quality of the comments on my blog. Don't know if I had anything to do with it, but they're almost always great. One of my favorite things about it.
