mikeash.com: just this guy, you know?

Posted at 2006-02-06 00:00
Tags: bug unicode
Bug Reversal
by Mike Ash  

The most interesting bugs, for me, are the ones that appear at the confluence of many different things. Module A has a small defect, which happens to expose a deficiency in module B, causing it to feed bad data to module C, which then behaves oddly. Change any one of those modules and the problem might never have appeared. I recently saw a non-bug which is a great example of this kind of interaction.

I was hanging out in Freenode #macdev as I often do, when someone came in with a problem. The problem, he said, was that NSString's stringWithFormat: method was reversing the order of its arguments. The failing line was:

[NSString stringWithFormat:@"qlex.1.%C%C.n.he.0", 0x05D1, 0x05D0]

And yet, he said, when he printed this, 0x05D0 was coming out before 0x05D1.

(For those of you not intimately familiar with this end of Cocoa, %C is a format specifier that prints a single unichar, much like %c prints a single char.)

Now, this problem makes no sense. A format string can't just arbitrarily decide to print things in the wrong order. We asked the usual questions ("Is that your actual code?" "Are there variables involved?" "What does %C do, anyway?") but nothing turned up. Despite the apparent impossibility of things just coming out in the wrong order, he insisted that this was in fact happening.

Finally, I tossed the suspect line into a quick test program and ran it. And, lo and behold, it really was coming out backwards! Fortunately, the reason was instantly apparent.

If any of you are Unicode experts, or happened to toss the line into a test program yourself, you probably know too. The characters in question are both Hebrew characters. Hebrew, as you might know, is written from right to left. OS X's text system is smart enough that, when it encounters multiple Hebrew characters next to each other, it displays them right to left. The last character in the sequence appears "first" to eyes used to reading English.
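One way to confirm that NSString stored the characters in the order they were passed is to print the string's UTF-16 code units as hex values instead of rendering the string, since plain ASCII output like "U+05D1" is not reordered by the text system. The program below is a minimal sketch assuming a macOS Foundation environment; `MakeSuspectString` is a hypothetical helper introduced here for illustration, not part of the original code.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>

// Hypothetical helper: builds the same string as the "failing" line.
// %C takes a unichar; the casts make the intended type explicit.
static NSString *MakeSuspectString(void) {
    return [NSString stringWithFormat:@"qlex.1.%C%C.n.he.0",
            (unichar)0x05D1, (unichar)0x05D0];
}

int main(void) {
    @autoreleasepool {
        NSString *s = MakeSuspectString();
        // Walk the code units in storage (logical) order, printing each
        // as a hex value so no bidirectional reordering can occur.
        for (NSUInteger i = 0; i < [s length]; i++) {
            unichar c = [s characterAtIndex:i];
            printf("%2lu: U+%04X\n", (unsigned long)i, (unsigned)c);
        }
    }
    return 0;
}
```

Run in a terminal, this should list U+05D1 at index 7 and U+05D0 at index 8, the same order as the format arguments. Only the rendered text appears reversed.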

So it turns out that every subsystem was working correctly. NSString was correctly constructing itself, and the text system was correctly displaying it. In fact, this wasn't even a bug at all, just a misinterpretation of what the correct result should be.