Posted at 2012-07-06 15:08
Tags: fridayqna letsbuild objectivec

Friday Q&A 2012-07-06: Let's Build NSNumber

by Mike Ash

NSNumber is a deceptively simple class with some interesting implementation details. In today's edition of Friday Q&A, I'll explore how to build a class that works like NSNumber, a topic suggested by Jay Tamboli.

Overview
Like many (but not all) object-oriented languages, Objective-C has a divide between objects and non-objects. Objects respond to messages, can be queried at runtime without knowing their exact type, placed in collections, compared for equality, and share a common set of behavior. Non-objects are largely compile-time constructs, with all of their type information essentially gone at runtime. In Objective-C, these non-objects are everything that comes from C, from the integer 42 to the string "Hello, world" to complicated structs.

Boxing is the process of placing these non-objects into an object so that they can be used like other objects, typically so that they can be placed in a collection. NSNumber is the Cocoa class used to box C numbers. You can't have an NSArray of int, but you can have an NSArray of NSNumber. NSNumber shows up a lot in Cocoa programming. Just about any place a Cocoa collection is used to store a number, NSNumber is there. Among many other places, NSNumber objects are what NSUserDefaults stores and retrieves when you ask it to save a number.

Interface
Our surrogate NSNumber will be called MANumber. Unlike the Cocoa version, which is a subclass of the more general boxing class NSValue, this one will directly subclass NSObject:

    @interface MANumber : NSObject

There are a lot of methods for initializing an instance. There's one initializer for each C numeric type, plus some extra ones for types specific to Cocoa:

    - (id)initWithChar:(char)value;
    - (id)initWithUnsignedChar:(unsigned char)value;
    - (id)initWithShort:(short)value;
    - (id)initWithUnsignedShort:(unsigned short)value;
    - (id)initWithInt:(int)value;
    - (id)initWithUnsignedInt:(unsigned int)value;
    - (id)initWithLong:(long)value;
    - (id)initWithUnsignedLong:(unsigned long)value;
    - (id)initWithLongLong:(long long)value;
    - (id)initWithUnsignedLongLong:(unsigned long long)value;
    - (id)initWithFloat:(float)value;
    - (id)initWithDouble:(double)value;
    - (id)initWithBool:(BOOL)value;
    - (id)initWithInteger:(NSInteger)value;
    - (id)initWithUnsignedInteger:(NSUInteger)value;

There are also getters for these types:

    - (char)charValue;
    - (unsigned char)unsignedCharValue;
    - (short)shortValue;
    - (unsigned short)unsignedShortValue;
    - (int)intValue;
    - (unsigned int)unsignedIntValue;
    - (long)longValue;
    - (unsigned long)unsignedLongValue;
    - (long long)longLongValue;
    - (unsigned long long)unsignedLongLongValue;
    - (float)floatValue;
    - (double)doubleValue;
    - (BOOL)boolValue;
    - (NSInteger)integerValue;
    - (NSUInteger)unsignedIntegerValue;

Note that any of these getters works no matter which initializer was used. MANumber will have to perform the appropriate conversions.

Finally, there are a few other methods for string conversion and comparison:

    - (NSString *)stringValue;
    - (NSComparisonResult)compare:(MANumber *)otherNumber;
    - (BOOL)isEqualToNumber:(MANumber *)number;
    - (NSString *)descriptionWithLocale:(id)locale;

Implementation Strategy
MANumber will use a union to store the underlying numeric value. union is a rarely-seen feature of standard C. It looks just like a struct, but works differently. A struct stores many values together in one spot, each in its own storage. A union overlays all of its fields in the same storage, so you can only access the last one you stored. When you store a value in a union, the value of all other fields becomes undefined.

In typical unhelpful-but-efficient C fashion, the compiler doesn't enforce that rule, nor does it help you follow it by, say, letting you query which field was the last one set. You have to keep track of this yourself, typically with an accompanying enum.
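
Here's a rough sketch of that union-plus-enum pattern in plain C (which is also valid Objective-C). The names TaggedNumber, NumberKind, and the helper functions are invented for this example and aren't part of MANumber itself:

```c
#include <assert.h>

/* Records which field of the union currently holds a meaningful value. */
typedef enum { KIND_INT, KIND_UINT, KIND_DOUBLE } NumberKind;

typedef struct {
    NumberKind kind;
    union {
        long long i;
        unsigned long long u;
        double d;
    } value;
} TaggedNumber;

/* Each constructor sets the tag and the matching field together. */
TaggedNumber number_from_int(long long v) {
    TaggedNumber n;
    n.kind = KIND_INT;
    n.value.i = v;
    return n;
}

TaggedNumber number_from_double(double v) {
    TaggedNumber n;
    n.kind = KIND_DOUBLE;
    n.value.d = v;
    return n;
}

/* Readers consult the tag first, so they only touch the field that was stored. */
double number_as_double(TaggedNumber n) {
    switch (n.kind) {
        case KIND_INT:    return (double)n.value.i;
        case KIND_UINT:   return (double)n.value.u;
        case KIND_DOUBLE: return n.value.d;
    }
    assert(0 && "unknown NumberKind");
    return 0.0;
}
```

The compiler never checks that the tag and the field agree, so the discipline lives entirely in the constructors and readers.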

The union could be used to hold every C numeric type, with a big enum to say which one is in use. However, this is unnecessarily complex. All we really need is three fields: the largest possible integer type, the largest possible unsigned integer type, and the largest possible floating-point type. From the types we have to handle, these are long long, unsigned long long, and double. Everything else can be converted to and from those without loss.

This implementation does not precisely match that of NSNumber, which keeps track of the specific type used to create it. However, using these three types is plenty close enough, and eliminates a lot of extra repetitive code. The fact that NSNumber precisely tracks the original type isn't visible most of the time, and only shows up when using a method like -descriptionWithLocale: or -objCType.
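
To make the "without loss" claim concrete, here's a small C sketch that widens a value into one of the three storage types and narrows it back. The function names are made up for illustration, and the checks assume a mainstream platform with IEEE 754 floating point:

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Widen to the storage type, narrow back, and report whether the value survived. */
int short_round_trips(short v) {
    long long wide = v;
    return (short)wide == v;
}

int unsigned_int_round_trips(unsigned int v) {
    unsigned long long wide = v;
    return (unsigned int)wide == v;
}

int float_round_trips(float v) {
    double wide = v;
    return (float)wide == v;
}

/* Usage: the extremes of each narrow type come back unchanged. */
void check_round_trips(void) {
    assert(short_round_trips(SHRT_MIN));
    assert(unsigned_int_round_trips(UINT_MAX));
    assert(float_round_trips(FLT_MAX));
    assert(float_round_trips(0.1f));
}
```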

Storage
Here are the instance variables:

    @implementation MANumber
    {
        enum { INT, UINT, DOUBLE } _type;
        union
        {
            long long i;
            unsigned long long u;
            double d;
        } _value;
    }

The _type variable holds an anonymous enum saying whether the value is an INT (long long), UINT (unsigned long long), or DOUBLE (guess). The _value variable then holds the actual number, using a union so that it only ends up storing one.

The code will set _type and the corresponding _value in the initializers. The getters can then check the _type and extract the value accordingly.

Initializers
There's a ton of boilerplate to deal with all of the different types. All of the signed integer types just call through to initWithLongLong:, and the unsigned types call through to initWithUnsignedLongLong:

    - (id)initWithChar:(char)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithUnsignedChar:(unsigned char)value
    {
        return [self initWithUnsignedLongLong:value];
    }

    - (id)initWithShort:(short)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithUnsignedShort:(unsigned short)value
    {
        return [self initWithUnsignedLongLong:value];
    }

    - (id)initWithInt:(int)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithUnsignedInt:(unsigned int)value
    {
        return [self initWithUnsignedLongLong:value];
    }

    - (id)initWithLong:(long)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithUnsignedLong:(unsigned long)value
    {
        return [self initWithUnsignedLongLong:value];
    }

    - (id)initWithBool:(BOOL)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithInteger:(NSInteger)value
    {
        return [self initWithLongLong:value];
    }

    - (id)initWithUnsignedInteger:(NSUInteger)value
    {
        return [self initWithUnsignedLongLong:value];
    }

Those initializers then simply set _type and _value, and return self. (Note that I'm leaving out the traditional call to [super init] for brevity, as it's not strictly necessary when your superclass is NSObject, although still a good idea.)

    - (id)initWithLongLong:(long long)value
    {
        _type = INT;
        _value.i = value;
        return self;
    }

    - (id)initWithUnsignedLongLong:(unsigned long long)value
    {
        _type = UINT;
        _value.u = value;
        return self;
    }

The floating-point initializers are similar. The one for float just calls through to initWithDouble:, and that one just sets _type and _value appropriately:

    - (id)initWithFloat:(float)value
    {
        return [self initWithDouble:value];
    }

    - (id)initWithDouble:(double)value
    {
        _type = DOUBLE;
        _value.d = value;
        return self;
    }

Getters
The getters are even more similar than the initializers. They all check the _type, then return the appropriate field of _value. The compiler will handle the final conversion from the active field of _value to the requested return type.

Since these methods all contain the same code, this is a perfect candidate for a macro to encapsulate the identical bits. Here's a macro that checks _type and then returns the corresponding field of _value:

    #define RETURN() do { \
            if(_type == INT) \
                return _value.i; \
            else if(_type == UINT) \
                return _value.u; \
            else \
                return _value.d; \
        } while(0)

With that macro, the getters pretty much write themselves:

    - (char)charValue { RETURN(); }
    - (unsigned char)unsignedCharValue { RETURN(); }
    - (short)shortValue { RETURN(); }
    - (unsigned short)unsignedShortValue { RETURN(); }
    - (int)intValue { RETURN(); }
    - (unsigned int)unsignedIntValue { RETURN(); }
    - (long)longValue { RETURN(); }
    - (unsigned long)unsignedLongValue { RETURN(); }
    - (long long)longLongValue { RETURN(); }
    - (unsigned long long)unsignedLongLongValue { RETURN(); }
    - (float)floatValue { RETURN(); }
    - (double)doubleValue { RETURN(); }
    - (NSInteger)integerValue { RETURN(); }
    - (NSUInteger)unsignedIntegerValue { RETURN(); }

That's a lot of boring and ugly code.
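
The same trick can be tried outside of Objective-C. The following plain C sketch is a variation rather than a copy: instead of a macro that expands to a method body, the hypothetical DEFINE_GETTER macro generates a whole function per return type. TaggedNumber and the getter names are invented for this example:

```c
#include <assert.h>

typedef enum { KIND_INT, KIND_UINT, KIND_DOUBLE } NumberKind;

typedef struct {
    NumberKind kind;
    union { long long i; unsigned long long u; double d; } value;
} TaggedNumber;

/* Expands to a function that reads the active field and converts it
   to the requested return type. */
#define DEFINE_GETTER(name, type)            \
    type name(TaggedNumber n) {              \
        if (n.kind == KIND_INT)              \
            return (type)n.value.i;          \
        else if (n.kind == KIND_UINT)        \
            return (type)n.value.u;          \
        else                                 \
            return (type)n.value.d;          \
    }

DEFINE_GETTER(number_int_value, int)
DEFINE_GETTER(number_long_long_value, long long)
DEFINE_GETTER(number_float_value, float)
DEFINE_GETTER(number_double_value, double)

/* Usage: one stored double, read back through several getters. */
void check_getters(void) {
    TaggedNumber n;
    n.kind = KIND_DOUBLE;
    n.value.d = 42.75;
    assert(number_int_value(n) == 42);        /* conversion truncates toward zero */
    assert(number_long_long_value(n) == 42);
    assert(number_float_value(n) == 42.75f);  /* 42.75 is exactly representable */
    assert(number_double_value(n) == 42.75);
}
```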

The one exception to this uniform sea of macro invocations is the -boolValue method. Since BOOL pretends to be a real boolean value, this method should always return YES for any non-zero value stored in the MANumber object. The compiler's built-in conversion won't do this. For example, the integer 256 will return NO if converted to a BOOL, since BOOL is just a signed char, which is an 8-bit integer. Because of that, -boolValue replicates the macro logic, but with an explicit check for zero:

    - (BOOL)boolValue
    {
        if(_type == INT)
            return _value.i != 0;
        else if(_type == UINT)
            return _value.u != 0;
        else
            return _value.d != 0;
    }
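
The 256 example is easy to reproduce in plain C. In this sketch, LegacyBool is a made-up stand-in for a signed char BOOL. Strictly speaking, the result of converting an out-of-range value to a signed type is implementation-defined, but mainstream compilers such as GCC and Clang keep the low 8 bits:

```c
#include <assert.h>

/* Hypothetical stand-in for a signed-char BOOL. */
typedef signed char LegacyBool;

/* Plain conversion: on mainstream compilers only the low 8 bits survive. */
LegacyBool naive_bool(long long v) {
    return (LegacyBool)v;
}

/* Comparing against zero first yields 1 for every non-zero input. */
LegacyBool safe_bool(long long v) {
    return v != 0;
}

void check_bool_conversion(void) {
    assert(naive_bool(256) == 0);  /* the low byte of 0x100 is zero */
    assert(safe_bool(256) == 1);
    assert(safe_bool(0) == 0);
}
```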

String Conversion
There are two string conversion methods: -stringValue and -descriptionWithLocale:. -stringValue simply calls -descriptionWithLocale: with a nil parameter:

    - (NSString *)stringValue
    {
        return [self descriptionWithLocale:nil];
    }

-descriptionWithLocale: then uses -[NSString initWithFormat:locale:] to build the string. There's no fancy way to deal with the different numeric types here, so it simply checks _type and uses a different format string for each case:

    - (NSString *)descriptionWithLocale:(id)locale
    {
        if(_type == INT)
            return [[NSString alloc] initWithFormat:@"%lld" locale:locale, _value.i];
        else if(_type == UINT)
            return [[NSString alloc] initWithFormat:@"%llu" locale:locale, _value.u];
        else
            return [[NSString alloc] initWithFormat:@"%f" locale:locale, _value.d];
    }

Note that I'm using ARC, which is why there are no autorelease calls here.
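
The same three format specifiers work with the C standard library. This sketch uses snprintf in place of NSString, so there is no locale parameter and the output follows the process locale (the default "C" locale is assumed in the checks). TaggedNumber and number_format are names invented for the example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { KIND_INT, KIND_UINT, KIND_DOUBLE } NumberKind;

typedef struct {
    NumberKind kind;
    union { long long i; unsigned long long u; double d; } value;
} TaggedNumber;

/* Formats the active field with the specifier that matches its type.
   Returns whatever snprintf returns. */
int number_format(TaggedNumber n, char *buf, size_t size) {
    if (n.kind == KIND_INT)
        return snprintf(buf, size, "%lld", n.value.i);
    else if (n.kind == KIND_UINT)
        return snprintf(buf, size, "%llu", n.value.u);
    else
        return snprintf(buf, size, "%f", n.value.d);
}

void check_format(void) {
    char buf[64];
    TaggedNumber n;

    n.kind = KIND_INT;
    n.value.i = -42;
    number_format(n, buf, sizeof buf);
    assert(strcmp(buf, "-42") == 0);

    n.kind = KIND_DOUBLE;
    n.value.d = 1.5;
    number_format(n, buf, sizeof buf);
    assert(strcmp(buf, "1.500000") == 0);  /* %f prints six decimals by default */
}
```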

Comparison
The comparison methods get interesting, because they need to work between MANumber objects of different types. For example, the double value -1.1 should compare less than the unsigned integer value 99999.

There are nine possible pairings of the types, so nine different cases to handle. This can be reduced to only six cases by enforcing an order. If the two objects have types INT and UINT, the two cases for that can be reduced to one by only handling the case where self is INT and the other object is UINT, and swapping the two objects if they show up the other way around.

To help with comparison between the different types, I wrote a simple macro that takes two numbers and returns the appropriate NSComparisonResult. All it does is take two arguments, save them into temporary variables to avoid multiple evaluation, then return the appropriate constant depending on how they're ordered. There's also a bit of floating-point trickery here. With floating-point numbers, NAN (not a number) never compares equal to anything, and all comparisons with it are false. Since NSComparisonResult has no way to represent an ordering which means, "this number is not equal to anything, not even itself," I arbitrarily decide to make NAN equal to itself and less than any other number, for the purposes of MANumber comparison:

    #define COMPARE(a, b) do { \
            __typeof__(a) __a_local = a; \
            __typeof__(b) __b_local = b; \
            BOOL __a_isnan = isnan(__a_local); \
            BOOL __b_isnan = isnan(__b_local); \
            if(__a_isnan && __b_isnan) \
                return NSOrderedSame; \
            else if(__a_isnan) \
                return NSOrderedAscending; \
            else if(__b_isnan) \
                return NSOrderedDescending; \
            else if(__a_local > __b_local) \
                return NSOrderedDescending; \
            else if(__a_local < __b_local) \
                return NSOrderedAscending; \
            else \
                return NSOrderedSame; \
        } while(0)
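
For readers who want to poke at the NAN ordering in isolation, here's the same decision table written as an ordinary C function for double values. The Order enum is a hypothetical stand-in that mirrors the -1, 0, 1 values of NSComparisonResult:

```c
#include <assert.h>
#include <math.h>

/* Mirrors NSOrderedAscending, NSOrderedSame, and NSOrderedDescending. */
typedef enum { ORDER_ASCENDING = -1, ORDER_SAME = 0, ORDER_DESCENDING = 1 } Order;

/* NaN is treated as equal to itself and less than every other value. */
Order compare_doubles(double a, double b) {
    int a_nan = isnan(a);
    int b_nan = isnan(b);
    if (a_nan && b_nan) return ORDER_SAME;
    if (a_nan) return ORDER_ASCENDING;
    if (b_nan) return ORDER_DESCENDING;
    if (a > b) return ORDER_DESCENDING;
    if (a < b) return ORDER_ASCENDING;
    return ORDER_SAME;
}

void check_compare_doubles(void) {
    assert(compare_doubles(NAN, NAN) == ORDER_SAME);
    assert(compare_doubles(NAN, -INFINITY) == ORDER_ASCENDING);
    assert(compare_doubles(1.0, NAN) == ORDER_DESCENDING);
    assert(compare_doubles(-1.1, 99999.0) == ORDER_ASCENDING);
}
```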

The first thing the comparison method itself does is extract the types of the two objects to compare:

    - (NSComparisonResult)compare:(MANumber *)otherNumber
    {
        int selfType = _type;
        int otherType = otherNumber->_type;

If the two types aren't in order, we reverse the comparison by calling compare: again with the arguments reversed, and returning the inverse of the result. Since NSComparisonResult is just -1, 0, or 1, we can invert its meaning by negating it:

        if(selfType > otherType)
            return -[otherNumber compare:self];

Now we're left with sorted types. There are six cases. If selfType is INT, then otherType could be anything. If selfType is UINT, then otherType can only be UINT or DOUBLE. If selfType is DOUBLE, then otherType must be DOUBLE as well.

Let's look at the cases where selfType is INT. If both values are INT, the code is easy:

        if(selfType == INT)
        {
            if(otherType == INT)
            {
                COMPARE([self longLongValue], [otherNumber longLongValue]);
            }

If otherType is UINT, there's a bit of extra work. Directly comparing with [otherNumber unsignedLongLongValue] won't work. C will convert [self longLongValue] to unsigned before the comparison, turning negative numbers into large positive numbers and wrecking the comparison. -1 will compare greater than 1 because of this. To prevent that, we make a special check for negative numbers, then compare the unsigned values once self is known to be non-negative:

            else if(otherType == UINT)
            {
                if([self longLongValue] < 0)
                    return NSOrderedAscending;
                else
                    COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }
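
The signed/unsigned pitfall is worth seeing on its own. This plain C sketch first shows the naive comparison going wrong, then the guarded version. The Order enum and compare_signed_unsigned are names invented for the example:

```c
#include <assert.h>
#include <limits.h>

typedef enum { ORDER_ASCENDING = -1, ORDER_SAME = 0, ORDER_DESCENDING = 1 } Order;

/* Compares a signed value against an unsigned one without letting the usual
   arithmetic conversions turn a negative value into a huge positive one. */
Order compare_signed_unsigned(long long a, unsigned long long b) {
    if (a < 0)
        return ORDER_ASCENDING;  /* a negative value is below every unsigned value */
    unsigned long long ua = (unsigned long long)a;  /* exact: a is non-negative here */
    if (ua > b) return ORDER_DESCENDING;
    if (ua < b) return ORDER_ASCENDING;
    return ORDER_SAME;
}

void check_signed_unsigned(void) {
    long long minus_one = -1;
    unsigned long long one = 1;

    /* The naive comparison converts -1 to ULLONG_MAX before comparing. */
    assert(!(minus_one < one));

    assert(compare_signed_unsigned(-1, 1) == ORDER_ASCENDING);
    assert(compare_signed_unsigned(5, 5) == ORDER_SAME);
    assert(compare_signed_unsigned(LLONG_MAX, ULLONG_MAX) == ORDER_ASCENDING);
}
```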

Next comes the case for DOUBLE. This gets pretty complicated, because floating-point numbers work fairly differently from integers. There are several different subcases here, which I'll take one by one. However, the first thing it does is extract the doubleValue from the other number to make it more convenient to work with:

            else
            {
                double other = [otherNumber doubleValue];

double can hold a much larger range than long long. The first subcase is to figure out the largest possible number a long long can hold, and see if other is beyond it. If it is, it's obviously larger than self, since self is a long long.

The built-in macro LLONG_MAX gives us the largest number a long long can hold. However, we can't directly convert this to a double. That number is equal to 2⁶³-1, which can't be represented in a double. Due to the internal format of double, it can only represent even numbers when it gets beyond 2⁵³. To perform the comparison accurately, we calculate one number beyond the largest long long, careful to use an unsigned one when adding, and compare against that:

                double longLongMaxPlusOne = LLONG_MAX + 1ULL;
                if(other >= longLongMaxPlusOne)
                    return NSOrderedAscending;

We also check in the negative direction. This is a bit easier, as the smallest possible long long can be directly represented in a double:

                if(other < LLONG_MIN)
                    return NSOrderedDescending;
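
Here's a small C sketch of those two range checks, assuming a 64-bit long long and an IEEE 754 double. The function names are invented for illustration:

```c
#include <assert.h>
#include <limits.h>

/* 2^63, computed in unsigned arithmetic so the addition cannot overflow.
   A power of two converts to double exactly. */
double long_long_max_plus_one(void) {
    return (double)(LLONG_MAX + 1ULL);
}

/* Returns 1 if d lies above the long long range, -1 if below, 0 if inside.
   NaN comes back as 0 because every comparison with it is false. */
int long_long_range_side(double d) {
    if (d >= long_long_max_plus_one())
        return 1;
    if (d < (double)LLONG_MIN)  /* LLONG_MIN is -2^63, which is also exact */
        return -1;
    return 0;
}

void check_range(void) {
    assert(long_long_max_plus_one() == 9223372036854775808.0);
    /* Converting LLONG_MAX itself rounds up to 2^63, which is why it can't
       serve as the limit directly. */
    assert((double)LLONG_MAX == 9223372036854775808.0);
    assert(long_long_range_side(1e19) == 1);
    assert(long_long_range_side(-1e19) == -1);
    assert(long_long_range_side(0.5) == 0);
}
```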

If we're still running at this point, then the double is within the range of a long long and they need to be compared directly. However, we can't just whip out the > operator, because there are a lot of doubles that can't be represented in long long (e.g. 1.5), and there are a lot of long longs that can't be represented as a double (e.g. any odd number above a threshold, as mentioned above).

Beyond a certain threshold, double can only represent integer values, as the magnitude of the value exceeds the precision of the representation. When beyond that threshold, and below the maximum possible long long, the double can safely be converted to a long long with no loss of precision. The two values can then be compared as long longs. Below that threshold, double can represent any integer, and so the long long can safely be converted to a double with no loss of precision, and the two values compared as doubles.

The location of that threshold is actually fairly easy to figure out. C provides a macro, DBL_MANT_DIG, which gives the precision of the double type. By raising two to that power (since double is a binary representation), we get the threshold:

                double pureIntegerStart = 1LL << DBL_MANT_DIG;

Then we simply compare based on where other lies relative to that. Note that the threshold applies equally for negative numbers, so we must check it in both directions:

                if(other >= pureIntegerStart || other <= -pureIntegerStart)
                    COMPARE([self longLongValue], (long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }
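
Putting the whole INT versus DOUBLE case together, a standalone C version might look roughly like this. It's a sketch under the assumption of a 64-bit long long and an IEEE 754 double (so DBL_MANT_DIG is 53), with NaN handled up front in the same way the COMPARE macro treats it. The Order enum and function name are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

typedef enum { ORDER_ASCENDING = -1, ORDER_SAME = 0, ORDER_DESCENDING = 1 } Order;

Order compare_ll_double(long long a, double b) {
    if (isnan(b)) return ORDER_DESCENDING;  /* NaN sorts below everything */
    if (b >= (double)(LLONG_MAX + 1ULL)) return ORDER_ASCENDING;
    if (b < (double)LLONG_MIN) return ORDER_DESCENDING;

    double pure_integer_start = (double)(1LL << DBL_MANT_DIG);  /* 2^53 */
    if (b >= pure_integer_start || b <= -pure_integer_start) {
        /* b is an integer inside the long long range, so the cast is exact. */
        long long bi = (long long)b;
        return a < bi ? ORDER_ASCENDING : a > bi ? ORDER_DESCENDING : ORDER_SAME;
    }

    /* |b| < 2^53. If a is too large to convert exactly, rounding still
       leaves it on the correct side of b. */
    double ad = (double)a;
    return ad < b ? ORDER_ASCENDING : ad > b ? ORDER_DESCENDING : ORDER_SAME;
}

void check_ll_double(void) {
    /* 2^53 + 1 versus 2^53: comparing as doubles would call these equal. */
    assert(compare_ll_double(9007199254740993LL, 9007199254740992.0) == ORDER_DESCENDING);
    assert(compare_ll_double(1, 1.5) == ORDER_ASCENDING);
    assert(compare_ll_double(LLONG_MAX, 9223372036854775808.0) == ORDER_ASCENDING);
    assert(compare_ll_double(LLONG_MIN, -9223372036854775808.0) == ORDER_SAME);
    assert(compare_ll_double(0, NAN) == ORDER_DESCENDING);
}
```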

Next up comes the case where selfType is UINT. As before, when otherType is also UINT, the code is easy:

        else if(selfType == UINT)
        {
            if(otherType == UINT)
            {
                COMPARE([self unsignedLongLongValue], [otherNumber unsignedLongLongValue]);
            }

Note that we don't have to handle INT, due to the type sorting performed above. We move on to DOUBLE, which is once again complicated. As before, we fetch the value of otherNumber into a local variable:

            else
            {
                double other = [otherNumber doubleValue];

The first thing we do is see if other is negative. If it is, then we know the order, as self is unsigned (and thus either zero or positive):

                if(other < 0)
                    return NSOrderedDescending;

Otherwise, we do the same basic threshold calculations as before. This time we have to compare other against the largest possible unsigned long long. Doing this is a bit tricky. Just like with long long, we have to add 1 to get a number that works as a double. However, we can't represent anything greater than the largest possible unsigned long long as an integer, since unsigned long long is the largest integer type we have. Instead, we calculate (LLONG_MAX + 1) * 2, which gives one greater than the largest unsigned long long, carefully doing so with all the right types to avoid overflow or imprecision:

                double unsignedLongLongMaxPlusOne = (double)(LLONG_MAX + 1ULL) * 2.0;
                if(other >= unsignedLongLongMaxPlusOne)
                    return NSOrderedAscending;

At this point, we know that both numbers are within each type's range, and so we use the same pureIntegerStart strategy as before to compare them directly:

                double pureIntegerStart = 1LL << DBL_MANT_DIG;
                if(other >= pureIntegerStart)
                    COMPARE([self unsignedLongLongValue], (unsigned long long)other);
                else
                    COMPARE([self doubleValue], other);
            }
        }

All that's left now is the DOUBLE case, which is actually really easy. Due to the type sorting, the only possible case here is when they're both DOUBLE, so we can just directly compare them:

        else
        {
            COMPARE([self doubleValue], [otherNumber doubleValue]);
        }
    }
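
The UINT versus DOUBLE case can be sketched the same way in standalone C. As before, this assumes a 64-bit unsigned long long and an IEEE 754 double, checks for NaN up front, and uses an invented Order enum and function name:

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <math.h>

typedef enum { ORDER_ASCENDING = -1, ORDER_SAME = 0, ORDER_DESCENDING = 1 } Order;

Order compare_ull_double(unsigned long long a, double b) {
    if (isnan(b)) return ORDER_DESCENDING;  /* NaN sorts below everything */
    if (b < 0) return ORDER_DESCENDING;     /* unsigned values are never negative */

    /* 2^64, built from 2^63 so that no integer arithmetic overflows. */
    double ull_max_plus_one = (double)(LLONG_MAX + 1ULL) * 2.0;
    if (b >= ull_max_plus_one) return ORDER_ASCENDING;

    double pure_integer_start = (double)(1LL << DBL_MANT_DIG);  /* 2^53 */
    if (b >= pure_integer_start) {
        /* b is an integer below 2^64, so the cast is exact. */
        unsigned long long bu = (unsigned long long)b;
        return a < bu ? ORDER_ASCENDING : a > bu ? ORDER_DESCENDING : ORDER_SAME;
    }

    /* b < 2^53. Any rounding of a still lands on the correct side of b. */
    double ad = (double)a;
    return ad < b ? ORDER_ASCENDING : ad > b ? ORDER_DESCENDING : ORDER_SAME;
}

void check_ull_double(void) {
    assert(compare_ull_double(ULLONG_MAX, 18446744073709551616.0) == ORDER_ASCENDING);
    /* 2^64 - 2048 is the largest double below 2^64. */
    assert(compare_ull_double(ULLONG_MAX, 18446744073709549568.0) == ORDER_DESCENDING);
    assert(compare_ull_double(0, -0.5) == ORDER_DESCENDING);
    assert(compare_ull_double(3, 3.0) == ORDER_SAME);
}
```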

Now that compare: is implemented, equality checking is trivial:

    - (BOOL)isEqualToNumber:(MANumber *)number
    {
        return [self compare:number] == NSOrderedSame;
    }

We also want isEqual: from NSObject. This can simply check the class of the other object, then leverage isEqualToNumber:

    - (BOOL)isEqual:(id)other
    {
        if(![other isKindOfClass:[MANumber class]])
            return NO;
        return [self isEqualToNumber:other];
    }

Finally, since we override isEqual:, we must also override hash. The implementation of hash gets mildly tricky due to the semantics of floating-point numbers. For non-floats, we can simply return the straight integer value as the hash:

    - (NSUInteger)hash
    {
        if(_type != DOUBLE)
            return [self unsignedIntegerValue];

For floats that are integer values, we want to do the same thing. Since our isEqual: considers an integer-valued DOUBLE equal to an INT or UINT of the same value, we must return the same hash as the INT and UINT equivalent. To accomplish this, we check to see if the DOUBLE value is actually an integer, and return the integer value if so:

        if(_value.d == floor(_value.d))
            return [self unsignedIntegerValue];

Beyond this, we have non-integer values. The ultimate goal is to simply return the bit pattern of the double, which will give a nice hash. However, this only works for numbers where bit pattern equality implies isEqual:. This is not true for all doubles. First is NAN, which we made compare equal to itself, but which has many different possible bit representations. To handle that, we check for NAN explicitly and return a constant hash for it:

        if(isnan(_value.d))
            return 0;

The other special case is a bit weirder. IEEE 754 floats (the kind used by just about any modern CPU) have two possible values for zero: positive and negative. These are typically indistinguishable, as they compare equal and produce the same results for most calculations. However, they have different bit patterns, so we have to special-case them. I take advantage of the fact that negative zero compares equal to positive zero to make a simple check and return a constant hash for both zeroes:

        if(_value.d == 0.0)
            return 0;

Having ruled out all the special cases, if the code reaches this point then the number must be one where numerical equality is the same as bit pattern equality. Thus we simply return the bit pattern for the hash. We do this by returning the u field of the union:

        return _value.u;
    }

But wait! Previously I said that you're not allowed to access any field in a union besides the one that was last set, so this is clearly not allowed. While that is the strict reading of the rule I gave, C compilers have generally settled on allowing it and simply reinterpreting the existing value. Later revisions of the C standard (C99 with its technical corrigenda, and C11) also describe this reinterpretation explicitly, with the caveat that the result might be a trap representation. This code takes the double that's stored in the union and reinterprets its bits as an unsigned long long, which is exactly what we want. It's officially blessed by the compilers we're actually using.
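
For experimenting with the hash rules outside of the class, here's a C sketch of the same idea for a bare double. It deliberately differs from the method above in a few ways: it uses memcpy instead of a union read to get the bit pattern, a cast round trip instead of floor to detect integer values, and explicit range checks so that the integer conversions stay well-defined for negative, huge, and infinite inputs. It assumes a 64-bit IEEE 754 double, and hash_double is a made-up name:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

uint64_t hash_double(double d) {
    if (isnan(d)) return 0;  /* every NaN bit pattern gets the same hash */
    if (d == 0.0) return 0;  /* true for both +0.0 and -0.0 */

    if (d >= -9223372036854775808.0 && d < 9223372036854775808.0) {
        long long i = (long long)d;  /* in range, so the cast is defined */
        if ((double)i == d)
            return (uint64_t)i;      /* same hash as the equal signed integer */
    } else if (d >= 9223372036854775808.0 && d < 18446744073709551616.0) {
        return (uint64_t)d;          /* doubles this large are always integers */
    }

    /* Everything else: numerical equality now matches bit pattern equality. */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

void check_hash(void) {
    assert(hash_double(5.0) == 5);
    assert(hash_double(-5.0) == (uint64_t)-5LL);
    assert(hash_double(0.0) == hash_double(-0.0));
    assert(hash_double(NAN) == 0);
    assert(hash_double(1.5) == 0x3FF8000000000000ULL);
}
```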

Conclusion
NSNumber is a conceptually simple class which mainly exists so that we can stuff numeric values into Cocoa collections, but its flexibility implies a fair amount of underlying complication. By implementing a workalike MANumber class, we can see what kinds of things NSNumber has to be doing on the inside. Automatic conversion to different integer types requires a fair amount of boilerplate code, and reliable conversion between numbers of different types can get pretty complicated.

That's it for today. Come back next time for yet another Friday Q&A. As always, Friday Q&A is driven by reader suggestions, so if you have a topic you'd like to see covered, please send it in!

Comments:

Jesper at 2012-07-06 16:21:10:

Note that I'm leaving out the traditional call to [super init] for brevity, as it's not strictly necessary when your subclass is NSObject

I would love to see a Friday Q&A on making NSObject be a subclass of yours, but I'm not sure that's what you meant here.

mikeash at 2012-07-06 16:59:19:

Oops, got it backwards. That should be, when your superclass is NSObject. Fixed now, thanks for pointing it out.

You can make NSObject a subclass, of course, by calling class_setSuperclass. Not recommended, though.

Chris L at 2012-07-06 17:50:51:

Great article! Occasionally I do rely on NSNumber preserving the original type. It is necessary for generating JSON. For example, it is not acceptable to generate "1" instead of "true".

Charlie Monroe at 2012-07-07 09:04:52:

Thanks for the insight on NSNumber.

I'd really like to see how to store the number in a tagged pointer, though (kind of hoped you'd touch that matter).

Charlie

mikeash at 2012-07-07 11:23:33:

Chris L: Good point on that. I'm not sure if there's even a public way to obtain that information nicely, though. The documentation explicitly states, "Note that number objects do not necessarily preserve the type they are created with." And yet, preserving things like BOOL and float/int are important for plist and, as you point out, JSON serialization. It's a bit of an odd situation.

Charlie Monroe: I'm thinking of doing that as a followup, so you may get it in a couple of weeks!

Dhaval at 2012-07-26 03:12:15:

Mike, cool article!

Apple reference docs mention that NSNumber is a class cluster. It'll be helpful (at least to me) to know how that might work.

PF at 2013-02-12 21:29:04:

Kind of necro-threading at this point, but one technicality:

Structures allow you to group data elements, but unions allow you to store them in THE SAME place.

When you write to one value of a union it does NOT invalidate the other portion, it just overwrites it. One example of this is a common union used for embedded software:

    union {
        u32 myWord;
        char myBytes[4];
    } someData;

Now you can write 4 individual bytes and read them out as a word, or vice versa. This is particularly nice if you are reading a bitstream that has endianness inconsistencies.

mikeash at 2013-02-21 15:07:03:

According to the language, it's only legal to access the field of a union that was last stored. Many popular implementations allow it as you say, of course.

digga at 2016-01-14 21:38:58:

Even older necro, but as someone who type-puns a lot I had to respond that accessing a different member of a union than the last one assigned is allowed. The one caveat is that the value may be a trap representation.

With C99 Technical Corrigenda: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm
And confirmed in C11: http://www.iso-9899.info/n1570.html

Quote from the standard:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ``type punning''). This might be a trap representation.
