Welcome back to a late edition of Friday Q&A. WWDC pushed the schedule back one week, but it's finally time for another one. This week, I'm going to discuss the implementation of equality and hashing in Cocoa, a topic suggested by a reader.
Equality
Object equality is a fundamental concept that gets used all over the place. In Cocoa, it's implemented with theisEqual: method. Something as simple as[array indexOfObject:] will use it, so it's important that your objects support it.
It's so important that Cocoa actually gives us a default implementation of it onNSObject. The default implementation just compares pointers. In other words, an object is only equal to itself, and is never equal to another object. The implementation is functionally identical to:
-(BOOL)isEqual:(id)other{returnself==other;}
NSView is never considered equal to anotherNSView, only to itself. ForNSView, and many other classes which behave that way, the default implementation is enough. That's good news, because it means that if your class has that same equality semantic, you don't have to do anything, and get the correct behavior for free.Implementing Custom Equality
Sometimes you need a deeper implementation of equality. It's common for objects, typically what you might refer to as a "value object", to be distinct from another object but be logically equal to it. For example:
// use mutable strings because that guarantees distinct objectsNSMutableString*s1=[NSMutableStringstringWithString:@"Hello, world"];NSMutableString*s2=[NSMutableStringstringWithFormat:@"%@, %@",@"Hello",@"world"];BOOLequal=[s1isEqual:s2];// gives you YES!
NSMutableString implements this for you in this case. But what if you have a custom object that you want to be able to do the same thing?MyClass*c1=...;MyClass*c2=...;BOOLequal=[c1isEqual:c2];
isEqual:.Testing for equality is fairly straightforward most of the time. Gather up the relevant properties of your class, and test them all for equality. If any of them are not equal, then returnNO. Otherwise, returnYES.
One subtle point with this is that the class of your object is an important property to test as well. It's perfectly valid to test aMyClass for equality with anNSString, but that comparison should never returnYES (unlessMyClass is a subclass ofNSString, of course).
A somewhat less subtle point is to ensure that you only test properties that are actually important to equality. Things like caches that do not influence your object's externally-visible value should not be tested.
Let's say your class looks like this:
@interfaceMyClass :NSObject{int_length;char*_data;NSString*_name;NSMutableDictionary*_cache;}
-(BOOL)isEqual:(id)other{return([otherisKindOfClass:[MyClassclass]]&&[otherlength]==_length&&memcmp([otherdata],_data,_length)==0&&[[othername]isEqual:_name])// note: no comparison of _cache}
NSDictionary andNSSet. They allow fast lookups of objects no matter how many objects you put in the container.If you're familiar with how hash tables work, you may want to skip the next paragraph or two.
A hash table is basically a big array with special indexing. Objects are placed into an array with an index that corresponds to their hash. The hash is essentially a pseudorandom number generated from the object's properties. The idea is to make the index random enough to make it unlikely for two objects to have the same hash, but have it be fully reproducible. When an object is inserted, the hash is used to determine where it goes. When an object is looked up, its hash is used to determine where to look.
In more formal terms, the hash of an object is defined such that two objects have an identical hash if they are equal. Note that the reverse is not true, and can't be: two objects can have an identical hash and not be equal. You want to try to avoid this as much as possible, because when two unequal objects have the same hash (called acollision) then the hash table has to take special measures to handle this, which is slow. However, it's provably impossible to avoid it completely.
In Cocoa, hashing is implemented with thehash method, which has this signature:
-(NSUInteger)hash;
NSObject gives you a default implementation that just uses your object's identity. Roughly speaking, it does this:-(NSUInteger)hash{return(NSUInteger)self;}
self. And just as with equality, if object identity equality is all you need, then the default implementation will do fine for you.Implementing Custom Hashing
Because of the semantics ofhash, if you overrideisEqual: then youmust overridehash. If you don't, then you risk having two objects which are equal but which don't have the same hash. If you use these objects in a dictionary, set, or something else which uses a hash table, then hilarity will ensue.
Because the definition of the object's hash follows equality so closely, the implementation ofhash likewise closely follows the implementation ofisEqual:.
An exception to this is that there's no need to include your object's class in the definition ofhash. That's basically a safeguard inisEqual: to ensure the rest of the check makes sense when used with a different object. Your hash is likely to be very different from the hash of a different class simply by virtue of hashing different properties and using different math to combine them.
Generating Property Hashes
Testing properties for equality is usually straightforward, but hashing them isn't always. How you hash a property depends on what kind of object it is.
For a numeric property, the hash can simply be the numeric value.
For an object property, you can send the object thehash method, and use what it returns.
For data-like properties, you'll want to use some sort of hash algorithm to generate the hash. You can use CRC32, or even something totally overkill like MD5. Another approach, somewhat less speedy but easy to use, is to wrap the data in anNSData and ask it for its hash, essentially offloading the work onto Cocoa. In the above example, you could compute the hash of_data like so:
[[NSDatadataWithBytes:_datalength:_length]hash]
The easiest way is to simply add them together, or use the bitwise xor property. However, this can hurt your hash's uniqueness, because these operations are symmetric, meaning that the separation between different properties gets lost. As an example, consider an object which contains a first and last name, with the following hash implementation:
-(NSUInteger)hash{return[_firstNamehash]^[_lastNamehash];}
How to best combine hashes is a complicated subject without any single answer. However, any asymmetric way of combining the values is a good start. I like to use a bitwise rotation in addition to the xor to combine them:
#defineNSUINT_BIT(CHAR_BIT*sizeof(NSUInteger))#defineNSUINTROTATE(val,howmuch)((((NSUInteger)val)<<howmuch)|(((NSUInteger)val)>>(NSUINT_BIT-howmuch)))-(NSUInteger)hash{returnNSUINTROTATE([_firstNamehash],NSUINT_BIT/2)^[_lastNamehash];}
-(NSUInteger)hash{NSUIntegerdataHash=[[NSDatadataWithBytes:_datalength:_length]hash];returnNSUINTROTATE(dataHash,NSUINT_BIT/2)^[_namehash];}
A Note on Subclassing
You have to be careful when subclassing a class which implements custom equality and hashing. In particular, your subclass should not expose any new properties which equality is dependent upon. If it does, then it must not compare equal with any instances of the superclass.
To see why, consider a subclass of the first/last name class which includes a birthday, and includes that as part of its equality computation. It can't include it when comparing equality with an instance of the superclass, though, so its equality method would look like this:
-(BOOL)isEqual:(id)other{// if the superclass doesn't like it then we're not equalif(![superisEqual:other])returnNO;// if it's not an instance of the subclass, then trust the superclass// it's equal there, so we consider it equal hereif(![otherisKindOfClass:[MySubClassclass]])returnYES;// it's an instance of the subclass, the superclass properties are equal// so check the added subclass propertyreturn[[otherbirthday]isEqual:_birthday];}
A, and an instance of the subclass for "John Smith" with a birthday of 5/31/1982, which I'll callB. Because of the definition of equality above,A equalsB, andB also equals itself, which is expected.Now consider an instance of the subclass for "John Smith" with a birthday of 6/7/1994, which I'll callC.C is not equal toB, which is what we expect.C is equal toA, also expected. But now there's a problem.A equals bothB andC, butB andC do not equal each other! This breaks the standard transitivity of the equality operator, and leads to extremely unexpected results.
In general this should not be a big problem. If your subclass adds properties which influence object equality, that's probably an indication of a design problem in your hierarchy anyway. Rather than working around it with weird implementations ofisEqual:, consider redesigning your class hierarchy.
A Note on Dictionaries
If you want to use your object as a key in anNSDictionary, you need to implement hashing and equality, but you also need to implement-copyWithZone:. Techniques for doing that are beyond the scope of today's post, but you should be aware that you need to go a little bit further in that case.
Conclusion
Cocoa provides default implementations of equality and hashing which work for many objects, but if you want your objects to be considered equal even when they're distinct objects in memory, you have to do a bit of extra work. Fortunately, it's not difficult to do, and once you implement them, your class will work seamlessly with many Cocoa collection classes.
That's it for this week. Check back in two more weeks for another edition. Until then, keep sending me your requests for topics. Friday Q&A is driven by user submissions, so if you have an idea for a topic to cover here, pleasesend it in.
NSMapTable orNSHashTable as detailed in my previous post:(var) ? 1231 : 1237 for those.
- (BOOL)isMostSpecializedEqual:(NSObject *)anObject
{
if (anObject == self)
return YES;
NSObject *parent = nil;
NSObject *child = nil;
if ([self isKindOfClass:anObject.class])
{
parent = anObject;
child = self;
}
else if ([anObject isKindOfClass:self.class])
{
parent = self;
child = anObject;
}
else
return NO;
return [child isEqual:parent];
}
- (BOOL)isEqual:(id)anObject
{
if (anObject == self)
return YES;
else if ([self isKindOfClass:anObject.class])
// perform memberwise-equality checks
else
return [self SR_isMostSpecializedEqual:anObject]
}
Add your thoughts, post a comment:
Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.