Being able to see all stages of your work can be immensely helpful when debugging a problem. Although you can get a lot done only looking at the source code and the app's behavior, some problems benefit greatly from being able to inspect the preprocessed source code, the assembly output from the compiler, or the final binary. It can also be handy to inspect other people's binaries. Today, I want to talk about various tools you can use to inspect binaries, both your own and other people's, a topic suggested by Carlton Gibson.
The Tools
Two of the tools I'm going to discuss today, otool and nm, come with Xcode, so you probably already have them installed. The other two, otx and class-dump, are third-party tools you'll have to obtain separately. You can get otx here:
Note that the prepackaged download is a bit old, and in particular doesn't handle x86_64 binaries, so the best way to get it is to check out the source code from Subversion and build it yourself. You can get class-dump here:
http://www.codethecode.com/projects/class-dump/
Note that this will not be a comprehensive guide to these tools, but rather a tour of some of the more useful facilities that they offer.
Sample App
In order to have something to inspect, I put together a sample application to play with. Here is the code for that:
    // clang -framework Cocoa -fobjc-arc test.m

    #import <Cocoa/Cocoa.h>

    @interface MyClass : NSObject
    {
        NSString *_name;
        int _number;
    }

    - (id)initWithName: (NSString *)name number: (int)number;

    @property (strong) NSString *name;
    @property int number;

    @end

    @implementation MyClass

    @synthesize name = _name, number = _number;

    - (id)initWithName: (NSString *)name number: (int)number
    {
        if((self = [super init]))
        {
            _name = name;
            _number = number;
        }
        return self;
    }

    @end

    NSString *MyFunction(NSString *parameter)
    {
        NSString *string2 = [@"Prefix" stringByAppendingString: parameter];
        NSLog(@"%@", string2);
        return string2;
    }

    int main(int argc, char **argv)
    {
        @autoreleasepool
        {
            MyClass *obj = [[MyClass alloc] initWithName: @"name" number: 42];
            NSString *string = MyFunction([obj name]);
            NSLog(@"%@", string);
            return 0;
        }
    }
Library Paths
A common source of frustration on the Mac is debugging dynamic linker problems when using embedded frameworks and libraries. The dynamic linker uses paths stored in the various binaries to figure out where to find libraries. Being able to inspect those binaries is extremely useful when debugging these problems.
The otool -L command will show all of the libraries a binary links against, as well as where those libraries are expected to be located at runtime. Here's the output of otool -L on our sample app:
    $ otool -L a.out
    a.out:
        /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa (compatibility version 1.0.0, current version 17.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
        /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
        /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 635.15.0)
        /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 833.20.0)
We can see that it links against Cocoa, libSystem (which contains the standard C library, POSIX functions, and other common code), libobjc (the Objective-C runtime), CoreFoundation, and Foundation. We can also see exactly where each one is expected to be when this app is run, as well as the version of each library that was linked against.
This also works on libraries. Let's see what libSystem links against:
    $ otool -L libSystem.dylib
    libSystem.dylib:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
        /usr/lib/system/libcache.dylib (compatibility version 1.0.0, current version 47.0.0)
        /usr/lib/system/libcommonCrypto.dylib (compatibility version 1.0.0, current version 55010.0.0)
        /usr/lib/system/libcompiler_rt.dylib (compatibility version 1.0.0, current version 6.0.0)
        /usr/lib/system/libcopyfile.dylib (compatibility version 1.0.0, current version 85.1.0)
        ...
That's a lot of libraries! I snipped out about twenty additional lines. We can see that libSystem includes a lot of functionality.
Note how the first line points back to libSystem itself. That's because each library contains a reference to its own canonical path, referred to as the "install name". For more details on what all these paths mean and how they work, see my previous article, Linking and Install Names.
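The otool -L listings above are regular enough to pick apart mechanically, which can help when checking many binaries at once. Here is a minimal Python sketch, assuming the output keeps the tab-indented "path (compatibility version X, current version Y)" shape shown above. The function name parse_otool_L and the SAMPLE text are illustrative, not part of any tool:

```python
import re

# Matches one dependency line of `otool -L` output, for example:
#   /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
LINE_RE = re.compile(
    r"^\s+(?P<path>.+?) "
    r"\(compatibility version (?P<compat>[\d.]+), "
    r"current version (?P<current>[\d.]+)\)$"
)

def parse_otool_L(text):
    """Return a list of (path, compatibility_version, current_version) tuples."""
    libs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # the unindented "a.out:" header line does not match
            libs.append((m.group("path"), m.group("compat"), m.group("current")))
    return libs

# Excerpt of the sample app's output, copied from the listing above.
SAMPLE = """a.out:
\t/System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa (compatibility version 1.0.0, current version 17.0.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)
\t/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
"""

for path, compat, current in parse_otool_L(SAMPLE):
    print(path, compat, current)
```

When the input is a library rather than an executable, the first tuple is the library's own install name rather than a dependency.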
Garbage Collection Support and Other Metadata
The otool -o command shows various Objective-C metadata, including, perhaps most usefully on the Mac, the binary's garbage collection status. Let's compile the test program with garbage collection and see what the output is:
    $ otool -o a.out
    a.out:
    Contents of (__DATA,__objc_classlist) section
    0000000100002080 0x10d2a52bf + 0x100002250
    Contents of (__DATA,__objc_classrefs) section
    0000000100002240 0x10d2a52bf + 0x100002250
    Contents of (__DATA,__objc_superrefs) section
    0000000100002248 0x10d2a52bf + 0x100002250
    Contents of (__DATA,__objc_msgrefs) section
      imp 0x0
      sel 0x100001de9 alloc
    Contents of (__DATA,__objc_imageinfo) section
      version 0
        flags 0x2 OBJC_IMAGE_SUPPORTS_GC
The flags at the bottom show that this supports garbage collection. Let's re-run it on the regular ARC version of the binary:
    ...
        flags 0x0
This isn't something you need often, but it can be invaluable when you're trying to track down why a library or plugin refuses to load. This occasionally appears when using Xcode unit tests. The tests are loaded as a plugin, and garbage collection capability mismatches can cause bizarre errors there.
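The flags value is a bit field, so a number other than 0x0 or 0x2 can be decoded bit by bit. A small Python sketch follows. Only the 0x2 / OBJC_IMAGE_SUPPORTS_GC pairing comes from the output above; the other two names and values are assumptions recalled from the Objective-C runtime headers, and describe_flags is a made-up helper name:

```python
# Bit values for the flags word in the __objc_imageinfo section.
# 0x2 is confirmed by the otool -o output; 0x1 and 0x4 are assumptions
# recalled from the Objective-C runtime headers.
IMAGE_INFO_FLAGS = {
    0x1: "OBJC_IMAGE_IS_REPLACEMENT",
    0x2: "OBJC_IMAGE_SUPPORTS_GC",
    0x4: "OBJC_IMAGE_REQUIRES_GC",
}

def describe_flags(flags):
    """Return the names of all known bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(IMAGE_INFO_FLAGS.items()) if flags & bit]

print(describe_flags(0x2))  # the garbage-collected build
print(describe_flags(0x0))  # the ARC build
```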
While we're at it, let's check out the output from otool -l, which dumps the binary's load commands and so covers a lot more ground than otool -o. There's a tremendous amount of output, so I won't print it all, but there are some interesting bits.
Here, we can see the binary specify its dynamic linker:
    Load command 7
              cmd LC_LOAD_DYLINKER
          cmdsize 32
             name /usr/lib/dyld (offset 12)
It seems that if one wanted to, one could write a different dynamic linker and specify that one instead, although this would no doubt be a huge undertaking.
This section defines the minimum OS requirement:
    Load command 9
          cmd LC_VERSION_MIN_MACOSX
      cmdsize 16
      version 10.7
Now you know what happens when you set that value in Xcode.
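otool prints the version as 10.7, but inside the load command it is a single 32-bit integer. As far as the Mach-O headers describe it, an X.Y.Z version is packed with 16 bits for X and 8 bits each for Y and Z. The Python sketch below illustrates that packing; the function names are made up for the example:

```python
def encode_macho_version(major, minor, patch=0):
    """Pack X.Y.Z into 32 bits: 16 bits for X, 8 for Y, 8 for Z."""
    return (major << 16) | (minor << 8) | patch

def decode_macho_version(value):
    """Unpack a 32-bit packed version into a (major, minor, patch) tuple."""
    major = (value >> 16) & 0xFFFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return (major, minor, patch)

raw = encode_macho_version(10, 7)
print(hex(raw))                   # 0xa0700
print(decode_macho_version(raw))  # (10, 7, 0)
```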
This one defines the full register state for when the app starts:
    Load command 10
            cmd LC_UNIXTHREAD
        cmdsize 184
         flavor x86_THREAD_STATE64
          count x86_THREAD_STATE64_COUNT
       rax  0x0000000000000000 rbx 0x0000000000000000 rcx  0x0000000000000000
       rdx  0x0000000000000000 rdi 0x0000000000000000 rsi  0x0000000000000000
       rbp  0x0000000000000000 rsp 0x0000000000000000 r8   0x0000000000000000
        r9  0x0000000000000000 r10 0x0000000000000000 r11  0x0000000000000000
       r12  0x0000000000000000 r13 0x0000000000000000 r14  0x0000000000000000
       r15  0x0000000000000000 rip 0x0000000100001880
    rflags  0x0000000000000000 cs  0x0000000000000000 fs   0x0000000000000000
        gs  0x0000000000000000
You may have wondered, just what is the initial state of an executing program when it first starts running? Well, now you know: the registers contain these values. Or perhaps different ones, depending on what the linker put in there when you built your app.
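In the dump above, every register is zero except rip, the instruction pointer, which holds the address where execution begins. It is the same address that nm reports for the start symbol in the next section. The Python sketch below pulls the registers out of such a dump and lists the non-zero ones; parse_thread_state is a hypothetical helper and SAMPLE is an abbreviated excerpt of the output above:

```python
import re

def parse_thread_state(text):
    """Map register names to integer values from an LC_UNIXTHREAD dump."""
    pairs = re.findall(r"\b([a-z][a-z0-9]*)\s+(0x[0-9a-fA-F]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

# Abbreviated excerpt of the register dump shown above.
SAMPLE = """
   rax  0x0000000000000000 rbx 0x0000000000000000 rcx  0x0000000000000000
   rbp  0x0000000000000000 rsp 0x0000000000000000 r8   0x0000000000000000
   r15  0x0000000000000000 rip 0x0000000100001880
rflags  0x0000000000000000 cs  0x0000000000000000 fs   0x0000000000000000
"""

regs = parse_thread_state(SAMPLE)
nonzero = {name: hex(value) for name, value in regs.items() if value}
print(nonzero)  # {'rip': '0x100001880'}
```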
Symbols
It's often useful to see exactly what symbols are present in a binary. The nm command displays these. Here's the result of running nm on the test app:
    0000000100001a90 t -[MyClass .cxx_destruct]
    00000001000018c0 t -[MyClass initWithName:number:]
    00000001000019c0 t -[MyClass name]
    0000000100001a40 t -[MyClass number]
    00000001000019f0 t -[MyClass setName:]
    0000000100001a60 t -[MyClass setNumber:]
    0000000100001ad0 T _MyFunction
                     U _NSLog
    0000000100002350 S _NXArgc
    0000000100002358 S _NXArgv
    0000000100002290 S _OBJC_CLASS_$_MyClass
                     U _OBJC_CLASS_$_NSObject
    00000001000022e0 S _OBJC_IVAR_$_MyClass._name
    00000001000022e8 S _OBJC_IVAR_$_MyClass._number
    00000001000022b8 S _OBJC_METACLASS_$_MyClass
                     U _OBJC_METACLASS_$_NSObject
                     U ___CFConstantStringClassReference
    0000000100002368 S ___progname
    0000000100000000 A __mh_execute_header
                     U __objc_empty_cache
                     U __objc_empty_vtable
    0000000100002360 S _environ
                     U _exit
    0000000100001b70 T _main
                     U _objc_autoreleasePoolPop
                     U _objc_autoreleasePoolPush
                     U _objc_autoreleaseReturnValue
                     U _objc_getProperty
                     U _objc_msgSend
                     U _objc_msgSendSuper2
                     U _objc_msgSend_fixup
                     U _objc_release
                     U _objc_retain
                     U _objc_retainAutoreleasedReturnValue
                     U _objc_setProperty
                     U _objc_storeStrong
    0000000100002000 s _pvars
                     U dyld_stub_binder
    0000000100001880 T start
We get an interesting mix of obvious and less-obvious symbols. Most of the MyClass symbols are methods we wrote. The -[MyClass .cxx_destruct] method is generated by the compiler. It was originally intended for calling C++ destructors (thus cxx) but now serves double duty as the method where ARC disposes of your strong instance variables.
The first column of the output is the address of the symbol, and the last column is the name, but what's the second column? This is the symbol's type. The symbols marked as T indicate symbols that are in the text section, which is the strange name given to the section which contains the program's executable code. The symbols marked as t are also in the text section, but are not visible outside the binary where they're stored. Symbols marked U are "undefined", which means that they are expected to be found in another library when the program is run. If you look at this listing, you'll see that all of the U symbols are functions and classes which come from Cocoa, the Objective-C runtime, or libSystem. The nm man page has a complete listing of what these type letters mean.
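The three-column layout described above can be split apart with a few lines of Python, which makes it easy to group symbols by type letter. This is a sketch under the assumption that nm's default output looks like the listing above, with the address column left blank for undefined symbols; parse_nm is a made-up name:

```python
from collections import defaultdict

def parse_nm(text):
    """Parse default nm output into (address, type, name) tuples.

    Undefined symbols have no address column, so their address is None.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if len(parts[0]) == 1:
            # No address column: the first token is the type letter.
            symbols.append((None, parts[0], " ".join(parts[1:])))
        elif len(parts) >= 3:
            # Method names such as -[MyClass name] contain a space,
            # so rejoin everything after the type letter.
            symbols.append((int(parts[0], 16), parts[1], " ".join(parts[2:])))
    return symbols

# Excerpt of the listing above.
SAMPLE = """\
00000001000018c0 t -[MyClass initWithName:number:]
0000000100001ad0 T _MyFunction
                 U _NSLog
0000000100002290 S _OBJC_CLASS_$_MyClass
0000000100001b70 T _main
                 U _objc_msgSend
"""

by_type = defaultdict(list)
for address, kind, name in parse_nm(SAMPLE):
    by_type[kind].append(name)

for kind in sorted(by_type):
    print(kind, by_type[kind])
```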
Examining the symbols in a library can be really useful for figuring out linker errors. For this, we don't care about symbols which are local to the library, only those which are visible to the outside world. The nm -g flag filters out all local symbols, giving you a less cluttered list to examine when tracking down these errors.
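To make the local-versus-external distinction concrete, here is a small Python sketch of the check you end up doing by eye when chasing a missing-symbol error. It relies on nm's convention that a lowercase type letter marks a local symbol and an uppercase one an external symbol. The (type, name) pairs are hypothetical data, not output from a real library:

```python
def external_defined(symbols):
    """Names a binary makes visible: defined symbols with an uppercase type."""
    return {name for kind, name in symbols if kind.isupper() and kind != "U"}

def undefined(symbols):
    """Names a binary expects some other library to provide."""
    return {name for kind, name in symbols if kind == "U"}

# Hypothetical (type, name) pairs, as they might be read from nm output.
app = [("T", "_main"), ("U", "_HelperFunction"), ("U", "_OtherFunction")]
library = [("T", "_HelperFunction"), ("t", "_OtherFunction")]

missing = undefined(app) - external_defined(library)
print(sorted(missing))  # ['_OtherFunction']
```

In this made-up case _OtherFunction does exist in the library, but only as a local symbol, so the linker cannot use it.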
Class Dumps
There's tons of useful information available, but some of it can be difficult to decode. When you're trying to figure out the guts of some Objective-C code, it can be nice to have all of the information presented in a more familiar manner. Fortunately, there's enough metadata stored in the binary to completely reconstruct the @interface of a class. The class-dump tool does exactly that. Let's run this tool on the test app and see what it produces (block comments omitted for brevity):
    $ class-dump a.out
    ...
    @interface MyClass : NSObject
    {
        NSString *_name;
        int _number;
    }

    @property int number; // @synthesize number=_number;
    @property(retain) NSString *name; // @synthesize name=_name;
    - (void).cxx_destruct;
    - (id)initWithName:(id)arg1 number:(int)arg2;

    @end
There's the whole interface to our test class laid out in valid Objective-C. Of course you don't get an @implementation, which would be much more complicated. You also lose parameter names, but the descriptiveness of Objective-C method names usually makes it clear enough what the parameters are.
Dumping out your own code is not all that interesting. Running class-dump /System/Library/Frameworks/AppKit.framework/AppKit produces much more interesting results. Here's an amusing excerpt from the massive quantity of data that results:
    @interface NSStopTouchingMeBox : NSBox
    {
        NSView *sibling1;
        NSView *sibling2;
        double offset;
    }

    - (id)initWithFrame:(struct CGRect)arg1;
    - (void)setSibling1:(id)arg1;
    - (void)setSibling2:(id)arg1;
    - (void)setFrameSize:(struct CGSize)arg1;
    - (void)setOffset:(double)arg1;
    - (void)tile;
    - (void)viewDidEndLiveResize;

    @end
Of course, you should never ship code that uses the private classes and methods that you'll discover, but it can still be very interesting and even useful to see these internals.
Disassembly
Now we finally reach the juicy part. That which separates the men from the boys. Where few dare to tread. The howling darkness. The tangible substance of earth's supreme terror. Abandon hope all ye who enter here.
Now that we've gotten rid of all the lightweights, let's proceed.
As you probably already know, compiled Objective-C code consists of machine code. This is raw bytes that are executed directly by your computer's CPU. It's extremely tedious to manually interpret.
Between Objective-C and machine code is assembly language. This is a low level language which translates more or less directly to machine code, but is, relatively speaking, much more readable. This translation goes both ways: you can take machine code and turn it back into somewhat more readable assembly code.
I don't plan to provide a comprehensive guide on reading and interpreting assembly, but I will show how to obtain it and give a few handy pointers.
You can disassemble a binary using the otool -tV command. The t flag tells otool to display the text segment (where the code lives), and the V flag tells otool to disassemble it.
The output of otool -tV omits some useful data, however. For example, here's a snippet from the disassembly of the test app's main function:
    0000000100001bdd    callq   0x100001c90 ; symbol stub for: _objc_msgSend
    0000000100001be2    movq    %rax,0xe8(%rbp)
    0000000100001be6    movq    0xe8(%rbp),%rax
    0000000100001bea    movq    0x0000066f(%rip),%rsi
    0000000100001bf1    movq    %rax,%rdi
    0000000100001bf4    callq   0x100001c90 ; symbol stub for: _objc_msgSend
We can see two calls to objc_msgSend, the function that's used to send Objective-C messages, but we can't really see any other information about those calls. It turns out that for nearly all message sends, it's possible to figure out which selector is being sent as well, which is tremendously useful.
Enter otx. This is a third-party wrapper around otool which adds better annotations to the output, including Objective-C message send selectors. Simply run otx on a binary (after obtaining it from the site discussed at the beginning of this article) and out comes the disassembly, fully annotated. I like to add the -b flag, which tells otx to add a blank line between logical blocks of instructions, making it much easier to see the structure of the code. Here's the above section of code disassembled by otx:
    +109  0000000100001bdd  e8ae000000        callq   0x100001c90               -[%rdi initWithName:number:]
    +114  0000000100001be2  488945e8          movq    %rax,0xe8(%rbp)
    +118  0000000100001be6  488b45e8          movq    0xe8(%rbp),%rax
    +122  0000000100001bea  488b356f060000    movq    0x0000066f(%rip),%rsi     name
    +129  0000000100001bf1  4889c7            movq    %rax,%rdi
    +132  0000000100001bf4  e897000000        callq   0x100001c90               -[%rdi name]
Now we can see the methods in question, not just the fact that a message send is occurring. Instead of a relatively opaque disassembly like before, we can now see that this section of code simply calls the initializer and then the name accessor.
Let's check out the annotated disassembly of the initWithName:number: method:
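The otx listing also includes a column of raw machine code bytes, and those bytes connect directly to the addresses shown. A callq instruction of this form is the opcode byte e8 followed by a signed, little-endian 32-bit displacement, measured from the address of the next instruction. The Python sketch below (call_target is a made-up name) recomputes the call destinations from the two callq lines above:

```python
import struct

def call_target(address, code):
    """Return the destination of a 5-byte x86-64 callq rel32 instruction.

    address is where the instruction starts; code is its raw bytes.
    """
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("not a callq rel32 instruction")
    (displacement,) = struct.unpack("<i", code[1:])  # signed, little-endian
    next_instruction = address + len(code)
    return next_instruction + displacement

# Addresses and bytes taken from the otx output above.
print(hex(call_target(0x100001BDD, bytes.fromhex("e8ae000000"))))  # 0x100001c90
print(hex(call_target(0x100001BF4, bytes.fromhex("e897000000"))))  # 0x100001c90
```

Both land on 0x100001c90, the objc_msgSend stub, which matches the address the disassembler prints.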
    -[MyClass initWithName:number:]:
    +0    00000001000018c0  55                pushq   %rbp
    +1    00000001000018c1  4889e5            movq    %rsp,%rbp
    +4    00000001000018c4  4883ec60          subq    $0x60,%rsp
    +8    00000001000018c8  488d45f0          leaq    0xf0(%rbp),%rax
    +12   00000001000018cc  4c8d45c8          leaq    0xc8(%rbp),%r8
    +16   00000001000018d0  48897df0          movq    %rdi,0xf0(%rbp)
    +20   00000001000018d4  488975e8          movq    %rsi,0xe8(%rbp)
    +24   00000001000018d8  4889d7            movq    %rdx,%rdi
    +27   00000001000018db  894dc0            movl    %ecx,0xc0(%rbp)
    +30   00000001000018de  4c8945b8          movq    %r8,0xb8(%rbp)
    +34   00000001000018e2  488945b0          movq    %rax,0xb0(%rbp)
    +38   00000001000018e6  e8b7030000        callq   0x100001ca2               _objc_retain
    +43   00000001000018eb  488945e0          movq    %rax,0xe0(%rbp)
    +47   00000001000018ef  8b4dc0            movl    0xc0(%rbp),%ecx
    +50   00000001000018f2  894ddc            movl    %ecx,0xdc(%rbp)
    +53   00000001000018f5  488b45f0          movq    0xf0(%rbp),%rax
    +57   00000001000018f9  48c745f000000000  movq    $0x00000000,0xf0(%rbp)
    +65   0000000100001901  488945c8          movq    %rax,0xc8(%rbp)
    +69   0000000100001905  488b057c090000    movq    0x0000097c(%rip),%rax
    +76   000000010000190c  488945d0          movq    %rax,0xd0(%rbp)
    +80   0000000100001910  488b3531090000    movq    0x00000931(%rip),%rsi     init
    +87   0000000100001917  488b7db8          movq    0xb8(%rbp),%rdi
    +91   000000010000191b  e876030000        callq   0x100001c96               -[[%rdi super] init]
    +96   0000000100001920  4889c2            movq    %rax,%rdx
    +99   0000000100001923  488955f0          movq    %rdx,0xf0(%rbp)
    +103  0000000100001927  488b55b0          movq    0xb0(%rbp),%rdx
    +107  000000010000192b  4889c6            movq    %rax,%rsi
    +110  000000010000192e  4889d7            movq    %rdx,%rdi
    +113  0000000100001931  488945a8          movq    %rax,0xa8(%rbp)
    +117  0000000100001935  e87a030000        callq   0x100001cb4               _objc_storeStrong
    +122  000000010000193a  488b45a8          movq    0xa8(%rbp),%rax
    +126  000000010000193e  483d00000000      cmpq    $0x00000000,%eax
    +132  0000000100001944  0f8430000000      je      0x10000197a               return;

    +138  000000010000194a  488b45e0          movq    0xe0(%rbp),%rax
    +142  000000010000194e  488b4df0          movq    0xf0(%rbp),%rcx
    +146  0000000100001952  488b1587090000    movq    0x00000987(%rip),%rdx     _name
    +153  0000000100001959  4801ca            addq    %rcx,%rdx
    +156  000000010000195c  4889d7            movq    %rdx,%rdi
    +159  000000010000195f  4889c6            movq    %rax,%rsi
    +162  0000000100001962  e84d030000        callq   0x100001cb4               _objc_storeStrong
    +167  0000000100001967  448b45dc          movl    0xdc(%rbp),%r8d
    +171  000000010000196b  488b45f0          movq    0xf0(%rbp),%rax
    +175  000000010000196f  488b0d72090000    movq    0x00000972(%rip),%rcx     _number
    +182  0000000100001976  44890408          movl    %r8d,(%rax,%rcx)

    +186  000000010000197a  488b45f0          movq    0xf0(%rbp),%rax
    +190  000000010000197e  4889c7            movq    %rax,%rdi
    +193  0000000100001981  e81c030000        callq   0x100001ca2               _objc_retain
    +198  0000000100001986  488945f8          movq    %rax,0xf8(%rbp)
    +202  000000010000198a  c745c401000000    movl    $0x00000001,0xc4(%rbp)
    +209  0000000100001991  488b45e0          movq    0xe0(%rbp),%rax
    +213  0000000100001995  4889c7            movq    %rax,%rdi
    +216  0000000100001998  e8ff020000        callq   0x100001c9c               _objc_release
    +221  000000010000199d  488b45f0          movq    0xf0(%rbp),%rax
    +225  00000001000019a1  4889c7            movq    %rax,%rdi
    +228  00000001000019a4  e8f3020000        callq   0x100001c9c               _objc_release
    +233  00000001000019a9  488b45f8          movq    0xf8(%rbp),%rax
    +237  00000001000019ad  4883c460          addq    $0x60,%rsp
    +241  00000001000019b1  5d                popq    %rbp
    +242  00000001000019b2  c3                ret
There's a lot of stuff in here that would take quite a while to analyze, but simply from looking at the annotations and basic control flow, we can still see a lot. It's particularly interesting to examine code compiled with ARC, since all of the extra memory management calls inserted by ARC show up in the dump.
After the initial setup, this code calls objc_retain. Given the context, we can deduce that this is a call to retain the name parameter, which ARC does in order to ensure that the name object remains live even if subsequent code zeroes out all other strong references to it. We can verify that it is indeed the name parameter by looking at the movq %rdx,%rdi instruction a couple of lines prior. %rdx contains the third parameter to a function, or the first explicit Objective-C method parameter, which in this case is name. %rdi contains the first parameter to a function. So this code moves name into the spot where objc_retain will expect to find its parameter.
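The register reasoning above follows from the x86-64 calling convention used on the Mac, which passes the first six integer or pointer arguments in a fixed order of registers. Every Objective-C method also takes two hidden arguments, self and _cmd (the selector), ahead of the ones written in the source. A Python sketch of that mapping, with method_argument_registers as a made-up helper name:

```python
# Integer and pointer argument registers, in order, for the x86-64
# System V calling convention.
ARGUMENT_REGISTERS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]

def method_argument_registers(explicit_parameters):
    """Map an Objective-C method's arguments to registers.

    self and _cmd come first, followed by the parameters written in the
    source. Only integer and pointer parameters are handled here.
    """
    names = ["self", "_cmd"] + list(explicit_parameters)
    if len(names) > len(ARGUMENT_REGISTERS):
        raise ValueError("remaining arguments are passed on the stack")
    return dict(zip(names, ARGUMENT_REGISTERS))

print(method_argument_registers(["name", "number"]))
# {'self': '%rdi', '_cmd': '%rsi', 'name': '%rdx', 'number': '%rcx'}
```

This lines up with the disassembly: name arrives in %rdx, and number, being a 32-bit int, is read from %ecx, the low half of %rcx.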
Next comes the call to [super init]. The annotation is a little confusing here, but -[[%rdi super] init] means that a super call is being made with the object stored in %rdi as the target of the call. In this case, we know that's self, which should be the case for any super call.
After that, there's a call to objc_storeStrong. This one is a little strange. After considerable investigation, it appears that this call is a redundant assignment to self after the call to super completes, and after the = assignment in the source code takes place. This call disappears when the code is compiled with optimizations, so it seems to be a bit of ARC defensiveness that doesn't actually need to be there in this case.
Next, there's a compare and then a conditional jump. This is the if statement. If the return value is nil, then control jumps down to the third block of code, otherwise control continues with the second block of code. In the second block of code, we can see the two instance variable assignments, with the assignment to _name using a call to objc_storeStrong that's actually useful this time. Since _number is just an int, it doesn't need any fancy calls.
Finally, we do a bit of memory management and then return. There's a redundant pair of objc_retain/objc_release, which again appears to be ARC defensiveness leaking out (and which also disappears under optimizations), an objc_release on the name parameter to balance the objc_retain at the beginning of the function, and then control is returned to the caller.
Even without understanding the meaning and purpose of every single instruction, we can still get a lot out of this dump. This can be incredibly useful for checking into possible compiler bugs or figuring out how some Cocoa method works on the inside.
Conclusion
We've taken a tour of several different facilities for inspecting executables, libraries, and plugins. Whether you're tracking down library paths, figuring out missing symbols, or diving into the disassembly of a problematic method, the developer tools (and third parties) provide ways to get a huge amount of information. There's more out there as well, and this is just a sampling of the parts I find most useful. Whenever you have a mysterious problem, don't be afraid to dive in and figure out exactly what's happening underneath the covers. Being able to inspect low-level information can often make the difference between a frustratingly difficult bug and a trivial one.
That wraps things up for today. Friday Q&A relies on you, the reader, for a steady supply of interesting subjects to discuss. If you have a topic that you'd like to see written up, send it in!