Swift's classes tend to be straightforward for most people new to the language to understand. They work pretty much like classes in any other language. Whether you've come from Objective-C or Java or Ruby, you've worked with something similar. Swift's structs are another matter. They look sort of like classes, but they're value types, and they don't do inheritance, and there's this copy-on-write thing I keep hearing about? Where do they live, anyway, and how do they work? Today, I'm going to take a close look at just how structs get stored and manipulated in memory.
Simple structs
To explore how structs get stored in memory, I built a test program consisting of two files. I compiled the test program with optimizations enabled but without whole-module optimization. By building tests that make calls from one file to the other, I was able to prevent the compiler from inlining everything, providing a clearer picture of where everything gets stored and how the data is passed between functions.
To start out, I created a simple struct with three elements:
    struct ExampleInts {
        var x: Int
        var y: Int
        var z: Int
    }
I created three functions that take an instance of this struct and return one of the fields:
    func getX(parameter: ExampleInts) -> Int {
        return parameter.x
    }
    func getY(parameter: ExampleInts) -> Int {
        return parameter.y
    }
    func getZ(parameter: ExampleInts) -> Int {
        return parameter.z
    }
In the other file, I created an instance of the struct and called each get function:
    func testGets() {
        let s = ExampleInts(x: 1, y: 2, z: 3)
        getX(s)
        getY(s)
        getZ(s)
    }
The compiler generates this code for getX:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdi, %rax
    popq    %rbp
    retq
Consulting our cheat sheet, we recall that arguments are passed sequentially in registers rdi, rsi, rdx, rcx, r8, and r9, and return values are placed in rax. The first two instructions here are just the function prologue, and the last two are the epilogue. The real work being done here is the movq %rdi, %rax, which takes the first parameter and returns it. Let's look at getY:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rsi, %rax
    popq    %rbp
    retq
This is almost the same, but it returns the second parameter. How about getZ?
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdx, %rax
    popq    %rbp
    retq
Again, almost the same, but it returns the third parameter. From this we can see that the individual struct elements are treated as separate parameters and passed to the functions individually. Picking out an element on the receiving end is a simple matter of picking the right register.
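As a rough cross-check from the Swift side, the standard library's MemoryLayout can report how much storage the struct occupies. This is only a sketch, written in current Swift syntax rather than the syntax of the listings in this article, and it assumes a 64-bit platform where Int is eight bytes. The constant names are mine.

```swift
struct ExampleInts {
    var x: Int
    var y: Int
    var z: Int
}

// If the struct is nothing more than its three fields, its size should be
// exactly three Ints, with no header and no padding.
let intSize = MemoryLayout<Int>.size
let structSize = MemoryLayout<ExampleInts>.size
let structAlignment = MemoryLayout<ExampleInts>.alignment

print("Int: \(intSize) bytes, ExampleInts: \(structSize) bytes, alignment: \(structAlignment)")
```

Three register-sized values with nothing else attached is consistent with the compiler being able to hand the struct over as three separate register arguments.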
Let's confirm this on the calling end. Here's the generated code for testGets:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, %edi
    movl    $2, %esi
    movl    $3, %edx
    callq   __TF4main4getXFVS_11ExampleIntsSi
    movl    $1, %edi
    movl    $2, %esi
    movl    $3, %edx
    callq   __TF4main4getYFVS_11ExampleIntsSi
    movl    $1, %edi
    movl    $2, %esi
    movl    $3, %edx
    popq    %rbp
    jmp     __TF4main4getZFVS_11ExampleIntsSi
We can see it constructing the struct instance directly in the parameter registers. (The edi, esi, and edx registers refer to the lower 32 bits of the rdi, rsi, and rdx registers, respectively.) It doesn't even bother trying to save the values across the calls, but just rebuilds the struct instance each time. Since the compiler knows exactly what the contents are, it can deviate significantly from how the Swift code is written. Note how the call to getZ is generated a bit differently from the other two. Since it's the last thing in the function, the compiler generates it as a tail call, cleaning up the local call frame and setting up getZ to return directly to the function that called testGets.
Let's see what sort of code the compiler generates when it doesn't know the struct contents. Here's a variant on this test function which gets the struct instance from elsewhere:
    func testGets2() {
        let s = getExampleInts()
        getX(s)
        getY(s)
        getZ(s)
    }
getExampleInts just creates the struct instance and returns it, but it's in the other file so the compiler can't see what's going on when optimizing testGets2. Here's that function:
    func getExampleInts() -> ExampleInts {
        return ExampleInts(x: 1, y: 2, z: 3)
    }
What sort of code does testGets2 generate, now that the compiler can't know what the struct contains? Here it is:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %r15
    pushq   %r14
    pushq   %rbx
    pushq   %rax
    callq   __TF4main14getExampleIntsFT_VS_11ExampleInts
    movq    %rax, %rbx
    movq    %rdx, %r14
    movq    %rcx, %r15
    movq    %rbx, %rdi
    movq    %r14, %rsi
    movq    %r15, %rdx
    callq   __TF4main4getXFVS_11ExampleIntsSi
    movq    %rbx, %rdi
    movq    %r14, %rsi
    movq    %r15, %rdx
    callq   __TF4main4getYFVS_11ExampleIntsSi
    movq    %rbx, %rdi
    movq    %r14, %rsi
    movq    %r15, %rdx
    addq    $8, %rsp
    popq    %rbx
    popq    %r14
    popq    %r15
    popq    %rbp
    jmp     __TF4main4getZFVS_11ExampleIntsSi
Since the compiler can't reconstitute the values at each step, it has to save them. It places the three struct elements into the registers rbx, r14, and r15, then loads the parameter registers from those registers at each call. Those registers are callee-saved: any function that uses them must restore their original values before returning, which means that their values are preserved across the calls. That's also why this function pushes them in its own prologue and pops them again before the final jump. And again, the compiler generates a tail call for getZ, with some more extensive cleanup beforehand.
At the top of the function, it calls getExampleInts and then loads values from rax, rdx, and rcx. Apparently the struct values are returned in those registers. Let's look at getExampleInts to confirm:
    pushq   %rbp
    movl    $1, %edi
    movl    $2, %esi
    movl    $3, %edx
    popq    %rbp
    jmp     __TFV4main11ExampleIntsCfMS0_FT1xSi1ySi1zSi_S0_
This places the values 1, 2, and 3 into the argument registers, then calls the struct's constructor. Here's the generated code for that constructor:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdx, %rcx
    movq    %rdi, %rax
    movq    %rsi, %rdx
    popq    %rbp
    retq
Sure enough, it returns the three values in rax, rdx, and rcx. The cheat sheet says nothing about returning multiple values in multiple registers. How about the official PDF? It does say that two values can be returned in rax and rdx, but there's no mention of returning a third value in rcx. That's clearly what's happening, though. That's the fun of a new language: it doesn't always have to play by the old rules. If it was interoperating with C code it would have to follow the standard conventions, but Swift-to-Swift calls can invent new ones.
How about inout parameters? If they work like we'd do it in C, we'd expect the struct to be laid out in memory and a pointer passed in. Here are two test functions (in two different files, of course):
    func testInout() {
        var s = getExampleInts()
        totalInout(&s)
    }
    func totalInout(inout parameter: ExampleInts) -> Int {
        return parameter.x + parameter.y + parameter.z
    }
Here's the generated code for testInout:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $32, %rsp
    callq   __TF4main14getExampleIntsFT_VS_11ExampleInts
    movq    %rax, -24(%rbp)
    movq    %rdx, -16(%rbp)
    movq    %rcx, -8(%rbp)
    leaq    -24(%rbp), %rdi
    callq   __TF4main10totalInoutFRVS_11ExampleIntsSi
    addq    $32, %rsp
    popq    %rbp
    retq
In the prologue, it creates a 32-byte stack frame. It then calls getExampleInts, and after the call saves the resulting values into stack slots at offsets -24, -16, and -8. It then calculates a pointer to offset -24, loads that into the rdi parameter register, and calls totalInout. Here's the generated code for that function:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    (%rdi), %rax
    addq    8(%rdi), %rax
    jo      LBB4_3
    addq    16(%rdi), %rax
    jo      LBB4_3
    popq    %rbp
    retq
    LBB4_3:
    ud2
This loads the values by offset from the parameter that's passed in, totaling them up and returning the result in rax. The jo instructions are checking for overflow. If either of the addq instructions produces an overflow, the jo instruction that follows it jumps down to the ud2 instruction, which terminates the program.
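The overflow check is visible from Swift as well. Here's a small sketch, in current Swift syntax, of the three flavors of addition the standard library offers. The plain + operator is the one that compiles down to the add-then-jo pattern; the other two are shown for contrast. The variable names are mine.

```swift
let big = Int.max

// addingReportingOverflow returns the wrapped result along with a flag.
// The flag is the same information the jo instruction branches on.
let checked = big.addingReportingOverflow(1)
print(checked.partialValue == Int.min, checked.overflow)

// &+ wraps around silently instead of trapping.
let wrapped = big &+ 1
print(wrapped == Int.min)

// A plain `big + 1` would trap at runtime, which is the ud2 path in the
// generated code, so it is deliberately left out of this sketch.
```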
We can see that it's exactly as we expected: when passing the struct to an inout parameter, the struct is laid out contiguously in memory and then a pointer to it is passed in.
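To make the by-pointer picture concrete, here's a sketch that does by hand roughly what the generated totalInout does: take a pointer to the struct and read each field at its byte offset. It's written in current Swift syntax, and totalViaPointer is a hypothetical stand-in of mine, not something the compiler produces. The offsets come from MemoryLayout.offset(of:) rather than being hard-coded, since Swift doesn't formally promise a layout for ordinary structs.

```swift
struct ExampleInts {
    var x: Int
    var y: Int
    var z: Int
}

// Hypothetical stand-in for totalInout: receives an explicit pointer and
// reads each field by byte offset, much as the generated code does.
func totalViaPointer(_ pointer: UnsafePointer<ExampleInts>) -> Int {
    let raw = UnsafeRawPointer(pointer)
    // offset(of:) returns nil only for properties without direct storage,
    // so force-unwrapping is safe for these plain stored properties.
    let xOffset = MemoryLayout<ExampleInts>.offset(of: \ExampleInts.x)!
    let yOffset = MemoryLayout<ExampleInts>.offset(of: \ExampleInts.y)!
    let zOffset = MemoryLayout<ExampleInts>.offset(of: \ExampleInts.z)!
    return raw.load(fromByteOffset: xOffset, as: Int.self)
        + raw.load(fromByteOffset: yOffset, as: Int.self)
        + raw.load(fromByteOffset: zOffset, as: Int.self)
}

var ints = ExampleInts(x: 1, y: 2, z: 3)
let pointerTotal = withUnsafePointer(to: &ints) { totalViaPointer($0) }
print(pointerTotal)
```

On a 64-bit platform I'd expect the three offsets to come out as 0, 8, and 16, matching the (%rdi), 8(%rdi), and 16(%rdi) operands in the listing above.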
Big structs
What happens if we're dealing with a larger struct, bigger than fits comfortably in registers? Here's a test struct with ten elements:
    struct TenInts {
        var elements = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    }
Here's a get function that constructs an instance and returns it. This is placed in a separate file to avoid inlining:
    func getHuge() -> TenInts {
        return TenInts()
    }
Here's a function that gets an element out of this struct:
    func getHugeElement(parameter: TenInts) -> Int {
        return parameter.elements.5
    }
Finally, a test function that exercises these:
    func testHuge() {
        let s = getHuge()
        getHugeElement(s)
    }
Let's look at the generated code, starting with testHuge:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $160, %rsp
    leaq    -80(%rbp), %rdi
    callq   __TF4main7getHugeFT_VS_7TenInts
    movups  -80(%rbp), %xmm0
    movups  -64(%rbp), %xmm1
    movups  -48(%rbp), %xmm2
    movups  -32(%rbp), %xmm3
    movups  -16(%rbp), %xmm4
    movups  %xmm0, -160(%rbp)
    movups  %xmm1, -144(%rbp)
    movups  %xmm2, -128(%rbp)
    movups  %xmm3, -112(%rbp)
    movups  %xmm4, -96(%rbp)
    leaq    -160(%rbp), %rdi
    callq   __TF4main14getHugeElementFVS_7TenIntsSi
    addq    $160, %rsp
    popq    %rbp
    retq
This code (excluding the function prologue and epilogue) can be broken into three pieces.
The first piece calculates the address for offset -80 relative to the stack frame, and calls getHuge, passing that address as a parameter. The getHuge function has no parameters in the source code, but it's common to use a hidden parameter to return larger structs. The caller allocates storage for the return value, then passes a pointer to that storage in the hidden parameter. That appears to be what's going on here, with that allocated storage residing on the stack.
The second piece copies the returned struct from stack offset -80 to stack offset -160. It loads pieces of the struct sixteen bytes at a time into five xmm registers, then places the contents of those registers back on the stack starting at offset -160. I'm not clear why the compiler generates this copy rather than using the original value in place. I suspect the optimizer just isn't quite smart enough to realize that it doesn't need the copy.
The third piece calculates the address for stack offset -160 and then calls getHugeElement passing that address as a parameter. In our previous experiment with a three-element struct, it was passed by value in registers. With this larger struct, it's passed by pointer instead.
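The hidden-parameter convention described above can be imitated by hand. This is a sketch of the idea in current Swift syntax, not what the compiler actually emits: getHugeInto and testHugeByHand are hypothetical names of mine, and the storage here comes from the heap rather than the stack frame, simply because that's what Swift's pointer API makes easy.

```swift
struct TenInts {
    var elements = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
}

// Hypothetical counterpart to getHuge: instead of returning the struct,
// it writes the result through a pointer the caller supplies.
func getHugeInto(_ result: UnsafeMutablePointer<TenInts>) {
    result.initialize(to: TenInts())
}

// Hypothetical counterpart to testHuge: the caller owns the storage for
// the return value and passes its address along.
func testHugeByHand() -> Int {
    let storage = UnsafeMutablePointer<TenInts>.allocate(capacity: 1)
    defer {
        storage.deinitialize(count: 1)
        storage.deallocate()
    }
    getHugeInto(storage)
    return storage.pointee.elements.5
}

print(testHugeByHand())
```

The caller decides where the result lives and the callee just fills it in, which is the same division of labor the leaq and callq pair shows in testHuge.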
The generated code for the other functions confirms this: the struct is passed in and out by pointer, and lives on the stack. Here's getHugeElement to start with:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    40(%rdi), %rax
    popq    %rbp
    retq
This loads offset 40 from the parameter passed in. Each element is eight bytes, so offset 40 corresponds to elements.5. The function then returns this value.
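The offset arithmetic is easy to check from Swift. Here's a sketch, in current Swift syntax and assuming a 64-bit platform, that reads the raw bytes of a TenInts at the same offset the generated code uses and compares the result with elements.5. The constant names are mine.

```swift
struct TenInts {
    var elements = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
}

let huge = TenInts()

// Element 5 sits five Ints past the start of the struct, which works out
// to 5 * 8 = 40 bytes when Int is eight bytes wide.
let byteOffset = 5 * MemoryLayout<Int>.stride

// Read an Int straight out of the struct's bytes at that offset.
let loadedElement = withUnsafeBytes(of: huge) { bytes in
    bytes.load(fromByteOffset: byteOffset, as: Int.self)
}

print(byteOffset, loadedElement, huge.elements.5)
```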
Here's getHuge:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $88, %rsp
    movq    %rdi, %rbx
    leaq    -88(%rbp), %rdi
    callq   __TFV4main7TenIntsCfMS0_FT_S0_
    movups  -88(%rbp), %xmm0
    movups  -72(%rbp), %xmm1
    movups  -56(%rbp), %xmm2
    movups  -40(%rbp), %xmm3
    movups  -24(%rbp), %xmm4
    movups  %xmm0, (%rbx)
    movups  %xmm1, 16(%rbx)
    movups  %xmm2, 32(%rbx)
    movups  %xmm3, 48(%rbx)
    movups  %xmm4, 64(%rbx)
    movq    %rbx, %rax
    addq    $88, %rsp
    popq    %rbx
    popq    %rbp
    retq
This looks a lot like testHuge above: it allocates stack space, calls a function (in this case, the TenInts constructor function), then copies the return value to its final location. Here, that final location is the pointer passed in as the implicit parameter.
While we're here, let's take a look at the TenInts constructor:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    $1, (%rdi)
    movq    $2, 8(%rdi)
    movq    $3, 16(%rdi)
    movq    $4, 24(%rdi)
    movq    $5, 32(%rdi)
    movq    $6, 40(%rdi)
    movq    $7, 48(%rdi)
    movq    $8, 56(%rdi)
    movq    $9, 64(%rdi)
    movq    $10, 72(%rdi)
    movq    %rdi, %rax
    popq    %rbp
    retq
Like the other functions, this takes a pointer to memory for the new struct as an implicit parameter. It then stores the values 1 through 10 into that memory and returns.
I came across an interesting case while building out these test cases. Here's a test function which makes three calls to getHugeElement instead of just one:
    func testThreeHuge() {
        let s = getHuge()
        getHugeElement(s)
        getHugeElement(s)
        getHugeElement(s)
    }
Here's the generated code:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %r15
    pushq   %r14
    pushq   %r13
    pushq   %r12
    pushq   %rbx
    subq    $392, %rsp
    leaq    -120(%rbp), %rdi
    callq   __TF4main7getHugeFT_VS_7TenInts
    movq    -120(%rbp), %rbx
    movq    %rbx, -376(%rbp)
    movq    -112(%rbp), %r8
    movq    %r8, -384(%rbp)
    movq    -104(%rbp), %r9
    movq    %r9, -392(%rbp)
    movq    -96(%rbp), %r10
    movq    %r10, -400(%rbp)
    movq    -88(%rbp), %r11
    movq    %r11, -368(%rbp)
    movq    -80(%rbp), %rax
    movq    -72(%rbp), %rcx
    movq    %rcx, -408(%rbp)
    movq    -64(%rbp), %rdx
    movq    %rdx, -416(%rbp)
    movq    -56(%rbp), %rsi
    movq    %rsi, -424(%rbp)
    movq    -48(%rbp), %rdi
    movq    %rdi, -432(%rbp)
    movq    %rbx, -200(%rbp)
    movq    %rbx, %r14
    movq    %r8, -192(%rbp)
    movq    %r8, %r15
    movq    %r9, -184(%rbp)
    movq    %r9, %r12
    movq    %r10, -176(%rbp)
    movq    %r10, %r13
    movq    %r11, -168(%rbp)
    movq    %rax, -160(%rbp)
    movq    %rax, %rbx
    movq    %rcx, -152(%rbp)
    movq    %rdx, -144(%rbp)
    movq    %rsi, -136(%rbp)
    movq    %rdi, -128(%rbp)
    leaq    -200(%rbp), %rdi
    callq   __TF4main14getHugeElementFVS_7TenIntsSi
    movq    %r14, -280(%rbp)
    movq    %r15, -272(%rbp)
    movq    %r12, -264(%rbp)
    movq    %r13, -256(%rbp)
    movq    -368(%rbp), %rax
    movq    %rax, -248(%rbp)
    movq    %rbx, -240(%rbp)
    movq    -408(%rbp), %r14
    movq    %r14, -232(%rbp)
    movq    -416(%rbp), %r15
    movq    %r15, -224(%rbp)
    movq    -424(%rbp), %r12
    movq    %r12, -216(%rbp)
    movq    -432(%rbp), %r13
    movq    %r13, -208(%rbp)
    leaq    -280(%rbp), %rdi
    callq   __TF4main14getHugeElementFVS_7TenIntsSi
    movq    -376(%rbp), %rax
    movq    %rax, -360(%rbp)
    movq    -384(%rbp), %rax
    movq    %rax, -352(%rbp)
    movq    -392(%rbp), %rax
    movq    %rax, -344(%rbp)
    movq    -400(%rbp), %rax
    movq    %rax, -336(%rbp)
    movq    -368(%rbp), %rax
    movq    %rax, -328(%rbp)
    movq    %rbx, -320(%rbp)
    movq    %r14, -312(%rbp)
    movq    %r15, -304(%rbp)
    movq    %r12, -296(%rbp)
    movq    %r13, -288(%rbp)
    leaq    -360(%rbp), %rdi
    callq   __TF4main14getHugeElementFVS_7TenIntsSi
    addq    $392, %rsp
    popq    %rbx
    popq    %r12
    popq    %r13
    popq    %r14
    popq    %r15
    popq    %rbp
    retq
The structure of this function is similar to the previous version. It calls getHuge, copies the result, then calls getHugeElement three times. For each call, it copies the struct again, presumably to guard against getHugeElement making modifications. What I found really interesting is that the copies are all done one element at a time using integer registers, rather than two elements at a time in xmm registers as testHuge did. I'm not sure what causes the compiler to choose the integer registers here, as it seems like copying two elements at a time with the xmm registers would be more efficient and result in smaller code.
I also experimented with really large structs:
    struct HundredInts {
        var elements = (TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts())
    }
    struct ThousandInts {
        var elements = (HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts())
    }
    func getThousandInts() -> ThousandInts {
        return ThousandInts()
    }
The generated code for getThousandInts is pretty crazy:
    pushq   %rbp
    pushq   %rbx
    subq    $8008, %rsp
    movq    %rdi, %rbx
    leaq    -8008(%rbp), %rdi
    callq   __TFV4main12ThousandIntsCfMS0_FT_S0_
    movq    -8008(%rbp), %rax
    movq    %rax, (%rbx)
    movq    -8000(%rbp), %rax
    movq    %rax, 8(%rbx)
    movq    -7992(%rbp), %rax
    movq    %rax, 16(%rbx)
    movq    -7984(%rbp), %rax
    movq    %rax, 24(%rbx)
    movq    -7976(%rbp), %rax
    movq    %rax, 32(%rbx)
    movq    -7968(%rbp), %rax
    movq    %rax, 40(%rbx)
    movq    -7960(%rbp), %rax
    movq    %rax, 48(%rbx)
    movq    -7952(%rbp), %rax
    movq    %rax, 56(%rbx)
    movq    -7944(%rbp), %rax
    movq    %rax, 64(%rbx)
    movq    -7936(%rbp), %rax
    movq    %rax, 72(%rbx)
    ...
    movq    -104(%rbp), %rax
    movq    %rax, 7904(%rbx)
    movq    -96(%rbp), %rax
    movq    %rax, 7912(%rbx)
    movq    -88(%rbp), %rax
    movups  -80(%rbp), %xmm0
    movups  -64(%rbp), %xmm1
    movups  -48(%rbp), %xmm2
    movups  -32(%rbp), %xmm3
    movq    %rax, 7920(%rbx)
    movq    -16(%rbp), %rax
    movups  %xmm0, 7928(%rbx)
    movups  %xmm1, 7944(%rbx)
    movups  %xmm2, 7960(%rbx)
    movups  %xmm3, 7976(%rbx)
    movq    %rax, 7992(%rbx)
    movq    %rbx, %rax
    addq    $8008, %rsp
    popq    %rbx
    popq    %rbp
    retq
The compiler generates two thousand instructions to copy this struct. This seems like a good place to emit a call to memcpy, but I imagine that optimizing for absurdly gigantic structs isn't a high priority for the compiler team right now.
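For a sense of scale, and of what a memcpy-style copy would look like, here's a sketch in current Swift syntax. It measures the struct with MemoryLayout and then copies one instance over another as a single block of bytes using the standard library's copyMemory(from:). That byte-wise copy is only legitimate because the struct contains nothing but integers. This is my illustration of the idea, not what the compiler does, and the variable names are mine.

```swift
struct TenInts {
    var elements = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
}
struct HundredInts {
    var elements = (TenInts(), TenInts(), TenInts(), TenInts(), TenInts(),
                    TenInts(), TenInts(), TenInts(), TenInts(), TenInts())
}
struct ThousandInts {
    var elements = (HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(),
                    HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts())
}

// A thousand eight-byte Ints: 8000 bytes on a 64-bit platform.
let thousandSize = MemoryLayout<ThousandInts>.size
print(thousandSize)

// Mark the very last element of the source so the copy can be verified.
var source = ThousandInts()
source.elements.9.elements.9.elements.9 = 42
var destination = ThousandInts()

// One bulk copy of the whole struct, rather than a load and a store
// for each of the thousand elements.
withUnsafeMutableBytes(of: &destination) { destinationBytes in
    withUnsafeBytes(of: source) { sourceBytes in
        destinationBytes.copyMemory(from: sourceBytes)
    }
}

print(destination.elements.9.elements.9.elements.9)
```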
Class Fields
Let's take a look at what happens when the struct fields are more complicated than simple integers. Here's a simple class, and a struct which contains one:
    class ExampleClass {}
    struct ContainsClass {
        var x: Int
        var y: ExampleClass
        var z: Int
    }
Here's a set of functions (split across two files to defeat inlining) which exercise them:
    func testContainsClass() {
        let s = ContainsClass(x: 1, y: getExampleClass(), z: 3)
        getClassX(s)
        getClassY(s)
        getClassZ(s)
    }
    func getExampleClass() -> ExampleClass {
        return ExampleClass()
    }
    func getClassX(parameter: ContainsClass) -> Int {
        return parameter.x
    }
    func getClassY(parameter: ContainsClass) -> ExampleClass {
        return parameter.y
    }
    func getClassZ(parameter: ContainsClass) -> Int {
        return parameter.z
    }
Let's start by looking at the generated code for the getters. Here's getClassX:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    pushq   %rax
    movq    %rdi, %rbx
    movq    %rsi, %rdi
    callq   _swift_release
    movq    %rbx, %rax
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    retq
The three struct elements will be passed in the first three parameter registers, rdi, rsi, and rdx. This function wants to return the value in rdi by moving it to rax and then returning, but it has to do some bookkeeping first. It appears that the object reference passed in rsi is passed retained, and must be released before the function returns. This code moves rdi into a safe temporary register, rbx, then moves the object reference to rdi and calls swift_release to release it. It then moves the value in rbx to the return register rax and returns from the function.
The code for getClassZ is pretty much the same, except instead of taking the value from rdi, it takes it from rdx:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    pushq   %rax
    movq    %rdx, %rbx
    movq    %rsi, %rdi
    callq   _swift_release
    movq    %rbx, %rax
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    retq
The code for getClassY will be the odd one, since it returns an object reference rather than an integer. Here it is:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rsi, %rax
    popq    %rbp
    retq
This is short! It moves the value from rsi, which is the object reference, into rax and returns it. There's no bookkeeping, just a quick shuffling of data. Apparently, the value is passed in retained, but also returned retained, so this code doesn't have to do any memory management at all.
So far we've seen that the code for dealing with this struct is much like the code for dealing with the struct containing three Int fields, except that the object reference field is passed in retained and must be released by the callee. With that in mind, let's look at the generated code for testContainsClass:
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %r14
    pushq   %rbx
    callq   __TF4main15getExampleClassFT_CS_12ExampleClass
    movq    %rax, %rbx
    movq    %rbx, %rdi
    callq   _swift_retain
    movq    %rax, %r14
    movl    $1, %edi
    movl    $3, %edx
    movq    %rbx, %rsi
    callq   __TF4main9getClassXFVS_13ContainsClassSi
    movq    %r14, %rdi
    callq   _swift_retain
    movl    $1, %edi
    movl    $3, %edx
    movq    %rbx, %rsi
    callq   __TF4main9getClassYFVS_13ContainsClassCS_12ExampleClass
    movq    %rax, %rdi
    callq   _swift_release
    movl    $1, %edi
    movl    $3, %edx
    movq    %rbx, %rsi
    popq    %rbx
    popq    %r14
    popq    %rbp
    jmp     __TF4main9getClassZFVS_13ContainsClassSi
The first thing this function does is call getExampleClass to get the ExampleClass instance it stores in the struct. It takes the returned reference and moves it to rbx for safekeeping.
Next, it calls getClassX, and to do so it has to build a copy of the struct in the parameter registers. The two integer fields are easy, but the object field needs to be retained to match what the functions expect. The code calls swift_retain on the value stored in rbx, then places that value in rsi and places 1 and 3 in rdi and rdx to build the complete struct. Finally, it calls getClassX.
The code to call getClassY is nearly the same. However, getClassY returns an object reference which needs to be released. After the call, this code moves the return value into rdi and calls swift_release to take care of its required memory management.
This function calls getClassZ as a tail call, so the code here is a bit different. The object reference came retained from getExampleClass, so it doesn't need to be retained separately for this final call. This code places it into rsi, places 1 and 3 into rdi and rdx again, then cleans up the stack and jumps to getClassZ to make the final call.
Ultimately, there's little change from a struct with all Ints. The only real difference is that copying a struct with an object in it requires retaining that object, and disposing of that struct requires releasing the object.
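The same behavior can be observed from the Swift side without reading any assembly. Here's a sketch in current Swift syntax: copying the struct copies the two integers outright, while the object field is copied as a reference, so both structs end up pointing at, and keeping alive, the same instance. The variable names are mine.

```swift
final class ExampleClass {}

struct ContainsClass {
    var x: Int
    var y: ExampleClass
    var z: Int
}

let original = ContainsClass(x: 1, y: ExampleClass(), z: 3)
var copy = original
copy.x = 100

// The Int fields are independent after the copy...
print(original.x, copy.x)

// ...but both structs refer to the very same object. The copy took out
// its own reference to the object instead of duplicating it.
print(original.y === copy.y)
```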
Conclusion
struct storage in Swift is ultimately pretty straightforward, and much of what we've seen carries over from C's much simpler structs. A struct instance is largely treated as a loose collection of independent values, which can be manipulated collectively when required. Local struct variables might be stored on the stack or the individual pieces might be stored in registers, depending on the size of the struct, the register usage of the rest of the code, and the whims of the compiler. Small structs are passed and returned in registers, while larger structs are passed and returned by reference. structs get copied whenever they're passed and returned. Although you can use structs to implement copy-on-write data types, the base language construct is copied eagerly and more or less blindly.
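Since copy-on-write came up, here's a minimal sketch of how a struct can implement it on top of the eager copying described above, in current Swift syntax. The struct holds a reference to a class instance, and only duplicates that instance when a mutation is about to happen and someone else still shares it. CowInts and IntStorage are hypothetical types of mine, and the storage leans on an ordinary array just to keep the example short.

```swift
// Hypothetical backing store: a plain class holding the actual data.
final class IntStorage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

// Hypothetical copy-on-write wrapper around IntStorage.
struct CowInts {
    private var storage: IntStorage

    init(_ values: [Int]) {
        storage = IntStorage(values)
    }

    var values: [Int] {
        return storage.values
    }

    mutating func append(_ value: Int) {
        // Copying the struct only copies the reference. The real copy is
        // deferred until a mutation finds the storage is shared.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(storage.values)
        }
        storage.values.append(value)
    }
}

let first = CowInts([1, 2, 3])
var second = first
second.append(4)
print(first.values, second.values)
```

The struct copy itself stays cheap and blind, just a retain of the storage object, and the type adds the laziness on top.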
That's it for today. Come back next time for more daring feats of programming. Friday Q&A is driven by reader ideas, so if you grow bored while waiting for the next installment and have something you'd like to see discussed, send it in!