In a previous article, we detailed a bottom-up methodology to estimate the power consumption of AWS EC2 instances. This time, we are sharing a dataset containing an estimate of every instance’s carbon footprint, covering both the manufacturing and the use of the servers.
Our goal is to be able to estimate the carbon impact of services relying on EC2 hardware. In the first part of this article we cover two additional steps we took following our initial tests:
We then generalize our results to all available EC2 instances, even if we were not able to actually measure them. For this, we had to gather EC2 hardware specifications and define a way to estimate their power consumption profiles (Chapter 3).
Finally, we quickly cover how we convert power consumption into carbon emissions (Chapter 4) and close the article with a naive proposal for estimating embodied emissions for hyperscale server hardware (Chapter 5).
If you are curious, you can look directly at the simple estimator 🧮 we’ve put together to play with the results. The dataset is also available as a spreadsheet which you can duplicate.
This is a work-in-progress initiative and our estimation only covers EC2 server hardware. Here is a simplified overview of a data center:
We can see that there are a lot of moving parts around our workloads that should be included in a proper assessment:
As we can see, there’s room for improvement but we think that this first step can still be useful to better grasp the physical reality of cloud infrastructure. Of course, any feedback is more than welcome.
Since our initial study, we have performed additional tests to make sure our measurements were consistent. The methodology and our assumptions are detailed in our previous article. As a reminder, we have packaged a tool called turbostress that performs several stress tests to simulate different workloads and reports power consumption measurements using Intel RAPL.
We performed this on the available Intel-based bare metal instances. Overall, we have been able to assess the following instances: c5, m5, r5, m5zn, z1d, i3, and c5n.
📥 Raw turbostress exports used in this article can be found in this repository.
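For readers who want to reproduce a minimal version of this kind of measurement, here is a hedged sketch (not the turbostress code itself) that samples the RAPL energy counters exposed by the Linux powercap interface. Domain paths vary between machines and reading them usually requires root.

```python
# Minimal sketch: estimate per-domain power by sampling the cumulative RAPL energy
# counters (microjoules) exposed by the Linux powercap sysfs interface.
import glob
import time

def read_energy_uj(path):
    """Read the cumulative energy counter (in microjoules) of one RAPL domain."""
    with open(path) as f:
        return int(f.read().strip())

def sample_power(interval_s=1.0):
    """Return the average power (Watts) per RAPL domain over `interval_s` seconds."""
    # Matches package domains (intel-rapl:0) and subdomains such as DRAM (intel-rapl:0:0).
    domains = glob.glob("/sys/class/powercap/intel-rapl:*/energy_uj")
    before = {d: read_energy_uj(d) for d in domains}
    time.sleep(interval_s)
    after = {d: read_energy_uj(d) for d in domains}
    # Note: counters eventually wrap around (see max_energy_range_uj); ignored here for brevity.
    return {d: (after[d] - before[d]) / 1e6 / interval_s for d in domains}

if __name__ == "__main__":
    for domain, watts in sample_power().items():
        print(f"{domain}: {watts:.1f} W")
```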
We were particularly curious about whether or not we could observe variations from one instance to another. This led us to perform multiple tests, and we found significant differences with some of our early measurements, namely for the c5, m5, and r5.
Here we compare three different tests on c5.metal instances, performed in three different regions between February (initial experiments) and June 2021.
Apart from a lower idle consumption, we can see a significant increase in the reported numbers in our more recent tests. This is concerning, but at least our two new tests are consistent.
Luckily, turbostress outputs the CPU information (/proc/cpuinfo) on top of the power measurements. By looking at this, we were able to identify the most likely culprit for these discrepancies: the CPU frequency. A rookie mistake 🤦‍♂️.
Here is a comparison of the reported frequency for each of the 96 logical CPUs (threads). We can see that the first machine, in dark blue, isn’t running at its full capacity. The two others show a comparable “clocking profile,” which is reassuring.
We suppose that this is linked to DVFS techniques that can be used to dynamically scale voltage and frequency at the CPU core level for energy-saving purposes.
The following graph shows the average thread frequency compared to the base and max frequencies of the CPU model. We used this to identify measurements performed on underclocked machines and kept only the others for our study.

We consider our measurements to be valid when the reported frequency sits between the base and max values. Fortunately, our measurements on the m5zn, z1d, i3, and c5n were apparently OK.
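As an illustration of this sanity check, here is a hedged sketch (not the exact script we used) that reads the per-thread frequencies from /proc/cpuinfo and flags a run whose average frequency falls outside the CPU model’s base/max range.

```python
# Sketch: flag measurements taken on an underclocked machine by comparing the average
# reported thread frequency against the CPU model's base and max frequencies.
def read_thread_frequencies(cpuinfo_path="/proc/cpuinfo"):
    """Return the list of 'cpu MHz' values, one per logical CPU (thread)."""
    freqs = []
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("cpu MHz"):
                freqs.append(float(line.split(":")[1]))
    return freqs

def is_measurement_valid(base_mhz, max_mhz, cpuinfo_path="/proc/cpuinfo"):
    """Consider a run valid when the average frequency sits between base and max."""
    freqs = read_thread_frequencies(cpuinfo_path)
    avg = sum(freqs) / len(freqs)
    return base_mhz <= avg <= max_mhz, avg

# Example usage (replace the two values with the documented base/turbo frequencies
# of the CPU model under test):
# valid, avg_mhz = is_measurement_valid(3000, 3500)
```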
One of the limitations of RAPL measurements is that we are only capturing CPU and DRAM power consumption. Also, this is a software measure and we wanted to know how this would compare with on-premise power readings.
With the help of Workflowers and Hardbricks, we were able to run our turbostress protocol on-premise and compare the results with BMC (Baseboard Management Controller) power consumption readings. This should be closer to reality, even though it is still not as accurate as a measurement taken with a power analyzer.
Here is a comparison of the two measurements and the difference (Δ in yellow):
This first result seems to confirm that RAPL readings are consistent with BMC readings. The Δ should be a good enough proxy to estimate the consumption related to the rest of the machine, namely:
Further tests would ideally be required to definitively confirm this assumption. Contact us if you want to help 👋.
We are only able to perform our tests on bare metal instances, and these are not available for all instance types. Also, we are not able to perform the same tests on ARM-based architectures even though we have access to bare metal options.
In order to build a dataset covering all available instances, we assumed that all instances ultimately rely on a limited number of hardware platforms containing:
Here is an extract focusing on the CPU platforms available on EC2 as of June 2021. The information comes from AWS’ public documentation or was collected via /proc/cpuinfo and turbostat:
Most of these CPUs are custom-made for AWS, so some of the specifications are guessed (in italics*). This “CPU platforms” dataset is useful for both the power consumption and the embodied emission estimations (see Chapter 5).
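To give an idea of how this information is used downstream, here is an illustrative sketch of what a platform record could look like in code. The field names are ours, not AWS’, and the example entry only reuses figures already mentioned in this article.

```python
# Illustrative structure for one CPU platform entry (field names are examples only).
from dataclasses import dataclass

@dataclass
class CpuPlatform:
    name: str        # e.g. "Intel Xeon Platinum 8175M"
    tdp_w: float     # thermal design power, in Watts
    cores: int       # physical cores per socket
    threads: int     # logical CPUs (threads) per socket
    guessed: bool    # True when the specification is inferred rather than documented

# Example: the custom Xeon Platinum 8175M used in m5 instances (240W TDP, see below).
m5_platform = CpuPlatform("Intel Xeon Platinum 8175M", 240.0, 24, 48, False)
```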
Regarding the power consumption of the instances, we decided to keep only four load levels for practical reasons:

Thanks to our measurements, we have enough data to build consumption profiles for the CPU and DRAM domains. Here is how we defined these values:
In order to generalize this to all platforms, we tried to find the closest hardware and estimated the consumption based on the TDP and a simple rule of thumb.
For example, we use the measurements performed on the m5.metal equipped with a Xeon Platinum 8175M (240W TDP) to derive the power consumption profile of the Xeon Platinum 8176M (165W TDP) used in the high memory instances. This is detailed in the dataset.
For the AMD and ARM CPUs, we rely on the information we found online and make some assumptions. For example, Graviton 2 ARM processors are based on the Neoverse N1 platform, and ARM indicates a 150W TDP for a 64-core CPU aimed at hyperscale data centers.
For these CPUs, we calculated a simple average of our previous measurements, expressed as watts consumed per watt of TDP. Here is the result for each load level as of writing this article:

For a CPU with a 100W TDP, we will consider that the idle consumption is 100 × 0.12 = 12 Watts. At full capacity, we basically consider the TDP value to be the actual power consumption.
For GPUs, we use the TDP reported by the manufacturers and the same table as for CPUs. We could later revise this by doing some proper measurements using nvidia-smi.
For example, the consumption of a Tesla V100 GPU, which has a 300W TDP, will be estimated at around 226 Watts for an average workload (50% load level), applying the 50% ratio (~0.75) from the table above.
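Putting the two examples together, here is a hedged sketch of the rule of thumb. Only the idle and full-load ratios are stated explicitly above; the intermediate ratios below are placeholders, the averaged values used in practice live in the dataset.

```python
# Sketch of the TDP rule of thumb: estimated power = TDP * (watts consumed per watt of TDP).
TDP_RATIOS = {
    "idle": 0.12,   # from the article: a 100W TDP CPU idles around 12 W
    "10%": 0.30,    # placeholder value, see the dataset
    "50%": 0.75,    # approximate, consistent with the V100 example above
    "100%": 1.0,    # at full capacity we take the TDP as the actual consumption
}

def estimate_power_w(tdp_w: float, load: str) -> float:
    """Estimate CPU or GPU power draw (Watts) at a given load level from its TDP."""
    return tdp_w * TDP_RATIOS[load]

# Examples: a 100W TDP CPU at idle, and a 300W TDP Tesla V100 at 50% load.
print(estimate_power_w(100, "idle"))  # ~12 W
print(estimate_power_w(300, "50%"))   # ~225 W (the dataset uses a slightly higher ratio)
```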
The rest of a commodity server can include other components such as fans, storage drives, network cards, and other parts we couldn’t measure or include in the previously listed estimations.
In order to cover the consumption of all these “other” elements, we took a shortcut and defined a constant value based on the CPU TDP. We defined this according to our on-premise test (from Chapter 2) and other available data.
On the Lenovo ST550 machine seen earlier, the average difference between RAPL and BMC power consumption readings corresponds to ~15% of the CPU configuration TDP (2 × 85W). Using the same approach, we tested our simple “model” with the data provided by Dell for the PowerEdge R740.
In a detailed Life Cycle Assessment, the manufacturer communicates the consumption profile of the machine according to the same four load levels. Comparing both values, we obtain an average difference corresponding to ~13% of the CPU configuration TDP.
For now, we have defined a simple heuristic that is equal to 20% of the CPU(s) TDP. We consider that this value should cover the “other” components’ power consumption.
We know that this is far from rigorous, especially if we include exotic server configurations. However, we think it is still better than considering this consumption negligible, and it should be a good starting point for commodity servers.
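As a sketch, and under the assumptions above, a bare metal power estimate at a given load level could then be assembled like this. The DRAM and GPU terms are illustrative inputs here; in our dataset the DRAM figure comes from the RAPL-based profiles rather than from a TDP ratio.

```python
# Sketch: bare metal power = CPU(s) + DRAM + GPU(s) + a constant for "other" components,
# where "other" (fans, drives, NICs, ...) is approximated as 20% of the CPU(s) TDP.
def estimate_baremetal_power_w(cpu_power_w: float, dram_power_w: float,
                               gpu_power_w: float, total_cpu_tdp_w: float) -> float:
    other_w = 0.20 * total_cpu_tdp_w  # load-independent constant for the rest of the machine
    return cpu_power_w + dram_power_w + gpu_power_w + other_w

# Example: a dual-socket 240W TDP machine at 50% load, with illustrative CPU/DRAM figures.
print(estimate_baremetal_power_w(cpu_power_w=360, dram_power_w=50,
                                 gpu_power_w=0, total_cpu_tdp_w=480))
```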
As discussed in our previous article, we consider that bare-metal resources are cut into instances in a linear fashion.
In this great talk from re:Invent 2017, Adam Boeglin describes how c5 instances are sized. In his example, the c5.18xlarge instance is the equivalent of two c5.9xlarge instances, and the CPU-to-memory ratio stays the same across all sizes.
In our dataset, we apply a vCPU ratio (instance vCPU count / bare metal vCPU count) to split our bare metal estimations down to the instance level.
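A minimal sketch of this allocation, using an illustrative host power figure:

```python
# Sketch: allocate the bare metal estimate to an instance proportionally to its vCPUs.
def instance_power_w(baremetal_power_w: float,
                     instance_vcpus: int, baremetal_vcpus: int) -> float:
    return baremetal_power_w * instance_vcpus / baremetal_vcpus

# Example: a c5.9xlarge (36 vCPUs) on a c5.metal host (96 vCPUs), assuming a 400 W host draw.
print(instance_power_w(400, 36, 96))  # 150 W
```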
This part will be brief!
In order to convert our power consumption into carbon emissions, we simply apply the electricity carbon emission factor for each data center geolocation. The Cloud Carbon Footprint team has already done this for AWS.
We also included the Power Usage Effectiveness (PUE) in our estimation. AWS communicates that, according to internal numbers, all their data centers have a PUE under 1.2.
We decided to stick to that number.
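Here is a hedged sketch of the conversion. The grid emission factor below is a placeholder, not an official value; the real per-region factors come from the Cloud Carbon Footprint coefficients mentioned above.

```python
# Sketch: convert instance energy into use-phase carbon emissions.
# gCO2eq = energy (kWh) * PUE * grid emission factor (gCO2eq/kWh) of the region.
PUE = 1.2  # AWS communicates a fleet-wide PUE under 1.2

def usage_emissions_g(power_w: float, hours: float, grid_factor_g_per_kwh: float) -> float:
    energy_kwh = power_w * hours / 1000.0
    return energy_kwh * PUE * grid_factor_g_per_kwh

# Example: 150 W drawn for 24 h with a placeholder grid factor of 500 gCO2eq/kWh.
print(usage_emissions_g(150, 24, 500))  # ~2160 gCO2eq
```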
Most available initiatives focus on emissions generated during the use phase, also called “Scope 2” in carbon accounting (from the cloud provider’s point of view). We are deeply convinced that we cannot build a proper sustainability strategy by only focusing on Scope 2 emissions.
Thankfully, this is something that is slowly getting more and more attention.
One of the most recent examples is the study from Udit Gupta et al., Chasing Carbon: The Elusive Environmental Footprint of Computing, where a research team from Harvard & Facebook indicates that:
“most emissions related to modern mobile and data-center equipment come from hardware manufacturing and infrastructure.”
Now, let’s have a quick overview of this whole new kingdom of uncertainties and approximations. First, the state of the art on manufacturing carbon emissions is quite limited for IT equipment.
We can however list some interesting references:
Here we will only skim the surface. A proper state of the art on IT hardware embodied emissions is ongoing in collaboration with the Boavizta initiative. Stay tuned!
Update January 2022: see this publication from Boavizta, How to evaluate server manufacturing footprint, beyond greenhouse gas emissions?
If we compare Dell’s product carbon footprint data, we have a hard time identifying characteristics that could be used as a proxy to estimate the embodied carbon emissions of EC2 hardware.
Here is a table listing the known specifications of the machines and their reported manufacturing carbon footprint:
We can observe a few interesting things in the first wave of early 2019 reports: the reported manufacturing footprints range from 1141 to 1782 kgCO2eq.

Now, if we have a look at the second wave of reports from early 2021, it’s even harder to draw conclusions, with footprints around ~750 kgCO2eq. If we assume the methodology to be the same for all these reports, it could mean that Dell has greatly improved its supply chain. In any case, we are missing a detailed analysis by component that would let us determine whether or not specific parts drive most of the carbon impact.
One of the only relevant resources available in this area is the aforementionedLife Cycle Assessment performed on the Dell R740 machine (also from 2019). Here is the detailed manufacturing carbon footprint by component for this machine:
At first glance, we can see that the eight high-volume SSDs have an important impact. However, this configuration seems quite specific and we don’t have many equivalents on EC2, except maybe the i3en.
What’s also interesting is that DRAM is the second most important driver for manufacturing emissions in this analysis. As mentioned in the study:
“The twelve 32GB RAM bars used within the configuration account for around 33% of the total mass of the mixed PWB [but] they account for over 90% of the total GWP impact of the PWB Mixed due to their high capacity per RAM bar and the associated complexity and density of the built-in chips and dies.”
Dell also published a product carbon footprint fact sheet for the R740, so we can see whether it matches the Life Cycle Assessment (LCA) data. The specifications of the two machines are not identical, so we need to adapt a few things on the R740 from the LCA to fall back to a comparable configuration:
Here is a table comparing the manufacturing impact of the two R740 configurations depending on the source:
This result is quite disturbing. While we are not comparing two exactly identical machines, we obtain drastically different values: 1313 kgCO2eq versus 550 kgCO2eq. It could suggest that these two analyses were not performed using the same methodology and/or the same emission factors.
Failing to find an ideal model to estimate the manufacturing emissions for EC2 hardware, we decided to settle on some arbitrary values:
We take a minimal configuration (1 CPU, 16 GB Memory) as a baseline and assume it has a manufacturing carbon footprint of 1000 kgCO2eq; here we include AWS’ Nitro cards. We then define additional emission factors for extra components such as memory (per 16 GB) and CPUs. Here is the summary of these values:

Using these “Embodied Emission Factors” we are able to adapt our estimations based on the bare metal specifications. Once again, we are aware of how limited this approach is:
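As an illustration only, the embodied footprint of a bare metal configuration can be assembled from these factors. The per-component values below are hypothetical placeholders; the actual emission factors live in the spreadsheet.

```python
# Sketch: embodied footprint = 1000 kgCO2eq baseline (1 CPU, 16 GB, Nitro cards included)
# plus per-component emission factors. The factors below are placeholders only.
BASELINE_KG = 1000.0

def embodied_emissions_kg(extra_cpus: int, extra_memory_16gb_blocks: int,
                          cpu_factor_kg: float = 100.0,     # placeholder factor
                          memory_factor_kg: float = 50.0):  # placeholder factor
    return (BASELINE_KG
            + extra_cpus * cpu_factor_kg
            + extra_memory_16gb_blocks * memory_factor_kg)

# Example: a dual-socket host with 384 GB of memory (1 extra CPU, 23 extra 16 GB blocks).
print(embodied_emissions_kg(extra_cpus=1, extra_memory_16gb_blocks=23))
```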
We now have a value to estimate emissions from the manufacturing phase. However, applying it to our usage report isn’t straightforward.
The lifespan of a server is set at 4 years in Dell’s assessments. We took a simple approach and considered that we can spread embodied emissions linearly, dividing them by the number of hours in a 4-year period to get an hourly rate.
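A minimal sketch of this amortization, combined with the same vCPU-based allocation we use for power (the host footprint below is an illustrative figure):

```python
# Sketch: spread embodied emissions linearly over a 4-year lifespan and allocate them
# to an instance according to its vCPU share of the bare metal host.
HOURS_IN_4_YEARS = 4 * 365 * 24  # 35,040 hours

def hourly_embodied_g(embodied_kg: float, instance_vcpus: int, baremetal_vcpus: int) -> float:
    return embodied_kg * 1000.0 / HOURS_IN_4_YEARS * instance_vcpus / baremetal_vcpus

# Example: a 2500 kgCO2eq host, instance using 36 of 96 vCPUs -> ~26.8 gCO2eq per hour.
print(hourly_embodied_g(2500, 36, 96))
```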
This has several limitations. We regularly use instances that were introduced more than 4 years ago, and we assume that AWS doesn’t install older generation instances once a new one is launched. In that case, should we reward the use of old hardware by lowering its embodied emission factor? For now, we are using the release date information to build a qualitative KPI and observe how “old” our infrastructure is.
Looking at these questions we also identified that the way we distribute embodied emissions can also drive misleading optimization strategies. Let’s see.
Here, it’s important to remember that the manufacturing of computing systems has a wider environmental impact than the use phase.
Udit Gupta et al. have pointed out the limitations of only focusing on carbon emissions:
“environmental impact of computing systems is multifaceted, spanning water consumption as well as use of other natural resources, including aluminum, cobalt, copper, glass, gold, tin, lithium, zinc, and plastic.”
By only looking at a “carbon KPI,” we could be tempted to regularly move our workloads to newer and more energy-efficient instances and somewhat neglect the other impacts involved with manufacturing this new hardware.
On that point, we have no preconceived ideas about what the least impactful tradeoff would be. This is where having more multi-factor assessments would be handy.
⚠️ Update 2023: Following the initial release, this work has continued through the Boavizta initiative with a more granular approach to embodied emissions; this dataset hasn’t been updated since.
All data and sources can be found in a spreadsheet. A simple estimator page is available as well to play with the dataset.
We hope this work will prove useful. We have pushed the bottom-up approach as far as we could, at least on the EC2 side. While this data will help us create new KPIs to monitor our cloud platform, we expect providers to release more and more data to fuel sustainability initiatives.
Google is now showcasing data centers with the “Lowest CO2” in their GCP console to guide infrastructure location strategies. This is a good start, although it only covers the impact from the electricity used to run the data center and we’ve seen that it’s much more complex than that.
There is a great research opportunity in assessing the tricky cost/performance/impact trade-off we have to deal with when new hardware is released.
Ideally, we could expect providers to incentivize their clients with impact-aware schemes in the future or, at least, provide granular reports that include the whole lifecycle.
🙏 We would like to thank the community working on this challenge, especially Cloud Carbon Footprint, David Mytton, and Boavizta, with a special acknowledgement to Workflowers and Hardbricks for their help in testing our results. Thanks also to Caroline Agase for reviewing the article.