RELATED CASES

This application claims priority to U.S. Provisional Patent Application No. 62/365,969, filed Jul. 22, 2016, U.S. Provisional Patent Application No. 62/376,859, filed Aug. 18, 2016, and U.S. Provisional Patent Application No. 62/427,268, filed Nov. 29, 2016, each of which is hereby incorporated by reference in its entirety.
BACKGROUND

In the course of ordinary operation of a data center, various types of maintenance are typically necessary in order to maintain desired levels of performance, stability, and reliability. Examples of such maintenance include testing, repair, replacement, and/or reconfiguration of components, installing new components, upgrading existing components, repositioning components and equipment, and other tasks of a similar nature. A large modern data center may contain great numbers of components and equipment of various types, and as a result may impose a substantial maintenance burden.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an embodiment of a first data center.
FIG. 2 illustrates an embodiment of a logical configuration of a rack.
FIG. 3 illustrates an embodiment of a second data center.
FIG. 4 illustrates an embodiment of a third data center.
FIG. 5 illustrates an embodiment of a connectivity scheme.
FIG. 6 illustrates an embodiment of a first rack architecture.
FIG. 7 illustrates an embodiment of a first sled.
FIG. 8 illustrates an embodiment of a second rack architecture.
FIG. 9 illustrates an embodiment of a rack.
FIG. 10 illustrates an embodiment of a second sled.
FIG. 11 illustrates an embodiment of a fourth data center.
FIG. 12 illustrates an embodiment of a first logic flow.
FIG. 13 illustrates an embodiment of a fifth data center.
FIG. 14 illustrates an embodiment of an automated maintenance device.
FIG. 15 illustrates an embodiment of a first operating environment.
FIG. 16 illustrates an embodiment of a second operating environment.
FIG. 17 illustrates an embodiment of a third operating environment.
FIG. 18 illustrates an embodiment of a fourth operating environment.
FIG. 19 illustrates an embodiment of a fifth operating environment.
FIG. 20 illustrates an embodiment of a sixth operating environment.
FIG. 21 illustrates an embodiment of a second logic flow.
FIG. 22 illustrates an embodiment of a third logic flow.
FIG. 23 illustrates an embodiment of a fourth logic flow.
FIG. 24A illustrates an embodiment of a first storage medium.
FIG. 24B illustrates an embodiment of a second storage medium.
FIG. 25 illustrates an embodiment of a computing architecture.
FIG. 26 illustrates an embodiment of a communications architecture.
FIG. 27 illustrates an embodiment of a communication device.
FIG. 28 illustrates an embodiment of a first wireless network.
FIG. 29 illustrates an embodiment of a second wireless network.
DETAILED DESCRIPTION

Various embodiments may be generally directed to techniques for automated data center maintenance. In one embodiment, for example, an automated maintenance device may comprise processing circuitry and non-transitory computer-readable storage media comprising instructions for execution by the processing circuitry to cause the automated maintenance device to receive an automation command from an automation coordinator for a data center, identify an automated maintenance procedure based on the received automation command, and perform the identified automated maintenance procedure. Other embodiments are described and claimed.
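For purposes of illustration only, the command-handling behavior described above may be sketched in software as follows. This is a minimal, hypothetical sketch: the names AutomationCommand, PROCEDURES, and the individual procedure functions are illustrative assumptions rather than elements of any embodiment.

```python
# Minimal, hypothetical sketch of an automated maintenance device's command
# handling: receive a command, identify the corresponding procedure, perform it.
from dataclasses import dataclass, field


@dataclass
class AutomationCommand:
    """Command as received from an automation coordinator (illustrative layout)."""
    task: str                                   # e.g., "replace_sled"
    params: dict = field(default_factory=dict)  # task-specific parameters


def replace_sled(params: dict) -> str:
    # Placeholder for the physical sled-replacement procedure.
    return f"replaced sled {params.get('sled_id')} in rack {params.get('rack_id')}"


def test_dimm(params: dict) -> str:
    # Placeholder for a DIMM test performed via a testing interface.
    return f"tested DIMM {params.get('dimm_id')}"


# Mapping from command names to automated maintenance procedures.
PROCEDURES = {"replace_sled": replace_sled, "test_dimm": test_dimm}


def handle_command(cmd: AutomationCommand) -> str:
    """Identify the procedure named by the received command and perform it."""
    procedure = PROCEDURES.get(cmd.task)
    if procedure is None:
        raise ValueError(f"unknown automation command: {cmd.task}")
    return procedure(cmd.params)


print(handle_command(AutomationCommand("replace_sled",
                                       {"sled_id": "204-1", "rack_id": "102A"})))
```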
Various embodiments may comprise one or more elements. An element may comprise any structure arranged to perform certain operations. Each element may be implemented as hardware, software, or any combination thereof, as desired for a given set of design parameters or performance constraints. Although an embodiment may be described with a limited number of elements in a certain topology by way of example, the embodiment may include more or fewer elements in alternate topologies as desired for a given implementation. It is worthy to note that any reference to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrases "in one embodiment," "in some embodiments," and "in various embodiments" in various places in the specification are not necessarily all referring to the same embodiment.
FIG. 1 illustrates a conceptual overview of a data center 100 that may generally be representative of a data center or other type of computing network in/for which one or more techniques described herein may be implemented according to various embodiments. As shown in FIG. 1, data center 100 may generally contain a plurality of racks, each of which may house computing equipment comprising a respective set of physical resources. In the particular non-limiting example depicted in FIG. 1, data center 100 contains four racks 102A to 102D, which house computing equipment comprising respective sets of physical resources (PCRs) 105A to 105D. According to this example, a collective set of physical resources 106 of data center 100 includes the various sets of physical resources 105A to 105D that are distributed among racks 102A to 102D. Physical resources 106 may include resources of multiple types, such as, for example, processors, co-processors, accelerators, field-programmable gate arrays (FPGAs), memory, and storage. The embodiments are not limited to these examples.
The illustrative data center 100 differs from typical data centers in many ways. For example, in the illustrative embodiment, the circuit boards ("sleds") on which components such as CPUs, memory, and other components are placed are designed for increased thermal performance. In particular, in the illustrative embodiment, the sleds are shallower than typical boards. In other words, the sleds are shorter from the front to the back, where cooling fans are located. This decreases the length of the path that air must travel across the components on the board. Further, the components on the sled are spaced further apart than in typical circuit boards, and the components are arranged to reduce or eliminate shadowing (i.e., one component in the air flow path of another component). In the illustrative embodiment, processing components such as the processors are located on a top side of a sled while near memory, such as DIMMs, is located on a bottom side of the sled. As a result of the enhanced airflow provided by this design, the components may operate at higher frequencies and power levels than in typical systems, thereby increasing performance. Furthermore, the sleds are configured to blindly mate with power and data communication cables in each rack 102A, 102B, 102C, 102D, enhancing their ability to be quickly removed, upgraded, reinstalled, and/or replaced. Similarly, individual components located on the sleds, such as processors, accelerators, memory, and data storage drives, are configured to be easily upgraded due to their increased spacing from each other. In the illustrative embodiment, the components additionally include hardware attestation features to prove their authenticity.
Furthermore, in the illustrative embodiment, the data center 100 utilizes a single network architecture ("fabric") that supports multiple other network architectures including Ethernet and Omni-Path. The sleds, in the illustrative embodiment, are coupled to switches via optical fibers, which provide higher bandwidth and lower latency than typical twisted pair cabling (e.g., Category 5, Category 5e, Category 6, etc.). Due to the high-bandwidth, low-latency interconnections and network architecture, the data center 100 may, in use, pool resources, such as memory, accelerators (e.g., graphics accelerators, FPGAs, ASICs, etc.), and data storage drives that are physically disaggregated, and provide them to compute resources (e.g., processors) on an as-needed basis, enabling the compute resources to access the pooled resources as if they were local. The illustrative data center 100 additionally receives usage information for the various resources, predicts resource usage for different types of workloads based on past resource usage, and dynamically reallocates the resources based on this information.
The racks 102A, 102B, 102C, 102D of the data center 100 may include physical design features that facilitate the automation of a variety of types of maintenance tasks. For example, data center 100 may be implemented using racks that are designed to be robotically accessed, and to accept and house robotically manipulable resource sleds. Furthermore, in the illustrative embodiment, the racks 102A, 102B, 102C, 102D include integrated power sources that receive a greater voltage than is typical for power sources. The increased voltage enables the power sources to provide additional power to the components on each sled, enabling the components to operate at higher than typical frequencies. FIG. 2 illustrates an exemplary logical configuration of a rack 202 of the data center 100. As shown in FIG. 2, rack 202 may generally house a plurality of sleds, each of which may comprise a respective set of physical resources. In the particular non-limiting example depicted in FIG. 2, rack 202 houses sleds 204-1 to 204-4 comprising respective sets of physical resources 205-1 to 205-4, each of which constitutes a portion of the collective set of physical resources 206 comprised in rack 202. With respect to FIG. 1, if rack 202 is representative of, for example, rack 102A, then physical resources 206 may correspond to the physical resources 105A comprised in rack 102A. In the context of this example, physical resources 105A may thus be made up of the respective sets of physical resources, including physical storage resources 205-1, physical accelerator resources 205-2, physical memory resources 205-3, and physical compute resources 205-4 comprised in the sleds 204-1 to 204-4 of rack 202. The embodiments are not limited to this example. Each sled may contain a pool of each of the various types of physical resources (e.g., compute, memory, accelerator, storage). By having robotically accessible and robotically manipulable sleds comprising disaggregated resources, each type of resource can be upgraded independently of the others and at its own optimized refresh rate.
FIG. 3 illustrates an example of a data center 300 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. In the particular non-limiting example depicted in FIG. 3, data center 300 comprises racks 302-1 to 302-32. In various embodiments, the racks of data center 300 may be arranged in such fashion as to define and/or accommodate various access pathways. For example, as shown in FIG. 3, the racks of data center 300 may be arranged in such fashion as to define and/or accommodate access pathways 311A, 311B, 311C, and 311D. In some embodiments, the presence of such access pathways may generally enable automated maintenance equipment, such as robotic maintenance equipment, to physically access the computing equipment housed in the various racks of data center 300 and perform automated maintenance tasks (e.g., replace a failed sled, upgrade a sled). In various embodiments, the dimensions of access pathways 311A, 311B, 311C, and 311D, the dimensions of racks 302-1 to 302-32, and/or one or more other aspects of the physical layout of data center 300 may be selected to facilitate such automated operations. The embodiments are not limited in this context.
FIG. 4 illustrates an example of a data center 400 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. As shown in FIG. 4, data center 400 may feature an optical fabric 412. Optical fabric 412 may generally comprise a combination of optical signaling media (such as optical cabling) and optical switching infrastructure via which any particular sled in data center 400 can send signals to (and receive signals from) each of the other sleds in data center 400. The signaling connectivity that optical fabric 412 provides to any given sled may include connectivity both to other sleds in a same rack and to sleds in other racks. In the particular non-limiting example depicted in FIG. 4, data center 400 includes four racks 402A to 402D. Racks 402A to 402D house respective pairs of sleds 404A-1 and 404A-2, 404B-1 and 404B-2, 404C-1 and 404C-2, and 404D-1 and 404D-2. Thus, in this example, data center 400 comprises a total of eight sleds. Via optical fabric 412, each such sled may possess signaling connectivity with each of the seven other sleds in data center 400. For example, via optical fabric 412, sled 404A-1 in rack 402A may possess signaling connectivity with sled 404A-2 in rack 402A, as well as the six other sleds 404B-1, 404B-2, 404C-1, 404C-2, 404D-1, and 404D-2 that are distributed among the other racks 402B, 402C, and 402D of data center 400. The embodiments are not limited to this example.
FIG. 5 illustrates an overview of a connectivity scheme 500 that may generally be representative of link-layer connectivity that may be established in some embodiments among the various sleds of a data center, such as any of example data centers 100, 300, and 400 of FIGS. 1, 3, and 4. Connectivity scheme 500 may be implemented using an optical fabric that features a dual-mode optical switching infrastructure 514. Dual-mode optical switching infrastructure 514 may generally comprise a switching infrastructure that is capable of receiving communications according to multiple link-layer protocols via a same unified set of optical signaling media, and properly switching such communications. In various embodiments, dual-mode optical switching infrastructure 514 may be implemented using one or more dual-mode optical switches 515. In various embodiments, dual-mode optical switches 515 may generally comprise high-radix switches. In some embodiments, dual-mode optical switches 515 may comprise multi-ply switches, such as four-ply switches. In various embodiments, dual-mode optical switches 515 may feature integrated silicon photonics that enable them to switch communications with significantly reduced latency in comparison to conventional switching devices. In some embodiments, dual-mode optical switches 515 may constitute leaf switches 530 in a leaf-spine architecture additionally including one or more dual-mode optical spine switches 520.
In various embodiments, dual-mode optical switches may be capable of receiving both Ethernet protocol communications carrying Internet Protocol (IP) packets and communications according to a second, high-performance computing (HPC) link-layer protocol (e.g., Intel's Omni-Path Architecture, Infiniband) via optical signaling media of an optical fabric. As reflected in FIG. 5, with respect to any particular pair of sleds 504A and 504B possessing optical signaling connectivity to the optical fabric, connectivity scheme 500 may thus provide support for link-layer connectivity via both Ethernet links and HPC links. Thus, both Ethernet and HPC communications can be supported by a single high-bandwidth, low-latency switch fabric. The embodiments are not limited to this example.
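As a rough software analogy only (the switching itself is performed in hardware), the dual-mode behavior can be pictured as a single forwarding function that accepts frames of either link-layer protocol on the same ports. The frame layout below is an illustrative assumption.

```python
# Toy analogy of dual-mode switching: a single switch function accepts frames
# of either link-layer protocol and forwards each according to its protocol.
# The frame fields ("protocol", "dst") are hypothetical.
def switch_frame(frame: dict) -> str:
    if frame["protocol"] == "ethernet":   # Ethernet carrying IP packets
        return f"forward Ethernet frame to port {frame['dst']}"
    if frame["protocol"] == "hpc":        # e.g., an Omni-Path-style HPC protocol
        return f"forward HPC frame to port {frame['dst']}"
    raise ValueError(f"unsupported link-layer protocol: {frame['protocol']}")


print(switch_frame({"protocol": "ethernet", "dst": 3}))
print(switch_frame({"protocol": "hpc", "dst": 7}))
```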
FIG. 6 illustrates a general overview of a rack architecture 600 that may be representative of an architecture of any particular one of the racks depicted in FIGS. 1 to 4 according to some embodiments. As reflected in FIG. 6, rack architecture 600 may generally feature a plurality of sled spaces into which sleds may be inserted, each of which may be robotically accessible via a rack access region 601. In the particular non-limiting example depicted in FIG. 6, rack architecture 600 features five sled spaces 603-1 to 603-5. Sled spaces 603-1 to 603-5 feature respective multi-purpose connector modules (MPCMs) 616-1 to 616-5.
Included among the types of sleds to be accommodated by rack architecture 600 may be one or more types of sleds that feature expansion capabilities. FIG. 7 illustrates an example of a sled 704 that may be representative of a sled of such a type. As shown in FIG. 7, sled 704 may comprise a set of physical resources 705, as well as an MPCM 716 designed to couple with a counterpart MPCM when sled 704 is inserted into a sled space such as any of sled spaces 603-1 to 603-5 of FIG. 6. Sled 704 may also feature an expansion connector 717. Expansion connector 717 may generally comprise a socket, slot, or other type of connection element that is capable of accepting one or more types of expansion modules, such as an expansion sled 718. By coupling with a counterpart connector on expansion sled 718, expansion connector 717 may provide physical resources 705 with access to supplemental computing resources 705B residing on expansion sled 718. The embodiments are not limited in this context.
FIG. 8 illustrates an example of a rack architecture 800 that may be representative of a rack architecture that may be implemented in order to provide support for sleds featuring expansion capabilities, such as sled 704 of FIG. 7. In the particular non-limiting example depicted in FIG. 8, rack architecture 800 includes seven sled spaces 803-1 to 803-7, which feature respective MPCMs 816-1 to 816-7. Sled spaces 803-1 to 803-7 include respective primary regions 803-1A to 803-7A and respective expansion regions 803-1B to 803-7B. With respect to each such sled space, when the corresponding MPCM is coupled with a counterpart MPCM of an inserted sled, the primary region may generally constitute a region of the sled space that physically accommodates the inserted sled. The expansion region may generally constitute a region of the sled space that can physically accommodate an expansion module, such as expansion sled 718 of FIG. 7, in the event that the inserted sled is configured with such a module.
FIG. 9 illustrates an example of a rack 902 that may be representative of a rack implemented according to rack architecture 800 of FIG. 8 according to some embodiments. In the particular non-limiting example depicted in FIG. 9, rack 902 features seven sled spaces 903-1 to 903-7, which include respective primary regions 903-1A to 903-7A and respective expansion regions 903-1B to 903-7B. In various embodiments, temperature control in rack 902 may be implemented using an air cooling system. For example, as reflected in FIG. 9, rack 902 may feature a plurality of fans 919 that are generally arranged to provide air cooling within the various sled spaces 903-1 to 903-7. In some embodiments, the height of the sled space is greater than the conventional "1U" server height. In such embodiments, fans 919 may generally comprise relatively slow, large-diameter cooling fans as compared to fans used in conventional rack configurations. Running larger-diameter cooling fans at lower speeds may increase fan lifetime relative to smaller-diameter cooling fans running at higher speeds while still providing the same amount of cooling. The sleds are physically shallower than conventional rack dimensions. Further, components are arranged on each sled to reduce thermal shadowing (i.e., not arranged serially in the direction of air flow). As a result, the wider, shallower sleds allow for an increase in device performance because the devices can be operated at a higher thermal envelope (e.g., 250 W) due to improved cooling (i.e., no thermal shadowing, more space between devices, more room for larger heat sinks, etc.).
MPCMs 916-1 to 916-7 may be configured to provide inserted sleds with access to power sourced by respective power modules 920-1 to 920-7, each of which may draw power from an external power source 921. In various embodiments, external power source 921 may deliver alternating current (AC) power to rack 902, and power modules 920-1 to 920-7 may be configured to convert such AC power to direct current (DC) power to be sourced to inserted sleds. In some embodiments, for example, power modules 920-1 to 920-7 may be configured to convert 277-volt AC power into 12-volt DC power for provision to inserted sleds via respective MPCMs 916-1 to 916-7. The embodiments are not limited to this example.
MPCMs 916-1 to 916-7 may also be arranged to provide inserted sleds with optical signaling connectivity to a dual-mode optical switching infrastructure 914, which may be the same as, or similar to, dual-mode optical switching infrastructure 514 of FIG. 5. In various embodiments, optical connectors contained in MPCMs 916-1 to 916-7 may be designed to couple with counterpart optical connectors contained in MPCMs of inserted sleds to provide such sleds with optical signaling connectivity to dual-mode optical switching infrastructure 914 via respective lengths of optical cabling 922-1 to 922-7. In some embodiments, each such length of optical cabling may extend from its corresponding MPCM to an optical interconnect loom 923 that is external to the sled spaces of rack 902. In various embodiments, optical interconnect loom 923 may be arranged to pass through a support post or other type of load-bearing element of rack 902. The embodiments are not limited in this context. Because inserted sleds connect to an optical switching infrastructure via MPCMs, the resources typically spent in manually configuring the rack cabling to accommodate a newly inserted sled can be saved.
FIG. 10 illustrates an example of a sled 1004 that may be representative of a sled designed for use in conjunction with rack 902 of FIG. 9 according to some embodiments. Sled 1004 may feature an MPCM 1016 that comprises an optical connector 1016A and a power connector 1016B, and that is designed to couple with a counterpart MPCM of a sled space in conjunction with insertion of MPCM 1016 into that sled space. Coupling MPCM 1016 with such a counterpart MPCM may cause power connector 1016B to couple with a power connector comprised in the counterpart MPCM. This may generally enable physical resources 1005 of sled 1004 to source power from an external source, via power connector 1016B and power transmission media 1024 that conductively couple power connector 1016B to physical resources 1005.
Sled 1004 may also include dual-mode optical network interface circuitry 1026. Dual-mode optical network interface circuitry 1026 may generally comprise circuitry that is capable of communicating over optical signaling media according to each of multiple link-layer protocols supported by dual-mode optical switching infrastructure 914 of FIG. 9. In some embodiments, dual-mode optical network interface circuitry 1026 may be capable both of Ethernet protocol communications and of communications according to a second, high-performance protocol. In various embodiments, dual-mode optical network interface circuitry 1026 may include one or more optical transceiver modules 1027, each of which may be capable of transmitting and receiving optical signals over each of one or more optical channels. The embodiments are not limited in this context.
Coupling MPCM 1016 with a counterpart MPCM of a sled space in a given rack may cause optical connector 1016A to couple with an optical connector comprised in the counterpart MPCM. This may generally establish optical connectivity between optical cabling of the sled and dual-mode optical network interface circuitry 1026, via each of a set of optical channels 1025. Dual-mode optical network interface circuitry 1026 may communicate with the physical resources 1005 of sled 1004 via electrical signaling media 1028. In addition to the dimensions of the sleds and arrangement of components on the sleds to provide improved cooling and enable operation at a relatively higher thermal envelope (e.g., 250 W), as described above with reference to FIG. 9, in some embodiments, a sled may include one or more additional features to facilitate air cooling, such as a heat pipe and/or heat sinks arranged to dissipate heat generated by physical resources 1005. It is worthy of note that although the example sled 1004 depicted in FIG. 10 does not feature an expansion connector, any given sled that features the design elements of sled 1004 may also feature an expansion connector according to some embodiments. The embodiments are not limited in this context.
FIG. 11 illustrates an example of a data center 1100 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. As reflected in FIG. 11, a physical infrastructure management framework 1150A may be implemented to facilitate management of a physical infrastructure 1100A of data center 1100. In various embodiments, one function of physical infrastructure management framework 1150A may be to manage automated maintenance functions within data center 1100, such as the use of robotic maintenance equipment to service computing equipment within physical infrastructure 1100A. In some embodiments, physical infrastructure 1100A may feature an advanced telemetry system that performs telemetry reporting that is sufficiently robust to support remote automated management of physical infrastructure 1100A. In various embodiments, telemetry information provided by such an advanced telemetry system may support features such as failure prediction/prevention capabilities and capacity planning capabilities. In some embodiments, physical infrastructure management framework 1150A may also be configured to manage authentication of physical infrastructure components using hardware attestation techniques. For example, robots may verify the authenticity of components before installation by analyzing information collected from a radio frequency identification (RFID) tag associated with each component to be installed. The embodiments are not limited in this context.
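By way of illustration, the attestation check described above might take the following form, assuming the robot can read an identifier from a component's RFID tag and compare it against a registry of known-authentic identifiers. The tag format and registry are assumptions made for the sketch, not features of any embodiment.

```python
# Hypothetical attestation check performed before installing a component:
# the robot reads the component's RFID tag and verifies the identifier
# against an assumed registry of trusted component identifiers.
TRUSTED_COMPONENT_IDS = {"0xA1B2C3", "0xD4E5F6"}  # illustrative registry


def verify_component(rfid_tag_id: str) -> bool:
    """Return True only if the scanned tag matches a trusted identifier."""
    return rfid_tag_id in TRUSTED_COMPONENT_IDS


def install_component(rfid_tag_id: str) -> None:
    if not verify_component(rfid_tag_id):
        raise RuntimeError(f"component {rfid_tag_id} failed attestation; aborting")
    print(f"component {rfid_tag_id} authenticated; proceeding with installation")


install_component("0xA1B2C3")
```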
As shown in FIG. 11, the physical infrastructure 1100A of data center 1100 may comprise an optical fabric 1112, which may include a dual-mode optical switching infrastructure 1114. Optical fabric 1112 and dual-mode optical switching infrastructure 1114 may be the same as, or similar to, optical fabric 412 of FIG. 4 and dual-mode optical switching infrastructure 514 of FIG. 5, respectively, and may provide high-bandwidth, low-latency, multi-protocol connectivity among sleds of data center 1100. As discussed above with reference to FIG. 1, in various embodiments, the availability of such connectivity may make it feasible to disaggregate and dynamically pool resources such as accelerators, memory, and storage. In some embodiments, for example, one or more pooled accelerator sleds 1130 may be included among the physical infrastructure 1100A of data center 1100, each of which may comprise a pool of accelerator resources, such as co-processors and/or FPGAs, for example, that is globally accessible to other sleds via optical fabric 1112 and dual-mode optical switching infrastructure 1114.
In another example, in various embodiments, one or more pooled storage sleds 1132 may be included among the physical infrastructure 1100A of data center 1100, each of which may comprise a pool of storage resources that is globally accessible to other sleds via optical fabric 1112 and dual-mode optical switching infrastructure 1114. In some embodiments, such pooled storage sleds 1132 may comprise pools of solid-state storage devices such as solid-state drives (SSDs). In various embodiments, one or more high-performance processing sleds 1134 may be included among the physical infrastructure 1100A of data center 1100. In some embodiments, high-performance processing sleds 1134 may comprise pools of high-performance processors, as well as cooling features that enhance air cooling to yield a higher thermal envelope of up to 250 W or more. In various embodiments, any given high-performance processing sled 1134 may feature an expansion connector 1117 that can accept a far memory expansion sled, such that the far memory that is locally available to that high-performance processing sled 1134 is disaggregated from the processors and near memory comprised on that sled. In some embodiments, such a high-performance processing sled 1134 may be configured with far memory using an expansion sled that comprises low-latency SSD storage. The optical infrastructure allows for compute resources on one sled to utilize remote accelerator/FPGA, memory, and/or SSD resources that are disaggregated on a sled located on the same rack or any other rack in the data center. The remote resources can be located one switch jump or two switch jumps away in the leaf-spine network architecture described above with reference to FIG. 5. The embodiments are not limited in this context.
In various embodiments, one or more layers of abstraction may be applied to the physical resources of physical infrastructure 1100A in order to define a virtual infrastructure, such as a software-defined infrastructure 1100B. In some embodiments, virtual computing resources 1136 of software-defined infrastructure 1100B may be allocated to support the provision of cloud services 1140. In various embodiments, particular sets of virtual computing resources 1136 may be grouped for provision to cloud services 1140 in the form of SDI services 1138. Examples of cloud services 1140 may include, without limitation, software as a service (SaaS) services 1142, platform as a service (PaaS) services 1144, and infrastructure as a service (IaaS) services 1146.
In some embodiments, management of software-defined infrastructure 1100B may be conducted using a virtual infrastructure management framework 1150B. In various embodiments, virtual infrastructure management framework 1150B may be designed to implement workload fingerprinting techniques and/or machine-learning techniques in conjunction with managing allocation of virtual computing resources 1136 and/or SDI services 1138 to cloud services 1140. In some embodiments, virtual infrastructure management framework 1150B may use/consult telemetry data in conjunction with performing such resource allocation. In various embodiments, an application/service management framework 1150C may be implemented in order to provide QoS management capabilities for cloud services 1140. The embodiments are not limited in this context.
FIG. 12 illustrates an example of a logic flow 1200 that may be representative of a maintenance algorithm for a data center, such as one or more of data center 100 of FIG. 1, data center 300 of FIG. 3, data center 400 of FIG. 4, and data center 1100 of FIG. 11. As shown in FIG. 12, data center operation information may be collected at 1202. In various embodiments, the collected data center operation information may include information describing various characteristics of ongoing operation of the data center, such as resource utilization levels, workload sizes, throughput rates, temperature measurements, and so forth. In some embodiments, the collected data center operation information may additionally or alternatively include information describing other characteristics of the data center, such as the types of resources comprised in the data center, the locations/distributions of such resources within the data center, the capabilities and/or features of those resources, and so forth. The embodiments are not limited to these examples.
Based on data center operation information such as may be collected at 1202, a maintenance task to be completed may be identified at 1204. In one example, based on data center operation information indicating that processing resources on a given sled are non-responsive to communications from resources on other sleds, it may be determined at 1204 that the sled is to be pulled for testing. In another example, based on data center operation information indicating that a particular DIMM has reached the end of its estimated service life, it may be determined that the DIMM is to be replaced. At 1206, a set of physical actions associated with the maintenance task may be determined, and those physical actions may be performed at 1208 in order to complete the maintenance task. For instance, in the aforementioned example in which it is determined at 1204 that a DIMM is to be replaced, the physical actions identified at 1206 and performed at 1208 may include traveling to a particular rack in order to access a sled comprising the DIMM, removing the DIMM from a socket on the sled, and inserting a replacement DIMM into the socket. The embodiments are not limited to this example.
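For illustration, logic flow 1200 (blocks 1202 through 1208) might be sketched as below, using the DIMM-replacement example. The dict-based operation records and the end-of-service-life rule are assumptions made for the sketch.

```python
# Illustrative sketch of logic flow 1200 using the DIMM-replacement example.
def collect_operation_info():                       # block 1202
    # Assumed record format; real operation information would be far richer.
    return [{"component": "DIMM-17", "rack": "302-4", "service_life_used": 1.02}]


def identify_task(info):                            # block 1204
    # Hypothetical rule: schedule replacement at end of estimated service life.
    for record in info:
        if record["service_life_used"] >= 1.0:
            return ("replace_dimm", record)
    return None


def determine_actions(task):                        # block 1206
    _, record = task
    return [f"travel to rack {record['rack']}",
            f"remove {record['component']} from its socket on the sled",
            "insert a replacement DIMM into the socket"]


def perform_actions(actions):                       # block 1208
    for action in actions:
        print("performing:", action)


task = identify_task(collect_operation_info())
if task is not None:
    perform_actions(determine_actions(task))
```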
FIG. 13 illustrates an overhead view of an example data center 1300. According to various embodiments, data center 1300 may be representative of a data center in which various operations associated with data center maintenance, such as operations associated with one or more of blocks 1202, 1204, 1206, and 1208 in logic flow 1200 of FIG. 12, are automated using the capabilities of robotic maintenance equipment. According to some embodiments, data center 1300 may be representative of one or more of data center 100 of FIG. 1, data center 300 of FIG. 3, data center 400 of FIG. 4, and data center 1100 of FIG. 11. The embodiments are not limited in this context.
In various embodiments, according to an automated maintenance scheme implemented in data center 1300, robots 1360 may be used to service, repair, replace, clean, test, configure, upgrade, move, position, and/or otherwise manipulate equipment housed in racks 1302. Racks 1302 may be arranged in such fashion as to define and/or accommodate access pathways via which robots 1360 can physically access such equipment. Robots 1360 may traverse such access pathways in conjunction with moving around in data center 1300 to perform various tasks. Physical features of equipment housed in racks 1302 may be designed to facilitate robotic manipulation/handling. It is to be appreciated that in various embodiments, the equipment housed in racks 1302 may include some equipment that is not robotically accessible/serviceable. Further, in some embodiments, there may be some equipment within data center 1300 that is robotically accessible/serviceable but is not housed in racks 1302. The embodiments are not limited in this context.
FIG. 14 illustrates a block diagram of an automated maintenance device 1400 that may be representative of any given robot 1360 in data center 1300 of FIG. 13 according to various embodiments. As shown in FIG. 14, automated maintenance device 1400 may comprise a variety of elements. In the non-limiting example depicted in FIG. 14, automated maintenance device 1400 comprises locomotion elements 1462, manipulation elements 1463, sensory elements 1464, communication elements 1465, interfaces 1466, memory/storage elements 1467, and operations management and control (OMC) elements 1468.
Locomotion elements 1462 may generally comprise physical elements enabling automated maintenance device 1400 to move around within a data center. In various embodiments, locomotion elements 1462 may comprise wheels. In some embodiments, locomotion elements 1462 may comprise caterpillar tracks. In various embodiments, automated maintenance device 1400 may provide the motive power/force required for motion. For example, in some embodiments, automated maintenance device 1400 may feature a battery that provides power to drive wheels or tracks used by automated maintenance device 1400 for moving around in a data center. In various other embodiments, the motive power/force may be provided by an external source. The embodiments are not limited in this context.
Manipulation elements 1463 may generally comprise physical elements that are usable to manipulate various types of equipment in a data center. In some embodiments, manipulation elements 1463 may include one or more robotic arms. In various embodiments, manipulation elements 1463 may include one or more multi-link manipulators. In some embodiments, manipulation elements 1463 may include one or more end effectors usable for gripping various types of equipment, components, and/or other objects within the data center. In various embodiments, manipulation elements 1463 may include one or more end effectors comprising impactive grippers, such as jaw or claw grippers. In some embodiments, manipulation elements 1463 may include one or more end effectors comprising ingressive grippers, which may feature pins, needles, hackles, or other elements that are to physically penetrate the surface of an object being gripped. In various embodiments, manipulation elements 1463 may include one or more end effectors comprising astrictive grippers, which may grip objects using air suction, magnetic adhesion, or electroadhesion. The embodiments are not limited to these examples.
Sensory elements 1464 may generally comprise physical elements that are usable to sense various aspects of ambient conditions within a data center. Examples of sensory elements 1464 may include cameras, alignment guides/sensors, distance sensors, proximity sensors, barcode readers, RFID/NFC readers, temperature sensors, airflow sensors, air quality sensors, humidity sensors, and pressure sensors. The embodiments are not limited to these examples.
Communication elements 1465 may generally comprise a set of electronic components and/or circuitry operable to perform functions associated with communications between automated maintenance device 1400 and one or more external devices. In a given embodiment, such communications may include wireless communications, wired communications, or both. In various embodiments, communication elements 1465 may include elements operative to generate/construct packets, frames, messages, and/or other information to be wirelessly communicated to external device(s), and/or to process/deconstruct packets, frames, messages, and/or other information wirelessly received from external device(s). In various embodiments, for example, communication elements 1465 may include baseband circuitry supporting wireless communications according to one or more wireless communication protocols/standards. In some embodiments, communication elements 1465 may include elements operative to generate, process, construct, and/or deconstruct packets, frames, messages, and/or other information communicated over wired media. In various embodiments, for example, communication elements 1465 may include network interface circuitry supporting wired communications according to one or more wired communication protocols/standards. The embodiments are not limited in this context.
As reflected in FIG. 14, examples of interfaces 1466 that automated maintenance device 1400 may feature in various embodiments may include, without limitation, communication interfaces 1466A, testing interfaces 1466B, power interfaces 1466C, and user interfaces 1466D. In various embodiments, interfaces 1466 may include one or more communication interfaces 1466A.
Communication interfaces 1466A may generally comprise interfaces usable to transmit and/or receive signals via one or more communication media, which may include wired media, wireless media, or both. In various embodiments, communication interfaces 1466A may include one or more wireless communication interfaces, such as radio frequency (RF) interfaces and/or optical wireless communication (OWC) interfaces. In some embodiments, communication interfaces 1466A may additionally or alternatively include one or more wired communication interfaces, such as interface(s) for communicating over media such as coaxial cable, twisted pair, and optical fiber. The embodiments are not limited to these examples.
In various embodiments, interfaces 1466 may include one or more testing interfaces 1466B. Testing interfaces 1466B may generally comprise interfaces via which automated maintenance device 1400 is able to test physical components/resources of one or more types, which may include, without limitation, one or more of physical storage resources 205-1, physical accelerator resources 205-2, physical memory resources 205-3, and physical compute resources 205-4 of FIG. 2. In an example embodiment, interfaces 1466 may include a testing interface 1466B that enables automated maintenance device 1400 to test the functionality of a DIMM inserted into a testing slot. The embodiments are not limited to these examples.
In various embodiments, interfaces 1466 may include one or more power interfaces 1466C. Power interfaces 1466C may generally comprise interfaces via which automated maintenance device 1400 can draw and/or source power. In various embodiments, power interfaces 1466C may include one or more interfaces via which automated maintenance device 1400 can draw power from external source(s). In some embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C configured to provide charge to one or more batteries (not shown), and automated maintenance device 1400 may draw its operating power from those one or more batteries. In various embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C via which it can directly draw operating power. In various embodiments, automated maintenance device 1400 may feature one or more power interfaces 1466C via which it can source power to external devices. For example, in various embodiments, automated maintenance device 1400 may feature a power interface 1466C via which it can source power to charge a battery of a second automated maintenance device. The embodiments are not limited to this example.
In some embodiments, interfaces 1466 may include one or more user interfaces 1466D. User interfaces 1466D may generally comprise interfaces via which information can be provided to human technicians and/or user input can be accepted from human technicians. Examples of user interfaces 1466D may include displays, touchscreens, speakers, microphones, keypads, mice, trackballs, trackpads, joysticks, fingerprint readers, retinal scanners, buttons, switches, and the like. The embodiments are not limited to these examples.
Memory/storage elements 1467 may generally comprise a set of electronic components and/or circuitry capable of retaining data, such as any of various types of data that may be generated, transmitted, received, and/or used by automated maintenance device 1400 during normal operation. In some embodiments, memory/storage elements 1467 may include one or both of volatile memory and non-volatile memory. For example, in various embodiments, memory/storage elements 1467 may include one or more of read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, hard disks, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices, solid state drives (SSDs), or any other type of media suitable for storing information. The embodiments are not limited to these examples.
OMC elements 1468 may generally comprise a set of components and/or circuitry capable of performing computing operations required to implement logic for managing and controlling the operations of automated maintenance device 1400. In various embodiments, OMC elements 1468 may include processing circuitry, such as one or more processors/processing units. In some embodiments, an automation engine 1469 may execute on such processing circuitry. Automation engine 1469 may generally be operative to conduct overall management, control, coordination, and/or oversight of the operations of automated maintenance device 1400. In various embodiments, this may include management, coordination, control, and/or oversight of the operations/usage of various other elements within automated maintenance device 1400, such as any or all of locomotion elements 1462, manipulation elements 1463, sensory elements 1464, communication elements 1465, interfaces 1466, and memory/storage elements 1467. The embodiments are not limited in this context.
FIG. 15 illustrates an example of an operating environment 1500 that may be representative of the implementation of an automated maintenance scheme in data center 1300 according to various embodiments. According to such an automated maintenance scheme, an automation coordinator 1555 may centrally manage/coordinate various aspects of automated maintenance operations in data center 1300. In some embodiments, automation coordinator 1555 may centrally manage/coordinate various aspects of automated maintenance operations in data center 1300 based in part on telemetry data 1571 provided by a telemetry framework 1570. According to various embodiments, telemetry framework 1570 may be representative of an advanced telemetry system that performs telemetry reporting for physical infrastructure 1100A in data center 1100 of FIG. 11, and automation coordinator 1555 may be representative of automated maintenance coordination functionality of physical infrastructure management framework 1150A. The embodiments are not limited in this context.
In some embodiments, management/coordination functionality of automation coordinator 1555 may be provided by a coordination engine 1572. In various embodiments, coordination engine 1572 may execute on processing circuitry of automation coordinator 1555. In various embodiments, coordination engine 1572 may generate automation commands 1573 for transmission to robots 1360 in order to instruct robots 1360 to perform automated maintenance tasks and/or actions associated with such tasks. In some embodiments, robots 1360 may provide automation coordinator 1555 with various types of feedback 1574 in order to, for example, acknowledge automation commands 1573, report the results of attempted maintenance tasks, provide information regarding the statuses of components, resources, and/or equipment, provide information regarding the statuses of robots 1360 themselves, and/or report measurements of one or more aspects of ambient conditions in the data center. The embodiments are not limited to these examples.
In some embodiments, coordination engine 1572 may consider various types of information in conjunction with automated maintenance coordination/management. As reflected in FIG. 15, examples of such types of information may include physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579.
Physical infrastructure information 1575 may generally comprise information identifying equipment, devices, components, interconnects, physical resources, and/or other infrastructure elements that comprise portions of the physical infrastructure of data center 1300, and describing characteristics of such elements. Data center operations information 1576 may generally comprise information describing various aspects of ongoing operations within data center 1300. In some embodiments, for example, data center operations information 1576 may include information describing one or more workloads currently being processed in data center 1300. In various embodiments, data center operations information 1576 may include metrics characterizing one or more aspects of current operations in data center 1300. For example, in some embodiments, data center operations information 1576 may include performance metrics characterizing the relative level of performance currently being achieved in data center 1300, efficiency metrics characterizing the relative level of efficiency with which the physical resources of data center 1300 are being used to handle the current workloads, and utilization metrics generally indicative of current usage levels of various types of resources in data center 1300. In various embodiments, data center operations information 1576 may include telemetry data 1571, such as automation coordinator 1555 may receive via telemetry framework 1570 or from robots 1360. The embodiments are not limited in this context.
Maintenance task information 1577 may generally comprise information identifying and describing ongoing and pending maintenance tasks of data center 1300. Maintenance task information 1577 may also include information identifying and describing previously completed maintenance tasks. In various embodiments, maintenance task information 1577 may include a pending task queue 1578. Pending task queue 1578 may generally comprise information identifying a set of maintenance tasks that need to be performed in data center 1300. Maintenance equipment information 1579 may generally comprise information identifying and describing automated maintenance equipment, such as robots 1360, of data center 1300. In some embodiments, maintenance equipment information 1579 may include a candidate device pool 1580. Candidate device pool 1580 may generally comprise information identifying a set of robots 1360 that are currently available for use in data center 1300. The embodiments are not limited in this context.
In various embodiments, based on telemetry data 1571, automation coordinator 1555 may identify automated maintenance tasks to be performed in data center 1300 by robots 1360. For example, based on telemetry data 1571 indicating a high bit error rate at a DIMM, automation coordinator 1555 may determine that a robot 1360 should be assigned to replace that DIMM. In some embodiments, automation coordinator 1555 may use telemetry data 1571 to prioritize among automated maintenance tasks, such as tasks comprised in pending task queue 1578. For example, automation coordinator 1555 may use telemetry data 1571 to assess the respective expected performance impacts of multiple automated maintenance tasks in pending task queue 1578, and may assign out an automated maintenance task with the highest expected performance impact first. In some embodiments, in identifying and/or prioritizing among automated maintenance tasks, automation coordinator 1555 may consider any or all of physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579 in addition to, or in lieu of, telemetry data 1571.
In a first example, automation coordinator 1555 may assign a low priority to an automated maintenance task involving replacement of a malfunctioning compute sled based on physical infrastructure information 1575 indicating that another sled in a different rack can be used as a substitute without need for replacing the malfunctioning compute sled. In a second example, automation coordinator 1555 may assign a high priority to an automated maintenance task involving replacing a malfunctioning memory sled based on data center operations information 1576 indicating that a scarcity of memory constitutes a performance bottleneck with respect to workloads being processed in data center 1300. In a third example, automation coordinator 1555 may determine not to add a new maintenance task to pending task queue 1578 based on a determination that a maintenance task already present in pending task queue 1578 may render the new maintenance task unnecessary and/or moot. In a fourth example, in determining an extent to which to prioritize an automated maintenance task that requires the use of particular robots 1360 featuring specialized capabilities, automation coordinator 1555 may consider maintenance equipment information 1579 indicating whether any robots 1360 featuring such specialized capabilities are currently available. The embodiments are not limited to these examples.
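A minimal sketch of this kind of impact-based prioritization follows, assuming a numeric expected-impact score derived from telemetry; the scores and task names are illustrative only.

```python
# Sketch of impact-based prioritization over a pending task queue, analogous
# to pending task queue 1578. heapq is a min-heap, so impacts are negated in
# order to pop the highest-impact task first.
import heapq

pending_task_queue = []


def enqueue(task_name: str, expected_impact: float) -> None:
    heapq.heappush(pending_task_queue, (-expected_impact, task_name))


def assign_next_task() -> str:
    _, task_name = heapq.heappop(pending_task_queue)
    return task_name


enqueue("replace malfunctioning memory sled", 0.9)   # memory is the bottleneck
enqueue("replace malfunctioning compute sled", 0.2)  # substitute sled available
print(assign_next_task())  # -> "replace malfunctioning memory sled"
```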
In various embodiments, based on telemetry data 1571, automation coordinator 1555 may control the positioning and/or movement of robots 1360 within data center 1300. For example, having used telemetry data 1571 to identify a region of data center 1300 within which a greater number of hardware failures have been and/or are expected to be observed, automation coordinator 1555 may position robots 1360 more densely within that identified region than within other regions of data center 1300. The embodiments are not limited in this context.
In some embodiments, in response to automated maintenance decisions, such as may be reached based on any or all of telemetry data 1571, physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579, automation coordinator 1555 may send automation commands 1573 to robots 1360 in order to instruct robots 1360 to perform operations associated with automated maintenance tasks. For example, upon determining that a particular compute sled should be replaced, automation coordinator 1555 may send an automation command 1573 in order to instruct a robot 1360 to perform a sled replacement procedure to replace the sled. In various embodiments, automation coordinator 1555 may inform robots 1360 of various parameters characterizing assigned automated maintenance tasks by including such parameters in automation commands 1573. For instance, in the context of the preceding example, the automation command 1573 may contain fields specifying a sled ID uniquely identifying the sled to be replaced and a rack ID and/or sled space ID identifying the location of that sled within the data center, as well as analogous parameters associated with the replacement sled. The embodiments are not limited to this example.
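By way of illustration only, an automation command 1573 carrying the parameters described above might be structured as follows; the field names are assumptions made for the sketch.

```python
# Illustrative structure for a sled-replacement automation command carrying
# the parameters described above. Field names are hypothetical.
from dataclasses import dataclass


@dataclass
class SledReplacementCommand:
    sled_id: str               # uniquely identifies the sled to be replaced
    rack_id: str               # rack housing that sled
    sled_space_id: str         # sled space within that rack
    replacement_sled_id: str   # analogous identifier for the replacement sled
    replacement_location: str  # where the replacement sled can be retrieved


cmd = SledReplacementCommand(sled_id="404B-1", rack_id="402B",
                             sled_space_id="903-2",
                             replacement_sled_id="404X-9",
                             replacement_location="staging-area-3")
print(cmd)
```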
It is worthy of note that in various embodiments, with respect to some aspects of automated maintenance operations, decision-making may be handled in a distributed, rather than centralized, fashion. In such embodiments, robots 1360 may make some automated maintenance decisions autonomously. In some such embodiments, as illustrated in FIG. 15, robots 1360 may perform such autonomous decision-making based on telemetry data 1571 received from telemetry framework 1570. In an example embodiment, a robot 1360 may determine based on analysis of telemetry data 1571 that a particular CPU is malfunctioning, and autonomously decide to replace that malfunctioning CPU. In various embodiments, some or all of the robots 1360 in data center 1300 may have access to any or all of physical infrastructure information 1575, data center operations information 1576, maintenance task information 1577, and maintenance equipment information 1579, and may consider such information as well in conjunction with autonomous decision-making. In various embodiments, distributed coordination functions may be implemented to enable some types of maintenance tasks to be completed via collaborative maintenance procedures involving cooperation between multiple robots. The embodiments are not limited in this context.
FIG. 16 illustrates an example of an operating environment 1600 that may be representative of various embodiments. In operating environment 1600, in conjunction with automated maintenance operations in data center 1300, robots 1360 may provide automation coordinator 1555 with feedback 1574 that includes one or more of position data 1681, assistance data 1682, and environmental data 1683. The embodiments are not limited to these examples. It is worthy of note that in some embodiments, although not depicted in FIG. 16, robots 1360 may gather various types of telemetry data 1571 in conjunction with automated maintenance operations and include such gathered telemetry data 1571 in the feedback 1574 provided to automation coordinator 1555. The embodiments are not limited in this context.
Position data 1681 may generally comprise data for use by automation coordinator 1555 to determine/track the positions and/or movements of robots 1360 within data center 1300. In some embodiments, position data 1681 may comprise data associated with an indoor positioning system. In some such embodiments, the indoor positioning system may be a radio-based system, such as a Wi-Fi-based or Bluetooth-based indoor positioning system. In some other embodiments, a non-radio-based positioning system, such as a magnetic, optical, or inertial indoor positioning system, may be used. In various embodiments, the indoor positioning system may be a hybrid system, such as one that combines two or more of radio-based, magnetic, optical, and inertial indoor positioning techniques. The embodiments are not limited in this context.
Assistance data 1682 may generally comprise data for use by automation coordinator 1555 to provide human maintenance personnel with information aiding them in the identification and/or performance of manual maintenance tasks. In various embodiments, a given robot 1360 may generate assistance data 1682 in response to identifying a maintenance issue that it cannot correct/resolve in an automated fashion. For instance, after identifying a component that needs to be replaced and determining that it cannot perform the replacement itself, a robot 1360 may take a picture of the component and provide assistance data 1682 comprising that picture to automation coordinator 1555. Automation coordinator 1555 may then cause the picture to be presented on a display for reference by human maintenance personnel in order to aid visual identification of the component to be replaced. The embodiments are not limited to this example.
In some embodiments, the performance and/or reliability of various types of hardware in data center 1300 may potentially be affected by one or more aspects of the ambient conditions within data center 1300, such as ambient temperature, pressure, humidity, and air quality. For example, the rate at which corrosion occurs on metallic contacts of components such as DIMMs may depend on the ambient temperature and humidity. In various embodiments, it may thus be desirable to monitor various types of environmental parameters at various locations during ongoing operations of data center 1300.
In some embodiments, robots 1360 may be configured to support environmental condition monitoring by measuring one or more aspects of ambient conditions within the data center during ongoing operations and providing those collected measurements to automation coordinator 1555 in the form of environmental data 1683. In various embodiments, robots 1360 may collect environmental data 1683 using sensors or sensor arrays comprising sensory elements such as sensory elements 1464 of FIG. 14. Examples of conditions/parameters that robots 1360 may measure and report to automation coordinator 1555 in the form of environmental data 1683 may include, without limitation, temperature, pressure, humidity, and air quality. In some embodiments, in conjunction with providing environmental condition measurements in the form of environmental data 1683, robots 1360 may also provide corresponding position data 1681 that indicates the locations at which the associated measurements were performed. The embodiments are not limited in this context.
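As a purely illustrative sketch, a report pairing an environmental measurement with the position at which it was taken might be serialized as follows; the keys, units, and function name are hypothetical assumptions rather than any format defined by the embodiments.

    import json
    import time

    # Hypothetical report pairing an environmental measurement with the
    # indoor-positioning fix at which it was taken; keys/units are illustrative.
    def build_environmental_report(robot_id, x_m, y_m, temperature_c, humidity_pct, pressure_hpa):
        return json.dumps({
            "robot_id": robot_id,
            "timestamp": time.time(),
            "position": {"x_m": x_m, "y_m": y_m},
            "environment": {
                "temperature_c": temperature_c,
                "humidity_pct": humidity_pct,
                "pressure_hpa": pressure_hpa,
            },
        })

    print(build_environmental_report("robot-7", 12.5, 3.0, 24.1, 41.0, 1011.8))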
In various embodiments, access to dynamic, continuous, and location-specific measurements of such parameters may enable a data center operator to predict failures, dynamically configure systems for best performance, and dynamically move resources for data center optimization. In some embodiments, based on environmental data 1683 provided by robots 1360, a data center operator may be able to predict accelerated failure of parts relative to standard factory specifications and replace those parts earlier (or move them to lower-priority tasks). In various embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to initiate service tickets ahead of predicted failure timelines. For example, a cleaning of DIMM contacts may be initiated in order to avoid corrosion building up to the level at which failures start occurring. In some embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to continuously and dynamically configure servers based on, for example, altitude, pressure, and other parameters that may be important to such things as fan speeds and cooling configurations, which in turn may affect the performance of a server in a given environment and temperature. In various embodiments, environmental data 1683 provided by robots 1360 may enable a data center operator to automatically detect and move data center resources away from zones/locations of the data center that may be affected by equipment failures or environmental variations detected by the robots' sensors. For example, based on environmental data 1683 indicating an excessive temperature or air quality deterioration in a particular data center region, servers and/or other resources may be relocated from the affected region to a different region. The embodiments are not limited to these examples.
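To make the predictive-ticketing idea concrete, the following minimal sketch applies a crude corrosion-risk rule to temperature and humidity readings and opens a cleaning ticket early; the risk formula, thresholds, and ticket interface are all hypothetical assumptions chosen for illustration.

    # Illustrative rule that accelerates a DIMM-contact cleaning ticket when
    # sustained temperature and humidity suggest faster corrosion.
    def corrosion_risk(temperature_c, humidity_pct):
        """Crude risk score; real deployments would use empirical corrosion models."""
        return max(0.0, (temperature_c - 25.0) * 0.02) + max(0.0, (humidity_pct - 50.0) * 0.03)

    def maybe_open_ticket(zone, temperature_c, humidity_pct, open_ticket):
        score = corrosion_risk(temperature_c, humidity_pct)
        if score > 0.5:  # hypothetical trigger level
            open_ticket(zone=zone, task="clean-dimm-contacts", risk=round(score, 2))

    # A warm, humid zone triggers an early ticket.
    maybe_open_ticket("zone-a", 31.0, 78.0, lambda **kw: print("ticket:", kw))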
FIG. 17 illustrates an example of an operating environment 1700 that may be representative of the implementation of an automated data center maintenance scheme according to some embodiments. In operating environment 1700, a robot 1760 may perform one or more automated maintenance tasks at a rack 1702. According to some embodiments, robot 1760 may be representative of a robot 1360 that performs operations associated with automated data center maintenance in data center 1300 of FIGS. 13, 15, and 16. In various embodiments, robot 1760 may be implemented using automated maintenance device 1400 of FIG. 14. In various embodiments, as reflected by the dashed line in FIG. 17, robot 1760 may move to the location of rack 1702 from another location in order to perform one or more automated maintenance tasks at rack 1702. In some embodiments, robot 1760 may perform one or more such tasks based on automation commands 1773 received from automation coordinator 1555. In various embodiments, robot 1760 may additionally or alternatively perform one or more such tasks autonomously, without intervention on the part of automation coordinator 1555. The embodiments are not limited in this context.
In some embodiments, robot 1760 may perform one or more automated maintenance tasks involving the installation and/or removal of sleds at racks of a data center such as data center 1300. In various embodiments, for example, robot 1760 may be operative to install a sled 1704 at rack 1702. In some embodiments, robot 1760 may install sled 1704 by inserting it into an available sled space of rack 1702. In various embodiments, in conjunction with inserting sled 1704, robot 1760 may grip particular physical elements designed to accommodate robotic manipulation/handling. In some embodiments, robot 1760 may use image recognition and/or other location techniques to locate the elements to be gripped, and may insert sled 1704 while gripping those elements. In various embodiments, rather than installing sled 1704, robot 1760 may instead remove sled 1704 from rack 1702 and install a replacement sled 1704B. In some embodiments, robot 1760 may install replacement sled 1704B in the same sled space as was occupied by sled 1704, once it has removed sled 1704. In various other embodiments, robot 1760 may install replacement sled 1704B in a different sled space, such that it does not need to remove sled 1704 before installing replacement sled 1704B. The embodiments are not limited in this context.
In some embodiments, robot 1760 may perform one or more automated maintenance tasks involving upkeep, repair, and/or replacement of particular components on sleds of a data center such as data center 1300. In various embodiments, robot 1760 may be used to power up a component 1706 in accordance with a scheme for periodically powering up components in the data center in order to improve the reliability of such components. In some embodiments, for example, storage and/or memory components may tend to malfunction when left idle for excessive periods of time, and thus robots may be used to power up such components according to a defined cycle. In such an embodiment, robot 1760 may be operative to power up an appropriate component 1706 by plugging that component 1706 into a powered interface/slot. The embodiments are not limited to this example.
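A minimal sketch of such a defined power-up cycle follows, using a priority queue keyed by each component's next due date; the component IDs and the 30-day interval are illustrative assumptions only.

    import heapq

    # Hypothetical power-up interval for idle storage/memory components.
    POWER_UP_INTERVAL_DAYS = 30

    def build_schedule(component_last_powered):
        """Return a min-heap of (next_due_day, component_id) pairs."""
        heap = [(last + POWER_UP_INTERVAL_DAYS, comp)
                for comp, last in component_last_powered.items()]
        heapq.heapify(heap)
        return heap

    def due_components(heap, today):
        """Pop every component whose power-up is due on or before `today`."""
        due = []
        while heap and heap[0][0] <= today:
            _, comp = heapq.heappop(heap)
            due.append(comp)
        return due

    schedule = build_schedule({"ssd-01": 0, "dimm-17": 20})
    print(due_components(schedule, today=30))  # ['ssd-01']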
In various embodiments, robot 1760 may be operative to manipulate a given component 1706 in accordance with a scheme for automated upkeep of pooled memory resources of a data center. According to such a scheme, robots may be used to assess/troubleshoot apparently malfunctioning memory resources such as DIMMs. In some embodiments, according to such a scheme, robot 1760 may identify a component 1706 comprising a memory resource such as a DIMM, remove that component 1706 from a slot on sled 1704, and clean the component 1706. Robot 1760 may then test the component 1706 to determine whether the issue has been resolved, and may determine to pull sled 1704 for “back-room” servicing if it finds that the problem persists. In various embodiments, robot 1760 may test the component 1706 after reinserting it into its slot on sled 1704. In some other embodiments, robot 1760 may be configured with a testing slot into which it can insert the component 1706 for the purpose of testing. The embodiments are not limited in this context.
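The control flow of that remove/clean/retest/escalate sequence can be sketched as below; every function name is a hypothetical stand-in for a robot primitive, not an interface defined by the embodiments.

    # Illustrative control flow for the DIMM upkeep procedure described above.
    def service_dimm(dimm_slot, remove, clean, reinsert, run_memory_test, flag_for_backroom):
        module = remove(dimm_slot)       # pull the apparently faulty DIMM
        clean(module)                    # clean its contacts
        reinsert(dimm_slot, module)      # reseat it in the original slot
        if run_memory_test(dimm_slot):   # retest in place
            return "resolved"
        flag_for_backroom(dimm_slot)     # persistent fault: pull sled for servicing
        return "escalated"

    # Stub primitives for demonstration; a failing test escalates the sled.
    result = service_dimm(
        "sled-4/slot-2",
        remove=lambda s: "dimm",
        clean=lambda m: None,
        reinsert=lambda s, m: None,
        run_memory_test=lambda s: False,
        flag_for_backroom=lambda s: print("pull sled for back-room service:", s),
    )
    print(result)  # 'escalated'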
FIG. 18 illustrates an example of an operating environment 1800 that may be representative of the implementation of an automated data center maintenance scheme according to some embodiments. In operating environment 1800, a robot 1860 may perform automated CPU cache servicing for a sled 1804 at a rack 1802. According to some embodiments, robot 1860 may be representative of a robot 1360 that performs operations associated with automated data center maintenance in data center 1300 of FIGS. 13, 15, and 16. In various embodiments, robot 1860 may be implemented using automated maintenance device 1400 of FIG. 14. In some embodiments, as reflected by the dashed line in FIG. 18, robot 1860 may move to the location of rack 1802 from another location in order to perform the automated CPU cache servicing for sled 1804. In various embodiments, robot 1860 may perform such automated CPU cache servicing based on automation commands 1873 received from automation coordinator 1555. In some other embodiments, robot 1860 may perform the automated CPU cache servicing autonomously, without intervention on the part of automation coordinator 1555. The embodiments are not limited in this context.
As shown in FIG. 18, sled 1804 may comprise components 1806 that include a CPU 1806A, cache memory 1806B for the CPU 1806A, and a heat sink 1806C for the CPU 1806A. In various embodiments, cache memory 1806B may underlie CPU 1806A, and CPU 1806A may underlie heat sink 1806C. In some embodiments, cache memory 1806B may comprise one or more cache memory modules. In various embodiments, the automated CPU cache servicing that robot 1860 performs in operating environment 1800 may involve replacing cache memory 1806B. For example, in some embodiments, cache memory 1806B may comprise one or more cache memory modules that robot 1860 removes from sled 1804 and replaces with one or more replacement cache modules. In various embodiments, the determination to perform automated CPU cache servicing and thus replace cache memory 1806B may be based on a determination that cache memory 1806B is not functioning properly or is outdated. For example, in some embodiments, automation coordinator 1555 may determine, based on telemetry data 1571 of FIG. 15, that cache memory 1806B is not functioning, and may use robot 1860 to replace cache memory 1806B in response to that determination. The embodiments are not limited to this example.
In various embodiments, according to a procedure for automated CPU cache servicing, robot 1860 may remove CPU 1806A and heat sink 1806C from sled 1804 in order to gain physical access to cache memory 1806B. In some embodiments, robot 1860 may remove sled 1804 from rack 1802 prior to removing CPU 1806A and heat sink 1806C from sled 1804. In various other embodiments, robot 1860 may remove CPU 1806A and heat sink 1806C from sled 1804 while sled 1804 remains seated within a sled space of rack 1802. In some embodiments, robot 1860 may first remove heat sink 1806C, and then remove CPU 1806A. In various other embodiments, robot 1860 may remove both heat sink 1806C and CPU 1806A simultaneously and/or as a collective unit (i.e., without detaching heat sink 1806C from CPU 1806A). In some embodiments, after replacing cache memory 1806B, robot 1860 may reinstall CPU 1806A and heat sink 1806C upon sled 1804, and may then reinsert sled 1804 into a sled space of rack 1802 in embodiments in which it was previously removed. The embodiments are not limited in this context.
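For illustration, the ordering alternatives described above can be expressed as a simple step plan; the step names and the remove_sled_first option are hypothetical, and no robot interface is implied.

    # Minimal ordering sketch for the cache-servicing procedure described above.
    def service_cpu_cache(remove_sled_first=True):
        steps = []
        if remove_sled_first:
            steps.append("remove sled from rack")
        # Heat sink and CPU may come off separately or as a collective unit.
        steps += [
            "remove heat sink and CPU (separately or as a unit)",
            "replace cache memory module(s)",
            "reinstall CPU and heat sink",
        ]
        if remove_sled_first:
            steps.append("reinsert sled into sled space")
        return steps

    for step in service_cpu_cache():
        print(step)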
FIG. 19 illustrates an example of an operating environment 1900 that may be representative of the implementation of an automated data center maintenance scheme according to some embodiments. In operating environment 1900, a robot 1960 may perform automated storage and/or transfer of a compute state of a compute sled 1904 at a rack 1902. According to some embodiments, robot 1960 may be representative of a robot 1360 that performs operations associated with automated data center maintenance in data center 1300 of FIGS. 13, 15, and 16. In various embodiments, robot 1960 may be implemented using automated maintenance device 1400 of FIG. 14. In some embodiments, as reflected by the dashed line in FIG. 19, robot 1960 may move to the location of rack 1902 from another location in order to perform the automated storage and/or transfer of the compute state of compute sled 1904. In various embodiments, robot 1960 may perform such automated compute state storage and/or transfer based on automation commands 1973 received from automation coordinator 1555. In some other embodiments, robot 1960 may perform the automated compute state storage and/or transfer autonomously, without intervention on the part of automation coordinator 1555. The embodiments are not limited in this context.
As shown in FIG. 19, compute sled 1904 may comprise components 1906 that include one or more CPUs 1906A and a connector 1906B. In various embodiments, compute sled 1904 may comprise two CPUs 1906A. In some other embodiments, compute sled 1904 may comprise more than two CPUs 1906A, or only a single CPU 1906A. Connector 1906B may generally comprise a slot, socket, or other connective component designed to accept a memory daughter card for use in storing a compute state of compute sled 1904. In various embodiments, compute sled 1904 may comprise two CPUs 1906A, and connector 1906B may be located between those two CPUs 1906A. The embodiments are not limited in this context.
In some embodiments, according to a procedure for automated compute state storage and/or transfer, robot 1960 may insert a memory card 1918 into connector 1906B. In various embodiments, robot 1960 may remove compute sled 1904 from rack 1902 prior to inserting memory card 1918 into connector 1906B. In some other embodiments, robot 1960 may insert memory card 1918 into connector 1906B while compute sled 1904 remains seated within a sled space of rack 1902. In still other embodiments, memory card 1918 may be present and coupled with connector 1906B prior to initiation of the automated compute state storage and/or transfer procedure. In various embodiments, memory card 1918 may comprise a set of physical memory resources 1906C. In some embodiments, once memory card 1918 is inserted into/coupled with connector 1906B, a compute state 1984 of compute sled 1904 may be stored on memory card 1918 using one or more of the physical memory resources 1906C comprised thereon. In various embodiments, compute state 1984 may include respective states of each CPU 1906A comprised on compute sled 1904. In some embodiments, compute state 1984 may also include states of one or more memory resources comprised on compute sled 1904. The embodiments are not limited in this context.
In various embodiments, robot 1960 may perform an automated compute state storage/transfer procedure in order to preserve the compute state of compute sled 1904 during upkeep/repair of compute sled 1904. In some such embodiments, once compute state 1984 is stored on memory card 1918, robot 1960 may remove memory card 1918 from connector 1906B, perform upkeep/repair of compute sled 1904, reinsert memory card 1918 into connector 1906B, and then restore compute sled 1904 to the compute state 1984 stored on memory card 1918. For instance, in an example embodiment, robot 1960 may remove a CPU 1906A from a socket on compute sled 1904 and insert a replacement CPU into that socket, and then cause compute sled 1904 to be restored to the compute state 1984 stored on memory card 1918. In various other embodiments, robot 1960 may perform an automated compute state storage/transfer procedure in order to replace compute sled 1904 with another compute sled. In some such embodiments, once compute state 1984 is stored on memory card 1918, robot 1960 may remove memory card 1918 from connector 1906B, insert memory card 1918 into a connector on a replacement compute sled, insert the replacement compute sled into a sled space of rack 1902 or another rack, and cause the replacement compute sled to realize the compute state 1984 stored on memory card 1918. The embodiments are not limited in this context.
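The state-preserving repair flow can be summarized in the following sketch; the save/remove/repair/insert/restore primitives are hypothetical stand-ins chosen to make the ordering of the described steps explicit.

    # Illustrative sketch of the state-preserving repair flow described above.
    def repair_with_state_preservation(sled, save_state, remove_card, do_repair,
                                       insert_card, restore_state):
        card = save_state(sled)     # capture CPU/memory state onto the memory card
        remove_card(sled)           # detach the card before physical work
        do_repair(sled)             # e.g., swap a CPU in its socket
        insert_card(sled, card)     # reattach the card
        restore_state(sled, card)   # resume from the preserved compute state

    repair_with_state_preservation(
        "compute-sled-9",
        save_state=lambda s: {"cpu0": "ctx", "cpu1": "ctx"},
        remove_card=lambda s: None,
        do_repair=lambda s: print("replacing CPU on", s),
        insert_card=lambda s, c: None,
        restore_state=lambda s, c: print("restored state keys:", sorted(c)),
    )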
FIG. 20 illustrates an example of an operating environment 2000. According to various embodiments, operating environment 2000 may be representative of the implementation of an automated data center maintenance scheme according to which some aspects of automated maintenance operations involve collaboration/cooperation between robots. In operating environment 2000, in conjunction with performing a collaborative maintenance task, robots 2060A and 2060B may coordinate with each other by exchanging interdevice coordination information 2086A and 2086B via one or more communication links 2085. Communication links 2085 may comprise wireless communication links, wired communication links, or a combination of both. According to some embodiments, robots 2060A and 2060B may be representative of robots 1360 that perform operations associated with automated data center maintenance in data center 1300 of FIGS. 13, 15, and 16. In various embodiments, one or both of robots 2060A and 2060B may be implemented using automated maintenance device 1400 of FIG. 14.
It is worthy of note that the absence of automation coordinator 1555 in FIG. 20 is not intended to indicate that no aspects of automated maintenance would/could be centrally coordinated in operating environment 2000. It is both possible and contemplated that in various embodiments, distributed coordination may be implemented for some aspects of automated maintenance in a data center in which other aspects of automated maintenance are centrally coordinated by an entity such as automation coordinator 1555. For example, in operating environment 2000, a central automation coordinator may determine the need for performance of the collaborative maintenance task, select robots 2060A and 2060B as the robots that are to perform the collaborative maintenance task, and send automation commands to cause robots 2060A and 2060B to initiate the collaborative maintenance task. Robots 2060A and 2060B may then coordinate directly with each other in conjunction with performing the physical actions necessary to complete the collaborative maintenance task. The embodiments are not limited to this example.
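As a purely illustrative sketch, an interdevice coordination message for such a two-robot task might be serialized as follows; the schema, field names, and phase values are hypothetical assumptions, not a format defined for interdevice coordination information 2086A/2086B.

    import json

    # Hypothetical interdevice coordination message for a two-robot task.
    def coordination_message(sender, receiver, task_id, phase, payload):
        return json.dumps({
            "sender": sender,
            "receiver": receiver,
            "task_id": task_id,
            "phase": phase,        # e.g., "claim-subtask", "ready", "done"
            "payload": payload,
        })

    # Robot A claims the heat-sink removal subtask; robot B would reply in kind.
    msg = coordination_message("robot-2060A", "robot-2060B",
                               task_id="cpu-replace-42",
                               phase="claim-subtask",
                               payload={"subtask": "remove-heat-sink"})
    print(msg)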
FIG. 21 illustrates an example of a logic flow 2100 that may be representative of the implementation of one or more of the disclosed techniques according to some embodiments. For example, logic flow 2100 may be representative of operations that automation coordinator 1555 may perform in any of operating environments 1500, 1600, 1700, 1800, 1900, and 2000 of FIGS. 15-20 according to various embodiments. As shown in FIG. 21, at 2102, a maintenance task that is to be performed in a data center may be identified. For example, in operating environment 1500 of FIG. 15, automation coordinator 1555 may identify a maintenance task that is to be performed in data center 1300.
At 2104, a determination may be made to initiate automated performance of the maintenance task. For example, having added an identified maintenance task to pending task queue 1578 in operating environment 1500 of FIG. 15, automation coordinator 1555 may determine at a subsequent point in time that that maintenance task constitutes the highest-priority task in pending task queue 1578 and thus that its performance should be initiated. In another example, rather than adding the identified maintenance task to pending task queue 1578, automation coordinator 1555 may determine to initiate performance of the maintenance task immediately after it is identified.
At 2106, an automated maintenance device to which to assign the maintenance task may be selected. For example, from among one or more robots 1360 comprised in candidate device pool 1580 in operating environment 1500 of FIG. 15, automation coordinator 1555 may select a robot 1360 to which to assign an identified maintenance task. It is worthy of note that in some embodiments, the identified maintenance task may be handled by multiple robots according to a collaborative maintenance procedure. In such cases, more than one automated maintenance device may be selected at 2106 as an assignee of the maintenance task. For example, in operating environment 1500 of FIG. 15, automation coordinator 1555 may select multiple robots 1360 from among those comprised in candidate device pool 1580 that are to work together according to a collaborative maintenance procedure to complete a maintenance task.
At 2108, one or more automation commands may be sent to cause an automated maintenance device selected at 2106 to perform an automated maintenance procedure associated with the maintenance task. For example, in operating environment 1500 of FIG. 15, automation coordinator 1555 may send one or more automation commands 1573 to cause a robot 1360 to perform an automated maintenance procedure associated with a maintenance task to which that robot 1360 has been allocated. In some embodiments in which multiple automated maintenance devices are selected at 2106 as assignees of the same maintenance task, automation commands may be sent to multiple automated maintenance devices at 2108. For example, in operating environment 1500 of FIG. 15, automation coordinator 1555 may send respective automation command(s) 1573 to multiple robots 1360 to cause those robots to perform a collaborative maintenance procedure associated with the maintenance task to be completed. The embodiments are not limited to these examples.
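A minimal coordinator-side sketch of blocks 2102 through 2108 follows; the class, its trivial device-selection rule, and the command callback are hypothetical assumptions intended only to make the flow of FIG. 21 concrete.

    import heapq

    # Minimal sketch of the coordinator-side flow of FIG. 21: identify a task,
    # queue it by priority, select a robot, and dispatch automation commands.
    class AutomationCoordinator:
        def __init__(self, robots):
            self.pending = []        # priority queue of (priority, task)
            self.robots = robots     # candidate device pool

        def identify_task(self, task, priority):
            heapq.heappush(self.pending, (priority, task))   # block 2102

        def dispatch_next(self, send_command):
            if not self.pending:
                return None
            _, task = heapq.heappop(self.pending)            # block 2104
            robot = self.robots[0]                           # block 2106 (trivial selection)
            send_command(robot, task)                        # block 2108
            return robot, task

    coord = AutomationCoordinator(robots=["robot-1360a", "robot-1360b"])
    coord.identify_task("replace-dimm sled-4/slot-2", priority=1)
    coord.dispatch_next(lambda r, t: print(f"command to {r}: {t}"))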
FIG. 22 illustrates an example of a logic flow 2200 that may be representative of the implementation of one or more of the disclosed techniques according to some embodiments. For example, logic flow 2200 may be representative of operations that may be performed in various embodiments by a robot such as a robot 1360 in one or both of operating environments 1500 and 1600 of FIGS. 15 and 16 and/or any of robots 1760, 1860, 1960, 2060A, and 2060B in operating environments 1700, 1800, 1900, and 2000 of FIGS. 17-20. As shown in FIG. 22, one or more automation commands may be received from an automation coordinator of a data center at 2202. For example, in operating environment 1500 of FIG. 15, a robot 1360 may receive one or more automation commands 1573 from automation coordinator 1555.
At 2204, an automated maintenance procedure may be identified based on the one or more automation commands received at 2202. For example, based on one or more automation commands 1573 received from automation coordinator 1555 in operating environment 1500 of FIG. 15, a robot 1360 may identify an automated maintenance procedure that it is to perform. The automated maintenance procedure identified at 2204 may then be performed at 2206. In various embodiments, the identification of the automated maintenance procedure at 2204 may be based on a maintenance task code that is comprised in at least one of the received automation commands and is defined to correspond to a particular automated maintenance procedure. For example, based on a maintenance task code comprised in an automation command 1573 received from automation coordinator 1555, a robot 1360 in operating environment 1500 of FIG. 15 may identify an automated DIMM testing procedure as the automated maintenance procedure to be performed. In various embodiments, the one or more automation commands received at 2202 may collectively contain one or more maintenance task parameters specifying particular details of the automated maintenance task, and such details may also be identified at 2204. For instance, in the context of the preceding example, the robot 1360 may identify, based on maintenance task parameters comprised in one or more automation commands 1573 received from automation coordinator 1555, details such as a physical resource ID of the DIMM to be tested, the identity and location of the sled on which that DIMM resides, and the identity of the particular DIMM slot on that sled that currently houses the DIMM. The embodiments are not limited to these examples.
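For illustration, decoding a task code and its parameters into a procedure (blocks 2204 and 2206) might look like the sketch below; the task codes, parameter names, and lookup table are hypothetical assumptions, not values defined by the embodiments.

    # Illustrative decoding of an automation command into a maintenance procedure.
    TASK_CODE_TABLE = {
        "T-DIMM-TEST": "automated DIMM testing procedure",
        "T-SLED-SWAP": "automated sled replacement procedure",
    }

    def handle_automation_command(command, perform):
        procedure = TASK_CODE_TABLE[command["task_code"]]   # block 2204
        params = command.get("params", {})                  # e.g., resource IDs
        perform(procedure, params)                          # block 2206

    handle_automation_command(
        {"task_code": "T-DIMM-TEST",
         "params": {"resource_id": "dimm-0x3f", "sled": "rack-2/space-5", "slot": 2}},
        perform=lambda proc, p: print(f"performing {proc} with {p}"),
    )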
FIG. 23 illustrates an example of a logic flow 2300 that may be representative of the implementation of one or more of the disclosed techniques according to some embodiments. For example, logic flow 2300 may be representative of operations that may be performed by robot 2060A or robot 2060B in operating environment 2000 of FIG. 20. As shown in FIG. 23, a collaborative maintenance procedure that is to be performed in a data center may be identified at an automated maintenance device at 2302. For example, in operating environment 2000 of FIG. 20, robot 2060A may determine that a collaborative CPU replacement procedure is to be performed. In some embodiments, the identification of the collaborative maintenance procedure at 2302 may be based on one or more automation commands received by the automated maintenance device from a centralized automation coordinator such as automation coordinator 1555. In various other embodiments, the identification of the collaborative maintenance procedure at 2302 may be performed autonomously. For example, in operating environment 1500 of FIG. 15, a robot 1360 may determine based on analysis of telemetry data 1571 that a particular CPU is malfunctioning, and may then identify a collaborative maintenance procedure to be performed in order to replace that malfunctioning CPU. The embodiments are not limited to this example.
A second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure may be identified at 2304, and interdevice coordination information may be sent to the second automated maintenance device at 2306 in order to initiate the collaborative maintenance procedure. For example, in operating environment 2000 of FIG. 20, robot 2060A may determine that it is to collaborate with robot 2060B in conjunction with a collaborative CPU replacement procedure, and may send interdevice coordination information 2086A to robot 2060B in order to initiate that collaborative CPU replacement procedure. In some embodiments, the identification of the second automated maintenance device may be based on information received from a centralized automation coordinator such as automation coordinator 1555. For example, in some embodiments, a centralized automation coordinator may be responsible for selecting the particular robots that are to work together to perform the collaborative maintenance procedure, and the identity of the second automated maintenance device may be indicated by a parameter comprised in an automation command received from the centralized automation coordinator. In other embodiments, the identification performed at 2304 may correspond to an autonomous selection of the second automated maintenance device. For example, in operating environment 1500 of FIG. 15, a first robot 1360 may select a second robot 1360 that is comprised among those in candidate device pool 1580 as the second automated maintenance device that is to participate in the collaborative maintenance procedure. The embodiments are not limited to these examples.
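A minimal sketch of the initiating robot's side of blocks 2304 and 2306 follows; the partner-selection rule (first candidate in the pool) and the send callback are hypothetical assumptions chosen only to make the flow concrete.

    # Minimal sketch of the initiating robot's side of FIG. 23: identify a
    # partner device and send it coordination information to start the task.
    def initiate_collaboration(procedure, candidate_pool, send):
        partner = next(iter(candidate_pool))        # block 2304: select partner
        send(partner, {"procedure": procedure,      # block 2306: initiate
                       "role": "assistant"})
        return partner

    partner = initiate_collaboration(
        "collaborative CPU replacement",
        candidate_pool=["robot-2060B", "robot-2060C"],
        send=lambda who, info: print(f"coordination info to {who}: {info}"),
    )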
FIG. 24A illustrates an embodiment of a storage medium 2400. Storage medium 2400 may comprise any computer-readable storage medium or machine-readable storage medium, such as an optical, magnetic, or semiconductor storage medium. In some embodiments, storage medium 2400 may comprise a non-transitory storage medium. In various embodiments, storage medium 2400 may comprise an article of manufacture. In some embodiments, storage medium 2400 may store computer-executable instructions, such as computer-executable instructions to implement logic flow 2100 of FIG. 21. Examples of a computer-readable storage medium or machine-readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of computer-executable instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. The embodiments are not limited to these examples.
FIG. 24B illustrates an embodiment of a storage medium 2450. Storage medium 2450 may comprise any computer-readable storage medium or machine-readable storage medium, such as an optical, magnetic, or semiconductor storage medium. In some embodiments, storage medium 2450 may comprise a non-transitory storage medium. In various embodiments, storage medium 2450 may comprise an article of manufacture. According to some embodiments, storage medium 2450 may be representative of a memory/storage element 1467 comprised in automated maintenance device 1400 of FIG. 14. In some embodiments, storage medium 2450 may store computer-executable instructions, such as computer-executable instructions to implement one or both of logic flow 2200 of FIG. 22 and logic flow 2300 of FIG. 23. Examples of a computer-readable storage medium or machine-readable storage medium and of computer-executable instructions may include any of the respective examples identified above in reference to storage medium 2400 of FIG. 24A. The embodiments are not limited to these examples.
FIG. 25 illustrates an embodiment of an exemplary computing architecture 2500 that may be suitable for implementing various embodiments as previously described. In various embodiments, the computing architecture 2500 may comprise or be implemented as part of an electronic device. In some embodiments, the computing architecture 2500 may be representative, for example, of a computing device suitable for use in conjunction with the implementation of one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, and logic flows 2100, 2200, and 2300. The embodiments are not limited in this context.
As used in this application, the terms “system,” “component,” and “module” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary computing architecture 2500. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message may be a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.
The computing architecture 2500 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth. The embodiments, however, are not limited to implementation by the computing architecture 2500.
As shown in FIG. 25, according to computing architecture 2500, a computer 2502 comprises a processing unit 2504, a system memory 2506, and a system bus 2508. In some embodiments, computer 2502 may comprise a server. In some embodiments, computer 2502 may comprise a client. The processing unit 2504 can be any of various commercially available processors, including without limitation AMD® Athlon®, Duron®, and Opteron® processors; ARM® application, embedded, and secure processors; IBM® and Motorola® DragonBall® and PowerPC® processors; IBM and Sony® Cell processors; Intel® Celeron®, Core (2) Duo®, Itanium®, Pentium®, Xeon®, and XScale® processors; and similar processors. Dual microprocessors, multi-core processors, and other multi-processor architectures may also be employed as the processing unit 2504.
The system bus 2508 provides an interface for system components including, but not limited to, the system memory 2506 to the processing unit 2504. The system bus 2508 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. Interface adapters may connect to the system bus 2508 via a slot architecture. Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.
The system memory 2506 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information. In the illustrated embodiment shown in FIG. 25, the system memory 2506 can include non-volatile memory 2510 and/or volatile memory 2512. A basic input/output system (BIOS) can be stored in the non-volatile memory 2510.
The computer 2502 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 2514, a magnetic floppy disk drive (FDD) 2516 to read from or write to a removable magnetic disk 2518, and an optical disk drive 2520 to read from or write to a removable optical disk 2522 (e.g., a CD-ROM or DVD). The HDD 2514, FDD 2516, and optical disk drive 2520 can be connected to the system bus 2508 by an HDD interface 2524, an FDD interface 2526, and an optical drive interface 2528, respectively. The HDD interface 2524 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.
The drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For example, a number of program modules can be stored in the drives and memory units 2510, 2512, including an operating system 2530, one or more application programs 2532, other program modules 2534, and program data 2536.
A user can enter commands and information into the computer 2502 through one or more wire/wireless input devices, for example, a keyboard 2538 and a pointing device, such as a mouse 2540. Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, fingerprint readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors, styluses, and the like. These and other input devices are often connected to the processing unit 2504 through an input device interface 2542 that is coupled to the system bus 2508, but can be connected by other interfaces such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.
A monitor 2544 or other type of display device may also be connected to the system bus 2508 via an interface, such as a video adaptor 2546. The monitor 2544 may be internal or external to the computer 2502. In addition to the monitor 2544, a computer typically includes other peripheral output devices, such as speakers, printers, and so forth.
The computer 2502 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 2548. The remote computer 2548 can be a workstation, a server computer, a router, a personal computer, a portable computer, a microprocessor-based entertainment appliance, a peer device, or another common network node, and typically includes many or all of the elements described relative to the computer 2502, although, for purposes of brevity, only a memory/storage device 2550 is illustrated. The logical connections depicted include wire/wireless connectivity to a local area network (LAN) 2552 and/or larger networks, for example, a wide area network (WAN) 2554. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.
When used in a LAN networking environment, the computer 2502 may be connected to the LAN 2552 through a wire and/or wireless communication network interface or adaptor 2556. The adaptor 2556 can facilitate wire and/or wireless communications to the LAN 2552, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 2556.
When used in a WAN networking environment, the computer 2502 can include a modem 2558, may be connected to a communications server on the WAN 2554, or may have other means for establishing communications over the WAN 2554, such as by way of the Internet. The modem 2558, which can be internal or external and a wire and/or wireless device, connects to the system bus 2508 via the input device interface 2542. In a networked environment, program modules depicted relative to the computer 2502, or portions thereof, can be stored in the remote memory/storage device 2550. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
The computer 2502 may be operable to communicate with wire and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.16 over-the-air modulation techniques). This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies, among others. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11x (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).
FIG. 26 illustrates a block diagram of an exemplary communications architecture 2600 suitable for implementing various embodiments as previously described. The communications architecture 2600 includes various common communications elements, such as a transmitter, receiver, transceiver, radio, network interface, baseband processor, antenna, amplifiers, filters, power supplies, and so forth. The embodiments, however, are not limited to implementation by the communications architecture 2600.
As shown in FIG. 26, the communications architecture 2600 includes one or more clients 2602 and servers 2604. The clients 2602 and the servers 2604 are operatively connected to one or more respective client data stores 2608 and server data stores 2610 that can be employed to store information local to the respective clients 2602 and servers 2604, such as cookies and/or associated contextual information. Any one of clients 2602 and/or servers 2604 may implement one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, and computing architecture 2500.
The clients 2602 and the servers 2604 may communicate information between each other using a communication framework 2606. The communications framework 2606 may implement any well-known communications techniques and protocols. The communications framework 2606 may be implemented as a packet-switched network (e.g., public networks such as the Internet, private networks such as an enterprise intranet, and so forth), a circuit-switched network (e.g., the public switched telephone network), or a combination of a packet-switched network and a circuit-switched network (with suitable gateways and translators).
The communications framework 2606 may implement various network interfaces arranged to accept, communicate, and connect to a communications network. A network interface may be regarded as a specialized form of an input/output interface. Network interfaces may employ connection protocols including without limitation direct connect, Ethernet (e.g., thick, thin, twisted-pair 10/100/1000 Base T, and the like), token ring, wireless network interfaces, cellular network interfaces, IEEE 802.11a-x network interfaces, IEEE 802.16 network interfaces, IEEE 802.20 network interfaces, and the like. Further, multiple network interfaces may be used to engage with various communications network types. For example, multiple network interfaces may be employed to allow for communication over broadcast, multicast, and unicast networks. Should processing requirements dictate a greater amount of speed and capacity, distributed network controller architectures may similarly be employed to pool, load-balance, and otherwise increase the communicative bandwidth required by clients 2602 and the servers 2604. A communications network may be any one or a combination of wired and/or wireless networks including without limitation a direct interconnection, a secured custom connection, a private network (e.g., an enterprise intranet), a public network (e.g., the Internet), a Personal Area Network (PAN), a Local Area Network (LAN), a Metropolitan Area Network (MAN), an Operating Missions as Nodes on the Internet (OMNI) network, a Wide Area Network (WAN), a wireless network, a cellular network, and other communications networks.
As used herein, the term “circuitry” may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group), and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality. In some embodiments, the circuitry may be implemented in, or functions associated with the circuitry may be implemented by, one or more software or firmware modules. In some embodiments, circuitry may include logic, at least partially operable in hardware. Embodiments described herein may be implemented into a system using any suitably configured hardware and/or software.
FIG. 27 illustrates an embodiment of a communication device 2700 that may implement one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, and servers 2604. In various embodiments, device 2700 may comprise a logic circuit 2728. The logic circuit 2728 may include physical circuits to perform operations described for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, computing architecture 2500, clients 2602, and servers 2604, for example. As shown in FIG. 27, device 2700 may include a radio interface 2710, baseband circuitry 2720, and computing platform 2730, although the embodiments are not limited to this configuration.
The device 2700 may implement some or all of the structure and/or operations for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 in a single computing entity, such as entirely within a single device. Alternatively, the device 2700 may distribute portions of the structure and/or operations for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 across multiple computing entities using a distributed system architecture, such as a client-server architecture, a 3-tier architecture, an N-tier architecture, a tightly-coupled or clustered architecture, a peer-to-peer architecture, a master-slave architecture, a shared database architecture, and other types of distributed systems. The embodiments are not limited in this context.
In one embodiment, radio interface 2710 may include a component or combination of components adapted for transmitting and/or receiving single-carrier or multi-carrier modulated signals (e.g., including complementary code keying (CCK), orthogonal frequency division multiplexing (OFDM), and/or single-carrier frequency division multiple access (SC-FDMA) symbols), although the embodiments are not limited to any specific over-the-air interface or modulation scheme. Radio interface 2710 may include, for example, a receiver 2712, a frequency synthesizer 2714, and/or a transmitter 2716. Radio interface 2710 may include bias controls, a crystal oscillator, and/or one or more antennas 2718-f. In another embodiment, radio interface 2710 may use external voltage-controlled oscillators (VCOs), surface acoustic wave filters, intermediate frequency (IF) filters, and/or RF filters, as desired. Due to the variety of potential RF interface designs, an expansive description thereof is omitted.
Baseband circuitry 2720 may communicate with radio interface 2710 to process receive and/or transmit signals, and may include, for example, a mixer for down-converting received RF signals, an analog-to-digital converter 2722 for converting analog signals to digital form, a digital-to-analog converter 2724 for converting digital signals to analog form, and a mixer for up-converting signals for transmission. Further, baseband circuitry 2720 may include a baseband or physical layer (PHY) processing circuit 2726 for PHY link layer processing of respective receive/transmit signals. Baseband circuitry 2720 may include, for example, a medium access control (MAC) processing circuit 2727 for MAC/data link layer processing. Baseband circuitry 2720 may include a memory controller 2732 for communicating with MAC processing circuit 2727 and/or a computing platform 2730, for example, via one or more interfaces 2734.
In some embodiments, PHY processing circuit 2726 may include a frame construction and/or detection module, in combination with additional circuitry such as a buffer memory, to construct and/or deconstruct communication frames. Alternatively or in addition, MAC processing circuit 2727 may share processing for certain of these functions or perform these processes independently of PHY processing circuit 2726. In some embodiments, MAC and PHY processing may be integrated into a single circuit.
The computing platform 2730 may provide computing functionality for the device 2700. As shown, the computing platform 2730 may include a processing component 2740. In addition to, or as an alternative to, the baseband circuitry 2720, the device 2700 may execute processing operations or logic for one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and logic circuit 2728 using the processing component 2740. The processing component 2740 (and/or PHY 2726 and/or MAC 2727) may comprise various hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, logic devices, components, processors, microprocessors, circuits, processor circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), memory units, logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software elements may include software components, programs, applications, computer programs, application programs, system programs, software development programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints, as desired for a given implementation.
The computing platform 2730 may further include other platform components 2750. Other platform components 2750 include common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components (e.g., digital displays), power supplies, and so forth. Examples of memory units may include without limitation various types of computer-readable and machine-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information.
Device 2700 may be, for example, an ultra-mobile device, a mobile device, a fixed device, a machine-to-machine (M2M) device, a personal digital assistant (PDA), a mobile computing device, a smart phone, a telephone, a digital telephone, a cellular telephone, user equipment, an eBook reader, a handset, a one-way pager, a two-way pager, a messaging device, a computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook computer, a netbook computer, a handheld computer, a tablet computer, a server, a server array or server farm, a web server, a network server, an Internet server, a workstation, a mini-computer, a mainframe computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, a multiprocessor system, a processor-based system, consumer electronics, programmable consumer electronics, a game device, a display, a television, a digital television, a set-top box, a wireless access point, a base station, a node B, a subscriber station, a mobile subscriber center, a radio network controller, a router, a hub, a gateway, a bridge, a switch, a machine, or a combination thereof. Accordingly, functions and/or specific configurations of device 2700 described herein may be included or omitted in various embodiments of device 2700, as suitably desired.
Embodiments of device 2700 may be implemented using single-input single-output (SISO) architectures. However, certain implementations may include multiple antennas (e.g., antennas 2718-f) for transmission and/or reception using adaptive antenna techniques for beamforming or spatial division multiple access (SDMA) and/or using multiple-input multiple-output (MIMO) communication techniques.
The components and features of device 2700 may be implemented using any combination of discrete circuitry, application specific integrated circuits (ASICs), logic gates, and/or single-chip architectures. Further, the features of device 2700 may be implemented using microcontrollers, programmable logic arrays, and/or microprocessors, or any combination of the foregoing where suitably appropriate. It is noted that hardware, firmware, and/or software elements may be collectively or individually referred to herein as “logic” or “circuit.”
It should be appreciated that the exemplary device 2700 shown in the block diagram of FIG. 27 may represent one functionally descriptive example of many potential implementations. Accordingly, division, omission, or inclusion of block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software, and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.
FIG. 28 illustrates an embodiment of a broadband wireless access system 2800. As shown in FIG. 28, broadband wireless access system 2800 may be an internet protocol (IP) type network comprising an internet 2810 type network or the like that is capable of supporting mobile wireless access and/or fixed wireless access to internet 2810. In one or more embodiments, broadband wireless access system 2800 may comprise any type of orthogonal frequency division multiple access (OFDMA)-based or single-carrier frequency division multiple access (SC-FDMA)-based wireless network, such as a system compliant with one or more of the 3GPP LTE Specifications and/or IEEE 802.16 Standards, and the scope of the claimed subject matter is not limited in these respects.
In the exemplary broadband wireless access system 2800, radio access networks (RANs) 2812 and 2818 are capable of coupling with evolved node Bs (eNBs) 2814 and 2820, respectively, to provide wireless communication between one or more fixed devices 2816 and internet 2810 and/or between one or more mobile devices 2822 and internet 2810. One example of a fixed device 2816 and a mobile device 2822 is device 2700 of FIG. 27, with the fixed device 2816 comprising a stationary version of device 2700 and the mobile device 2822 comprising a mobile version of device 2700. RANs 2812 and 2818 may implement profiles that are capable of defining the mapping of network functions to one or more physical entities on broadband wireless access system 2800. eNBs 2814 and 2820 may comprise radio equipment to provide RF communication with fixed device 2816 and/or mobile device 2822, such as described with reference to device 2700, and may comprise, for example, the PHY and MAC layer equipment in compliance with a 3GPP LTE Specification or an IEEE 802.16 Standard. eNBs 2814 and 2820 may further comprise an IP backplane to couple to internet 2810 via RANs 2812 and 2818, respectively, although the scope of the claimed subject matter is not limited in these respects.
Broadband wireless access system 2800 may further comprise a visited core network (CN) 2824 and/or a home CN 2826, each of which may be capable of providing one or more network functions including, but not limited to, proxy and/or relay type functions, for example authentication, authorization and accounting (AAA) functions, dynamic host configuration protocol (DHCP) functions, or domain name service controls or the like, domain gateways such as public switched telephone network (PSTN) gateways or voice over internet protocol (VoIP) gateways, and/or internet protocol (IP) type server functions, or the like. However, these are merely examples of the types of functions that are capable of being provided by visited CN 2824 and/or home CN 2826, and the scope of the claimed subject matter is not limited in these respects. Visited CN 2824 may be referred to as a visited CN in the case where visited CN 2824 is not part of the regular service provider of fixed device 2816 or mobile device 2822, for example where fixed device 2816 or mobile device 2822 is roaming away from its respective home CN 2826, or where broadband wireless access system 2800 is part of the regular service provider of fixed device 2816 or mobile device 2822 but where broadband wireless access system 2800 may be in another location or state that is not the main or home location of fixed device 2816 or mobile device 2822. The embodiments are not limited in this context.
Fixed device 2816 may be located anywhere within range of one or both of eNBs 2814 and 2820, such as in or near a home or business, to provide home or business customer broadband access to internet 2810 via eNBs 2814 and 2820 and RANs 2812 and 2818, respectively, and home CN 2826. It is worthy of note that although fixed device 2816 is generally disposed in a stationary location, it may be moved to different locations as needed. Mobile device 2822 may be utilized at one or more locations if mobile device 2822 is within range of one or both of eNBs 2814 and 2820, for example. In accordance with one or more embodiments, an operation support system (OSS) 2828 may be part of broadband wireless access system 2800 to provide management functions for broadband wireless access system 2800 and to provide interfaces between functional entities of broadband wireless access system 2800. Broadband wireless access system 2800 of FIG. 28 is merely one type of wireless network showing a certain number of the components of broadband wireless access system 2800, and the scope of the claimed subject matter is not limited in these respects.
FIG. 29 illustrates an embodiment of a wireless network 2900. As shown in FIG. 29, wireless network 2900 comprises an access point 2902 and wireless stations 2904, 2906, and 2908. Any one of access point 2902 and wireless stations 2904, 2906, and 2908 may potentially implement one or more of robots 1360, 1760, 1860, 1960, 2060A, and 2060B, automated maintenance device 1400, automation coordinator 1555, logic flows 2100, 2200, and 2300, storage media 2400 and 2450, computing architecture 2500, clients 2602, servers 2604, and communication device 2700.
In various embodiments, wireless network 2900 may comprise a wireless local area network (WLAN), such as a WLAN implementing one or more Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (sometimes collectively referred to as “Wi-Fi”). In some other embodiments, wireless network 2900 may comprise another type of wireless network, and/or may implement other wireless communications standards. In various embodiments, for example, wireless network 2900 may comprise a wireless wide area network (WWAN) or a wireless personal area network (WPAN) rather than a WLAN. The embodiments are not limited to this example.
In some embodiments, wireless network 2900 may implement one or more broadband wireless communications standards, such as 3G or 4G standards, including their revisions, progeny, and variants. Examples of 3G or 4G wireless standards may include without limitation any of the IEEE 802.16m and 802.16p standards, 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) and LTE-Advanced (LTE-A) standards, and International Mobile Telecommunications Advanced (IMT-ADV) standards, including their revisions, progeny and variants. Other suitable examples may include, without limitation, Global System for Mobile Communications (GSM)/Enhanced Data Rates for GSM Evolution (EDGE) technologies, Universal Mobile Telecommunications System (UMTS)/High Speed Packet Access (HSPA) technologies, Worldwide Interoperability for Microwave Access (WiMAX) or the WiMAX II technologies, Code Division Multiple Access (CDMA) 2000 system technologies (e.g., CDMA2000 1xRTT, CDMA2000 EV-DO, CDMA EV-DV, and so forth), High Performance Radio Metropolitan Area Network (HIPERMAN) technologies as defined by the European Telecommunications Standards Institute (ETSI) Broadband Radio Access Networks (BRAN), Wireless Broadband (WiBro) technologies, GSM with General Packet Radio Service (GPRS) system (GSM/GPRS) technologies, High Speed Downlink Packet Access (HSDPA) technologies, High Speed Orthogonal Frequency-Division Multiplexing (OFDM) Packet Access (HSOPA) technologies, High-Speed Uplink Packet Access (HSUPA) system technologies, 3GPP Rel. 8-12 of LTE/System Architecture Evolution (SAE), and so forth. The embodiments are not limited in this context.
In various embodiments, wireless stations 2904, 2906, and 2908 may communicate with access point 2902 in order to obtain connectivity to one or more external data networks. In some embodiments, for example, wireless stations 2904, 2906, and 2908 may connect to the Internet 2912 via access point 2902 and access network 2910. In various embodiments, access network 2910 may comprise a private network that provides subscription-based Internet connectivity, such as an Internet Service Provider (ISP) network. The embodiments are not limited to this example.
In various embodiments, two or more of wireless stations 2904, 2906, and 2908 may communicate with each other directly by exchanging peer-to-peer communications. In the example of FIG. 29, for instance, wireless stations 2904 and 2906 communicate with each other directly by exchanging peer-to-peer communications 2914. In some embodiments, such peer-to-peer communications may be performed according to one or more Wi-Fi Alliance (WFA) standards. For example, in various embodiments, such peer-to-peer communications may be performed according to the WFA Wi-Fi Direct standard, 2010 Release. In various embodiments, such peer-to-peer communications may additionally or alternatively be performed using one or more interfaces, protocols, and/or standards developed by the WFA Wi-Fi Direct Services (WFDS) Task Group. The embodiments are not limited to these examples.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor. Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disk (DVD), a tape, a cassette, or the like. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
The following examples pertain to further embodiments:
Example 1 is a method for automated data center maintenance, comprising processing, by processing circuitry of an automated maintenance device, an automation command received from an automation coordinator for a data center, identifying an automated maintenance procedure based on the received automation command, and performing the identified automated maintenance procedure.
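By way of illustration only, the command handling of Example 1 might be sketched in Python as follows. The class, method, and key names (AutomatedMaintenanceDevice, register_procedure, task_code, parameters) are hypothetical conveniences for exposition and are not defined by the embodiments:

    from typing import Callable, Dict

    class AutomatedMaintenanceDevice:
        def __init__(self) -> None:
            # Registry mapping maintenance task codes to automated procedures.
            self._procedures: Dict[str, Callable[[dict], None]] = {}

        def register_procedure(self, task_code: str,
                               procedure: Callable[[dict], None]) -> None:
            self._procedures[task_code] = procedure

        def on_automation_command(self, command: dict) -> None:
            # Identify the automated maintenance procedure based on the
            # received automation command, then perform it with any
            # supplied maintenance task parameters.
            procedure = self._procedures[command["task_code"]]
            procedure(command.get("parameters", {}))

In such a sketch, the automation coordinator would deliver the command dictionary over whatever communication interface the device implements.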
Example 2 is the method of Example 1, the identified automated maintenance procedure to comprise a sled replacement procedure.
Example 3 is the method of Example 2, the sled replacement procedure to comprise replacing a compute sled.
Example 4 is the method of Example 3, the sled replacement procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 5 is the method of Example 4, the memory card to store a compute state of the compute sled.
Example 6 is the method of Example 5, the sled replacement procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
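By way of illustration only, the state-preserving compute sled replacement of Examples 4 to 6 might be sketched as follows; the robot object and its methods are hypothetical stand-ins for whatever manipulator control interface a given automated maintenance device provides:

    def replace_compute_sled(robot, sled_space_id, replacement_sled_id):
        # Remove the compute sled from its sled space (Example 4).
        failed_sled = robot.remove_sled(sled_space_id)
        # The memory card stores the compute state of the sled (Example 5).
        memory_card = robot.remove_memory_card(failed_sled)
        replacement_sled = robot.fetch_sled(replacement_sled_id)
        # Move the card, and the state it holds, to the replacement sled.
        robot.insert_memory_card(replacement_sled, memory_card)
        robot.insert_sled(sled_space_id, replacement_sled)
        # Initiate restoration of the stored compute state (Example 6).
        robot.initiate_state_restoration(replacement_sled)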
Example 7 is the method of Example 2, the sled replacement procedure to comprise replacing an accelerator sled.
Example 8 is the method of Example 2, the sled replacement procedure to comprise replacing a memory sled.
Example 9 is the method of Example 2, the sled replacement procedure to comprise replacing a storage sled.
Example 10 is the method of Example 1, the identified automated maintenance procedure to comprise a component replacement procedure.
Example 11 is the method of Example 10, the component replacement procedure to comprise removing a component from a socket of a sled, and inserting a replacement component into the socket.
Example 12 is the method of Example 11, the component to comprise a processor.
Example 13 is the method of Example 11, the component to comprise a field-programmable gate array (FPGA).
Example 14 is the method of Example 11, the component to comprise a memory module.
Example 15 is the method of Example 11, the component to comprise a non-volatile storage device.
Example 16 is the method of Example 15, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 17 is the method of Example 16, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 18 is the method of Example 10, the component replacement procedure to comprise a cache memory replacement procedure.
Example 19 is the method of Example 18, the cache memory replacement procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 20 is the method of Example 19, the cache memory replacement procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
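By way of illustration only, the ordered steps of Example 20 might be sketched as follows, again using a hypothetical robot control interface:

    def replace_cache_memory(robot, sled_id, socket_id, replacement_modules):
        robot.remove_heat_sink(sled_id, socket_id)
        # Removing the processor exposes the cache modules underlying it.
        processor = robot.remove_processor(sled_id, socket_id)
        robot.remove_cache_modules(sled_id, socket_id)
        robot.insert_cache_modules(sled_id, socket_id, replacement_modules)
        robot.reinsert_processor(sled_id, socket_id, processor)
        robot.reinstall_heat_sink(sled_id, socket_id)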
Example 21 is the method of Example 1, the identified automated maintenance procedure to comprise a component servicing procedure.
Example 22 is the method of Example 21, the component servicing procedure to comprise servicing a component on a sled.
Example 23 is the method of Example 22, the component servicing procedure to comprise removing the sled from a sled space of a rack.
Example 24 is the method of any of Examples 22 to 23, the component servicing procedure to comprise removing the component from the sled.
Example 25 is the method of any of Examples 22 to 24, the component servicing procedure to comprise testing the component.
Example 26 is the method of any of Examples 22 to 25, the component servicing procedure to comprise cleaning the component.
Example 27 is the method of any of Examples 22 to 26, the component servicing procedure to comprise power-cycling the component.
Example 28 is the method of any of Examples 22 to 27, the component servicing procedure to comprise capturing one or more images of the component.
Example 29 is the method of Example 28, comprising sending the one or more captured images to the automation coordinator.
Example 30 is the method of any of Examples 22 to 29, the component to comprise a processor.
Example 31 is the method of any of Examples 22 to 29, the component to comprise a field-programmable gate array (FPGA).
Example 32 is the method of any of Examples 22 to 29, the component to comprise a memory module.
Example 33 is the method of any of Examples 22 to 29, the component to comprise a non-volatile storage device.
Example 34 is the method of Example 33, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 35 is the method of Example 34, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 36 is the method of any of Examples 1 to 35, comprising identifying the automated maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 37 is the method of any of Examples 1 to 36, comprising performing the identified automated maintenance procedure based on one or more maintenance task parameters.
Example 38 is the method of Example 37, the one or more maintenance task parameters to be comprised in the received automation command.
Example 39 is the method of Example 37, at least one of the one or more maintenance task parameters to be comprised in a second automation command received from the automation coordinator.
Example 40 is the method of any of Examples 37 to 39, the one or more maintenance task parameters to include one or more location parameters.
Example 41 is the method of Example 40, the one or more location parameters to include a rack identifier (ID) associated with a rack within the data center.
Example 42 is the method of any of Examples 40 to 41, the one or more location parameters to include a sled space identifier (ID) associated with a sled space within the data center.
Example 43 is the method of any of Examples 40 to 42, the one or more location parameters to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 44 is the method of any of Examples 37 to 43, the one or more maintenance task parameters to include a sled identifier (ID) associated with a sled within the data center.
Example 45 is the method of any of Examples 37 to 44, the one or more maintenance task parameters to include a component identifier (ID) associated with a component on a sled within the data center.
Example 46 is the method of any of Examples 1 to 45, the automation command to be comprised in signals received via a communication interface of the automated maintenance device.
Example 47 is the method of Example 46, the communication interface to comprise a radio frequency (RF) interface, the signals to comprise RF signals.
Example 48 is the method of any of Examples 1 to 47, comprising sending a message to the automation coordinator to acknowledge the received automation command.
Example 49 is the method of any of Examples 1 to 48, comprising sending a message to the automation coordinator to report a result of the automated maintenance procedure.
Example 50 is the method of any of Examples 1 to 49, comprising sending position data to the automation coordinator, the position data to indicate a position of the automated maintenance device within the data center.
Example 51 is the method of any of Examples 1 to 50, comprising sending assistance data to the automation coordinator, the assistance data to comprise an image of a component that is to be manually replaced or serviced.
Example 52 is the method of any of Examples 1 to 51, comprising sending environmental data to the automation coordinator, the environmental data to comprise measurements of one or more aspects of ambient conditions within the data center.
Example 53 is the method of Example 52, comprising generating, by one or more sensors, the measurements comprised in the environmental data.
Example 54 is the method of any of Examples 52 to 53, the environmental data to comprise one or more temperature measurements.
Example 55 is the method of any of Examples 52 to 54, the environmental data to comprise one or more humidity measurements.
Example 56 is the method of any of Examples 52 to 55, the environmental data to comprise one or more air quality measurements.
Example 57 is the method of any of Examples 52 to 56, the environmental data to comprise one or more pressure measurements.
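By way of illustration only, the environmental data of Examples 52 to 57 might be carried in a structure such as the following; the field names and units are assumptions made solely for exposition:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EnvironmentalData:
        # Measurements of ambient conditions within the data center.
        temperature_c: List[float] = field(default_factory=list)
        relative_humidity_pct: List[float] = field(default_factory=list)
        air_quality_index: List[float] = field(default_factory=list)
        pressure_hpa: List[float] = field(default_factory=list)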
Example 58 is a computer-readable storage medium storing instructions that, when executed, cause an automated maintenance device to perform a method according to any of Examples 1 to 57.
Example 59 is an automated maintenance device, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to cause the automated maintenance device to perform a method according to any of Examples 1 to 57.
Example 60 is a method for coordination of automated data center maintenance, comprising identifying, by processing circuitry, a maintenance task to be performed in a data center, determining to initiate automated performance of the maintenance task, selecting an automated maintenance device to which to assign the maintenance task, and sending an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
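By way of illustration only, the coordinator-side flow of Example 60 might be sketched as follows, folding in the capability- and position-based device selection of Examples 75 and 76. The dictionary keys (task_code, capabilities, distance_to_task, device_id) are hypothetical:

    from typing import Callable, Dict, List

    def dispatch_maintenance_task(task: Dict,
                                  device_pool: List[Dict],
                                  send: Callable[[str, Dict], None]) -> None:
        # Select an automated maintenance device capable of the task,
        # preferring the device nearest the task location.
        capable = [d for d in device_pool
                   if task["task_code"] in d["capabilities"]]
        device = min(capable, key=lambda d: d["distance_to_task"])
        # Send the automation command that causes the selected device to
        # perform the associated automated maintenance procedure.
        send(device["device_id"], {"task_code": task["task_code"],
                                   "parameters": task.get("parameters", {})})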
Example 61 is the method of Example 60, comprising identifying the maintenance task based on telemetry data associated with one or more physical resources of the data center.
Example 62 is the method of Example 61, comprising receiving the telemetry data via a telemetry framework of the data center.
Example 63 is the method of any of Examples 61 to 62, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 64 is the method of any of Examples 61 to 63, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 65 is the method of any of Examples 61 to 64, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 66 is the method of any of Examples 61 to 65, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 67 is the method of any of Examples 60 to 66, comprising identifying the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
Example 68 is the method of Example 67, the environmental data to include one or more temperature measurements.
Example 69 is the method of any of Examples 67 to 68, the environmental data to include one or more humidity measurements.
Example 70 is the method of any of Examples 67 to 69, the environmental data to include one or more air quality measurements.
Example 71 is the method of any of Examples 67 to 70, the environmental data to include one or more pressure measurements.
Example 72 is the method of any of Examples 60 to 71, comprising adding the maintenance task to a pending task queue following identification of the maintenance task.
Example 73 is the method of Example 72, comprising determining to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
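By way of illustration only, the pending task queue of Examples 72 and 73 might be realized with a priority heap along the following lines; the numeric priority convention is an assumption:

    import heapq
    from typing import List, Tuple

    pending_task_queue: List[Tuple[int, int, dict]] = []
    _insertion_counter = 0

    def add_task(priority: int, task: dict) -> None:
        # Lower numbers pop first; the counter breaks ties between tasks
        # of equal priority in insertion order.
        global _insertion_counter
        heapq.heappush(pending_task_queue,
                       (priority, _insertion_counter, task))
        _insertion_counter += 1

    def pop_highest_priority_task() -> dict:
        return heapq.heappop(pending_task_queue)[2]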
Example 74 is the method of any of Examples 60 to 73, comprising selecting the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
Example 75 is the method of any of Examples 60 to 74, comprising selecting the automated maintenance device based on one or more capabilities of the automated maintenance device.
Example 76 is the method of any of Examples 60 to 75, comprising selecting the automated maintenance device based on position data received from the automated maintenance device.
Example 77 is the method of any of Examples 60 to 76, the automation command to comprise a maintenance task code indicating a task type associated with the maintenance task.
Example 78 is the method of any of Examples 60 to 77, the automation command to comprise location information associated with the maintenance task.
Example 79 is the method of Example 78, the location information to include a rack identifier (ID) associated with a rack within the data center.
Example 80 is the method of any of Examples 78 to 79, the location information to include a sled space identifier (ID) associated with a sled space within the data center.
Example 81 is the method of any of Examples 78 to 80, the location information to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 82 is the method of any of Examples 60 to 81, the automation command to comprise a sled identifier (ID) associated with a sled within the data center.
Example 83 is the method of any of Examples 60 to 82, the automation command to comprise a physical resource identifier (ID) associated with a physical resource within the data center.
Example 84 is the method of any of Examples 60 to 83, the maintenance task to comprise replacement of a sled.
Example 85 is the method of Example 84, the sled to comprise a compute sled, an accelerator sled, a memory sled, or a storage sled.
Example 86 is the method of any of Examples 60 to 83, the maintenance task to comprise replacement of one or more components of a sled.
Example 87 is the method of any of Examples 60 to 83, the maintenance task to comprise repair of one or more components of a sled.
Example 88 is the method of any of Examples 60 to 83, the maintenance task to comprise testing of one or more components of a sled.
Example 89 is the method of any of Examples 60 to 83, the maintenance task to comprise cleaning of one or more components of a sled.
Example 90 is the method of any of Examples 60 to 83, the maintenance task to comprise power cycling one or more memory modules.
Example 91 is the method of any of Examples 60 to 83, the maintenance task to comprise power cycling one or more non-volatile storage devices.
Example 92 is the method of any of Examples 60 to 83, the maintenance task to comprise storing a compute state of a compute sled, replacing the compute sled with a second compute sled, and transferring the stored compute state to the second compute sled.
Example 93 is the method of any of Examples 60 to 83, the maintenance task to comprise replacing one or more cache memory modules of a processor.
Example 94 is a computer-readable storage medium storing instructions that, when executed by an automation coordinator for a data center, cause the automation coordinator to perform a method according to any of Examples 60 to 93.
Example 95 is an apparatus, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to perform a method according to any of Examples 60 to 93.
Example 96 is a method for automated data center maintenance, comprising identifying, by processing circuitry of an automated maintenance device, a collaborative maintenance procedure to be performed in a data center, identifying a second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure, and sending interdevice coordination information to the second automated maintenance device to initiate the collaborative maintenance procedure.
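By way of illustration only, the collaboration initiation of Example 96 might be sketched as follows; selecting the nearest idle peer is merely one possible selection policy, and the dictionary keys are hypothetical:

    from typing import Callable, Dict, List

    def initiate_collaboration(peer_pool: List[Dict],
                               coordination_info: Dict,
                               send: Callable[[str, Dict], None]) -> None:
        # Identify a second automated maintenance device with which to
        # collaborate, here the nearest peer that is currently idle.
        peer = min((d for d in peer_pool if d["idle"]),
                   key=lambda d: d["distance"])
        # Send interdevice coordination information to initiate the
        # collaborative maintenance procedure.
        send(peer["device_id"], coordination_info)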
Example 97 is the method of Example 96, comprising identifying the collaborative maintenance procedure based on telemetry data associated with one or more physical resources of the data center.
Example 98 is the method of Example 97, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 99 is the method of any of Examples 97 to 98, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 100 is the method of any of Examples 97 to 99, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 101 is the method of any of Examples 97 to 100, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 102 is the method of any of Examples 96 to 101, comprising identifying the collaborative maintenance procedure based on environmental data comprising measurements of one or more aspects of ambient conditions within the data center.
Example 103 is the method of Example 102, comprising generating, by one or more sensors, the measurements comprised in the environmental data.
Example 104 is the method of any of Examples 102 to 103, the environmental data to comprise one or more temperature measurements.
Example 105 is the method of any of Examples 102 to 104, the environmental data to comprise one or more humidity measurements.
Example 106 is the method of any of Examples 102 to 105, the environmental data to comprise one or more air quality measurements.
Example 107 is the method of any of Examples 102 to 106, the environmental data to comprise one or more pressure measurements.
Example 108 is the method of Example 96, comprising identifying the collaborative maintenance procedure based on an automation command received from an automation coordinator for the data center.
Example 109 is the method of Example 108, comprising identifying the collaborative maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 110 is the method of any of Examples 96 to 109, comprising selecting the second automated maintenance device from among a plurality of automated maintenance devices in a candidate device pool for the data center.
Example 111 is the method of any of Examples 96 to 110, comprising identifying the second automated maintenance device based on a parameter comprised in a command received from an automation coordinator for the data center.
Example 112 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing a sled.
Example 113 is the method of Example 112, the sled to comprise a compute sled.
Example 114 is the method of Example 113, the collaborative maintenance procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 115 is the method of Example 114, the memory card to store a compute state of the compute sled.
Example 116 is the method of Example 115, the collaborative maintenance procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 117 is the method of Example 112, the sled to comprise an accelerator sled, a memory sled, or a storage sled.
Example 118 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing a component on a sled.
Example 119 is the method of Example 118, the component to comprise a processor.
Example 120 is the method of Example 118, the component to comprise a field-programmable gate array (FPGA).
Example 121 is the method of Example 118, the component to comprise a memory module.
Example 122 is the method of Example 118, the component to comprise a non-volatile storage device.
Example 123 is the method of Example 122, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 124 is the method of Example 123, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 125 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 126 is the method of Example 125, the collaborative maintenance procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 127 is the method of any of Examples 96 to 111, the collaborative maintenance procedure to comprise servicing a component on a sled.
Example 128 is the method of Example 127, the collaborative maintenance procedure to comprise removing the sled from a sled space of a rack.
Example 129 is the method of any of Examples 127 to 128, the collaborative maintenance procedure to comprise removing the component from the sled.
Example 130 is the method of any of Examples 127 to 129, the collaborative maintenance procedure to comprise testing the component.
Example 131 is the method of any of Examples 127 to 130, the collaborative maintenance procedure to comprise cleaning the component.
Example 132 is the method of any of Examples 127 to 131, the collaborative maintenance procedure to comprise power-cycling the component.
Example 133 is the method of any of Examples 127 to 132, the collaborative maintenance procedure to comprise capturing one or more images of the component.
Example 134 is the method of any of Examples 127 to 133, the component to comprise a processor.
Example 135 is the method of any of Examples 127 to 133, the component to comprise a field-programmable gate array (FPGA).
Example 136 is the method of any of Examples 127 to 133, the component to comprise a memory module.
Example 137 is the method of any of Examples 127 to 133, the component to comprise a non-volatile storage device.
Example 138 is the method of Example 137, the non-volatile storage device to comprise a solid-state drive (SSD).
Example 139 is the method of Example 138, the SSD to comprise a three-dimensional (3D) NAND SSD.
Example 140 is the method of any of Examples 96 to 139, the interdevice coordination information to comprise a rack identifier (ID) associated with a rack within the data center.
Example 141 is the method of any of Examples 96 to 140, the interdevice coordination information to comprise a sled space identifier (ID) associated with a sled space within the data center.
Example 142 is the method of any of Examples 96 to 141, the interdevice coordination information to comprise a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 143 is the method of any of Examples 96 to 142, the interdevice coordination information to comprise a sled identifier (ID) associated with a sled within the data center.
Example 144 is the method of any of Examples 96 to 143, the interdevice coordination information to comprise a component identifier (ID) associated with a component on a sled within the data center.
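By way of illustration only, the interdevice coordination information enumerated in Examples 140 to 144 might be grouped into a single structure such as the following; the field names are hypothetical, and all fields are optional since any subset may be present:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class InterdeviceCoordinationInfo:
        rack_id: Optional[str] = None          # rack within the data center
        sled_space_id: Optional[str] = None    # sled space within a rack
        slot_id: Optional[str] = None          # connector socket on a sled
        sled_id: Optional[str] = None          # sled within the data center
        component_id: Optional[str] = None     # component on a sled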
Example 145 is a computer-readable storage medium storing instructions that, when executed, cause an automated maintenance device to perform a method according to any of Examples 96 to 144.
Example 146 is an automated maintenance device, comprising processing circuitry and computer-readable storage media storing instructions for execution by the processing circuitry to cause the automated maintenance device to perform a method according to any of Examples 96 to 144.
Example 147 is an automated maintenance device, comprising means for receiving an automation command from an automation coordinator for a data center, means for identifying an automated maintenance procedure based on the received automation command, and means for performing the identified automated maintenance procedure.
Example 148 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a sled replacement procedure.
Example 149 is the automated maintenance device of Example 148, the sled replacement procedure to comprise removing a compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 150 is the automated maintenance device of Example 149, the memory card to store a compute state of the compute sled.
Example 151 is the automated maintenance device of Example 150, the sled replacement procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 152 is the automated maintenance device of Example 148, the sled replacement procedure to comprise replacing an accelerator sled, a memory sled, or a storage sled.
Example 153 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a component replacement procedure.
Example 154 is the automated maintenance device of Example 153, the component replacement procedure to comprise removing a component from a socket of a sled, and inserting a replacement component into the socket.
Example 155 is the automated maintenance device of Example 154, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 156 is the automated maintenance device of Example 153, the component replacement procedure to comprise a cache memory replacement procedure.
Example 157 is the automated maintenance device of Example 156, the cache memory replacement procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 158 is the automated maintenance device of Example 157, the cache memory replacement procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 159 is the automated maintenance device of Example 147, the identified automated maintenance procedure to comprise a component servicing procedure.
Example 160 is the automated maintenance device of Example 159, the component servicing procedure to comprise servicing a component on a sled.
Example 161 is the automated maintenance device of Example 160, the component servicing procedure to comprise removing the sled from a sled space of a rack.
Example 162 is the automated maintenance device of any of Examples 160 to 161, the component servicing procedure to comprise removing the component from the sled.
Example 163 is the automated maintenance device of any of Examples 160 to 162, the component servicing procedure to comprise testing the component.
Example 164 is the automated maintenance device of any of Examples 160 to 163, the component servicing procedure to comprise cleaning the component.
Example 165 is the automated maintenance device of any of Examples 160 to 164, the component servicing procedure to comprise power-cycling the component.
Example 166 is the automated maintenance device of any of Examples 160 to 165, the component servicing procedure to comprise capturing one or more images of the component.
Example 167 is the automated maintenance device of any of Examples 160 to 166, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 168 is the automated maintenance device of any of Examples 147 to 167, comprising means for identifying the automated maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 169 is the automated maintenance device of any of Examples 147 to 168, comprising means for performing the identified automated maintenance procedure based on one or more maintenance task parameters.
Example 170 is the automated maintenance device of Example 169, the one or more maintenance task parameters to be comprised in the received automation command.
Example 171 is the automated maintenance device of Example 169, at least one of the one or more maintenance task parameters to be comprised in a second automation command received from the automation coordinator.
Example 172 is the automated maintenance device of any of Examples 169 to 171, the one or more maintenance task parameters to include one or more location parameters.
Example 173 is the automated maintenance device of Example 172, the one or more location parameters to include a rack identifier (ID) associated with a rack within the data center.
Example 174 is the automated maintenance device of any of Examples 172 to 173, the one or more location parameters to include a sled space identifier (ID) associated with a sled space within the data center.
Example 175 is the automated maintenance device of any of Examples 172 to 174, the one or more location parameters to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 176 is the automated maintenance device of any of Examples 169 to 175, the one or more maintenance task parameters to include a sled identifier (ID) associated with a sled within the data center.
Example 177 is the automated maintenance device of any of Examples 169 to 176, the one or more maintenance task parameters to include a component identifier (ID) associated with a component on a sled within the data center.
Example 178 is the automated maintenance device of any of Examples 147 to 177, the automation command to be comprised in signals received via a communication interface of the automated maintenance device.
Example 179 is the automated maintenance device of Example 178, the communication interface to comprise a radio frequency (RF) interface, the signals to comprise RF signals.
Example 180 is the automated maintenance device of any of Examples 147 to 179, comprising means for sending a message to the automation coordinator to acknowledge the received automation command.
Example 181 is the automated maintenance device of any of Examples 147 to 180, comprising means for sending a message to the automation coordinator to report a result of the automated maintenance procedure.
Example 182 is the automated maintenance device of any of Examples 147 to 181, comprising means for sending position data to the automation coordinator, the position data to indicate a position of the automated maintenance device within the data center.
Example 183 is the automated maintenance device of any of Examples 147 to 182, comprising means for sending assistance data to the automation coordinator, the assistance data to comprise an image of a component that is to be manually replaced or serviced.
Example 184 is the automated maintenance device of any of Examples 147 to 183, comprising means for sending environmental data to the automation coordinator, the environmental data to comprise measurements of one or more aspects of ambient conditions within the data center.
Example 185 is the automated maintenance device of Example 184, comprising means for generating the measurements comprised in the environmental data.
Example 186 is the automated maintenance device of any of Examples 184 to 185, the environmental data to comprise one or more temperature measurements.
Example 187 is the automated maintenance device of any of Examples 184 to 186, the environmental data to comprise one or more humidity measurements.
Example 188 is the automated maintenance device of any of Examples 184 to 187, the environmental data to comprise one or more air quality measurements.
Example 189 is the automated maintenance device of any of Examples 184 to 188, the environmental data to comprise one or more pressure measurements.
Example 190 is an apparatus for coordination of automated data center maintenance, comprising means for identifying a maintenance task to be performed in a data center, means for determining to initiate automated performance of the maintenance task, means for selecting an automated maintenance device to which to assign the maintenance task, and means for sending an automation command to cause the automated maintenance device to perform an automated maintenance procedure associated with the maintenance task.
Example 191 is the apparatus of Example 190, comprising means for identifying the maintenance task based on telemetry data associated with one or more physical resources of the data center.
Example 192 is the apparatus of Example 191, comprising means for receiving the telemetry data via a telemetry framework of the data center.
Example 193 is the apparatus of any of Examples 191 to 192, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 194 is the apparatus of any of Examples 191 to 193, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 195 is the apparatus of any of Examples 191 to 194, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 196 is the apparatus of any of Examples 191 to 195, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 197 is the apparatus of any of Examples 190 to 196, comprising means for identifying the maintenance task based on environmental data received from one or more automated maintenance devices of the data center.
Example 198 is the apparatus of Example 197, the environmental data to include one or more temperature measurements.
Example 199 is the apparatus of any of Examples 197 to 198, the environmental data to include one or more humidity measurements.
Example 200 is the apparatus of any of Examples 197 to 199, the environmental data to include one or more air quality measurements.
Example 201 is the apparatus of any of Examples 197 to 200, the environmental data to include one or more pressure measurements.
Example 202 is the apparatus of any of Examples 190 to 201, comprising means for adding the maintenance task to a pending task queue following identification of the maintenance task.
Example 203 is the apparatus of Example 202, comprising means for determining to initiate automated performance of the maintenance task based on a determination that the maintenance task constitutes a highest priority task among one or more maintenance tasks comprised in the pending task queue.
Example 204 is the apparatus of any of Examples 190 to 203, comprising means for selecting the automated maintenance device from among one or more automated maintenance devices in a candidate device pool.
Example 205 is the apparatus of any of Examples 190 to 204, comprising means for selecting the automated maintenance device based on one or more capabilities of the automated maintenance device.
Example 206 is the apparatus of any of Examples 190 to 205, comprising means for selecting the automated maintenance device based on position data received from the automated maintenance device.
Example 207 is the apparatus of any of Examples 190 to 206, the automation command to comprise a maintenance task code indicating a task type associated with the maintenance task.
Example 208 is the apparatus of any of Examples 190 to 207, the automation command to comprise location information associated with the maintenance task.
Example 209 is the apparatus of Example 208, the location information to include a rack identifier (ID) associated with a rack within the data center.
Example 210 is the apparatus of any of Examples 208 to 209, the location information to include a sled space identifier (ID) associated with a sled space within the data center.
Example 211 is the apparatus of any of Examples 208 to 210, the location information to include a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 212 is the apparatus of any of Examples 190 to 211, the automation command to comprise a sled identifier (ID) associated with a sled within the data center.
Example 213 is the apparatus of any of Examples 190 to 212, the automation command to comprise a physical resource identifier (ID) associated with a physical resource within the data center.
Example 214 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise replacement of a sled.
Example 215 is the apparatus of Example 214, the sled to comprise a compute sled, an accelerator sled, a memory sled, or a storage sled.
Example 216 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise replacement of one or more components of a sled.
Example 217 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise repair of one or more components of a sled.
Example 218 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise testing of one or more components of a sled.
Example 219 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise cleaning of one or more components of a sled.
Example 220 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise power cycling one or more memory modules.
Example 221 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise power cycling one or more non-volatile storage devices.
Example 222 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise storing a compute state of a compute sled, replacing the compute sled with a second compute sled, and transferring the stored compute state to the second compute sled.
Example 223 is the apparatus of any of Examples 190 to 213, the maintenance task to comprise replacing one or more cache memory modules of a processor.
Example 224 is an automated maintenance device, comprising means for identifying a collaborative maintenance procedure to be performed in a data center, means for identifying a second automated maintenance device with which to collaborate during performance of the collaborative maintenance procedure, and means for sending interdevice coordination information to the second automated maintenance device to initiate the collaborative maintenance procedure.
Example 225 is the automated maintenance device of Example 224, comprising means for identifying the collaborative maintenance procedure based on telemetry data associated with one or more physical resources of the data center.
Example 226 is the automated maintenance device of Example 225, the telemetry data to include one or more telemetry metrics associated with a physical compute resource.
Example 227 is the automated maintenance device of any of Examples 225 to 226, the telemetry data to include one or more telemetry metrics associated with a physical accelerator resource.
Example 228 is the automated maintenance device of any of Examples 225 to 227, the telemetry data to include one or more telemetry metrics associated with a physical memory resource.
Example 229 is the automated maintenance device of any of Examples 225 to 228, the telemetry data to include one or more telemetry metrics associated with a physical storage resource.
Example 230 is the automated maintenance device of any of Examples 224 to 229, comprising means for identifying the collaborative maintenance procedure based on environmental data comprising measurements of one or more aspects of ambient conditions within the data center.
Example 231 is the automated maintenance device of Example 230, comprising one or more sensors to generate the measurements comprised in the environmental data.
Example 232 is the automated maintenance device of any of Examples 230 to 231, the environmental data to comprise one or more temperature measurements.
Example 233 is the automated maintenance device of any of Examples 230 to 232, the environmental data to comprise one or more humidity measurements.
Example 234 is the automated maintenance device of any of Examples 230 to 233, the environmental data to comprise one or more air quality measurements.
Example 235 is the automated maintenance device of any of Examples 230 to 234, the environmental data to comprise one or more pressure measurements.
Example 236 is the automated maintenance device of Example 224, comprising means for identifying the collaborative maintenance procedure based on an automation command received from an automation coordinator for the data center.
Example 237 is the automated maintenance device of Example 236, comprising means for identifying the collaborative maintenance procedure based on a maintenance task code comprised in the received automation command.
Example 238 is the automated maintenance device of any of Examples 224 to 237, comprising means for selecting the second automated maintenance device from among a plurality of automated maintenance devices in a candidate device pool for the data center.
Example 239 is the automated maintenance device of any of Examples 224 to 238, comprising means for identifying the second automated maintenance device based on a parameter comprised in a command received from an automation coordinator for the data center.
Example 240 is the automated maintenance device of any of Examples 224 to 239, the collaborative maintenance procedure to comprise replacing a sled.
Example 241 is the automated maintenance device of Example 240, the sled to comprise a compute sled.
Example 242 is the automated maintenance device of Example 241, the collaborative maintenance procedure to comprise removing the compute sled from a sled space, removing a memory card from a connector slot of the compute sled, inserting the memory card into a connector slot of a replacement compute sled, and inserting the replacement compute sled into the sled space.
Example 243 is the automated maintenance device of Example 242, the memory card to store a compute state of the compute sled.
Example 244 is the automated maintenance device of Example 243, the collaborative maintenance procedure to comprise initiating a restoration of the stored compute state on the replacement compute sled.
Example 245 is the automated maintenance device of Example 240, the sled to comprise an accelerator sled, a memory sled, or a storage sled.
Example 246 is the automated maintenance device of any of Examples 224 to 239, the collaborative maintenance procedure to comprise replacing a component on a sled.
Example 247 is the automated maintenance device of Example 246, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 248 is the automated maintenance device of any of Examples 224 to 239, the collaborative maintenance procedure to comprise replacing one or more cache memory modules of a processor on a sled.
Example 249 is the automated maintenance device of Example 248, the collaborative maintenance procedure to comprise removing a heat sink from atop the processor, removing the processor from a socket to facilitate access to one or more cache memory modules underlying the processor, removing the one or more cache memory modules, inserting one or more replacement cache memory modules, reinserting the processor into the socket, and reinstalling the heat sink.
Example 250 is the automated maintenance device of any of Examples 224 to 239, the collaborative maintenance procedure to comprise servicing a component on a sled.
Example 251 is the automated maintenance device of Example 250, the collaborative maintenance procedure to comprise removing the sled from a sled space of a rack.
Example 252 is the automated maintenance device of any of Examples 250 to 251, the collaborative maintenance procedure to comprise removing the component from the sled.
Example 253 is the automated maintenance device of any of Examples 250 to 252, the collaborative maintenance procedure to comprise testing the component.
Example 254 is the automated maintenance device of any of Examples 250 to 253, the collaborative maintenance procedure to comprise cleaning the component.
Example 255 is the automated maintenance device of any of Examples 250 to 254, the collaborative maintenance procedure to comprise power-cycling the component.
Example 256 is the automated maintenance device of any of Examples 250 to 255, the collaborative maintenance procedure to comprise capturing one or more images of the component.
Example 257 is the automated maintenance device of any of Examples 250 to 256, the component to comprise a processor, a field-programmable gate array (FPGA), a memory module, or a solid-state drive (SSD).
Example 258 is the automated maintenance device of any of Examples 224 to 257, the interdevice coordination information to comprise a rack identifier (ID) associated with a rack within the data center.
Example 259 is the automated maintenance device of any of Examples 224 to 258, the interdevice coordination information to comprise a sled space identifier (ID) associated with a sled space within the data center.
Example 260 is the automated maintenance device of any of Examples 224 to 259, the interdevice coordination information to comprise a slot identifier (ID) associated with a connector socket on a sled within the data center.
Example 261 is the automated maintenance device of any of Examples 224 to 260, the interdevice coordination information to comprise a sled identifier (ID) associated with a sled within the data center.
Example 262 is the automated maintenance device of any of Examples 224 to 261, the interdevice coordination information to comprise a component identifier (ID) associated with a component on a sled within the data center.
Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components, and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.
It should be noted that the methods described herein do not have to be executed in the order described, or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in serial or parallel fashion.
Although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. Thus, the scope of various embodiments includes any other applications in which the above compositions, structures, and methods are used.
It is emphasized that the Abstract of the Disclosure is provided to comply with 37 C.F.R. §1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate preferred embodiment. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.