It can be appreciated that there is an ever-increasing amount of data that needs to be stored and managed. For example, many entities are in the process of digitizing much of their records management and/or other business or non-business related activities. Similarly, web-based service providers generally engage in transactions that are primarily digital in nature. Thus, techniques and mechanisms that facilitate efficient and cost-effective storage of vast amounts of digital data are being implemented.
SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to manage a group of data storage components (e.g., a clustered storage system), for example, a storage system administrator discovers and connects to the system (e.g., identifies a remote connection access point and remotely accesses the system through the remote access point). The administrator typically makes use of cluster management software present on one or more nodes or controllers of the cluster that facilitates remote administration of the system. Currently, a specified local virtual interface (VIF), restricted to a specific controller, is typically designated as a sole access point for the management software to access the storage system. While the administrator may have access to all components in the clustered storage system from this single access point, and may be able to perform typical administrative and management functions in the system, the VIF may not failover if a connection or the access point controller fails. This can lead to the clustered storage system remaining unmanageable until the administrator can discover and connect to another management VIF on another controller in the system.
A virtual interface (VIF) failover scheme is disclosed herein that allows an administrator to manage a cluster of data storage components as a single entity, rather than as a collection of disparate components. The virtual interface generally comprises, among other things, an IP (internet protocol) address that provides access to the cluster, and more particularly to a hosting of a user interface for cluster management software. In the event that the particular storage component, or a port on the component, upon which the user interface is hosted becomes unavailable (e.g., a VIF failover), the hosting of the user interface (e.g., the hosting of the VIF) is triggered to migrate to another node, or another port on the same node, in the cluster. That is, respective nodes in the cluster have the cluster management software installed thereon, but a corresponding virtual user interface access for the software is merely hosted on one node at a time, and can migrate to other nodes as necessary. Such migration or host shifting generally occurs automatically and/or according to pre-defined rules so that it is substantially invisible to the administrator, affording cluster management continuity regardless of which nodes, or ports, are added to and/or removed from the cluster (e.g., to adjust the storage capacity of the cluster). Moreover, regardless of (the amount of) user interface hosting migration, the VIF essentially maintains the same IP address so that the administrator has continuity with regard to accessing the cluster management software. Also, failover rules can be configured or updated automatically in accordance with nodes becoming available and/or unavailable, at least for purposes of hosting the user interface. Such a VIF failover scheme mitigates access downtime, improves reliability, and provides a substantially seamless failover approach, which is important given the significance of storage system management to organizations and/or ongoing business concerns.
To the accomplishment of the foregoing and related ends, the following description and annexed drawings set forth certain illustrative aspects and implementations. These are indicative of but a few of the various ways in which one or more aspects may be employed. Other aspects, advantages, and novel features of the disclosure will become apparent from the following detailed description when considered in conjunction with the annexed drawings.
DESCRIPTION OF THE DRAWINGS

FIG. 1a is a schematic block diagram illustrating a plurality of nodes interconnected as a cluster, where a virtual interface is implemented for managing the cluster.
FIG. 1b is another schematic block diagram illustrating a plurality of nodes interconnected as a cluster, where a virtual interface may be implemented for managing the cluster.
FIG. 2 is a flow diagram illustrating an exemplary method for managing a cluster of data storage devices using a VIF failover technique.
FIG. 3 is a schematic block diagram illustrating an exemplary VIF failover process for a disabled node in a 3-node cluster.
FIG. 4 is a schematic block diagram illustrating an exemplary collection of management processes that execute as user mode applications on a node of a cluster to facilitate cluster management.
FIG. 5 is a stack diagram illustrating an exemplary virtual interface (VIF) data structure.
FIG. 6 is a flow diagram illustrating an exemplary VIF failover technique.
FIG. 7 is a flow diagram illustrating an exemplary VIF failover technique utilizing VIF managers.
FIG. 8 is a schematic block diagram illustrating an exemplary system that may be used to host a VIF as provided herein.
FIG. 9 is a schematic block diagram illustrating an exemplary system that may be used to host a VIF as provided herein.
FIG. 10 is a schematic block diagram illustrating an exemplary system for hosting a VIF for cluster management software on a cluster of data storage devices.
FIG. 11 is an illustration of an exemplary computer-readable medium comprising processor-executable instructions configured to embody one or more of the provisions set forth herein.
DETAILED DESCRIPTION

The claimed subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. It may be evident, however, that the claimed subject matter may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the claimed subject matter.
A virtual interface (VIF) may be used to allow a storage system administrator access to a networked storage system. Cluster management software on the storage system (e.g., instances of cluster management software installed on one or more controllers in a cluster of controllers) may, for example, allow the administrator to manage the system virtually (e.g., as one system, even though the system may be a cluster of individual controllers). However, if the administrator uses a VIF restricted to a single controller, for example, in a cluster of controllers in a storage system, and communication to that "VIF controller" is lost, the administrator may not be able to virtually manage the storage system until access is restored (e.g., another VIF access point is discovered and a connection is made). On the other hand, a VIF failover, for example, may provide automated migration of VIF hosting duties from a failed port on a node to another port, or from a failed node to another node in a cluster of nodes in the system. In this example, the VIF failover can have failover rules that determine how a failover is handled, can detect a failover condition, and can automatically migrate the VIF hosting duties. Further, a VIF failover may be able to automatically adjust failover rules in accordance with events in the system, for example, when a new node is added or drops out. In this way, the administrator may have seamless access to manage the storage system, even during conditions where, for example, a node loses communication with the rest of the system.
Embodiments described herein relate to techniques and systems for a virtual interface (VIF) failover scheme that allows an administrator to manage a cluster of data storage components as a single entity, and that substantially automatically migrates VIF hosting upon a failover condition.
FIG. 1a is a schematic block diagram illustrating an example of virtual cluster storage system administration 100. The example 100 illustrates an exemplary cluster 102 of data storage devices 104 where a virtual interface (VIF) 106 is implemented for managing the cluster 102. As an example, a cluster 102 may comprise two or more storage controllers 104, interconnected by a cluster switching fabric 114 (e.g., a Fibre Channel connection). In the example illustrated, respective storage controllers or nodes 104 have cluster management software 108 installed thereon, which may be accessed through the VIF 106. It will be appreciated that any number of nodes within a cluster (including less than all) may have such cluster management software installed thereon to facilitate (remote) management of the cluster and the nodes contained therein. Further, different nodes in the cluster may have different versions of such management software installed thereon, where the different versions may be more or less robust. For example, a streamlined or stripped down version of the software may be installed on one node and may rely on another more sophisticated version of the software installed on a different node for operation. Likewise, a less robust version of the software may allow the node upon which it is installed to be managed (remotely) by an administrator, but it may not have the capabilities to support a management interface that allows an administrator to manage the entire cluster (remotely) through this node. Further, a node in the cluster may have little to no management software installed thereon, but may be able to be managed (remotely) by an administrator by having an instantiation of (a more sophisticated version of) management software running or executing thereon, where the (more sophisticated version of the) management software is actually installed on a different node in the cluster, for example.
The administrator 116, wishing to access the cluster storage system to perform system management tasks, may access the cluster 102, for example, by using a browser 118 (e.g., installed on a client device) and inputting an Internet protocol (IP) address 120 specified for the VIF 106. A controller 104 in the cluster 102, which may be designated as a host of the VIF access point 106 for the management software 108, can allow the administrator 116 to use the management software 108 virtually to perform management tasks on the cluster 102.
If a node 104 that is hosting the VIF 106 becomes disconnected from the cluster (e.g., a communications cable 222 becomes unplugged or other "hard" failure) or becomes unable to adequately host the VIF 106 (e.g., a port 224 becomes unavailable for hosting an IP address or other "soft" failure), the VIF 106 may become unavailable to the administrator 116 using the browser 118 to access the management software 108. In this example, an administrator may be without system management capabilities until a new connection can be discovered and established.
The change in status of a node being able to host the VIF 106 to a node that is unable to host the VIF is generally referred to as a failover condition herein. In the illustrated example, the nodes 104 of the cluster 102 comprise failover condition detector components 112 that facilitate determining when a failover condition occurs. For example, the failover condition detector components 112 may comprise circuitry that detects when a cable becomes unplugged (e.g., hard failure) and/or when a port becomes unavailable (soft failure). When a failover condition is detected, the corresponding failover condition detector component 112 can cause the failed controller to publish a "failed condition" alert, for example, to the other controllers in the cluster, and failover rules 110 in the respective nodes 104 can determine which of the remaining nodes can host the VIF 106. In this way, an administrator 116 using a browser 118 to access the management software 108 can have a substantially seamless VIF experience if a controller that is hosting the VIF 106 "goes down."
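By way of a non-limiting, hypothetical illustration, the following Python sketch shows how a failover condition detector such as 112 might poll for the two failure classes described above and publish a "failed condition" alert to the other controllers. The class name, callbacks, and alert format are assumptions of this sketch rather than a definitive implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class FailoverConditionDetector:
    """Hypothetical per-node detector; all names and callbacks are illustrative."""
    node_id: str
    link_is_up: Callable[[], bool]          # polls physical link state (hard failures)
    port_can_host_ip: Callable[[], bool]    # polls whether a port can host the VIF address
    subscribers: List[Callable[[str, str], None]] = field(default_factory=list)

    def poll(self) -> None:
        # A severed link (e.g., an unplugged cable) is treated as a "hard" failure.
        if not self.link_is_up():
            self._publish("hard")
        # A port that cannot host an IP address is treated as a "soft" failure.
        elif not self.port_can_host_ip():
            self._publish("soft")

    def _publish(self, kind: str) -> None:
        # Publish a "failed condition" alert to the other controllers, which then
        # apply their failover rules (110) to choose a new host for the VIF.
        for notify in self.subscribers:
            notify(self.node_id, kind)
```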
It will be appreciated that, while exemplary embodiments described herein describe clusters as comprising two or more node components, the methods and systems described herein are not limited to two or more nodes. The failover methods and systems described herein may be utilized by a single controller, filer, or node (or other type of VIF hosting component) comprising two or more ports, for example, that can perform VIF hosting duties. In this embodiment, if a failover condition occurs, for example, the VIF hosting may be migrated from a "failed" port to another port on the node that is capable of hosting the VIF. At times, these different items or components that host (or are capable of or configured to host) a VIF are merely referred to as hosting components herein. Moreover, reference herein to one type of hosting component is generally intended to comprise other types of hosting components as well. For example, discussion, description, explanation, etc. relative to a node is generally intended to be equally applicable to a port.
FIG. 1b is a schematic block diagram illustrating an exemplary cluster environment 150 of interconnected nodes 104b configured to provide data storage services, wherein a virtual user interface failover scheme as provided herein can be implemented. The respective nodes 104b generally comprise, among other things, a network element 152 (N-Module) and a disk element 154 (D-Module), where at least some of these elements 152, 154 may be comprised within memory of a node. The disk elements 154 in the illustrated example are coupled to one or more storage devices, such as disks 160 of a disk array 162. Such disks 160 may implement data storage on any suitable type of storage media, such as optical, magnetic, electronic and/or electro-mechanical storage devices, for example. It will be appreciated that while there are an equal number of network 152 and disk 154 modules depicted in the illustrated example, there may be differing numbers of these elements. For example, there may not be a one-to-one correspondence between the network 152 and disk 154 modules in the different nodes 104b. Similarly, while merely two nodes 104b are illustrated in the example depicted in FIG. 1b, such a cluster 150 can comprise any suitable number of nodes to provide desired storage services (e.g., n nodes, where n is a positive integer).
The disk elements 154 are configured to facilitate accessing data on the disks 160 of the arrays 162, while the network elements 152 are configured to facilitate connection to one or more client devices 156 (e.g., one or more general purpose computers) as well as interconnection between different nodes. Connection to one or more clients 156 may occur via an Ethernet and/or Fibre Channel (FC) computer network 158, for example, while interconnection between different nodes may occur via a cluster switching fabric 114 (e.g., a Gigabit Ethernet switch). An exemplary distributed file system architecture is generally described in U.S. Patent Application Publication No. US 2002/0116593 titled METHOD AND SYSTEM FOR RESPONDING TO FILE SYSTEM REQUESTS, by M. Kazar et al., published Aug. 22, 2002.
The exemplary cluster 150 is one example of an embodiment where the techniques and systems described herein can reside. In this embodiment, the modules 152 and 154 can communicate with a distributed transactional database (DTDB), for example, stored in the disk array 162 of respective nodes 104b in the cluster 150. In this example, the modules 152 and 154 may be used to coordinate communication with instances of the DTDB on respective nodes 102 in the cluster 150. Having the DTDB arranged and managed in this manner, for example, may facilitate consistency of information regarding respective multiple components, and data stored on multiple disks 160, contained in the cluster 150. It will be appreciated that components, features, elements, etc. described herein that have functionalities and/or aspects related to accessing data may be comprised within and/or comprise one or more disk elements 154. Similarly, components, features, elements, etc. described herein that have functionalities and/or aspects related to networking and/or communications may be comprised within and/or comprise one or more network elements 152. It will also be appreciated that such N and D modules are commercially available from NetApp, Inc. of Sunnyvale, Calif.
FIG. 2 is a flow diagram illustrating an exemplary method 200 for managing a cluster of data storage devices using one or more VIF failover techniques. After initialization at 202, the exemplary method 200 involves hosting a virtual user interface (VIF) on a particular node of a cluster for accessing cluster management software at 204. At 206, the exemplary method determines whether a failover condition occurs (e.g., a VIF hosting node becomes unavailable or is no longer able to effectively host the VIF). At 208, if a failover condition occurs, the exemplary method 200 automatically migrates a hosting of the virtual user interface to a different node in the cluster (e.g., VIF hosting migrates from a "failed" node to another node that is able to host the VIF). Having migrated the VIF hosting so that an administrator can continue to virtually and remotely manage the cluster, the exemplary method 200 ends at 210.
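The flow of method 200 can be summarized in the following minimal Python sketch. The callables (activate_vif, detect_failovers) and the node identifiers are assumptions standing in for the cluster services described above, not a definitive implementation.

```python
from typing import Callable, Iterable, List

def manage_cluster(nodes: List[str],
                   home: str,
                   activate_vif: Callable[[str], None],
                   detect_failovers: Callable[[], Iterable[str]]) -> str:
    """Sketch of method 200: host the VIF (204), watch for failover conditions (206),
    and migrate hosting to another node that can host the VIF (208)."""
    host = home
    activate_vif(host)                        # 204: host the VIF on a particular node
    for failed in detect_failovers():         # 206: yields the id of each failed node
        if failed == host:
            healthy = [n for n in nodes if n != failed]
            if healthy:
                host = healthy[0]             # simplest selection; real rules are richer
                activate_vif(host)            # 208: automatically migrate VIF hosting
    return host
```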
In one aspect, detecting when a node in a cluster is unavailable or unable to effectively host the VIF can be handled by a distributed transactional database (DTDB). This may, for example, correspond to a failover condition detector component 112 as illustrated in FIG. 1. As an example, a storage system comprising two or more nodes (e.g., storage controllers) in a cluster may have an instance of a DTDB (e.g., a replication database (RDB)) running on respective disparate nodes. The DTDBs on the respective nodes can stay in substantially constant contact with each other, updating cluster events (e.g., file writes, file access, hierarchy rules, list of failover rules, etc.) and detecting communication issues. In this example, because the DTDBs running on respective nodes in the cluster are in substantially constant communication, an interruption in this communication from one node may be an indication to other nodes in the cluster that the affected node is unavailable (e.g., offline), at least for purposes of hosting the VIF. Further, if a physical link to a node's hardware in the cluster is severed (e.g., a cable becomes unplugged), the cluster management software could detect the communication interruption and publish this information to the other nodes in the cluster.
In this aspect, for example, if it is determined that a node hosting the VIF for the cluster management software has become unavailable or unable to effectively host the VIF, this can trigger a failover action. As alluded to above, a failover condition may occur when communication between the node hosting the VIF and one or more other nodes in the cluster becomes unreliable (e.g., the hosting node does not have the ability to stay in substantially constant contact with the other nodes in the cluster), which may be called a “soft” failover condition. A failover condition may also occur when hardware associated with the node that is hosting the VIF sustains a failure and is unable to communicate with the other nodes in the cluster (e.g., a network cable becomes unplugged, a partial power failure, a complete failure of the node), which may be called a “hard” failover condition. A hard failover condition may generally be regarded as a condition that results in a loss of an ability for cluster management software to communicate with a node in the cluster; and a soft failover condition may generally be regarded as a condition that results in a loss of an ability for a node in the cluster to host a virtual user interface. In this example, when either a soft or hard failover condition is detected, a VIF hosting migration may be triggered, causing the VIF to be (automatically) moved to a different node in the cluster that has the ability to host the VIF effectively.
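The hard/soft distinction can be captured in a few lines. The following sketch is illustrative only, and the parameter names are assumptions.

```python
from enum import Enum, auto

class FailoverKind(Enum):
    NONE = auto()
    SOFT = auto()   # the node cannot host the virtual user interface
    HARD = auto()   # cluster management software cannot communicate with the node

def classify_failover(node_reachable: bool, node_can_host_vif: bool) -> FailoverKind:
    """Either a hard or a soft condition triggers migration of VIF hosting."""
    if not node_reachable:
        return FailoverKind.HARD
    if not node_can_host_vif:
        return FailoverKind.SOFT
    return FailoverKind.NONE
```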
FIG. 3 is a schematic block diagram illustrating an exemplary VIF failover process for a disabled node in a 3-node cluster 300. In this example, respective nodes A, B, C in the 3-node cluster have an instance of a DTDB 312 running thereon along with cluster management software 316. In this example, initially, node B hosts VIF V at 302. When node B "fails" (e.g., either a soft or hard failover condition occurs), respective coordinators on the different nodes, such as may be comprised within respective replication databases 312, receive a "B offline" event notification at 304 through the cluster's communication channel 314 (e.g., network connection, Fibre Channel, etc.). A coordinator in node A applies a failover rule for VIF V and concludes at 306 that VIF V should be hosted on healthy node C. The decision to host VIF V on node C is published to the different nodes in the cluster at 308, using the cluster's communication channel 314. The VIF manager instance on node C accepts the new hosting rule and takes ownership of VIF V at 310.
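The coordinator's role in this example can be sketched as follows. The function, the rule ordering, and the published message format are assumptions used only to illustrate the sequence 304-310.

```python
from typing import Callable, Dict, Iterable, Optional, Set

def on_node_offline(offline_node: str,
                    current_host: str,
                    failover_order: Iterable[str],
                    healthy: Set[str],
                    publish: Callable[[Dict[str, str]], None]) -> Optional[str]:
    """Apply the failover rule for VIF V when its hosting node goes offline (304-308)."""
    if offline_node != current_host:
        return current_host                       # the hosting node is unaffected
    for candidate in failover_order:              # 306: conclude where VIF V should live
        if candidate in healthy and candidate != offline_node:
            publish({"vif": "V", "host": candidate})  # 308: publish over channel 314
            return candidate                      # 310: the new host takes ownership
    return None                                   # no healthy node is available
```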
FIG. 4 is a schematic block diagram illustrating an exemplary collection of management processes that execute as user mode applications 400 on an operating system of a node to provide management of nodes of a cluster. In this example, the management processes comprise a management framework process 402 and a virtual interface (VIF) manager 404, utilizing respective data replication services (e.g., distributed replication databases) DTDB 408 and DTDB 410.
The management framework 402, among other things, provides a user interface to an administrator 406 via a command line interface (CLI) and/or a network-based (e.g., Web-based) graphical user interface (GUI), for example. It will be appreciated, however, that other user interfaces and protocols may be provided as devised by those skilled in the art. For example, management APIs, such as Simple Network Management Protocol (SNMP) and/or SOAP (e.g., a protocol for exchanging XML-based messages, such as remote procedure calls, normally using HTTP/HTTPS), may be used to provide a user interface to the administrator 406. The management framework 402 may be based on a common interface model (CIM) object manager, for example, that provides the entity through which users/system administrators interact with a node to manage a cluster. The VIF manager 404 is configured to use an agreement between quorum participants in a cluster to coordinate virtual interfaces hosted on nodes of a cluster.
FIG. 5 is a stack diagram illustrating an exemplary virtual interface (VIF) 500 data structure. The VIF 500 comprises an address 502 (e.g., an IP address), netmask 504, routes 506, VIF ID 508, virtual server ID 510 and a set of network ports 512 where the VIF may be hosted. There are generally three types of VIFs: data VIFs, cluster VIFs, and management VIFs. Data VIFs are configured on ports facing a client or on a "client facing network" (e.g., ports on a network interface card (NIC) of a network adapter) for data access via NFS and CIFS protocols, for example. Cluster VIFs and management VIFs can be configured, for example, on ports facing other nodes of a cluster (e.g., ports on an access adaptor). It will be noted that the embodiments described herein typically utilize management VIFs.
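A VIF record of this shape might be represented as in the following sketch; the field types and the kind attribute are assumptions used for illustration rather than the disclosed layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Vif:
    """Illustrative counterpart of the VIF data structure of FIG. 5."""
    address: str                  # 502: e.g., an IP address such as "192.0.2.10"
    netmask: str                  # 504
    routes: List[str]             # 506
    vif_id: int                   # 508
    virtual_server_id: int        # 510
    ports: List[str] = field(default_factory=list)   # 512: ports where the VIF may be hosted
    kind: str = "management"      # data, cluster, or management VIF
```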
Referring back to FIG. 4, the VIF manager 404 may be configured to cause a VIF to switch over to a different node (e.g., to failover) where, for example, a network port failure, network interface failure, cable failure, switch port failure and/or other type of node failure occurs. The VIF is, however, hosted by at most one node at a time, including during a failover process. The VIF manager 404 is generally responsible for enabling, configuring, activating, monitoring and/or failing over the management VIF. In this example, during boot, the VIF manager 404 is initiated by the management framework 402. A discovery mechanism of the management framework 402 discovers devices (e.g., other nodes) managed by a network module of the node, as well as other devices on the node. The management framework 402 queries and configures respective network interfaces (e.g., NICs) of different devices (e.g., nodes) managed by the network module of the node. The VIF manager 404 can store information needed to manage these ports and/or network interfaces, for example, in the DTDB 410.
In another aspect, the VIF manager 404 generally causes a failover to occur, when a failover condition is detected, according to one or more associated failover rules, such as by attempting to failover to a node in a predetermined order, for example, where a first node in such a rule is considered the home node. The VIF manager 404 can also redirect a VIF back to its home node (e.g., a node that hosts the VIF at initiation of VIF management) as long as that node is active. By way of further example, a default failover rule can be that no failover is enabled. VIF failover rules can be established and edited by a system administrator, automatically as nodes are added to the cluster, etc.
Two examples of failover rules for node selection are a "priority-order rule" and a "next-available rule." According to the priority-order rule, when a node fails (e.g., a soft or hard failover condition), a VIF moves to the highest-priority available node in the cluster (the home node if available). The priority of a node can be specified within the rule by assigning unique priorities. According to the next-available rule, when a node fails, a VIF moves to the next available node listed in the rule (the home node if available).
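The two selection policies might be expressed as follows. The data shapes (a priority map and an ordered rule list) are assumptions of this sketch, with lower numbers taken to mean higher priority.

```python
from typing import Dict, List, Optional, Set

def priority_order_target(priorities: Dict[str, int],
                          available: Set[str],
                          home: str) -> Optional[str]:
    """Priority-order rule: the VIF moves to the highest-priority available node,
    preferring the home node when it is available."""
    if home in available:
        return home
    candidates = [n for n in available if n in priorities]
    return min(candidates, key=lambda n: priorities[n]) if candidates else None

def next_available_target(rule_order: List[str],
                          available: Set[str],
                          home: str) -> Optional[str]:
    """Next-available rule: the VIF moves to the next available node listed in the rule,
    preferring the home node when it is available."""
    if home in available:
        return home
    return next((n for n in rule_order if n in available), None)
```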
It will be appreciated that an instance of the VIF manager 404 runs on the respective nodes in a cluster. The respective VIF manager instances can use a DTDB to store configuration information and use a DTDB quorum mechanism to determine a node's suitability for VIF hosting. The DTDB can also serve as a mechanism for signaling other nodes when they have been configured to host a VIF (e.g., 308, FIG. 3). One instance of the VIF manager can be designated as a failover coordinator (e.g., node A in FIG. 3). The coordinator is co-located, for example, on a node with a DTDB master. In this example, the coordinator is responsible for reassigning the VIFs of nodes that are unavailable. The other respective instances (secondaries) of the VIF manager in a cluster can be responsible for reassigning VIFs upon failover conditions. The coordinator can also perform functions of a secondary with respect to its local node. In one example, a coordinator publishes new configurations (e.g., 308, FIG. 3), and if for some reason a particular node (e.g., an unhealthy node) cannot fulfill requirements of a newly published configuration, then a secondary transaction is invoked by the unhealthy node to republish a second (failover) configuration.
In embodiments described above (and below), as an example, failover rules can be stored, accessed, updated, and/or managed using instances of a distributed transactional database (DTDB) system (e.g., a replication database (RDB)) running on components of a storage system. However, it will be appreciated that failover rules may be stored, accessed, updated and/or managed using a variety of hardware, firmware and/or software, comprising a variety of components on a storage system configured for virtual management.
In one embodiment, a technique for managing a cluster of data storage devices may comprise hosting a VIF for cluster management by automatically configuring failover rules for the VIF as a function of (e.g., before, during and/or after) the initiation of hosting of the VIF. That is, in addition to having pre-established failover rules, such rules can also be automatically adjusted based on events occurring in the cluster to update failover rule lists, for example. Failover rule lists may be updated or computed, for example, when nodes and/or ports are added to and/or removed from the cluster. That is, when a change in the cluster occurs, the available ports that can host the VIF are determined, and depending on where the home port is for a VIF, for example, an individual customized list of failover rules for that VIF can be determined and propagated to the available hosting devices in the cluster. FIG. 6 is a flow diagram illustrating an exemplary method 600 for managing a cluster of storage devices by hosting a VIF.
The exemplary method 600 begins at 602 and involves hosting a VIF on a particular node of a cluster (e.g., a home node), for accessing cluster management software (e.g., a VIF manager) at 604. At 606, before, during and/or after the initiation of VIF hosting (e.g., during system boot, the VIF manager is initiated by a management framework, which, in turn, initiates hosting of the VIF on a node in the cluster), the failover rules are configured for hosting the VIF in the cluster (e.g., failover rules are configured based on the number of healthy nodes present, the configuration of the nodes, and/or other assignments based on predetermined settings). At 608, when events occur in the cluster (e.g., a node goes down, a node is added, etc.), the exemplary method 600 automatically adjusts the failover rules for hosting the VIF to accommodate the event(s) (e.g., creating a new list of failover rules and/or hierarchical schedule for VIF hosting based on the number and configuration of healthy nodes).
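Acts 606 and 608 might be sketched as follows. The ordering policy (home node first, remaining healthy nodes in sorted order) and the event dictionary are assumptions chosen for illustration.

```python
from typing import Dict, List, Set

def build_failover_rules(healthy_nodes: Set[str], home: str) -> List[str]:
    """606: compute a failover rule list from the currently healthy nodes."""
    others = sorted(n for n in healthy_nodes if n != home)
    return ([home] if home in healthy_nodes else []) + others

def on_cluster_event(event: Dict[str, str], healthy_nodes: Set[str], home: str) -> List[str]:
    """608: automatically adjust the rule list when a node joins or leaves the cluster."""
    if event["type"] == "node_added":
        healthy_nodes.add(event["node"])
    elif event["type"] == "node_removed":
        healthy_nodes.discard(event["node"])
    return build_failover_rules(healthy_nodes, home)
```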
In this embodiment, an administrator may provide initial settings for a failover coordinator prior to starting a system or during an administrative task. As an example, initial settings may assign a list of failover rules and/or a node failover schedule hierarchy based on a record of reliability of the respective nodes (e.g., nodes that have maintained stable communication longest have a higher order of failover preference in a node failover schedule). As another example, the initial settings may assign a list of failover rules and/or a node failover schedule hierarchy based on the runtime age of a node (e.g., older nodes may be assigned lower on a list or a lower schedule hierarchy than newer nodes, which possibly have newer components that may be more reliable and/or have enhanced capabilities (e.g., with regard to processing speeds, memory capacities, backup safety features, etc.)). Further, node failover schedules may be assigned randomly, while using the home node as the preferred failover node.
Additionally, in this embodiment, automatic adjustment of failover rules for VIF hosting can be predetermined by the system based on a cluster configuration, and may be automatically adjusted based on particular events. As an example, if a new node is added to the cluster, the system can automatically associate appropriate ports from the new node into the cluster management software, such that a VIF manager can maintain a list of potential failover targets for the VIF. Also, upon hosting the VIF on a particular node of the cluster, the system can automatically create a list of failover rules for the VIF based on a preferred home location and a current list of potential failover targets for the VIF, such that the VIF can potentially be hosted on any one of the failover targets in the cluster for the VIF. Further, a failover coordinator may assign different failover rules and/or failover hierarchies to respective nodes (returning to the cluster) depending on whether they were previously involved in a soft failover condition or a hard failover condition, for example. It will, nevertheless, be appreciated that numerous variations of failover settings, rules, and adjustments may be devised by those skilled in the art, and that the embodiments described herein are not intended to limit the variety or configuration of particular failover rules, settings, and adjustments contemplated.
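One possible shape for such a list of potential failover targets is sketched below. The registry class and its methods are hypothetical; they merely illustrate associating a new node's ports and deriving a per-VIF rule list from a preferred home location.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FailoverTargets:
    """Hypothetical registry of potential VIF failover targets (node -> hosting ports)."""
    ports_by_node: Dict[str, List[str]] = field(default_factory=dict)

    def node_added(self, node: str, ports: List[str]) -> None:
        # Associate the new node's ports so the VIF manager can consider them as targets.
        self.ports_by_node[node] = list(ports)

    def node_removed(self, node: str) -> None:
        self.ports_by_node.pop(node, None)

    def rules_for_vif(self, home_node: str) -> List[str]:
        # Customized failover rule list for one VIF: preferred home first, then the rest.
        others = sorted(n for n in self.ports_by_node if n != home_node)
        return ([home_node] if home_node in self.ports_by_node else []) + others
```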
In another aspect, hosting the VIF on a node of the cluster and migrating the VIF hosting, for example, may involve utilizing a VIF manager coupled to cluster management software. In this embodiment, cluster management software or an instance thereof can be installed on respective nodes in the cluster (e.g., 108, FIG. 1). Similarly, a VIF manager or an instance thereof can also be installed on respective nodes in the cluster (e.g., 404, FIG. 4). In this example, one of the nodes, and the corresponding VIF manager, can be designated as the failover coordinator (e.g., Node A, FIG. 3), and will be responsible for initially designating VIF hosting, and coordinating failover rule initiation and adjustment.
FIG. 7 is a flow diagram illustrating an exemplary VIF failover methodology 700, utilizing VIF managers. After startup, at 702, instances of a VIF manager (on different nodes in a cluster) agree on a list of failover rules (e.g., by communicating through the respective instances of the DTDBs on the different nodes of the cluster). The VIF manager instances have access to a list of current VIF location assignments (e.g., stored by the DTDBs). At 704, a VIF manager running on a master node performs the role of coordinator by monitoring overall hosting status and applying the failover rules in response to failover conditions (e.g., hard or soft failover conditions). At 706, secondary instances of the VIF manager stand ready to perform services on their local nodes as directed by the coordinator.
The designated online VIF manager (e.g., called by the failover rules) activates VIF hosting on its respective node at 708, and at 710, when an individual node drops out of the cluster (e.g., an instance of a DTDB on the respective node drops out of communication/quorum), it ceases to host the VIF (e.g., after a suitable delay for hiccup avoidance in CIFS protocol). At 712, the VIF manager running on a master node (acting as coordinator) notes the unhealthy node based on the dropout (e.g., resulting from notification from the DTDB in the node), and adjusts the VIF hosting rules at 714 (e.g., after a hiccup avoidance delay) based on available healthy nodes and the cluster event that initiated the failover condition. At 716, the coordinator then publishes changes to the VIF hosting assignment (e.g., using the DTDB in the node to communicate changes via the networked cluster). For example, the coordinator can open a single RW transaction of the DTDB including operations which write an updated VIF assignment table to replicated databases on respective nodes of a cluster, which in turn notifies the VIF manager instances that an update has occurred. At 718, other VIF manager instances wake up in response to DTDB update notifications, and scan their local DTDBs to identify the adjusted hosting responsibilities. The newly designated VIF manager, if any, activates VIF hosting on its local node at 720, and then the method ends. It will be appreciated that the failover procedure merely requires a single RW transaction for the set of VIFs that failover, and that hosting the VIFs in response to published changes does not require additional transactional resources.
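Acts 712-720 can be summarized as follows. Here dtdb_write stands in for the single RW transaction on the replicated database, and the table shape and callables are assumptions of this sketch.

```python
from typing import Callable, Dict

def coordinator_reassign(assignments: Dict[str, str],
                         failed_node: str,
                         pick_target: Callable[[str], str],
                         dtdb_write: Callable[[Dict[str, str]], None]) -> Dict[str, str]:
    """712-716: adjust hosting for every VIF on the failed node and publish the whole
    updated assignment table in one write transaction."""
    updated = dict(assignments)
    for vif, host in assignments.items():
        if host == failed_node:
            updated[vif] = pick_target(vif)       # apply the failover rules for this VIF
    dtdb_write(updated)                           # one RW transaction for the failed-over set
    return updated

def secondary_on_update(local_node: str,
                        updated_assignments: Dict[str, str],
                        activate_vif: Callable[[str], None]) -> None:
    """718-720: a secondary wakes on the update notification, scans its local copy of the
    table, and activates hosting for any VIF newly assigned to its node."""
    for vif, host in updated_assignments.items():
        if host == local_node:
            activate_vif(vif)
```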
In one example of this embodiment, when the VIF manager running on a master node (acting as coordinator) assigns VIF-hosting responsibility to another node, that other node's VIF manager either honors or reassigns the responsibility. On occasion, some local error may prevent a VIF manager instance from activating VIF hosting. When this occurs, the affected VIF manager instance can recognize the error, reapply the failover rule for the VIFs that it cannot host, and then publish the new hosting rules to the cluster, just as the coordinator originally did. The different VIF manager instances share the replicated failover rule set, so each is capable of publishing new hosting rules.
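The honor-or-reassign behavior might be sketched like this; the callables and message format are assumptions, and the function only illustrates the republish step described above.

```python
from typing import Callable, Dict, Set

def honor_or_reassign(local_node: str,
                      vif: str,
                      try_activate: Callable[[str], bool],
                      apply_failover_rule: Callable[[str, Set[str]], str],
                      publish: Callable[[Dict[str, str]], None]) -> str:
    """Honor the assignment if local activation succeeds; otherwise reapply the shared
    failover rule (excluding this node) and republish the new hosting rule."""
    if try_activate(vif):
        return local_node                           # responsibility honored
    fallback = apply_failover_rule(vif, {local_node})
    publish({"vif": vif, "host": fallback})         # second (failover) configuration
    return fallback
```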
In another aspect, a system may be devised for hosting a virtual user interface (VIF) for cluster management software on a cluster of data storage devices. FIG. 8 is a schematic block diagram illustrating an exemplary system 800 that may be used to host a VIF, as provided herein. The exemplary system 800 comprises two nodes 802 (e.g., storage controllers in a storage system cluster) connected in a cluster through a network fabric 804 (e.g., private network system connections, such as using Fibre Channel). The respective nodes 802 are configured to host the VIF for the cluster management software. It will be appreciated that, while the exemplary system 800 illustrates two nodes, the exemplary system may be configured with more than two nodes connected in a cluster in a variety of configurations.
Respective nodes 802 in the exemplary system 800 comprise a failover detection component 806, configured to detect a failover condition for its node 802. Further, respective nodes 802 comprise a failover indicator component 808, configured to communicate the failover condition (e.g., as detected by the failover detection component 806) to other nodes 802 in the cluster 800. Additionally, the respective nodes 802 comprise an automated VIF migration component 810, configured to migrate the VIF hosting to a different node 802 in the cluster after a failover condition is indicated (e.g., as indicated by the failover indicator component 808). It will be appreciated that, while the exemplary system 800 illustrates the failover detector 806, failover indicator 808, and VIF migration component 810 as comprised within a respective node 802, these components may be arranged in other locations in the cluster in accordance with other variations developed by those skilled in the art.
In one embodiment of the system described above, for example, an instance of a distributed transaction database (DTDB) (e.g., a replication database (RDB)) may be located on respective nodes in a cluster, and these DTDBs can be used by the cluster system to communicate between respective nodes in the cluster. In this embodiment, the respective DTDBs can comprise a failover detector, which can detect when a DTDB on a node in the cluster is no longer communicating effectively. Further, in this embodiment, the cluster management software or an instance thereof can be installed on top of (e.g., operably coupled to) the respective DTDBs in the respective nodes. The cluster management software can have a VIF management component that is used to manage VIF hosting (e.g., provide virtual online user access to the cluster management software). In this embodiment, the failover indicator 808 and VIF migration 810 components can be managed by the respective VIF managers in the respective nodes in the cluster.
FIG. 9 is a schematic block diagram illustrating an exemplary system 900 that may be used to host a VIF. The system is similar to that illustrated in FIG. 8, but illustrates more details. The exemplary system 900 contains two nodes 902 (but more can be included) to form a cluster connected by a network 904 (e.g., Fibre Channel). Respective nodes 902 in the cluster 900 have instances of a DTDB 910, which is used to communicate between the nodes 902 in the cluster. In this example, a failover detection component 916 is located on the DTDB 910, as respective DTDBs may be the first to detect a communication problem within the cluster. Respective DTDBs 910 in the different nodes 902 are operably coupled with cluster management software 906 or an instance thereof installed on the respective nodes 902.
In the exemplary system 900, respective nodes comprise a VIF management component 908, operably coupled with the cluster management software 906. In this example, the VIF management component 908 may be used to manage the VIF, and comprises: a failover indicator 914 for communicating a failover condition (e.g., based on settings in the VIF manager 908) to the other nodes 902 in the cluster; and a VIF migration component 912, used to move VIF hosting to a different node (e.g., based on settings in the VIF manager 908) after a failover condition is detected and indicated.
In another aspect, a system may be devised to host a virtual user interface (VIF) for cluster management software, comprising at least two nodes, which can automatically organize and adjust failover rules based on events in the cluster. FIG. 10 is a block diagram of an exemplary system 1000 for hosting a VIF for cluster management software on a cluster of data storage devices. The exemplary system 1000 comprises two nodes 1002 (node A and node B, but any number of nodes may be included in a cluster) connected by communication ports 1022 through cluster switching fabric 1024. It will be appreciated that the exemplary system described herein may contain more nodes, may be connected in other ways, and may use various configurations as devised by those skilled in the art. Nevertheless, respective nodes 1002 comprise a failover rule initiator 1014 configured to automatically organize failover rules for hosting a VIF 1004 at an initiation of the hosting of the VIF 1004 in the system 1000; and a failover rule adjuster 1012 configured to automatically adjust the failover rules for hosting the VIF based on events occurring in the cluster 1000. In this example, both the rule initiator 1014 and the rule adjuster 1012 are comprised in a VIF manager 1008, along with a failover indicator 1018 and a VIF migration component 1020. Further, in this exemplary system 1000, the VIF manager 1008 is comprised within cluster management software 1006 or an instance thereof on the respective nodes 1002. Respective instances of the cluster management software 1006 are operably coupled to an instance of a distributed transactional database (DTDB) (e.g., a replication database (RDB)) 1010, which comprises a failover detection component 1016.
At boot of the exemplary system 1000, the cluster management software 1006 can initiate the VIF manager 1008, which in turn operates the failover rule initiator component 1014, which organizes the failover rules for the system, for example, based on the configuration of the system (e.g., how many healthy nodes are present, whether a coordinator/home node is present). In this example, if an event occurs in the cluster 1000 (e.g., a node is added or lost, or a failover condition occurs), the VIF manager 1008 operates the failover rule adjuster 1012 to adjust the failover rules in accordance with the events.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
The illustrative embodiments described herein use the DTDB service's notion of online/offline status as a proxy for general node health. The DTDB service provides online/offline notifications in response to changes in a node's health status. The DTDB service also internally maintains health information for different nodes in a cluster, and provides up-to-date aggregated cluster health information through a health monitor interface. It will be appreciated that one example of a DTDB is a replication database (RDB). A VIF manager coordinator can use the notifications to trigger failover activity, and use the aggregated health view in a decision-making algorithm. The illustrative embodiments may also use an offline state as a form of implicit message passing during a network partition. An offline node relinquishes its VIF, and its online peers can know that it has done so by virtue of the fact that time has elapsed while the node is offline.
It will be appreciated that processes, architectures and/or procedures described herein can be implemented in hardware, firmware and/or software. It will also be appreciated that the provisions set forth herein may apply to any type of special-purpose computer (e.g., file server, filer and/or storage serving appliance) and/or general-purpose computer, including a standalone computer or portion thereof, embodied as or including a storage system. Moreover, the teachings herein can be adapted to a variety of storage system architectures including, but not limited to, a network-attached storage environment and/or a storage area network and disk assembly directly attached to a client or host computer. Storage system should therefore be taken broadly to include such arrangements in addition to any subsystems configured to perform a storage function and associated with other equipment or systems.
The operations described herein are exemplary and imply no particular order. Further, the operations can be used in any sequence when appropriate and can be partially used (e.g., not all actions may be necessary). It should be understood that various computer-implemented operations involving data storage may comprise manipulation of physical quantities that may take the form of electrical, magnetic, and/or optical signals capable of being stored, transferred, combined, compared and/or otherwise manipulated, for example.
Another example involves computer-readable media comprising processor-executable instructions configured to implement one or more of the techniques presented herein. Computer readable media is intended to comprise any mechanism that can store data, which can thereafter be read by a computer system. Examples of computer readable media include hard drives (e.g., accessible via network attached storage (NAS)), Storage Area Networks (SAN), volatile and non-volatile memory, such as read-only memory (ROM), random-access memory (RAM), EEPROM and/or flash memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, cassettes, magnetic tape, magnetic disk storage, optical or non-optical data storage devices and/or any other medium which can be used to store data. Computer readable media may also comprise communication media, which typically embodies computer readable instructions or other data in a modulated data signal such as a carrier wave or other transport mechanism (e.g., that has one or more of its characteristics set or changed in such a manner as to encode information in the signal). The computer readable medium can also be distributed (e.g., using a switching fabric, such as used in computer farms) over a network-coupled computer system so that computer readable code is stored and executed in a distributed fashion.
Another embodiment (which may include one or more of the variations described above) involves a computer-readable medium comprising processor-executable instructions configured to apply one or more of the techniques presented herein. An exemplary computer-readable medium that may be devised in these ways is illustrated in FIG. 11, wherein the implementation 1100 comprises a computer-readable medium 1108 (e.g., a CD-R, DVD-R, or a platter of a hard disk drive), on which is encoded computer-readable data 1106. This computer-readable data 1106 in turn comprises a set of computer instructions 1104 configured to operate according to the principles set forth herein. In one such embodiment, the processor-executable instructions 1104 may be configured to perform a method 1102 for managing a cluster of data storage devices comprising a VIF, such as the method 200 of FIG. 2, for example. Many such computer-readable media may be devised by those of ordinary skill in the art that are configured to operate in accordance with the techniques presented herein.
The foregoing description has been directed to particular embodiments. It will be apparent, however, that other variations and modifications may be made to the described embodiments, with the attainment of some or all of their advantages. Specifically, it should be noted that one or more of the principles set forth herein may be implemented in non-distributed file systems. Furthermore, while this description has been written in terms of separate remote and support systems, the teachings are equally suitable to systems where the functionality of the remote and support systems are implemented in a single system. Alternately, the functions of remote and support systems may be distributed among any number of separate systems, wherein respective systems perform one or more of the functions. Additionally, the procedures, processes and/or modules described herein may be implemented in hardware, software, embodied as a computer-readable medium having program instructions, firmware, or a combination thereof. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the spirit and scope of the disclosure herein.
As used in this application, the terms “component,” “module,” “system”, “interface”, and the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Moreover, the word "exemplary" is used herein to mean serving as an example, instance, or illustration. That is, anything described herein as "exemplary" is not necessarily to be construed as advantageous over other aspects or designs. Also, unless specified to the contrary, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or", and the articles "a" and "an" are generally to be construed to comprise "one or more". Furthermore, to the extent that the terms "includes", "having", "has", "with", or variants thereof are used, such terms are intended to be inclusive in a manner similar to the term "comprising".
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure.