US20020145983A1 - Node shutdown in clustered computer system - Google Patents

Node shutdown in clustered computer system

Info

Publication number
US20020145983A1
Authority
US
United States
Prior art keywords
node
group member
nodes
resident
failure
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/827,804
Other versions
US6918051B2 (en)
Inventor
Timothy Block
Robert Miller
Kiswanto Thayib
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2001-04-06
Filing date
2001-04-06
Publication date
2002-10-10
Application filed by International Business Machines Corp
Priority to US09/827,804 (patent US6918051B2)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignment of assignors interest (see document for details). Assignors: MILLER, ROBERT; THAYIB, KISWANTO; BLOCK, TIMOTHY ROY
Priority to CA002376351A (patent CA2376351A1)
Publication of US20020145983A1
Application granted
Publication of US6918051B2
Adjusted expiration
Expired - Fee Related (current legal status)

Abstract

A clustered computer system, apparatus, program product and method utilize a group member-initiated shutdown process to terminate clustering on a node in an automated and orderly fashion, typically in the event of a failure detected by a group member residing on that node. As a component of such a process, node leave operations are initiated on the other nodes in a clustered computer system, thereby permitting any dependency failovers to occur in an automated fashion. Moreover, other group members on a node to be shut down are preemptively terminated prior to local detection of the failure within those other group members, so that termination of clustering on the node may be initiated to complete a shutdown operation.
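
The shutdown sequence summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the class names `Node` and `GroupMember`, the method names, and the in-memory "signal" (a direct method call on each peer) are all assumptions introduced here. The sketch shows the three steps in order: signal the other nodes, preemptively terminate the remaining local group members, then end clustering on the failing node. It also ignores a second failure report once shutdown has started.

```python
from dataclasses import dataclass, field


@dataclass
class GroupMember:
    """Hypothetical group member resident on a node."""
    name: str
    active: bool = True

    def notify_clustering_ending(self) -> None:
        # Preemptive termination: the member is told that clustering is
        # ending before it has detected the failure on its own.
        self.active = False


@dataclass
class Node:
    """Hypothetical cluster node with local members and a list of peers."""
    name: str
    members: list = field(default_factory=list)
    peers: list = field(default_factory=list)
    clustering_active: bool = True
    shutdown_started: bool = False
    departed_peers: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def report_failure(self, detector: GroupMember, reason: str) -> bool:
        """Called by the member that detected the failure.

        Returns True if this call started the shutdown, False if a
        shutdown was already in progress and the report was disregarded.
        """
        if self.shutdown_started:
            return False
        self.shutdown_started = True
        self.log.append(f"{detector.name} detected: {reason}")
        # Step 1: signal every other node so each starts a node leave.
        for peer in self.peers:
            peer.receive_distress_signal(self.name)
        # Step 2: preemptively terminate the other active local members.
        for member in self.members:
            if member is not detector and member.active:
                member.notify_clustering_ending()
        # Step 3: end clustering on this node.
        self.clustering_active = False
        return True

    def receive_distress_signal(self, sender: str) -> None:
        # Node leave (simplified): stop clustering with the sender.
        self.departed_peers.add(sender)
```

Calling `report_failure` a second time from a different member returns `False` and changes nothing, which mirrors the idea of disregarding later failure reports once the signal has gone out.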

Claims (30)

What is claimed is:
1. A method of shutting down a node in a clustered computer system, the method comprising:
(a) detecting a failure in a first node among a plurality of nodes in a clustered computer system with a first group member resident on the first node;
(b) in response to detecting the failure, transmitting a signal to each of the other nodes in the plurality of nodes to initiate on each of the other nodes a node leave operation that terminates clustering with the first node; and
(c) in response to detecting the failure, preemptively terminating a second group member resident on the first node prior to any detection of the failure by the second group member.
2. The method of claim 1, wherein the first group member is a cluster control group member.
3. The method of claim 2, wherein the first group member is a member of a group other than a cluster control group, wherein the method further comprises sending a shutdown message to a cluster control group member resident on the first node, and wherein transmitting the signal to each of the other nodes and preemptively terminating the second group member are each initiated by the cluster control group member in response to the shutdown message from the first group member.
4. The method of claim 1, wherein detecting the failure comprises detecting an error that requires clustering to terminate on the first node.
5. The method of claim 4, wherein the error is selected from the group consisting of a communication error, a protocol error, failure or termination of a cluster name server, failure or termination of a cluster control group member, failure or termination of a cluster liveliness monitor, corruption of a critical cluster object, and combinations thereof.
6. The method of claim 1, wherein transmitting the signal comprises transmitting a distress signal to each other node in the plurality of nodes.
7. The method of claim 1, further comprising, at each of the other nodes in the plurality of nodes, processing the distress signal on such other node by initiating a node leave operation for each group member resident on such other node that is a member of a group that has another member resident on the first node.
8. The method of claim 1, further comprising, at another node in the plurality of nodes, performing a dependent failover in response to receiving the signal from the first node.
9. The method of claim 1, wherein preemptively terminating the group member comprises notifying each active group member resident on the first node that clustering is ending on the first node.
10. The method of claim 9, further comprising ending clustering on the first node after notifying each active group member resident on the first node.
11. The method of claim 9, wherein notifying each active group member includes sending an error message to each active group member.
12. The method of claim 1, further comprising disregarding a second failure detected from a group member, other than the first group member, that is resident on the first node if transmission of the signal has already been initiated.
13. A method of shutting down a node in a clustered computer system, the method comprising:
(a) in a group member resident on a first node among a plurality of nodes in a clustered computer system, initiating a shutdown of the first node; and
(b) shutting down the first node in response to initiation of the shutdown by the group member.
14. The method of claim 13, further comprising detecting a failure in the first node with the group member, wherein initiating the shutdown of the first node is performed in response to detecting the failure.
15. The method of claim 14, wherein shutting down the first node comprises:
(a) transmitting a signal to each of the other nodes in the plurality of nodes to initiate on each of the other nodes a node leave operation that terminates clustering with the first node; and
(b) preemptively terminating a second group member resident on the first node prior to any detection of the failure by the second group member.
16. The method of claim 15, wherein initiating the shutdown of the first node comprises sending a shutdown message from the group member to a cluster control group member, and wherein shutting down the first node further comprises:
(a) notifying a clustering infrastructure resident on the first node using the cluster control group member to transmit the signal and initiate preemptive termination of the second group member; and
(b) terminating the clustering infrastructure.
17. The method of claim 15, wherein transmitting the signal comprises transmitting a distress signal to each other node in the plurality of nodes, the method further comprising, at each of the other nodes in the plurality of nodes, processing the distress signal on such other node by initiating a node leave operation for each group member resident on such other node that is a member of a group that has another member resident on the first node.
18. The method of claim 15, wherein preemptively terminating the second group member comprises notifying each active group member resident on the first node that clustering is ending on the first node.
19. The method of claim 15, further comprising disregarding a second failure detected from another group member resident on the first node if the signal has already been transmitted.
20. An apparatus, comprising:
(a) a memory accessible by a first node among a plurality of nodes in a clustered computer system; and
(b) first and second group members resident in the memory, the first group member configured to detect a failure in the first node; and
(c) a program resident in the memory, the program configured to shut down the first node in response to the detected failure by transmitting a signal to each of the other nodes in the plurality of nodes to initiate on each of the other nodes a node leave operation that terminates clustering with the first node, and preemptively terminating the second group member resident on the first node prior to any detection of the failure by the second group member.
21. The apparatus of claim 20, wherein the first group member is a member of a group other than a cluster control group, wherein the first group member is configured to send a shutdown message to a cluster control group member resident on the first node, and wherein the cluster control group member is configured to initiate transmission of the signal to each of the other nodes and termination of the second group member in response to the shutdown message from the first group member.
22. The apparatus of claim 20, wherein the program is configured to transmit the signal by transmitting a distress signal to each other node in the plurality of nodes.
23. The apparatus of claim 20, wherein the program is configured to terminate the group member by notifying each active group member resident on the first node that clustering is ending on the first node.
24. The apparatus of claim 23, wherein the program is further configured to end clustering on the first node after notifying each active group member resident on the first node.
25. The apparatus of claim 20, wherein the program is further configured to disregard a second failure detected from a group member, other than the first group member, that is resident on the first node if transmission of the signal has already been initiated.
26. A clustered computer system, comprising first and second nodes coupled to one another over a network, wherein:
(a) the first node is configured to shut down in response to a failure detected in the first node by a first group member resident on the first node by transmitting a signal to the second node and preemptively terminating a second group member resident on the first node prior to any detection of the failure by the second group member; and
(b) the second node is configured to initiate a node leave operation that terminates clustering with the first node in response to the signal from the first node.
27. The clustered computer system of claim 26, wherein the second node is configured to initiate a node leave operation for each group member resident on the second node that is a member of a group that has another member resident on the first node.
28. The clustered computer system of claim 27, wherein the second node is configured to perform a dependent failover in response to receiving the signal from the first node.
29. A program product, comprising:
(a) first and second group members, the first group member configured to detect a failure in a first node among a plurality of nodes in a clustered computer system;
(b) a program configured to shut down the first node in response to the detected failure by transmitting a signal to each of the other nodes in the plurality of nodes to initiate on each of the other nodes a node leave operation that terminates clustering with the first node, and preemptively terminating the second group member resident on the first node prior to any detection of the failure by the second group member; and
(c) a signal bearing medium bearing the program.
30. The program product of claim 29, wherein the signal bearing medium includes at least one of a recordable medium and a transmission medium.
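
The receiving side described in claims 7, 17 and 27 (each surviving node starts a node leave for every local group member whose group also has a member on the failing node) can be illustrated with a small Python sketch. The representation is an assumption made for this example: `membership` is a hypothetical dictionary mapping a group name to the set of node names that host a member of that group, and the function names are not taken from the patent.

```python
def process_distress_signal(failing_node, local_node, membership):
    """Return the groups for which local_node must start a node leave.

    membership maps a group name to the set of node names hosting a
    member of that group (a hypothetical layout chosen for this sketch).
    """
    leaves = []
    for group, nodes in sorted(membership.items()):
        # A group is affected only if it has a member on this node AND
        # another member on the failing node.
        if local_node in nodes and failing_node in nodes:
            leaves.append(group)
    return leaves


def apply_node_leave(failing_node, membership, leaves):
    """Drop the failing node from each affected group's membership view."""
    for group in leaves:
        membership[group].discard(failing_node)
    return membership


# Example: node A fails; node B shares two groups with A, node C shares one.
membership = {
    "cluster_control": {"A", "B", "C"},
    "db": {"A", "B"},
    "web": {"B", "C"},
}
leaves_on_b = process_distress_signal("A", "B", membership)
leaves_on_c = process_distress_signal("A", "C", membership)
```

Groups that have no member on the failing node (here `web`) are left untouched, so unrelated work on the surviving nodes is not disturbed. Any dependent failover would be triggered from the resulting node leave and is outside this sketch.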
US09/827,804 | Priority date 2001-04-06 | Filing date 2001-04-06 | Node shutdown in clustered computer system | Expired - Fee Related | US6918051B2 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US09/827,804 US6918051B2 (en) | 2001-04-06 | 2001-04-06 | Node shutdown in clustered computer system
CA002376351A CA2376351A1 (en) | 2001-04-06 | 2002-03-12 | Node shutdown in clustered computer system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
US09/827,804 US6918051B2 (en) | 2001-04-06 | 2001-04-06 | Node shutdown in clustered computer system

Publications (2)

Publication Number | Publication Date
US20020145983A1 (en) | 2002-10-10
US6918051B2 (en) | 2005-07-12

Family

ID=25250216

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US09/827,804 | Expired - Fee Related | US6918051B2 (en) | 2001-04-06 | 2001-04-06 | Node shutdown in clustered computer system

Country Status (2)

Country | Link
US (1) | US6918051B2 (en)
CA (1) | CA2376351A1 (en)

Also Published As

Publication number | Publication date
CA2376351A1 (en) | 2002-10-06
US6918051B2 (en) | 2005-07-12

Legal Events

Code | Title | Description
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BLOCK, TIMOTHY ROY; MILLER, ROBERT; THAYIB, KISWANTO; REEL/FRAME: 011716/0656; SIGNING DATES FROM 20010402 TO 20010404
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
REMI | Maintenance fee reminder mailed |
LAPS | Lapse for failure to pay maintenance fees |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20090712

