Set up a SQL Server cluster on Linux with Always On availability groups and Pacemaker

This tutorial describes how to deploy a Microsoft SQL Server database system on Linux using an Always On availability group (AOAG) and Pacemaker as a high-availability (HA) and disaster recovery (DR) solution. For the purposes of this document, a disaster is an event in which a primary database fails or becomes unavailable.

A primary database can fail when the region it's located in fails or becomes inaccessible. Even if a region is available and operating normally, a primary database can fail because of a system error. In these cases, disaster recovery is the process of making a secondary database available to clients for continued processing.

This tutorial is intended for database architects, administrators, and engineers.

Objectives

  • Deploy three Linux VMs on Compute Engine to host a SQL Server cluster.
  • Install SQL Server and create an Always On availability group (AOAG).
  • Install and configure Pacemaker and Corosync to manage the cluster.
  • Configure an internal passthrough load balancer and an availability group listener.
  • Set up fencing and test failover of the availability group.

Costs

This tutorial uses billable components of Google Cloud, including the following:

  • Compute Engine
  • Cloud Storage

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

For this tutorial, you need a Google Cloud project. You can create a new one, or select a project you already created:

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.


  2. Verify that billing is enabled for your Google Cloud project.

  3. In the Google Cloud console, activate Cloud Shell.


Prepare the project and network

To prepare your Google Cloud project and VPC for the deployment of SQL Server Always On availability groups, do the following:

  1. In the Google Cloud console, open Cloud Shell by clicking the Activate Cloud Shell button.


  2. Set your default project ID:

    gcloud config set project PROJECT_ID

    Replace PROJECT_ID with the ID of your Google Cloud project.

  3. Set your default region:

    gcloud config set compute/region REGION

    Replace REGION with the ID of the region you want to deploy in.

  4. Set your default zone:

    gcloud config set compute/zone ZONE

    Replace ZONE with the ID of the zone you want to deploy in. It should be a valid zone in the region specified in the previous step.
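
As an optional check (not part of the original steps), you can list the active gcloud configuration to confirm that the project, region, and zone defaults are set as expected:

    # Display the currently configured project, region, and zone
    gcloud config list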

Create Linux VMs

To achieve HA and quorum for the SQL Server cluster, deploy three Linux virtual machines (VMs) to host it.

  1. Initialize the following variables:

    PD_SIZE=30
    MACHINE_TYPE=n2-standard-8
  2. Create the Linux VMs:

    gcloud compute instances create node-1 \
    --project=PROJECT_ID \
    --zone REGION-a \
    --machine-type $MACHINE_TYPE \
    --subnet SUBNET_NAME \
    --create-disk=auto-delete=yes,boot=yes,device-name=node-1,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20240426,mode=rw,size=$PD_SIZE,type=projects/PROJECT_ID/zones/REGION-a/diskTypes/pd-balanced \
    --scopes=https://www.googleapis.com/auth/compute,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/devstorage.read_write

    gcloud compute instances create node-2 \
    --project=PROJECT_ID \
    --zone REGION-b \
    --machine-type $MACHINE_TYPE \
    --subnet SUBNET_NAME \
    --create-disk=auto-delete=yes,boot=yes,device-name=node-2,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20240426,mode=rw,size=$PD_SIZE,type=projects/PROJECT_ID/zones/REGION-b/diskTypes/pd-balanced \
    --scopes=https://www.googleapis.com/auth/compute,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/devstorage.read_write

    gcloud compute instances create node-3 \
    --project=PROJECT_ID \
    --zone REGION-c \
    --machine-type $MACHINE_TYPE \
    --subnet SUBNET_NAME \
    --create-disk=auto-delete=yes,boot=yes,device-name=node-3,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20240426,mode=rw,size=$PD_SIZE,type=projects/PROJECT_ID/zones/REGION-c/diskTypes/pd-balanced \
    --scopes=https://www.googleapis.com/auth/compute,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/devstorage.read_write

    Replace SUBNET_NAME with the name of your VPC subnet. Also replace PROJECT_ID and REGION with your project ID and the region where you are deploying.

  3. Update the hosts file on node-1, node-2, and node-3:

    Note: For production use, domain resolution should be managed by a dedicated Domain Name System (DNS) server.
    1. Connect to each of your VMs using SSH (see the example gcloud commands after this list). Refer to the Connect to Linux VMs documentation for more information.
    2. Open the hosts file for editing.

      sudo vi /etc/hosts
    3. Find the internal IP address for each Linux VM (you can view these on the Compute Engine VM instances page in the Google Cloud console) and append the host entries to the bottom of the file.


      NODE1_INTERNAL_IP node-1
      NODE2_INTERNAL_IP node-2
      NODE3_INTERNAL_IP node-3

      Replace NODE1_INTERNAL_IP, NODE2_INTERNAL_IP, and NODE3_INTERNAL_IP with the internal IP address of each Linux VM.

  4. Check the communication between your VMs. All VMs that participate in the Always On availability group must be able to communicate with each other:

    1. On each Linux VM, run the following commands and verify that all VMs can reach each other.

      ping -c 4 node-1
      ping -c 4 node-2
      ping -c 4 node-3
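
The SSH connections referenced in this section can be opened from Cloud Shell with gcloud. The following is an example, assuming the zones follow the REGION-a, REGION-b, and REGION-c pattern used when you created the VMs:

    # Open an SSH session to each node from Cloud Shell
    gcloud compute ssh node-1 --zone=REGION-a
    gcloud compute ssh node-2 --zone=REGION-b
    gcloud compute ssh node-3 --zone=REGION-c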

Install and configure SQL Server

Download, install and configure the SQL Server engine on the three Linux VMs that will participate in the Always On availability group.

  1. SSH to node-1, node-2, and node-3 and run the following steps:

    1. Import the public repository keys.

      wget -qO- https://packages.microsoft.com/keys/microsoft.asc \
        | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
    2. Register the SQL Server Ubuntu repository.

      sudo add-apt-repository \
        "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-2019.list)"
    3. Update the package index files and install SQL Server.

      sudo apt-get update
      sudo apt-get install -y mssql-server
  2. Configure SQL Server:

    1. Run the mssql-conf tool.

      sudo /opt/mssql/bin/mssql-conf setup
    2. Choose Developer edition for the SQL Server edition and accept the license agreement.

      The Developer edition includes all the enterprise features, but you can use it only for non-production environments. More information is available about SQL Server editions and Microsoft licenses.

      Note: A SQL Server license is required for each VM instance. If you are running Microsoft application servers, License Mobility through Software Assurance helps you transition to Google Cloud. For more information, see Using License Mobility with Microsoft server applications. If you are not bringing your own license and need to purchase a license through Google (pay as you go), follow the steps in Add a SQL Server license to a boot disk to ensure that you purchase a license for each VM instance.
    3. Specify a password for the SA account.

    4. Verify that the mssql-server service is running.

      systemctl status mssql-server --no-pager
  3. If you have a firewall enabled on your VMs, open the firewall for SQL Server:

    1. Check if Uncomplicated Firewall is installed and enabled by running the following command.

      sudo ufw status
    2. If the status is active, run the following commands to open the ports.

      sudo ufw allow 1433
      sudo ufw allow 5022
      sudo ufw reload

      Note: If you are using other ports for SQL Server or for your availability group endpoint, change the firewall rules accordingly.
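
Before moving on, you can optionally confirm on each node that SQL Server is listening on TCP port 1433. This quick check is not part of the original setup steps:

    # Confirm that the mssql-server process is listening on port 1433
    sudo ss -tlnp | grep 1433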

Connect to SQL Server

At this point, SQL Server is installed. To connect to it, create a Windows machine in the same VPC and install SQL Server Management Studio (SSMS) to connect to the newly created SQL Server instances on your VMs:

  1. Create a Windows VM:

    1. Return to your Cloud Shell and run the following command.

      gcloud compute instances create node-4 \
      --project=PROJECT_ID \
      --zone ZONE \
      --subnet SUBNET_NAME \
      --machine-type=n2-standard-4 \
      --create-disk=auto-delete=yes,boot=yes,device-name=node-4,image=projects/windows-cloud/global/images/windows-server-2022-dc-v20240415,mode=rw,size=50,type=projects/PROJECT_ID/zones/ZONE/diskTypes/pd-balanced

      Replace PROJECT_ID, ZONE, and SUBNET_NAME with your project ID, zone, and VPC subnet.
  2. Connect to the Windows VM on node-4 using Remote Desktop.

  3. Update the hosts file on node-4:

    1. Open Notepad as an administrator.
    2. Click File > Open and open the hosts file.

      c:\Windows\System32\drivers\etc\hosts
    3. Append the host entries to the bottom of the file.

      NODE1_INTERNAL_IP node-1
      NODE2_INTERNAL_IP node-2
      NODE3_INTERNAL_IP node-3

      Replace NODE1_INTERNAL_IP, NODE2_INTERNAL_IP, and NODE3_INTERNAL_IP with the respective internal IP address of each VM.

    4. Save and exit.

  4. Verify connectivity to the Linux VMs:

    1. Connect to the Windows VM on node-4.
    2. Click the Start button and enter powershell into the search bar.
    3. Click to open the Windows PowerShell ISE app.
    4. Test connectivity by running the following commands.

      ping node-1
      ping node-2
      ping node-3
  5. Install Microsoft SQL Server Management Studio (SSMS) with the following steps:

    1. Connect to the Windows VM on node-4 using Remote Desktop.

    2. In your RDP session, minimize all windows, and start the Windows PowerShell ISE app.

    3. At the PowerShell prompt, download and execute the SSMS installer.

      Start-BitsTransfer `
        -Source "https://aka.ms/ssmsfullsetup" `
        -Destination "$env:Temp\ssms-setup.exe"
      & $env:Temp\ssms-setup.exe
    4. In the SSMS installer, click Install.

    5. Accept the prompt to allow changes to be made.

    6. When the installation is finished, click Restart to restart the remote machine. This closes the RDP session.

  6. Connect to the SQL Server instance on node-1:

    1. Return to the node-4 VM using RDP.

    2. Open SSMS and connect to node-1 using the following parameters.

      Server name: node-1
      Authentication: SQL Server Authentication
      Login: sa

      For more information, refer to the documentation about connecting to a SQL Server instance using SQL Server Management Studio.

    3. Enter the password for the SA account created during installation.

    4. Select Trust server certificate.

    5. Click Connect.

Enable Always On availability group

On Linux, you must first create an availability group before you can add it as a resource to be managed by Pacemaker:

  1. Enable the Always On availability group feature for each SQL Server instance participating in the availability group. Run the following commands on node-1, node-2, and node-3:

    Note: Restart the mssql-server service for the changes to take effect.
    sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
    sudo systemctl restart mssql-server
  2. Connect to the instance that is the primary host in the availability group using SSMS:

    Tip: We recommend choosing node-1 as the primary host.
    1. Open a new query window.

    2. Run the following code snippet to create an encryption key, certificate, and private key.

      Tip: Choose strong passwords for the encryption key and the private key. Remember these passwords as you will use them later.
      Note: The password must be at least 8 characters long and contain characters from three of the following four sets: uppercase letters, lowercase letters, base 10 digits, and symbols.
      USE MASTER;

      CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ENCRYPTION_KEY_PASSWORD';
      CREATE CERTIFICATE my_ag_certificate WITH SUBJECT = 'my_ag_cert';
      BACKUP CERTIFICATE my_ag_certificate
      TO FILE = '/var/opt/mssql/data/my_ag_certificate.cer'
      WITH PRIVATE KEY (
          FILE = '/var/opt/mssql/data/my_ag_certificate.pvk',
          ENCRYPTION BY PASSWORD = 'PRIVATE_KEY_PASSWORD'
      );

      Replace ENCRYPTION_KEY_PASSWORD and PRIVATE_KEY_PASSWORD with the passwords for the encryption key and private key.
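
Before transferring the files in the next section, you can optionally verify on node-1 that the certificate and private key files were written to the SQL Server data directory:

    # List the backed-up certificate (.cer) and private key (.pvk) files
    sudo ls -l /var/opt/mssql/data/my_ag_certificate.*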

Transfer the certificate and key files

The certificate and key files created in the previous steps need to be moved to the secondary SQL Server nodes. There are several methods to move the certificate and key files to the secondary nodes node-2 and node-3.

For other transfer options, see Transfer files to Linux VMs.

Transfer the certificate and key files using Cloud Storage

Create a Cloud Storage bucket to transfer files from the primary to the secondary cluster nodes.

  1. Create a Cloud Storage bucket:

    1. Return to your Cloud Shell and run the following command:

      gcloud storage buckets create gs://BUCKET_NAME \
      --project=PROJECT_ID \
      --location=REGION \
      --public-access-prevention

      Replace BUCKET_NAME with the name of the bucket to create. Replace PROJECT_ID with the ID of your Google Cloud project, and replace REGION with the ID of the region where you want the bucket deployed.

    For more information, see Create buckets.

  2. Return to SSH on node-1, node-2, and node-3 to initialize the Google Cloud CLI:

    1. Run the following command to initialize Google Cloud CLI.

      gcloud init
    2. Choose option [1] to use the preinstalled service account.

    3. Enter the name of your project.

    4. Enter n when asked whether to set up a default region and zone.

  3. Return to node-1 to copy the files to Cloud Storage:

    1. Upload the two newly created files to your Cloud Storage bucket by running the following commands.

      sudo gcloud storage cp /var/opt/mssql/data/my_ag_certificate.cer gs://BUCKET_NAME/
      sudo gcloud storage cp /var/opt/mssql/data/my_ag_certificate.pvk gs://BUCKET_NAME/

      Replace BUCKET_NAME with the name of the bucket created.

  4. Return to node-2 and node-3 to copy the files from Cloud Storage:

    1. Download the two files from your Cloud Storage bucket to node-2 and node-3.

      sudo gcloud storage cp gs://BUCKET_NAME/my_ag_certificate.cer /var/opt/mssql/data/
      sudo gcloud storage cp gs://BUCKET_NAME/my_ag_certificate.pvk /var/opt/mssql/data/

      Replace BUCKET_NAME with the name of the bucket created.

    2. Change the ownership and permissions of the files on node-2 and node-3 by running the following commands in a root shell.

      chown mssql:mssql /var/opt/mssql/data/my_ag_certificate.*
      chmod 660 /var/opt/mssql/data/my_ag_certificate.*
      Tip: Remember to exit the root shell after running the commands.

Set up database mirroring endpoint

In this section, you create the database mirroring endpoints using an encryption key and certificate shared by each node in the SQL Server cluster, to ensure secure data replication.

  1. Return to the Windows VM on node-4 to create the database mirroring endpoints:

    1. Connect to the SQL Server databases on node-1, node-2, and node-3 using SSMS. Follow the steps from Connect to SQL Server, using node-1, node-2, and node-3 as the server name and the respective passwords that you set for the SA account.

    2. Create the certificate on the secondary VMs node-2 and node-3 from the copied files. Use the passwords you provided when you created the certificate and key on the primary node.

      USE MASTER;

      CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ENCRYPTION_KEY_PASSWORD';
      CREATE CERTIFICATE my_ag_certificate
      FROM FILE = '/var/opt/mssql/data/my_ag_certificate.cer'
      WITH PRIVATE KEY (
          FILE = '/var/opt/mssql/data/my_ag_certificate.pvk',
          DECRYPTION BY PASSWORD = 'PRIVATE_KEY_PASSWORD'
      );

      Replace ENCRYPTION_KEY_PASSWORD and PRIVATE_KEY_PASSWORD with the passwords for the encryption key and private key.

    3. Return to SSMS to create the database mirroring endpoints by running the following T-SQL command on node-1, node-2, and node-3.

      Tip: The default listening port is 5022, but you can use any other available port.
      CREATE ENDPOINT [my_ag_endpoint]
          AS TCP (LISTENER_PORT = 5022)
          FOR DATABASE_MIRRORING (
              ROLE = ALL,
              AUTHENTICATION = CERTIFICATE my_ag_certificate,
              ENCRYPTION = REQUIRED ALGORITHM AES
          );

      ALTER ENDPOINT [my_ag_endpoint] STATE = STARTED;

Create and configure the Always On availability group

Next, create the SQL Server Always On availability group using SQL Server Management Studio and use the previously created endpoints for replication.

Note: A database is necessary to create an Always On availability group. In this tutorial, you create a placeholder database named bookshelf. After the AOAG is created, databases can be added to or removed from the availability group.
  1. Return to the Windows VM and open SSMS:

    1. Connect to the SQL Server database engine on node-1 and open a new query window.
  2. Create a database and back it up in preparation for replication:

    USE MASTER;

    CREATE DATABASE [bookshelf];
    ALTER DATABASE [bookshelf] SET RECOVERY FULL;
    BACKUP DATABASE [bookshelf]
    TO DISK = N'/var/opt/mssql/data/bookshelf.bak';
  3. Create the Always On availability group:

    1. Run the following T-SQL command in SSMS on node-1, node-2, and node-3. This ensures that the endpoints are enabled and that SQL Server on each node is ready for data replication.

      IF (SELECT state FROM sys.endpoints WHERE name = N'my_ag_endpoint') <> 0
      BEGIN
          ALTER ENDPOINT [my_ag_endpoint] STATE = STARTED
      END
      GO

      IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
      BEGIN
          ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
      END
      IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
      BEGIN
          ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
      END
      GO
    2. Run the following T-SQL command on node-1 to create the AOAG.

      USE [master]
      GO

      CREATE AVAILABILITY GROUP [aoag1]
      WITH (
          AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
          DB_FAILOVER = OFF,
          DTC_SUPPORT = NONE,
          CLUSTER_TYPE = EXTERNAL,
          REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0
      )
      FOR DATABASE [bookshelf]
      REPLICA ON
          N'node-1' WITH (
              ENDPOINT_URL = N'TCP://node-1:5022', FAILOVER_MODE = EXTERNAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
              BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
          N'node-2' WITH (
              ENDPOINT_URL = N'TCP://node-2:5022', FAILOVER_MODE = EXTERNAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
              BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
          N'node-3' WITH (
              ENDPOINT_URL = N'TCP://node-3:5022', FAILOVER_MODE = EXTERNAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
              BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO));
      GO
    3. Run the following T-SQL command on node-2 and node-3 so that each SQL Server instance joins the new availability group.

      ALTER AVAILABILITY GROUP [aoag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
      GO

      ALTER AVAILABILITY GROUP [aoag1] GRANT CREATE ANY DATABASE;
      GO

    You have created a new database named bookshelf and added it to a new availability group named aoag1 on the SQL Server instance running on node-1. Node-2 and node-3 were added to the availability group, and the data in the bookshelf database will be synchronously replicated across the SQL Server instances on all three nodes.
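
If you want to check replica health from the command line instead of SSMS, the following sketch queries the availability group DMVs with sqlcmd on node-1. This assumes the mssql-tools package (which provides sqlcmd) is installed; the tutorial itself does not install it, so you can also run the same query in an SSMS query window.

    # Query replica role and synchronization health for the availability group
    # (newer sqlcmd versions may also need -C to trust the self-signed certificate)
    sqlcmd -S localhost -U sa -P 'SA_PASSWORD' -Q "
    SELECT ar.replica_server_name, ars.role_desc, ars.synchronization_health_desc
    FROM sys.dm_hadr_availability_replica_states ars
    JOIN sys.availability_replicas ar ON ar.replica_id = ars.replica_id;"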

Install and configure Pacemaker

Pacemaker is an open source high-availability resource manager that is used with the Corosync cluster engine. In this section, you install and configure Pacemaker on each of your VMs.

Create a SQL Server login for the Pacemaker cluster manager

In this section, you create a new SQL Server login for Pacemaker to use to log in to each SQL Server instance and manage the availability group.

  1. Run the following T-SQL command on node-1, node-2, and node-3:

    USE [master];

    CREATE LOGIN [pacemaker] WITH PASSWORD = N'PACEMAKER_LOGIN_PASSWORD';
    GO

    Replace PACEMAKER_LOGIN_PASSWORD with a password for the pacemaker account.

  2. Run the following T-SQL command to grant the pacemaker login permissions on the availability group:

    GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::[aoag1] TO [pacemaker];
    GRANT VIEW SERVER STATE TO [pacemaker];
    GO
  3. Return to SSH on node-1, node-2, and node-3 to run the following commands, which save the Pacemaker login and password in the SQL Server secrets folder:

    echo 'pacemaker' >> ~/pacemaker-passwd
    echo 'PACEMAKER_LOGIN_PASSWORD' >> ~/pacemaker-passwd
    sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
    sudo chown root:root /var/opt/mssql/secrets/passwd
    sudo chmod 400 /var/opt/mssql/secrets/passwd

    Replace PACEMAKER_LOGIN_PASSWORD with the password for the pacemaker account.
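
Optionally, verify on each node that the secrets file has the expected restrictive ownership and permissions:

    # The file should be owned by root:root with mode 400 (-r--------)
    sudo ls -l /var/opt/mssql/secrets/passwd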

Install Pacemaker

Next, install Pacemaker and set up a logon account on all Linux VMs for resource management.

  1. Open firewall ports for Pacemaker:

    1. Check if Uncomplicated Firewall is installed and enabled by running the following command on node-1, node-2, and node-3.

      sudo ufw status
    2. If ufw is enabled, open the firewall ports on node-1, node-2, and node-3.

      sudo ufw allow 2224/tcp
      sudo ufw allow 3121/tcp
      sudo ufw allow 5405/udp
      sudo ufw allow 21064/tcp
      sudo ufw allow 1433/tcp
      sudo ufw allow 5022/tcp
      sudo ufw reload
      Note: If you are using other ports for SQL Server or for your availability group endpoint, change the firewall rules accordingly.
  2. Install Pacemaker on node-1, node-2, and node-3:

    sudo apt-get install -y pacemaker pacemaker-cli-utils crmsh resource-agents fence-agents corosync python3-azure pcs
  3. Set a new password for the hacluster user on node-1, node-2, and node-3:

    Note: Set the same password on all three nodes. This password is needed during the rest of the setup.
    sudo passwd hacluster
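
As a quick sanity check (not part of the original steps), you can confirm that the cluster packages installed correctly by printing their versions:

    # Print the installed Pacemaker and Corosync versions
    pacemakerd --version
    corosync -v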

Set up Corosync

You will now configure Corosync to manage the cluster membership and messaging across the cluster.

  1. Create an authentication key for Corosync on node-1:

    sudo corosync-keygen
  2. Modify the Corosync configuration file:

    1. Return to node-1 and modify the corosync.conf file.

      sudo vi /etc/corosync/corosync.conf
    2. Update the cluster name, transport settings, and nodelist sections. After editing, the file should look like the following example.

      # Please read the corosync.conf.5 manual page
      totem {
          version: 2

          # Corosync itself works without a cluster name, but DLM needs one.
          # The cluster name is also written into the VG metadata of newly
          # created shared LVM volume groups, if lvmlockd uses DLM locking.
          cluster_name: my_agcluster

          # crypto_cipher and crypto_hash: Used for mutual node authentication.
          # If you choose to enable this, then do remember to create a shared
          # secret with "corosync-keygen".
          # enabling crypto_cipher, requires also enabling of crypto_hash.
          # crypto works only with knet transport
          transport: udpu
          crypto_cipher: none
          crypto_hash: none
      }

      logging {
          # Log the source file and line where messages are being
          # generated. When in doubt, leave off. Potentially useful for
          # debugging.
          fileline: off
          # Log to standard error. When in doubt, set to yes. Useful when
          # running in the foreground (when invoking "corosync -f")
          to_stderr: yes
          # Log to a log file. When set to "no", the "logfile" option
          # must not be set.
          to_logfile: yes
          logfile: /var/log/corosync/corosync.log
          # Log to the system log daemon. When in doubt, set to yes.
          to_syslog: yes
          # Log debug messages (very verbose). When in doubt, leave off.
          debug: off
          # Log messages with time stamps. When in doubt, set to hires (or on)
          #timestamp: hires
          logger_subsys {
              subsys: QUORUM
              debug: off
          }
      }

      quorum {
          # Enable and configure quorum subsystem (default: off)
          # see also corosync.conf.5 and votequorum.5
          provider: corosync_votequorum
      }

      nodelist {
          # Change/uncomment/add node sections to match cluster configuration
          node {
              # Hostname of the node
              name: node-1
              # Cluster membership node identifier
              nodeid: 1
              # Address of first link
              ring0_addr: NODE1_INTERNAL_IP
              # When knet transport is used it's possible to define up to 8 links
              #ring1_addr: 192.168.1.1
          }
          node {
              name: node-2
              nodeid: 2
              ring0_addr: NODE2_INTERNAL_IP
          }
          node {
              name: node-3
              nodeid: 3
              ring0_addr: NODE3_INTERNAL_IP
          }
          # ...
      }

      Replace NODE1_INTERNAL_IP, NODE2_INTERNAL_IP, and NODE3_INTERNAL_IP with the internal IP addresses of each node.

Transfer the config files using Cloud Storage

  1. Upload the generated authentication key and Corosync configuration files from node-1 to your Cloud Storage bucket:

    sudo gcloud storage cp /etc/corosync/authkey gs://BUCKET_NAME/
    sudo gcloud storage cp /etc/corosync/corosync.conf gs://BUCKET_NAME/

    Replace BUCKET_NAME with the name of the bucket previously created.

  2. Download the authkey and configuration files to node-2 and node-3:

    sudo gcloud storage cp gs://BUCKET_NAME/authkey /etc/corosync/
    sudo gcloud storage cp gs://BUCKET_NAME/corosync.conf /etc/corosync/

    Replace BUCKET_NAME with the name of the bucket where the Corosync configuration files were transferred.

  3. Update the permissions of the files on node-2 and node-3:

    sudo chmod 400 /etc/corosync/authkey
    sudo chmod 400 /etc/corosync/corosync.conf

Restart and verify cluster communication

  1. Restart the Pacemaker and Corosync services on node-1, node-2, and node-3:

    sudo systemctl restart pacemaker corosync
  2. Confirm the status of the cluster by running the following command on node-1:

    sudo crm status

    You should see all three nodes online.
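
You can also optionally inspect quorum and membership details with corosync-quorumtool, which should list all three nodes and show that quorum is reached:

    # Show quorum status and current cluster membership
    sudo corosync-quorumtool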

Set up the cluster

Next, you will set up the Pacemaker cluster by creating a new resource for the SQL Server Always On availability group.

  1. Run the following commands on node-1 to set the cluster properties:

    sudo crm configure property stonith-enabled=false
    sudo crm configure property cluster-recheck-interval=2min
    sudo crm configure property start-failure-is-fatal=true

    For more information, see Cluster Options.

  2. Authorize the nodes in the cluster by running the following command on node-1. Use the password that you previously set for the hacluster account:

    sudo pcs cluster auth -u hacluster

    You should see that all three nodes are authorized.

  3. Install the SQL Server resource agent for integration with Pacemaker on node-1, node-2, and node-3:

    sudo apt-get install mssql-server-ha
  4. Return to node-1 and create an availability group resource in the cluster:

    1. Run the Cluster Resource Manager.

      sudo crm
    2. Type configure to enter the configuration menu.

    3. Enter the following configuration.

      primitive aoag1-cluster \
        ocf:mssql:ag \
        params ag_name="aoag1" \
        meta failure-timeout=60s \
        op start timeout=60s \
        op stop timeout=60s \
        op promote timeout=60s \
        op demote timeout=10s \
        op monitor timeout=60s interval=10s \
        op monitor timeout=60s on-fail=demote interval=11s role="Master" \
        op monitor timeout=60s interval=12s role="Slave" \
        op notify timeout=60s
      ms ms-ag1 aoag1-cluster \
        meta master-max="1" master-node-max="1" clone-max="3" \
        clone-node-max="1" notify="true"
    4. Type commit to commit the changes.

      Note: You will receive a warning that no fencing rule has been set up. Ignore this message; a fencing rule is set up in a later step.
    5. Type exit to exit the cluster resource manager.

    6. Verify the configuration.

      sudo crm status

      You should see that node-1 has been promoted to the primary node. Node-2 and node-3 should be set as the secondary nodes.

Set up the load balancer and availability group listener

In this section, you create a virtual IP address and a health check resource in the cluster using an internal passthrough TCP load balancer that routes traffic to the availability group.

  1. Return to your Cloud Shell and reserve a static IP address that you will use as the cluster IP:

    gcloud compute addresses create aoag1-cluster \
    --region REGION \
    --subnet SUBNET_NAME

    CLUSTER_ADDRESS=$(gcloud compute addresses describe aoag1-cluster \
    --region $(gcloud config get-value compute/region) \
    --format=value\(address\)) && \
    echo "Cluster IP address: $CLUSTER_ADDRESS"

    Replace REGION and SUBNET_NAME with the region and subnet where the Linux VMs are deployed.

    Note: Save the reserved IP address; you will use it in a later step.
  2. Create an unmanaged instance group for each of your cluster nodes and assign each node to its instance group. Run the following commands in the Cloud Shell:

    gcloud compute instance-groups unmanaged create node-1-uig \
    --zone=REGION-a

    gcloud compute instance-groups unmanaged add-instances node-1-uig \
    --zone=REGION-a \
    --instances=node-1

    gcloud compute instance-groups unmanaged create node-2-uig \
    --zone=REGION-b

    gcloud compute instance-groups unmanaged add-instances node-2-uig \
    --zone=REGION-b \
    --instances=node-2

    gcloud compute instance-groups unmanaged create node-3-uig \
    --zone=REGION-c

    gcloud compute instance-groups unmanaged add-instances node-3-uig \
    --zone=REGION-c \
    --instances=node-3

    Replace REGION with the region where the Linux VMs are deployed.

  3. Create a TCP health check. Load balancers use health checks to determine which backend instances respond properly to traffic.

    gcloud compute health-checks create tcp aoag1-healthcheck \
    --port=HEALTH_CHECK_PORT --proxy-header=NONE \
    --check-interval=10 --timeout=10 --unhealthy-threshold=2 \
    --healthy-threshold=2

    Choose and replace HEALTH_CHECK_PORT with the value of a port that is free and in the private range of 49152-65535. For example, 60000.

    For more information, read the health checks overview.

  4. Add network tags to your cluster nodes. The network tag is used by the firewall rule for the health check:

    gcloud compute instances add-tags node-1 \
    --tags NETWORK_TAG_NAME \
    --zone REGION-a

    gcloud compute instances add-tags node-2 \
    --tags NETWORK_TAG_NAME \
    --zone REGION-b

    gcloud compute instances add-tags node-3 \
    --tags NETWORK_TAG_NAME \
    --zone REGION-c

    Replace NETWORK_TAG_NAME with a name for the network tag.

  5. Create a firewall rule to allow the health checks to reach the cluster nodes based on the tag name:

    gcloud compute firewall-rules create mssql-aoag1-fw-rule \
    --network VPC_NAME \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 35.191.0.0/16,130.211.0.0/22 \
    --target-tags NETWORK_TAG_NAME \
    --rules tcp:HEALTH_CHECK_PORT

    Replace VPC_NAME with the name of your VPC network.

    For more information, see Firewall rules for health checks.

  6. Create the load balancer backend service:

    gcloud compute backend-services create aoag1-backend \
    --load-balancing-scheme internal \
    --health-checks aoag1-healthcheck \
    --no-connection-drain-on-failover \
    --drop-traffic-if-unhealthy \
    --failover-ratio 1.0 \
    --region REGION \
    --global-health-checks
  7. Add the three unmanaged instance groups to the backend service:

    gcloud compute backend-services add-backend aoag1-backend \
    --instance-group node-1-uig \
    --instance-group-zone REGION-a \
    --region REGION

    gcloud compute backend-services add-backend aoag1-backend \
    --instance-group node-2-uig \
    --instance-group-zone REGION-b \
    --failover \
    --region REGION

    gcloud compute backend-services add-backend aoag1-backend \
    --instance-group node-3-uig \
    --instance-group-zone REGION-c \
    --failover \
    --region REGION
  8. Define a forwarding rule for your load balancer. A forwarding rule specifies the protocol and ports on which the load balancer accepts traffic:

    gcloud compute forwarding-rules create aoag1-fwd-rule \
    --load-balancing-scheme internal \
    --address CLUSTER_ADDRESS \
    --subnet SUBNET_NAME \
    --region REGION \
    --backend-service aoag1-backend \
    --ports ALL

    Replace CLUSTER_ADDRESS with the IP address reserved earlier.

    Tip: If you need your cluster to be accessible by clients in any region of the VPC network or a connected network, include the --allow-global-access flag in the definition of the forwarding rule.

    For more information, see Forwarding rules.

  9. To complete the setup and test whether your network load balancer is set up correctly, install and configure the HAProxy TCP listener on node-1, node-2, and node-3:

    1. Install HAProxy.

      sudo apt-get install haproxy

    2. Choose Y to complete the installation.

    3. Edit the haproxy.cfg file.

      sudo vi /etc/haproxy/haproxy.cfg
    4. In the defaults section of the haproxy.cfg file, change the mode to tcp.

    5. Append the following section at the end of the haproxy.cfg file.

      #---------------------------------------------------------------
      # Set up health check listener for SQL Server Availability Group
      #---------------------------------------------------------------
      listen healthcheck
          bind *:HEALTH_CHECK_PORT

      Replace HEALTH_CHECK_PORT with the health check port previously selected. For example, 60000.

    6. Start the service to confirm it is correctly configured:

      sudo systemctl start haproxy.service
      sudo systemctl enable haproxy.service
      sudo systemctl restart haproxy.service
    7. Go to the Load balancing page and click your load balancer. Observe your three unmanaged instance groups; they should now report as healthy.


      • Alternatively, you can run the following command in the Cloud Shell to see the status of the backend service.

        gcloud compute backend-services get-health aoag1-backend \
        --region REGION

        Replace REGION with the region where the Linux VMs are deployed.

    8. Once all three unmanaged instance groups are reporting healthy, continue to the next step.

      Note: If an unmanaged instance group does not report healthy, you might need to restart the haproxy.service on the node in question.
      sudo systemctl restart haproxy.service
  10. Create the health check resource in Pacemaker:

    1. SSH to node-1 and create a health check resource for the HAProxy service in your Pacemaker cluster:

      sudo pcs resource create aoag1-healthcheck \
        service:haproxy \
        op monitor interval=10s timeout=20s
    2. Check that the health check resource is started on the primary node node-1:

      sudo crm status
    3. If the health check resource is not started on the primary node, move it with the following commands:

      sudo pcs resource move aoag1-healthcheck node-1
      sudo pcs resource clear aoag1-healthcheck

      You will see that the health check for the load balancer is healthy only for node-1.


  11. Create a virtual IP address resource in your Pacemaker cluster:

    1. Return to SSH on node-1 and find the name of the network interface of your node. You will need it in the next step.

      ip -c link
    2. Create the virtual IP address resource.

      sudo pcs resource create aoag1-vip ocf:heartbeat:IPaddr2 \
        ip="CLUSTER_ADDRESS" nic=NIC_NAME cidr_netmask=32 \
        op monitor interval=3600s timeout=60s

      Replace NIC_NAME with the network interface name from the previous step and CLUSTER_ADDRESS with the reserved IP address.

    3. Check that the virtual IP address resource is started on the primary host.

      sudo crm status
    4. If the virtual IP address resource is not started on the primary node, move it with the following commands.

      sudo pcs resource move aoag1-vip node-1
    5. Group the health check and virtual IP address resources together.

      sudo pcs resource group add aoag1-group \
        aoag1-healthcheck aoag1-vip
    6. Create a constraint that locates the new group on the same node as the primary.

      sudo pcs constraint colocation add master aoag1-group with master ms-ag1 score=INFINITY
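
Optionally, you can list the cluster constraints to confirm that the colocation rule was created, along with any location constraints:

    # List location, ordering, and colocation constraints
    sudo pcs constraint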

Create a listener for your SQL Server availability group

Connections to SQL Server with availability groups should use an availability group listener name instead of the server name. If there is a failover, the listener will automatically redirect connections to the new primary node in the cluster.

  1. Return to SSMS and connect to the node-1 database.

  2. Run the following query:

    ALTER AVAILABILITY GROUP aoag1
    ADD LISTENER 'aoag1-listener' (
        WITH IP (('CLUSTER_ADDRESS','255.255.255.0')), PORT=1433
    );
    GO

    Replace CLUSTER_ADDRESS with the reserved IP address.

Set up a STONITH fence

STONITH is a fencing strategy for maintaining the integrity of nodes in an HA cluster. The STONITH service works at the node level and protects the cluster from nodes that are either unresponsive or in an unknown state. We recommend the fence_gce fencing device, which is specialized for Compute Engine on Google Cloud.

Set up fencing devices

  1. Check that fence_gce, the fence agent for Compute Engine, is installed on node-1:

    sudo pcs stonith list | grep fence_gce

    For more information, see the fence_gce fence agent documentation.

  2. On node-1, create the fence_gce fencing type resources for each of the participating nodes:

    sudo pcs stonith create node-1-fence fence_gce \
      plug=node-1 \
      zone=REGION-a \
      project=PROJECT_ID \
      pcmk_reboot_timeout=300 pcmk_monitor_retries=4 pcmk_delay_max=30 \
      op monitor interval="300s" timeout="120s" \
      op start interval="0" timeout="60s"

    sudo pcs stonith create node-2-fence fence_gce \
      plug=node-2 \
      zone=REGION-b \
      project=PROJECT_ID \
      pcmk_reboot_timeout=300 pcmk_monitor_retries=4 pcmk_delay_max=30 \
      op monitor interval="300s" timeout="120s" \
      op start interval="0" timeout="60s"

    sudo pcs stonith create node-3-fence fence_gce \
      plug=node-3 \
      zone=REGION-c \
      project=PROJECT_ID \
      pcmk_reboot_timeout=300 pcmk_monitor_retries=4 pcmk_delay_max=30 \
      op monitor interval="300s" timeout="120s" \
      op start interval="0" timeout="60s"

    Replace REGION with the region where the Linux VMs are deployed and replace PROJECT_ID with your project ID.

  3. You can test the status of the fencing agents by running the status command:

    sudo fence_gce -o status -n node-1 --zone=REGION-a
    sudo fence_gce -o status -n node-2 --zone=REGION-b
    sudo fence_gce -o status -n node-3 --zone=REGION-c
  4. Create location constraints for your fencing devices to ensure that they are running only on the intended instances:

    sudo pcs constraint location node-1-fence avoids node-1
    sudo pcs constraint location node-2-fence avoids node-2
    sudo pcs constraint location node-3-fence avoids node-3
  5. Enable fencing in your Pacemaker cluster and set the cluster fencing timeout:

    sudo pcs -f stonith_cfg property set stonith-enabled=true
    sudo pcs property set stonith-timeout="300s"
  6. Check the status of the cluster:

    sudo crm status
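
Optionally, you can also confirm that the fencing resources are configured and running:

    # Show the configured fencing (STONITH) resources and their state
    sudo pcs stonith status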

Test the fencing devices

After you set up the fencing devices, we recommend that you test them by using the following steps.

  1. Turn off node-2 by using the fence device:

    1. Connect to node-1 and run the following command to test the fence device associated with node-2 from your cluster.

      fence_gce -o off -n node-2 --zone=REGION-b
    2. Check the status of the cluster.

      sudo crm status
    3. You will also see that node-2 is turned off in Compute Engine.


  2. Turn node-2 back on:

    1. Return to node-1 and restart the instance by running the following command.

      fence_gce -o on -n node-2 --zone=REGION-b
    2. Check the status of the cluster in Pacemaker and Compute Engine. After a short time, you will see that node-2 is back online.

      sudo crm status

Configure Corosync for delayed restart

To avoid timing issues and ensure a proper order of operations in case of a fencing action, we recommend delaying the restart of the Corosync service by 60 seconds.

For more information, see the Red Hat knowledgebase article.

  1. Create a systemd drop-in file that delays the start of the Corosync service on node-1, node-2, and node-3:

    1. Open corosync.service for editing.

      sudo systemctl edit corosync.service

    2. Append the following lines, then save the file and exit the editor.

      [Service]
      ExecStartPre=/bin/sleep 60
    3. Reload the service manager and check if the configuration is taken into account.

      sudo systemctl daemon-reload
      systemctl status corosync.service --no-pager

      If you see the Drop-In section, then the settings in your drop-in file were successfully taken into account.
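
You can also view the merged unit definition; systemctl cat shows the unit file together with any drop-in files that apply to it:

    # Show corosync.service along with its drop-in override
    systemctl cat corosync.service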

Test failover

You are now ready to test if the failover works as expected.

  1. Connect to the Windows VM on node-4 through Remote Desktop.
  2. Open a PowerShell session.
  3. Run the following script:

    while ($True){
      $Conn = New-Object System.Data.SqlClient.SqlConnection
      $Conn.ConnectionString = "Server=CLUSTER_ADDRESS;User ID=sa;Password=SA_PASSWORD;Initial Catalog=master"
      $Conn.Open()

      $Cmd = New-Object System.Data.SqlClient.SqlCommand
      $Cmd.Connection = $Conn
      $Cmd.CommandText = "SELECT @@SERVERNAME"

      $Adapter = New-Object System.Data.SqlClient.SqlDataAdapter $Cmd
      $Data = New-Object System.Data.DataSet
      $Adapter.Fill($Data) | Out-Null
      $Data.Tables[0] + (Get-Date -Format "MM/dd/yyyy HH:mm:ss")

      Start-Sleep -Seconds 2
    }

    Replace CLUSTER_ADDRESS with the listener IP address and SA_PASSWORD with the password of the SA account on SQL Server.

    Every 2 seconds, the script connects to SQL Server by using the availability group listener and queries the server name.

    Leave the script running.

  4. Return to SSH on node-1 and run the commands to trigger a failover to node-2:

    sudo pcs resource move ms-ag1 node-2 --master
    sudo pcs resource move aoag1-group node-2
    sudo pcs resource move aoag1-vip node-2
  5. Return to the PowerShell session on node-4:

    1. Observe the output of the running script and notice that the server name changes from node-1 to node-2 as a result of the failover.
  6. Return to node-1 and initiate a failback to node-1:

    sudo pcs resource move ms-ag1 node-1 --master
    sudo pcs resource move aoag1-group node-1
    sudo pcs resource move aoag1-vip node-1
  7. Return to PowerShell on node-4 and stop the script by pressing Ctrl+C.
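
The pcs resource move commands used above create temporary location constraints that pin the resources to the target node. After testing, you can optionally clear those constraints so that Pacemaker is free to place the resources again; this cleanup is not part of the original steps:

    # Remove the location constraints created by the move commands
    sudo pcs resource clear ms-ag1
    sudo pcs resource clear aoag1-group
    sudo pcs resource clear aoag1-vip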

Clean up

After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.


  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
