Ardent Performance Computing

Jeremy Schneider

    Losing Data is Harder Than I Expected

    Posted by Jeremy Schneider. 1 Comment.
    Filed Under: kubernetes, Planet, PostgreSQL, Technical

    This is a follow‑up to the last article: Run Jepsen against CloudNativePG to see sync replication prevent data loss. In that post, we set up a Jepsen lab to make data loss visible when synchronous replication was disabled, and to show that enabling synchronous replication prevents it under crash‑induced failovers.

    Since then, I’ve been trying to make data loss happen more reliably in the “async” configuration so students can observe it on their own hardware and in the cloud. Along the way, I learned that losing data on purpose is trickier than I expected.


    Methodology and a Kubernetes caveat

    To simulate an abrupt primary crash, the lab uses a forced pod deletion, which is effectively a kill -9 for Postgres:

    kubectl delete pod -l role=primary --grace-period=0 --force --wait=false

    This mirrors the very first sanity check I used to run on Oracle RAC clusters about 15 years ago: “unplug the server.” It isn’t a perfect simulation, but it’s a simple, repeatable crash model that’s easy to reason about.

    I should note that the label role is deprecated by CNPG and will be removed. I originally used it for brevity, but I will update the labs and scripts to use the label cnpg.io/instanceRole instead.
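    For reference, the equivalent forced deletion using the non-deprecated label would look like this (assuming the same CNPG cluster setup; this requires a running cluster, so it is shown here only as the command):

```shell
# Same forced deletion as above, but selecting the primary via the
# non-deprecated CNPG label instead of the legacy "role" label.
kubectl delete pod -l cnpg.io/instanceRole=primary --grace-period=0 --force --wait=false
```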

    After publishing my original blog post, someone pointed out an important Kubernetes caveat with forced deletions:

    Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod

    https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/

    This caveat would apply to the CNPG controller just like a StatefulSet controller. In practice, for my tests, this caveat did not undermine the goal of demonstrating that synchronous replication prevents data loss. The lab includes an automation script (Exercise 3) to run the 5‑minute Jepsen test in a loop for many hours and collect results automatically.
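    The shape of that loop can be sketched roughly as follows. The script and file names here are hypothetical placeholders; the lab's Exercise 3 script is the authoritative version (the results path follows Jepsen's usual store/ layout):

```shell
# Hypothetical sketch of the looped test. "run-jepsen-append.sh" is a
# placeholder, not the lab's actual script name.
for i in $(seq 1 100); do
  # Crash the primary partway through the 5-minute run, in the background.
  ( sleep 150 && kubectl delete pod -l cnpg.io/instanceRole=primary \
      --grace-period=0 --force --wait=false ) &
  ./run-jepsen-append.sh                          # one 5-minute Jepsen "append" run
  wait                                            # let the background crash finish
  cp store/latest/results.edn "results-$i.edn"    # collect the report for later analysis
done
```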

    Hardware included an inexpensive HP EliteBook (Ryzen Pro 5, $299 on Amazon) running two CNPG Lab VMs via Hyper‑V, plus multiple cloud instance types. I ran long loops (8–20 hours) and aggregated failure rates across configurations.

    I’m considering bringing Chaos Mesh into the lab in the future, but for now I’m sticking with the explicit crash model above because it’s easy for folks to see exactly what it does.

    High‑level results:

    • With synchronous replication: 1,061 five‑minute runs, 0 data‑loss failures.
    • With asynchronous replication: 1,448 runs, 478 data‑loss failures.

    These are the total counts across all runs from three different sets of experiments.
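    As a quick arithmetic check, the aggregate failure rates work out from these totals directly (a small awk one-off, nothing lab-specific):

```shell
# Aggregate data-loss rates across all runs (counts from the totals above).
awk 'BEGIN {
  printf "sync:  %.1f%% of 1061 runs lost data\n", 100 * 0 / 1061
  printf "async: %.1f%% of 1448 runs lost data\n", 100 * 478 / 1448
}'
# prints:
# sync:  0.0% of 1061 runs lost data
# async: 33.0% of 1448 runs lost data
```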

    Experiment 1: Checkpoints and replica count

    Hypothesis A: Increase replication traffic (shorter checkpoint intervals, which cause more full‑page writes) to raise the odds of “unshipped” WAL at crash time ⇒ more losses with async.

    Hypothesis B: Fewer replicas (2 instances total instead of 3) might make losses more likely.

    Each row below shows the fraction of async runs that showed data loss.

    I also ran two of the configurations with sync replication enabled. No data loss was observed in either of the runs with sync replication.

    Checkpoint         3 instances            2 instances
    5 min (default)    5% async / 0% sync     24% async
    30 seconds         5% async               15% async / 0% sync
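    The 30-second checkpoint setting is a plain Postgres parameter; in a CNPG manifest it can be expressed roughly like this (a sketch using CNPG's postgresql.parameters passthrough; verify the field path against the docs for your operator version):

```yaml
# Sketch: shortening checkpoint_timeout via CNPG's Postgres parameter
# passthrough. 30s is the minimum value Postgres allows for this setting.
spec:
  postgresql:
    parameters:
      checkpoint_timeout: "30s"
```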

    Findings: Hypothesis B was right: two instances amplified data loss. Hypothesis A was wrong: shorter checkpoints did not increase loss rates here, and even correlated with slightly fewer losses.

    Experiment 2: Jepsen rate and thread count

    I varied the transaction rate and the number of client threads. My intuition was that higher rates would increase the chance of a commit landing during a crash window, and that fewer threads might improve per‑thread throughput (given CPU saturation).

    Rate    50 threads               20 threads
    1000    24% (cf. experiment 1)   8%
    2000    51%                      38%
    3000    80%                      39%
    4000    N/A                      N/A

    Findings: Higher rates increased loss frequency (as expected). Reducing thread count lowered CPU pressure, and surprisingly it also reduced loss frequency, even when achieving similar rates. The “4000” rate did not complete successfully; Jepsen analysis stalled and timed out.

    The most reliable async configuration for provoking visible loss so far: 2 instances total, rate 3000, 50 threads.

    Experiment 3: Hardware differences

    To ensure reproducibility beyond my laptop, I repeated runs on several cloud instance types.

    Hardware                      async   sync
    AWS m7g                       26%     0%
    AWS m6g                       23%     0%
    Azure Dpsv6                   51%     0%
    HP EliteBook (Ryzen 5675U)    75%     0%

    I didn’t expect the spread in async failure rates. My current guess is that some combination of CPU and/or IO saturation characteristics changes the window for unreplicated commits. The takeaway for teachers and students: if you want to reliably see data loss, Azure Dpsv6 performed best in my runs (about half of iterations saw data loss).

    What this means

    • Synchronous replication remains the guardrail. Across thousands of minutes of testing, I did not observe a single instance of data loss with sync enabled under these test configurations.
    • Topology matters. Two instances (one replica) increases the chance of async loss versus three instances.
    • Workload shape matters. Higher rates raise loss frequency; fewer client threads can reduce it even at similar throughput.
    • Hardware matters. Different CPU/IO profiles change how often you’ll catch an in‑flight commit during a crash.

    Reproduce it yourself

    Use the CloudNativePG LAB and Exercise 3 to run the Jepsen “append” workload and induce rapid primary failures. The looped test and automatic report upload are included. If your goal is to demonstrate loss in async mode, start with:

    • 2 instances
    • rate 3000
    • 50 threads

    If Jepsen analysis is stalling and timing out, try reducing the rate to 2000. And if you have the option, try Azure Dpsv6 for the highest chance of observing loss quickly.
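    To flip the same cluster over to synchronous replication and verify the zero-loss result, a minimal CNPG manifest along these lines should work. This is a sketch, not the lab's actual manifest: the cluster name is hypothetical, and the minSyncReplicas/maxSyncReplicas fields should be checked against the CNPG Cluster API docs for your operator version:

```yaml
# Hypothetical sketch of a 2-instance CNPG cluster with synchronous
# replication enabled (one synchronous standby).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: jepsen-lab        # placeholder name
spec:
  instances: 2
  minSyncReplicas: 1
  maxSyncReplicas: 1
  storage:
    size: 1Gi
```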


    About Jeremy

    Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

    Discussion

    Trackbacks/Pingbacks

    1. Pingback: Losing Data with PostgreSQL and Jepsen – Curated SQL, September 29, 2025


    Disclaimer

    This is my personal website. The views expressed here are mine alone and may not reflect the views of my employer. I am currently looking for consulting and/or contracting work in the USA around the Oracle database ecosystem.

    contact: 312-725-9249 or schneider @ ardentperf.com


