Ardent Performance Computing

Jeremy Schneider

    Data Safety on a Budget

    Filed Under: kubernetes, Linux, Oracle, Planet, PostgreSQL, Technical

    Many experienced DBAs joke that you can boil down the entire job to a single rule of thumb: Don’t lose your data. It’s simple, memorable, and absolutely true – albeit a little oversimplified.

    Mark Porter’s Cultural Hint “The Onion of our Requirements” conveys the same idea with a lot more accuracy:

    We need to always make sure we prioritize our requirements correctly. In order, we think about Security, Durability, Correctness, Availability, Scalability via Scale-out, Operability, Features, Performance via Scaleup, and Efficiency. What this means is that for each item on the left side, it is more important than the items on the right side.

    But this does not tell the whole story. If we’re honest, there is one critical principle of equal importance to everything on this list: Don’t lose all your money.

    Every adult who’s managed their own finances knows we don’t have infinite money. Yes, we want to keep the data safe. We also want to be smart about spending our money.

    Relational databases are one of the most powerful and versatile places to store your data – and they are also one of the most expensive places to store your data. Just look at the per-GB pricing of block storage with provisioned IOPS and low latency, then compare with the pricing of object storage. No contest. Any time a SQL database is beginning to approach the TB range, we definitely should be looking at the largest tables and asking whether significant portions of that data can be moved to cheaper storage – for example parquet files on S3. (Or F3 files?)
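    To make that trade-off concrete, here is a tiny back-of-the-envelope sketch. The per-GB monthly prices and the 800GB figure are placeholder assumptions for illustration only, not quotes from any provider:

```python
# Illustrative only: prices below are placeholder assumptions, not real quotes.
BLOCK_STORAGE_GB_MONTH = 0.125   # assumed: provisioned-IOPS block storage
OBJECT_STORAGE_GB_MONTH = 0.023  # assumed: object storage such as S3

# Suppose 800 GB of a ~1 TB database is rarely-touched history that could
# move to parquet files on object storage.
cold_data_gb = 800

monthly_savings = cold_data_gb * (BLOCK_STORAGE_GB_MONTH - OBJECT_STORAGE_GB_MONTH)
print(f"~${monthly_savings:.0f}/month saved")
```

    Even at these made-up rates the gap is large, and real block storage with provisioned IOPS usually widens it further.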

    Of course, sometimes we need fast powerful SQL and joins and transactions. So relational databases also should run as efficiently as possible. This has direct implications around how we keep the data safe.

    From personal photos to enterprise databases, the core of all data safety is copies of the data. Logs and row-store/column-store files (and indexes) are data copies in different formats. You could almost parse the entire database industry through a lens that compares how each technology is just a unique way to replicate data between different formats and places. The revered and time-honored “3-2-1 Backup Rule” is all about copies of the data. From an information theory standpoint, it can be argued that even RAID5 parity, checksums, CRCs, and hashes are a shadow or fingerprint “copy” of the original – even though they aren’t literal full copies of the data.
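    As a toy illustration of that last point, RAID5-style XOR parity is a derived “copy” that can reconstruct a lost block from the surviving ones. A minimal Python sketch (my example, not code from any real RAID implementation):

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together, as RAID5 parity does per stripe."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data1 = b"hello networld!!"        # two equal-sized "data blocks"
data2 = b"postgres is neat"
parity = xor_blocks(data1, data2)  # the parity "fingerprint" block

# If data1 is lost, XOR-ing the parity with the surviving block recovers it.
recovered = xor_blocks(parity, data2)
assert recovered == data1
```

    The parity block is the same size as one data block, yet it lets us survive the loss of any single block – a partial copy, in the information-theory sense.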

    One of my favorite cultural hints from Mark is: Don’t Let Entropy Win.

    In the absence of people making things better, they will get worse. It’s just a fact.

    This isn’t Mark’s point, but I think it’s a related concept: at every business that’s successful enough to grow large, there is a natural gravitation toward forming silos of technology. I think of this as a kind of entropy that we need to actively counteract in every large business. Let’s look at an example: an enterprise business team building a public API needs a 600GB write-intensive database. Suppose we can buy enterprise-grade high-endurance NVMe SSDs (suited to write-intensive database workloads) for $1000 each. How much will the storage cost to “keep the data safe” for this public API?

    1. The business team provisions three environments: one for production and two more for development and testing.
    2. For business continuity in case of regional problems, the database team creates primary and replica CloudNativePG clusters, so that we are able to run from either of our two regions.
    3. To maintain high availability, the database team configures CloudNativePG with three instances within each region, and they configure preferred anti-affinity so that kubernetes will attempt to schedule the three instances in different buildings or availability zones.
    4. Persistent storage is provided by the storage team who configures ceph volumes backed by two mirror copies.
    5. Object storage for backups uses two mirror copies.
    6. Servers are built by the infrastructure team who configure RAID 1 (mirroring).

    In the worst case, we can easily end up spending $96,000 on disks alone – for a database that can fit on a single $1000 enterprise drive! Now that is some crazy storage amplification.
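    Here is one way the layers above can multiply out to that figure, sketched in Python (my worst-case arithmetic, using the hypothetical $1000 drives from the example):

```python
# Worst-case multiplication of the redundancy layers from the example above.
DRIVE_COST = 1_000  # USD per 600 GB write-intensive enterprise NVMe drive

environments = 3  # production, development, testing
regions = 2       # primary and replica CloudNativePG clusters
instances = 3     # CNPG instances per cluster for high availability
ceph_copies = 2   # storage team: mirrored ceph volumes
raid_copies = 2   # infrastructure team: RAID 1 on each server

# Copies of the live database: every factor compounds.
db_drives = environments * regions * instances * ceph_copies * raid_copies

# Backups: mirrored object storage, itself sitting on RAID 1 servers.
backup_copies = 2
backup_drives = environments * regions * backup_copies * raid_copies

total = db_drives + backup_drives
print(db_drives, backup_drives, total * DRIVE_COST)
```

    That works out to 72 drives for live copies plus 24 more under the backups: 96 drives, or $96,000, because each siloed team independently adds its own redundancy layer without seeing the whole stack.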

    In order to take a smarter approach, let’s work backwards from the problems we’re solving. When we say “keep the data safe” – what are some specific situations we want to protect the data from?

    1. Unavailability during maintenance & deployments at all levels of the stack
    2. Operational mistakes
    3. Software bugs at all levels of the stack, from business app to firmware
    4. Hardware failures of disks
    5. Hardware failures of servers/compute, which can make good disks temporarily inaccessible
    6. External threats from direct attacks, malware, social engineering, supply chain attacks, etc.
    7. Insider threats arising from situations like personal grievances or personal financial pressures
    8. Natural disasters (and perhaps political disasters…)

    Armed with a list, we can now ask ourselves: what is an economical solution that addresses everything here? There isn’t one right answer, but we probably don’t need 12 physical copies of each database per data center. A few ideas:

    • Three CNPG instances that use local SSD storage directly (no hardware RAID), for a total of three copies in the primary data center.
    • Two or three CNPG instances that use either ceph block storage or local SSD with hardware RAID (but not both) for a total of four or six copies in the primary data center.
    • A single CNPG instance in the second data center, with the capability to dynamically add instances on switchovers/failovers.
    • Slower, less expensive disks for development databases.
    • No standby CNPG instance in the second data center for development databases (no immediate switchover/failover).
    • Testing tier that matches production config but can be provisioned on demand from backups for load testing, and deprovisioned when unused for some period of time. Development tier also provisioned on demand and deprovisioned when unused for some period of time.
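    For contrast with the $96,000 worst case, a quick sketch of the drive count for one of the leaner layouts above (my assumptions: three local-SSD instances with no RAID and no ceph, one standby instance in the second region, and shared mirrored backups):

```python
# One leaner layout for the production database, using the same $1000 drives.
DRIVE_COST = 1_000

# Three CNPG instances on local SSD: no ceph mirroring, no hardware RAID,
# so each instance is exactly one physical copy.
primary_copies = 3 * 1 * 1

# A single CNPG instance in the second data center, with the capability to
# dynamically add instances on switchover/failover.
secondary_copies = 1

# Mirrored object-storage backups (two copies).
backup_copies = 2

production_drives = primary_copies + secondary_copies + backup_copies
print(production_drives, production_drives * DRIVE_COST)
```

    Six drives, or $6,000, while still covering disk failure, server failure, availability-zone problems, and regional disasters from the threat list above.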

    There are many ways to keep data safe on a reasonable budget – these are just a few ideas.
