
PostgreSQL High Performance Cookbook: Mastering query optimization, database monitoring, and performance-tuning for PostgreSQL


Chapter 1. Database Benchmarking

In this chapter, we will cover the following recipes:

  • CPU benchmarking
  • Memory benchmarking
  • Disk benchmarking
  • Performing a seek rate test
  • Working with the fsync commit rate
  • Checking IOPS
  • Storage sizing
  • Discussing RAID levels
  • Configuring pgbench
  • Running read/write pgbench tests

Introduction

PostgreSQL is renowned in the database management system world, and with every release it gains popularity thanks to its advanced features and performance. This cookbook is designed to cover most of the major PostgreSQL features and to show how to achieve good performance with the help of proper hardware/software benchmarking tools. It also discusses the high availability options available with PostgreSQL, and gives some details about how to migrate your database from other commercial databases.

To benchmark the database server, we need to benchmark several hardware/software components. In this chapter, we will discuss major tools that are especially designed to benchmark a certain component.

I would like to thank the Phoronix Test Suite team for allowing me to discuss their benchmarking tool. Phoronix is an open source benchmarking framework that, by default, provides test cases for several hardware/software components; thanks to its extensible architecture, we can also write our own test suite with a set of benchmarking test cases. Phoronix also supports uploading your benchmarking results to http://openbenchmarking.org/, a public/private benchmark results repository where we can compare our benchmarking results with others.

Note

Go to the following URL for installation instructions for Phoronix Test Suite: http://www.phoronix-test-suite.com/?k=downloads.

CPU benchmarking

In this recipe, let's discuss how to benchmark the CPU speed using various open source benchmarking tools.

Getting ready

One way to benchmark CPU power is to measure the wall clock time for a submitted task. The task can be something like calculating the factorial of a given number, calculating the nth Fibonacci number, or some other CPU-intensive computation.
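
For instance, a minimal wall-clock check can be run from the shell on each server and the reported times compared; the factorial depth of 2000 here is an arbitrary, purely illustrative choice:

$ time echo 'define f(n) { if (n < 2) return(1); return(n * f(n-1)); } f(2000)' | bc > /dev/null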

How to do it...

Let us discuss how to configure the Phoronix and sysbench tools to benchmark the CPU:

Phoronix

Phoronix supports a set of CPU tests in a test suite called CPU. This test suite covers multiple CPU-intensive tasks, which are mentioned at the following URL: https://openbenchmarking.org/suite/pts/CPU.

If you want to run this CPU test suite, you need to execute the phoronix-test-suite benchmark cpu command as the root user. We can also run a specific test by mentioning its test name. For example, let's run a sample CPU benchmarking test as follows:

$ phoronix-test-suite benchmark pts/himeno
Phoronix Test Suite v6.8.0
    To Install: pts/himeno-1.2.0
    ...
    1 Test To Install
    pts/himeno-1.2.0:
        Test Installation 1 of 1
        1 File Needed
        Downloading: himenobmtxpa.tar.bz2
 Started Run 2 @ 05:53:40
 Started Run 3 @ 05:54:35
 [Std. Dev: 1.66%]
Test Results:
 1503.636072
 1512.166077
 1550.985494
Average: 1522.26 MFLOPS

Phoronix also provides a way to observe the detailed test results via an HTML file. It also supports the offline generation of PDF, JSON, CSV, and text format outputs. To open these test results in the browser, we need to execute the following command:

$ phoronix-test-suite show-result <Test Name>

The following is a sample screenshot of the results of the preceding command:

[Screenshot: Phoronix HTML test results]

sysbench

The sysbench tool provides a cpu test, which calculates the prime numbers within a given range and reports the elapsed CPU time. Let's execute the sysbench command as follows, to retrieve the CPU measurements:

[root@localhost ~]# sysbench --test=cpu --cpu-max-prime=10000 --num-threads=4 run
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 10000
Test execution summary:
    total time:                          3.2531s
    total number of events:              10000
    total time taken by event execution: 13.0040
    per-request statistics:
         min:                                  1.10ms
         avg:                                  1.30ms
         max:                                  8.60ms
         approx.  95 percentile:               1.43ms
Threads fairness:
    events (avg/stddev):           2500.0000/8.46
    execution time (avg/stddev):   3.2510/0.00

How it works...

The preceding results were collected from CentOS 7, running as a virtual machine on a Windows 10 host. The virtual machine has four processing units (CPU cores) from an Intel Core i7-4510U, CPU family 6.

Phoronix

The URL http://openbenchmarking.org/ provides a detailed description of each test along with its implementation, and I would encourage you to read more about the himeno test case there.

sysbench

From the previous results, the system takes 3.2531 seconds to check the prime numbers up to 10,000, with the help of four background threads.

Memory benchmarking

In this recipe, we will be discussing how to benchmark the memory speed using open source tools.

Getting ready

As with the CPU test suite, Phoronix provides a memory test suite, which covers RAM benchmarking. Alternatively, we can use the dedicated memtest86 benchmarking tool, which performs memory benchmarking during the server boot phase. Another neat trick is to create a tmpfs mount point in RAM and then create a PostgreSQL tablespace on it. Once we create the tablespace, we can create in-memory tables there and benchmark table read/write operations. We can also use the dd command to measure memory read/write speed.
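
As a minimal sketch of the dd approach (the file name and sizes are arbitrary choices), we can write and then re-read a file on the RAM-backed /dev/shm mount and note the throughput that dd reports:

$ dd if=/dev/zero of=/dev/shm/memtest bs=1M count=512
$ dd if=/dev/shm/memtest of=/dev/null bs=1M
$ rm /dev/shm/memtest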

How to do it...

Let us discuss how to run the Phoronix memory test suite and how to configure a tmpfs mount point in Linux:

Phoronix

Let's execute the following phoronix-test-suite command, which will install the memory test suite and perform the memory benchmarking. Once the benchmarking is completed, observe the HTML report as described earlier:

$ phoronix-test-suite benchmark pts/memory
Phoronix Test Suite v6.8.0
    Installed: pts/ramspeed-1.4.0
    To Install: pts/stream-1.3.1
    To Install: pts/cachebench-1.0.0

tmpfs

In Linux, tmpfs is a temporary filesystem that uses RAM rather than disk storage. Anything we store in tmpfs will be cleared once we restart the system.

Note

Refer to the following URLs for more information about tmpfs: https://en.wikipedia.org/wiki/Tmpfs and https://www.jamescoyle.net/knowledge/1659-what-is-tmpfs.

Let's create a new mount point based on tmpfs using the following commands:

# mkdir -p /memmount
# mount -t tmpfs -o size=1g tmpfs /memmount
# df -kh -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.9G   96K  1.9G   1% /dev/shm
tmpfs           1.9G  8.9M  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           1.0G     0  1.0G   0% /memmount

Let's create a new folder in /memmount and assign it to a tablespace:

# mkdir -p /memmount/memtabspace
# chown -R postgres:postgres /memmount/memtabspace/

postgres=# CREATE TABLESPACE memtbs LOCATION '/memmount/memtabspace';
CREATE TABLESPACE
postgres=# CREATE TABLE memtable(t INT) TABLESPACE memtbs;
CREATE TABLE

Write test

postgres=# INSERT INTO memtable VALUES(generate_series(1, 1000000));
INSERT 0 1000000
Time: 1372.763 ms
postgres=# SELECT pg_size_pretty(pg_relation_size('memtable'::regclass));
 pg_size_pretty
----------------
 35 MB
(1 row)

From the preceding results, inserting 1 million records took approximately 1.4 seconds and produced a 35 MB table, which works out to a write speed of roughly 25 MB per second.

Read test

postgres=# SELECT COUNT(*) FROM memtable;
  count
---------
 1000000
(1 row)
Time: 87.333 ms

From the preceding results, reading the 1 million records took approximately 90 milliseconds, a read speed of roughly 385 MB per second, which is pretty fast for the local system configuration. The read test was performed after clearing the system cache and restarting the PostgreSQL instance, which keeps the system buffers out of the measurement.
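
For reference, a typical way to clear the caches before such a read test looks like the following; the PostgreSQL service name is an assumption, so adjust it for your installation:

# sync
# echo 3 > /proc/sys/vm/drop_caches
# systemctl restart postgresql-9.6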

How it works...

In the preceding tmpfs example, we created an in-memory table, so all the system calls PostgreSQL performs to read/write the data hit memory directly rather than the disk, which gives a major performance boost. Also, remember to drop these in-memory tablespaces and tables after testing, since their underlying files will physically vanish after a system reboot.

Disk benchmarking

In this recipe, we will be discussing how to benchmark the disk speed using open source tools.

Getting ready

The well-known command for disk I/O benchmarking is dd. We can use the dd command to measure read/write operations by specifying the required block size, and we can also measure direct I/O by skipping the system write buffers. Similarly, Phoronix provides a complete disk test suite, just as it does for CPU and memory, which performs different storage-related tests. Another famous disk benchmarking tool is bonnie++, which provides more flexibility in measuring disk I/O.
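
A minimal dd sketch of that direct I/O measurement is shown here; the file location, block size, and count are arbitrary, and oflag=direct/iflag=direct bypass the page cache so the numbers reflect the device rather than memory:

$ dd if=/dev/zero of=/tmp/ddfile bs=8k count=131072 oflag=direct
$ dd if=/tmp/ddfile of=/dev/null bs=8k iflag=direct
$ rm /tmp/ddfile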

How to do it...

Let us discuss how to run disk benchmarking using the Phoronix and bonnie++ testing tools:

Phoronix

To run the complete disk test suite on the system, run the following command:

$ phoronix-test-suite benchmark pts/disk

Phoronix also supports a quick I/O test, where you can perform an instant disk performance check using the following command, which interactively collects the input and then runs the test cases:

$ phoronix-test-suite benchmark pts/iozone
Phoronix Test Suite v6.8.0
    Installed: pts/iozone-1.8.0
Disk Test Configuration
        1: 4Kb
        2: 64Kb
        3: 1MB
        4: Test All Options
        Record Size: 1
    1: 512MB
    2: 2GB
    3: 4GB
    4: 8GB
    5: Test All Options
    File Size: 1
    1: Write Performance
    2: Read Performance
    3: Test All Options
    Disk Test: 3

bonnie++

bonnie++ is a filesystem and disk-level benchmarking tool that can perform the same test multiple times. You can install this tool using either yum or apt-get install, or by building it from the source code. Let's run the bulk I/O test case using the following arguments, where it creates an 8 GB file set:

$ /usr/local/sbin/bonnie++ -D -d /tmp/ -s 8G -b
Writing with putc()...done
Writing intelligently...done
...
localhost.localdomain,8G,68996,106,14151,53,46772,15,95343,93,123633,16,201.0,7,16,795,58,+++++,+++,733,46,757,57,+++++,+++,592,38

How it works...

Let us discuss how bonnie++ performs the benchmarking, and what tools it offers to help understand the benchmarking results:

bonnie++

In the preceding test case, we instructed bonnie++ to use only direct I/O with the -D option. Also, we asked it to create 8 GB of test files in the /tmp/ location to measure the disk speed. As the final output, bonnie++ prints CSV values, which we need to feed to the bon_csv2html command; this produces an HTML table with detailed information about the test results, as shown in the following screenshot:

$ echo "localhost.localdomain,8G,68996,106,14151,53,46772,15,95343,93,123633,16,201.0,7,16,795,58,+++++,+++,733,46,757,57,+++++,+++,592,38"|bon_csv2html > ~/Desktop/bonresults.html

[Screenshot: bon_csv2html results table]

bonnie++ performs three different kinds of tests for disk benchmarking: read, write, and seek speed. We will discuss the seek rate in a later recipe. For good disk performance, bonnie++ recommends high values in the /sec columns of the preceding table and lower %CPU values. Also, +++++ shows that a test was not measured accurately by bonnie++, as it completed too quickly with the provided arguments. To get complete results, we need to rerun the same test with a larger workload using the -n option, so bonnie++ gets enough time/resources to complete the job.
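
For example, a rerun with a larger small-file workload might look like the following; the -n 16 value (meaning 16*1024 files) is only an illustrative choice:

$ /usr/local/sbin/bonnie++ -D -d /tmp/ -s 8G -n 16 -b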

Performing a seek rate test

In this recipe, we will be discussing how to benchmark the disk seek rate speed using open source tools.

Getting ready

A file can be read from disk in two ways: sequentially and at random. Reading a file in sequential order requires less effort than reading it in random order. In PostgreSQL and other database systems, files frequently need to be read in random order because of index scans: for each index lookup, the relation file has to fetch data randomly, moving its file pointer backward and forward, which adds mechanical overhead on a spinning HDD. On an SSD this overhead is lower, as it uses flash memory. This is one of the reasons why random_page_cost is always set higher than seq_page_cost in postgresql.conf. In the previous bonnie++ example, random seeks were measured at 201.0 per second, using 7% of the CPU.
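
As a quick illustration, these planner costs can be inspected from psql; the values shown are the defaults:

postgres=# SHOW seq_page_cost;
 seq_page_cost
---------------
 1
(1 row)

postgres=# SHOW random_page_cost;
 random_page_cost
------------------
 4
(1 row)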

How to do it...

We can use the same bonnie++ utility command to measure the random seek rate, or we can also use another disk latency benchmarking tool called ioping:

# ioping -R /dev/sda3 -s 8k -w 30

--- /dev/sda3 (block device 65.8 GiB) ioping statistics ---
2.23 k requests completed in 29.2 s, 17.5 MiB read, 76 iops, 613.4 KiB/s
generated 2.24 k requests in 30.0 s, 17.5 MiB, 74 iops, 596.2 KiB/s
min/avg/max/mdev = 170.6 us / 13.0 ms / 73.5 ms / 5.76 ms

How it works...

ioping is a disk latency benchmarking tool that produces output similar to the network utility ping. Like bonnie++, it supports benchmarking with or without the cache, and it also covers synchronous and asynchronous I/O latency. You can install this tool using yum or apt-get on the respective Linux distributions. The preceding test was run for 30 seconds with PostgreSQL's default block size of 8 KB. ioping provides another useful feature called ping-pong mode for read/write, which displays the instant read/write speed of the disk, as shown here:

$ ioping -G /tmp/ -D -s 8k
8 KiB >>> /tmp/ (xfs /dev/sda3): request=1 time=1.50 ms (warmup)
8 KiB <<< /tmp/ (xfs /dev/sda3): request=2 time=9.73 ms
8 KiB >>> /tmp/ (xfs /dev/sda3): request=3 time=2.00 ms
8 KiB <<< /tmp/ (xfs /dev/sda3): request=4 time=1.02 ms
8 KiB >>> /tmp/ (xfs /dev/sda3): request=5 time=1.95 ms

In the preceding example, we ran ioping in ping-pong mode (-G) and used direct I/O (-D) with a block size of 8 KB. We can also run the same ping-pong mode in pure cache mode using the -C option.

Working with the fsync commit rate

In this recipe, we will be discussing how to benchmark the fsync speed using open source tools.

Getting ready

fsync is a system call that flushes data from system buffers into physical files. In PostgreSQL, whenever a CHECKPOINT operation occurs, it internally issues fsync calls to flush all the modified buffers into the respective files. The fsync benchmark therefore measures the rate at which data can be transferred from memory to disk.
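
PostgreSQL also ships its own fsync micro-benchmark, pg_test_fsync, which compares the available wal_sync_method options; a minimal invocation is shown below, where the test-file path is an assumption and should point at the disk you actually care about:

$ pg_test_fsync -s 5 -f /var/lib/pgsql/9.6/data/pg_test_fsync.out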

How to do it...

To perform fsync benchmarking, we can use a dedicated benchmark test called fs-mark from Phoronix. This fs-mark test is built on top of a filesystem benchmarking tool called fs_mark; alternatively, fio also supports several fsync test cases. We can run the fs-mark test using the following command:

$ phoronix-test-suite benchmark fs-mark
FS-Mark 3.3:
    pts/fs-mark-1.0.1
Disk Test Configuration
    1: 1000 Files, 1MB Size
    2: 1000 Files, 1MB Size, No Sync/FSync
    3: 5000 Files, 1MB Size, 4 Threads
    4: 4000 Files, 32 Sub Dirs, 1MB Size
    5: Test All Options
    Test:

Note

The preceding command initially failed to install while testing on the local machine. Once I installed glibc-static via yum install, the test ran smoothly.

How it works...

Phoronix installs the required binaries on the local machine when we start benchmarking the corresponding test. In the preceding command, we are benchmarking the fs-mark test, which installs the tool at ~/.phoronix-test-suite/installed-tests/pts/fs-mark-1.0.1/fs_mark-3.3. Let's go to that location and see what fsync tests it supports:

./fs_mark -help
Usage: fs_mark
    -S Sync Method (
        0:No Sync,
        1:fsyncBeforeClose,
        2:sync/1_fsync,
        3:PostReverseFsync,
        4:syncPostReverseFsync,
        5:PostFsync,
        6:syncPostFsync)

I would encourage you to read the readme file, which exists in the same location, for detailed information about the sync methods. Let's run a simple fs_mark benchmark by choosing one sync method, as shown here:

./fs_mark -w 8096 -S 1 -s 102400 -d /tmp/ -L 3 -n 500
#  ./fs_mark  -w  8096  -S  1  -s  102400  -d  /tmp/  -L  3  -n  500
#       Version 3.3, 1 thread(s) starting at Fri Dec 30 04:26:28 2016
#       Sync method: INBAND FSYNC: fsync() per file in write loop.
#       Directories:  no subdirectories used
#       File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
#       Files info: size 102400 bytes, written with an IO size of 8096 bytes per write
#       App overhead is time in microseconds spent in the test not doing file writing related system calls.
FSUse%        Count         Size    Files/sec     App Overhead
    39          500       102400        156.4            17903
    39         1000       102400         78.9            22906
    39         1500       102400        116.2            24269

We ran the preceding test with files of 102,400 bytes, written with an I/O size of 8,096 bytes per write. It creates 500 files per loop and repeats the test three times, using sync method 1, which calls fsync() on each file before it is closed.

Checking IOPS

In this recipe, we will be discussing how to benchmark the disk IOPS using open source tools.

Getting ready

As mentioned previously, a disk can be read in either sequential or random order. To measure the disk accurately, we need to perform plenty of random read/write operations, which put more stress on the disk. To calculate the IOPS (Input/Output Operations Per Second) of a disk, we can use either the fio or bonnie++ tools, which perform sequential/random operations on the disk. In this chapter, let's use the fio (Flexible I/O) tool to calculate the disk's IOPS.

How to do it...

Let's download the latest version of the fio module from http://brick.kernel.dk/snaps/, and also install libaio-devel, which provides the ioengine we will be using for the IOPS tests. The ioengine defines how the fio module submits I/O requests to the kernel. There are multiple ioengines you can specify for the I/O requests, such as sync, mmap, and so on; refer to fio's man page for all the supported ioengines. After downloading the fio module, let's follow the regular Linux source installation method of configure, make, and make install.
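
A rough sketch of that installation sequence follows; the fio version in the file name is a placeholder, not a specific release:

$ yum install -y libaio-devel
$ tar -xf fio-X.Y.tar.gz
$ cd fio-X.Y
$ ./configure
$ make
# make install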

Sequential mixed read and write

Let's run a sample sequential mixed read/write, as shown here:

$ ./fio --ioengine=libaio --direct=1 --name=test_seq_mix_rw --filename=test_seq --bs=8k --iodepth=32 --size=1G --readwrite=rw --rwmixread=50
test_seq_mix_rw: (g=0): rw=rw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=32
...
test_seq_mix_rw: (groupid=0, jobs=1): err= 0: pid=43596: Fri Dec 30 23:31:11 2016
  read : io=525088KB, bw=1948.1KB/s, iops=243, runt=269430msec
...
    bw (KB/s)  : min=   15, max= 6183, per=100.00%, avg=2002.59, stdev=1253.68
  write: io=523488KB, bw=1942.1KB/s, iops=242, runt=269430msec
...
    bw (KB/s)  : min=  192, max= 5888, per=100.00%, avg=2001.74, stdev=1246.19
...
Run status group 0 (all jobs):
   READ: io=525088KB, aggrb=1948KB/s, minb=1948KB/s, maxb=1948KB/s, mint=269430msec, maxt=269430msec
  WRITE: io=523488KB, aggrb=1942KB/s, minb=1942KB/s, maxb=1942KB/s, mint=269430msec, maxt=269430msec
Disk stats (read/write):
  sda: ios=65608/65423, merge=0/5, ticks=869519/853644, in_queue=1723445, util=99.85%

Random mixed read and write

Let's run a sample random mixed read/write, as shown here:

$ ./fio --ioengine=libaio --direct=1 --name=test_rand_mix_rw --filename=test_rand --bs=8k --iodepth=32 --size=1G --readwrite=randrw --rwmixread=50
test_rand_mix_rw: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=32
...
test_rand_mix_rw: (groupid=0, jobs=1): err= 0: pid=43893: Fri Dec 30 23:49:19 2016
  read : io=525088KB, bw=1018.9KB/s, iops=127, runt=515375msec
...
    bw (KB/s)  : min=    8, max= 6720, per=100.00%, avg=1124.47, stdev=964.38
  write: io=523488KB, bw=1015.8KB/s, iops=126, runt=515375msec
...
    bw (KB/s)  : min=    8, max= 6904, per=100.00%, avg=1125.46, stdev=975.04
...
Run status group 0 (all jobs):
   READ: io=525088KB, aggrb=1018KB/s, minb=1018KB/s, maxb=1018KB/s, mint=515375msec, maxt=515375msec
  WRITE: io=523488KB, aggrb=1015KB/s, minb=1015KB/s, maxb=1015KB/s, mint=515375msec, maxt=515375msec
Disk stats (read/write):
  sda: ios=65609/65456, merge=0/4, ticks=7382037/5520238, in_queue=12902772, util=100.00%

How it works...

We ran the preceding test cases against a 1 GB file (--size) without any cache (--direct), issuing 32 concurrent I/O requests (--iodepth) with a block size of 8 KB (--bs), split as 50% read and 50% write operations (--rwmixread). Note that the reported IOPS is simply the bandwidth divided by the block size; for example, 1948.1 KB/s divided by 8 KB per request gives roughly 243 IOPS. From the preceding results, the bw (bandwidth) and IOPS values of the sequential test are much higher than those of the random test. That is, the sequential test achieves roughly double the IOPS (read=243, write=242) of the random test (read=127, write=126).

fio also reports more information, such as I/O submission latency and completion latency, along with CPU usage for the conducted test cases. I would encourage you to read more about fio's features in its man pages.

Storage sizing

In this recipe, we will be discussing how to estimate disk growth using the pgbench tool.

Getting ready

One of the best ways to predict database disk storage requirements is to load a set of sample data into the application's database and simulate production-like activity using pgbench over a long period. At a regular interval (for example, every hour), let's collect the database size using pg_database_size() or any native command that returns disk usage information. Once we have these periodic samples for at least 24 hours, we can find the average disk growth rate by averaging the deltas between consecutive interval values.

How to do it...

Prepare the SQL script as follows, which simulates the live application behavior in the database:

Create connection;            --- Create/Use pool connection.
INSERT operation              --- Initial write operation.
SELECT pg_sleep(0.01);        --- Some application code runs here, and waiting for the next query.
UPDATE operation              --- Update other tables for the newly inserted records.
SELECT pg_sleep(0.1);         --- Updating other services which shows the live graphs on the updated records.
DELETE operation              --- Delete or purge any unnecessary data.
SELECT pg_sleep(0.01);        --- Some application code overhead.
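
As a concrete, hedged version of this sketch (the table names, columns, and sleep values are hypothetical, the connection itself is handled by pgbench's -c option, and the random() expression requires pgbench 9.6 or later), the custom script passed to pgbench with -f might look like this:

\set id random(1, 1000000)
INSERT INTO app_events(id, payload) VALUES (:id, 'event');
SELECT pg_sleep(0.01);
UPDATE app_event_stats SET hits = hits + 1 WHERE event_id = :id;
SELECT pg_sleep(0.1);
DELETE FROM app_events WHERE created_at < now() - interval '1 day';
SELECT pg_sleep(0.01);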

Let's run the following pgbench test case, with the preceding test file for 24 hours:

$ pgbench -T 86400 -f <script location> -c <number of concurrent connections>

In parallel, let's schedule a job that collects the database size every hour using the pg_database_size() function, and another job that runs VACUUM on the database every 10 minutes. The VACUUM job takes care of logically reclaiming the dead tuples at the database level. However, on production servers we would not deploy a VACUUM job every 10 minutes, as the autovacuum process takes care of the dead tuples. Since this test is not about database performance benchmarking, we could instead simply make autovacuum more aggressive on the database side.
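
Assuming the jobs run from the postgres OS user's crontab (the log path and database name are assumptions), the two entries could look like this:

0 * * * *    psql -At -d postgres -c "SELECT now(), pg_database_size('postgres')" >> /var/lib/pgsql/db_growth.log
*/10 * * * * psql -d postgres -c "VACUUM"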

How it works...

Once we find the average disk growth per day, we can predict the database growth for the next one or two years. However, the database write rate also increases as the business grows, so we should keep the database growth script deployed, or analyze disk storage trends from a monitoring tool, to keep refining the storage size prediction.

Discussing RAID levels

In this recipe, we will discuss various RAID levels and their usage.

Getting ready

In this recipe, we will discuss several RAID levels that we commonly configure for database requirements. A hardware RAID (Redundant Array of Independent Disks) setup has a dedicated controller to deal with multiple disks, including a separate processor along with a battery-backed cache, so data can still be flushed to disk properly when a power failure occurs.

How to do it...

RAID levels are differentiated by their configurations. RAID supports configuration techniques such as striping, mirroring, and parity to improve disk storage performance or high availability. The most popular RAID levels are 0 through 6, and each level provides its own trade-off between disk storage capacity, read/write performance, and high availability. The common RAID levels we configure for a DBMS are 0, 1, 5, 6, or 10 (1 and 0 combined).

How it works...

Let us discuss how the most commonly used RAID levels work:

RAID 0

This configuration focuses only on read/write performance, by striping the data across multiple devices. With this configuration, we can allocate the complete disk storage for the application's data. The major drawback of this configuration is the lack of high availability: a single disk failure renders the remaining disks useless, as they are missing the chunks from the failed disk. This is not a recommended RAID configuration for real-time database systems, but it is suitable for storing non-critical business data such as historical application logs, database logs, and so on.

RAID 1

This configuration focuses on high availability rather than performance, by mirroring the data across two disk drives. That is, a copy of the data is kept on both disks. If one disk is corrupted, we can still use the other one for read/write operations. This is also not a recommended configuration for real-time database systems, as it lacks write performance. Also, in this configuration we only utilize 50% of the disk space to store the actual data, with the rest keeping the duplicated information for high availability. It is a recommended configuration where the durability of data matters more than write performance.

RAID 5

This configuration provides more storage and high availability, by distributing parity blocks across the disks. Unlike RAID 1, it offers more disk space for the actual data, as the parity blocks are spread among the disks. If one disk is corrupted, we can use the parity blocks from the other disks to reconstruct the missing data. However, this is also not a recommended configuration for write-heavy databases, since every write operation needs to compute and update the parity blocks, which adds overhead.

RAID 6

This configuration provides more redundancy than RAID 5 by storing two parity blocks for each write operation. That is, even if two disks become corrupted, RAID 6 can still recover the data from the parity blocks, unlike RAID 5. This configuration is also not recommended for database systems, as its write performance is lower than that of the previous RAID levels.

RAID 10

This configuration is the combination of RAID levels 1 and 0. That is, the data is striped across multiple disks and also mirrored to another set of disks. It is the most recommended RAID level for real-time business applications, where we achieve better performance than RAID 1 and higher availability than RAID 0.

Configuring pgbench

In this recipe, we will be discussing how to configure the pgbench to perform various test cases.

Getting ready

By default, PostgreSQL provides a tool, pgbench, which runs a default test scenario based on TPC-B and simulates a live database load on the server. Using this tool, we can estimate the tps (transactions per second) capacity of the server by conducting dedicated read-only or read/write test cases. Before running pgbench test cases on the server, we need to fine-tune the PostgreSQL parameters so that the server resources are fully utilized. Also, it's good practice to run pgbench from a remote machine whose network latency to the database node is negligible.

How to do it...

As aforementioned, pgbench simulates a TPC-B-like workload on the server by executing three UPDATE statements, followed by a SELECT and an INSERT statement, against different pre-defined pgbench tables. If we want to use those pre-defined tables, then we need to initialize pgbench using the -i or --initialize option; otherwise, we can write a customized SQL script.

To get effective results from pgbench, we need to fine-tune the PostgreSQL server with the following parameters:

  • shared_buffers: This is the amount of memory for database operations
  • huge_pages: This improves the OS-level memory management
  • work_mem: This is the amount of memory for each backend for data operations such as sort, join, and so on
  • autovacuum_work_mem: This is the amount of memory for the postgres internal process autovacuum
  • max_worker_processes: This is the maximum number of worker processes for the database
  • max_parallel_workers_per_gather: This is the number of worker processes to consider for a gather node type
  • max_connections: This is the number of database connections
  • backend_flush_after: Do fsync, after this many bytes have been flushed to disk by each user process
  • checkpoint_completion_target: This is used to set I/O usage ratio during the checkpoint
  • archive_mode: This is used to determine whether the database should run in the archive mode
  • log_lock_waits: This is used to log useful information about concurrent database locks
  • log_temp_files: This is used to log information about the database's temporary files
  • random_page_cost: This is used to set random page cost value for the index scans

Note

You can find other parameters, which are also important to review before conducting any benchmarking on the database server, at the following URL: https://www.postgresql.org/docs/9.6/static/runtime-config.html.

Another good practice for performance is to keep the transaction logs (pg_xlog) on a separate mount point, and also to use dedicated tablespaces for tables and indexes. While performing pgbench testing with the predefined tables, we can specify these tablespaces using the --index-tablespace and --tablespace options.
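
For instance, initializing the pgbench tables at scale factor 100 (roughly 1.5 GB of data) with separate table and index tablespaces might look like the following; the tablespace names are assumptions and must already exist:

$ pgbench -i -s 100 --tablespace=data_ts --index-tablespace=idx_ts postgres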

How it works...

As we discussed earlier, pgbench is a TPC-B-style benchmarking tool for PostgreSQL, which simulates a live transaction load on the database server while collecting the required metrics such as tps, latency, and so on. Using pgbench, we can also increase the database size by choosing the scale factor when initializing the predefined tables. If you want to test many concurrent connections to the database using a pooling mechanism, it's good practice to configure pgbouncer/pgpool on the local database node to reuse connections.

Note

For more features and options with the pgbench tool, visit https://www.postgresql.org/docs/9.6/static/pgbench.html.

Running read/write pgbench test cases

In this recipe, we will be discussing how to perform various tests using the pgbench tool.

Getting ready

Using pgbench options, we can benchmark the database for read or write operations. Using these measurements, we can estimate the effective disk read/write speed, including the effect of the system buffers. To perform a read-only or write-only test, we can either use the built-in pgbench arguments, or create a custom SQL script with the required SELECT, INSERT, UPDATE, or DELETE statements and execute it with the required number of concurrent connections.

How to do it...

Let us briefly discuss read-only and write-only tests:

Read-only

To perform read-only benchmarking with the pgbench predefined tables, we need to use the -S option. Otherwise, as discussed earlier, we can prepare a SQL file with the required SELECT statements.
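
A minimal read-only run against the predefined tables might look like this; the client count, thread count, and duration are only illustrative:

$ pgbench -S -c 10 -j 2 -T 300 postgres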

Write-only

To perform write-only benchmarking with the pgbench predefined tables, we need to use the -N or -b simple-update option. Otherwise, as discussed earlier, we can prepare a SQL file with the required UPDATE, DELETE, and INSERT statements.
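
Similarly, a simple-update (write-mostly) run might look like this, again with illustrative client, thread, and duration values:

$ pgbench -N -c 10 -j 2 -T 300 postgres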

How it works...

While running read-only test cases, it's good practice to measure the database cache hit ratio, which indicates how much disk I/O is being avoided. You can get the database hit ratio using the following SQL command:

postgres=# SELECT TRUNC(((blks_hit)/(blks_read+blks_hit)::numeric)*100, 2) hit_ratio FROM pg_stat_database WHERE datname = 'postgres';
 hit_ratio
-----------
     99.69
(1 row)

Also, if we enable track_io_timing in postgresql.conf, PostgreSQL will record timing information about disk block read/write operations by each backend process. We can get these disk I/O timing values from the pg_stat_database catalog view.
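
A hedged example of enabling it and then reading the timings (the reported values will obviously differ per system):

postgres=# ALTER SYSTEM SET track_io_timing = on;
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
postgres=# SELECT datname, blk_read_time, blk_write_time FROM pg_stat_database WHERE datname = 'postgres';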

Note

Refer to the following URL for various pgbench-based test scenarios, covering disk, CPU, memory, and so on: https://wiki.postgresql.org/wiki/Pgbenchtesting.


Key benefits

  • Perform essential database tasks such as benchmarking the database and optimizing the server’s memory usage
  • Learn ways to improve query performance and optimize the PostgreSQL server
  • Explore a wide range of high availability and replication mechanisms to build robust, highly available, scalable, and fault-tolerant PostgreSQL databases

Description

PostgreSQL is one of the most powerful and easy to use database management systems. It has strong support from the community and is being actively developed with a new release every year. PostgreSQL supports the most advanced features included in SQL standards. It also provides NoSQL capabilities and very rich data types and extensions. All of this makes PostgreSQL a very attractive solution in software systems.

If you run a database, you want it to perform well and you want to be able to secure it. As the world’s most advanced open source database, PostgreSQL has unique built-in ways to achieve these goals. This book will show you a multitude of ways to enhance your database’s performance and give you insights into measuring and optimizing a PostgreSQL database to achieve better performance. This book is your one-stop guide to elevate your PostgreSQL knowledge to the next level. First, you’ll get familiarized with essential developer/administrator concepts such as load balancing, connection pooling, and distributing connections to multiple nodes. Next, you will explore memory optimization techniques before exploring the security controls offered by PostgreSQL. Then, you will move on to the essential database/server monitoring and replication strategies with PostgreSQL. Finally, you will learn about query processing algorithms.

Who is this book for?

If you are a developer or administrator with limited PostgreSQL knowledge and want to develop your skills with this great open source database, then this book is ideal for you. Enhancing database performance is an exciting topic for everyone, and this book will show you plenty of ways to achieve it.

What you will learn

  • Build replication strategies for homogeneous and heterogeneous databases
  • Test and build a powerful machine with multiple benchmarking techniques
  • Get to know a few SQL injection techniques
  • Find out how to manage the replication using multiple tools
  • Benchmark the database server using multiple strategies
  • Work with the query processing algorithms and their internal behaviors
  • Build a proper plan to upgrade or migrate to PostgreSQL from other databases
  • See the essential database load balancing techniques and the various partitioning approaches PostgreSQL provides
  • Learn memory optimization techniques and database server configurations

Product Details

Publication date: Mar 29, 2017
Length: 360 pages
Edition: 1st
Language: English
ISBN-13: 9781785287244
Vendor: PostgreSQL Global Development Group



Table of Contents

12 Chapters
1. Database Benchmarking
Introduction
CPU benchmarking
Memory benchmarking
Disk benchmarking
Performing a seek rate test
Working with the fsync commit rate
Checking IOPS
Storage sizing
Discussing RAID levels
Configuring pgbench
Running read/write pgbench test cases
2. Server Configuration and Control
Introduction
Starting the server manually
Stopping the server quickly
Stopping the server in an emergency
Reloading server configuration
Restarting the database server quickly
Tuning connection-related parameters
Tuning query-related parameters
Tuning logging-related parameters
3. Device Optimization
Introduction
Understanding memory units in PostgreSQL
Handling Linux/Unix memory parameters
CPU scheduling parameters
Disk tuning parameters
Identifying checkpoint overhead
Analyzing buffer cache contents
4. Monitoring Server Performance
Introduction
Monitoring CPU usage
Monitoring paging and swapping
Tracking CPU consuming processes
Monitoring CPU load
Identifying CPU bottlenecks
Identifying disk I/O bottlenecks
Monitoring system load
Tracking historical CPU usage
Tracking historical memory usage
Monitoring disk space
Monitoring network status
5. Connection Pooling and Database Partitioning
Introduction
Installing pgpool-II
Configuring pgpool and testing the setup
Installing PgBouncer
Connection pooling using PgBouncer
Managing PgBouncer
Implementing partitioning
Managing partitions
Installing PL/Proxy
Partitioning with PL/Proxy
6. High Availability and Replication
Introduction
Setting up hot streaming replication
Replication using Slony
Replication using Londiste
Replication using Bucardo
Replication using DRBD
Setting up a Postgres-XL cluster
7. Working with Third-Party Replication Management Utilities
Introduction
Setting up Barman
Backup and recovery using Barman
Setting up OmniPITR
WAL management with OmniPITR
Setting up repmgr
Using repmgr to create replica
Setting up walctl
Using walctl to create replica
8. Database Monitoring and Performance
Introduction
Checking active sessions
Finding out what the users are currently running
Finding blocked sessions
Dealing with deadlocks
Table access statistics
Logging slow statements
Determining disk usage
Preventing page corruption
Routine reindexing
Generating planner statistics
Tuning with background writer statistics
9. Vacuum Internals
Introduction
Dealing with bloating tables and indexes
Vacuum and autovacuum
Freezing and transaction ID wraparound
Monitoring vacuum progress
Control bloat using transaction age
10. Data Migration from Other Databases to PostgreSQL and Upgrading the PostgreSQL Cluster
Introduction
Using pg_dump to upgrade data
Using the pg_upgrade utility for version upgrade
Replicating data from other databases to PostgreSQL using Goldengate
11. Query Optimization
Introduction
Using sample data sets
Timing overhead
Studying hot and cold cache behavior
Clearing the cache
Query plan node structure
Generating an explain plan
Computing basic cost
Running sequential scans
Running bitmap heap and index scan
Aggregate and hash aggregate
Running CTE scan
Nesting loops
Working with hash and merge join
Grouping
Working with set operations
Working on semi and anti joins
12. Database Indexing
Introduction
Measuring query and index block statistics
Index lookup
Comparing indexed scans and sequential scans
Clustering against an index
Concurrent indexes
Combined indexes
Partial indexes
Finding unused indexes
Forcing a query to use an index
Detecting a missing index

About the authors

Chitij Chauhan
Chitij Chauhan currently works as a senior database administrator at an IT-based MNC in Chandigarh. He has over 10 years of work experience in the field of database and system administration, with specialization in MySQL clustering, PostgreSQL, Greenplum, Informix, DB2, SQL Server 2008, Sybase, and Oracle. He is a leading expert in the area of database security, with expertise in database security products such as IBM InfoSphere Guardium, Oracle Database Vault, and Imperva.
Dinesh Kumar
Dinesh Kumar is an enthusiastic open source developer and has written several open source tools for PostgreSQL. He recently announced pgBucket, a brand new job scheduler for PostgreSQL. He is also a frequent blogger at manojadinesh.blogspot.com, where he talks more about PostgreSQL. He is currently working as a senior database engineer in OpenSCG and building the PostgreSQL cloud operations. He has more than 6 years of experience as an Oracle and PostgreSQL database administrator and developer, and is currently focusing on PostgreSQL.

