  <para>
   Databases often store large amounts of text and duplicated information. This is why compression of most databases
   can be quite efficient and reduce the used storage size 3-5 times. PostgreSQL compresses TOAST data, but small
   text fields which fit in the page are not compressed. Also, not only heap pages can be compressed: indexes on text keys
   or indexes with a large number of duplicate values are also good candidates for compression.
  </para>

  <para>
   PostgreSQL works with disk data through a buffer pool which accumulates the most frequently used buffers.
   The interface between the buffer manager and the file system is the most natural place for performing compression.
   Buffers are stored on the disk in compressed form, reducing disk usage and minimizing the amount of data to be read,
   while the in-memory buffer pool contains uncompressed buffers, providing access to the records at the same speed as without
   compression. As modern servers have a large enough amount of RAM, a substantial part of the database can be cached in
   memory and accessed without any compression overhead penalty.
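   The read side of this scheme can be sketched as follows; this is a minimal illustration assuming zlib, and
   decompress_page is a hypothetical name, not an actual CFS function:
   <programlisting><![CDATA[
/* Sketch: the on-disk page image is stored compressed and is
 * decompressed into a fixed-size 8kb slot before entering the buffer
 * pool, so all in-memory accesses work on ordinary uncompressed pages. */
#include <stdint.h>
#include <zlib.h>

#define BLCKSZ 8192             /* uncompressed PostgreSQL page size */

/* Decompress one on-disk page image into a buffer pool slot.
 * Returns 0 on success, -1 on a corrupted or oversized image. */
int
decompress_page(const char *disk_page, uint32_t compressed_size,
                char *pool_buffer)
{
    uLongf dest_len = BLCKSZ;

    if (uncompress((Bytef *) pool_buffer, &dest_len,
                   (const Bytef *) disk_page, compressed_size) != Z_OK ||
        dest_len != BLCKSZ)
        return -1;
    return 0;
}
]]></programlisting>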
    <term>Reducing amount of disk IO</term>
    <listitem>
     <para>
      Compression helps to reduce the size of data which should be written to the disk or read from it.
      A compression ratio of 3 actually means that you need to read 3 times less data, or that the same number of records can be fetched
      3 times faster.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>Improving locality</term>
    <listitem>
     <para>
      When modified buffers are flushed from the buffer pool to the disk, they are written to random locations
      on the disk. The PostgreSQL cache replacement algorithm decides which buffer to evict from the pool
      based on its access frequency, ignoring its location on the disk. So two subsequently written buffers can be
      located in completely different parts of the disk. For an HDD, seek time is quite large - about 10 msec, which corresponds
      to 100 random writes per second. And the speed of sequential writes can be about 100Mb/sec, which corresponds to
      The size of a buffer in PostgreSQL is fixed (8kb by default), while the size of a compressed buffer depends on the content of the buffer.
      So an updated buffer cannot always fit in its old location on the disk. This is why we cannot access pages directly
      by their address. Instead we have to use a map which translates the logical address of a page to its physical location
      on the disk. Definitely, this extra level of indirection adds overhead. But in most cases this map can fit in memory,
      so page lookup is nothing more than just accessing an array element. The presence of this map also has a positive effect:
      we can now write updated pages sequentially, just updating their map entries.
      PostgreSQL does a lot to avoid a "write storm" - intensive flushing of data to the disk when buffer pool space is
      exhausted. Compression allows the disk load to be reduced significantly.
     </para>
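     <para>
      The page map idea can be pictured with the following minimal sketch (the structure and function names are
      hypothetical, not the actual CFS ones): lookup is a plain array access, and an updated page is simply appended
      at the current file tail with its map entry redirected, which turns random page writes into sequential ones.
     </para>
     <programlisting><![CDATA[
#include <stdint.h>

typedef struct PageMapEntry
{
    uint64_t offset;            /* where the compressed image starts */
    uint32_t size;              /* compressed size in bytes */
} PageMapEntry;

/* Lookup: nothing more than accessing an array element. */
static inline PageMapEntry
page_map_lookup(const PageMapEntry *map, uint32_t pageno)
{
    return map[pageno];
}

/* Update: reserve space at the file tail and redirect the map entry;
 * the old image becomes garbage to be collected later. */
static inline uint64_t
page_map_update(PageMapEntry *map, uint32_t pageno,
                uint64_t *file_tail, uint32_t compressed_size)
{
    uint64_t pos = *file_tail;

    *file_tail += compressed_size;
    map[pageno].offset = pos;
    map[pageno].size = compressed_size;
    return pos;                 /* caller writes the compressed bytes here */
}
]]></programlisting>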
    </listitem>
   </varlistentry>
  </variablelist>

  <para>
   Another useful feature which can be combined with compression is database encryption.
   Encryption protects your database from unintended access: if somebody steals your notebook or hard drive, or makes
   a copy of it, the thief will not be able to extract information from your database if it is encrypted.
   PostgreSQL provides the contrib module pgcrypto, allowing you to encrypt particular types/columns.
   But a safer and more convenient way is to encrypt all data in the database. Encryption can be combined with compression:
   data should be stored on disk in encrypted form and decrypted when a page is loaded into the buffer pool.
   It is essential that compression is performed before encryption; otherwise encryption eliminates the regularities in the
   data and the compression ratio will be close to 1.
  </para>
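  <para>
   The required ordering can be illustrated with the following sketch of the write path. It is not the actual CFS
   code: zlib stands in for whatever compression library is configured, and the XOR "cipher" is a toy placeholder
   for a real block cipher, used here only to show that encryption must come after compression.
  </para>
  <programlisting><![CDATA[
#include <stddef.h>
#include <zlib.h>

#define BLCKSZ 8192

/* Toy placeholder for a real cipher: XOR with a 16-byte key. */
static void
toy_encrypt(unsigned char *data, size_t len, const unsigned char key[16])
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % 16];
}

/* Compress an 8kb page, then encrypt the compressed image in place.
 * Returns the number of bytes to write to disk, or 0 on failure.
 * Reversing the two steps would make the data look random and leave
 * the compression ratio close to 1. */
size_t
prepare_page_for_disk(const unsigned char *page, unsigned char *out,
                      size_t out_size, const unsigned char key[16])
{
    uLongf clen = out_size;

    if (compress2(out, &clen, page, BLCKSZ, Z_DEFAULT_COMPRESSION) != Z_OK)
        return 0;
    toy_encrypt(out, clen, key);    /* encrypt AFTER compression */
    return (size_t) clen;
}
]]></programlisting>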
  <para>
   Why do we need to perform compression/encryption in PostgreSQL rather than use the corresponding features of the underlying file
   system? The first answer is that not many file systems support compression and encryption on all OSes.
   And even if such a file system is available, it is not always possible or convenient to install it just
   to compress/protect your database. The second answer is that compression at the database level can be more efficient,
   because here we can use knowledge about the size of a database page and perform compression more effectively.
  </para>

</sect1>

<sect1 id="cfs-implementation">
  <title>How compression/encryption are integrated in PostgreSQL</title>

  <para>
   To improve the efficiency of disk IO, PostgreSQL works with files through the buffer manager, which pins in memory the
   most frequently used pages. Each page has a fixed size (8kb by default). But if we compress a page, its size will
   depend on its content, so an updated page can require more (or less) space than the original page.
   So we may not always perform an in-place update of the page. Instead we have to locate new space for the page and somehow release
  </para>

  <para>
   PostgreSQL stores a relation in a set of files, with the size of each file not exceeding 2Gb. A separate page map is constructed for each file.
   Garbage collection in CFS is done by several background workers. The number of these workers and the pauses in their work can be
   configured by the database administrator. These workers split the work based on inode hash, so they do not conflict with each other.
   Each file is processed separately. A file is blocked for access at the time of garbage collection, but the complete relation is not
  </para>
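  <para>
   How the workers can split the work without conflicts is sketched below. The function name is hypothetical and
   the real worker logic is more involved; the point is only that hashing the file's inode assigns every file to
   exactly one worker, so no inter-worker locking is needed.
  </para>
  <programlisting><![CDATA[
#include <sys/stat.h>

/* Return 1 if the worker with the given id (0..n_workers-1) is
 * responsible for garbage collecting this file, 0 otherwise. */
int
gc_worker_owns_file(const char *path, int worker_id, int n_workers)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* file vanished: nothing to collect */
    return (int) (st.st_ino % (unsigned) n_workers) == worker_id;
}
]]></programlisting>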
  <para>
   CFS can be built with several compression libraries: PostgreSQL lz, zlib, lz4, snappy, lzfse...
   But this is a build time choice: it is currently not possible to choose the compression algorithm dynamically.
   CFS stores information about the used compression algorithm in the tablespace and produces an error if PostgreSQL is built with a different
   library.
  </para>
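  <para>
   A build time choice like this typically looks as follows. This is an illustration with hypothetical macro and
   wrapper names, not the actual CFS sources: exactly one implementation is compiled in, which is why the algorithm
   cannot be switched at run time and why a tablespace written with one library cannot be opened by a server built
   with another.
  </para>
  <programlisting><![CDATA[
#include <stddef.h>

#if defined(USE_ZLIB)
#include <zlib.h>

/* Compress src into dst; returns compressed size or 0 on failure. */
size_t
cfs_compress_page(void *dst, size_t dst_size,
                  const void *src, size_t src_size)
{
    uLongf clen = dst_size;

    if (compress2(dst, &clen, src, src_size, Z_DEFAULT_COMPRESSION) != Z_OK)
        return 0;
    return (size_t) clen;
}

#elif defined(USE_LZ4)
#include <lz4.h>

size_t
cfs_compress_page(void *dst, size_t dst_size,
                  const void *src, size_t src_size)
{
    int clen = LZ4_compress_default(src, dst,
                                    (int) src_size, (int) dst_size);

    return clen > 0 ? (size_t) clen : 0;
}
#endif
]]></programlisting>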