Movatterモバイル変換

17.2. Creating a Database Cluster
Prev	Up	Chapter 17. Server Setup and Operation	Home	Next

17.2. Creating a Database Cluster#

17.2.1. Use of Secondary File Systems
17.2.2. File Systems

Before you can do anything, you must initialize a database storage area on disk. We call this adatabase cluster. (TheSQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database namedpostgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require thepostgres database to exist, but many external utility programs assume it exists. There are two more databases created within each cluster during initialization, namedtemplate1 andtemplate0. As the names suggest, these will be used as templates for subsequently-created databases; they should not be used for actual work. (See Chapter 21 for information about creating new databases within a cluster.)

In file system terms, a database cluster is a single directory under which all data will be stored. We call this thedata directory ordata area. It is completely up to you where you choose to store your data. There is no default, although locations such as/usr/local/pgsql/data or/var/lib/pgsql/data are popular. The data directory must be initialized before being used, using the programinitdb which is installed withPostgres Pro.

If you are using a pre-packaged version ofPostgreSQL, it may well have a specific convention for where to place the data directory, and it may also provide a script for creating the data directory. In that case you should use that script in preference to runninginitdb directly. Consult the package-level documentation for details.

To initialize a database cluster manually, runinitdb and specify the desired file system location of the database cluster with the-D option, for example:

$initdb -D /usr/local/pgsql/data

Note that you must execute this command while logged into thePostgres Pro user account, which is described in the previous section.

Tip

As an alternative to the-D option, you can set the environment variablePGDATA.

Alternatively, you can runinitdb via thepg_ctl program like so:

$pg_ctl -D /usr/local/pgsql/data initdb

This may be more intuitive if you are usingpg_ctl for starting and stopping the server (seeSection 17.3), so thatpg_ctl would be the sole command you use for managing the database server instance.

initdb will attempt to create the directory you specify if it does not already exist. Of course, this will fail ifinitdb does not have permissions to write in the parent directory. It's generally recommendable that thePostgres Pro user own not just the data directory but its parent directory as well, so that this should not be a problem. If the desired parent directory doesn't exist either, you will need to create it first, using root privileges if the grandparent directory isn't writable. So the process might look like this:

root#mkdir /usr/local/pgsqlroot#chown postgres /usr/local/pgsqlroot#su postgrespostgres$initdb -D /usr/local/pgsql/data

initdb will refuse to run if the data directory exists and already contains files; this is to prevent accidentally overwriting an existing installation.

Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access.initdb therefore revokes access permissions from everyone but thePostgres Pro user, and optionally, group. Group access, when enabled, is read-only. This allows an unprivileged user in the same group as the cluster owner to take a backup of the cluster data or perform other operations that only require read access.

Note that enabling or disabling group access on an existing cluster requires the cluster to be shut down and the appropriate mode to be set on all directories and files before restartingPostgres Pro. Otherwise, a mix of modes might exist in the data directory. For clusters that allow access only by the owner, the appropriate modes are0700 for directories and0600 for files. For clusters that also allow reads by the group, the appropriate modes are0750 for directories and0640 for files.

However, while the directory contents are secure, the default client authentication setup allows any local user to connect to the database and even become the database superuser. If you do not trust other local users, we recommend you use one ofinitdb's-W,--pwprompt or--pwfile options to assign a password to the database superuser. Also, specify-A scram-sha-256 so that the defaulttrust authentication mode is not used; or modify the generatedpg_hba.conf file after runninginitdb, butbefore you start the server for the first time. (Other reasonable approaches include usingpeer authentication or file system permissions to restrict connections. SeeChapter 19 for more information.)

initdb also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found inSection 22.1. The default sort order used within the particular database cluster is set byinitdb, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other thanC orPOSIX. Therefore, it is important to make this choice correctly the first time.

initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details seeSection 22.3.

Non-C and non-POSIX locales rely on the operating system's collation library for character set ordering. This controls the ordering of keys stored in indexes. For this reason, a cluster cannot switch to an incompatible collation library version, either through snapshot restore, binary streaming replication, a different operating system, or an operating system upgrade.

17.2.1. Use of Secondary File Systems#

Many installations create their database clusters on file systems (volumes) other than the machine's“root” volume. If you choose to do this, it is not advisable to try to use the secondary volume's topmost directory (mount point) as the data directory. Best practice is to create a directory within the mount-point directory that is owned by thePostgres Pro user, and then create the data directory within that. This avoids permissions problems, particularly for operations such aspg_upgrade, and it also ensures clean failures if the secondary volume is taken offline.

17.2.2. File Systems#

Generally, any file system with POSIX semantics can be used for Postgres Pro. Users prefer different file systems for a variety of reasons, including vendor support, performance, and familiarity. Experience suggests that, all other things being equal, one should not expect major performance or behavior changes merely from switching file systems or making minor file system configuration changes.

17.2.2.1. NFS#

It is possible to use anNFS file system for storing thePostgres Pro data directory.Postgres Pro does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like locally-connected drives.Postgres Pro does not use any functionality that is known to have nonstandard behavior on NFS, such as file locking.

The only firm requirement for using NFS withPostgres Pro is that the file system is mounted using thehard option. With thehard option, processes can“hang” indefinitely if there are network problems, so this configuration will require a careful monitoring setup. Thesoft option will interrupt system calls in case of network problems, butPostgres Pro will not repeat system calls interrupted in this way, so any such interruption will result in an I/O error being reported.

It is not necessary to use thesync mount option. The behavior of theasync option is sufficient, sincePostgres Pro issuesfsync calls at appropriate times to flush the write caches. (This is analogous to how it works on a local file system.) However, it is strongly recommended to use thesync export option on the NFSserver on systems where it exists (mainly Linux). Otherwise, anfsync or equivalent on the NFS client is not actually guaranteed to reach permanent storage on the server, which could cause corruption similar to running with the parameter fsync off. The defaults of these mount and export options differ between vendors and versions, so it is recommended to check and perhaps specify them explicitly in any case to avoid any ambiguity.

In some cases, an external storage product can be accessed either via NFS or a lower-level protocol such as iSCSI. In the latter case, the storage appears as a block device and any available file system can be created on it. That approach might relieve the DBA from having to deal with some of the idiosyncrasies of NFS, but of course the complexity of managing remote storage then happens at other levels.

Prev	Up	Next
17.1. ThePostgres Pro User Account	Home	17.3. Starting the Database Server

epub pdf