18.2. Creating a Database Cluster
Before you can do anything, you must initialize a database storage area on disk. We call this adatabase cluster. (TheSQL standard uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database namedpostgres
, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require thepostgres
database to exist, but many external utility programs assume it exists. Another database created within each cluster during initialization is calledtemplate1
. As the name suggests, this will be used as a template for subsequently created databases; it should not be used for actual work. (SeeChapter 22 for information about creating new databases within a cluster.)
In file system terms, a database cluster is a single directory under which all data will be stored. We call this thedata directory ordata area. It is completely up to you where you choose to store your data. There is no default, although locations such as/usr/local/pgsql/data
or/var/lib/pgsql/data
are popular. To initialize a database cluster, use the commandinitdb, which is installed withPostgreSQL. The desired file system location of your database cluster is indicated by the-D
option, for example:
$
initdb -D /usr/local/pgsql/data
Note that you must execute this command while logged into thePostgreSQL user account, which is described in the previous section.
Alternatively, you can runinitdb
via thepg_ctl program like so:
$
pg_ctl -D /usr/local/pgsql/data initdb
This may be more intuitive if you are usingpg_ctl
for starting and stopping the server (seeSection 18.3), so thatpg_ctl
would be the sole command you use for managing the database server instance.
initdb
will attempt to create the directory you specify if it does not already exist. Of course, this will fail ifinitdb
does not have permissions to write in the parent directory. It's generally recommendable that thePostgreSQL user own not just the data directory but its parent directory as well, so that this should not be a problem. If the desired parent directory doesn't exist either, you will need to create it first, using root privileges if the grandparent directory isn't writable. So the process might look like this:
root#mkdir /usr/local/pgsql
root#chown postgres /usr/local/pgsql
root#su postgres
postgres$initdb -D /usr/local/pgsql/data
initdb
will refuse to run if the data directory exists and already contains files; this is to prevent accidentally overwriting an existing installation.
Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access.initdb
therefore revokes access permissions from everyone but thePostgreSQL user.
However, while the directory contents are secure, the default client authentication setup allows any local user to connect to the database and even become the database superuser. If you do not trust other local users, we recommend you use one ofinitdb
's-W
,--pwprompt
or--pwfile
options to assign a password to the database superuser. Also, specify-A md5
or-A password
so that the defaulttrust
authentication mode is not used; or modify the generatedpg_hba.conf
file after runninginitdb
, butbefore you start the server for the first time. (Other reasonable approaches include usingpeer
authentication or file system permissions to restrict connections. SeeChapter 20 for more information.)
initdb
also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found inSection 23.1. The default sort order used within the particular database cluster is set byinitdb
, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other thanC
orPOSIX
. Therefore, it is important to make this choice correctly the first time.
initdb
also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details seeSection 23.3.
Non-C
and non-POSIX
locales rely on the operating system's collation library for character set ordering. This controls the ordering of keys stored in indexes. For this reason, a cluster cannot switch to an incompatible collation library version, either through snapshot restore, binary streaming replication, a different operating system, or an operating system upgrade.
18.2.1. Use of Secondary File Systems
Many installations create their database clusters on file systems (volumes) other than the machine's“root” volume. If you choose to do this, it is not advisable to try to use the secondary volume's topmost directory (mount point) as the data directory. Best practice is to create a directory within the mount-point directory that is owned by thePostgreSQL user, and then create the data directory within that. This avoids permissions problems, particularly for operations such aspg_upgrade, and it also ensures clean failures if the secondary volume is taken offline.
18.2.2. Use of Network File Systems
Many installations create their database clusters on network file systems. Sometimes this is done viaNFS, or by using a Network Attached Storage (NAS) device that usesNFS internally.PostgreSQL does nothing special forNFS file systems, meaning it assumesNFS behaves exactly like locally-connected drives. If the client or serverNFS implementation does not provide standard file system semantics, this can cause reliability problems (seehttp://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). Specifically, delayed (asynchronous) writes to theNFS server can cause data corruption problems. If possible, mount theNFS file system synchronously (without caching) to avoid this hazard. Also, soft-mounting theNFS file system is not recommended. Storage Area Networks (SAN) typically use communication protocols other thanNFS, and may or may not be subject to hazards of this sort. It's advisable to consult the vendor's documentation concerning data consistency guarantees.PostgreSQL cannot be more reliable than the file system it's using.