среда, 19 августа 2015 г.

How to Backup and Restore PostgreSQL 9.3 Databases

In my previous blog post, I explained how to install PostgreSQL 9.3 on Oracle Linux. As a DBA, one of the first things we want to learn is how we can recover the database in case of a failure. So in this blog post, I’ll show how to backup and restore PostgreSQL databases in a server. We’ll backup PGDATA directory, so it’ll contain all databases in the instance (remember that PostgreSQL is a multitenant database).
PostgreSQL supports Write Ahead Log (WAL) mechanism like Oracle. So everything will be written to (redo)logs before they written into actual datafiles. So we will use a similar method to Oracle. We need to start “the backup mode”, copy the (data) files, and stop the backup mode, and add the archived logs to our backup. There are SQL commands for starting backup mode (pg_start_backup) and for stopping backup mode (pg_stop_backup), and we can copy the files using OS commands. Good thing is, since 9.1, PostgreSQL comes with a backup tool named “pg_basebackup”. It’ll do everything for us.
To be able to use pg_basebackup, we need to turn on archiving and also turn on wal_sender process. Archiving controlled by “archive_mode”, “archive_command” and “wal_level” parameters. An interesting thing about PostgreSQL is, you need to write OS commands to copy redologs (WAL) files to an archive location. While writing a copy command, you can use %p and %f variables. %p variable holds the full path of wal file, and %f holds only the file name. So something like “cp %p /archives/%f”, will copy the wal file to /archives directory.
WAL Sender process is used to feed logs to a replicate database, and pg_basebackup also uses it to receive archive logs. It’s controlled by max_wal_senders parameter. We’ll set it to 1, so our pg_basebackup will be able to connect PostgreSQL server to fetch the required WAL files (the ones created during backup period). Ok Let’s create directories for backups and archives (login as root):

Now I’ll set the required parameters for archiving and enabling WAL sender process. I switch to postgres user and modify postgresql.conf:

I hope you haven’t surprised by $PGDATA. The PGDATA variable should point to the data directory of PostgreSQL. In my previous post, I recommend to declare these variables in bash_profile script. Wal_level controls the logging level, it can be set to hot_standby or archive to enable log-archiving. I recommend you to set wal_level to hot_standby instead of “archive”. If you also set “hot_standby” to “on”, you can run the queries in recovery mode. I’ll explain it in point in time recovery. We also need to give permission to pg_basebackup to connect our server, so we issue the following line to pg_hba.conf, and then restart the server:

Now here’s the magical command to backup the PostgreSQL database:

It will create a directory such as 20141117 (depending the date it’s executed), compress all database files into a TAR file (–format=t) and put it into that folder. The name of backup file will be base.tar. This TAR file will contain the required WAL files (–xlog) to be able to recover the database. It even contains postgresql.conf and pg_hba.conf.

So we have a backup, how to recover it?

All we need is to extract tar file into the $PGDATA folder. Make sure that PostgreSQL services are down before you copy the files. If we want to restore the databases on a new server, we need to install PostgreSQL software first.

So we restored our database files, and have required wal (redolog) files to recover the database, but if we open the database now, we’ll lose the data since the backup is done. We may want to apply the logs which are created after the backup. To be able to do it, create a “recovery.conf” file which has the below line:
When we issue “pg_ctl start”, PostgreSQL will see the “recovery.conf” file and start recovering the database. When the recovery is done, the “recovery.conf” file will be renamed to “recovery.done”.

Point in-time Recovery

PostgreSQL supports point in time recovery (PIT). You need to include one of the following parameters into the recovery.conf file for PIT:
  • recovery_target_name (string): You specify a named restore point, created with pg_create_restore_point(). For example:
    Then you can use this name while recovering by adding the below line to your recovery.conf file:
  • recovery_target_time (timestamp): Instead of giving a name, (in most cases you won’t be prepared so well), you enter a timestamp.
  • recovery_target_xid (string): This parameter specifies the transaction ID up to which recovery will proceed.
You may remember that I said it’s better to set wal_level to hot_standby instead of archive. In point in-time recovery, you can make PostgreSQL pause the recovery on the recovery_target point, and wait for your approval to end the recovery process. So you can run queries on the database to check if it’s the correct time before you open the database for read write operations. To be able to do it, you also need to set hot_standby=on parameter. It sounds weird to enable hot_standby on a stand-alone server but this mode actually tells PostgreSQL to accept queries while recovery in process.

If the wal_level is hot_standby, and hot_standby is enabled on postgres.conf, you can write a recovery.conf file similar to below:

This recovery.conf file will cause to stop the recovery process before committing of transactions later than 2014-12-01 10:19:42. You will be able to connect the database, run (read-only) queries and after you verify that you reached the corrent point in-time, you run the following command to end the recovery and open the database:

It will resume (in fact ends) the recovery (no further logs will be applied). If you want to apply more logs, change recovery.conf file and set another recovery_target_time, and then restart the server.

What will happen to archived WAL files?

You may write a script to compress the archived log files and keep them with your base backups. After backing up, you better delete the old archived files. Here’s a sample script to backup and clean the archive logs older than 2 days, do not forget the change the folder name according to your real archive location:

I’ll explain how to take logical backups (dumps) in future blog posts.

Комментариев нет: