About Me

My photo
I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services). Working on IBM DB2, Oracle Microsoft SQL Server. Selected as IBM Champion for Data Management 2011. I like bit of photography, cycling, poetry and I just love gadgets!!
Showing posts with label EnterpriseDB Recovery. Show all posts
Showing posts with label EnterpriseDB Recovery. Show all posts

Thursday, July 11, 2013

Recoverability of PostgreSQL-1: Recovery Concepts

Recently one of my colleagues told me that, it’s very difficult (he actually said "impossible") to do fail back and switchback in PostgreSQL. Well, I thought of giving it a try. In my upcoming blog, I will be discussing the process and details of my experiment. But before that, it’s very important to understand certain concepts that people get generally confused with, thus ending up with a messed up recovery strategy. Let me discuss a few terms which are very important for designing your backup-recovery and disaster-recovery processes:

1) Archiving: Means archiving your transaction/xlog/wal segments/WAL. This is not a substitute for creation of DR site. It is required for taking a hot backup/online backup. And so is the case with at least MS SQL Server and DB2.

2) Backup: A process in which you take a backup of your database. It can be logical where you take a backup of tables and rows or it can be physical where you backup the datafiles and other configuration files needed by your database/instance/cluster. Physical backups can be cold/offline or can be online/hot.

3) Crash Recovery: You had a crash of your database service/server and your database is recovering from the same. Generally if your disk is intact this can be achieved with your WAL segments/WAL itself. Whereas, if your disk has crashed and you are restoring your database from a backup you will need to archive files and latest WAL files for applying the changes since the backup was taken.

4) Disaster Recovery: You had a crash after which your database server is not available and you have to either rebuild the whole system from scratch using a backup or have to start using a DR site you had built for just this scenario.

5) Fail over: A term which refers to using your DR site in case there is a major failure at your primary site.

6) Switchover: A term which refers to switching the roles of your Primary and DR sites either because Primary site has come up again after a recent major failure or due to some maintenance work for which you have to switch the roles.

7) Switchback: A term which means switching back/restoring back the roles of "new primary" and "new DR" sites. i.e. the original Primary site regains control. This is generally a planned activity and everyone desires to have minimal downtime for this.

8) Fail back: A term which is generally used interchangeably for switchback, but I would think it is more of those scenarios when your "new primary" or "switched primary" server goes down for some reason and you have to now promote the "new DR" (which originally was ’primary’ site in start of time) as Primary

9) Warm standby: A standby server which is an exactly (almost real time) replica of your primary server and is always available in recovery mode and cannot be accessed until a recovery is triggered

10) Hot Standby: A standby server which is an exactly (almost real time) replica of your primary server and is always available in recovery mode and cannot be accessed for write commands until a recovery is triggered. But this server can be utilized for read queries

11) WAL Archive Replication: A replication process where the WAL segments which have been archived are copied and replayed on the standby server

12) Streaming Replication: A replication process where the WAL segments are copied directly to the standby servers (standby server need not wait for the WAL files to be archived)

So folks, these are some the terms that would help give you a better insight into understanding the concepts. And if you already have a recovery strategy, then hopefully these terms may have helped you to understand the process in a better way.
I shall be back with a test which does fail-over, switchover (after a fail-over) and then switchback.

PS: Any comments or corrections about the terms and their definition/explanation used here, are more than welcome.