About Me

I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise-class solutions. Prior to this, I was working as a DBA at Misys India (Banking Services), working on IBM DB2, Oracle and Microsoft SQL Server. I was selected as an IBM Champion for Data Management in 2011. I like a bit of photography, cycling and poetry, and I just love gadgets!!

Tuesday, November 12, 2013

Secure your DB the Hollywood style



Today I was discussing with a colleague who found it strange that some people use random names for their databases or their database directories.
After pondering over it for a while, I thought it's a good thing.

Now, if you were a hacker and had somehow got access to my DB server, you would still have to guess the name of my database. So you would do a  " ps -ef | grep -i postgres "

Or, if you are lucky, you could just go to the default location or use the $PGDATA variable.

Well, these defaults make the life of a DBA easy, but they make the life of a hacker equally easy.

A few elements of security which I have learnt from Hollywood flicks are:
  1. Create confusion with duplicates: remember "The Italian Job"?
  2. Use unusual names: in war movies, don't you hear "Tangos" talking to "Charlies"?
  3. Avoid/hide the usual (default) identity: that is why they posed as a movie crew in "Argo"
  4. Hide/lock the identity/password of the superuser and expose a pseudo superuser for everyone to use and interact with: didn't Jim die before he could control everything and hide his identity in "Mission: Impossible"?

Suppose I have 3 instances running:

pg_ctl -D /opt/Bali/db start
pg_ctl -D /opt/KulaLumpur/db start
pg_ctl -D /opt/Bangkok/db start

Now you will have to log in to at least 3 instances, and within each instance I can again have multiple databases, by name:
Bali/db: TamanAyana
Bali/db: Tan
Bali/db: Ulluwatu

KulaLumpur/db: TwinTower
KulaLumpur/db:  CyberJaya


Bangkok/db: WatArun
Bangkok/db: RoyalPalace


This means you have to do more work to get into my system, which means I have more time to stop the intrusion.


Of course, if you were smart, you could check memory utilization, number of connections etc. But what if I have set rules for process priority and am using the same amount of memory for each instance (while smartly allowing the pages of instances not in use to be paged out)? I would manage to keep you confused for a little longer before you know which database/instance is your true target. Then again, the "Italian Job" heist too was pulled off by figuring out the load and thereby isolating the identity of the true target.


In the same way, if possible, run your PostgreSQL database on a non-default port. Why? Suppose I have not been able to crack into your DB server but have managed to figure out its IP; my first attempt at connecting to PostgreSQL would be aimed at 5432. A determined hacker will eventually hit the correct spot/port. Sooner or later your true identity will be out, but that will take time, and that means you have more time to board that Swiss Airlines flight and safeguard yourself.
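Changing it is a one-line edit in postgresql.conf, followed by a restart (5433 below is just an arbitrary example; pick your own non-obvious port):

port = 5433          # anything other than the well-known 5432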

Now, one really great feature that PostgreSQL has is host-based authentication. It is always good to add a line like the one below at the end of your pg_hba.conf:


host all all 0.0.0.0/0 reject


It will make sure that any connection from an unauthorized IP address is rejected.
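Since pg_hba.conf entries are matched top-down, that reject line only catches whatever your earlier allow rules did not. A minimal sketch of how it fits in (the subnet, database and user names here are placeholders):

# TYPE  DATABASE   USER        ADDRESS            METHOD
host    salesdb    app_user    192.168.10.0/24    md5
host    all        wolverine   192.168.10.5/32    md5
host    all        all         0.0.0.0/0          reject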

Additionally, the first attempt someone will make at connecting to your DB server would be using the standard users, e.g. enterprisedb, postgres etc. Make sure you set the passwords of these users to something completely random which you don't need to remember. Just create another superuser and always use that user for connecting and doing maintenance work. Make sure that the username is not your first name. I would prefer "wolverine".
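A rough SQL sketch of that idea (the password strings below are placeholders; generate your own random ones, and on PPAS do the same for the enterprisedb user):

-- lock the default superuser behind a long random password you never note down
alter role postgres with password 'Xk93fq...someLongRandomString';
-- create the superuser you will actually use for maintenance
create role wolverine with login superuser password 'a-strong-password-you-do-remember';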


Some installers of PostgreSQL might create a .pgpass file in your OS user's home directory. It's better to remove it unless you think there is a use case for it in your environment.


Grant your users only what they need. I will follow this to death. There is no application written which needs DBA, superuser or any similar privilege on your database or instance. Never ever grant these (of course, unless your job is at stake) to any user. Grant them only the few database privileges which "you think" they are going to need. If they hit an issue, or the database tells them they cannot do a certain operation, grant permission for that operation. If you want a really secure database, do this as many times as your users hit a problem.

Never take that shortcut "alter user my_app_user with superuser"
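Instead, a hedged sketch of what "only what they need" usually looks like (the database, schema, table and role names below are placeholders):

grant connect on database salesdb to my_app_user;
grant usage on schema app to my_app_user;
grant select, insert, update, delete on app.orders to my_app_user;
grant usage, select on sequence app.orders_id_seq to my_app_user;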


It's good to segregate roles for your users. There should be at least 3 roles: a superuser (the DB superuser), an application superuser (who owns and controls application schema changes) and an application user (who uses the application). An application reader/auditor etc. could be additional roles depending on your environment.

It is a good idea to set a limit on the number of connections which can be established by these users, and to restrict their work_mem and a few other parameters which you can control at the user level.
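Putting the last two points together, a rough sketch (role names, passwords and numbers are placeholders to be tuned for your environment):

-- application superuser: owns the schema and performs schema changes, but is not a DB superuser
create role app_owner with login password 'placeholder' nosuperuser;
-- application user: day-to-day DML only, with capped connections and work_mem
create role app_user with login password 'placeholder' nosuperuser connection limit 50;
alter role app_user set work_mem = '8MB';
-- read-only reporting/auditor role
create role app_reader with login password 'placeholder' nosuperuser connection limit 10;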

Friday, October 25, 2013

Installation steps for pgBouncer on PowerLinux



Our team has been excited ever since IBM and EnterpriseDB announced the availability of Postgres Plus Advanced Server on PowerLinux.
Our team recently started engaging with IBM to evaluate the scalability of PPAS on PowerLinux and to determine how to fine-tune such setups.
(It's a work in progress and I am not sure how much I can share here. But if you are thinking of migrating your IT infrastructure to a much more cost-effective stack of Power hardware running RHEL and using EnterpriseDB PPAS/PostgreSQL as the database, please contact me or write to success@ashnik.com.)
Though I really welcome this new announcement from EnterpriseDB, one thing which disappointed me was the non-availability of pgBouncer in the RPM installers provided by EnterpriseDB for PPAS on PowerLinux.

I have always preferred pgBouncer as a connection pooling tool over pgpool. It's lightweight and quite easy to configure. In the past I had done a benchmarking study to see how PPAS scales with and without connection pooling. I had seen some significant gains, and hence decided to install pgBouncer before proceeding with the performance measurement of PPAS on PowerLinux.
I faced a few issues but could finally install it and get it running. So I decided to share my experience (which can hopefully help someone, as more and more people try to move away from costly hardware and proprietary databases to PPAS on PowerLinux).
The first thing one should do is make sure IBM Advanced Toolchain is installed on your PowerLinux box.
As per my understanding, the compiler which is part of IBM Advanced Toolchain is optimized for Power, so it's preferred for compiling code instead of the gcc compiler shipped with Red Hat.
Generally it will be installed in /opt/atXX.
So I set my path to include the Advanced Toolchain:
[root@pflex75 pgbouncer]# export PATH=/opt/at7.0/bin:$PATH

I just wanted to be sure about my version of the gcc compiler:
[root@pflex75 pgbouncer]# which gcc
                /opt/at7.0/bin/gcc

[root@pflex75 pgbouncer]# gcc --version
gcc (GCC) 4.8.2 20130819 (Advance-Toolchain-at7.0) [ibm/gcc-4_8-branch, revision: 201854 merged from gcc-4_8-branch, revision 200989]
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


I downloaded the latest version of pgbouncer (for me it was 1.5.4) from the pgFoundry site and extracted the contents of the tar file:
[root@pflex75 pgbouncer]# tar -xzvf pgbouncer-1.5.4.tar.gz

Now it was time to configure the binaries and install them as per the instructions in the README file extracted from the above tar. So I fired the commands below:

[root@pflex75 pgbouncer]# cd pgbouncer-1.5.4
[root@pflex75 pgbouncer-1.5.4]# ./configure --prefix=/usr/local --with-libevent=libevent-prefix
[root@pflex75 pgbouncer-1.5.4]# make
[root@pflex75 pgbouncer-1.5.4]# make install
[root@pflex75 pgbouncer-1.5.4]# cd ..

Now I had to set up pgBouncer and start it.
So I copied the pgbouncer.ini file to a location of my choice (/etc/pgbouncer):
[root@pflex75 pgbouncer]# mkdir -p /etc/pgbouncer
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /etc/pgbouncer
[root@pflex75 pgbouncer]#  cp /usr/local/share/doc/pgbouncer/pgbouncer.ini /etc/pgbouncer/pgbouncer.ini

Then I created a userlist.txt in a location of my choice (once again /etc/pgbouncer):
[root@pflex75 pgbouncer]#  cat /etc/pgbouncer/userlist.txt
                                "enterprisedb" "password"

Now I followed the standard steps of editing pgbouncer.ini to add the listen addresses and databases. Then I went ahead and set the pool size etc. and the path for the userlist file.
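For reference, a minimal sketch of those pgbouncer.ini entries (the database alias, addresses and pool sizes below are illustrative placeholders, not the exact values I used):

[databases]
; route the "edb" alias through the pooler to the local PPAS instance
edb = host=127.0.0.1 port=5444 dbname=edb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 200
default_pool_size = 20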

In the pgbouncer.ini file, I checked the values for logfile and pidfile. I had to ensure that these locations exist and are owned by the proper user (the user which I will be using for running pgbouncer must have permission to write to them).

                                logfile = /var/log/pgbouncer/pgbouncer.log
                                pidfile = /var/run/pgbouncer/pgbouncer.pid

[root@pflex75 pgbouncer]# mkdir -p /var/log/pgbouncer;
[root@pflex75 pgbouncer]# touch /var/log/pgbouncer/pgbouncer.log;
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /var/log/pgbouncer
[root@pflex75 pgbouncer]# mkdir -p /var/run/pgbouncer/
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /var/run/pgbouncer

But when I tried to run pgbouncer, I got the issue below:
[root@pflex75 pgbouncer]# pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
pgbouncer: error while loading shared libraries: libevent-2.0.so.5: cannot open shared object file: No such file or directory

To resolve it, I followed the steps below (though maybe one can resolve it just by setting LD_LIBRARY_PATH):

The first thing I did was remove the existing installation:
[root@pflex75 pgbouncer]# rm  /usr/local/bin/pgbouncer
[root@pflex75 pgbouncer]# rm  /usr/local/share/man/man5/pgbouncer.5
[root@pflex75 pgbouncer]# rm  /usr/local/share/man/man1/pgbouncer.1
[root@pflex75 pgbouncer]# rm -rf /usr/local/share/doc/pgbouncer

Then I removed the installer I had extracted and then extracted it afresh:
[root@pflex75 pgbouncer-1.5.4]# cd .. ; rm -rf pgbouncer-1.5.4
[root@pflex75 pgbouncer]# tar -xzvf pgbouncer-1.5.4.tar.gz



I made sure the latest libevent was installed:
[root@pflex75 pgbouncer]# yum info libevent
                                Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
                                This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
                                Installed Packages
                                Name        : libevent
                                Arch        : ppc64
                                Version     : 1.4.13
                                Release     : 4.el6
                                Size        : 335 k
                                Repo        : installed
                                From repo   : anaconda-RedHatEnterpriseLinux-201301301505.ppc64
                                Summary     : Abstract asynchronous event notification library
                                URL         : http://monkey.org/~provos/libevent/
                                License     : BSD
                                Description : The libevent API provides a mechanism to execute a callback function
                                                    : when a specific event occurs on a file descriptor or after a timeout
                                                    : has been reached. libevent is meant to replace the asynchronous event
                                                    : loop found in event driven network servers. An application just needs
                                                    : to call event_dispatch() and can then add or remove events dynamically
                                                    : without having to change the event loop.

                                Available Packages
                                Name        : libevent
                                Arch        : ppc
                                Version     : 1.4.13
                                Release     : 4.el6
                                Size        : 66 k
                                Repo        : local
                                Summary     : Abstract asynchronous event notification library
                                URL         : http://monkey.org/~provos/libevent/
                                License     : BSD
                                Description : The libevent API provides a mechanism to execute a callback function
                                                    : when a specific event occurs on a file descriptor or after a timeout
                                                    : has been reached. libevent is meant to replace the asynchronous event
                                                    : loop found in event driven network servers. An application just needs
                                                    : to call event_dispatch() and can then add or remove events dynamically
                                                    : without having to change the event loop.


Then I set the LD_LIBRARY_PATH variable to include /usr/lib64 (on a 64-bit Power system that is where libevent will be):
[root@pflex75 pgbouncer-1.5.4]# export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH

I created an appropriate symlink for libevent (I believe this step was not needed):
[root@pflex75 pgbouncer-1.5.4]# ln -s  /usr/lib64/libevent-1.4.so.2 /usr/local/lib
[root@pflex75 pgbouncer-1.5.4]# ./configure --prefix=/usr/local --with-libevent=/usr/lib64
[root@pflex75 pgbouncer-1.5.4]# make
[root@pflex75 pgbouncer-1.5.4]# make install
[root@pflex75 ~]# pgbouncer --help
Usage: pgbouncer [OPTION]... config.ini
  -d, --daemon           Run in background (as a daemon)
  -R, --restart          Do a online restart
  -q, --quiet            Run quietly
  -v, --verbose          Increase verbosity
  -u, --user=<username>  Assume identity of <username>
  -V, --version          Show version
  -h, --help             Show this help screen and exit

Now switch user to enterprisedb (or any user which you wish to use for running pgbouncer)
[root@pflex75 ~]# su - enterprisedb

-bash-4.1$  pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
pgbouncer: error while loading shared libraries: libevent-2.0.so.5: cannot open shared object file: No such file or directory


Ahhhha!!! Isn’t that the same error I just resolved? So I added the below line in .bash_profile:

                export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH


Now apply the same to the current shell environment:
-bash-4.1$ export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH

Try again:
-bash-4.1$  pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
2013-10-25 19:31:40.325 11412 LOG File descriptor limit: 1024 (H:4096), max_client_conn: 200, max fds possible: 250

-bash-4.1$ ps -ef | grep -i pgbouncer
495      14384     1  0 19:41 ?        00:00:00 pgbouncer -d /etc/pgbouncer/pgbouncer.ini
Wow!!! Issue resolved!!! :-)

A few more steps which won't harm (just to ensure permissions):
-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/man/man1/pgbouncer.1
-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/man/man5/pgbouncer.5

-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/doc/pgbouncer
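As a quick sanity check (assuming your pgbouncer.ini uses the default listen_port of 6432, defines a database alias named edb, and lists enterprisedb under admin_users), you can connect through the pooler and peek at its admin console:

-bash-4.1$ psql -h 127.0.0.1 -p 6432 -U enterprisedb edb         # a normal connection, but via the pooler
-bash-4.1$ psql -h 127.0.0.1 -p 6432 -U enterprisedb pgbouncer   # pgbouncer's admin console
pgbouncer=# SHOW POOLS;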

Wednesday, August 7, 2013

Things you didn't know about the UPDATE statement

I am taking a deviation from the series of posts I am doing on Recovery Concepts and Implementation.
I have already posted two in that series. This one had to come first, because I was recently discussing with some developers who were having a tough time dealing with updates to data based on data from some other source table.
I realized people generally do not know all the capabilities of an UPDATE statement. The form they know is:

UPDATE TABLE_NAME
set COLUMN_NAME= << expression >> [, COLUMN_NAME= << expression >> ]
where << boolean expression >>
Because they are not aware of the much more flexible syntax, they end up exercising all the flexibility they get with the expression and the boolean expression. We will see an example where a complicated expression is used to achieve something that is fairly simple.

Before I rush you into any technical details, let me explain the table structures to be used in this example.
Table "enterprisedb.update_target"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 column_id         | integer                | not null
 columne_to_update | character varying(100) |
Indexes:
    "update_target_pkey" PRIMARY KEY, btree (column_id)

Table "enterprisedb.update_source"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 column_id         | integer                | not null
 source_for_update | character varying(100) |
Indexes:
    "update_source_pkey" PRIMARY KEY, btree (column_id)

The two tables were populated with the following row counts:

 count_update_target | count_update_source
---------------------+---------------------
             1310720 |               10240
(1 row)

edb=# select * from update_source t1 where not exists (select 1 from update_target t2 where t2.column_id=t1.column_id);
 column_id |        source_for_update
-----------+----------------------------------
   2621436 | this is an new value for 2621437
   2621438 | this is an new value for 2621439
   2621440 | this is an new value for 2621441
   

edb=# select count(*) from update_target where columne_to_update is null;
 count
-------
     0
(1 row)
Now suppose I want to update all those rows in update_target which have a matching ID (column_id) in update_source. For this simple table (where I have to update only one column) the statement may be as simple as the one below.

update update_target t1 set columne_to_update =(select source_for_update from update_source t2 where t1.column_id=t2.column_id);

This could be a bit more complex, with multiple select statements (one for each column to be updated). 
Now let's see what the plan looks like for this simple case:

edb=# explain analyze update update_target t1 set columne_to_update =(select source_for_update from update_source t2 where t1.column_id=t2.column_id);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on update_target t1  (cost=0.00..4392654.13 rows=1341439 width=14) (actual time=18187.897..18187.897 rows=0 loops=1)
   ->  Seq Scan on update_target t1  (cost=0.00..4392654.13 rows=1341439 width=14) (actual time=0.058..5259.089 rows=1310720 loops=1)
         SubPlan 1
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=30) (actual time=0.002..0.002 rows=0 loops=1310720)
                 Index Cond: (t1.column_id = column_id)
 Total runtime: 18188.005 ms

Keep in mind that the complexity of the plan would have been double if there were two columns to be updated.

So let me add an additional column here:
edb=# alter table update_target add column column_2_for_update integer;
ALTER TABLE
edb=# alter table update_source add column column_2_source integer;
ALTER TABLE
edb=# update update_source set column_2_source=column_id;
UPDATE 10240

Do you notice an additional Subplan in the plan below?

edb=# explain analyze update update_target t1
	set columne_to_update =(select source_for_update from update_source t2
			where t1.column_id=t2.column_id),
	column_2_for_update=(select column_2_source from update_source t2
			  where t1.column_id=t2.column_id);

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on update_target t1  (cost=0.00..8573070.33 rows=1310720 width=10) (actual time=22256.890..22256.890 rows=0 loops=1)
   ->  Seq Scan on update_target t1  (cost=0.00..8573070.33 rows=1310720 width=10) (actual time=0.096..9104.759 rows=1310720 loops=1)
         SubPlan 1
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=30) (actual time=0.002..0.002 rows=0 loops=1310720)
                 Index Cond: (t1.column_id = column_id)
         SubPlan 2
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1310720)
                 Index Cond: (t1.column_id = column_id)
 Total runtime: 22256.999 ms
(9 rows)

That's a difference of 22%.
So let's fire the query and see its effect.

edb=# update update_target t1
	set columne_to_update =(select source_for_update 
                                     from update_source t2
		              	     where t1.column_id=t2.column_id),
	column_2_for_update=(select column_2_source 
                                     from update_source t2
		              	     where t1.column_id=t2.column_id);

But what has happened to those rows in update_target which were not able to find a match in update_source? Well, those would have been updated to NULL. Let's confirm:

edb# select count(*) from update_target where columne_to_update is null;
 
   count
 ---------
 1300483
 (1 row)
 

Actually, the above UPDATE statement is supposed to update every row of update_target by fetching data from update_source. If a match is not found, then the value is unknown, i.e. NULL. So we need to restrict the rows which need to be updated with a filter which says "update only those rows for which a match exists in update_source". So the query will look something like this:

update update_target t1 
set columne_to_update =(select source_for_update from update_source t2 
	where t1.column_id=t2.column_id),
column_2_for_update=(select column_2_source from update_source t2 
	where t1.column_id=t2.column_id)
where exists (select 1 from update_source t3 where t3.column_id=t1.column_id); 

Let's see how this one performs.

edb=#  explain analyze update update_target t1 
	set columne_to_update =(select source_for_update from update_source t2 
		where t1.column_id=t2.column_id),
	column_2_for_update=(select column_2_source from update_source t2 
		where t1.column_id=t2.column_id)
	where exists (select 1 from update_source t3 where t3.column_id=t1.column_id);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on update_target t1  (cost=14461.94..81679.34 rows=10240 width=16) (actual time=1454.999..1454.999 rows=0 loops=1)
   ->  Hash Join  (cost=14461.94..81679.34 rows=10240 width=16) (actual time=1179.335..1331.664 rows=10237 loops=1)
         Hash Cond: (t3.column_id = t1.column_id)
         ->  Seq Scan on update_source t3  (cost=0.00..172.02 rows=10240 width=10) (actual time=0.013..7.162 rows=10240 loops=1)
         ->  Hash  (cost=11054.07..11054.07 rows=1310720 width=10) (actual time=1178.699..1178.699 rows=1310720 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 53760kB
               ->  Seq Scan on update_target t1  (cost=0.00..11054.07 rows=1310720 width=10) (actual time=0.011..710.234 rows=1310720 loops=1)
         SubPlan 1
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=30) (actual time=0.004..0.004 rows=1 loops=10237)
                 Index Cond: (t1.column_id = column_id)
         SubPlan 2
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=10237)
                 Index Cond: (t1.column_id = column_id)
 Total runtime: 1468.033 ms
(14 rows)

The plan would be a bit simpler if we had to update just one column.

edb=#  explain analyze update update_target t1 
	set columne_to_update =(select source_for_update from update_source t2 
		where t1.column_id=t2.column_id)
	where exists (select 1 from update_source t3 where t3.column_id=t1.column_id);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
--
 Update on update_target t1  (cost=20556.94..54328.97 rows=10240 width=20) (actual time=1222.157..1222.157 rows=0 loops=1)
   ->  Hash Join  (cost=20556.94..54328.97 rows=10240 width=20) (actual time=1002.328..1105.557 rows=10237 loops=1)
         Hash Cond: (t3.column_id = t1.column_id)
         ->  Seq Scan on update_source t3  (cost=0.00..172.02 rows=10240 width=10) (actual time=0.013..7.244 rows=10240 loops=1)
         ->  Hash  (cost=17149.07..17149.07 rows=1310720 width=14) (actual time=1001.109..1001.109 rows=1310720 loops=1)
               Buckets: 131072  Batches: 1  Memory Usage: 53800kB
               ->  Seq Scan on update_target t1  (cost=0.00..17149.07 rows=1310720 width=14) (actual time=0.011..521.482 rows=1310720 loops=1)
         SubPlan 1
           ->  Index Scan using update_source_pkey on update_source t2  (cost=0.00..3.27 rows=1 width=30) (actual time=0.004..0.004 rows=1 loops=10237
)
                 Index Cond: (t1.column_id = column_id)
 Total runtime: 1237.864 ms
(11 rows)
The difference in response time is around 18% (1468 ms vs. 1238 ms), just because of the additional column.

Let's check if this update goes as per our expectation.

edb=# update update_target t1 
	set columne_to_update =(select source_for_update from update_source t2 
		where t1.column_id=t2.column_id),
	column_2_for_update=(select column_2_source from update_source t2 
		where t1.column_id=t2.column_id)
	where exists (select 1 from update_source t3 where t3.column_id=t1.column_id);
 UPDATE 10237
 
edb=# select count(*) from update_target where columne_to_update is null;
  count
 -------
      0
 (1 row)
Very few people might know that in PostgreSQL the actual syntax of the UPDATE command is as below:

[ WITH [ RECURSIVE ] with_query [, ...] ]
UPDATE [ ONLY ] table_name [ * ] [ [ AS ] alias ]
    SET { column_name = { expression | DEFAULT } |
          ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) } [, ...]
    [ FROM from_list ]
    [ WHERE condition | WHERE CURRENT OF cursor_name ]
    [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
The FROM clause in this syntax is particularly important. It helps you update records based on a join. So for our requirement, a much better and cleaner statement is as below:
update update_target t1 
	set columne_to_update=t2.source_for_update ,
	column_2_for_update=t2.column_2_source
	from update_source t2 where t1.column_id=t2.column_id;
	
or 

update update_target t1 
	set (columne_to_update, column_2_for_update)=(t2.source_for_update, t2.column_2_source)
	from update_source t2 where t1.column_id=t2.column_id;
The plan for the above option is:
								QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Update on update_target t1  (cost=14688.35..14950.99 rows=10240 width=50) (actual time=1095.327..1095.327 rows=0 loops=1)
   ->  Hash Join  (cost=14688.35..14950.99 rows=10240 width=50) (actual time=945.722..973.844 rows=10237 loops=1)
         Hash Cond: (t2.column_id = t1.column_id)
         ->  Seq Scan on update_source t2  (cost=0.00..172.02 rows=10240 width=44) (actual time=0.012..7.936 rows=10240 loops=1)
         ->  Hash  (cost=11227.12..11227.12 rows=1331239 width=10) (actual time=944.189..944.189 rows=1310720 loops=1)
               Buckets: 262144  Batches: 1  Memory Usage: 53760kB
               ->  Seq Scan on update_target t1  (cost=0.00..11227.12 rows=1331239 width=10) (actual time=0.009..490.419 rows=1310720 loops=1)
 Total runtime: 1111.271 ms
(8 rows)
Let's see how much overhead was caused because of the multiple columns. We test it by generating a plan for an update of only one column.

edb=# explain analyze update update_target t1 set columne_to_update=t2.source_for_update from update_source t2 where t1.column_id=t2.column_id;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Update on update_target t1  (cost=15719.74..15982.38 rows=10240 width=46) (actual time=1124.990..1124.990 rows=0 loops=1)
   ->  Hash Join  (cost=15719.74..15982.38 rows=10240 width=46) (actual time=977.541..1005.278 rows=10237 loops=1)
         Hash Cond: (t2.column_id = t1.column_id)
         ->  Seq Scan on update_source t2  (cost=0.00..172.02 rows=10240 width=40) (actual time=0.236..7.509 rows=10240 loops=1)
         ->  Hash  (cost=12015.47..12015.47 rows=1424717 width=10) (actual time=976.187..976.187 rows=1310720 loops=1)
               Buckets: 262144  Batches: 1  Memory Usage: 53760kB
               ->  Seq Scan on update_target t1  (cost=0.00..12015.47 rows=1424717 width=10) (actual time=0.012..524.587 rows=1310720 loops=1)
 Total runtime: 1141.219 ms
(8 rows)
Surprisingly, there is hardly any difference in response time, and the cost has not suffered much either. So I can be assured that even if I have to update a third column, the plan will remain more or less the same.
There are some RDBMS implementations which have a dedicated statement (MERGE) for achieving UPSERT (update/insert based on some criteria); via that, they support something similar to what I have explained here.
Now, I know there will be people talking and commenting about the lack of UPSERT in PostgreSQL. Well, I don't quite agree with that. Did you know that you can return details from an UPDATE query, and that PostgreSQL v9.1 (and onwards) allows you to use that returned data to insert/delete records? That makes it a bit more powerful than the MERGE statement. Below is an example.
If you remember, there were three rows which did not exist in update_target but were present in update_source (if not, just scroll up).
-- Using RETURNING to achieve an UPSERT/MERGE
edb=# with updated as (update update_target t1 set columne_to_update=t2.source_for_update from update_source t2 where t1.column_id=t2.column_id
	returning t1.column_id)
	insert into update_target
	select * from update_source t_src where t_src.column_id not in
	(select column_id from updated);
INSERT 0 3
I hope that now, before writing a complex subquery to update your data, you will first explore the possibility of using the FROM clause. And I expect happy users of PostgreSQL to stop complaining about the absence of an UPSERT feature.
PS: The flexibility explained above is not limited to UPDATE; it is also available when you are deleting. There is a USING clause in the DELETE statement, sketched below.
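As a quick illustration of that USING clause, on the same tables used above (deleting the target rows that have a match in the source is just an example condition):

-- delete rows from update_target that have a matching column_id in update_source
delete from update_target t1
using update_source t2
where t1.column_id = t2.column_id;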
Detailed documentation for the UPDATE and DELETE commands can be found on the PostgreSQL website.

Tuesday, July 30, 2013

Recoverability of PostgreSQL-2: Setup of Hot Standby, failover and rebuilding the master node

In my last post, I discussed certain concepts of backup, recovery and high availability. As promised, in this post I will cover the specifics of implementing a Hot Standby server, and failover and switchover scenarios.

Step 1: Installation

The first step would be to install PPAS (the same version) on two different servers (running the same OS/OS version). For this blog, I tried these steps on 2 servers, each running Postgres Plus Advanced Server 9.2.1 on RHEL 6.1. For the sake of simplicity, instead of overwriting the default DB cluster/instance on the secondary DB server, I created a different DB instance:
  • Primary DB Server:
    •  Name:dbserver1
    •  IP Address: 192.168.160.147
    •  Data directory: /opt/PostgresPlus/9.2AS/data
    •  Port: 5444
  • Standby DB Server
    •  Name:dbserver2
    •  IP Address: 192.168.160.150
    •  Data Directory: /opt/PostgresPlus/9.2AS/data2
    •  Port: 5222
Step 2: Parameter Setup

Change the below parameters in postgresql.conf:
  •   wal_level = hot_standby
  •   max_wal_senders = 3
  •   wal_keep_segments = 128 (optional/depending on load)
  •   replication_timeout = 5 sec (optional)
  •   hot_standby = on (required/effective only for hot stand by server)
To ease the pain of backup-restore and failover-failback, I created two copies of postgresql.conf on the primary server (under the data directory):
1.   postgresql.conf.dbserver1
2.   postgresql.conf.dbserver2

Both files are the same; the only difference is the value of port (dbserver1=5444 and dbserver2=5222).
The values of the replication-related parameters and hot_standby are the same in both files, as the replication parameters make no difference on the secondary server (unless you use cascaded replication) and the hot_standby value is ignored on the primary server.

Add a new entry in pg_hba.conf:
host  replication   all   192.168.160.0/24      trust

Step 3: Create recovery.conf

Create two dummy recovery.conf files namely recovery.conf.dbserver1_is_master and recovery.conf.dbserver2_is_master.

recovery.conf.dbserver1_is_master
standby_mode = 'on'
primary_conninfo = 'host=192.168.160.147 port=5444 user=enterprisedb password=ashnik'
restore_command = 'scp enterprisedb@192.168.160.150:/mnt/arch/%f %p' 
             # optional
            # needs archiving command to be enabled on primary
recovery_target_timeline = 'latest'      #optional
trigger_file = '/opt/PostgresPlus/9.2AS/data2/recover.trigger'

recovery.conf.dbserver2_is_master
standby_mode = 'on'
primary_conninfo = 'host=192.168.160.150 port=5222 user=enterprisedb password=ashnik'
restore_command = 'scp enterprisedb@192.168.160.150:/mnt/arch/%f %p' 
         # optional
        # needs archiving command to be enabled on primary
recovery_target_timeline = 'latest'      #optional
trigger_file = '/opt/PostgresPlus/9.2AS/data/recover.trigger'

For ease of management, you should keep the passwords the same on both servers. Here the password is used in plain text, but one can always use a password file or an md5 password.

Step 4: Backup

You need to take a cold backup of your primary server and restore it on the secondary server (a sketch of this step follows the file list). Along with the rest of the files in the backup, you will also get the files below:
  1.    postgresql.conf.dbserver1
  2.    postgresql.conf.dbserver2
  3.    recovery.conf.dbserver1_is_master
  4.    recovery.conf.dbserver2_is_master
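A minimal sketch of that cold backup and restore, assuming the paths and addresses described above (how you ship the backup, e.g. tar over scp as here, is up to you):

# on dbserver1: stop the primary cleanly, then ship a copy of the data directory
pg_ctl -D /opt/PostgresPlus/9.2AS/data stop -m fast
tar -czf /tmp/data_backup.tar.gz -C /opt/PostgresPlus/9.2AS data
scp /tmp/data_backup.tar.gz enterprisedb@192.168.160.150:/tmp/

# on dbserver2: restore it as the data2 directory of the standby instance
mkdir /tmp/standby_restore
tar -xzf /tmp/data_backup.tar.gz -C /tmp/standby_restore
mv /tmp/standby_restore/data /opt/PostgresPlus/9.2AS/data2
chown -R enterprisedb:enterprisedb /opt/PostgresPlus/9.2AS/data2
chmod 700 /opt/PostgresPlus/9.2AS/data2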
Step 5: Setup of Standby Server

Now copy postgresql.conf.dbserver2 as postgresql.conf on dbserver2.
Then copy recovery.conf.dbserver1_is_master as recovery.conf.
Now start the db cluster on the secondary server first, to make sure it is not going to lose any transactions. You will see an error in the log complaining about the non-availability of the primary server, which can be ignored at this point. Confirm that the secondary database instance is up and running and that you can connect and fire read queries on the database.
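Spelled out as commands on dbserver2 (paths as per the setup above):

cd /opt/PostgresPlus/9.2AS/data2
cp postgresql.conf.dbserver2 postgresql.conf
cp recovery.conf.dbserver1_is_master recovery.conf
pg_ctl -D /opt/PostgresPlus/9.2AS/data2 start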

Step 6: Start-up the Primary Server

Once sure, start the primary server.

Check the primary and secondary server. Connect to each of them to confirm the connectivity, role and read/write accessibility.

You can use the “pg_ctl status” OS command and the “pg_is_in_recovery()” SQL function to confirm the status of each database.
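A quick sketch of those checks (ports, paths and the edb/enterprisedb defaults are as per the PPAS setup above; use edb-psql if psql is not on your path):

pg_ctl -D /opt/PostgresPlus/9.2AS/data status                          # on dbserver1
pg_ctl -D /opt/PostgresPlus/9.2AS/data2 status                         # on dbserver2
psql -d edb -U enterprisedb -p 5444 -c "SELECT pg_is_in_recovery();"   # on dbserver1, expect f
psql -d edb -U enterprisedb -p 5222 -c "SELECT pg_is_in_recovery();"   # on dbserver2, expect t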

Now let’s try to do a failover.

Step 7: Create a Failure

So for that, first there has to be a failure. On the primary site, fire this command against the postmaster process of the primary instance:
        kill -9 <postmaster PID>

Now connect to the secondary database and check if you can do a write operation, or simply check whether it's still in recovery state with the pg_is_in_recovery() function. So, do you still get “t” as the output of that function? Are you still not able to do write operations on the secondary database?

Well, don’t be surprised; PostgreSQL does not do failover on its own. There have always been two schools of thought about any failover mechanism: one which says “auto failover” and the other which says “manual failover”. Given a choice, I always prefer a manual failover for a DR site. This ensures that my DR site does not mistakenly treat a network failure as a disaster. Moreover, the DR site is not just the database; one also needs to make sure the application and client connectivity are modified accordingly. Hence it’s best to keep it manual. Auto-failover is useful if you want to do hot streaming replication for HA (which I will discuss in a later post in this series).

Step 8: Switch Secondary/slave database to become read/write

On the secondary server, create the trigger file (as specified in recovery.conf):
touch /opt/PostgresPlus/9.2AS/data2/recover.trigger

Now connect again and check whether your current live sessions/new sessions to the secondary database are able to do write operations. If yes, then great! You have just completed a successful failover.

Step 9: Rebuilding the Master Database

Now, we need to re-build the master (assuming that the master database server is up).

First of all, clean up the database cluster on the old primary server:
            rm -Rf /opt/PostgresPlus/9.2AS/data

Now, take a backup from the new primary (dbserver2) to rebuild the master:
pg_basebackup -D /opt/PostgresPlus/9.2AS/data -h 192.168.160.150 -W -p 5222 -Fp --xlog-method=stream

Once the backup is complete, copy postgresql.conf.dbserver1 as postgresql.conf, then remove recovery.done and copy recovery.conf.dbserver2_is_master as recovery.conf.

Step 10: Start the Primary DB as new slave

Now start the db cluster on the master node. Once the start-up is successful, make sure everything is alright by connecting to the database and firing the command below:
SELECT pg_is_in_recovery();   -- expected output is “t”

Then fire the below command on dbserver1 and dbserver2; the output should be the same on both:
SELECT txid_current_snapshot();

Carefully inspect the log files on the new secondary node (dbserver1) to confirm that recovery is in progress and there are no issues with the replication.

I have skipped a few details in this post, like:
1.     Details of the pg_basebackup command (to take a backup), which will be included in one of my upcoming posts in this series
2.     Details of archive logging and its setup – this is going to be the next post