About Me

I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise-class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services), working on IBM DB2, Oracle and Microsoft SQL Server. I was selected as an IBM Champion for Data Management in 2011. I like a bit of photography, cycling and poetry, and I just love gadgets!
Showing posts with label EnterpriseDB. Show all posts

Tuesday, November 18, 2014

Emulate redo Multiplexing in PostgreSQL


When it comes to reliability and recoverability, PostgreSQL is as good a database as any other RDBMS implementation in the market. Postgres complies very strongly with the ACID philosophy of relational databases. It has multi-version concurrency control, where an older snapshot of rows/data is preserved as long as any transaction which may need it is alive (something similar to the UNDO tablespaces or rollback segments of Oracle). Recoverability is ensured by WAL segments, which you can relate to the redo log files of Oracle or the transaction logs of SQL Server. These WAL files are sufficient for crash recovery (after an OS or hardware crash). You can enable archiving of these WAL files and copy them to a separate disk or server and use them for more complex recovery scenarios. I will talk about this at length in one of my next blogs.

One thing which I have seen Oracle DBAs boast about is multiplexing of redo log files by creating multiple redo log groups. That is something which is not there in PostgreSQL out of the box (mainly because it's not really needed, given that we have RAID devices and multiplexing/mirroring at the OS/storage level). Still, if you have a need to multiplex your WAL segments, you can do that quite easily from v9.2 onwards with pg_receivexlog.

pg_receivexlog is a utility based on streaming replication which can stream WAL files in real time. Unlike archive_command, it does not require WAL files to be completely filled (16MB) before they can be sent over. The streamed files can be used for archiving or replication.

You can set up a local stream, for which you need to:

1. Set max_wal_senders to 1 or greater. If you have set up replication or are using pg_basebackup, count those connections too.

                     max_wal_senders=2

2. Set the wal_level to archive or hot_standby
                     wal_level=hot_standby

3. Set pg_hba.conf to allow replication connections from local (UNIX Socket)

                      local  replication postgres        trust

4. Restart your DB server after these changes
                     pg_ctl restart -m fast

5. Start pg_receivexlog-
                    pg_receivexlog -D /usr/local/pgsql/wal_2 -U postgres

Make sure that PGHOST is not set in the session from which you are firing this command.

This command will copy (stream) the WALs from a server locally (on default port pointed by PGPORT) and store them in secondary WAL location - /usr/local/pgsql/wal_2


Now if your server crashes and you lose your WAL directory, all you need to do is look for a file with the ".partial" extension in your secondary WAL location (/usr/local/pgsql/wal_2) and rename it to remove the extension. Then copy the contents of the secondary WAL location to the primary location and restart the server. The last chunk of changes was present in this ".partial" file, which otherwise would not have got archived.


Note: There is one caveat though: the files in the secondary WAL directory will not get rotated or removed automatically. You may do that on a periodic basis with the pg_archivecleanup command.


Tuesday, June 3, 2014

25 Features your Developers are missing when not using Postgres


I recently met a few prospects who were keen to know what Postgres can offer their developers that they weren't already getting from the contemporary databases they use. So I thought of putting together a list.
Though the list refers to features of Postgres Plus Advanced Server, a lot of these features are applicable to PostgreSQL too. I have found great resources and documentation with "how to use" references for most of these features, so I am not going to repeat that here. Let me share the list and some quick reference/user guides for each of them.
1.       JSON Datatype for storing non-relational data in JSON document- More details http://www.postgresql.org/docs/9.3/static/datatype-json.html

2.       JSON Functions to convert relational data into JSON format- More details http://www.postgresql.org/docs/9.3/static/functions-json.html
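As a quick illustration (the table and data here are made up), PostgreSQL can turn relational rows into JSON with a single function call:

```sql
-- hypothetical table; row_to_json() is available from 9.2, json_agg() from 9.3
CREATE TABLE emp (id int, name text);
INSERT INTO emp VALUES (1, 'Sameer'), (2, 'Alice');

SELECT row_to_json(e) FROM emp e;   -- one JSON document per row
SELECT json_agg(e) FROM emp e;      -- the whole result set as one JSON array
```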

3.       HSTORE datatype for storing non-relational data as key value pair- More details http://www.postgresql.org/docs/9.1/static/hstore.html

4.       Range Datatypes- DATE range and INTEGER range types for storing a range [minValue-maxValue] in a single column instead of having to store minValue and maxValue in separate columns- For more details- http://www.postgresql.org/docs/9.2/static/rangetypes.html
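A small sketch of how a date range column might be used (table and values are hypothetical):

```sql
-- one daterange column replaces separate check_in/check_out columns
CREATE TABLE booking (room int, during daterange);
INSERT INTO booking VALUES (101, '[2014-11-01,2014-11-05)');

-- the @> operator tests whether a value falls inside the range
SELECT room FROM booking WHERE during @> DATE '2014-11-03';
```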

5.       Boolean Datatype: Support for Boolean as datatype for columns- For more details- http://www.postgresql.org/docs/9.3/static/datatype-boolean.html

6.       Support for TIME datatype- As far as I remember, a standalone TIME datatype is not present in Oracle (DB2 does support it)

7.       Infinite Cache for Application Performance- Exclusive to EnterpriseDB Postgres Plus

8.       Handling and playing with time-series data is much simpler and easier in PostgreSQL: http://no0p.github.io/postgresql/2014/05/08/timeseries-tips-pg.html

9.       Rich Foreign Data Wrapper support for querying various different sources as foreign tables- More details here: http://wiki.postgresql.org/wiki/Foreign_data_wrappers

10.   UNLOGGED Tables: You can create unlogged tables, which do not have the WAL (redo-log equivalent) overhead for write operations.

11.   Serializable Snapshot Isolation: SSI provides the guarantees of serializable transactions but without the usual performance impact. More details: http://wiki.postgresql.org/wiki/SSI and http://wiki.postgresql.org/wiki/Serializable

12.   NOT VALID Constraints: Add a check constraint/FK that applies to new incoming data but ignores any inconsistency in existing data. This makes adding constraints faster. Later, you can clean up the data and validate the constraint
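A hedged sketch of that two-step workflow (the table and constraint names are made up):

```sql
-- added instantly; existing rows are not scanned, only new writes are checked
ALTER TABLE orders ADD CONSTRAINT chk_amount CHECK (amount > 0) NOT VALID;

-- later, after cleaning up old data, validate it in a single pass
ALTER TABLE orders VALIDATE CONSTRAINT chk_amount;
```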

13.   Filter Based Indexes: You can index a particular column for only the rows matching a filter. This can be useful for indexing columns based on a criterion, e.g. an index on employee_table(dept) where hasLeft=false
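Using the example above, the partial index would look like this (table and column names are purely illustrative):

```sql
-- only rows with hasLeft = false are indexed, keeping the index small;
-- queries that repeat the predicate can use it
CREATE INDEX idx_current_emp_dept ON employee_table (dept) WHERE hasLeft = false;
```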

14.   Create Recursive Views using shorthand syntax - More details here: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#Recursive_View_Syntax

15.   Define sort memory for each query- Gives you the flexibility to assign more resources to the one query which needs them. You can define work_mem at the user, transaction, query/statement or DB level.
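The different scopes can be sketched as follows (the role name and values are hypothetical):

```sql
SET work_mem = '256MB';                            -- for the current session

BEGIN;
SET LOCAL work_mem = '1GB';                        -- only for this transaction
-- ... run the one big sort/hash-heavy query here ...
COMMIT;

ALTER ROLE reporting_user SET work_mem = '128MB';  -- default for one user
```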

16.   LATERAL JOIN: A LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list. More details: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3#LATERAL_JOIN
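A classic use is a per-row "top N" lookup; a sketch with hypothetical tables:

```sql
-- for each customer, fetch the 3 most recent orders; the subquery can
-- reference c.id only because of LATERAL
SELECT c.name, o.order_date, o.amount
FROM   customers c,
       LATERAL (SELECT order_date, amount
                FROM   orders
                WHERE  orders.customer_id = c.id
                ORDER  BY order_date DESC
                LIMIT  3) o;
```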

17.   User-defined Operators- The flexibility to define your own operators, e.g. while migrating from SQL Server one can define a new operator '+' with two string operands which works like the string concatenation operator of SQL Server (+).

18.   Rich Language Support: Create user-defined objects in PL/pgSQL, PL/SQL, PL/Perl, PL/Python, PL/Java etc. Your developers can choose a language of their choice for implementing stored procedures

19.   IS NOT NULL queries can make use of indexes- I have seen RDBMSs which boast about their optimizers fail to do so

20.   GRANT/REVOKE access on all tables of a particular schema- More general than granting access on each table, but at the same time more granular than "SELECT ANY TABLE" and similar privileges
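A brief sketch (the schema and role names are made up):

```sql
-- grant on everything that exists in the schema today
GRANT SELECT ON ALL TABLES IN SCHEMA reporting TO analyst;

-- and on tables created in the schema in future
ALTER DEFAULT PRIVILEGES IN SCHEMA reporting GRANT SELECT ON TABLES TO analyst;
```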

21.   LIMIT and OFFSET features- Limit the number of rows in your output using the LIMIT keyword. You can further set an offset from where to start fetching using OFFSET
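For example, paging through results (table and columns are hypothetical):

```sql
-- page 3 with a page size of 10, i.e. rows 21-30 of the ordered result
SELECT id, name
FROM   employee_table
ORDER  BY id
LIMIT  10 OFFSET 20;
```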

22.   Granular security and logging: Set User privileges and logging parameter per user/role

23.   Full Text Search Features- More details here- http://www.postgresql.org/docs/9.3/static/textsearch.html

24.   Use indexes for getting statistics on the fly- This feature can help you get a better plan when you are reading data from a table which has been loaded with bulk data recently

25.   COPY JDBC API: Use COPY API to do bulk load into table via JDBC. Check my earlier post on this- http://pgpen.blogspot.com/2013/05/using-copy-in-your-jdbc-code.html


Remember, I have not even added the awesome new features which we will see in PostgreSQL v9.4 [https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.4].

If you are a developer who is not using PostgreSQL, now you know what you have been missing all along. If you are a CIO/IT Manager or Product Manager, now you know what great features your developers have been deprived of. What are you waiting for?


If you are a happy PostgreSQL user and you think I missed any features, please help me update this list. Thanks in advance!

Tuesday, November 12, 2013

Secure your DB the Hollywood style



Today I was discussing with a colleague who found it strange that some people use random names for their databases or their database directories.
After pondering over it for a while, I thought it's a good thing.

Now if you were a hacker and had somehow got access to my DB server, you would still have to guess the name of my database. So you would do a  " ps -ef | grep -i postgres "

Or if you are lucky you could just go to the default location or use $PGDATA variable.

Well, these defaults make the life of a DBA easy, but they make the life of a hacker equally easy.

A few elements of security which I have learnt from Hollywood flicks are:
  1. Create confusion with duplicates: Remember "Italian Job"
  2. Use unusual names: In war movies, don't you hear "Tangos" talking to "Charlies"?
  3. Avoid/hide using usual (default) identity: That is why they posed as Movie Crew in "Argo"
  4. Hide/lock the identity/password of the superuser and pose a pseudo superuser for everyone to use and interact with: Didn't Jim die before he could control everything and hide his identity in Mission: Impossible?

Suppose I have 3 instances running:

pg_ctl -D /opt/Bali/db
pg_ctl -D /opt/KulaLumpur/db
pg_ctl -D /opt/Bangkok/db

Now you will have to log in to at least 3 instances, where again I can have multiple databases by name:
Bali/db: TamanAyana
Bali/db: Tan
Bali/db: Ulluwatu

KulaLumpur/db: TwinTower
KulaLumpur/db:  CyberJaya


Bangkok/db: WatArun
Bangkok/db: RoyalPalace


This means you have to do more work to get into my system, which means I have more time to stop the intrusion.


Of course, if you were smart you could check memory utilization, the number of connections etc. But what if I have set some rules for process priority and am using the same amount of memory for each instance (while smartly allowing the pages of unused instances to be paged out)? I would succeed in keeping you confused a little longer before you know which database/instance is your true target. Then again, the Italian Job heist too was pulled off by figuring out load etc. and thereby isolating the identity of the true target.


Similarly, if possible, run your PostgreSQL database on a non-default port. Why? Suppose I have not been able to crack into your DB server but have managed to figure out its IP; my first attempt at connecting to PostgreSQL would be aimed at port 5432. A determined hacker will eventually hit the correct spot/port. Sooner or later your true identity will be out, but that will take time, and that means you have more time to board that Swiss Airlines flight and safeguard yourself.

Now, one really great feature that PostgreSQL has is host-based authentication. It is always good to add a line like the one below at the end of your pg_hba.conf:


host all all 0.0.0.0/0 reject


It will make sure that any connection from an unauthorized IP address is rejected.

Additionally, the first attempt someone will make at connecting to your DB server would be using standard users, e.g. enterprisedb, postgres etc. Make sure you set the passwords of these users to something completely random which you don't need to remember. Just create another superuser and always use that user for connecting and doing maintenance work. Make sure that the username is not your first name. I would prefer "wolverine".
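A sketch of that advice in SQL (role names and passwords are placeholders; on PPAS the well-known superuser would be enterprisedb rather than postgres):

```sql
-- lock the well-known superuser behind a throwaway random password
ALTER ROLE postgres WITH PASSWORD 'some-long-random-string-you-never-memorize';

-- and do day-to-day superuser work through a non-obvious account
CREATE ROLE wolverine WITH LOGIN SUPERUSER PASSWORD 'another-strong-secret';
```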


Some installers of PostgreSQL might create a .pgpass file in your OS user's home directory. It's better to remove it unless you think there is a use-case for it in your environment.


Grant your users only what they need. I will follow this to death. There is no application written which needs DBA, superuser or any similar privilege on your database or instance. Never ever grant these to any user (unless, of course, your job is at stake). Grant them only the few database privileges which "you think" they are going to need. If they hit an issue or the database tells them that they cannot perform a certain operation, grant the permission for that operation. If you want a really secure database, do this as many times as your users hit a problem.

Never take that shortcut "alter user my_app_user with superuser"


It's good to segregate roles for your users. There should be at least 3 roles: superuser (the DB superuser), application superuser (who owns and controls application schema changes) and application user (who uses the application). Application reader/auditor etc. could be other roles depending on your environment.

It is a good idea to set a limit on the number of connections which can be established by these users and to restrict their work_mem and a few other parameters which you can control at the user level.
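These per-role knobs can be set like this (the role name and values are illustrative):

```sql
ALTER ROLE app_user CONNECTION LIMIT 50;            -- cap concurrent connections
ALTER ROLE app_user SET work_mem = '16MB';          -- restrict sort/hash memory
ALTER ROLE app_user SET statement_timeout = '30s';  -- kill runaway queries
ALTER ROLE app_user SET log_statement = 'mod';      -- log data-modifying statements
```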

Friday, October 25, 2013

Installation steps for pgBouncer on PowerLinux



Our team has been excited ever since IBM and EnterpriseDB announced the availability of Postgres Plus Advanced Server on PowerLinux.
Our team recently started engaging with IBM to evaluate the scalability of PPAS on PowerLinux and to determine how to fine-tune such setups.
(It's a work-in-progress and I am not sure how much I can share here. But if you are thinking of migrating your IT infrastructure to a much more cost effective stack of Power PC running RHEL and using EnterpriseDB PPAS/PostgreSQL as database please contact me or write to success@ashnik.com).
Though I really welcome this new announcement from EnterpriseDB, one thing which disappointed me was the non-availability of pgBouncer in the RPM installers provided by EnterpriseDB for PPAS on PowerLinux.

I have always preferred pgBouncer as a connection pooling tool over pgpool. It's lightweight and quite easy to configure. In the past I had done a benchmarking study to see how PPAS scales with and without connection pooling. I had seen some significant gains and hence decided to install pgBouncer before proceeding with the performance measurement of PPAS on PowerLinux.
I faced a few issues but could finally install it and get it running, so I decided to share my experience (which can hopefully help someone, as more and more people try to move away from costly hardware and proprietary databases to PPAS on PowerLinux).
The first thing one should do is make sure the IBM Advanced Toolchain is installed on the PowerLinux box.
As per my understanding, the compiler which is part of the IBM Advanced Toolchain is optimized for Power PC, so it's preferred for code compilation over the gcc compiler shipped with Red Hat.
Generally it will be installed in /opt/atXX
So I set my path to include Advanced Toolchain:
[root@pflex75 pgbouncer]# export PATH=/opt/at7.0/bin:$PATH

I just wanted to be sure about my version of the gcc compiler:
[root@pflex75 pgbouncer]# which gcc
                /opt/at7.0/bin/gcc

[root@pflex75 pgbouncer]# gcc --version
gcc (GCC) 4.8.2 20130819 (Advance-Toolchain-at7.0) [ibm/gcc-4_8-branch, revision: 201854 merged from gcc-4_8-branch, revision 200989]
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


I downloaded the latest version of pgbouncer (for me it was 1.5.4) from the pgFoundry site and extracted the contents of the tar file:
[root@pflex75 pgbouncer]# tar -xzvf pgbouncer-1.5.4.tar.gz

Now it was time to configure the binaries and install them as per the instructions in the README file extracted from the above tar. So I fired the below commands:

[root@pflex75 pgbouncer]# cd pgbouncer-1.5.4
[root@pflex75 pgbouncer-1.5.4]# ./configure --prefix=/usr/local --with-libevent=libevent-prefix
[root@pflex75 pgbouncer-1.5.4]# make
[root@pflex75 pgbouncer-1.5.4]# make install
[root@pflex75 pgbouncer-1.5.4]# cd ..

Now I had to set up pgBouncer and start it.
So I copied the pgbouncer.ini file to a location of my choice (/etc/pgbouncer):
[root@pflex75 pgbouncer]# mkdir -p /etc/pgbouncer
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /etc/pgbouncer
[root@pflex75 pgbouncer]#  cp /usr/local/share/doc/pgbouncer/pgbouncer.ini /etc/pgbouncer/pgbouncer.ini

Then I created a userlist.txt in a location of my choice (once again /etc/pgbouncer):
[root@pflex75 pgbouncer]#  cat /etc/pgbouncer/userlist.txt
                                "enterprisedb" "password"

Now I followed all the standard steps to edit pgbouncer.ini to add the addresses and databases. Then I went ahead and set the pool size etc., and then set the path for the userlist file.

In the pgbouncer.ini file, I checked the values for logfile and pidfile. I had to ensure that these locations exist and are owned by the proper user (the user which I will be using for running pgbouncer must have permission to write to them).

                                logfile = /var/log/pgbouncer/pgbouncer.log
                                pidfile = /var/run/pgbouncer/pgbouncer.pid

[root@pflex75 pgbouncer]# mkdir -p /var/log/pgbouncer;
[root@pflex75 pgbouncer]# touch /var/log/pgbouncer/pgbouncer.log;
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /var/log/pgbouncer
[root@pflex75 pgbouncer]# mkdir -p /var/run/pgbouncer/
[root@pflex75 pgbouncer]# chown enterprisedb:enterprisedb -Rf /var/run/pgbouncer

But once I tried to run pgbouncer, I got the below error:
[root@pflex75 pgbouncer]# pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
pgbouncer: error while loading shared libraries: libevent-2.0.so.5: cannot open shared object file: No such file or directory

To resolve it, I followed the below steps (though maybe one can resolve it just by setting LD_LIBRARY_PATH):

The first thing I did was remove the existing installation:
[root@pflex75 pgbouncer]# rm  /usr/local/bin/pgbouncer
[root@pflex75 pgbouncer]# rm  /usr/local/share/man/man5/pgbouncer.5
[root@pflex75 pgbouncer]# rm  /usr/local/share/man/man1/pgbouncer.1
[root@pflex75 pgbouncer]# rm -rf /usr/local/share/doc/pgbouncer

Then I removed the installer I had extracted and then extracted it afresh:
[root@pflex75 pgbouncer-1.5.4]# cd .. ; rm -rf pgbouncer-1.5.4
[root@pflex75 pgbouncer]# tar -xzvf pgbouncer-1.5.4.tar.gz



I made sure the latest libevent was installed:
[root@pflex75 pgbouncer]# yum info libevent
                                Loaded plugins: product-id, refresh-packagekit, security, subscription-manager
                                This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
                                Installed Packages
                                Name        : libevent
                                Arch        : ppc64
                                Version     : 1.4.13
                                Release     : 4.el6
                                Size        : 335 k
                                Repo        : installed
                                From repo   : anaconda-RedHatEnterpriseLinux-201301301505.ppc64
                                Summary     : Abstract asynchronous event notification library
                                URL         : http://monkey.org/~provos/libevent/
                                License     : BSD
                                Description : The libevent API provides a mechanism to execute a callback function
                                                    : when a specific event occurs on a file descriptor or after a timeout
                                                    : has been reached. libevent is meant to replace the asynchronous event
                                                    : loop found in event driven network servers. An application just needs
                                                    : to call event_dispatch() and can then add or remove events dynamically
                                                    : without having to change the event loop.

                                Available Packages
                                Name        : libevent
                                Arch        : ppc
                                Version     : 1.4.13
                                Release     : 4.el6
                                Size        : 66 k
                                Repo        : local
                                Summary     : Abstract asynchronous event notification library
                                URL         : http://monkey.org/~provos/libevent/
                                License     : BSD
                                Description : The libevent API provides a mechanism to execute a callback function
                                                    : when a specific event occurs on a file descriptor or after a timeout
                                                    : has been reached. libevent is meant to replace the asynchronous event
                                                    : loop found in event driven network servers. An application just needs
                                                    : to call event_dispatch() and can then add or remove events dynamically
                                                    : without having to change the event loop.


Then I set the LD_LIBRARY_PATH variable to include /usr/lib64 (on a ppc64 system this is where libevent lives):
[root@pflex75 pgbouncer-1.5.4]# export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH

I created a softlink for libevent (I believe this step was not needed):
[root@pflex75 pgbouncer-1.5.4]# ln -s  /usr/lib64/libevent-1.4.so.2 /usr/local/lib
[root@pflex75 pgbouncer-1.5.4]# ./configure --prefix=/usr/local --with-libevent=/usr/lib64
[root@pflex75 pgbouncer-1.5.4]# make
[root@pflex75 pgbouncer-1.5.4]# make install
[root@pflex75 ~]# pgbouncer --help
Usage: pgbouncer [OPTION]... config.ini
  -d, --daemon           Run in background (as a daemon)
  -R, --restart          Do a online restart
  -q, --quiet            Run quietly
  -v, --verbose          Increase verbosity
  -u, --user=<username>  Assume identity of <username>
  -V, --version          Show version
  -h, --help             Show this help screen and exit

Now switch user to enterprisedb (or any user which you wish to use for running pgbouncer)
[root@pflex75 ~]# su - enterprisedb

-bash-4.1$  pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
pgbouncer: error while loading shared libraries: libevent-2.0.so.5: cannot open shared object file: No such file or directory


Ahhhha!!! Isn't that the same error I just resolved? So I added the below line to .bash_profile:

                export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH


Now apply the same to the current shell environment:
-bash-4.1$ export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH

Try again:
-bash-4.1$  pgbouncer  -d  /etc/pgbouncer/pgbouncer.ini
2013-10-25 19:31:40.325 11412 LOG File descriptor limit: 1024 (H:4096), max_clie  nt_conn: 200, max fds possible: 250

-bash-4.1$ ps -ef | grep -i pgbouncer
495      14384     1  0 19:41 ?        00:00:00 pgbouncer -d /etc/pgbouncer/pgbouncer.ini
Wow!!! Issue resolved!!! :-)

A few more steps which won't harm (just to ensure proper permissions):
-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/man/man1/pgbouncer.1
-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/man/man5/pgbouncer.5

-bash-4.1$ chown enterprisedb:enterprisedb -Rf /usr/local/share/doc/pgbouncer

Thursday, July 11, 2013

Recoverability of PostgreSQL-1: Recovery Concepts

Recently one of my colleagues told me that it's very difficult (he actually said "impossible") to do a failback and switchback in PostgreSQL. Well, I thought of giving it a try. In an upcoming blog, I will discuss the process and details of my experiment. But before that, it's very important to understand certain concepts that people generally get confused about, thus ending up with a messed-up recovery strategy. Let me discuss a few terms which are very important for designing your backup-recovery and disaster-recovery processes:

1) Archiving: Archiving your transaction log (xlog/WAL) segments. This is not a substitute for creating a DR site. It is required for taking a hot/online backup, and the same is the case with at least MS SQL Server and DB2.
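A minimal postgresql.conf sketch of what enabling archiving looks like (the destination directory is an assumption; %p and %f are expanded by the server):

```
wal_level = archive        # or 'hot_standby' if a readable standby is planned
archive_mode = on
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
```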

2) Backup: A process in which you take a backup of your database. It can be logical where you take a backup of tables and rows or it can be physical where you backup the datafiles and other configuration files needed by your database/instance/cluster. Physical backups can be cold/offline or can be online/hot.

3) Crash Recovery: Your database service/server crashed and your database is recovering from it. Generally, if your disk is intact, this can be achieved with your WAL segments alone. Whereas if your disk has crashed and you are restoring your database from a backup, you will need the archived files and the latest WAL files for applying the changes made since the backup was taken.

4) Disaster Recovery: You had a crash after which your database server is not available and you have to either rebuild the whole system from scratch using a backup or have to start using a DR site you had built for just this scenario.

5) Fail over: A term which refers to using your DR site in case there is a major failure at your primary site.

6) Switchover: A term which refers to switching the roles of your Primary and DR sites either because Primary site has come up again after a recent major failure or due to some maintenance work for which you have to switch the roles.

7) Switchback: A term which means switching back/restoring back the roles of "new primary" and "new DR" sites. i.e. the original Primary site regains control. This is generally a planned activity and everyone desires to have minimal downtime for this.

8) Fail back: A term which is generally used interchangeably with switchback, but I think it fits better for those scenarios where your "new primary" (or "switched primary") server goes down for some reason and you have to promote the "new DR" (which was the original primary site to begin with) as primary

9) Warm standby: A standby server which is an exact (almost real-time) replica of your primary server, is always in recovery mode and cannot be accessed until it is promoted

10) Hot Standby: A standby server which is an exact (almost real-time) replica of your primary server, is always in recovery mode and cannot be accessed for write commands until it is promoted. But this server can be utilized for read queries

11) WAL Archive Replication: A replication process where the WAL segments which have been archived are copied and replayed on the standby server

12) Streaming Replication: A replication process where the WAL segments are copied directly to the standby servers (standby server need not wait for the WAL files to be archived)

So folks, these are some of the terms that should give you a better insight into the concepts. And if you already have a recovery strategy, then hopefully these terms have helped you understand the process better.
I shall be back with a test which does fail-over, switchover (after a fail-over) and then switchback.

PS: Any comments or corrections about the terms and their definition/explanation used here, are more than welcome.