About Me

My photo
I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services). Working on IBM DB2, Oracle Microsoft SQL Server. Selected as IBM Champion for Data Management 2011. I like bit of photography, cycling, poetry and I just love gadgets!!

Wednesday, May 29, 2013

Easy way to Attach a partition in EnterpriseDB 9.2

For some people, partitioning is always a pain when using PostgreSQL. You have to create child tables, constraints and then add triggers/rules for re-directing the data. Maintenance is an even painful task. Imagine if you have to add a new partition. You will end up following below tasks:

1) Add new child table and constraint
2) Modify/Add trigger functions or rules for re-directing proper data in this new partition

Well EnterpriseDB offers an easy way to ADD a partition.

Now you can insert data in the new partition emp_president.

But many a time you may want to attach a partition from an existing table. This is particularly important when you are doing some maintenance activities on the underlying tables or during migration when you have loaded the child tables using bulk loader script and now want to attach them to the main table.

Suppose I have a table called "employees_emp_president_temp" and I want to attach the table as a new partition to employees, given that the structure is same.



So the purpose is to add a new partition called "emp_president" and to add data from "employees_emp_president_temp" to this new partition.



A conventional way is to add a new partition "emp_president" and then insert data from "employees_emp_president_temp".
insert into employees_emp_president select * from employees_emp_president_temp;
But will that be the best way to do it for huge no of records?

Let's use a "magic" function to achieve the task:

Completes in 2 seconds. Wow!!! 
All I have to do is, pass the name of my main table (to which I want to attach a new partition, the name of new partition, the domain of values for the new partition either as VALUES clause of RANGE clause and then name of the table whose data needs to be migrated to the new partition.

Let's see what is the effect of this.

Well we see that there is a new child table. Let's see the structure (I am particularly interested in indexes).

So the table structure of our new partition is same as the old table plus it holds the indexes as well.

Now let's confirm if it copied the data (well 2 seconds was too fast for copying the data wasn't it?).

Now let's just test how much time an insert into..select * from would have taken:

Now you know what you will be doing the next time you need to attach a partition.

I forgot about attach_table_to_partition. It's not an inbuilt function. I wrote it using some features available in EnterpriseDB's PPAS. Here you go:


Monday, May 20, 2013

Using COPY in your JDBC code


COPY operation is a very useful command when using PostgreSQL. It helps you load the data efficiently into your database. But many a times the data needs to be inserted programaticallye.g. When developing an ETL tool, you need to do a bulk load of data in your database after some modification/cleansing/transformation. 

Below is a sample program which can help you load data using the COPY command.

_________________________________________________________________

// Create the table test before running the file: 
//create table public.test(col1 int , col2 int);
//Compile using javac -classpath ./edb-jdbc14.jar pgCopy.java
// Run using java -cp ./edb-jdbc14.jar;. pgCopy

import com.edb.*;
import java.io.FileReader;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;
public class pgCopy
{
    public static void main(String[] args)
    {
            Connection con = null;
            PreparedStatement pst = null;
/* These are the insert values
* You can also build them from a list or
* you can accept the entries programatically 
* and then separate them with the delimeter 
* and build a byte array
* | is used a delimiter between two fields
* you can use your own here and replace the same in copyIn call
*/
            byte[
            insert_values="1|10\n2|20\n3|30\n4|40".getBytes();

//These are my connection parameters
            String url = "jdbc:edb://SAMEER:5444/edb";
            String user = "enterprisedb";
            String password = "ashnik";
            CopyIn cpIN=null;
            String driver="com.edb.Driver";
        try{
            Class.forName(driver);
            con = DriverManager.getConnection(
                         url, user, password);

            CopyManager cm = new CopyManager(
                               (BaseConnection) con);
/*Copy command
* Replace public.test(col1, col2) with you table Name and
* replace | with the delimeter of you choice. 
* It should be same as the delimeter used in defining 
* the variable byte[] insert_values
*/
          cpIN= 
            cm.copyIn(
     "COPY public.test(col1, col2) FROM STDIN WITH DELIMITER '|'"
      );
                
          cpIN.writeToCopy(insert_values,0,insert_values.length);
          cpIN.endCopy();
          System.out.println("Below Values are inserted");
          System.out.println(new String(insert_values));
        }
        catch (SQLException ex)
        {
            Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
            lgr.log(Level.SEVERE, ex.getMessage(), ex);
        }
        catch (ClassNotFoundException ex)
        {
            Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
            lgr.log(Level.SEVERE, ex.getMessage(), ex);
        }
        finally
        {
                try
                {
                    if (con!=null)
                        con.close();
                }
                catch (SQLException ex)
                {
                    Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
                    lgr.log(Level.SEVERE, ex.getMessage(), ex);
                }
        }
    }
}
_________________________________________________________________


Let's understand what this code does.

Firstly it opens a new connection and uses it to create a new copy manager, which is similar to creating a new prepared statement object.
con = DriverManager.getConnection(url, user, password);
CopyManager cm = new CopyManager((BaseConnection) con);
Now once we have created a copy manager, we need to use it to start a copy operation:

cpIN= cm.copyIn("COPY public.test(col1, col2) FROM STDIN WITH DELIMITER '|'");

This will create a new copy stream and to put it in simple terms, will return a handler for inputs.

Now we need to provide data to the copy operation (rather to the handler which has been returned).
cpIN.writeToCopy(insert_values,0,insert_values.length);
This will write the data from variable (of byte array) insert values. To end the copy operation we need to say:
cpIN.endCopy();
If you are storing your data in a text/csv file, instead of using a byte variable you can also use a file handler 
cm.copyIn("COPY public.test(col1, col2) 
FROM STDIN WITH DELIMITER '|'",fileRHandle);

Where fileRHandle is a varaible declared for FileReader.
fileRHandle=new FileReader("sample.txt");

You must have noticed that in this format of copyIn the return type is void. There is a similar copyOut method under copyManager which can help you send the data from a table to a file/variable.

fileWHandle=new FileWriter("sample.txt");
 cm.copyOut
("COPY public.test to STDOUT WITH DELIMITER '|'",fileWHandle);
For further exploring the copy jdbc extension refer to online Postgres documentation:
http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html

The driver used by me was enterprisedb driver for PPAS, in case you want to use Postgres driver, you may have to change the JDBC url and JDBC class and compile it with proper jar file.

I hope to soon post some more statistics on improvements one can achieve with Copy instead of Batch mode jdbc insert.