About Me

My photo
I am currently working as a DB Consultant with Ashnik. We are into Open Source and Cloud based enterprise class solutions. Prior to this I was working as a DBA @ Misys India (Banking Services). Working on IBM DB2, Oracle Microsoft SQL Server. Selected as IBM Champion for Data Management 2011. I like bit of photography, cycling, poetry and I just love gadgets!!

Monday, May 20, 2013

Using COPY in your JDBC code


COPY operation is a very useful command when using PostgreSQL. It helps you load the data efficiently into your database. But many a times the data needs to be inserted programaticallye.g. When developing an ETL tool, you need to do a bulk load of data in your database after some modification/cleansing/transformation. 

Below is a sample program which can help you load data using the COPY command.

_________________________________________________________________

// Create the table test before running the file: 
//create table public.test(col1 int , col2 int);
//Compile using javac -classpath ./edb-jdbc14.jar pgCopy.java
// Run using java -cp ./edb-jdbc14.jar;. pgCopy

import com.edb.*;
import java.io.FileReader;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.logging.Level;
import java.util.logging.Logger;
public class pgCopy
{
    public static void main(String[] args)
    {
            Connection con = null;
            PreparedStatement pst = null;
/* These are the insert values
* You can also build them from a list or
* you can accept the entries programatically 
* and then separate them with the delimeter 
* and build a byte array
* | is used a delimiter between two fields
* you can use your own here and replace the same in copyIn call
*/
            byte[
            insert_values="1|10\n2|20\n3|30\n4|40".getBytes();

//These are my connection parameters
            String url = "jdbc:edb://SAMEER:5444/edb";
            String user = "enterprisedb";
            String password = "ashnik";
            CopyIn cpIN=null;
            String driver="com.edb.Driver";
        try{
            Class.forName(driver);
            con = DriverManager.getConnection(
                         url, user, password);

            CopyManager cm = new CopyManager(
                               (BaseConnection) con);
/*Copy command
* Replace public.test(col1, col2) with you table Name and
* replace | with the delimeter of you choice. 
* It should be same as the delimeter used in defining 
* the variable byte[] insert_values
*/
          cpIN= 
            cm.copyIn(
     "COPY public.test(col1, col2) FROM STDIN WITH DELIMITER '|'"
      );
                
          cpIN.writeToCopy(insert_values,0,insert_values.length);
          cpIN.endCopy();
          System.out.println("Below Values are inserted");
          System.out.println(new String(insert_values));
        }
        catch (SQLException ex)
        {
            Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
            lgr.log(Level.SEVERE, ex.getMessage(), ex);
        }
        catch (ClassNotFoundException ex)
        {
            Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
            lgr.log(Level.SEVERE, ex.getMessage(), ex);
        }
        finally
        {
                try
                {
                    if (con!=null)
                        con.close();
                }
                catch (SQLException ex)
                {
                    Logger lgr = 
                      Logger.getLogger(CopyFrom.class.getName());
                    lgr.log(Level.SEVERE, ex.getMessage(), ex);
                }
        }
    }
}
_________________________________________________________________


Let's understand what this code does.

Firstly it opens a new connection and uses it to create a new copy manager, which is similar to creating a new prepared statement object.
con = DriverManager.getConnection(url, user, password);
CopyManager cm = new CopyManager((BaseConnection) con);
Now once we have created a copy manager, we need to use it to start a copy operation:

cpIN= cm.copyIn("COPY public.test(col1, col2) FROM STDIN WITH DELIMITER '|'");

This will create a new copy stream and to put it in simple terms, will return a handler for inputs.

Now we need to provide data to the copy operation (rather to the handler which has been returned).
cpIN.writeToCopy(insert_values,0,insert_values.length);
This will write the data from variable (of byte array) insert values. To end the copy operation we need to say:
cpIN.endCopy();
If you are storing your data in a text/csv file, instead of using a byte variable you can also use a file handler 
cm.copyIn("COPY public.test(col1, col2) 
FROM STDIN WITH DELIMITER '|'",fileRHandle);

Where fileRHandle is a varaible declared for FileReader.
fileRHandle=new FileReader("sample.txt");

You must have noticed that in this format of copyIn the return type is void. There is a similar copyOut method under copyManager which can help you send the data from a table to a file/variable.

fileWHandle=new FileWriter("sample.txt");
 cm.copyOut
("COPY public.test to STDOUT WITH DELIMITER '|'",fileWHandle);
For further exploring the copy jdbc extension refer to online Postgres documentation:
http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html

The driver used by me was enterprisedb driver for PPAS, in case you want to use Postgres driver, you may have to change the JDBC url and JDBC class and compile it with proper jar file.

I hope to soon post some more statistics on improvements one can achieve with Copy instead of Batch mode jdbc insert.

1 comment:

  1. Have a look at https://github.com/eclecticlogic/pedal-dialect. It allows you to use the COPY command directly with JPA Entities and supports a number of thread-pool/data sources.

    ReplyDelete