I am taking a deviation from the series of posts I am doing for Recovery Concepts and Implementation. I have already posted two in that series. This one has to come, cause I was recently discussing with some developers and they were having tough time dealing with updates to data based on/using data from some other source table. I realized, people generally do not know all the capabilities of an Update statement. The form they know is:
Because they are not aware of the much more flexible syntax they end up exercising all the flexibility they get with expression and boolean expression. We will see an example where a complicated expression is used to achieve something which is fairly simple. Before I rush you into any technical details, let me explain the table structures to be used in this example.
Now suppose I want to update all those rows, in update_target which has a matching ID (column_id) in update_source. For this simple table (where I have to update only one column) the statement may be as simple as below.
This could be a bit more complex, with multiple select statements (one for each column to be updated). Now let's see what the plan looks like for this simple case:
Keep in mind that the complexity of the plan would had been double if there were two columns to be updated. So let me add an additional column here:
Do you notice an additional Subplan in the plan below?
That's a difference of 22%. So let's fire the query and see its effect.
But what has happened to those rows in update_target which were not able to find a match in update_source? Well those would have got themselves updated to null. Let's confirm
Actually the above UPDATE statement is supposed to update everything at update_target by fetching data from update_source. If something is not found then it is unknown i.e. NULL. So we need to restrict the rows which needs to be updated, with a filter which says "update only those rows for which a match exists in update_source". So the query will look something like this:
Let's see how this one performs.
The plan would be bit simpler if we had to update just one column.
The difference in response time is of around 12.5%, just because of the additional column.Let's check if this update goes as per our expectation.Very few people might know that in PostgreSQL the actual syntax for Update command is as below:The highlighted part of this syntax is partcularly important. It helps you update records beased on join. So for our requirement a much better and cleaner code is as below:update update_target t1 set columne_to_update=t2.source_for_update , t1.column_2_for_update=t2.column_2_source from update_source t2 where t1.column_id=t2.column_id; or update update_target t1 set (columne_to_update,t1.column_2_for_update)=(t2.source_for_update ,t2.column_2_source) from update_source t2 where t1.column_id=t2.column_id;The plan for above option isLet's see how much of an overhead was caused cause of multiple columns. We test it by generating a plan for update of only one column.Surprisingly there is hardly any difference in response time and cost as well has not sufferred much. So I can be assured that even if I have to update a third column, the plan will remain more or less the same.There are some RDBMS implementations which has a dedicated statement (MERGE) for achieving UPSERT (Update/Insert based on some criteria), via that they support something similar to what I have explained here.Now, I know there will be people talking and commenting about lack of UPSERT in PostgreSQL. Well, I don't quite agree to that. Do you know you can return the details from an UPDATE query and PostgreSQL v9.1 (and onwards) allows you to capitalize that returned in data to insert/delete records. So, it makes it a bit more powerful than the MERGE query. Below is an example.If you remember there were three rows which did not exist at update_target but were present in update_source (if not then just scroll up)-- Using RETURNING to achieve an UPSERT/MERGEI hope now before writing a complex subquery to update your data, you will first try to explore the possibility of using FROM clause. Well I will expect happy users of PostgreSQL to stop complaining about absence of UPSERT feature.PS: The flexibility explained above is not just limited to UPDATE, it is also available to you when you are doing the deletion. There is a USING clause in DELETE statement.