Blog
SQL : Using Joins To Insert, Update and Delete Data
04 Dec 2009 | 0 Comments | Permalink
For the uninitiated using JOINs in SQL to perform anything other than SELECTs probably seems like some kind of scary voodoo but the reasoning behind the ability is the same: It allows you to filter the data you are working with. This article walks through a made up scenario (similar to a real life one i've been tasked with, no less!) and shows how all three modification types are done with simple JOINS. This is a fairly long article so make sure you have a fresh coffee to hand before you dive in!
The Scenario
Let's say your employer is a widget manufacturer and they source the screws, nuts, bolts and washers from various suppliers. In fact they might get identical parts from different suppliers depending on availability. However their purchase history database was never set up to allow this and the only way of discriminating where a part came from is by looking at it's part number within the order. Let's start with three tables, a Parts table an Orders table and an OrderItems link table:
PartID PartName ------ -------- 1 Screw 2 Nut 3 Bolt OrderID OrderDate ------- ---------- 1 2009-12-01 2 2009-12-02 3 2009-12-03 ItemID OrderID PartID PartNo ------ ------- ------ ------- 1 1 1 BOB1234 2 1 2 BOB5678 3 2 1 898-122 4 2 2 820-111 5 2 3 133-022 6 3 2 22/0010 7 3 3 12/0666
This setup worked fine for a while, but performing any kind of analysis was a night on impossible task, with hardcoded LIKE statments all over the place. once the Items table had reached a million rows and another couple of suppliers were coming on board - meaning recoding all the views and queries, it was apparent something had to change. We had to create a Suppliers table, give each a unique CHAR(5) ID and then split the Parts table up so that each supplier had their own entry for each part they supplied.
SupplierID SupplierName Pattern BasePartID ---------- ---------------- ------- ---------- BOBPT Bob's Parts BOB% 0 JIMMY Jimmy's Emporium ___-___ 1000 DIRCT Direct Fixings __/____ 2000
At this point was decided to denormalise the Parts table - rather than have a seperate link table between parts and suppliers - and have a different row for each part and supplier combo, each with a different PartID, (sigh). Anyway, the Supplier Pattern column allows us to match part numbers in the Items table to each supplier using a LIKE and the BasePartID is used to split the PartIDs up. Both of these are only used during the spilt and in real life were manually stuffed into a secondary temp table.
The Procedure
First we need to get an authoratitive list of PartIDs along with the values to convert them to, we're going to precomupte these and store them in a temp table, rather than include this particular JOIN in all the later queries.
CREATE TABLE #Map
(
OldPartID INTEGER,
NewPartID INTEGER,
SupplierID CHAR(5),
PartNo VARCHAR(8)
)
INSERT INTO #Map( OldPartID, NewPartID, SupplierID, PartNo )
SELECT DISTINCT
ITM.PartID, ITM.PartID + SUP.BasePartID, SUP.SupplierID, ITM.PartNo
FROM OrderItems AS ITM
INNER JOIN Suppliers AS SUP
ON ITM.PartNo LIKE SUP.Pattern
Notice the tasty JOIN...LIKE, this results in a table that matches each Part to each supplier, along with a new PartID. SELECT * FROM #Map yields:
OldPartID NewPartID SupplierID PartNo --------- --------- ---------- -------- 1 1 BOBPT BOB1234 2 2 BOBPT BOB5678 1 1001 JIMMY 898-122 2 1002 JIMMY 820-111 3 1003 JIMMY 133-022 2 2002 DIRCT 22/0010 3 2003 DIRCT 12/0666
UPDATE with JOIN
Now we have a mapping we can go ahead and re-set all the PartIDs in the OrderItems table, we JOIN it to our #Map table and pick out the NewPartIDs to update each to.
UPDATE ITM
SET PartID = MAP.NewPartID
FROM
OrderItems AS ITM
INNER JOIN #Map AS MAP
ON ITM.PartNo = MAP.PartNo
There, that was easy wasnt it. The key part of the UPDATE...JOIN - and this is the same for the DELETE...JOIN - is that the table being modified is referenced not by name but by the alias as specified in the join, here it's my 3-char alias ITM. You need to do it this way because you can occasionally find yourself updating a table using a join back to itself!
As I stated earlier, you could skip the temp table and do this directly, but as we use the same mapping for each of the next set of operations the temp table really does make life easier. Though this operation could easily have been done without it:
UPDATE ITM
SET PartID = PartID + SUP.BasePartID
FROM
OrderItems AS ITM
INNER JOIN Suppliers AS SUP
ON ITM.PartNo LIKE SUP.Pattern
My only caveat here is that i get nervous when doing UPDATEs with direct arithmetic in, I like to write scripts (Transaction or no) where pressing the button to execute more than once will not royally bodge things up, a direct UPDATE to a particular value cures that and also helps when you're doing large operations in batches. This is the new OrderItems table:
ItemID OrderID PartID PartNo ------ ------- ------ ------- 1 1 1 BOB1234 2 1 2 BOB5678 3 2 1001 898-122 4 2 1002 820-111 5 2 1003 133-022 6 3 2002 22/0010 7 3 2003 12/0666
DELETE with JOIN
So now we can map each ordered part to each supplier, first we retool the Parts table to change all the id's around, this is an INSERT...JOIN. We'd need to disable the PartID autoincrement during this, our solution was to SELECT INTO a new table, INSERT into that and then use sp_rename to replace the old table with the modified one. Here's the INSERT:
ALTER TABLE Parts
ADD SupplierID CHAR(5)
INSERT INTO Parts( PartID , SupplierID , PartName )
SELECT MAP.NewPartID , MAP.SupplierID , PRT.PartName
FROM Parts AS PRT
INNER JOIN #Map AS MAP
ON PRT.PartID = MAP.OldPartID
WHERE MAP.NewPartID > 1000
INSERT has no native JOIN syntax but the above is close enough, it creates new entries for all the parts each supplier stocks - or at least, has supplied to us in the history of the database. The > 1000 check is because we are not remapping BOB's parts, in real life there were so many BOB rows (around 75% of the total) that the procedure killed the server and it was decided to leave BOB's IDs intact. What this means, however is that there are still entries in the parts table with no SupplierID and these need updating to belong to BOB. That's a simple UPDATE...WHERE SupplierID=NULL, first though we delete all the parts that BOB does not supply:
DELETE FROM PRT
FROM Parts AS PRT
LEFT OUTER JOIN #Map AS MAP
ON PRT.PartID = MAP.OldPartID
AND MAP.SupplierID='BOBPT'
WHERE MAP.NewPartID IS NULL
AND PRT.PartID < 1000
UPDATE Parts SET SupplierID='BOBPT' WHERE SupplierID IS NULL
You could probably achieve the same with a NOT IN() but the outer join is much more fun. Like UPDATE...JOIN you reference the alias in the DELETE FROM section, then there's another FROM to signal the start of the JOIN. The LEFT OUTER preserves entries in Parts (on the LEFT of the join) that are not in #Map (on the RIGHT of the join). The NULL check is the final filter of what to delete. Finally we need the PRT.PartID < 1000 check so that we dont DELETE all our newly created parts for JIMMY and DIRCT. To break this down we can do a simple SELECT with the same JOIN:
SELECT * FROM Parts AS PRT
LEFT OUTER JOIN #Map AS MAP
ON PRT.PartID = MAP.OldPartID
AND MAP.SupplierID='BOBPT'
WHERE PRT.PartID < 1000
Returns:
PartID SupplierID PartName OldPartID NewPartID SupplierID PartNo ------ ---------- -------- --------- --------- ---------- ------- 1 NULL Screw 1 1 BOBPT BOB1234 2 NULL Nut 1 2 BOBPT BOB5678 3 NULL Bolt NULL NULL NULL NULL
So we can see that filtering these WHERE NewPartID IS NULL will simply return the entry for Bolt. Here is the final Parts table:
PartID SupplierID PartName ------ ---------- -------- 1 BOBPT Screw 2 BOBPT Nut 1001 JIMMY Screw 1002 JIMMY Nut 1003 JIMMY Bolt 2002 DIRCT Nut 2003 DIRCT Bolt
At this point you'd add a couple of CHECK constraints to make sure that each Orders and Parts row contains only valid SupplierIDs and PartIDs. You could also - making the likely assumption that an order only ever involves one supplier at a time - UPDATE the Orders table to contain a SupplierID but we'll leave that as an excersise for the reader :-)