Blog

SQL : Using Joins To Insert, Update and Delete Data

04 Dec 2009 | 0 Comments | Permalink

For the uninitiated using JOINs in SQL to perform anything other than SELECTs probably seems like some kind of scary voodoo but the reasoning behind the ability is the same: It allows you to filter the data you are working with. This article walks through a made up scenario (similar to a real life one i've been tasked with, no less!) and shows how all three modification types are done with simple JOINS. This is a fairly long article so make sure you have a fresh coffee to hand before you dive in!

The Scenario

Let's say your employer is a widget manufacturer and they source the screws, nuts, bolts and washers from various suppliers. In fact they might get identical parts from different suppliers depending on availability. However their purchase history database was never set up to allow this and the only way of discriminating where a part came from is by looking at it's part number within the order. Let's start with three tables, a Parts table an Orders table and an OrderItems link table:

PartID PartName
------ --------
1      Screw
2      Nut
3      Bolt

OrderID OrderDate
------- ----------
1       2009-12-01
2       2009-12-02
3       2009-12-03

ItemID OrderID PartID PartNo
------ ------- ------ -------
1      1       1      BOB1234
2      1       2      BOB5678
3      2       1      898-122
4      2       2      820-111
5      2       3      133-022
6      3       2      22/0010
7      3       3      12/0666

This setup worked fine for a while, but performing any kind of analysis was a night on impossible task, with hardcoded LIKE statments all over the place. once the Items table had reached a million rows and another couple of suppliers were coming on board - meaning recoding all the views and queries, it was apparent something had to change. We had to create a Suppliers table, give each a unique CHAR(5) ID and then split the Parts table up so that each supplier had their own entry for each part they supplied.

SupplierID SupplierName     Pattern BasePartID
---------- ---------------- ------- ----------
BOBPT      Bob's Parts      BOB%             0
JIMMY      Jimmy's Emporium ___-___       1000
DIRCT      Direct Fixings   __/____       2000

At this point was decided to denormalise the Parts table - rather than have a seperate link table between parts and suppliers - and have a different row for each part and supplier combo, each with a different PartID, (sigh). Anyway, the Supplier Pattern column allows us to match part numbers in the Items table to each supplier using a LIKE and the BasePartID is used to split the PartIDs up. Both of these are only used during the spilt and in real life were manually stuffed into a secondary temp table.

The Procedure

First we need to get an authoratitive list of PartIDs along with the values to convert them to, we're going to precomupte these and store them in a temp table, rather than include this particular JOIN in all the later queries.

  CREATE TABLE #Map
  (
    OldPartID INTEGER,
    NewPartID INTEGER,
    SupplierID CHAR(5),
    PartNo VARCHAR(8)
  )
  
  INSERT INTO #Map( OldPartID, NewPartID, SupplierID, PartNo )
  SELECT DISTINCT 
    ITM.PartID, ITM.PartID + SUP.BasePartID, SUP.SupplierID, ITM.PartNo
    FROM OrderItems AS ITM
      INNER JOIN Suppliers AS SUP
        ON ITM.PartNo LIKE SUP.Pattern

Notice the tasty JOIN...LIKE, this results in a table that matches each Part to each supplier, along with a new PartID. SELECT * FROM #Map yields:

OldPartID NewPartID SupplierID PartNo
--------- --------- ---------- --------
1            1      BOBPT      BOB1234
2            2      BOBPT      BOB5678
1         1001      JIMMY      898-122
2         1002      JIMMY      820-111
3         1003      JIMMY      133-022
2         2002      DIRCT      22/0010
3         2003      DIRCT      12/0666

UPDATE with JOIN

Now we have a mapping we can go ahead and re-set all the PartIDs in the OrderItems table, we JOIN it to our #Map table and pick out the NewPartIDs to update each to.

  UPDATE ITM
    SET PartID = MAP.NewPartID
    FROM
      OrderItems AS ITM
        INNER JOIN #Map AS MAP
          ON ITM.PartNo = MAP.PartNo

There, that was easy wasnt it. The key part of the UPDATE...JOIN - and this is the same for the DELETE...JOIN - is that the table being modified is referenced not by name but by the alias as specified in the join, here it's my 3-char alias ITM. You need to do it this way because you can occasionally find yourself updating a table using a join back to itself!

As I stated earlier, you could skip the temp table and do this directly, but as we use the same mapping for each of the next set of operations the temp table really does make life easier. Though this operation could easily have been done without it:

  UPDATE ITM
    SET PartID = PartID + SUP.BasePartID
    FROM
      OrderItems AS ITM
        INNER JOIN Suppliers AS SUP
          ON ITM.PartNo LIKE SUP.Pattern

My only caveat here is that i get nervous when doing UPDATEs with direct arithmetic in, I like to write scripts (Transaction or no) where pressing the button to execute more than once will not royally bodge things up, a direct UPDATE to a particular value cures that and also helps when you're doing large operations in batches. This is the new OrderItems table:

ItemID OrderID PartID PartNo
------ ------- ------ -------
1      1       1      BOB1234
2      1       2      BOB5678
3      2       1001   898-122
4      2       1002   820-111
5      2       1003   133-022
6      3       2002   22/0010
7      3       2003   12/0666

DELETE with JOIN

So now we can map each ordered part to each supplier, first we retool the Parts table to change all the id's around, this is an INSERT...JOIN. We'd need to disable the PartID autoincrement during this, our solution was to SELECT INTO a new table, INSERT into that and then use sp_rename to replace the old table with the modified one. Here's the INSERT:

  ALTER TABLE Parts
    ADD SupplierID CHAR(5)
    
  INSERT INTO Parts( PartID , SupplierID , PartName )
    SELECT MAP.NewPartID , MAP.SupplierID , PRT.PartName
    FROM Parts AS PRT
      INNER JOIN #Map AS MAP
        ON PRT.PartID = MAP.OldPartID
    WHERE MAP.NewPartID > 1000

INSERT has no native JOIN syntax but the above is close enough, it creates new entries for all the parts each supplier stocks - or at least, has supplied to us in the history of the database. The > 1000 check is because we are not remapping BOB's parts, in real life there were so many BOB rows (around 75% of the total) that the procedure killed the server and it was decided to leave BOB's IDs intact. What this means, however is that there are still entries in the parts table with no SupplierID and these need updating to belong to BOB. That's a simple UPDATE...WHERE SupplierID=NULL, first though we delete all the parts that BOB does not supply:

  DELETE FROM PRT
  FROM Parts AS PRT
    LEFT OUTER JOIN #Map AS MAP
      ON PRT.PartID = MAP.OldPartID
      AND MAP.SupplierID='BOBPT'
    WHERE MAP.NewPartID IS NULL
      AND PRT.PartID < 1000      
      
  UPDATE Parts SET SupplierID='BOBPT' WHERE SupplierID IS NULL

You could probably achieve the same with a NOT IN() but the outer join is much more fun. Like UPDATE...JOIN you reference the alias in the DELETE FROM section, then there's another FROM to signal the start of the JOIN. The LEFT OUTER preserves entries in Parts (on the LEFT of the join) that are not in #Map (on the RIGHT of the join). The NULL check is the final filter of what to delete. Finally we need the PRT.PartID < 1000 check so that we dont DELETE all our newly created parts for JIMMY and DIRCT. To break this down we can do a simple SELECT with the same JOIN:

  SELECT * FROM Parts AS PRT
    LEFT OUTER JOIN #Map AS MAP
      ON PRT.PartID = MAP.OldPartID
      AND MAP.SupplierID='BOBPT'
    WHERE PRT.PartID < 1000  

Returns:

PartID SupplierID PartName OldPartID NewPartID SupplierID PartNo
------ ---------- -------- --------- --------- ---------- -------
1      NULL       Screw    1         1         BOBPT      BOB1234
2      NULL       Nut      1         2         BOBPT      BOB5678
3      NULL       Bolt     NULL      NULL      NULL       NULL

So we can see that filtering these WHERE NewPartID IS NULL will simply return the entry for Bolt. Here is the final Parts table:

PartID SupplierID PartName
------ ---------- --------
1      BOBPT      Screw
2      BOBPT      Nut
1001   JIMMY      Screw
1002   JIMMY      Nut
1003   JIMMY      Bolt
2002   DIRCT      Nut
2003   DIRCT      Bolt

At this point you'd add a couple of CHECK constraints to make sure that each Orders and Parts row contains only valid SupplierIDs and PartIDs. You could also - making the likely assumption that an order only ever involves one supplier at a time - UPDATE the Orders table to contain a SupplierID but we'll leave that as an excersise for the reader :-)

Recent Posts