MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

12.22 Identifying and Removing Unattached Records

12.22.1 Problem

You have tables that are related (for example, they have a master-detail relationship). But you suspect that some of the records are unattached and can be removed.

12.22.2 Solution

Use a LEFT JOIN to identify unmatched values and delete them by adapting the techniques shown in Recipe 12.21. Or use a table-replacement procedure that selects the matched records into a new table and replaces the original table with it.

12.22.3 Discussion

The previous section shows how to delete related records from multiple tables at once, using the relationship that exists between the tables. Sometimes the opposite problem presents itself, where you want to delete records based on the lack of relationship. Problems of this kind typically occur when you have tables that are supposed to match up, but some of the records are unattached—that is, they are unmatched by any corresponding record in the other table.

This can occur by accident, such as when you delete a parent record but forget to delete the associated child records, or vice versa. It can also occur as an anticipated consequence of a deliberate action. Suppose an online discussion board uses a parent table that lists discussion topics and a child table that records the articles posted for each topic. If you purge the child table of old article records, that may result in any given topic record in the parent table no longer having any children. If so, the lack of recent postings for the topic indicates that it is probably dead and that the parent record in the topic table can be deleted, too. In such a situation, you delete a set of child records with the explicit recognition that the operation may strand parent records and cause them to become eligible for being deleted as well.

However you arrive at the point where related tables have unmatched records, restoring the tables to a consistent state is a matter of identifying the unattached records and then deleting them:

  • To identify the unattached records, use a LEFT JOIN, because this is a "find unmatched records" problem. (See Recipe 12.6 for information about LEFT JOIN.)

  • To delete the records that have the unmatched IDs, use techniques similar to those shown in Recipe 12.21, for removing records from multiple related tables.

The examples here use the swdist_head and swdist_item software distribution tables that were used in Recipe 12.21. Create the tables in their initial state using the swdist_create.sql script in the joins directory of the recipes distribution. They'll look like this:

mysql> SELECT * FROM swdist_head;
+---------+------------+---------+------------+
| dist_id | name       | ver_num | rel_date   |
+---------+------------+---------+------------+
|       1 | DB Gadgets |    1.59 | 1996-03-25 |
|       2 | NetGizmo   |    3.02 | 1998-11-10 |
|       3 | DB Gadgets |    1.60 | 1998-12-26 |
|       4 | DB Gadgets |    1.61 | 1998-12-28 |
|       5 | NetGizmo   |    4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item;
+---------+----------------+
| dist_id | dist_file      |
+---------+----------------+
|       1 | README         |
|       1 | db-gadgets.sh  |
|       3 | README         |
|       3 | README.linux   |
|       3 | db-gadgets.sh  |
|       4 | README         |
|       4 | README.linux   |
|       4 | README.solaris |
|       4 | db-gadgets.sh  |
|       2 | README.txt     |
|       2 | NetGizmo.exe   |
|       5 | README.txt     |
|       5 | NetGizmo.exe   |
+---------+----------------+

The records in the tables are fully matched at this point: For every dist_id value in the parent table, there is at least one child record, and each child record has a parent. To "damage" the integrity of this relationship for purposes of illustration, remove a few records from each table:

mysql> DELETE FROM swdist_head WHERE dist_id IN (1,4);
mysql> DELETE FROM swdist_item WHERE dist_id IN (2,5);

The result is that there are unattached records in both tables:

mysql> SELECT * FROM swdist_head;
+---------+------------+---------+------------+
| dist_id | name       | ver_num | rel_date   |
+---------+------------+---------+------------+
|       2 | NetGizmo   |    3.02 | 1998-11-10 |
|       3 | DB Gadgets |    1.60 | 1998-12-26 |
|       5 | NetGizmo   |    4.00 | 2001-08-04 |
+---------+------------+---------+------------+
mysql> SELECT * FROM swdist_item;
+---------+----------------+
| dist_id | dist_file      |
+---------+----------------+
|       1 | README         |
|       1 | db-gadgets.sh  |
|       3 | README         |
|       3 | README.linux   |
|       3 | db-gadgets.sh  |
|       4 | README         |
|       4 | README.linux   |
|       4 | README.solaris |
|       4 | db-gadgets.sh  |
+---------+----------------+

A little inspection reveals that only distribution 3 has records in both tables. Distributions 2 and 5 in the swdist_head table are unmatched by any records in the swdist_item table. Conversely, distributions 1 and 4 in the swdist_item table are unmatched by any records in the swdist_head table.

The problem now is to identify the unattached records (by some means other than visual inspection), and then remove them. Identification is a matter of using a LEFT JOIN. For example, to find childless parent records in the swdist_head table, use the following query:

mysql> SELECT swdist_head.dist_id AS 'unmatched swdist_head IDs'
    -> FROM swdist_head LEFT JOIN swdist_item
    -> ON swdist_head.dist_id = swdist_item.dist_id
    -> WHERE swdist_item.dist_id IS NULL;
+---------------------------+
| unmatched swdist_head IDs |
+---------------------------+
|                         2 |
|                         5 |
+---------------------------+

Conversely, to find the IDs for orphaned children in the swdist_item table that have no parent, reverse the roles of the two tables:

mysql> SELECT swdist_item.dist_id AS 'unmatched swdist_item IDs'
    -> FROM swdist_item LEFT JOIN swdist_head
    -> ON swdist_item.dist_id = swdist_head.dist_id
    -> WHERE swdist_head.dist_id IS NULL;
+---------------------------+
| unmatched swdist_item IDs |
+---------------------------+
|                         1 |
|                         1 |
|                         4 |
|                         4 |
|                         4 |
|                         4 |
+---------------------------+

Note that in this case, an ID will appear more than once in the list if there are multiple children for a missing parent. Depending on how you choose to delete the unmatched records, you may want to use DISTINCT to select each unmatched child ID only once:

mysql> SELECT DISTINCT swdist_item.dist_id AS 'unmatched swdist_item IDs'
    -> FROM swdist_item LEFT JOIN swdist_head
    -> ON swdist_item.dist_id = swdist_head.dist_id
    -> WHERE swdist_head.dist_id IS NULL;
+---------------------------+
| unmatched swdist_item IDs |
+---------------------------+
|                         1 |
|                         4 |
+---------------------------+

After you identify the unattached records, the question becomes how to get rid of them. You can use either of the following techniques, which you'll recognize as similar to those discussed in Recipe 12.21:

  • Use the IDs in a multiple-table DELETE statement. You'll be removing records from just one table at a time, but the syntax for this form of DELETE is still useful because it allows you to identify the records to remove by means of a join between the related tables.

  • Run a program that selects the unmatched IDs and uses them to generate DELETE statements.

To use a multiple-table DELETE statement for removing unmatched records, just take the SELECT statement that you use to identify those records and replace the stuff leading up to the FROM keyword with DELETE tbl_name. For example, the SELECT that identifies childless parents looks like this:

SELECT swdist_head.dist_id AS 'unmatched swdist_head IDs'
FROM swdist_head LEFT JOIN swdist_item
    ON swdist_head.dist_id = swdist_item.dist_id
WHERE swdist_item.dist_id IS NULL;

The corresponding DELETE looks like this:

DELETE swdist_head
FROM swdist_head LEFT JOIN swdist_item
    ON swdist_head.dist_id = swdist_item.dist_id
WHERE swdist_item.dist_id IS NULL;

Conversely, the query to identify parentless children is as follows:

SELECT swdist_item.dist_id AS 'unmatched swdist_item IDs'
FROM swdist_item LEFT JOIN swdist_head
    ON swdist_item.dist_id = swdist_head.dist_id
WHERE swdist_head.dist_id IS NULL;

And the corresponding DELETE statement removes them:

DELETE swdist_item
FROM swdist_item LEFT JOIN swdist_head
    ON swdist_item.dist_id = swdist_head.dist_id
WHERE swdist_head.dist_id IS NULL;

To remove unmatched records by writing a program, select the ID list and turn it into a set of DELETE statements. Here's a Perl program that does so, first for the parent table and then for the child table:

#! /usr/bin/perl -w
use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

my $dbh = Cookbook::connect ( );

# Identify the IDs of childless parent records

my $ref = $dbh->selectcol_arrayref (
            "SELECT swdist_head.dist_id
            FROM swdist_head LEFT JOIN swdist_item
                ON swdist_head.dist_id = swdist_item.dist_id
            WHERE swdist_item.dist_id IS NULL");

# selectcol_arrayref( ) returns a reference to a list.  Convert the reference
# to a list, which will be empty if $ref is undef or points to an empty list.

my @val = ($ref ? @{$ref} : ( ));

# Use the ID list to delete records for all IDs at once.  If the list
# is empty, don't bother; there's nothing to delete.

if (@val)
{
    # generate list of comma-separated "?" placeholders, one per value
    my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
    $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
}

# Repeat the procedure for the child table.  Use SELECT DISTINCT so that
# each ID is selected only once.

$ref = $dbh->selectcol_arrayref (
            "SELECT DISTINCT swdist_item.dist_id
            FROM swdist_item LEFT JOIN swdist_head
                ON swdist_item.dist_id = swdist_head.dist_id
            WHERE swdist_head.dist_id IS NULL");

@val = ($ref ? @{$ref} : ( ));

if (@val)
{
    # generate list of comma-separated "?" placeholders, one per value
    my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
    $dbh->do ("DELETE FROM swdist_item $where", undef, @val);
}

$dbh->disconnect ( );

exit (0);

The program uses IN( ) to delete all the affected records in a given table at once. See Recipe 12.21 for other related approaches.

You can also use mysql to generate the DELETE statements; a script that shows how to do this can be found in the joins directory of the recipes distribution.

A different type of solution to the problem is to use a table-replacement procedure. This method comes at the problem in reverse. Instead of finding and removing unmatched records, find and keep matched records. For example, you can use a join to select matched records into a new table. Then replace the original table with it. Unattached records don't get carried along by the join, and so in effect are removed when the new table replaces the original one.

The table replacement procedure works as follows. For the swdist_head table, create a new table with the same structure:

CREATE TABLE tmp
(
    dist_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,   # distribution ID
    name        VARCHAR(40),                            # distribution name
    ver_num     NUMERIC(5,2),                           # version number
    rel_date    DATE NOT NULL,                          # release date
    PRIMARY KEY (dist_id)
);

Then select into the tmp table those swdist_head records that have a match in the swdist_item table:

INSERT IGNORE INTO tmp
SELECT swdist_head.*
FROM swdist_head, swdist_item
WHERE swdist_head.dist_id = swdist_item.dist_id;

Note that the query uses INSERT IGNORE; a parent record may be matched by multiple child records, but we want only one instance of its ID. (The symptom of failing to use IGNORE is that the query will fail with a "duplicate key" error.)

Finish by replacing the original table with the new one:

DROP TABLE swdist_head;
ALTER TABLE tmp RENAME TO swdist_head;

The procedure for replacing the child table with a table containing only matched child records is similar, except that IGNORE is not needed—each child that is matched will be matched by only one parent:

CREATE TABLE tmp
(
    dist_id     INT UNSIGNED NOT NULL,  # parent distribution ID
    dist_file   VARCHAR(255) NOT NULL   # name of file in distribution
);

INSERT INTO tmp
SELECT swdist_item.*
FROM swdist_head, swdist_item
WHERE swdist_head.dist_id = swdist_item.dist_id;

DROP TABLE swdist_item;
ALTER TABLE tmp RENAME TO swdist_item;
    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    12.1 Introduction
    12.2 Combining Rows in One Table with Rows in Another
    12.3 Performing a Join Between Tables in Different Databases
    12.4 Referring to Join Output Column Names in Programs
    12.5 Finding Rows in One Table That Match Rows in Another
    12.6 Finding Rows with No Match in Another Table
    12.7 Finding Rows Containing Per-Group Minimum or Maximum Values
    12.8 Computing Team Standings
    12.9 Producing Master-Detail Lists and Summaries
    12.10 Using a Join to Fill in Holes in a List
    12.11 Enumerating a Many-to-Many Relationship
    12.12 Comparing a Table to Itself
    12.13 Calculating Differences Between Successive Rows
    12.14 Finding Cumulative Sums and Running Averages
    12.15 Using a Join to Control Query Output Order
    12.16 Converting Subselects to Join Operations
    12.17 Selecting Records in Parallel from Multiple Tables
    12.18 Inserting Records in One Table That Include Values from Another
    12.19 Updating One Table Based on Values in Another
    12.20 Using a Join to Create a Lookup Table from Descriptive Labels
    12.21 Deleting Related Rows in Multiple Tables
    12.22 Identifying and Removing Unattached Records
    12.23 Using Different MySQL Servers Simultaneously
    Chapter 13. Statistical Techniques
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele