MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

12.12 Comparing a Table to Itself

12.12.1 Problem

You want to compare records in a table to other records in the same table. For example, you want to find all paintings in your collection by the artist who painted "The Potato Eaters." Or you want to know which states listed in the states table joined the Union in the same year as New York. Or you want to know which of the people listed in the profile table have some favorite food in common.

12.12.2 Solution

Problems that require comparing a table to itself involve an operation known as a self-join. It's much like other joins, except that you must always use table aliases so that you can refer to the same table different ways within the query.

12.12.3 Discussion

A special case of joining one table to another occurs when both tables are the same. This is called a self-join. Although many people find the idea confusing or strange to think about at first, it's perfectly legal. Be assured that you'll get used to the concept, and more than likely will find yourself using self-joins quite often because they are so important.

A tip-off that you need a self-join is when you want to know which pairs of elements in a table satisfy some condition. For example, suppose your favorite painting is "The Potato Eaters" and you want to identify all the items in your collection that were done by the artist who painted it. You can do so as follows:

  1. Identify the row in the painting table that contains the title "The Potato Eaters," so that you can refer to its a_id value.

  2. Use the a_id value to match other rows in the table that have the same a_id value.

  3. Display the titles from those matching rows.

The artist ID and painting titles that we begin with look like this:

mysql> SELECT a_id, title FROM painting ORDER BY a_id;
+------+-------------------+
| a_id | title             |
+------+-------------------+
|    1 | The Last Supper   |
|    1 | The Mona Lisa     |
|    3 | Starry Night      |
|    3 | The Potato Eaters |
|    3 | The Rocks         |
|    5 | Les Deux Soeurs   |
+------+-------------------+

A two-step method for picking out the right titles without a join is to look up the artist's ID with one query, then use the ID in a second query that selects records that match it:

mysql> SELECT @id := a_id FROM painting WHERE title = 'The Potato Eaters';
+-------------+
| @id := a_id |
+-------------+
|           3 |
+-------------+
mysql> SELECT title FROM painting WHERE a_id = @id;
+-------------------+
| title             |
+-------------------+
| Starry Night      |
| The Potato Eaters |
| The Rocks         |
+-------------------+

Another solution—one that requires only a single query—is to use a self-join. The trick to this lies in figuring out the proper notation to use. The way many people first try to write a query that joins a table to itself looks something like this:

mysql> SELECT title FROM painting, painting
    -> WHERE title = 'The Potato Eaters' AND a_id = a_id;
ERROR 1066 at line 1: Not unique table/alias: 'painting'

The problem with that query is that the column references are ambiguous. MySQL can't tell which instance of the painting table any given column name refers to. The solution is to give at least one instance of the table an alias so that you can distinguish column references by using different table qualifiers. The following query shows how to do this, using the aliases p1 and p2 to refer to the painting table different ways:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters'
    -> AND p1.a_id = p2.a_id;
+-------------------+
| title             |
+-------------------+
| Starry Night      |
| The Potato Eaters |
| The Rocks         |
+-------------------+

The query output illustrates something typical of self-joins: when you begin with a reference value in one table instance ("The Potato Eaters") to find matching records in a second table instance (paintings by the same artist), the output includes the reference value. That makes sense—after all, the reference matches itself. If you want to find only other paintings by the same artist, explicitly exclude the reference value from the output:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters' AND p2.title != 'The Potato Eaters'
    -> AND p1.a_id = p2.a_id;
+--------------+
| title        |
+--------------+
| Starry Night |
| The Rocks    |
+--------------+

A more general way to exclude the reference value without naming it literally is to specify that you don't want output rows to have the same title as the reference, whatever that title happens to be:

mysql> SELECT p2.title
    -> FROM painting AS p1, painting AS p2
    -> WHERE p1.title = 'The Potato Eaters' AND p1.title != p2.title
    -> AND p1.a_id = p2.a_id;
+--------------+
| title        |
+--------------+
| Starry Night |
| The Rocks    |
+--------------+

The preceding queries use comparisons of ID values to match records in the two table instances, but any kind of value can be used. For example, to use the states table to answer the question "Which states joined the Union in the same year as New York?," perform a temporal pairwise comparison based on the year part of the dates in the table's statehood column:

mysql> SELECT s2.name, s2.statehood
    -> FROM states AS s1, states AS s2
    -> WHERE s1.name = 'New York'
    -> AND YEAR(s1.statehood) = YEAR(s2.statehood)
    -> ORDER BY s2.name;
+----------------+------------+
| name           | statehood  |
+----------------+------------+
| Connecticut    | 1788-01-09 |
| Georgia        | 1788-01-02 |
| Maryland       | 1788-04-28 |
| Massachusetts  | 1788-02-06 |
| New Hampshire  | 1788-06-21 |
| New York       | 1788-07-26 |
| South Carolina | 1788-05-23 |
| Virginia       | 1788-06-25 |
+----------------+------------+

Here again, the reference value (New York) appears in the output. If you want to prevent that, add an expression to the WHERE clause that explicitly excludes the reference:

mysql> SELECT s2.name, s2.statehood
    -> FROM states AS s1, states AS s2
    -> WHERE s1.name = 'New York' AND s1.name != s2.name
    -> AND YEAR(s1.statehood) = YEAR(s2.statehood)
    -> ORDER BY s2.name;
+----------------+------------+
| name           | statehood  |
+----------------+------------+
| Connecticut    | 1788-01-09 |
| Georgia        | 1788-01-02 |
| Maryland       | 1788-04-28 |
| Massachusetts  | 1788-02-06 |
| New Hampshire  | 1788-06-21 |
| South Carolina | 1788-05-23 |
| Virginia       | 1788-06-25 |
+----------------+------------+

Like the problem of finding other paintings by the painter of "The Potato Eaters," the statehood problem could have been solved by using a SQL variable and two queries. That will always be true when you're seeking matches for one particular row in your table. Other problems require finding matches between several pairs of rows, in which case the two-query method will not work. Suppose you want to determine which pairs of people listed in the profile table have favorite foods in common. In this case, the output potentially can include any pair of people in the table. There is no fixed reference value, so you cannot store the reference in a variable.

A self-join is perfect for this problem, although there is the question of how to identify which foods values share common elements. The foods column contains SET values, each of which may indicate multiple foods, so an exact comparison will not work:

  • The comparison is true only if both foods values name an identical set of foods; this is unsuitable if you require only a common element.

  • Two empty values will compare as equal, even though they have no foods in common.

To identify SET values that share common elements, use the fact that MySQL represents them as bit fields and perform the comparison using the & (bitwise AND) operator to look for pairs that have a non-zero intersection:

mysql> SELECT t1.name, t2.name, t1.foods, t2.foods
    -> FROM profile AS t1, profile AS t2
    -> WHERE t1.id != t2.id AND (t1.foods & t2.foods) != 0
    -> ORDER BY t1.name, t2.name;
+------+------+----------------------+----------------------+
| name | name | foods                | foods                |
+------+------+----------------------+----------------------+
| Alan | Brit | curry,fadge          | burrito,curry,pizza  |
| Alan | Fred | curry,fadge          | lutefisk,fadge,pizza |
| Alan | Mara | curry,fadge          | lutefisk,fadge       |
| Alan | Sean | curry,fadge          | burrito,curry        |
| Brit | Alan | burrito,curry,pizza  | curry,fadge          |
| Brit | Carl | burrito,curry,pizza  | eggroll,pizza        |
| Brit | Fred | burrito,curry,pizza  | lutefisk,fadge,pizza |
| Brit | Sean | burrito,curry,pizza  | burrito,curry        |
| Carl | Brit | eggroll,pizza        | burrito,curry,pizza  |
| Carl | Fred | eggroll,pizza        | lutefisk,fadge,pizza |
| Fred | Alan | lutefisk,fadge,pizza | curry,fadge          |
| Fred | Brit | lutefisk,fadge,pizza | burrito,curry,pizza  |
| Fred | Carl | lutefisk,fadge,pizza | eggroll,pizza        |
| Fred | Mara | lutefisk,fadge,pizza | lutefisk,fadge       |
| Mara | Alan | lutefisk,fadge       | curry,fadge          |
| Mara | Fred | lutefisk,fadge       | lutefisk,fadge,pizza |
| Sean | Alan | burrito,curry        | curry,fadge          |
| Sean | Brit | burrito,curry        | burrito,curry,pizza  |
+------+------+----------------------+----------------------+

Some self-join problems are of the "Which values have no match?" variety. An instance of this is the question, "Which message senders in the mail table didn't send any messages to themselves?" First, check who sent mail to who:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> ORDER BY srcuser, dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| barb    | barb    |
| barb    | tricia  |
| gene    | barb    |
| gene    | gene    |
| gene    | tricia  |
| phil    | barb    |
| phil    | phil    |
| phil    | tricia  |
| tricia  | gene    |
| tricia  | phil    |
+---------+---------+

Of those pairs, several are for people that did send mail to themselves:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> WHERE srcuser = dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+

Finding people who didn't send mail to themselves is a "non-match" problem, which is the type of problem that typically involves a LEFT JOIN. In this case, the solution requires a LEFT JOIN of the mail table to itself:

mysql> SELECT DISTINCT m1.srcuser
    -> FROM mail AS m1 LEFT JOIN mail AS m2
    -> ON m1.srcuser = m2.srcuser AND m2.srcuser = m2.dstuser
    -> WHERE m2.dstuser IS NULL;
+---------+
| srcuser |
+---------+
| tricia  |
+---------+

For each record in the mail table, the query selects matches where the sender and recipient are the same. For records having no such match, the LEFT JOIN forces the output to contain a row anyway, with all the m2 columns set to NULL. These rows identify the senders who sent no messages to themselves.

Using a LEFT JOIN to join a table to itself also provides another way to answer maximum-per-group questions of the sort discussed in Recipe 12.7, but without using a secondary temporary table. Recall that in that recipe we found the most expensive painting per artist as follows using a temporary table:

mysql> CREATE TABLE tmp
    -> SELECT a_id, MAX(price) AS max_price FROM painting GROUP BY a_id;
mysql> SELECT artist.name, painting.title, painting.price
    -> FROM artist, painting, tmp
    -> WHERE painting.a_id = tmp.a_id
    -> AND painting.price = tmp.max_price
    -> AND painting.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+

Another way to identify the paintings and then pull out values from each of those rows is with a LEFT JOIN. The following query identifies the paintings:

mysql> SELECT p1.a_id, p1.title, p1.price
    -> FROM painting p1
    -> LEFT JOIN painting p2
    -> ON p1.a_id = p2.a_id AND p1.price < p2.price
    -> WHERE p2.a_id IS NULL;
+------+-------------------+-------+
| a_id | title             | price |
+------+-------------------+-------+
|    1 | The Mona Lisa     |    87 |
|    3 | The Potato Eaters |    67 |
|    5 | Les Deux Soeurs   |    64 |
+------+-------------------+-------+

To display the artist names, join the result with the artist table:

mysql> SELECT artist.name, p1.title, p1.price
    -> FROM (painting p1
    -> LEFT JOIN painting p2
    -> ON p1.a_id = p2.a_id AND p1.price < p2.price), artist
    -> WHERE p2.a_id IS NULL AND p1.a_id = artist.a_id;
+----------+-------------------+-------+
| name     | title             | price |
+----------+-------------------+-------+
| Da Vinci | The Mona Lisa     |    87 |
| Van Gogh | The Potato Eaters |    67 |
| Renoir   | Les Deux Soeurs   |    64 |
+----------+-------------------+-------+

Note that a given "compare a table to itself" problem does not necessarily require a self-join, even if it's possible to solve it that way. The mail table serves to illustrate this. One way to determine which senders sent themselves a message is to use a self-join:

mysql> SELECT DISTINCT m1.srcuser, m2.dstuser
    -> FROM mail AS m1, mail AS m2
    -> WHERE m1.srcuser = m2.srcuser AND m2.dstuser = m1.srcuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+

But that's silly. The query doesn't need to compare records to each other. It needs only to compare different columns within each row, so a non-join query is sufficient, and simpler to write:

mysql> SELECT DISTINCT srcuser, dstuser FROM mail
    -> WHERE srcuser = dstuser;
+---------+---------+
| srcuser | dstuser |
+---------+---------+
| phil    | phil    |
| barb    | barb    |
| gene    | gene    |
+---------+---------+
    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    12.1 Introduction
    12.2 Combining Rows in One Table with Rows in Another
    12.3 Performing a Join Between Tables in Different Databases
    12.4 Referring to Join Output Column Names in Programs
    12.5 Finding Rows in One Table That Match Rows in Another
    12.6 Finding Rows with No Match in Another Table
    12.7 Finding Rows Containing Per-Group Minimum or Maximum Values
    12.8 Computing Team Standings
    12.9 Producing Master-Detail Lists and Summaries
    12.10 Using a Join to Fill in Holes in a List
    12.11 Enumerating a Many-to-Many Relationship
    12.12 Comparing a Table to Itself
    12.13 Calculating Differences Between Successive Rows
    12.14 Finding Cumulative Sums and Running Averages
    12.15 Using a Join to Control Query Output Order
    12.16 Converting Subselects to Join Operations
    12.17 Selecting Records in Parallel from Multiple Tables
    12.18 Inserting Records in One Table That Include Values from Another
    12.19 Updating One Table Based on Values in Another
    12.20 Using a Join to Create a Lookup Table from Descriptive Labels
    12.21 Deleting Related Rows in Multiple Tables
    12.22 Identifying and Removing Unattached Records
    12.23 Using Different MySQL Servers Simultaneously
    Chapter 13. Statistical Techniques
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele