MySQL Cookbook

12.16 Converting Subselects to Join Operations

12.16.1 Problem

You want to use a query that involves a subselect, but MySQL will not support subselects until Version 4.1.

12.16.2 Solution

In many cases, you can rewrite a subselect as a join. Or you can write a program that simulates the subselect. Or you can even make mysql generate SQL statements that simulate it.

12.16.3 Discussion

Assume you have two tables, t1 and t2, that have the following contents:

mysql> SELECT col1 FROM t1;
+------+
| col1 |
+------+
| a    |
| b    |
| c    |
+------+
mysql> SELECT col2 FROM t2;
+------+
| col2 |
+------+
| b    |
| c    |
| d    |
+------+

Now suppose that you want to find values in t1 that are also present in t2, or values in t1 that are not present in t2. These kinds of questions sometimes are answered using subselect queries that nest one SELECT inside another, but MySQL won't have subselects until Version 4.1. This section shows how to work around that problem.

The following query shows an IN( ) subselect that produces the rows in table t1 having col1 values that match col2 values in table t2:

SELECT col1 FROM t1 WHERE col1 IN (SELECT col2 FROM t2);

That's essentially just a "find matching rows" query, and it can be rewritten as a simple join like this:

mysql> SELECT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2;
+------+
| col1 |
+------+
| b    |
| c    |
+------+
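
The equivalence can be checked with a small self-contained sketch. It is not part of the recipe: it assumes Python's standard sqlite3 module with an in-memory database as a stand-in for a MySQL server (both statements use only standard SQL), and the helper name rows is hypothetical. It also illustrates a caveat: if t2 contains duplicate col2 values, the join returns one row per match, so SELECT DISTINCT is needed to reproduce the subselect result.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT)")
conn.execute("CREATE TABLE t2 (col2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("b",), ("c",), ("d",)])

def rows(stmt):
    """Hypothetical helper: run stmt, return the first column as a sorted list."""
    return sorted(r[0] for r in conn.execute(stmt))

subselect = rows("SELECT col1 FROM t1 WHERE col1 IN (SELECT col2 FROM t2)")
join = rows("SELECT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2")
print(subselect == join)

# Add a duplicate value to t2: the plain join now returns 'b' once per match.
conn.execute("INSERT INTO t2 VALUES ('b')")
dup_join = rows("SELECT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2")
distinct_join = rows(
    "SELECT DISTINCT t1.col1 FROM t1, t2 WHERE t1.col1 = t2.col2")
print(dup_join, distinct_join)
```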

The converse question (rows in t1 that have no match in t2) can be answered using a NOT IN( ) subselect:

SELECT col1 FROM t1 WHERE col1 NOT IN (SELECT col2 FROM t2);

That's a "find non-matching rows" query. Sometimes these can be rewritten as a LEFT JOIN, a type of join discussed in Recipe 12.6. For the case at hand, the NOT IN( ) subselect is equivalent to the following LEFT JOIN:

mysql> SELECT t1.col1 FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col2
    -> WHERE t2.col2 IS NULL;
+------+
| col1 |
+------+
| a    |
+------+
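
The word "sometimes" matters here. The following sketch compares the two statements, again assuming Python's sqlite3 module as a stand-in for MySQL; the helper name first_column is hypothetical. With the sample data both forms return the same row. Once t2 contains a NULL value, the NOT IN( ) form matches nothing, because a comparison against NULL is never true, whereas the LEFT JOIN form still reports the unmatched row.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT)")
conn.execute("CREATE TABLE t2 (col2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("b",), ("c",), ("d",)])

NOT_IN = "SELECT col1 FROM t1 WHERE col1 NOT IN (SELECT col2 FROM t2)"
LEFT_JOIN = ("SELECT t1.col1 FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col2 "
             "WHERE t2.col2 IS NULL")

def first_column(stmt):
    """Hypothetical helper: run stmt and return the first column of each row."""
    return [r[0] for r in conn.execute(stmt)]

not_in_rows = first_column(NOT_IN)
left_join_rows = first_column(LEFT_JOIN)
print(not_in_rows, left_join_rows)

# A NULL in t2 makes the two forms diverge.
conn.execute("INSERT INTO t2 VALUES (NULL)")
not_in_with_null = first_column(NOT_IN)
left_join_with_null = first_column(LEFT_JOIN)
print(not_in_with_null, left_join_with_null)
```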

Within a program, you can simulate a subselect by working with the results of two queries. Suppose you want to simulate the IN( ) subselect that finds matching values in the two tables:

SELECT col1 FROM t1 WHERE col1 IN (SELECT col2 FROM t2);

If you expect that the inner SELECT will return a reasonably small number of col2 values, one way to achieve the same result as the subselect is to retrieve those values and generate an IN( ) clause that looks for them in col1. For example, the query SELECT col2 FROM t2 will produce the values b, c, and d. Using this result, you can select matching col1 values by generating a query that looks like this:

SELECT col1 FROM t1 WHERE col1 IN ('b','c','d')

That can be done as follows (shown here using Python):

cursor = conn.cursor()
cursor.execute("SELECT col2 FROM t2")
if cursor.rowcount > 0:     # do nothing if there are no values
    val = []                # list to hold data values
    s = ""                  # string to hold placeholders
    # construct %s,%s,%s, ... string containing placeholders
    for (col2,) in cursor.fetchall():  # pull col2 value from each row
        if s != "":
            s = s + ","     # separate placeholders by commas
        s = s + "%s"        # add placeholder
        val.append(col2)    # add value to list of values
    stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + s + ")"
    cursor.execute(stmt, val)
    for (col1,) in cursor.fetchall():  # pull col1 values from final result
        print(col1)
cursor.close()
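
The placeholder string can also be built in a single step. The following is a minimal sketch rather than the recipe's own code: the function name build_in_query is hypothetical, and it assumes the same %s placeholder style as the preceding example. It returns the statement and its parameter list without executing anything, so it needs no database connection.

```python
def build_in_query(values):
    """Return (stmt, params) for looking up values in t1.col1 with IN()."""
    if not values:
        # An empty IN () list is not valid SQL, so refuse to build one.
        raise ValueError("IN() needs at least one value")
    placeholders = ",".join(["%s"] * len(values))   # e.g. "%s,%s,%s"
    stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + placeholders + ")"
    return stmt, list(values)

stmt, params = build_in_query(["b", "c", "d"])
print(stmt)
print(params)
```

The returned pair can be passed to cursor.execute(stmt, params) in the same way as stmt and val above.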

If you expect lots of col2 values, you may want to generate individual SELECT queries for each of them instead:

SELECT col1 FROM t1 WHERE col1 = 'b'
SELECT col1 FROM t1 WHERE col1 = 'c'
SELECT col1 FROM t1 WHERE col1 = 'd'

This can be done within a program as follows:

cursor = conn.cursor()
cursor2 = conn.cursor()
stmt = "SELECT col1 FROM t1 WHERE col1 = %s"
cursor.execute("SELECT col2 FROM t2")
for (col2,) in cursor.fetchall():  # pull col2 value from each row
    cursor2.execute(stmt, (col2,))
    for (col1,) in cursor2.fetchall(): # pull col1 values from final result
        print(col1)
cursor.close()
cursor2.close()

If you have so many col2 values that you don't want to construct a single huge IN( ) clause, but don't want to issue zillions of individual SELECT statements, either, another option is to combine the approaches. Break the set of col2 values into smaller groups and use each group to construct an IN( ) clause. This gives you a set of shorter queries that each look for several values:

SELECT col1 FROM t1 WHERE col1 IN (first group of col2 values)
SELECT col1 FROM t1 WHERE col1 IN (second group of col2 values)
SELECT col1 FROM t1 WHERE col1 IN (third group of col2 values)
...

This approach can be implemented as follows:

grp_size = 1000             # number of values to select at once
cursor = conn.cursor()
cursor.execute("SELECT col2 FROM t2")
if cursor.rowcount > 0:     # do nothing if there are no values
    col2 = []               # list to hold data values
    for (val,) in cursor.fetchall():   # pull col2 value from each row
        col2.append(val)
    nvals = len(col2)
    i = 0
    while i < nvals:
        if nvals < i + grp_size:
            j = nvals
        else:
            j = i + grp_size
        group = col2[i : j]
        s = ""                  # string to hold placeholders
        val_list = []
        # construct %s,%s,%s, ... string containing placeholders
        for val in group:
            if s != "":
                s = s + ","     # separate placeholders by commas
            s = s + "%s"        # add placeholder
            val_list.append(val)    # add value to list of values
        stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + s + ")"
        print(stmt)             # show the generated statement
        cursor.execute(stmt, val_list)
        for (col1,) in cursor.fetchall():  # pull col1 values from result
            print(col1)
        i = i + grp_size        # go to next group of values
cursor.close()
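
The grouping step on its own can be isolated in a small generator. This is a sketch under stated assumptions: the function name groups and the sample values are hypothetical, and a group size of 2 is used only to keep the output short. It relies on the fact that a Python slice that runs past the end of a list simply stops at the end, so no separate bounds check is needed.

```python
def groups(values, size):
    """Yield successive slices of values, each holding at most size items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

col2 = ["b", "c", "d", "e", "f"]    # hypothetical values fetched from t2

statements = []
for group in groups(col2, 2):
    placeholders = ",".join(["%s"] * len(group))
    stmt = "SELECT col1 FROM t1 WHERE col1 IN (" + placeholders + ")"
    statements.append((stmt, group))

for stmt, params in statements:
    print(stmt, params)
```

Each (stmt, params) pair would then be passed to cursor.execute( ) in turn.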

Simulating a NOT IN( ) subselect from within a program is a bit trickier than simulating an IN( ) subselect. The subselect looks like this:

SELECT col1 FROM t1 WHERE col1 NOT IN (SELECT col2 FROM t2);

The technique shown here works best for smaller numbers of col1 and col2 values, because you must hold at least the values returned by the inner SELECT in memory so that you can compare them to each value returned by the outer SELECT. The example shown here holds both sets in memory. First, retrieve the col1 and col2 values:

cursor = conn.cursor()
cursor.execute("SELECT col1 FROM t1")
col1 = []
for (val,) in cursor.fetchall():
    col1.append(val)
cursor.execute("SELECT col2 FROM t2")
col2 = []
for (val,) in cursor.fetchall():
    col2.append(val)
cursor.close()

Then check each col1 value to see whether or not it's present in the set of col2 values. If not, it satisfies the NOT IN( ) constraint of the subselect:

for val1 in col1:
    present = 0
    for val2 in col2:
        if val1 == val2:
            present = 1
            break
    if not present:
        print(val1)

The code shown here performs a lookup in the col2 values by looping through the array that holds them. You may be able to perform this operation more efficiently by using an associative data structure. For example, in Perl or Python, you could put the col2 values in a hash or dictionary. Recipe 10.29 shows an example that uses that approach.
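
As a minimal sketch of the associative approach, the col2 values can be placed in a Python set, which offers the same kind of keyed lookup as a dictionary. The sample lists below are stand-ins for the values fetched from t1 and t2, and the variable names col2_seen and not_in are hypothetical.

```python
col1 = ["a", "b", "c"]      # stand-in for the values fetched from t1
col2 = ["b", "c", "d"]      # stand-in for the values fetched from t2

# Build the lookup structure once; each membership test then avoids
# scanning the whole col2 list.
col2_seen = set(col2)

# Keep the col1 values that have no counterpart among the col2 values.
not_in = [val for val in col1 if val not in col2_seen]
print(not_in)
```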

Yet another way to simulate subselects, at least those of the IN( ) variety, is to generate the necessary SQL from within one instance of mysql and feed it to another instance to be executed. Consider the result from this query:

mysql> SELECT CONCAT('SELECT col1 FROM t1 WHERE col1 = \'', col2, '\';')
    -> FROM t2;
+------------------------------------------------------------+
| CONCAT('SELECT col1 FROM t1 WHERE col1 = \'', col2, '\';') |
+------------------------------------------------------------+
| SELECT col1 FROM t1 WHERE col1 = 'b';                      |
| SELECT col1 FROM t1 WHERE col1 = 'c';                      |
| SELECT col1 FROM t1 WHERE col1 = 'd';                      |
+------------------------------------------------------------+

The query retrieves the col2 values from t2 and uses them to produce a set of SELECT statements that find matching col1 values in t1. If you issue that query in batch mode and suppress the column heading, mysql produces only the text of the SQL statements, not all the other fluff. You can feed that output into another instance of mysql to execute the queries. The result is the same as the subselect. Here's one way to carry out this procedure, assuming that you have the SELECT statement containing the CONCAT( ) expression stored in a file named make_select.sql:

% mysql -N cookbook < make_select.sql > tmp

Here mysql includes the -N option to suppress column headers so that they won't get written to the output file, tmp. The contents of tmp will look like this:

SELECT col1 FROM t1 WHERE col1 = 'b';
SELECT col1 FROM t1 WHERE col1 = 'c';
SELECT col1 FROM t1 WHERE col1 = 'd';

To execute the queries in that file and generate the output for the simulated subselect, use this command:

% mysql -N cookbook < tmp
b
c

This second instance of mysql also includes the -N option, because otherwise the output will include a header row for each of the SELECT statements that it executes. (Try omitting -N and see what happens.)

One significant limitation of using mysql to generate SQL statements is that it doesn't work well if your col2 values contain quotes or other special characters. In that case, the queries that this method generates would be malformed.[2]

[2] As we go to press, a QUOTE( ) function has been added to MySQL 4.0.3 that allows special characters to be escaped so that they are suitable for use in SQL statements.
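
The quoting problem and its remedy can be illustrated with a sketch. It does not use MySQL: it assumes Python's sqlite3 module as a stand-in, SQLite's || operator in place of CONCAT( ), and SQLite's quote( ) function as an analog of the QUOTE( ) function mentioned in the footnote (the two escape a quote character differently, but both produce a valid string literal for their own server). The sample value o'brien is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT)")
conn.execute("CREATE TABLE t2 (col2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("o'brien",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("o'brien",), ("d",)])

# Naive generation: the quote inside the value ends the string literal early,
# so the generated statement is malformed.
naive = [r[0] for r in conn.execute(
    "SELECT 'SELECT col1 FROM t1 WHERE col1 = ''' || col2 || ''';' FROM t2")]

# Generation that lets the server quote each value.
quoted = [r[0] for r in conn.execute(
    "SELECT 'SELECT col1 FROM t1 WHERE col1 = ' || quote(col2) || ';' FROM t2")]

# Execute the properly quoted statements and collect the matching values.
matches = []
for stmt in quoted:
    matches.extend(r[0] for r in conn.execute(stmt))

for stmt in naive + quoted:
    print(stmt)
print(matches)
```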
