MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

13.8 Randomizing a Set of Rows

13.8.1 Problem

You want to randomize a set of rows or values.

13.8.2 Solution

Use ORDER BY RAND( ).

13.8.3 Discussion

MySQL's RAND( ) function can be used to randomize the order in which a query returns its rows. Somewhat paradoxically, this randomization is achieved by adding an ORDER BY clause to the query. The technique is roughly equivalent to a spreadsheet randomization method. Suppose you have a set of values in a spreadsheet that looks like this:

Patrick
Penelope
Pertinax
Polly

To place these in random order, first add another column that contains randomly chosen numbers:

Patrick   .73
Penelope  .37
Pertinax  .16
Polly     .48

Then sort the rows according to the values of the random numbers:

Pertinax  .16
Penelope  .37
Polly     .48
Patrick   .73

At this point, the original values have been placed in random order, because the effect of sorting the random numbers is to randomize the values associated with them. To re-randomize the values, choose another set of random numbers and sort the rows again.

In MySQL, a similar effect is achieved by associating a set of random numbers with a query result and sorting the result by those numbers. For MySQL 3.23.2 and up, this is done with an ORDER BY RAND( ) clause:

mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Pertinax |
| Penelope |
| Patrick  |
| Polly    |
+----------+
mysql> SELECT name FROM t ORDER BY RAND( );
+----------+
| name     |
+----------+
| Patrick  |
| Pertinax |
| Penelope |
| Polly    |
+----------+ 

How Random Is RAND( )?

Does the RAND( ) function generate evenly distributed numbers? Check it out for yourself with the following Python script, rand_test.py, from the stats directory of the recipes distribution. It uses RAND( ) to generate random numbers and constructs a frequency distribution from them, using .1-sized categories. This provides a means of assessing how evenly distributed the values are.

#! /usr/bin/python
# rand_test.pl - create a frequency distribution of RAND( ) values.
# This provides a test of the randomness of RAND( ).
# Method is to draw random numbers in the range from 0 to 1.0,
# and count how many of them occur in .1-sized intervals (0 up
# to .1, .1 up to .2, ..., .9 up *through* 1.0).
import sys
sys.path.insert (0, "/usr/local/apache/lib/python")
import MySQLdb
import Cookbook
npicks = 1000       # number of times to pick a number
bucket = [0] * 10
conn = Cookbook.connect ( )
cursor = conn.cursor ( )
for i in range (0, npicks):
    cursor.execute ("SELECT RAND( )")
    (val,) = cursor.fetchone ( )
    slot = int (val * 10)
    if slot > 9:
        slot = 9        # put 1.0 in last slot
    bucket[slot] = bucket[slot] + 1
cursor.close ( )
conn.close ( )
# Print the resulting frequency distribution
for slot in range (0, 9):
    print "%2d  %d" % (slot+1, bucket[slot])
sys.exit (0)

The stats directory also contains equivalent scripts in other languages.

For versions of MySQL older than 3.23.2, ORDER BY clauses cannot refer to expressions, so you cannot use RAND( ) there (see Recipe 6.4). As a workaround, add a column of random numbers to the column output list, alias it, and refer to the alias for sorting:

mysql> SELECT name, name*0+RAND( ) AS rand_num FROM t ORDER BY rand_num;
+----------+-------------------+
| name     | rand_num          |
+----------+-------------------+
| Penelope | 0.372227413926485 |
| Patrick  | 0.431537678867148 |
| Pertinax | 0.566524063764628 |
| Polly    | 0.715938107777329 |
+----------+-------------------+

Note that the expression for the random number column is name*0+RAND( ), not just RAND( ). If you try using the latter, the pre-3.23 MySQL optimizer notices that the column contains only a function, assumes that the function returns a constant value for each row, and optimizes the corresponding ORDER BY clause out of existence. As a result, no sorting is done. The workaround is to fool the optimizer by adding extra factors to the expression that don't change its value, but make the column look like a non-constant. The query just shown illustrates one easy way to do this: Take any column name, multiply it by zero, and add the result to RAND( ). Granted, it may seem a little strange to use name in a mathematical expression, because that column's values aren't numeric. That doesn't matter; MySQL sees the * multiplication operator and performs a string-to-number conversion of the name values before the multiply operation. The important thing is that the result of the multiplication is zero, which means that name*0+RAND( ) has the same value as RAND( ).

Applications for randomizing a set of rows include any scenario that uses selection without replacement (choosing each item from a set of items, until there are no more items left). Some examples of this are:

  • Determining the starting order for participants in an event. List the participants in a table and select them in random order.

  • Assigning starting lanes or gates to participants in a race. List the lanes in a table and select a random lane order.

  • Choosing the order in which to present a set of quiz questions.

  • Shuffling a deck of cards. Represent each card by a row in a table and shuffle the deck by selecting the rows in random order. Deal them one by one until the deck is exhausted.

To use the last example as an illustration, let's implement a card deck shuffling algorithm. Shuffling and dealing cards is randomization plus selection without replacement: each card is dealt once before any is dealt twice; when the deck is used up, it is reshuffled to re-randomize it for a new dealing order. Within a program, this task can be performed with MySQL using a table deck that has 52 rows, assuming a set of cards with each combination of 13 face values and 4 suits:

  • Select the entire table and store it into an array.

  • Each time a card is needed, take the next element from the array.

  • When the array is exhausted, all the cards have been dealt. "Reshuffle" the table to generate a new card order.

Setting up the deck table is a tedious task if you insert the 52 card records by writing out all the INSERT statements manually. The deck contents can be generated more easily in combinatorial fashion within a program by generating each pairing of face value with suit. Here's some PHP code that creates a deck table with face and suit columns, then populates the table using nested loops to generate the pairings for the INSERT statements:

mysql_query ("
        CREATE TABLE deck
        (
            face    ENUM('A', 'K', 'Q', 'J', '10', '9', '8',
                        '7', '6', '5', '4', '3', '2') NOT NULL,
            suit    ENUM('hearts', 'diamonds', 'clubs', 'spades') NOT NULL
        )", $conn_id)
    or die ("Cannot issue CREATE TABLE statement\n");

$face_array = array ("A", "K", "Q", "J", "10", "9", "8",
                    "7", "6", "5", "4", "3", "2");
$suit_array = array ("hearts", "diamonds", "clubs", "spades");

# insert a "card" into the deck for each combination of suit and face

reset ($face_array);
while (list ($index, $face) = each ($face_array))
{
    reset ($suit_array);
    while (list ($index2, $suit) = each ($suit_array))
    {
        mysql_query ("INSERT INTO deck (face,suit) VALUES('$face','$suit')",
                    $conn_id)
            or die ("Cannot insert card into deck\n");
    }
}

Shuffling the cards is a matter of issuing this statement:

SELECT face, suit FROM deck ORDER BY RAND( );

To do that and store the results in an array within a script, write a shuffle_deck( ) function that issues the query and returns the resulting values in an array (again shown in PHP):

function shuffle_deck ($conn_id)
{
    $query = "SELECT face, suit FROM deck ORDER BY RAND( )";
    $result_id = mysql_query ($query, $conn_id)
        or die ("Cannot retrieve cards from deck\n");
    $card = array ( );
    while ($obj = mysql_fetch_object ($result_id))
        $card[ ] = $obj;     # add card record to end of $card array
    mysql_free_result ($result_id);
    return ($card);
}

Deal the cards by keeping a counter that ranges from 0 to 51 to indicate which card to select. When the counter reaches 52, the deck is exhausted and should be shuffled again.

    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    Chapter 13. Statistical Techniques
    13.1 Introduction
    13.2 Calculating Descriptive Statistics
    13.3 Per-Group Descriptive Statistics
    13.4 Generating Frequency Distributions
    13.5 Counting Missing Values
    13.6 Calculating Linear Regressions or Correlation Coefficients
    13.7 Generating Random Numbers
    13.8 Randomizing a Set of Rows
    13.9 Selecting Random Items from a Set of Rows
    13.10 Assigning Ranks
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele