MySQL Cookbook

13.4 Generating Frequency Distributions

13.4.1 Problem

You want to know how often each value occurs in a table column.

13.4.2 Solution

Derive a frequency distribution that summarizes the contents of your dataset.

13.4.3 Discussion

A common application for per-group summary techniques is to generate a breakdown of the number of times each value occurs. This is called a frequency distribution. For the testscore table, the frequency distribution looks like this:

mysql> SELECT score, COUNT(score) AS occurrence
    -> FROM testscore GROUP BY score;
+-------+------------+
| score | occurrence |
+-------+------------+
|     4 |          2 |
|     5 |          1 |
|     6 |          4 |
|     7 |          4 |
|     8 |          2 |
|     9 |          5 |
|    10 |          2 |
+-------+------------+

If you express the results in terms of percentages rather than as counts, you produce a relative frequency distribution. To break down a set of observations and show each count as a percentage of the total, use one query to get the total number of observations, and another to calculate the percentages for each group:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT score, (COUNT(score)*100)/@n AS percent
    -> FROM testscore GROUP BY score;
+-------+---------+
| score | percent |
+-------+---------+
|     4 |      10 |
|     5 |       5 |
|     6 |      20 |
|     7 |      20 |
|     8 |      10 |
|     9 |      25 |
|    10 |      10 |
+-------+---------+
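The two statements can also be collapsed into one by computing the total in a scalar subquery, which avoids the @n user variable (MySQL 8.0 deprecates assigning user variables inside SELECT). The following is a minimal sketch, not the book's method. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server so that it runs without one, and it rebuilds the testscore rows from the counts shown earlier. The helper names load_testscore and relative_frequencies are hypothetical.

```python
import sqlite3

# Assumption: rows rebuilt from the counts shown above, as {score: occurrences}.
COUNTS = {4: 2, 5: 1, 6: 4, 7: 4, 8: 2, 9: 5, 10: 2}


def load_testscore(conn):
    """Create a one-column testscore table (a simplification) and fill it."""
    conn.execute("CREATE TABLE testscore (score INT)")
    rows = [(score,) for score, n in COUNTS.items() for _ in range(n)]
    conn.executemany("INSERT INTO testscore (score) VALUES (?)", rows)


def relative_frequencies(conn):
    """Return (score, percent) pairs computed by a single statement."""
    sql = """
        SELECT score,
               COUNT(score) * 100.0 / (SELECT COUNT(score) FROM testscore)
                   AS percent
        FROM testscore
        GROUP BY score
        ORDER BY score
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
load_testscore(conn)
for score, percent in relative_frequencies(conn):
    print(f"{score:>5} {percent:>7.1f}")
```

The subquery is uncorrelated, so the total is the same for every group. Multiplying by 100.0 rather than 100 forces non-integer division in SQLite.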

The distributions just shown count the occurrences of each individual score. However, if the dataset contains a large number of distinct values and you want a distribution with only a few categories, you can lump values into categories and produce a count for each one. "Lumping" techniques are discussed in Recipe 7.13.
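As a small illustration of lumping, a CASE expression can map each score to a category before grouping. This is a sketch under stated assumptions rather than the technique from Recipe 7.13: the three category boundaries (0-5, 6-8, 9-10) are invented for the example, the testscore rows are rebuilt from the counts shown earlier, and Python's built-in sqlite3 module stands in for a MySQL server. The helper names are hypothetical.

```python
import sqlite3

# Assumption: rows rebuilt from the counts shown above, as {score: occurrences}.
COUNTS = {4: 2, 5: 1, 6: 4, 7: 4, 8: 2, 9: 5, 10: 2}


def load_testscore(conn):
    """Create a one-column testscore table (a simplification) and fill it."""
    conn.execute("CREATE TABLE testscore (score INT)")
    rows = [(score,) for score, n in COUNTS.items() for _ in range(n)]
    conn.executemany("INSERT INTO testscore (score) VALUES (?)", rows)


def category_counts(conn):
    """Count scores per category; the boundaries are illustrative only."""
    sql = """
        SELECT CASE WHEN score <= 5 THEN '0-5'
                    WHEN score <= 8 THEN '6-8'
                    ELSE '9-10'
               END AS category,
               COUNT(score) AS occurrences
        FROM testscore
        GROUP BY category
        ORDER BY MIN(score)
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
load_testscore(conn)
for category, occurrences in category_counts(conn):
    print(f"{category:>5} {occurrences:>3}")
```

Ordering by MIN(score) keeps the categories in numeric order. Sorting on the category label would compare strings instead.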

One typical use of frequency distributions is to export the results for use in a graphing program. In the absence of such a program, you can use MySQL itself to generate a simple ASCII chart as a visual representation of the distribution. For example, to display an ASCII bar chart of the test score counts, convert the counts to strings of * characters:

mysql> SELECT score, REPEAT('*',COUNT(score)) AS occurrences
    -> FROM testscore GROUP BY score;
+-------+-------------+
| score | occurrences |
+-------+-------------+
|     4 | **          |
|     5 | *           |
|     6 | ****        |
|     7 | ****        |
|     8 | **          |
|     9 | *****       |
|    10 | **          |
+-------+-------------+

To chart the relative frequency distribution instead, use the percentage values:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT score, REPEAT('*',(COUNT(score)*100)/@n) AS percent
    -> FROM testscore GROUP BY score;
+-------+---------------------------+
| score | percent                   |
+-------+---------------------------+
|     4 | **********                |
|     5 | *****                     |
|     6 | ********************      |
|     7 | ********************      |
|     8 | **********                |
|     9 | ************************* |
|    10 | **********                |
+-------+---------------------------+

The ASCII chart method is crude, but it's a quick way to picture the distribution of observations, and it requires no other tools.
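The same kind of chart can be drawn in client code, which makes it easy to cap the bar length when counts are large. The sketch below is one possible approach, not the book's. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server and testscore rows rebuilt from the counts shown earlier. The bar_lines helper and its max_width parameter are hypothetical.

```python
import sqlite3

# Assumption: rows rebuilt from the counts shown above, as {score: occurrences}.
COUNTS = {4: 2, 5: 1, 6: 4, 7: 4, 8: 2, 9: 5, 10: 2}


def load_testscore(conn):
    """Create a one-column testscore table (a simplification) and fill it."""
    conn.execute("CREATE TABLE testscore (score INT)")
    rows = [(score,) for score, n in COUNTS.items() for _ in range(n)]
    conn.executemany("INSERT INTO testscore (score) VALUES (?)", rows)


def bar_lines(conn, max_width=40):
    """Return one 'score | ****' line per score, scaled to fit max_width."""
    rows = conn.execute(
        "SELECT score, COUNT(score) FROM testscore "
        "WHERE score IS NOT NULL GROUP BY score ORDER BY score"
    ).fetchall()
    if not rows:
        return []
    longest = max(n for _, n in rows)
    # Shrink bars only when the longest one would exceed max_width.
    scale = min(1.0, max_width / longest)
    lines = []
    for score, n in rows:
        stars = max(1, round(n * scale))  # every observed score gets a bar
        lines.append(f"{score:>5} | {'*' * stars}")
    return lines


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
load_testscore(conn)
print("\n".join(bar_lines(conn)))
```

With the sample data the longest bar is five characters, so no scaling occurs and the bars match the counts exactly.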

If you generate a frequency distribution for a range of categories where some of the categories are not represented in your observations, the missing categories will not appear in the output. To force each category to be displayed, use a reference table and a LEFT JOIN (a technique discussed in Recipe 12.10). For the testscore table, the possible scores range from 0 to 10, so a reference table should contain each of those values:

mysql> CREATE TABLE ref (score INT);
mysql> INSERT INTO ref (score)
    -> VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

Then join the reference table to the test scores to generate the frequency distribution:

mysql> SELECT ref.score, COUNT(testscore.score) AS occurrences
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+-------------+
| score | occurrences |
+-------+-------------+
|     0 |           0 |
|     1 |           0 |
|     2 |           0 |
|     3 |           0 |
|     4 |           2 |
|     5 |           1 |
|     6 |           4 |
|     7 |           4 |
|     8 |           2 |
|     9 |           5 |
|    10 |           2 |
+-------+-------------+

This distribution includes rows for scores 0 through 3, none of which appear in the frequency distribution shown earlier.

The same principle applies to relative frequency distributions:

mysql> SELECT @n := COUNT(score) FROM testscore;
mysql> SELECT ref.score, (COUNT(testscore.score)*100)/@n AS percent
    -> FROM ref LEFT JOIN testscore ON ref.score = testscore.score
    -> GROUP BY ref.score;
+-------+---------+
| score | percent |
+-------+---------+
|     0 |       0 |
|     1 |       0 |
|     2 |       0 |
|     3 |       0 |
|     4 |      10 |
|     5 |       5 |
|     6 |      20 |
|     7 |      20 |
|     8 |      10 |
|     9 |      25 |
|    10 |      10 |
+-------+---------+
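When the range of possible values is wide, typing the reference rows by hand gets tedious, and a client program can generate them instead. The following sketch fills the ref table from a numeric range and then runs the same LEFT JOIN. It assumes Python's built-in sqlite3 module as a stand-in for a MySQL server and testscore rows rebuilt from the counts shown earlier. The helper names and the low and high parameters are hypothetical.

```python
import sqlite3

# Assumption: rows rebuilt from the counts shown above, as {score: occurrences}.
COUNTS = {4: 2, 5: 1, 6: 4, 7: 4, 8: 2, 9: 5, 10: 2}


def load_testscore(conn):
    """Create a one-column testscore table (a simplification) and fill it."""
    conn.execute("CREATE TABLE testscore (score INT)")
    rows = [(score,) for score, n in COUNTS.items() for _ in range(n)]
    conn.executemany("INSERT INTO testscore (score) VALUES (?)", rows)


def load_ref(conn, low=0, high=10):
    """Create the reference table holding every possible score."""
    conn.execute("CREATE TABLE ref (score INT)")
    conn.executemany(
        "INSERT INTO ref (score) VALUES (?)",
        [(score,) for score in range(low, high + 1)],
    )


def full_distribution(conn):
    """Return (score, occurrences) for every score in ref, including zeros."""
    sql = """
        SELECT ref.score, COUNT(testscore.score) AS occurrences
        FROM ref LEFT JOIN testscore ON ref.score = testscore.score
        GROUP BY ref.score
        ORDER BY ref.score
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
load_testscore(conn)
load_ref(conn)
for score, occurrences in full_distribution(conn):
    print(f"{score:>5} {occurrences:>3}")
```

Counting testscore.score rather than using COUNT(*) is what produces zeros for unmatched reference rows: the LEFT JOIN supplies NULL for the missing side, and COUNT(expr) ignores NULL values.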