MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

10.36 Dealing with NULL Values

10.36.1 Problem

You're not sure how to represent NULL values in a datafile.

10.36.2 Solution

Try to use a value not otherwise present, so that you can distinguish NULL from all other legitimate non-NULL values.

10.36.3 Discussion

There's no particular standard for representing NULL values in datafiles, which makes them a bit of a problem for import and export operations. Some of the difficulty arises from the fact that NULL indicates the lack of a value, and something that's not there is not easy to represent literally in a datafile. Using an empty column value is the most obvious thing to do, but that's ambiguous for string-valued columns because there is no way to distinguish a NULL represented that way from a true empty string. Empty values can be a problem for other column types as well. For example, if you load an empty value with LOAD DATA into a numeric column, it gets stored as 0 rather than as NULL, and thus becomes indistinguishable from a true 0 in the input.

The usual strategy for dealing with this problem is to represent NULL using a value that doesn't otherwise occur in the data. This is how LOAD DATA and mysqlimport handle the issue, with \N as the value that is understood by convention to mean NULL. Based on that fact, it's sometimes helpful to convert empty fields in a datafile to \N so that LOAD DATA will interpret them as NULL. It's easy to write a script that does this:

#! /usr/bin/perl -w
# empty_to_null.pl - Convert empty input fields to \N.

# \N is the MySQL LOAD DATA convention for NULL.  Running a file
# through this script and loading the result causes NULL to be loaded
# rather than empty strings.

# Assumes tab-delimited, linefeed-terminated input lines.

use strict;

while (<>)
{
    chomp;
    my @val = split (/\t/, $_, 10000);  # split, preserving all fields
    # map empty fields to \N, write as tab-delimited output line
    print join ("\t", map { /^$/ ? "\\N" : $_ } @val) . "\n";
}

exit (0);

You might use the script like this:

% empty_to_null.pl mytbl.txt > mytbl.txt2
% mysqlimport --local cookbook mytbl.txt2

Loading a file that has been run through the empty_to_null.pl script often can produce better results for columns that allow NULL values. This is shown by the following table, which compares the values that result when you use LOAD DATA or mysqlimport to load either an empty string or \N (NULL) into various column types when those columns allow NULL values:

Column type

Result of loading empty string

Result of loading \N

CHAR

Empty string

NULL

INT

0

NULL

DATE

0000-00-00

NULL

But what happens if you load \N rather than an empty string into columns that are defined as NOT NULL? As it happens, it doesn't make any difference either way, as shown in the next table:

Column type

Result of loading empty string

Result of loading \N

CHAR

Empty string

Empty string

INT

0

0

DATE

0000-00-00

0000-00-00

This means that it's not worth it to write a smarter version of empty_to_null.pl that looks at the structure of the table you want to load the data into and converts empty strings to \N only for columns that allow NULL values.

On the other hand, one reason that it is worth writing a smarter script is that you might want to interpret values other than empty values as signifying NULL, and you might have different conventions in different columns. Consider the following datafile, has_nulls.txt:

str1    13
str2    0
Unknown 15
Unknown 0

The first column contains strings, and Unknown signifies NULL. The second column contains integers, and 0 signifies NULL. What to do? To handle that kind of file, the transfer directory of the recipes distribution contains a to_null.pl script. It provides options allowing you to specify both which columns to look in and what value to look for:

--columns= column_list

Convert values only in the named columns. By default, to_null.pl looks in all columns. If this option is given, column_list should be a list of one or more column positions separated by commas. Positions begin at 1.

--null= value

Interpret value as the NULL value indicator and convert instances of it to \N. The default is to convert empty values, like empty_to_null.pl.

--case-insensitive, -i

Perform a case-insensitive comparison when looking for the NULL value indicator.

Because the has_nulls.txt datafile has two different NULL value indicators, it's necessary to process it using two invocations of to_null.pl:

% to_null.pl --columns=1 --null=Unknown has_nulls.txt \
    | to_null.pl --columns=2 --null=0 > tmp

The resulting file, tmp, looks like this:

str1    13
str2    \N
\N      15
\N      \N

Sometimes you don't need to preprocess your input file if you can postprocess it after importing it. For example, if a datafile contains a numeric column that uses -1 to represent NULL values, you can easily convert all -1 values after loading the file using a simple UPDATE statement:

UPDATE tbl_name SET col_name = NULL WHERE col_name = -1;

The preceding discussion pertains to interpreting NULL values for import into MySQL. It's also necessary to think about NULL values when transferring data the other way—from MySQL into other programs. Here are some examples:

  • SELECT INTO ... OUTFILE writes NULL values as \N. Will another program understand that convention? If not, you'll need to convert \N to something the program will understand.

  • You can use mysql in batch mode as an easy way to produce tab-delimited output (see Recipe 10.14), but one problem with doing so is that NULL values appear in the output as instances of the word "NULL." If that word occurs nowhere else in the output, you may be able to postprocess it to convert instances of the word to something more appropriate. A script similar to empty_to_null.pl would be easy to write, or you could use a one-line sed command:

    % sed -e "s/NULL/\\N/g" data.txt > tmp

    If the word "NULL" does appear where it represents something other than a NULL value, then it's ambiguous and you should probably use a different method of exporting your data.

    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    10.1 Introduction
    10.2 Importing Data with LOAD DATA and mysqlimport
    10.3 Specifying the Datafile Location
    10.4 Specifying the Datafile Format
    10.5 Dealing with Quotes and Special Characters
    10.6 Importing CSV Files
    10.7 Reading Files from Different Operating Systems
    10.8 Handling Duplicate Index Values
    10.9 Getting LOAD DATA to Cough Up More Information
    10.10 Don't Assume LOAD DATA Knows More than It Does
    10.11 Skipping Datafile Lines
    10.12 Specifying Input Column Order
    10.13 Skipping Datafile Columns
    10.14 Exporting Query Results from MySQL
    10.15 Exporting Tables as Raw Data
    10.16 Exporting Table Contents or Definitions in SQL Format
    10.17 Copying Tables or Databases to Another Server
    10.18 Writing Your Own Export Programs
    10.19 Converting Datafiles from One Format to Another
    10.20 Extracting and Rearranging Datafile Columns
    10.21 Validating and Transforming Data
    10.22 Validation by Direct Comparison
    10.23 Validation by Pattern Matching
    10.24 Using Patterns to Match Broad Content Types
    10.25 Using Patterns to Match Numeric Values
    10.26 Using Patterns to Match Dates or Times
    10.27 Using Patterns to Match Email Addresses and URLs
    10.28 Validation Using Table Metadata
    10.29 Validation Using a Lookup Table
    10.30 Converting Two-Digit Year Values to Four-Digit Form
    10.31 Performing Validity Checking on Date or Time Subparts
    10.32 Writing Date-Processing Utilities
    10.33 Using Dates with Missing Components
    10.34 Performing Date Conversion Using SQL
    10.35 Using Temporary Tables for Data Transformation
    10.36 Dealing with NULL Values
    10.37 Guessing Table Structure from a Datafile
    10.38 A LOAD DATA Diagnostic Utility
    10.39 Exchanging Data Between MySQL and Microsoft Access
    10.40 Exchanging Data Between MySQL and Microsoft Excel
    10.41 Exchanging Data Between MySQL and FileMaker Pro
    10.42 Exporting Query Results as XML
    10.43 Importing XML into MySQL
    10.44 Epilog
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    Chapter 13. Statistical Techniques
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele