MySQL Cookbook Free Open Book

MySQL Cookbook

Previous Section Next Section

10.32 Writing Date-Processing Utilities

10.32.1 Problem

There's a given date-processing operation that you need to perform frequently, so you want to write a utility that does it for you.

10.32.2 Solution

The utilities in this section provide some examples showing how to do that.

10.32.3 Discussion

Due to the idiosyncratic nature of dates, you'll probably find it necessary to write date converters from time to time. This section shows some sample converters that serve various purposes:

  • isoize_date.pl reads a file looking for dates in U.S. format (MM-DD-YY) and converts them to ISO format.

  • cvt_date.pl converts dates to and from any of ISO, US, or British formats. It is more general than isoize_date.pl, but requires that you tell it what kind of input to expect and what kind of output to produce.

  • monddccyy_to_iso.pl looks for dates like Feb. 6, 1788 and converts them to ISO format. It illustrates how to map dates with non-numeric parts to a format that MySQL will understand.

All three scripts are located in the transfer directory of the recipes distribution. They assume datafiles are in tab-delimited, linefeed-terminated format. (Use cvt_file.pl to work with files in a different format.)

Our first date-processing utility, isoize_date.pl, looks for dates in U.S. format and rewrites them into ISO format. You'll recognize that it's modeled after the general input-processing loop shown in Recipe 10.21, with some extra stuff thrown in to perform a specific type of conversion:

#! /usr/bin/perl -w
# isoize_date.pl - Read input data, look for values that match
# a date pattern, convert them to ISO format. Also converts
# 2-digit years to 4-digit years, using a transition point of 70.

# By default, this looks for dates in MM-DD-[CC]YY format.

# Assumes tab-delimited, linefeed-terminated input lines.

# Does not check whether dates actually are valid (for example,
# won't complain about 13-49-1928).

use strict;

# transition point at which 2-digit years are assumed to be 19XX
# (below they are treated as 20XX)
my $transition = 70;

while (<>)
{
    chomp;
    my @val = split (/\t/, $_, 10000);  # split, preserving all fields
    for my $i (0 .. @val - 1)
    {
        my $val = $val[$i];
        # look for strings in MM-DD-[CC]YY format
        next unless $val =~ /^(\d{1,2})\D(\d{1,2})\D(\d{2,4})$/;

        my ($month, $day, $year) = ($1, $2, $3);
        # to interpret dates as DD-MM-[CC]YY instead, replace preceding
        # line with the following one:
        #my ($day, $month, $year) = ($1, $2, $3);

        # convert 2-digit years to 4 digits, then update value in array
        $year += ($year >= $transition ? 1900 : 2000) if $year < 100;
        $val[$i] = sprintf ("%04d-%02d-%02d", $year, $month, $day);
    }
    print join ("\t", @val) . "\n";
}

exit (0);

If you feed isoize_date.pl an input file that looks like this:

Fred    04-13-70
Mort    09-30-69
Brit    12-01-57
Carl    11-02-73
Sean    07-04-63
Alan    02-14-65
Mara    09-17-68
Shepard 09-02-75
Dick    08-20-52
Tony    05-01-60

It produces the following output:

Fred    1970-04-13
Mort    2069-09-30
Brit    2057-12-01
Carl    1973-11-02
Sean    2063-07-04
Alan    2065-02-14
Mara    2068-09-17
Shepard 1975-09-02
Dick    2052-08-20
Tony    2060-05-01

isoize_date.pl serves a specific purpose: It converts only from U.S. to ISO format. It does not perform validity checking on date subparts or allow the transition point for adding the century to be specified. A more general tool would be more useful. The next script, cvt_date.pl, extends the capabilities of isoize_date.pl; it recognizes input dates in ISO, US, or British formats and converts any of them to any other. It also can convert two-digit years to four digits, allows you to specify the conversion transition point, and can warn about bad dates. As such, it can be used to preprocess input for loading into MySQL, or for postprocessing data exported from MySQL for use by other programs.

cvt_date.pl understands the following options:

--iformat= format, --oformat= format, --format= format,

Set the date format for input, output, or both. The default format value is iso; cvt_date.pl also recognizes any string beginning with us or br as indicating U.S. or British date format.

--add-century

Convert two-digit years to four digits.

--columns= column_list

Convert dates only in the named columns. By default, cvt_date.pl looks for dates in all columns. If this option is given, column_list should be a list of one or more column positions separated by commas. Positions begin at 1.

--transition= n

Specify the transition point for two-digit to four-digit year conversions. The default transition point is 70. This option turns on --add-century.

--warn

Warn about bad dates. (Note that this option can produce spurious warnings if the dates have two-digit years and you don't specify --add-century, because leap year testing won't always be accurate in that case.)

I won't show the code for cvt_date.pl here (most of it is taken up with processing command-line options), but you can examine the source for yourself if you like. As an example of how cvt_date.pl works, suppose you have a file newdata.txt with the following contents:

name1   01/01/99    38
name2   12/31/00    40
name3   02/28/01    42
name4   01/02/03    44

Running the file through cvt_date.pl with options indicating that the dates are in U.S. format and that the century should be added produces this result:

% cvt_date.pl --iformat=us --add-century newdata.txt
name1   1999-01-01  38
name2   2000-12-31  40
name3   2001-02-28  42
name4   2003-01-02  44

To produce dates in British format instead with no year conversion, do this:

% cvt_date.pl --iformat=us --oformat=br newdata.txt
name1   01-01-99    38
name2   31-12-00    40
name3   28-02-01    42
name4   02-01-03    44

cvt_date.pl has no knowledge of the meaning of each data column, of course. If you have a non-date column with values that match the pattern, it will rewrite that column, too. To deal with that, specify a --columns option to limit the columns that cvt_date.pl attempts to convert.

isoize_date.pl and cvt_date.pl both operate on dates written in all-numeric formats. But dates in datafiles often are written differently, in which case it may be necessary to write a special purpose script to process them. Suppose an input file contains dates in the following format (these represent the dates on which U.S. states were admitted to the Union):

Delaware        Dec. 7, 1787
Pennsylvania    Dec 12, 1787
New Jersey      Dec. 18, 1787
Georgia         Jan. 2, 1788
Connecticut     Jan. 9, 1788
Massachusetts   Feb. 6, 1788
Maryland        Apr. 28, 1788
South Carolina  May 23, 1788
New Hampshire   Jun. 21, 1788
Virginia        Jun 25, 1788
...

The dates consist of a three-character month abbreviation (possibly followed by a period), the numeric day of the month, a comma, and the numeric year. To import this file into MySQL, you'd need to convert the dates to ISO format, resulting in a file that looks like this:

Delaware        1787-12-07
Pennsylvania    1787-12-12
New Jersey      1787-12-18
Georgia         1788-01-02
Connecticut     1788-01-09
Massachusetts   1788-02-06
Maryland        1788-04-28
South Carolina  1788-05-23
New Hampshire   1788-06-21
Virginia        1788-06-25
...

That's a somewhat specialized kind of transformation, though this general type of problem (converting a specific date format) is hardly uncommon. To perform the conversion, identify the dates as those values matching an appropriate pattern, map month names to the corresponding numeric values, and reformat the result. The following script, monddccyy_to_iso.pl, illustrates how to do this:

#! /usr/bin/perl -w
# monddccyy_to_iso.pl - convert dates from mon[.] dd, ccyy to ISO format

# Assumes tab-delimited, linefeed-terminated input

use strict;

my %map =       # map 3-char month abbreviations to numeric month
(
    "jan" => 1, "feb" => 2, "mar" => 3, "apr" => 4, "may" => 5, "jun" => 6,
    "jul" => 7, "aug" => 8, "sep" => 9, "oct" => 10, "nov" => 11, "dec" => 12
);

while (<>)
{
    chomp;
    my @val = split (/\t/, $_, 10000);      # split, preserving all fields
    for my $i (0 .. @val - 1)
    {
        # reformat the value if it matches the pattern, otherwise assume
        # it's not a date in the required format and leave it alone
        if ($val[$i] =~ /^([^.]+)\.? (\d+), (\d+)$/)
        {
            # use lowercase month name
            my ($month, $day, $year) = (lc ($1), $2, $3);
            if (exists ($map{$month}))
            {
                $val[$i] = sprintf ("%04d-%02d-%02d",
                                    $year, $map{$month}, $day);
            }
            else
            {
                # warn, but don't reformat
                warn "$val[$i]: bad date?\n";
            }
        }
    }
    print join ("\t", @val) . "\n";
}

exit (0);

The script only does reformatting, it doesn't validate the dates. To do that, modify the script to use the Cookbook_Utils.pm module by adding this statement after the use strict line:

use Cookbook_Utils;

That gives the script access to the module's is_valid_date( ) routine. To use it, change the reformatting section of the script to look like this:

if (exists ($map{$month})
        && is_valid_date ($year, $map{$month}, $day))
{
    $val[$i] = sprintf ("%04d-%02d-%02d",
                        $year, $map{$month}, $day);
}
else
{
    # warn, but don't reformat
    warn "$val[$i]: bad date?\n";
}
    Previous Section Next Section
    Index: [SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X][Y][Z]


         Main Menu
    Main Page
    Table of content
    Copyright
    Preface
    Chapter 1. Using the mysql Client Program
    Chapter 2. Writing MySQL-Based Programs
    Chapter 3. Record Selection Techniques
    Chapter 4. Working with Strings
    Chapter 5. Working with Dates and Times
    Chapter 6. Sorting Query Results
    Chapter 7. Generating Summaries
    Chapter 8. Modifying Tables with ALTER TABLE
    Chapter 9. Obtaining and Using Metadata
    Chapter 10. Importing and Exporting Data
    10.1 Introduction
    10.2 Importing Data with LOAD DATA and mysqlimport
    10.3 Specifying the Datafile Location
    10.4 Specifying the Datafile Format
    10.5 Dealing with Quotes and Special Characters
    10.6 Importing CSV Files
    10.7 Reading Files from Different Operating Systems
    10.8 Handling Duplicate Index Values
    10.9 Getting LOAD DATA to Cough Up More Information
    10.10 Don't Assume LOAD DATA Knows More than It Does
    10.11 Skipping Datafile Lines
    10.12 Specifying Input Column Order
    10.13 Skipping Datafile Columns
    10.14 Exporting Query Results from MySQL
    10.15 Exporting Tables as Raw Data
    10.16 Exporting Table Contents or Definitions in SQL Format
    10.17 Copying Tables or Databases to Another Server
    10.18 Writing Your Own Export Programs
    10.19 Converting Datafiles from One Format to Another
    10.20 Extracting and Rearranging Datafile Columns
    10.21 Validating and Transforming Data
    10.22 Validation by Direct Comparison
    10.23 Validation by Pattern Matching
    10.24 Using Patterns to Match Broad Content Types
    10.25 Using Patterns to Match Numeric Values
    10.26 Using Patterns to Match Dates or Times
    10.27 Using Patterns to Match Email Addresses and URLs
    10.28 Validation Using Table Metadata
    10.29 Validation Using a Lookup Table
    10.30 Converting Two-Digit Year Values to Four-Digit Form
    10.31 Performing Validity Checking on Date or Time Subparts
    10.32 Writing Date-Processing Utilities
    10.33 Using Dates with Missing Components
    10.34 Performing Date Conversion Using SQL
    10.35 Using Temporary Tables for Data Transformation
    10.36 Dealing with NULL Values
    10.37 Guessing Table Structure from a Datafile
    10.38 A LOAD DATA Diagnostic Utility
    10.39 Exchanging Data Between MySQL and Microsoft Access
    10.40 Exchanging Data Between MySQL and Microsoft Excel
    10.41 Exchanging Data Between MySQL and FileMaker Pro
    10.42 Exporting Query Results as XML
    10.43 Importing XML into MySQL
    10.44 Epilog
    Chapter 11. Generating and Using Sequences
    Chapter 12. Using Multiple Tables
    Chapter 13. Statistical Techniques
    Chapter 14. Handling Duplicates
    Chapter 15. Performing Transactions
    Chapter 16. Introduction to MySQL on the Web
    Chapter 17. Incorporating Query Resultsinto Web Pages
    Chapter 18. Processing Web Input with MySQL
    Chapter 19. Using MySQL-Based Web Session Management
    Appendix A. Obtaining MySQL Software
    Appendix B. JSP and Tomcat Primer
    Appendix C. References
    Colophone
    Index


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele