10.10 Don't Assume LOAD DATA Knows More than It Does
10.10.1 Problem
You think LOAD
DATA is smarter than it really is.
10.10.2 Solution
Don't assume that LOAD
DATA knows anything at all about the format of
your datafile. And make sure you yourself know what its format is. If
the file has been transferred from one machine to another, its
contents may have been changed in subtle ways of which
you're not aware.
10.10.3 Discussion
Many LOAD DATA frustrations
occur because people expect MySQL to know things that it cannot
possibly know. LOAD DATA makes
certain assumptions about the structure of input files, represented
as the default settings for the line and field terminators, and for
the quote and escape character settings. If your input
doesn't match those assumptions, you need to tell
MySQL about it.
When in doubt, check the contents of
your datafile using a hex dump program or other utility that displays
a visible representation of whitespace characters like tab, carriage
return, and linefeed. Under
Unix, the
od program can display file contents in a
variety of formats. If you don't have
od or some comparable utility, the
transfer directory of the
recipes distribution contains hex dumpers written
in Perl and Python (hexdump.pl and
hexdump.py), as well as a couple of programs
that display printable representations of all characters of a file
(see.pl and see.py). You
may find them useful for examining files to see what they really
contain. In some cases, you may be surprised to discover that a
file's contents are different than you think. This
is in fact quite likely if the file has been transferred from one
machine to another:
An FTP transfer between machines
running different operating systems typically translates line endings
to those that are appropriate for the destination machine if the
transfer is performed in text mode rather than in binary (image)
mode. Suppose you have tab-delimited linefeed-terminated records in a
datafile that load into MySQL on a Unix system just fine using the
default LOAD DATA settings. If
you copy the file to a Windows machine with FTP using a text transfer
mode, the linefeeds probably will be converted to carriage
return/linefeed pairs. On that machine, the file will not load
properly with the same LOAD
DATA statement, because its contents will have
been changed. Does MySQL have any way of knowing that? No. So
it's up to you to tell it, by adding a
LINES TERMINATED
BY '\r\n' clause to the
statement. Transfers between any two systems with dissimilar default
line endings can cause these changes. For example, a Macintosh file
containing carriage returns may contain linefeeds after transfer to a
Unix system. You should either account for such changes with a
LINES TERMINATED
BY clause that reflects the modified line-ending
sequence, or transfer the file in binary mode so that its contents do
not change.
Datafiles pasted into email messages often do not survive
intact. Mail software may wrap (break) long lines or convert
line-ending sequences. If you must transfer a datafile by email,
it's best sent as an attachment.
|
|
|