10.19 Converting Datafiles from One Format to Another
10.19.1 Problem
You
want to convert a file to a
different format to make it easier to work with, or so that another
program can understand it.
10.19.2 Solution
Use the cvt_file.pl converter script described
here.
10.19.3 Discussion
The mysql_to_text.pl script discussed in Recipe 10.18 uses MySQL as a data source and produces
output in the format you specify via the
--delim, --quote, and
--eol options. This section describes
cvt_file.pl,
a utility that provides similar formatting options, but for both
input and output. It reads data from a file rather than from MySQL,
and converts it from one format to another. For example, to read a
tab-delimited file data.txt, convert it to
colon-delimited format, and write the result to
tmp, you would invoke
cvt_file.pl like this:
% cvt_file.pl --idelim="\t" --odelim=":" data.txt > tmp
The cvt_file.pl script has separate options for
input and output. Thus, whereas mysql_to_text.pl
has just a --delim for specifying the column
delimiter, cvt_file.pl has separate
--idelim and --odelim
options to set the input and output line column delimiters. But as a
shortcut, --delim is also supported; it sets
the delimiter for both input and output. The full set of options that
cvt_file.pl
understands is as follows:
- --idelim= str, --odelim= str, --delim= str
-
Set the column delimiter sequence for input, output, or both. The
option value may consist of one or more characters.
- --iquote= c, --oquote= c, --quote= c
-
Set the column quote character for input, output, or both.
- --ieol= str, --oeol= str, --eol= str
-
Set the end-of-line sequence for input, output, or both. The option
value may consist of one or more characters.
- --iformat= format, --oformat= format, --format= format,
-
Specify an input format, an output format, or both. This option is
shorthand for setting the quote and delimiter values. For example,
--iformat=csv sets the input
quote and delimiter characters to double quote and comma.
--iformat=tab sets them to
"no quotes" and tab.
- --ilabels, --olabels, --labels
-
Expect an initial line of column labels for input, write an initial
line of labels for output, or both. If you request labels for the
output but do not read labels from the input,
cvt_file.pl uses column labels of
c1, c2, and so forth.
cvt_file.pl assumes the same default file format
as LOAD DATA and
SELECT INTO ...
OUTFILE, that is, tab-delimited lines terminated
by linefeeds.
cvt_file.pl can be found in the
transfer directory of the
recipes distribution. If you expect to use it
regularly, you should install it in some directory
that's listed in your search path so that you can
invoke it from anywhere. Much of the source for the script is similar
to mysql_to_text.pl, so rather than showing the
code and discussing how it works, I'll just give
some examples illustrating how to use it:
Read a file in CSV format with CRLF line termination, write
tab-delimited output with linefeed termination:
% cvt_file.pl --iformat=csv --ieol="\r\n" --oformat=tab --oeol="\n" \
data.txt > tmp
Read and write CSV format, converting CRLF line terminators to
carriage returns:
% cvt_file.pl --format=csv --ieol="\r\n" --oeol="\r" data.txt > tmp
Produce a tab-delimited file from the colon-delimited
/etc/passwd file:
% cvt_file.pl --idelim=":" /etc/passwd > tmp
Convert tab-delimited query output from mysql into
CSV format:
% mysql -e "SELECT * FROM profile" cookbook \
| cvt_file.pl --oformat=csv > profile.csv
|