Easily import your CSV data into MySQL with Node.js

It's convenient to import a CSV file into a spreadsheet to crunch numbers. CSV's (comma-seperated-values) and it's brethren (TSV, for tab-separated, etc) are widely used to deliver information. You'll see CSV downloads available all over the place, whether it's your stock broker or bank, or a government agency, and on and on. Spreadsheet programs like LibreOffice Calc offer direct import of these files, and then as a spreadsheet you've got a zillion tools available to crunch numbers, make graphs, etc. Sometimes, though, you need to load the CSV into a database to do other sorts of work.

What I have to show today is a simple Node.js script that automagically loads a CSV into a database, and you don't even have to run a CREATE TABLE command. This script automatically constructs a CREATE TABLE command that matches the columns in the CSV. The script can be used as a starting point if your specific need is a little different. In my case I just needed the data inside MySQL without too much worry over the column names.

Click here for more Node.js programming advice

The source code is available on gitlab: https://gitlab.com/robogeek/csv2db/blob/master/csv2mysql.js ... a nearly identical script, for importing into SQLITE3 is at https://gitlab.com/robogeek/csv2db/blob/master/csv2sqlite3.js

'use strict';

const parse      = require('csv-parse');
const util       = require('util');
const fs         = require('fs');
const path       = require('path');
const mysql      = require('mysql');
const async      = require('async');
const csvHeaders = require('csv-headers');
const leftpad    = require('leftpad');

const csvfn = process.argv[2];
const dbnm  = process.argv[3];
const tblnm = process.argv[4];

Here we're getting command line arguments - the filename for the CSV, the database name, and the name of the table to create.

new Promise((resolve, reject) => {
    csvHeaders({
        file      : csvfn,
        delimiter : ','
    }, function(err, headers) {
        if (err) reject(err);
        else resolve({ headers });
    });
})

The script is structured as a Promise chain, allowing us to string together several asynchronous operations. In this first step we use the csv-headers module to read the first line of the CSV. The script relies on there being a header line in the CSV naming each column.

Another thing is the CSV headers are returned as part of an object. In the subsequent steps we'll call that object "context" and we'll add other data as we go.

.then(context => {
    return new Promise((resolve, reject) => {
        context.db = mysql.createConnection({
            host     : '127.0.0.1',
            user     : 'user-name',
            password : 'pass-word',
            database : dbnm
        });
        context.db.connect((err) => {
            if (err) {
                console.error('error connecting: ' + err.stack);
                reject(err);
            } else {
                resolve(context);
            }
        });
    })
})
.then(context => {
    return new Promise((resolve, reject) => {
        context.db.query(`DROP TABLE IF EXISTS ${tblnm}`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we connect to the MySQL database. And, we delete the table if it exists. The database connection is stored as context.db

.then(context => {
    return new Promise((resolve, reject) => {
        var fields = '';
        var fieldnms = '';
        var qs = '';
        context.headers.forEach(hdr => {
            hdr = hdr.replace(' ', '_');
            if (fields !== '') fields += ',';
            if (fieldnms !== '') fieldnms += ','
            if (qs !== '') qs += ',';
            fields += ` ${hdr} TEXT`;
            fieldnms += ` ${hdr}`;
            qs += ' ?';
        });
        context.qs = qs;
        context.fieldnms = fieldnms;
        // console.log(`about to create CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`);
        context.db.query(`CREATE TABLE IF NOT EXISTS ${tblnm} ( ${fields} )`,
        [ ],
        err => {
            if (err) reject(err);
            else resolve(context);
        })
    });
})

In this step we process the CSV headers and construct a CREATE TABLE command. The field type assigned is TEXT. If a particular column name has spaces in it, the space character is translated to '_' so it works as a MySQL column name.

As we go, a couple other arrays are created which are useful in later steps.

.then(context => {
    return new Promise((resolve, reject) => {
        fs.createReadStream(csvfn).pipe(parse({
            delimiter: ',',
            columns: true,
            relax_column_count: true
        }, (err, data) => {
            if (err) return reject(err);
            async.eachSeries(data, (datum, next) => {
                // console.log(`about to run INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`);
                var d = [];
                try {
                    context.headers.forEach(hdr => {
                        d.push(datum[hdr]);
                    });
                } catch (e) {
                    console.error(e.stack);
                }
                // console.log(`${d.length}: ${util.inspect(d)}`);
                if (d.length > 0) {
                    context.db.query(`INSERT INTO ${tblnm} ( ${context.fieldnms} ) VALUES ( ${context.qs} )`, d,
                    err => {
                        if (err) { console.error(err); next(err); }
                        else setTimeout(() => { next(); });
                    });
                } else { console.log(`empty row ${util.inspect(datum)} ${util.inspect(d)}`); next(); }
            },
            err => {
                if (err) reject(err);
                else resolve(context);
            });
        }));
    });
})
.then(context => { context.db.end(); })
.catch(err => { console.error(err.stack); });

Now we use the csv-parse module to read the CSV and then do MySQL commands to add each row to the database. This will read the entire CSV into memory before getting to the stage of adding rows to the database.

On each row, we construct an Array containing the value for each column, because that's what is required by MySQL's query function. The field names and ?'s required for the INSERT INTO command were constructed in the previous step. In this step we simply have to ensure creating the value array in the same order and number of entries.

What that means is - with INSERT INTO we list field names for which to insert values, and we give a matching list of ?'s as place-holders for the inserted values. Then we need a matching array of values to insert. That's how the MySQL query function works, and plenty of other SQL database drivers work the same way. That means the CSV must have the same number of columns on each row.