Skip to main content
Import and Export CSV Files:

Introduction to Importing from CSV

29

Episode 43 · February 20, 2015

We take a look at the basics of using Ruby's CSV library to import data into your Rails application with a Rake task

CSV


Transcripts

In this episode we're going to talk about parsing CSV so that you can import data into your application. We're going to do this using a rake task because it's fantastic for doing this in development, and we can also use the rake task as a cron job if we need it in the future, so we can have this available in case that you need to do things on a schedule like importing from some external source every night. We'll have the CSV, we're going to use the CSV library here, and if you look at the ruby docs, you can see that there's various different methods for reading a CSV file, the easiest one is to use foreach, and this will open the CSV file for you, read it in, and then generate a bunch of rows for you. This is really fantastic, but you can also use it to parse, if you already have the content, or the user paste in the content of this CSV into a form field; You could just use CSV.parse to do that, it knows how to split new lines and handling codings, but you have to specify encoding, so that's maybe something that you will run into if you use something like Excel to export a CSV. We're actually just going to create a CSV manually in our text editor, and these pipes here are actually just commas, and Vim likes to separate them out with a prettier syntax. We can replace that with a comma, and it looks exactly the same, so if you type a comma here and then put stuff after it and go back to visual mode, or escape out of insert mode, it goes back to a pipe, but it's really a comma. It's a little confusing, but bear with me, we've got a users.csv file that I'm just going to create in the root our our Gorails forum, and we're going to use this to import users, so imagine we had an old forum that we're migrating from, you could export the data, and then import it using this CSV. This is going to be a rake task that we're going to build, and let's get started.

Of course, the thing we can do now is to create that rake task and we can do that by going into the lib/tasks folder and creating a new file in there. I'm going to create one called

lib/tasks/import.rake

We don't have access to the csv library by default, and we can run be rake import:users and we can see that it's going to fail, and the reason why it fails is because there's an uninitialized constant CSV. This is because we haven't required the CSV library, so here at the top, or anytime you have a file that references CSV, at the top you can simply say require 'csv', since it'snot a gem and it's not auto imported in rails, you'll have to do this at least somewhere where you CSV. If you don't want it in here, you can also open up application.rb and put it right after your require 'rails/all'. As a side note, this is where bundler requires all of the gems from your gemfile. We can put it here if we're going to use this potentially in more places in our application, so maybe we don't want to only import user's, and maybe we want to import other things like forum threads because this is our forum, so maybe we want an export and an import from some other forum system that we're migrating from. CSV might be useful if you want to use this throughout your rails application. In my case, I'm going to just paste it here in the rake task because we're only going to use CSV in here, and we'll talk about it moving this out at the end of this episode.

Now that we have CSV, if we go back to Chrome, we can see that we want this for each method that accepts a file name, and we can change this file name to match the users.csv that we just created in the root of our rails app. It's important here to note that this is a relative file name, so this will be a file in the same folder that you ran the rake command in. If you put this in the root, then you can do that, otherwise you might need to do a file.join with the rails.root. If you have not used that before, you can type Rails.root and it gives you the path name for that, and then you can say file.join Rails.root, "users.csv" This will generate the path for you, so it's pretty nifty if you need to do something like that. You can replace that simply with

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      p row
    end
  end
end 

be rake import:users

This should print out the file contents in ruby format, so we've got an array back that has a three item array, and each of those were from the columns in our CSV file, so it knows how to parse that, and actually, if we open up users.csv, we can add a new user in here. So we'll have an example user in here, and if we run this now, we can see that there should be two arrays that we get in the output, and we do. This is parsing everything just how we want it, and we can pass these things in very easily to our user model. There's various ways that you can do this. If you have a CSV without headers, then this is going to give you those columns back and it's not going to know their names, so here we can just say:

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      email, fiest, last = row
      User.create(email: email, first_name: first, last_name: last)
    end
  end
end

That would go and create our user, and then we're going to not use the bang here because we want this to silently fail, and try to create the users if they don't already exist, that will fail if the user is already taken, which is fine because they're already in our system, so they've already been imported. Here, we can just run that, and we'll get the output that we expect.

Now it's finished, but there's no output, and that's actually exactly what you want, so this User.create, we don't print out the results of this, we're actually creating the users and they're trying to get created, maybe they already exist and they're failing. There's no output of this, and you could print this out if you want, or you could add like a counter variable here and start with zero, and then you could say user equals that, and then counter += 1 if user.persisted, so you could check to see if the user is saved to the database correctly, and then add one to the counter, or you could just do that here in one line, but it's not going to be as readable to do that in one line, so that's how I like to do it. We can then print that out and say here at the bottom we can say:

puts "Imported #{counter} users"

Now if we run this, we'll probably get imported zero users because those two users should already exist in our system. One addition you might like to make here is to print out the email address and the user's full message errors if you've got any errors failing for that record, so you might want to say here print out the email and then say

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      email, fiest, last = row
      User.create(email: email, first_name: first, last_name: last)
      puts "#{email} - #{user.errors.full_messages.join(",")}" if user.errors.any?
      counter += 1 if user.persisted?
    end

    puts "Imported #{counter} users"

  end
end

That will print out the errors on one line. Now when we run this in the console, if there's any validation errors, then we'll get a message in our application, and we can easily track it back to the email and see which errors are on that record. If you, like I did, the user's csv does not have a password so you couldn't import these to devise without skipping validations. That may be something you want to do, and so you can use the create here to skip the validations or whatever you want.

Sometimes you have a CSV that has headers, and maybe it says email and first name and last name, and then this row is a special row and you don't want to import that one, we'll CSV foreach has I believe the other methods due to of course, have a headers: true option, and this gets a little bit tricky, so you have to pass in these headers as true into the csv foreach, so if you're having users pass this in, you need to make sure that they know that it's required or you need to have them check a box and say: Yes, there are headers on this file, or you need to figure out if you can auto detect that somehow, so this can definitely become a complicated feature, but once you do that, then you can run rake import:users and we'll just comment out the code that we have here currently and we'll print out the row and we'll p row["Email"], and we'll see that we'll be able to see that the row has parsed based upon those keys that were the headers, and then the values become the item on that same column, so we have the first name column and the value is example, and the last name column and the value is user. This is a way for you to do basically a hash out of those values based upon the name in the header.

This let's us of course get rid of the assignment of these variables and then we can say:

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import users from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    counter = 0

    CSV.foreach(filename, headers: true) do |raw|
      p row 
      p row["Email"]
      #email, fiest, last = row
      user = User.create(email: row["Email"], first_name: first, last_name: last)
      puts "#{email} - #{user.errors.full_messages.join(",")}" if user.errors.any?
      counter += 1 if user.persisted?
    end

    puts "Imported #{counter} users"

  end
end

As you can see, the CSV import is quite an interesting beast, there's lots that you have to think about, and it's very imperative code, like it runs one by one by one by one and so on, and it's not very clean to read, it ends up being long files, and it's kind of the nature of what a lot of this ends up being the first time that you implement it. Now, hopefully in the future we'll be able to cover how to refactor this and make it a lot more robust, but I think what we'll do is try to handle thinks like xls documents instead of just CSVs, so there's excel spreadsheets and things like that that you probably want to figure out, how can you support. Maybe you'll take the MailChimp approach and you don't accept any of those things, and you just have the user paste in the document and to the browser, there's various ways that you can do it, and I think they support both, but it's very cool that MailChimp allows you to just copy and paste all of this from your CSV and paste it into a text field in your browser and then that handles it, so when you get down to using headers and all of that stuff, there's a lot of small bits of code that you have to write to detect if there are headers or not, or ask the user, or just tell the user that they're required, and if it looks like an email or whatever, then like you have to complain to them, and importing CSVs is actually a very complicated feature as you might now be aware. It seems very obvious, and that's kind of the beauty of it, that it appears to be simple, and to do it right, it is a lot of work to make it that simple.

CSVs are definitely a fascinating topic, and something that you almost have to build custom for every application, which is interesting, but I hope this helped you understand how you would import from a file locally, this is a great starting point for testing, so you can create and edit your CSV locally, you can import it, just run one rake task, you can test things out, you can delete your database and try it again and see how all of that goes, and then you can make your adjustments from there. Then you can start taking this code and moving it around into your user model or whatever that you want to do so that you can integrate it so that the users can use it, and we'll cover more of that in the future.

Loading...

Subscribe to the newsletter

Join 18,000+ developers who get early access to new screencasts, articles, guides, updates, and more.

By clicking this button, you agree to the GoRails Terms of Service and Privacy Policy.

More of a social being? We're also on Twitter and YouTube.