Skip to main content

26 Introduction to Importing from CSV

Episode 43 · February 20, 2015

We take a look at the basics of using Ruby's CSV library to import data into your Rails application with a Rake task

CSV


Transcripts

In this episode we're going to talk about parsing CSV so that you can import data into your application. We're going to do this using a rake task because it's fantastic for doing this in development, and we can also use the rake task as a cron job if we need it in the future, so we can have this available in case that you need to do things on a schedule like importing from some external source every night. We'll have the CSV, we're going to use the CSV library here, and if you look at the ruby docs, you can see that there's various different methods for reading a CSV file, the easiest one is to use foreach, and this will open the CSV file for you, read it in, and then generate a bunch of rows for you. This is really fantastic, but you can also use it to parse, if you already have the content, or the user paste in the content of this CSV into a form field; You could just use CSV.parse to do that, it knows how to split new lines and handling codings, but you have to specify encoding, so that's maybe something that you will run into if you use something like Excel to export a CSV. We're actually just going to create a CSV manually in our text editor, and these pipes here are actually just commas, and Vim likes to separate them out with a prettier syntax. We can replace that with a comma, and it looks exactly the same, so if you type a comma here and then put stuff after it and go back to visual mode, or escape out of insert mode, it goes back to a pipe, but it's really a comma. It's a little confusing, but bear with me, we've got a users.csv file that I'm just going to create in the root our our Gorails forum, and we're going to use this to import users, so imagine we had an old forum that we're migrating from, you could export the data, and then import it using this CSV. This is going to be a rake task that we're going to build, and let's get started.

Of course, the thing we can do now is to create that rake task and we can do that by going into the lib/tasks folder and creating a new file in there. I'm going to create one called

lib/tasks/import.rake

We don't have access to the csv library by default, and we can run be rake import:users and we can see that it's going to fail, and the reason why it fails is because there's an uninitialized constant CSV. This is because we haven't required the CSV library, so here at the top, or anytime you have a file that references CSV, at the top you can simply say require 'csv', since it'snot a gem and it's not auto imported in rails, you'll have to do this at least somewhere where you CSV. If you don't want it in here, you can also open up application.rb and put it right after your require 'rails/all'. As a side note, this is where bundler requires all of the gems from your gemfile. We can put it here if we're going to use this potentially in more places in our application, so maybe we don't want to only import user's, and maybe we want to import other things like forum threads because this is our forum, so maybe we want an export and an import from some other forum system that we're migrating from. CSV might be useful if you want to use this throughout your rails application. In my case, I'm going to just paste it here in the rake task because we're only going to use CSV in here, and we'll talk about it moving this out at the end of this episode.

Now that we have CSV, if we go back to Chrome, we can see that we want this for each method that accepts a file name, and we can change this file name to match the users.csv that we just created in the root of our rails app. It's important here to note that this is a relative file name, so this will be a file in the same folder that you ran the rake command in. If you put this in the root, then you can do that, otherwise you might need to do a file.join with the rails.root. If you have not used that before, you can type Rails.root and it gives you the path name for that, and then you can say file.join Rails.root, "users.csv" This will generate the path for you, so it's pretty nifty if you need to do something like that. You can replace that simply with

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      p row
    end
  end
end 

be rake import:users

This should print out the file contents in ruby format, so we've got an array back that has a three item array, and each of those were from the columns in our CSV file, so it knows how to parse that, and actually, if we open up users.csv, we can add a new user in here. So we'll have an example user in here, and if we run this now, we can see that there should be two arrays that we get in the output, and we do. This is parsing everything just how we want it, and we can pass these things in very easily to our user model. There's various ways that you can do this. If you have a CSV without headers, then this is going to give you those columns back and it's not going to know their names, so here we can just say:

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      email, fiest, last = row
      User.create(email: email, first_name: first, last_name: last)
    end
  end
end

That would go and create our user, and then we're going to not use the bang here because we want this to silently fail, and try to create the users if they don't already exist, that will fail if the user is already taken, which is fine because they're already in our system, so they've already been imported. Here, we can just run that, and we'll get the output that we expect.

Now it's finished, but there's no output, and that's actually exactly what you want, so this User.create, we don't print out the results of this, we're actually creating the users and they're trying to get created, maybe they already exist and they're failing. There's no output of this, and you could print this out if you want, or you could add like a counter variable here and start with zero, and then you could say user equals that, and then counter += 1 if user.persisted, so you could check to see if the user is saved to the database correctly, and then add one to the counter, or you could just do that here in one line, but it's not going to be as readable to do that in one line, so that's how I like to do it. We can then print that out and say here at the bottom we can say:

puts "Imported #{counter} users"

Now if we run this, we'll probably get imported zero users because those two users should already exist in our system. One addition you might like to make here is to print out the email address and the user's full message errors if you've got any errors failing for that record, so you might want to say here print out the email and then say

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import useers from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    CSV.foreach(filename) do |raw|
      email, fiest, last = row
      User.create(email: email, first_name: first, last_name: last)
      puts "#{email} - #{user.errors.full_messages.join(",")}" if user.errors.any?
      counter += 1 if user.persisted?
    end

    puts "Imported #{counter} users"

  end
end

That will print out the errors on one line. Now when we run this in the console, if there's any validation errors, then we'll get a message in our application, and we can easily track it back to the email and see which errors are on that record. If you, like I did, the user's csv does not have a password so you couldn't import these to devise without skipping validations. That may be something you want to do, and so you can use the create here to skip the validations or whatever you want.

Sometimes you have a CSV that has headers, and maybe it says email and first name and last name, and then this row is a special row and you don't want to import that one, we'll CSV foreach has I believe the other methods due to of course, have a headers: true option, and this gets a little bit tricky, so you have to pass in these headers as true into the csv foreach, so if you're having users pass this in, you need to make sure that they know that it's required or you need to have them check a box and say: Yes, there are headers on this file, or you need to figure out if you can auto detect that somehow, so this can definitely become a complicated feature, but once you do that, then you can run rake import:users and we'll just comment out the code that we have here currently and we'll print out the row and we'll p row["Email"], and we'll see that we'll be able to see that the row has parsed based upon those keys that were the headers, and then the values become the item on that same column, so we have the first name column and the value is example, and the last name column and the value is user. This is a way for you to do basically a hash out of those values based upon the name in the header.

This let's us of course get rid of the assignment of these variables and then we can say:

lib/tasks/import.rake

require 'csv'

namespace :import do 
  desc "Import users from csv"

  task users: :environment do 
    filename = File.joing Rails.root, "users.csv"
    counter = 0

    CSV.foreach(filename, headers: true) do |raw|
      p row 
      p row["Email"]
      #email, fiest, last = row
      user = User.create(email: row["Email"], first_name: first, last_name: last)
      puts "#{email} - #{user.errors.full_messages.join(",")}" if user.errors.any?
      counter += 1 if user.persisted?
    end

    puts "Imported #{counter} users"

  end
end

As you can see, the CSV import is quite an interesting beast, there's lots that you have to think about, and it's very imperative code, like it runs one by one by one by one and so on, and it's not very clean to read, it ends up being long files, and it's kind of the nature of what a lot of this ends up being the first time that you implement it. Now, hopefully in the future we'll be able to cover how to refactor this and make it a lot more robust, but I think what we'll do is try to handle thinks like xls documents instead of just CSVs, so there's excel spreadsheets and things like that that you probably want to figure out, how can you support. Maybe you'll take the MailChimp approach and you don't accept any of those things, and you just have the user paste in the document and to the browser, there's various ways that you can do it, and I think they support both, but it's very cool that MailChimp allows you to just copy and paste all of this from your CSV and paste it into a text field in your browser and then that handles it, so when you get down to using headers and all of that stuff, there's a lot of small bits of code that you have to write to detect if there are headers or not, or ask the user, or just tell the user that they're required, and if it looks like an email or whatever, then like you have to complain to them, and importing CSVs is actually a very complicated feature as you might now be aware. It seems very obvious, and that's kind of the beauty of it, that it appears to be simple, and to do it right, it is a lot of work to make it that simple.

CSVs are definitely a fascinating topic, and something that you almost have to build custom for every application, which is interesting, but I hope this helped you understand how you would import from a file locally, this is a great starting point for testing, so you can create and edit your CSV locally, you can import it, just run one rake task, you can test things out, you can delete your database and try it again and see how all of that goes, and then you can make your adjustments from there. Then you can start taking this code and moving it around into your user model or whatever that you want to do so that you can integrate it so that the users can use it, and we'll cover more of that in the future.

Discussion


Gravatar
David Zhu on

thanks


Gravatar
Mike Goggin on

I had to come up with something similar at work a few years ago.

We have clients that need to mass create auction lots and send in Excel spreadsheets full of lot data. We needed a way to easily throw the XLS converted to CSV at the app and have it create the Lot objects. Supports blacklisting of columns that can be assigned from the CSV and typecasting column data.

This code is almost 3 years old and needs a serious refactor, but we don't use it enough to warrant sinking more time in to it right now.

https://gist.github.com/mgo...


Gravatar
Alex Villa (10 XP) on

(Using the end of the video to reference line numbers)

Why did you choose to use `user = User.create(...params...)` on L14, followed by `counter+=1 if user.persisted?` on L16 rather than `user = User.new(...params...)` and `counter+=1 if user.save`. I don't necessarily feel one is better than the other but I've always tended towards the latter. Is there an advantage one way or the other?

Gravatar
Chris Oliver (169,610 XP) on

Great question. Either way works just the same in this case.

The only difference is that you have to worry about where the .save happens if you use that. With create, it doesn't really matter what order the next code is in. If you did save, you'd have to make sure it came first. Also, where the "save" happens is less clear if you have it on the right side of a line that has an "if" statement tacked on it.

In the end it doesn't make much difference at all because this code is very rough anyways. Ideally this gets refactored so that responsibilities are separated in a much cleaner fashion. More on that in a future episode though.


Gravatar
Mark Hoad (10 XP) on

I'd love to see some information about how to generate some more robust dummy data as well perhaps using something like Faker (https://github.com/stympy/f...


Gravatar
Routine Tracker on

Hey @excid3:disqus. Has your caching just not caught up with the typo in the title? Seems like it should be "Introduction to Importing from CSV".

Gravatar
Chris Oliver (169,610 XP) on

Ha! Thanks for catching that. I must have been tired.


Gravatar
Andrew Cockerham on

FYI The video doesn't show up in the browser so I had to download it...Chrome on Mac

Gravatar
Chris Oliver (169,610 XP) on

Ah thanks for the heads up. That was my mistake.


Gravatar
Brent C (780 XP) on

Thanks for a great tutorial on CSV's it's just what I needed. As an aside, I was recently going through my gems and noticed FasterCSV as a dependency in a couple of them. Have you had a chance to play with it much?

Gravatar
Chris Oliver (169,610 XP) on

I believe FasterCSV got rolled into Ruby 2.x at some point so you don't need to use it anymore.

Gravatar
Brent C (780 XP) on

Fantastic! thanks :)


Gravatar
Kohl Kohlbrenner on

@excid3:disqus when you do email,first,last = row , how is each variable assigned? On each comma in the csv row?

Gravatar
Chris Oliver (169,610 XP) on

Yeah, it's the items in the row in order. I just used a Ruby trick to split the array of items into separate variables.

irb(main):001:0> a, b, c = ["a", "b", "c"]
=> ["a", "b", "c"]
irb(main):002:0> a
=> "a"
irb(main):003:0> b
=> "b"
irb(main):004:0> c
=> "c"

Gravatar
Jerome . (100 XP) on

A big difficulty is gathering the errors and viewing them in the resulting rails view...

Gravatar
Chris Oliver (169,610 XP) on

Definitely. This is where form objects tend to make a lot more sense because it's that much more complicated. An intermediate step of uploading, processing, and finding errors before doing the actual import can be a good thing for this too.

Gravatar
Jerome . (100 XP) on

my thinking is that the end-user, not the developer is going to hit these errors and need to act upon them. So it is a top-level priority. A couple of instance variables under model's `def self.import(file) rescue @counter += 1 @check_me << row end` should gather the data, but controller `def import if @counter > 0 redirect_to chosen_modes_path, notice: @check_me ` hits a

nil:NilClass error. I guess, passing from ruby library to rails is what eludes me.

Gravatar
Chris Oliver (169,610 XP) on

I believe I covered a bit more of this in a following episode, but you're absolutely right. This is where a form object of some sort makes a lot of sense. You can have it wrapping these things and then translate it to errors in the UI. This being a rake task makes that hard, but a controller variable for a form object would help that.


Gravatar
alexander kehaya (900 XP) on

Hey Chris, i'm setting up this feature for my app now. I've used the commentable episode to help me structure the app so that users can create a category and then add a startup(s) to that category. Essentially, startups in my app are startupables I'd like to be able to upload a csv of startups for a particular category. Any ideas?


Gravatar
Sophie Zheng on

I wrote a few rake tasks to import csv but i wonder if it is possible to write tests to make sure whether or not the data point in the right table and right cell...


Gravatar
milley on

Hey Chris,
I have added the functionality to import CSV files inside model. I want to rollback the transaction if any of the value is invalid in CSV file. How can i achieve that?

Gravatar
Chris Oliver (169,610 XP) on

You should be able to just wrap it in a transaction block like so: http://api.rubyonrails.org/...

Gravatar
milley on

Thanks for the link,I am having one more doubt the transaction block should start before the foreach loop.I have referred the code snippet from the link:https://richonrails.com/art...

Thanks,

Gravatar
Chris Oliver (169,610 XP) on

If you want the entire thing to roll back on any failure, you can put the transaction around the foreach.

Gravatar
milley on

Thanks Chris, It worked for me


Gravatar
James Spolsdoff (20 XP) on

Hey Chris,

Just joined your site. Thanks for all the great videos. Was working through importing a CSV and I had a quick question regarding removing headers from the file.

In my application I had to use the following code for the loop:

CSV.foreach(filename, "r:ISO-8859-1") do |row|

When I tried adding the argument to ignore the header it didn't work. Is there something I'm missing?

Gravatar
Chris Oliver (169,610 XP) on

The second argument there needs to be hash I'm pretty sure. You're passing in a string, so that's probably going to mismatch the arguments it expects. Try changing that to hash and that should fix it for you!

Gravatar
James Spolsdoff (20 XP) on

Awesome, thanks for the quick reply.

Just changed it following your directions and it works great.

CSV.foreach(filename, encoding: "ISO-8859-1", headers: true) do |row|


Gravatar
sigumagwa on

This is a savior. Thanks a lot


Gravatar
Melanie (1,520 XP) on

Hi Chris,

I'm stuck trying to import records from an excel spreadsheet.

Is there a way to convert my xls file into a csv file so I can follow along with your tutorial?

If not, please can you recommend a source of an excel importing tutorial.

In addition to trying to adapt your tutorial, I have tried each of these and can't get any of them to work:

http://www.mattmorgante.com...
https://richonrails.com/art...
https://deepakrip007.wordpr...

The problem seems to be one that other users here have encountered with their attempts. All of my attempts at solving this are outlined here: http://stackoverflow.com/qu...

Thanks
Mel

Gravatar
Melanie (1,520 XP) on

I have now found the roo, creek, spreadsheet and dullard gems. They talk about needing to convert xls files (which I have) to xlsx format. It all looks confusing. Maybe you could add another video to show how to change xls into csv so that these tutorials can be used to import data or you could do an overview of whichever of these gems is most suitable for importing data from xls files. I'm about to start trying with each of them now.


Gravatar
Melanie (1,520 XP) on

Hi Chris, I think I might have converted my xls file to a csv file and saved it to my root file. I've tried to follow this tutorial but keep getting errors when I try to run the rake command in the console. When you use the line be rake import:users, is the reference to 'users' a reference to the name of the model, the name of the task or something else? My task is called 'for_codes', my model is called Randd::Field. I've tried both in the console but keep getting errors that say: "NameError: undefined local variable or method `for_codes' for main:Object"

I've asked here to see if anyone else can see what's wrong: http://stackoverflow.com/qu...


Gravatar
Melanie (1,520 XP) on

Is anyone getting this error when trying to rake the task?

rake aborted!

ArgumentError: invalid byte sequence in UTF-8


Gravatar
Melanie (1,520 XP) on

So - it seems that saving my .xls file with a .csv extension inside my rails app (root level) didn't work to make the file eligible to be treated as a CSV file. I've now tried copying and pasting the content of my 2 column excel file into the .csv file saved in my rails app.

That pasted several hundred lines but without the | separating the cell content.

This is the first 3 lines (the first one is the header).

Is there a way to insert the pipes without having to go line by line to paste them in?

anz_referencetitle

010104pCombinatorics and Discrete Mathematics (excl. Physical Combinatorics)

010107pMathematical Logic, Set Theory, Lattices and Universal Algebra


Gravatar
Melanie (1,520 XP) on

Can anyone make sense of these instructions for selecting the first column of a long list of text? https://atom.io/packages/su...

I'm trying to manually add the "|" to separate the first and second column of the content of my xls file which is now pasted into a .csv file in my rails app. there are 1000s of lines.

I can't understand what it's telling me to do so that I can select the first column and then press paste to separate it from the second bit of content with the "|"


Gravatar
Melanie (1,520 XP) on

So - it seems that saving my .xls file with a .csv extension inside my rails app (root level) didn't work to make the file eligible to be treated as a CSV file. I've now tried copying and pasting the content of my 2 column excel file into the .csv file saved in my rails app.

That pasted several hundred lines but without the | separating the cell content.

This is the first 3 lines (the first one is the header).

Is there a way to insert the pipes without having to go line by line to paste them in?

anz_referencetitle

010104pCombinatorics and Discrete Mathematics (excl. Physical Combinatorics)

010107pMathematical Logic, Set Theory, Lattices and Universal Algebra


Gravatar
Melanie (1,520 XP) on

Actually - I tried the steps I described below with just a few lines, that I amended manually to insert a pipe between cell values.

When i try to rake the task I get an error that says:

bundle exec rake import:randd_fields --trace

** Invoke import:randd_fields (first_time)

** Invoke environment (first_time)

** Execute environment

** Execute import:randd_fields

nil

rake aborted!

NameError: undefined local variable or method `title' for main:Object

Has anyone been able to get this tutorial to work?


Gravatar
victor hazbun (350 XP) on

my client thought it was a simple feature, but I told him from the beginning it was not, now you confirm the fact. Thank you.


Gravatar
Melanie (1,520 XP) on

I am really stuck in trying to follow along with this. I moved my xls data into a google drive sheet and downloaded that as a csv saved in my root directory, but it still isn't working. Are there any other references to different tutorials that i can try. I tried those published by RichonRails and Matt Morgante but those don't work either. I tried the roo gem with the 2012 rails cast but that is out of date and I cant get that configured to work. Please can you go deeper into this so that the issue I keep running into is covered in your process. Thank you


Gravatar
kingsley chukwuma (10 XP) on

Hello Chris.
I tried to run this code but i am getting this error, can you please help me.

https://uploads.disquscdn.c... https://uploads.disquscdn.c...


Gravatar
Olaoluwa Afolabi (10 XP) on
So @Chris how to you write Test for this solution?

Login or create an account to join the conversation.