All threads / Rails csv import with associations

Ask A Question

Notifications

You’re not receiving notifications from this thread.

Rails csv import with associations

Ryan Blakely asked in Rails

I am building a rails application that associates posts with many different categories. For example, I have a Post and need to be able to assign it to the categories Sports, News and Science through csv import via rake task.

My question is how can I import an array of multiple category_ids into my Post model? I have it working where I can manually create a new Post and assign multiple categories to the post, but I am confused as to how to complete this through csv. I need help figuring out the best way to accomplish this.

Here is what I have so far:

Schema

  create_table "posts", force: :cascade do |t|
    t.string   "name"
    t.text     "description"
  end

  create_table "styles", force: :cascade do |t|
    t.datetime "created_at",  null: false
    t.datetime "updated_at",  null: false
    t.integer  "post_id"
    t.integer  "category_id"
  end

  create_table "categories", force: :cascade do |t|
    t.string   "name"
    t.datetime "created_at",        null: false
    t.datetime "updated_at",        null: false
  end

Post.rb

class Post < ApplicationRecord  
  has_many :styles
  has_many :categories, through: :styles
end

Category.rb

class Category < ApplicationRecord
  has_many :styles
  has_many :posts, through: :styles
end

Style.rb

class Style < ApplicationRecord
  belongs_to :post
  belongs_to :category
end

Rake Task

require 'csv'
require 'open-uri'
namespace :post_import do
  desc "Import posts daily"
  task posts: :environment do
    filename = File.join Rails.root, "posts.csv"
    counter = 0

    CSV.foreach(filename) do |row|
      name, category, description = row
      post = Post.create(name: name, category_ids: category, description: description)
      counter += 1 if post.persisted?
    end

    puts "Imported #{counter} [posts]"
  end
end

Sup Ryan! 👍 for formatting the markdown perfectly.

First off you'll need a join table to connect the two models. PostCategory can belongs_to :post and belongs_to :category and give you the ability for your post to has_many :categories, through: :post_categories. That will give you the ability to associate multiple categories to a post and reuse the categories on multiple posts.

The next question, of course, is how does your CSV format the categories in it? You'll basically just want to split those up, and then do a Category.find() on them to grab the categories and you can say something like post.categories << Category.find_by(name: category_name)

Thanks Chris, I have the join table created though my naming convention is not the most logical - I've called it styles rather than PostCategories.

My csv formats the categories by ID. For example a row would look like "1,2,3" with 1, 2, 3 representing the category_ids of three unique categories. Can you give me an example of how to split them up and "do a Category.find() on them to grab the categories"?

Here is what I have in my rake task:

require 'csv'
require 'open-uri'
namespace :post_import do
  desc "Import posts daily"
  task posts: :environment do
    filename = File.join Rails.root, "posts.csv"
    counter = 0

    CSV.foreach(filename) do |row|
      name, category, description = row
      post = Post.create(name: name, description: description)
      post.styles.build(category_id: category)

      counter += 1 if post.persisted?
    end

    puts "Imported #{counter} [posts]"
  end
end

This works, but only for the first category in the row. Categories 2 and 3 arent being assigned.

Oh derp. I wasn't paying attention whatsoever. :) My bad!

So if you've got the array of 1, 2, 3 and these match the ID column for your Category records, you can say

category_ids = ids # if this is a comma separated string of ids, you'll need to split(",") it into an array instead
categories = Category.where(id: category_ids)
post.categories = categories

This will automatically create the join records in the Styles table between the two and do it all in one batch because you'd be querying for the entire array of categories and then you can assign them using the association.

Of course, this assumes the categories are already in the database as a caveat.

Another thing to double check with this is to make sure that it's idemopotent, meaning if you import the same csv twice, it doesn't create duplicate categories per post. I'm not sure if assigning the categories like that will or not, but it's worth checking.

I am able to split the ids into an array successfully - but somewhere in

categories = Category.where(id: category_ids)
post.categories = categories

the output becomes quadruplicated and I dont understand why. I am using a csv with only one row of data for testing. Even when I remove multiple category ids and leave a single id, the output of post.categories is [1,1,1,1]. I dont think this is what you mean by idemoptent as this is a single csv imported once.

here is the code that I am using for my rake due to my particular schema:

a_categories = category_ids.split(",")
categories = Style.where(category_id: a_categories)
post.styles = categories

Is there anything here that could cause the quadruplication?

I'd toss a byebug in there and inspect each line to see what's causing it. Make sure a_categories is just an array of one item, then make sure categories is just an array of one Style, and so on.

Also keep in mind that you changed to operate on the Join table where I was showing querying the Category table instead. The reason for that was the query can verify those Category records exist.

Your code works differently than my example and needs to change a little if you do want to skip the verification that the Categories exist like mine did.

It's okay to do that, but instead of doing a where query, you would want to just do something like this:

a_categories.each do |category_id|
  post.styles.where(category_id: category_id).first_or_create
end

Note that this does not verify if a category ID is correct. If you put in say, category ID of 99999 and there was no Category record with that ID, you'll be inserting invalid data into the db. This might be fine, or it may be something you don't want, so I'd be careful with that.

Thanks so much for the help Chris. I'm going to spend some time reviewing all of this but I was able to get it to work with the final code bit you shared.

Here is my current (now working) rake file for those also looking for the solution :

require 'csv'
require 'open-uri'
namespace :post_import do
  desc "Import posts daily"
  task posts: :environment do
    filename = File.join Rails.root, "posts.csv"
    counter = 0

    CSV.foreach(filename) do |row|
      name, category_ids, description = row
      post = Post.new(name: name, description: description) if post == nil
      post.save 

      #separate string of category ids into array
      a_categories = category_ids.split(",")

      a_categories.each do |category_id|
        post.styles.where(category_id: category_id).first_or_create
      end

      counter += 1 if post.persisted?
    end

    puts "Imported #{counter} [posts]
  end
end
Join the discussion

Want to stay up-to-date with Ruby on Rails?

Join 33,665+ developers who get early access to new tutorials, screencasts, articles, and more.

    We care about the protection of your data. Read our Privacy Policy.

    logo Created with Sketch.

    Ruby on Rails tutorials, guides, and screencasts for web developers learning Ruby, Rails, Javascript, Turbolinks, Stimulus.js, Vue.js, and more. Icons by Icons8

    © 2020 GoRails, LLC. All rights reserved.