Rails csv import with associations
I am building a rails application that associates posts with many different categories. For example, I have a Post and need to be able to assign it to the categories Sports, News and Science through csv import via rake task.
My question is how can I import an array of multiple category_ids into my Post model? I have it working where I can manually create a new Post and assign multiple categories to the post, but I am confused as to how to complete this through csv. I need help figuring out the best way to accomplish this.
Here is what I have so far:
Schema
create_table "posts", force: :cascade do |t|
t.string "name"
t.text "description"
end
create_table "styles", force: :cascade do |t|
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.integer "post_id"
t.integer "category_id"
end
create_table "categories", force: :cascade do |t|
t.string "name"
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
Post.rb
class Post < ApplicationRecord
has_many :styles
has_many :categories, through: :styles
end
Category.rb
class Category < ApplicationRecord
has_many :styles
has_many :posts, through: :styles
end
Style.rb
class Style < ApplicationRecord
belongs_to :post
belongs_to :category
end
Rake Task
require 'csv'
require 'open-uri'
namespace :post_import do
desc "Import posts daily"
task posts: :environment do
filename = File.join Rails.root, "posts.csv"
counter = 0
CSV.foreach(filename) do |row|
name, category, description = row
post = Post.create(name: name, category_ids: category, description: description)
counter += 1 if post.persisted?
end
puts "Imported #{counter} [posts]"
end
end
Sup Ryan! 👍 for formatting the markdown perfectly.
First off you'll need a join table to connect the two models. PostCategory can belongs_to :post
and belongs_to :category
and give you the ability for your post to has_many :categories, through: :post_categories
. That will give you the ability to associate multiple categories to a post and reuse the categories on multiple posts.
The next question, of course, is how does your CSV format the categories in it? You'll basically just want to split those up, and then do a Category.find() on them to grab the categories and you can say something like post.categories << Category.find_by(name: category_name)
Thanks Chris, I have the join table created though my naming convention is not the most logical - I've called it styles rather than PostCategories.
My csv formats the categories by ID. For example a row would look like "1,2,3" with 1, 2, 3 representing the category_ids of three unique categories. Can you give me an example of how to split them up and "do a Category.find() on them to grab the categories"?
Here is what I have in my rake task:
require 'csv'
require 'open-uri'
namespace :post_import do
desc "Import posts daily"
task posts: :environment do
filename = File.join Rails.root, "posts.csv"
counter = 0
CSV.foreach(filename) do |row|
name, category, description = row
post = Post.create(name: name, description: description)
post.styles.build(category_id: category)
counter += 1 if post.persisted?
end
puts "Imported #{counter} [posts]"
end
end
This works, but only for the first category in the row. Categories 2 and 3 arent being assigned.
Oh derp. I wasn't paying attention whatsoever. :) My bad!
So if you've got the array of 1, 2, 3
and these match the ID column for your Category records, you can say
category_ids = ids # if this is a comma separated string of ids, you'll need to split(",") it into an array instead
categories = Category.where(id: category_ids)
post.categories = categories
This will automatically create the join records in the Styles table between the two and do it all in one batch because you'd be querying for the entire array of categories and then you can assign them using the association.
Of course, this assumes the categories are already in the database as a caveat.
Another thing to double check with this is to make sure that it's idemopotent, meaning if you import the same csv twice, it doesn't create duplicate categories per post. I'm not sure if assigning the categories like that will or not, but it's worth checking.
I am able to split the ids into an array successfully - but somewhere in
categories = Category.where(id: category_ids)
post.categories = categories
the output becomes quadruplicated and I dont understand why. I am using a csv with only one row of data for testing. Even when I remove multiple category ids and leave a single id, the output of post.categories is [1,1,1,1]. I dont think this is what you mean by idemoptent as this is a single csv imported once.
here is the code that I am using for my rake due to my particular schema:
a_categories = category_ids.split(",")
categories = Style.where(category_id: a_categories)
post.styles = categories
Is there anything here that could cause the quadruplication?
I'd toss a byebug in there and inspect each line to see what's causing it. Make sure a_categories is just an array of one item, then make sure categories is just an array of one Style, and so on.
Also keep in mind that you changed to operate on the Join table where I was showing querying the Category table instead. The reason for that was the query can verify those Category records exist.
Your code works differently than my example and needs to change a little if you do want to skip the verification that the Categories exist like mine did.
It's okay to do that, but instead of doing a where query, you would want to just do something like this:
a_categories.each do |category_id|
post.styles.where(category_id: category_id).first_or_create
end
Note that this does not verify if a category ID is correct. If you put in say, category ID of 99999 and there was no Category record with that ID, you'll be inserting invalid data into the db. This might be fine, or it may be something you don't want, so I'd be careful with that.
Thanks so much for the help Chris. I'm going to spend some time reviewing all of this but I was able to get it to work with the final code bit you shared.
Here is my current (now working) rake file for those also looking for the solution :
require 'csv'
require 'open-uri'
namespace :post_import do
desc "Import posts daily"
task posts: :environment do
filename = File.join Rails.root, "posts.csv"
counter = 0
CSV.foreach(filename) do |row|
name, category_ids, description = row
post = Post.new(name: name, description: description) if post == nil
post.save
#separate string of category ids into array
a_categories = category_ids.split(",")
a_categories.each do |category_id|
post.styles.where(category_id: category_id).first_or_create
end
counter += 1 if post.persisted?
end
puts "Imported #{counter} [posts]
end
end