CSV Upload Form to Import Records Discussion
Awesome job on explaining imports in a simple way. Some of the most complex chunks of code I have worked on are imports, especially when the data is inconsistent and every edge case has to be accounted for or the data massaged into the correct format. One thing I would suggest for anyone implementing an import like this is a maximum record limit for each import, so a user cannot just shove 50,000 records down the app's throat. That kind of thing can crash an app server really quickly.
That's a really great point. What do you think is a good way to handle that case? Do you let them attempt it, count the lines in the file, and then return an error telling them to contact support if it is over a certain number?
I would usually handle it in one of two ways. Either you state a limit on the upload page, like "you may only upload 500 or 1000 at a time" (typically that's a safe-ish number), so they would need to break their file into multiple smaller files. Or, if they upload a file that is too large, I would send back an error like "Oops, your file appears to exceed the maximum records allowed per upload, please split your file into smaller uploads of 500, or contact support." I would normally double up on this kind of thing and do both.
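A minimal sketch of the server-side half of that check, assuming the upload is already available as a file path. The constant, the error class, and the method name are hypothetical, and the limit of 500 simply mirrors the number mentioned above.

```ruby
require "csv"

# Hypothetical limit; the thread suggests 500 or 1000 as a safe-ish number.
MAX_IMPORT_ROWS = 500

# Hypothetical error class used to signal an oversized upload.
class TooManyRowsError < StandardError; end

# Counts data rows (the header is excluded) while streaming the file,
# and stops reading as soon as the limit is exceeded.
def check_row_limit!(path, limit: MAX_IMPORT_ROWS)
  count = 0
  CSV.foreach(path, headers: true) do |_row|
    count += 1
    next unless count > limit

    raise TooManyRowsError,
          "Oops, your file appears to exceed the maximum of #{limit} records " \
          "allowed per upload. Please split it into smaller files or contact support."
  end
  count
end
```

A controller could rescue `TooManyRowsError` and show its message to the user before any record is created.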
Beyond that I have run into some cases where you need to handle massive data and that requires significant engineering to manage thousands of records flooding in. One of the largest imports I worked on had well over 1.5 million data points to be processed through a single import.
Yeah, it was fun to work on. It taught me a lot about performance and mass data processing.
Hey Jared, how did you go about importing that much data? In the past I have iterated over CSV files, but it takes way too long. At the moment I am using a bash script with iconv, awk and other tools to manipulate the data so that it can be imported into Rails using the PostgreSQL COPY command. It works super fast. Now my biggest issue is dealing with the 150MB files by zipping them first. Does this sound like how you have been dealing with it?
It depends on the technology stack I have available. I have dropped into command line utilities for working with large CSVs from Ruby in the past. They are just so much faster to work with on the raw CLI. The other option is to push the task off to a service written in a language that is much more suited to massive data crunching. It also depends on the servers I have to work with: on sufficiently beefy, compute-heavy servers, Ruby can churn through data at a pretty decent clip.
As long as the client is OK with it, I tend to push this processing into delayed jobs and allow it to run in the background. The larger the file, the less likely I am to make it real time. Most users are fine with a message saying "we are currently processing your request, we will send you an email when it's done," or even just a check-back-later kind of message. That way it's OK if your process takes 10 or 20 minutes to churn through the data.
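Whatever ends up running the import in the background, reading the file in batches keeps memory use bounded. The sketch below is plain Ruby and not tied to any particular job library; the method name and the default batch size are assumptions for illustration.

```ruby
require "csv"

# Streams a CSV and yields its rows in fixed-size batches, so only one
# batch of rows is held in memory at a time.
# The method name and the default batch size are hypothetical.
def each_csv_batch(path, batch_size: 1000)
  # CSV.foreach without a block returns an Enumerator over the rows.
  CSV.foreach(path, headers: true).each_slice(batch_size) do |rows|
    # Convert each CSV::Row into a plain Hash keyed by header name.
    yield rows.map(&:to_h)
  end
end
```

Inside a background job, each yielded batch could then be handed to whatever bulk-insert mechanism the app already uses.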
Enjoying this tutorial, and everything was working fine for me up to about 15 minutes in. But when I should finally be getting the notice "X users uploaded" in the browser after clicking "Upload", I get a `NoMethodError in UsersController#import` with `undefined method 'path' for "csvname.csv":String`. Any thoughts on why this might be happening?
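That message says `import` received a plain String (the file name) where it expected an uploaded file object that responds to `path`. One common cause, offered here as a likely guess and not a confirmed diagnosis, is a form that is not submitted as multipart, in which case the browser sends only the file name. A small guard makes the failure explicit; the method name and the `file` argument are hypothetical stand-ins for whatever the controller reads from params.

```ruby
# Hypothetical guard: `file` stands in for the value the controller
# reads from params (an uploaded file object, or a String if the
# form was not sent as multipart).
def uploaded_file_path(file)
  unless file.respond_to?(:path)
    raise ArgumentError,
          "expected an uploaded file but got #{file.class}; " \
          "check that the form is submitted as multipart"
  end
  file.path
end
```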
Hi Chris, please can you expand this series to show how to upload excel files. I'm struggling to make your process work with an excel file instead of a csv file in my app. I've moved on to trying all the tutorials I linked in this post - and now just have a mess. http://stackoverflow.com/qu...
I seem to be getting the same error as Anderson Evans (below). Please could you do an example with an excel file. Thanks so much, Mel
In a pry session, I was curious to try this out. I've imported a CSV with headers: true and header_converters: :symbol. When I convert the first item to a hash and attempt to use slice, I get a
NoMethodError: undefined method `slice' for #<Hash:0x007fba2f6a6300>
I've used both .to_h and .to_hash on the row item.
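`Hash#slice` has been part of core Ruby only since 2.5; on older versions it comes from ActiveSupport, which a bare pry session does not load. Assuming that is what is happening here, a sketch that works with or without ActiveSupport uses `select` instead. The sample data and column names are made up.

```ruby
require "csv"

# Made-up sample data standing in for the imported file.
data = "name,email,age\nAda,ada@example.com,36\n"

table = CSV.parse(data, headers: true, header_converters: :symbol)
row = table[0].to_h # plain Hash with symbol keys

# Keep only the wanted columns without relying on Hash#slice.
wanted = [:name, :email]
subset = row.select { |key, _value| wanted.include?(key) }
```

Here `subset` holds only the `:name` and `:email` entries. On Ruby 2.5 or later, or with ActiveSupport loaded, `row.slice(:name, :email)` gives the same result.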