Skip to main content

Importing datasets. Recommendations?

Databases • Asked by Brent C

I have a few datasets that I need to import into my rails app. They vary from 100 records to 5000 which will span across a few models with various associations.

Q.1: I can cleanse the data prior to importing... Any recommendations of data type: CSV, Excel, Tab delimitated, comma separated, etc?

Q.2: Any gems, or techniques that are common use or best practice or should I write a custom rake/chron task?


Comma separated CSV is usually pretty good standard to go with. Sometimes Excel saves with weird encodings though so it makes it less simple.

I think roo is pretty good for parsing files like these. I'd write a rake task to test it against a local file. The rake task can then just call a method a model so that it's in the correct place for use in your Rails app. Plus if you create a rake task, you can set it up to run nightly with a cron job if you ever had the need for that.



I also recommend CSV and try to make sure the CSV file is in UTF-8. as Excel sometimes encodes the files as UTF-16 or other weird things. Also saving an excel file as CSV doesn't always mean it is comma separated. Excel generally defaults to tab separated. To fix all these things I run bash scripts and use tools such as iconv and awk

iconv -f utf-16 -t utf-8 sales.csv > sales_utf8.csv

will convert your utf-16 to utf-8.


This is the bash script I use to convert an Cognos tab separated files into straight csv (comma separated).

    awk 'BEGIN{FS="\t";OFS=",";Q="\""}
          {for (i=1;i<=NF;++i)
             if ($i ~ /[",]/)
               $i = Q gensub(/"/,Q Q,"g",$i) Q
          }
          {$1 = $1}
     1' sales_utf8.csv > sales_utf8_awk.csv

Oooh Evil Cognos... lol.

I'm having to deal with data translation and import so the way I did it was to write a simple rake task using the CSV class that imports my data into the appropriate models. I gleamed some of the code from Chris' CSV upload tutorial he posted on GoRails recently. If you need some help or example code, let me know. I love this sort of stuff.


Login or Create An Account to join the conversation.

Subscribe to the newsletter

Join 27,623+ developers who get early access to new screencasts, articles, guides, updates, and more.

    By clicking this button, you agree to the GoRails Terms of Service and Privacy Policy.

    More of a social being? We're also on Twitter and YouTube.