Ask A Question

Notifications

You’re not receiving notifications from this thread.

Importing datasets. Recommendations?

Brent C asked in Databases

I have a few datasets that I need to import into my rails app. They vary from 100 records to 5000 which will span across a few models with various associations.

Q.1: I can cleanse the data prior to importing... Any recommendations of data type: CSV, Excel, Tab delimitated, comma separated, etc?

Q.2: Any gems, or techniques that are common use or best practice or should I write a custom rake/chron task?

Reply

Comma separated CSV is usually pretty good standard to go with. Sometimes Excel saves with weird encodings though so it makes it less simple.

I think roo is pretty good for parsing files like these. I'd write a rake task to test it against a local file. The rake task can then just call a method a model so that it's in the correct place for use in your Rails app. Plus if you create a rake task, you can set it up to run nightly with a cron job if you ever had the need for that.

Reply

Perfect!

Reply

I also recommend CSV and try to make sure the CSV file is in UTF-8. as Excel sometimes encodes the files as UTF-16 or other weird things. Also saving an excel file as CSV doesn't always mean it is comma separated. Excel generally defaults to tab separated. To fix all these things I run bash scripts and use tools such as iconv and awk

iconv -f utf-16 -t utf-8 sales.csv > sales_utf8.csv

will convert your utf-16 to utf-8.

Reply

This is the bash script I use to convert an Cognos tab separated files into straight csv (comma separated).

    awk 'BEGIN{FS="\t";OFS=",";Q="\""}
          {for (i=1;i<=NF;++i)
             if ($i ~ /[",]/)
               $i = Q gensub(/"/,Q Q,"g",$i) Q
          }
          {$1 = $1}
     1' sales_utf8.csv > sales_utf8_awk.csv
Reply

Oooh Evil Cognos... lol.

I'm having to deal with data translation and import so the way I did it was to write a simple rake task using the CSV class that imports my data into the appropriate models. I gleamed some of the code from Chris' CSV upload tutorial he posted on GoRails recently. If you need some help or example code, let me know. I love this sort of stuff.

Reply
Join the discussion
Create an account Log in

Want to stay up-to-date with Ruby on Rails?

Join 84,387+ developers who get early access to new tutorials, screencasts, articles, and more.

    We care about the protection of your data. Read our Privacy Policy.