Ask A Question

Notifications

You’re not receiving notifications from this thread.

Importing datasets. Recommendations?

Brent C asked in Databases

I have a few datasets that I need to import into my rails app. They vary from 100 records to 5000 which will span across a few models with various associations.

Q.1: I can cleanse the data prior to importing... Any recommendations of data type: CSV, Excel, Tab delimitated, comma separated, etc?

Q.2: Any gems, or techniques that are common use or best practice or should I write a custom rake/chron task?

Reply

Comma separated CSV is usually pretty good standard to go with. Sometimes Excel saves with weird encodings though so it makes it less simple.

I think roo is pretty good for parsing files like these. I'd write a rake task to test it against a local file. The rake task can then just call a method a model so that it's in the correct place for use in your Rails app. Plus if you create a rake task, you can set it up to run nightly with a cron job if you ever had the need for that.

Reply

Perfect!

Reply

I also recommend CSV and try to make sure the CSV file is in UTF-8. as Excel sometimes encodes the files as UTF-16 or other weird things. Also saving an excel file as CSV doesn't always mean it is comma separated. Excel generally defaults to tab separated. To fix all these things I run bash scripts and use tools such as iconv and awk

iconv -f utf-16 -t utf-8 sales.csv > sales_utf8.csv

will convert your utf-16 to utf-8.

Reply

This is the bash script I use to convert an Cognos tab separated files into straight csv (comma separated).

    awk 'BEGIN{FS="\t";OFS=",";Q="\""}
          {for (i=1;i<=NF;++i)
             if ($i ~ /[",]/)
               $i = Q gensub(/"/,Q Q,"g",$i) Q
          }
          {$1 = $1}
     1' sales_utf8.csv > sales_utf8_awk.csv
Reply

Oooh Evil Cognos... lol.

I'm having to deal with data translation and import so the way I did it was to write a simple rake task using the CSV class that imports my data into the appropriate models. I gleamed some of the code from Chris' CSV upload tutorial he posted on GoRails recently. If you need some help or example code, let me know. I love this sort of stuff.

Reply
Join the discussion
Create an account Log in

Want to stay up-to-date with Ruby on Rails?

Join 76,990+ developers who get early access to new tutorials, screencasts, articles, and more.

    We care about the protection of your data. Read our Privacy Policy.

    Screencast tutorials to help you learn Ruby on Rails, Javascript, Hotwire, Turbo, Stimulus.js, PostgreSQL, MySQL, Ubuntu, and more. Icons by Icons8

    © 2023 GoRails, LLC. All rights reserved.