How to scrape data every day

corrado tuccitto asked in General

How can I schedule a job that gathers data with Nokogiri every morning and stores it (only what has changed) in a database?


That's a really open question. Can you give us an example of what you would want to scrape?


Err double post. Disregard.


I have a table with a fixed set of items, but values such as the min price, med price, and max price change every day.
For each item I want to scrape the min, med, and max every day and store them in the database.
An item's name always stays the same, but items can be added or removed, so it's important to check every day whether there are new items.

Here is an example of a website that you can scrape.

Many thanks in advance.
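
The "only what has changed" part of the question comes down to comparing each morning's scrape with what is already stored. Here is a minimal sketch in plain Ruby, assuming items are keyed by name and that prices arrive as strings with a decimal comma. The helper names `diff_prices` and `parse_price` and all of the sample data are hypothetical, not something from this thread:

```ruby
# Hypothetical helper: prices on the page use a decimal comma, e.g. "1,10".
def parse_price(text)
  text.tr(',', '.').to_f
end

# Hypothetical helper: compare the stored prices with today's scrape.
# Both arguments map an item name to a { min:, med:, max: } hash.
def diff_prices(stored, scraped)
  {
    added:   scraped.reject { |name, _| stored.key?(name) },
    removed: stored.reject { |name, _| scraped.key?(name) },
    changed: scraped.select { |name, prices| stored.key?(name) && stored[name] != prices }
  }
end

# Made-up sample data standing in for database rows and a fresh scrape.
stored = {
  'ARANCE' => { min: 1.10, med: 1.15, max: 1.20 },
  'LIMONI' => { min: 0.90, med: 1.00, max: 1.10 },
  'PERE'   => { min: 1.50, med: 1.60, max: 1.70 }
}
scraped = {
  'ARANCE' => { min: 1.10, med: 1.20, max: 1.30 }, # prices moved
  'LIMONI' => { min: 0.90, med: 1.00, max: 1.10 }, # unchanged
  'MELE'   => { min: 0.80, med: 0.85, max: 0.90 }  # new item
}

result = diff_prices(stored, scraped)
```

Only the `:added` and `:changed` entries would need a write, and `:removed` tells you which items disappeared from the page. If the name alone does not identify a row on the real page, the key would have to combine more columns.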


This is neither efficient nor beautiful, but it'll at least get you started. You can use Nokogiri as well, but I just used regex as a quick and (emphasis on) dirty solution. Plus I haven't had any caffeine yet today. Here:

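require 'mechanize'
require 'cgi'

# The URL is missing from this post; set it to the page you want to scrape.
url = ""
page = Mechanize.new.get(url)

# Each <tr class="odd|even ..."> is one product; each <td data-title="..."> becomes a [key, value] pair.
products = page.body.scan(/<tr class="(odd|even).+?">(.+?)<\/tr>/m).map do |row|
  row.last.to_s.scan(/<td data-title="(.+?)" class=".+?" >(.+?)<\/td>/m).map do |key, val|
    # Unescape basic entities, then replace nested tags, newlines, and tabs with spaces.
    [key, CGI.unescapeHTML(val).gsub(/<[^>]*>|\n|\t/, " ").strip]
  end
end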

# products.count == 160

# products.first.each{|key,val| puts "#{key} => #{val}"}
# P. Min => 1,10
# P. Pre => 1,15
# P. Max => 1,20
# Specie => ARANCE
# Calibro => 70-80 (6)
# Cat. => I
# Presentazione => A PIU' STRATI
# Marchio => &nbsp;
# Origine => SUD AFRICA
# Confezione => &nbsp;
# Unita misura => &nbsp;
# Altre => &nbsp;
# Gruppo => AGRUMI

That should get you started with getting the data. What you do with it after that is up to you. :)


First of all thanks for your reply.
I have already developed a solution, but I'd like to see your implementation with threads, in a video.


Oh, I don't run GoRails. I'm just another subscriber like you.

I'm sure he'll see this, though. :)


Python has a much better ecosystem for scraping data, but Mechanize will work as well.


You guys are awesome. :)

Mechanize is good. I've used that before. I've never checked out Scrapy or Wombat, but I'm sure they are awesome too.

Using whenever to schedule this script every morning will make that part easy.
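
A whenever schedule for that could look something like the sketch below. The 6:00 am time and the script path are placeholders I made up, so adjust both to your setup:

```ruby
# config/schedule.rb (read by the whenever gem)
# The time and the script path are hypothetical placeholders.
every 1.day, at: '6:00 am' do
  command "ruby /path/to/scrape.rb"
end
```

Running `whenever --update-crontab` from the project directory then writes the matching entry to your crontab.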

I'll see about adding this to the list of screencasts to cover!
