
Batching Background Jobs with Sidekiq


Episode 256 · July 31, 2018

Sidekiq Pro or sidekiq-batch lets you run a set of background jobs in parallel and then fire a callback once they have all finished. This is perfect for building more advanced workflows for your background jobs in Rails.



Transcripts

Batching background jobs with Sidekiq.

This is really cool because it lets you take a set of background jobs and batch them together, so that when they have all finished you can run a callback. In other words, you can go from serial execution to parallel execution and back to serial execution.

Normally when you use background jobs, you go from Rails, which is running in serial, to parallel, and then you are done. Being able to come back to serial is important for complex workflows, which come up a lot when you are building something like Hatchbox.

Example.

Let's say you have to set up two web servers with a load balancer. The load balancer needs the IP addresses of the web servers, so you have to wait for the web servers to be created before you can set up the load balancer. This is where Sidekiq batches come in.

So in this episode we are going to talk about Sidekiq batches, which is one of the features that come with Sidekiq Pro.

For this episode we will use the open source implementation of Sidekiq batches so you can clone this repo and work on it, but in production you will probably use Sidekiq Pro.

So batches are really cool. You basically just write a little configuration like this.

batch = Sidekiq::Batch.new                                   #line 1
batch.description = "Batch description (this is optional)"   #line 2
batch.on(:success, MyCallback, :to => user.email)            #line 3
batch.jobs do                                                #line 4
  rows.each { |row| RowWorker.perform_async(row) }           #line 5
end                                                          #line 6
puts "Just started Batch #{batch.bid}"
  1. #line 1: you create a new batch.
  2. #line 2: you set a description if you want.
  3. #line 3 is the most important part, where you set a callback. The callback lives in a separate class because all of this runs in background workers that can run across different machines, so you will not have access to the place where the batch was originally started. By the time the batch succeeds and calls the callback, you will no longer have the original Rails request or model instance; the callback might run on a whole separate machine. This therefore requires a new instance of the callback class to be created to handle it.

From line 4 to line 6 we use the jobs block to set up the workers, and that's it. Your jobs will run in the batch and, upon success, the MyCallback class (or whatever you named it) will be instantiated and passed any extra options you added.

Example

gem "sidekiq-batch"

Add the sidekiq-batch gem to your Gemfile as shown above, then create the jobs.
NB: With Sidekiq batches you want to stick to Sidekiq itself and interact only with it, not ActiveJob. You don't want ActiveJob jobs interfering with the batch, and things work much more easily with plain Sidekiq workers. This means you have to migrate your existing jobs to Sidekiq workers, which you can do by following the steps below.

  • include Sidekiq::Worker
  • Do not inherit from ActiveJob
  • call perform_async instead of perform_later

So let's create two jobs.

You can clone the GitHub repo or create a new Rails project first, then create the jobs.

 #first job
 # create_server_job.rb
class CreateServerJob
  include Sidekiq::Worker 

  def perform(id)
    # Do your work in parallel
    i = rand(1..10)
    puts "Creating server #{id}..."
    sleep 1
    puts "Created server #{id}..."
  end
end

#second job
# create_cluster_job.rb
class CreateClusterJob
  include Sidekiq::Worker 

  def perform
    batch = Sidekiq::Batch.new
    batch.description = "Creating cluster"
    # Use a string key: cluster_id was an undefined local, and the options are serialized anyway
    batch.on(:success, CreateClusterJob::Created, { "cluster_id" => 999 }) # --- line 3
    batch.jobs do
      5.times { |i| CreateServerJob.perform_async(i) }
    end
  end

  class Created
    # status is the batch status and options is the hash that is passed in at line 3
    def on_success(status, options)
      puts "------"
      puts status, options
      puts "Created cluster"
    end

    # Optional: you can choose between on_success and on_complete.
    # This one is only called if you also register batch.on(:complete, ...).
    def on_complete(status, options)
    end
  end
  end
end

At line 3 above we can also pass in the string "CreateClusterJob::Created" and it will automatically be constantized for us.

The Created on_success code runs only once, when all the jobs in the batch have finished, so we should see it in the logs only one time. When on_success runs, we no longer have access to instance variables or local variables from where the batch was defined, because it runs on a brand new instance of the Created class. This is why we can pass in the options hash: it lets the callback determine, for example, which cluster to mark as active once the batch jobs have finished.
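To make the "brand new instance" point concrete, here is a small plain-Ruby sketch that needs neither Sidekiq nor Redis. It assumes the callback is remembered only as a class name plus a JSON-encoded options hash; the helpers store_callback and run_stored_callback are hypothetical names made up for this illustration and are not part of Sidekiq Pro or sidekiq-batch.

```ruby
require "json"

# Stand-in for the callback class from the example above.
class CreateClusterJob
  class Created
    def on_success(status, options)
      # Keys are strings here because the options went through JSON.
      "cluster #{options["cluster_id"]} is active (batch status: #{status})"
    end
  end
end

# Hypothetical helper: keep only plain data (a class name and options).
def store_callback(class_name, options)
  JSON.generate("callback" => class_name, "options" => options)
end

# Hypothetical helper: later, possibly in another process, rebuild a fresh
# callback object from the stored data and call it.
def run_stored_callback(stored)
  payload  = JSON.parse(stored)
  callback = Object.const_get(payload["callback"]).new
  callback.on_success("complete", payload["options"])
end

stored = store_callback("CreateClusterJob::Created", { cluster_id: 999 })
puts run_stored_callback(stored)
# prints: cluster 999 is active (batch status: complete)
```

Nothing from the code that defined the callback survives except the class name and the options, and a symbol key such as :cluster_id comes back as the string "cluster_id". That is a good reason to keep the options hash small and made of simple values like IDs.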

So, to try this out:

  1. Start up Sidekiq: sidekiq
  2. Then open the Rails console: rails console

In the Rails console we now run CreateClusterJob.new.perform and we should see that it queued up five jobs (seen in the array of job IDs). Then we can check the Sidekiq terminal and we will see that the jobs are running.

When the five jobs are done, we see that the Created on_success callback is run only once, at the end of the jobs.
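One way to picture why the callback fires exactly once is a shared counter of pending jobs: every job decrements it when it finishes, and whichever job brings it to zero triggers the callback. The toy below is an in-memory sketch of that idea using threads. ToyBatch and demo_toy_batch are hypothetical names, and this is not a description of how Sidekiq Pro or sidekiq-batch is actually implemented.

```ruby
# Toy illustration only: an in-memory "batch" that runs jobs in threads.
class ToyBatch
  def initialize
    @jobs = []
    @lock = Mutex.new
  end

  # Register a unit of work; nothing runs yet.
  def add(&work)
    @jobs << work
  end

  # Run every job in its own thread. The thread that finishes last
  # (the one that brings the counter to zero) calls on_success.
  def run(&on_success)
    @pending = @jobs.size
    threads = @jobs.map do |work|
      Thread.new do
        work.call
        last = @lock.synchronize { (@pending -= 1).zero? }
        on_success.call if last
      end
    end
    threads.each(&:join) # only so the demo waits for the output
  end
end

def demo_toy_batch
  log = Queue.new # thread-safe collector for the output lines
  batch = ToyBatch.new
  5.times { |i| batch.add { log << "created server #{i}" } }
  batch.run { log << "created cluster" }
  Array.new(log.size) { log.pop }
end

puts demo_toy_batch
```

The five "created server" lines can appear in any order, but "created cluster" is always the final line and appears once, because the counter only reaches zero after every job has done its work.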

This is cool because it allows us to go from serial (a single thing) to parallel (multiple things) and then back to serial (a single thing).

This is very common in threading, where a single machine spins up multiple threads, does some parallel work, then joins and waits until all those threads are finished before doing work in serial again.
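For comparison, the single-machine threading version of the web server and load balancer example might look like the sketch below. create_web_server and build_cluster are hypothetical stand-ins, and the IP addresses are fake; the point is only the shape of serial, then parallel, then serial again.

```ruby
# Hypothetical stand-in for provisioning a server; returns a fake private IP.
def create_web_server(n)
  sleep 0.01 # pretend this is slow
  "10.0.0.#{n}"
end

def build_cluster(count)
  # Serial -> parallel: one thread per web server.
  threads = (1..count).map { |n| Thread.new { create_web_server(n) } }

  # Parallel -> serial: Thread#value joins the thread and returns its result.
  ips = threads.map(&:value)

  # Back in serial: the load balancer can now be given every IP.
  "load balancer -> #{ips.join(', ')}"
end

puts build_cluster(2)
# prints: load balancer -> 10.0.0.1, 10.0.0.2
```

The join step here plays the same role as the batch callback, except that everything lives in one process on one machine.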

But with Sidekiq we can do this distributed across our Sidekiq worker servers. This allows us to scale up across multiple machines.

You can do this with the gem, but I recommend Sidekiq Pro, as it will keep getting improved and is more efficient, with better support.

This feature becomes kind of a requirement as you build more advanced background jobs, because you have a workflow your code needs to follow, and batches help with those workflows.
E.g. your Created callback can start another batch, which can in turn trigger another one. This can be used to chain batches, as I do in Hatchbox, and it is a complex feature that some applications require.

I hope you enjoyed this episode. If you want to see more advanced background workers or Sidekiq stuff, let me know in the comments below, and I will talk to you guys in the next episode.
