Searchkick: Reindex on model in multitenancy through default scope app
Has anyone had any success indexing their multitenant data with searchkick? I followed the suggested article in the readme (https://www.tiagoamaro.com.br/2014/12/11/multi-tenancy-with-searchkick/) but this results in an index for each tenant/model combination which will become expensive and is not scalable.
Therefore, I am trying to create one index per model for all tenants (i.e. one index for the model rather than 100 indices if I have 100 users). When I try to run reindex, I run into an issue: the default scope is applied with no tenant set, so the query filters on tenant_id being null and no data is returned.
I can get around the default scope issue by using something like Product.unscoped.reindex(accept_danger: true). However, the default scope is still applied when loading associated data. So rather than:
class Product < ActiveRecord::Base
  belongs_to :department
  def search_data
    {
      name: name,
      department_name: department.name,
      on_sale: sale_price.present?
    }
  end
end
I need to use:
class Product < ActiveRecord::Base
  belongs_to :department
  def search_data
    {
      name: name,
      # Bypass the default scope explicitly when loading the association
      department_name: Department.unscoped.find(self.department_id).name,
      on_sale: sale_price.present?
    }
  end
end
Can anyone suggest a better way of using reindex with this multitenancy setup?
Hey Mark,
Yeah, so I think the one thing with multitenancy is that the goal is to truly separate out all your data between users so they never intermingle. Most people don't actually want or need that, but some do for security reasons. Sounds like in your case you don't really need it.
I'm not sure of a better way of structuring this for you because regardless you're going to be stuck within the tenant. What if you don't use tenants and instead make sure that you scope all your queries to the current user or organization?
Thanks for replying Chris. I actually scope my queries to business which is similar to organization, I just used tenant as the example because I thought this was the common terminology. I'm going to keep trying my current implementation as per above but I'm going to try using Cloud Front in production as they don't have artificial limits on indices and shards as some other providers do.
Probably a good plan. Yeah the thing is that most times "multi-tenancy" is more for when you truly want separate databases and everyone's stuff separated out. I think it's a common misconception and one that's kinda hard to make clear at times. Sounds like a decent plan and you can always go back and change things up later, it just may take a little longer with production data later on which isn't that bad.
Rails has a bug with scoping where unscoped is not applied to the block. This is discussed here, with a solution here. This was also back ported to Rails 4.2 in the stable branch https://github.com/rails/rails/pull/25232
I've tried to use stable with gem 'rails', :git => 'https://github.com/rails/rails.git', :branch => '4-2-stable'; however, the bug still seems to exist for me. I've tried to search the Rails code to see if the code for the patch is present, but I can't find it. I was influenced by this comment
Any suggestions on how I can make sure I'm running 4-2-stable with the commit I need?
Hmm, I don't see what query block you're referring to? The unscoped method you use doesn't have a block and the issue more just stems from indexing the database where the tenant isn't set.
It looks like you've got the correct url for using the gem from github, although I don't think that's your problem. I don't see anywhere this is calling a block on the query, and your real issue is still probably the same as before. You're basically indexing but no tenant was ever set.
I think the solution for you is to build your own index rake task. You'd loop through each tenant, set the Apartment Tenant, and then index each of the records inside of it (rather than in bulk). Not sure why I didn't think of that before.
Based off https://www.tiagoamaro.com.br/2014/12/11/multi-tenancy-with-searchkick/ you could do something like this:
namespace :searchkick do
  desc 'Reindex all models on all tenants'
  task reindex_tenants: :environment do
    Rails.application.eager_load!
    Apartment::Tenant.each do |schema|
      Apartment::Tenant.switch!(schema)
      Searchkick.models.each do |model|
        puts "Reindexing #{model.name} on #{schema}"
        model.reindex
      end
    end
  end
end
This isn't modified from his code, except that you're not specifying separate indexes. This will set the tenant for each, it will find all the indexed models, and then it will also go and query for those records that are available. It'll do this once for each Tenant, which means it will find different sets of Product and Department records each time.
You will probably want to go back to using department_name: department.name, because you'll be inside the tenant this way. I believe this should do what you need because it's properly setting the tenant. Curious to see if that works for you.
Thanks for taking the time to reply.
I previously implemented multitenancy with scopes following this railscast
Where you said:
You'd loop through each tenant, set the Apartment Tenant, and then index each of the records inside of it (rather than in bulk)
...
except that you're not specifying separate indexes.
Seeing as I'm not specifying separate indexes, I believe that when I change the tenant (business for me) and reindex, the index will not contain results for both businesses, only the one that I most recently reindexed for.
For example, product.rb:
default_scope { where(business_id: Business.current_id) }
searchkick index_name: -> { [ model_name.plural, Rails.env].join('_') }, settings: {number_of_shards: 1, number_of_replicas: 1}
custom rake task:
Business.current_id = 1
Product.reindex
Business.current_id = 2
Product.reindex
If we perform a search after the custom rake task then it will only have products for business.id = 2
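A minimal, pure-Ruby sketch of why that happens, assuming a full Model.reindex builds a fresh index from whatever rows the default scope returns and then points the index name at it. FakeSearch, PRODUCTS, full_reindex and scoped_products are hypothetical stand-ins for illustration, not Searchkick or Rails APIs.

```ruby
# Hypothetical stand-in data: products belonging to two businesses.
PRODUCTS = [
  { id: 1, business_id: 1, name: "Hammer" },
  { id: 2, business_id: 1, name: "Saw" },
  { id: 3, business_id: 2, name: "Drill" }
].freeze

# Hypothetical stand-in for a search cluster: one alias name -> documents.
class FakeSearch
  def initialize
    @aliases = {}
  end

  # Mimics a full reindex: build a brand-new index from the given records,
  # then repoint the alias at it, discarding whatever was indexed before.
  def full_reindex(alias_name, records)
    new_index = records.to_h { |record| [record[:id], record] }
    @aliases[alias_name] = new_index
  end

  def documents(alias_name)
    @aliases.fetch(alias_name, {}).values
  end
end

# Mimics the default scope: only rows for the current business are visible.
def scoped_products(current_business_id)
  PRODUCTS.select { |product| product[:business_id] == current_business_id }
end

search = FakeSearch.new
search.full_reindex("products_development", scoped_products(1))
search.full_reindex("products_development", scoped_products(2))

indexed_business_ids = search.documents("products_development").map { |doc| doc[:business_id] }.uniq
puts indexed_business_ids.inspect # prints [2]
```

Each full reindex replaces the previous contents, so the second run leaves only business 2 in the shared index.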
The reason I was looking into the Rails bug with scoping was because I want to use a join within my search_data:
def search_data
  {
    column_name: Parent.unscoped.joins(:grandparent).where(id: parent_id).pluck("grandparent.column_name")[0].presence || "",
  }
end
With my current version of Rails (4.2.7) the unscoped is not applied properly. I believe the team decided that is correct behaviour, but not when used in a block, so:
Grandparent.unscoped do
  Parent.unscoped.joins(:grandparent).where(id: parent_id).pluck("grandparent.column_name")
end
Should ignore the scope with the patch applied, but for me I can't get it to work with 4-2-stable (though I can get it to work on my Rails 5 test branch)
Yeah, I guess if the reindex on a model clears the index before adding in the records, then that won't work.
However, then you should be able to go through each record and call reindex on it individually. That I know won't clear the index and so you could compile a full index of all the product records after looping through the tenants. It might be a tad slower on the initial index, but that's only going to happen the first time you index the full database. From then on, you'll be indexing things inside the app when changes are made so it should stay in sync just fine. And you won't need to do any of that department unscoped querying either. You can just access through the record directly.
On the unscoped part, you aren't calling a block there so I don't think you'll be running into that Rails bug.
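As a rough illustration of the record-by-record idea, here is a pure-Ruby sketch. It assumes that indexing a single record adds or overwrites only that document and leaves the rest of the index alone. FakeIndex, upsert and ALL_PRODUCTS are hypothetical names, not part of Searchkick.

```ruby
# Hypothetical in-memory index. upsert adds or overwrites one document
# without touching the others, which is the behaviour assumed for record.reindex.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def upsert(record)
    @docs[record[:id]] = record
  end
end

# Hypothetical rows for two businesses.
ALL_PRODUCTS = [
  { id: 1, business_id: 1, name: "Hammer" },
  { id: 2, business_id: 1, name: "Saw" },
  { id: 3, business_id: 2, name: "Drill" }
].freeze

index = FakeIndex.new
[1, 2].each do |business_id|
  # Stand-in for setting the tenant and iterating over its scoped records.
  ALL_PRODUCTS.select { |product| product[:business_id] == business_id }.each do |product|
    index.upsert(product)
  end
end

puts index.docs.keys.sort.inspect # prints [1, 2, 3]
```

Because nothing is cleared between tenants, the index ends up holding every business's products.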
So, for example, I could do something like:
Business.current_id = 1
Product.all.each do |product|
  product.reindex
end
Business.current_id = 2
repeat above
????
I didn't know you could call reindex on each record. I'll give that a try.
Regarding unscoped, when I change my code to use a block it should then therefore ignore the scope, but it doesn't. So I think I am running into the bug when I'm trying to use a workaround with a block (I could be wrong).
Yeah in theory. I'm guessing that Business.current_id sets the tenant?
The callbacks for when you update a record are basically what you'd be tying into here. Anytime you update or delete a record, it needs to update or delete that item in the index. Rather than doing a bulk insert, you'll do them one by one so you can control the tenant stuff, which was the problem with the bulk imports because they couldn't handle the individual records.
You should keep your code then as if it were always in the proper tenant, so your model should look like it normally would:
class Product < ActiveRecord::Base
  belongs_to :department
  def search_data
    {
      name: name,
      department_name: department.name,
      on_sale: sale_price.present?
    }
  end
end
The reason for that is because this way you'll always be in the correct tenant, so you'll always be able to look up the department just fine.
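One way to make sure the tenant is always set while indexing, and always reset afterwards, is a small block helper. This is only a sketch: it assumes Business.current_id is a thread-local accessor (as in the scopes-based approach mentioned earlier), and with_current is a hypothetical method name that does not exist in the app or in any gem.

```ruby
# Hypothetical stand-in for the Business model from the thread; only the
# thread-local current_id accessor is sketched here.
class Business
  def self.current_id
    Thread.current[:business_id]
  end

  def self.current_id=(id)
    Thread.current[:business_id] = id
  end

  # Hypothetical helper: run the block with the given tenant set, then
  # restore the previous value even if the block raises.
  def self.with_current(id)
    previous = current_id
    self.current_id = id
    yield
  ensure
    self.current_id = previous
  end
end

seen = []
[1, 2].each do |business_id|
  Business.with_current(business_id) do
    # In the app, this is where each product for the tenant would be reindexed.
    seen << Business.current_id
  end
end

puts seen.inspect                # prints [1, 2]
puts Business.current_id.inspect # prints nil
```

Restoring the previous value in ensure means a failed indexing run cannot leave a stale tenant set for later code on the same thread.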
Yes, Business.current_id sets the tenant.
It's working now using reindex on the individual records like you suggested. The callbacks for updating the item in the index are also working. Elasticsearch (with Searchkick) is quite impressive to see up and running. Thank you so much for all of the time and effort you have given.
Are there any GoRails episodes that you would recommend for learning how to use gems in general? By this I mean, unless functionality is specified in the readme, I struggle to understand how it works. I often look in http://www.rubydoc.info/gems/ without much success. I occasionally download the source code for the gem.
An example of a problem is the code you previously referenced: Searchkick.models.each do |model|
I tried Searchkick.models but nothing was returned. So I looked in the usual places (readme and rubydoc.info) and couldn't find any helpful information on Searchkick.models. The link you gave that supplied the sample code also stated that the Searchkick.models method is available on versions 0.8.6+. I'm using a version later than that, so that shouldn't be the problem. Are there any episodes that could help me improve this type of learning of gems and their functionality?
So there isn't really anything specific other than I read the source code of gems. The undocumented details and important stuff is almost always hidden away inside the source for the gem unless it's a very popular gem.
For example, to learn about Searchkick.models, I would just search the repository for models, which I would assume would be a method somewhere inside the gem or a class variable, and I'd start poking around that. The key with that would be figuring out what it does, how it's used, etc.
Today I was poking around the source code for docusign_rest to learn how it worked because I couldn't get some options passed over the API correctly. 15 seconds of looking at the source later, and I knew exactly what was wrong.
The thing with gems is realizing they're not a black box, they're just regular ruby code you would have written, but they're packaged up nicely for people to reuse, so you should always feel comfortable reading the source for that. It feels daunting at first, but honestly all the code in the gems is generally pretty much what you would have had to write to make the feature work if they didn't do it for you. Almost every time it's pretty logical when you dive into it, especially when you're curious about very specific bits like Searchkick.models, as you don't have to understand how things work completely, just the small piece.
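Ruby itself can point you at the small piece to read. The sketch below uses Method#source_location and singleton_methods on a made-up module; DemoGem is hypothetical and simply stands in for a gem's top-level namespace.

```ruby
# Hypothetical module standing in for a gem's top-level namespace.
module DemoGem
  # A simple registry method, similar in spirit to a list of indexed models.
  def self.models
    @models ||= []
  end
end

finder = DemoGem.method(:models)

# source_location returns the file and line where a Ruby-defined method lives.
file, line = finder.source_location
puts "models is defined at #{file}:#{line}"

# List the methods defined directly on the module (not inherited ones).
puts DemoGem.singleton_methods(false).inspect # prints [:models]
```

In a Rails console the same call on the real gem would be Searchkick.method(:models).source_location, assuming models is an ordinary Ruby method in your installed version. From there, bundle open searchkick opens the installed source in your editor.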
Thanks for the informative and reassuring response. Gem code does feel daunting to me at this stage but remembering that it is just regular ruby code does help. I will continue to look at the source and expand that comfort zone.
Thank you for all of the help and thank you for GoRails.
You're welcome and I'll definitely try to see about doing this as an episode moving forward. I think it'll be kinda tough to figure out a good example for this that really showcases the idea, but if you have any ideas I'm all ears! I tried doing this a couple times before but it's just one of those things I think you learn over time and kinda hard to appreciate until you've done it a few times.
Maybe you could do a short episode on reading the gem source for searchkick to determine what the reindex method does? I don't know if that's too specific to be useful to your whole customer base.
With reindexing by the individual record I found out that I needed to use Product.reindex(import: false) to create the index first, as simply using record.reindex wouldn't apply the Searchkick settings (i.e. search_data, index name, etc). Discussed here.
I guess reindexing by the record does have the disadvantage that every time I:
- install or upgrade searchkick
- change the search_data method
- change the searchkick method
I'm going to need to recreate my indices again with Product.reindex(import: false), and then loop through the records (with ActiveRecord) to reindex each record individually. So it's obviously not an ideal way of doing things, but the only other alternative is using unscoped in a block with the patch applied (which works in Rails 5). I would assume that Model.reindex could be significantly faster than looping through with record.reindex.
You're definitely far deeper into Searchkick than I've ever been at this point. :) I agree, some advanced searchkick usage like aggregates and geosearch would be really great to cover. Maybe some custom index stuff like what you're up to would be valuable as well.
So that import: false basically just tells it to create a blank index and ignore all the records in the database, right?
Actually... you might check out what the Model.reindex code does. You might be able to pull some chunks from that to create your own method to do the bulk reindex and not clear the index each time. That might let you build a custom method for indexing that could take advantage of any bulk indexing they might have as well as support your multi-tenant application.
So that import: false basically just tells it to create a blank index and ignore all the records in the database, right?
Yes, that's what it appears to do in my testing. A blank index with your specific Searchkick settings applied (whereas record.reindex will not create the index correctly if it hasn't been previously created).
I agree that it's a good idea for me to see what Model.reindex does and use that code for my own method. I'm just struggling a little to read the gem code, and that's why I thought it would be a great idea for a short episode (well, for me anyway): how to read a gem and figure out where a method is, how it's called and what it's doing. But again, I understand that this may be too specific for a general episode to be useful to a large number of people. I'm sure I'll figure it out with some perseverance.
As an update, I wouldn't advise reindexing by the individual record when you have a large amount of data. My custom rake task has been running for approximately 18 hours and it's still not finished. This approach does not allow for zero downtime reindexing either, which isn't a problem if you don't plan on changing the Searchkick mappings/structure, but if you do, you'll need to write some custom code to try and perform zero downtime reindexing using import: false. So far for me, creating the custom task is taking a lot of time and doesn't seem worth it.
I'm upgrading my app to Rails 5 at the moment, which includes the scope patch, so I will go back to using the default Searchkick methods and scope my search_data, i.e.:
class Product < ActiveRecord::Base
  belongs_to :department
  def search_data
    {
      name: name,
      department_name: Department.unscoped.find(self.department_id).name,
      grandparent_column: Grandparent.unscoped { Parent.unscoped.joins(:grandparent).where(id: parent_id).pluck("grandparent.column_name") }
    }
  end
end
For future reference, after pulling code from the Searchkick gem, my custom rake task (that I am currently moving on from) began to look like the below, though I haven't applied tenant/business scoping yet:
# scope = searchkick_klass
# NOTE: create_index and super(records) were lifted from inside Searchkick's own
# methods and will not resolve in a standalone rake task as written.
searchkick_index = Searchkick::Index.new(Department.searchkick_index.name, Department.searchkick_options)
searchkick_index.clean_indices
index = create_index(index_options: Department.searchkick_klass.searchkick_index_options)
# check if alias exists
if searchkick_index.alias_exists?
  # import before swap
  Department.searchkick_klass.find_in_batches batch_size: 1000 do |records|
    if records.any?
      event = {
        name: "#{records.first.searchkick_klass.name} Import",
        count: records.size
      }
      ActiveSupport::Notifications.instrument("request.searchkick", event) do
        super(records)
      end
    end
  end
end
# get existing indices to remove
searchkick_index.swap(index.name)
searchkick_index.clean_indices
index.refresh
The performance on that is probably similar to writing and committing records one by one on a csv import vs writing a transaction and committing everything at once. You'll have a lot more speedups writing everything in bulk.
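To put rough numbers on that, here is a small pure-Ruby sketch that only counts round trips. CountingClient is a hypothetical stand-in for a search client, and the batch size of 1,000 simply mirrors the find_in_batches call above. It illustrates the request count only and makes no claim about real timings.

```ruby
# Hypothetical client that only counts how many round trips it receives.
class CountingClient
  attr_reader :requests, :documents

  def initialize
    @requests = 0
    @documents = 0
  end

  # One call = one network round trip, however many documents it carries.
  def index(docs)
    @requests += 1
    @documents += docs.size
  end
end

records = (1..2_500).map { |id| { id: id } }

# Record-by-record: one request per document.
one_by_one = CountingClient.new
records.each { |record| one_by_one.index([record]) }

# Bulk: one request per batch of up to 1,000 documents.
batched = CountingClient.new
records.each_slice(1_000) { |batch| batched.index(batch) }

puts one_by_one.requests # prints 2500
puts batched.requests    # prints 3
```

Both clients end up with the same 2,500 documents. The difference is purely in how many requests it took to get them there.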
Are you sure that the scope thing is actually going to solve your problem in Rails 5? I thought we determined that wasn't going to help as it wasn't related to your tenant issue?
writing a transaction and committing everything at once. You'll have a lot more speedups writing everything in bulk.
I was trying to figure out how to do that in my custom task before I accepted that I'm probably wasting too much time and should just put a workaround in place to use the defaults offered by the gem
With regards to the scope: in the latest release of Rails 4, without my tenant/business set, if I run:
Grandparent.unscoped { Parent.unscoped.joins(:grandparent).where(id: self.parent_id).pluck("grandparent.column_name") }
then the business scope is still applied and no result is returned (the query contains AND "grandparents"."business_id" IS NULL), whereas in Rails 5, if I run the same command, the business scope is not applied and I receive the result I expect (the query does not contain AND "grandparents"."business_id").
So I figure if I set up my search_data to use unscoped in blocks, as per what works in Rails 5, then I will be able to use the standard Searchkick Model.reindex rather than record.reindex, and avoid needing to create my own custom task that allows for zero downtime reindexing when using (import: false).