Skip to main content

28 Rails Counter Caches

Episode 172 · February 6, 2017

Learn how to use counter caches in your Rails app

Performance


Transcripts

What's up guys, this is a little bonus episode where we're going to talk about Rails Counter Caches, how you can add them in at the very beginning of your Rails app, and also how you can add them in later on for performance improvements. So what we're going to do is basically create a little example of a forum so users will have threads. Users will have the number of threads that they created, so we'll have a count on them, and ForumThreads of course will have a count as well for the number of forum posts that are in the database. So that is going to be our simple example. This could apply to anything you might possibly want to do. Twitter is a great example, where you go to a user's profile and it shows how many people are following that user, how many they are following, and how many tweet's that they've made. So there's three associations for Twitter, and they definitely want to cache that value because if you were to query, you know, my 30 or 40 thousand tweets or whatever ridiculous number that is, and count those up in the database one by one, that's ridiculous, and that would be way way way way too slow for building a site like Twitter, and this is a really really simple feature to add to your Rails app. This came up from a suggestion on the forum, so thank you for that, and let's dive into our example. So we're going to create a new Rails app, and let's just call it

rails new forum_example

keep it nice and simple, this is going to have three models, remember? we're going to have users, we're going to have forum threads and forum posts. So if we cd into our forum_example, we can then generate our mode for User, they'll probably need a name or email whatever, you know, you might create this with Devise or something, or you might just roll this from scratch, we're going to not do anything other than a blank model with name, and we'll also add forum_treads_count as an integer. This is going to be the column that the counter cache will update, and we'll make it tweak to the migration to default the value of this integer column to the number zero, so that it always has an integer that we can work with and display.

rails g model User name forum_threads_count:integer

So that is all we're going to need on the user model, this will go ahead and create our migrations for that, and once we're done creating all of our model migrations, we'll go edit those and add in those defaults.

So here we go, we will have our model for ForumThread, this is probably going to need a name for the thread, we will also want our user's references. This is going to generate a user_id column that is an integer, and it will point to the user. That's the column that we will monitor, and anytime that you create a thread we will look up that user, and then update their forum threads count whenever we create a forum thread or whenever we destroy one, we can decrement that forum threads count. So here we're also going to want to add forum post count as an integer, and that way the forum thread can keep track of how many posts it has, you might want to display that number, so we will add that here as well, and we can generate that model

`rails g model ForumThread name user:reference forum_posts_count:integer

and we can also generate our model for ForumPost, and it's going to need a body as texts, but it's also going to need a user references because every post comes from a user, we just need to display, you know, who posted that, and so we'll have that user references, but notably we did not add the forum post count to the user, we're going to add that in afterwards, and show you how to add this to an existing database. So you will already have data, and you will need to go populate that column that you just add, and we'll show you how to do that after we do this basic example. So this one also needs forum_thread:references so that that can be the association that goes and updates this forum_post_count, and that should be it.

rails g model ForumPost body:text user:references forum_thread:references

So if we open up MacVim now, and go into our database/migrations, we can add a default of zero, and I'm going to also say null is false, so that we always have some sort of integer for these columns. So that's going to be good, and we can paste that in as well on forum_post_count, and that will set up our database. Now we can go ahead and migrate that in our console, so let's run

rake db:migrate

and once that is done, we will be able to test this out just after we go and make a counter_cache option as true on our association. So the reason this belongs to the belongs_to is because whenever you create a forum post, that is actually the trigger for a callback that counter cache true will go and execute. So whenever you create one, whenever you delete one, this is going to go look at the belongs to, and say hey: You have this column, I want you to go and increment or decrement that number, so that we can display that counts and keep it in sync, and so the reason it belongs on the children, or the child model is because that is where the action is really happening, so it's either getting created or destroyed, and then it goes and updates the parent who has the cache. So that is all we need on forum posts:

app/models/forum_post.rb

class ForumPost < ApplicationRecord 
    belongs_to :user 
    belongs_to :forum_thread, counter_cache: true 
end 

and ForumThreads is going to need the same thing on :user.

Now before we go test this on the Rails console, we need to make sure that we get the other side of the association set up, so we have has_many :forum_posts here, on the forum thread, and the user has_many :forum_treads, but they also has_many :forum_posts because we created that association as well. So we have those, the user to a forum thread, and the user to a forum post, and we also have the user thread to the forum post. So we have kind of three associations going on there, and they're all kind of interconnected, so that is kind of what you would need for a forum. All of that is good, and we can run our Rails console, we can create our user that we want to test all this with, let's call him "Bob". Bob is going to create a forum thread, so we can create from the association, and automatically set the user_id there if we use that association, and create that first thread like so. And you will notice that the forum_post_count is automatically set to zero, and that works great, and we got that insert, and it set it to zero, and all of that, on a database leve, you'll notice that none of the parameters passed into the insert into was actually that zero, nothing that we had to do controlled that value, so the database level is setting that default to the column, which is nice, but you will also notice that we issued an update to that user, and we took that forum_threads_count` and we added one to it with this (COALESCE("forum_treads_count", 0) + 1). so in the database, it actually took this, coalesced the count of that column, and said: Well if it's null, let's set it to zero, and then let's also add one to it. And all of this is done on a database level so there's no ruby involved for incrementing or decrementing that column, which is really neat. So this then gives us: Bob with a *forum_treads_count of one. So we have that automatically working, we can then do the other type of ForumThread.create, where we go directly through the model. We then need to set of course user to Bob, and the name of that could be "Second thread", and we can create this, it's going to do the exact same thing, and sets up Bob with forum_treads_count as two, so it's automatically updating that variable that we have as well on our user variable. The letter u is our variable for Bob, and that's keeping that up to date on him as well, so that is all working super well, and giving us exactly what we want.

If we were to create a third one, let's go up here and create a third thread, we can then go and do that. Bob is going to add forum_treads_count as three, but if we then go and destroy that.

ForumThread.last.destroy

We will get the delete, and we'll also get an update where it decrements that form_threads_count, so it's keeping track of those when we create and when we destroy automatically, and it's setting those callbacks by simply adding that counter_cache = true option. This of course all works as well with the forum threads and their associated forum posts, so let's create a couple of those. Let's grab that forum_thread_first and it creates one of a forum_post on there, so we'll say:

ForumThread.first.forum_posts.create(user: u, body: "test")

Let's go do that a couple times for that thread, so I've created a bunch of those, and we'll use that to go back in later, and make sure we can then add a counter cache for Bob, all the users that we can keep track of how many posts they've made. So we've created those in the database, but we don't have a counter cache, and we want to add it afterwards, and so this is where a lot of people will be adding counter caches.

So this is the situation where you might have an existing database already, and you want to add counter cache to it, so that you can have performance improvements, for keeping track of that, and displaying that number for each user. But one of the problems is that when you create the forum_post count column on the user, it's going to default to zero, but as we know, Bob already has a bunch of records, so if he deleted one of his forum posts, that would actually go to negative one, because it would be zero by default. So our migration this time needs to be slightly different where we add the column, but we also go and update the value for every user in the database so that we go and count that user's number of posts, and we go set that value to the correct value so that once that is done, everybody has the correct number of forum posts. So let's dive into that.

This is going to look really similar to th way we did it before, we're going to generate a migration instead of a model, and we're going to add

rails g migration AddForumPostsCountToUsers forum_post_count:integer

We can generate the migration, and then we're going to edit it and we're going to do an extra little bit this time, so we're going to go to Vim and open up that new migration, this time we're going to add default: 0, null: false, and we need t go through all of the existing records in the database and update them with the current count. So Rails actually provides a reset_counter method, that you can pass in the id for a record, and then the counter, which would be forum posts, and it will go ahead and update that column for you. So I’ll show you that example, but that can be a lot slower than another option that we'll talk about after that. So let's try this first one out to see how that goes. What we need here, is to find each of the users, and we can take them, and then reset the counters on it. So this syntax is a little odd, you actually go and say User.reset_counters, and we pass in the user id, and the association that we want to count, so forum_posts, and that will loop through each of those users and set that, now remember, we need to go and set up that count on the user's forum post. This does not go on the user (model), this goes not the forum object, so we go to app/models/forum_post.rb, and we put that counter cache on here, so we want to make sure it's on the user side there, and this should loop through each of your users and reset the counter.

Of course we should try this, and see if it works, and let's run

rake db:migrate, and this should take a second to load those users out of the database after it adds that column and to write those values to it. So now if we run

rails console

We should be able to load up our first user, and here we can see that the forum_post_count is actually the value of six. So earlier, I created six forum posts for Bob, so that is updating that correctly to the value that is set in the database, and that works well. Now I'm going to roll back this migration, because I want to show you another option that you can use to make this a little faster. Now in our situation here, this is pretty fast as it is, we only have one user in our database and we only have six forum posts in the database as well. So we have a small database.

If you have a database that's larger, like a hundred thousand records, or, you know, a million records or something like that, you can use raw SQL here to actually improve the performance of this, so that you're not loading all those records into Ruby land, and updating the count, so you can do that all in SQL instead. So let's take a look at how we do that.

I'm going to clear this out, and pull up the browser, and Ryan McGeary wrote a really great post on this, so I want to give him a shout out. And the example that he uses, we can grab the code from that and dump it into our migration. So his example here actually executes a raw SQL update, and it does a nested SELECT, so it will select the count for, in this case the comments on a post, we will change that to work with our forum posts and forum threads or users, and we will go through and update the users, set the forum_post_count and so on, so we'll go make our changes for our models, but we can use this example in order to make a massive improvement in performance. So for example, with 25,000 posts, and 100,000 comments, this took the ruby version of it, 144 seconds, but doing it in regular old SQL took 1.3 seconds in his example. So that's pretty, pretty big performance difference, and something you might want to consider if you're adding this to an already-large application.

So the set-up for his is a little bit different, we are going to pass this in, and you will notice that he uses this reversible method, and this is actually for migrations that are reversible. So you might remember that there used to be a self.up and self.down in your migrations, but that got changed to this change method, so we only have one, and the reason for that is when you migrate up, you want to add columns, and when you migrate down, you want to be able to remove columns or kind of revert your changes, and so change now actually knows if it's going down, it knows the opposite of add_column would be remove_column, and so if you have something that you only want to do up one time, and you don't want to do it going down, you can use this reversible, and you can tell it: Well, if we're going up, then call this code, and if we're going down you could add that as well with dir.down, and you could set up that however you want.

So we want this code to only run when it's going up, and not down, because obviously we'd be trying to update a column that didn't exist, and that wouldn't make much sense.

Here, we can go modify his comments and posts, and we can say: well, we actually want to update users, and we want to do users.id, and our column associating them is user_id, and for comments, we actually want to replace that with forum_post, let's replace all references with the word comments with the word forum_post and do that. So this is now going to update the users, set the forum_post_count column to the result of this subquery. This SELECT, and so that's going to select the count for the forum_posts where the user_id equals the current user. So it's going to go and update every single record in the database and do that subquery, and do that all in one command which will be super nice.

Now is the moment of truth, we can run this in our console, run

rake db:migrate

because we already rolled back we got rid of that other column, and you can see here that this actually includes an execute, and that is running that SQL. So we can see that it is running that SQL command for us, and our migration completed in almost 0.02 seconds, so it's really quick but our database is really really small, so you're not going to see hardly any performance difference between these two, but you would if you had 100,000 comments or 25,000 posts, so that would be a much bigger database to operate on. So this is another option, if you would like to use that to update your counter caches to be set with the correct values as you're adding that column.

The other thing that I want to point out before we leave this episode is that when you use these columns, and when you call count, you should make sure that you actually reference this column every time from now on. So for example, if we want to find out, for the user’s profile, and we maybe want to print out Bob's number of forum posts, what we could do is say

u.forum_posts.count

This is actually going to issue a SELECT query, and we don't want to do that, the reason why we added this column is we want to avoid making this SELECT count queries, so what you want to make usre that you do every single time in your entire codebase from now on is reference those counter cache columns, so always always always use those columns that you added everywhere in your application, so that you're not still selecting counts for these records, otherwise, you're not getting any benefit from it, and there's no reason you added that counter cache.

Here you can see that we ran this, and it did not hit the database whatsoever this time around, that's exactly what we want, and you can use these then in your views and print out that value, or wherever else in your Rails app that you want to display that information, or query with it or whatever. Always use those columns, and you'll get the benefit of that in performance.

That is it for this episode, talked way more about this than I thought I might, but hopefully that was useful and you get your head fully wrapped around it, and you can take a look at the raw SQL update which I really liked, and I thought it was really great to be able to include this episode. So I will talk to you in the next one. Peace v

Transcript written by Miguel

Discussion