Skip to main content

28 ActiveRecord Explain

Episode 173 · February 16, 2017

Learn how to use ActiveRecord Explain to analyze your SQL queries generated by ActiveRecord

ActiveRecord


Transcripts

What's up guys? I want to give you a quick introduction to ActiveRecord explain method which you've probably never used before, this method can be run on any sort of ActiveRecord relation, so anytime that you make a query that returns multiple results, you can use the explain method at the end of it, and it will go ahead and break down that query using your database's explain query, and that will actually take your query, and tell you exactly how your database is going to interpret it, how it's going to plan out that query, and then go and execute it. That way, you can learn about what your database is doing behind the scenes to optimize that and make it more efficient. This is something I don't see people using or talking about very often, so I wanted to talk about that.

This application that I have here, I have two things, I have a model for our users, users have many tweets, tweets belong to a user. Simple as that, our database schema file is just those two tables, no indexes on that notably, and we're going to talk about that and see how the explain method can give you some insights on to your queries. So if we go into our rails console, there's a couple queries I'm going to run, first off, I'm going to say

User.where(id: 1).joins(:tweets).explain

If we call explain to this, it's actually going to run that query, notice it takes about 91 miliseconds, and our query plan is printed out for us, we did a nested loop, and it's going to do an index scan on the user's primary key, so every time that you query on the id, that's going to automatically be indexed, so we do have an index on that one, and that's goint to be cost=0.15 start up time, and the estimated cost is about 8.17, it predicts that it's going to return around one row back, and that row is going to be about 20 bytes in lenght. Then it's going to go and do a sequence scan for all of the tweets where filter is going to match the user_id = 1, and a sequence scan is going to go through every single record on the database because it's not indexed. This is going to take a start up time of about 0, but the cost of the entire sequence scan is predicted to be 8300 which is very very long compared to this 8. So we know that this is about 1000 times slower than doing that index scan for the user where the id = 1, and it's predicting to return about 40,000 rows back, and that's a lot fo rows compared to one. Of course it's going to take longer, but 1,000 times longer? That's very very slow in comparison. Let's do another one

User.first.tweets.explain

It's just going to query the tweet's table, and you'll see that this is going to do pretty much the exact same thing as that sequence scan. Now the width before was 4, and the width now is 34 because we're actually pulling the tweets from the database. Before, we were just using the user columns, and we weren't selecting anything from the tweets column.

Now neither of these are indexed on the tweet's user_id column, and you'll notice that both times we did a WHERE, and the tweets user_id column was referenced. So it would be beneficial for us to index that, and that we can tell because it's doing a sequence scan instead of an index on that. If it had an index, then this would be a lot faster, and we see there's an index scan instead of a sequence scan. Let's run

rake db:migrate

Where I have a migration to run that index to there, and I can show you that, it's just your typical migration where you would add an index, and anyways add_index :tweeets, :user_id Simple as that, that's all we've written. Going back, we have added that index to the tweets, and now if we run rails console, we can do these two queries again and compare those results. This time, it took about 53 miliseconds, and if you were to run these a couple times, you would see some better avarages of the exact time that that takes, and you notice this is maybe getting close to being about half as long with the index version, and that's because it's hitting two indexed now in comparison to before. Plus this time, the cost is actually 1475, as opposed to 8300, which is significantly less of a cost. Now these cost don't really mean anything, they're not miliseconds or anything like that, they're not time measurements, they're estimates on the costs on the calculations it might have to do, so they're kind of relative numbers that you can think of. You can learn more about what all that means by reading into the explain docs. So let's try our other query here.

User.first.tweets.explain

This time, we're going to get that 50 ms boost as well, so instead of 83 ms, we get a pretty significant improvement here, and we also get an index scan on this as well. So this is making a big improvement for us on a milisecond basis, but we also get to confirm that we're hitting those indexes using that explain command. So where is this explain command very useful? It's not super useful in situations like this, although it can tell you if you're missing out on indexes. This is useful for situations where you have really complex queries and you're joining two tables, or if your'e doing nested selects or something like that. Then, you can have postgreSQL, or MySQL or sqlite or whatever database you're using go and explain exactly what it's doing behind the scenes. And if you want to learn more about this stuff, these costs, and rows, and widths, are all explained in the database docs for the explain command. So here it tells us that this first number cost is 0 is the startup time, time to do sorting, something like that if you're doing ordering. Estimated total costs, and this one is 458. This is stated on the assumption that the plan node is run to completion of course, in practice, a node's parent node might stop short of reading all available rows, if you had like LIMIT or something like that. Then, you can see the estimated number of rows it might return, and the width and byte of those rows.

So there's all kind of information you can gain from this, and especially if you inherit a big rails application that's doing some complex queries, if you're not sure how to improve those, and they're slow or whatever, explain can be very very useful for diving into those, and learning how to improve their performance, or it can just be useful to poke around and understand more about what your database is doing behind the scenes.

I would highly reccomend reading these docs, for postgreSQL, if you're using postgreSQL, or the MySQL one, or the sqlite, or whatever database you're using, there's probably equivalence for MongoDB as well, and you can read through this and start to learn about what's the difference between as sequence scan and an index scan, and there's some other ones. Like what happens when you do limits and filters, and what is a bit map heap scan. All kinds of things happens behind the scenes in your database, and this is really really useful to learn about those complexities because you want to understand your database well if you're trying to build an efficient rails application. You can optimize your ruby stuff all you want, but if you're not using your database efficiently, then you're not going to get near the the performance out of your application as you want. So this is something that can be very useful to learn more about how you're using the database and how you can use it better.

I highly encourage you to look at the sqlite, MongoDB, MySQL, postgreSQL, whatever database you're using, take a look at their explain documentation, and see what you can learn about how your database works. It's always useful just to understand this information, and I don't see hardly anyone going and looking at that. So I wanted to make this episode to show you guys that you can learn a lot about how your database works to improve your knowledge of rails. And in the next episode, I'd like to cover a gem called pg_hero which is a performance dashboard for rails and for postgreSQL, that can suggest you indexes you might want to add, slow performing queries, and kind of use this similar knowledge from postgreSQL and puts it in a nice dashboard that you can put into your rails app to improve your queries and your database set up. So until then, I will talk to you later. Peace v

Transcript written by Miguel

Discussion