What's up guys? I apologize if I'm off a little bit this episode, I'm recovering from a cold, but yesterday I tweeted out a tweet that said basically: Hey, pay attention and make sure that you're using database indexes or indeces on your database for the queries that you do, because on a somewhat small database I had, it's about 500 megabytes, a query on that that just said: Give me the records in this table, and match these two columns. That would take three seconds in our database, which is really slow, and if you add database indexes to it, it went down to an average something like 2.3 to 3 ms instead, so it was basically 1000 times improvement, which is really impressive. This is one of those features that was not covered enough in most of the tutorials and things, so it's kind of assumed that if you're going to get deeper into SQL stuff, that's not where rails are going to talk about this stuff. You can go learn about SQL on your own, but I want to talk about that, because it's important to actually making fast rails applications. This isn't something that you're going to have to care about for a long time because most of your applications aren't going to hit these performance problems, it requires a fairly good chunk of data in your system, but it can really really improve your query speed. In this episode, we're going to open up a really simple rails application that I've built. There's really nothing to it, let's open up the schema file, and we'll see what we've got here. I have a user's table with an email and a sign-in account and that's it. There's nothing more to it, and I've gone and used the faker gem to create somewhere over two million number of user records. If we open this database file up, this is just our SQLite database, that's about 200MB of user data, so the two million records kind of transfer into a 201MB file in sqlite. You can see that we're already starting to push of pretty large amount of data at two million records, and if we open up this in our console, we can see that the user account is 2,231,077 records, so that's quite a fiew records, and I have an email address that I generated here, that we can go look up the first user, and we'll look up this other user to see what our performance is like for our queries. You can see here that when I did a count, a simple count on the user table, it took 97 ms, and normally if you're doing this in your application, you're probably doing this in development with a small database and you'll notice that that's like 0.1 ms, something super fast, but even grabbing the first record here is actually pretty fast, it's 0.2 ms, so that's really really quick, but if we were to go and do queries on this database, then it becomes a little bit tougher to do performance-wise for the database, let me grab this and paste that in, and so if we try to find the user where the email is morgan, we're going to see some pretty large query times. My example yesterday was that mine were about 10 times slower than this, and I had about a three times larger database. This even gets impacted a little bit more when you also query for multiple things and say: Well, I only want where the sign-in count is 47 or whatever. You will have different tables, you're querying against different columns you're querying against, but the more of these you kind of add in there, the more complicated your queries become, the slower your database ends up being, and indexes are really designed here to solve this problem. It shouldn't take you a third of a second in order to grab the user's that match this email address. That's pretty slow, and it's going to make for a very slow web application, because you're probably going to be doing a handful of queries, and if they're all taking a third of a second, your website is going to take at least a second to generate before it even gets sent back and forth to the user's browser, and that's unbelievably slow. Before we go implement database indexes in our example, let me mention real quick that you're going to want to track your production SQL queries and monitor their performance when this becomes a problem. Number one you can probably just look in your rails logs, if you're tracking the SQL queries in your logs, you can just look at those and check those once in a while and just see how slow they're going, but if you're using something like Heroku, skylight, new relic, any of those tools that actually monitor your database query performance, you can just use those and they often give you tools or easy ways to say: Give me the slowest database queries, and then you can go there and start focusing your time and knocking those of the list. Our example is really simple, we need to index the user table here, and we're probably going to normally be doing queries where the email address needs to be matched, but maybe we're also doing multiple column queries, so we're saying: Where the email is this and the sign-in account is whatever. This is just an example, but if you're searching for different things, you're probably doing blog posts that were published at anytime after this time, and then they're also not in draft status or something like that. You're probably in most cases searching on multiple columns, so we're going to talk about how to add an index for multiple columns and how that benefits you in your database queries.
Rails provides an add_index method that you can call in your database migrations. You probably have seen this in there in a couple places like your devise migrations, but for normal scaffolds and stuff that you do, you probably haven't automatically added those. You don't really need to add indexes, until you know exactly how your application is going to function so you can add the proper indexes in. If you add too many indexes, every time you have an index, it has to keep updating the main table, it has to update the index, anytime you insert a record it has to do it multiple times or update a record as well, so you actually have to keep an eye on how many indexes you're addin because number one, you're eating up more storage space, and number two you're slowind down your inserts and your updates in your database. While this improves query performance reads, it actually slows downs writes a little bit. You're getting the tradeoff there, and in most cases, you're doing heavier on the reads, so it's ok to add some indexes in there, just don't go too overboard with it unless you know exactly what you need. The add_index method really just requires two things. Number one is the table name and number two is the columns you want to index. If you're querying on a single field all the time like an id in the database, that's automatically going to be index. So the primary keys are automatically indexed for your records, but if you query on something like an email address because your users log in with their emai, then you want to add an index for say the user's table and the email column. If you're using user names to log in, you would want to index the username column, or if you allow both, you might want to add separate indexes for each one so you could query those as well. If there's cases where you actually query for two columns at the same time and you say: where the email address is this and active is true, then you can actually create an index that includes both of those, and in the second parameter you would pass in an array of the multiple column names. This would create an index on two columns, and it would allow you to make those queries faster. In our example here, if we did the query on the email and sign in count all the time, then we could acually say: Let's index both of these together and then that will make that query much much faster. This example here also has the unique option turned on, and that basically is going to say: If you applied this to your user model or table, and you had the email column is unique, then it would allow you to never be able to insert the same email into the database multiple times. If you ever tried to sign up twice with the same email address that was already in the database, it's going to give you an error on the database level, and then you can also do the rails validation for uniqueness on top of that, so you can enforce your data on validations inside the database. They're just a little bit more simple in your database than they would be in rails, but it's always good to be able to do that when you know for sure that you never ever want duplicate for example emails in your users table.
Transcript written by Miguel
Hey Chris, this is exactly the sort of thing I would love to see more of. There are so many resources out there for building a rails app in development but very few resources like this to give us tips on production. Keep them coming! Thanks
Oh really good topic! I usually try to avoid it as much as I can, but there's always uses for it because you'll never be able to fully express SQL with an ORM.
I was just reading this post on doing trending posts in Postgres and it was one of those great places where you'd want to drop down to raw SQL to create those functions and things that you could reference later with ActiveRecord: http://sorentwo.com/2013/12...
I'll have to definitely do a screencast on this.
What do you guys think about the lol_dba?
I found it got all the basic relations I needed and then I built on top of. My apps production performance is crazy fast now!