
Memoization

Episode 111 · March 30, 2016

Use memoization to your advantage to cache results in memory to improve speed


Transcripts

What's up guys? This episode we're doing something a little bit different: we're talking about a plain old Ruby technique called memoization. We're doing this at the request of a subscriber, and I think it's going to turn into a really cool episode if you're not familiar with memoization. If you are, this is pretty standard Ruby stuff, but it's really interesting if you've never used it before. What we're going to talk about is this really simple example here, where you have a model, or a class or whatever, and it does some sort of calculation. For example, in our Mailboxer code we have a name for the user, and every single time you call this name method, it has to create a new string in memory, insert the id into it, and finally return that. This is pretty easy, but it has to generate this name every single time you call the method. It's really fast, so memoizing it isn't going to give us a real performance benefit, but it's an example of a calculation whose result you could save for later so that it would be faster. So let's take a look at what this would look like with a memoized version of it.

For example, the memoized version would look like this:

def name
  @name ||= "#{first_name} #{last_name}"
end

Whatever this calculation is on the right side doesn't really matter, so long as you save it to an instance variable here. The trick is that you're using an instance variable, which defaults to nil. We can take a look at how this works in irb inside of our terminal: we can say @name and it's nil, and this is going to work the same way whether we're inside of a class or not, because this is just a regular instance variable in Ruby. So when you say @name, it returns nil. If you were to use a regular local variable, it would say "undefined local variable", but instance variables don't throw an exception, which is kind of nice, because that allows us to write a memoized method like this. When we know that the instance variable is nil, we can say @name || "First Last", and this is going to say: well, if the name variable is something, return that. Otherwise, return the "First Last" string. If we were to say @name = "Chris Oliver" and then run the same thing, we're going to get "Chris Oliver" returned, because it short-circuits and says: OK, there is a value for name, so let's just return that. That's an important feature of Ruby. Now let's reset name to nil, so we get nil again, and use the or-equals operator, which does the same thing as we did right here: it checks to see if name has a value and returns that; otherwise, it sets it to a new value. So the or-equals operator is just an or and an assignment combined. You could say @name ||= "First Last", and you're going to get "First Last" just like we did the first time; however, the next time you access the @name variable, you're going to get that string.
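
The irb steps above can be replayed as a small script. This is only a sketch: the `OrEqualsDemo` class and its `run` method are made-up names, used so the instance variable lives on an ordinary object rather than in an irb session.

```ruby
# Hypothetical wrapper class so the walkthrough runs as a plain script.
class OrEqualsDemo
  def run
    results = []
    results << @name                    # an unset instance variable reads as nil
    results << (@name || "First Last")  # nil is falsy, so the fallback is returned
    @name = "Chris Oliver"
    results << (@name || "First Last")  # short-circuits on the existing value
    @name = nil
    results << (@name ||= "First Last") # falls back AND assigns the result
    results << @name                    # the assignment stuck
    results
  end
end

p OrEqualsDemo.new.run
# => [nil, "First Last", "Chris Oliver", "First Last", "First Last"]
```

Note that plain `||` never changes `@name`; only `||=` stores the fallback.
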
So this is going to do that or the first time, but it's also going to assign the result to the name instance variable. That means you can do this calculation on the right side where you interpolate strings together, or any sort of work that you want to do on the right side, and cache it in a variable. The next time you call the name method, it simply returns @name if there's a value, and it doesn't do the complicated calculation on the right side. Now for name, this is kind of whatever. It's very, very fast to interpolate strings, but you will save a little bit of time because Ruby doesn't have to go look up those variables every time, insert them into a string, and then save that and return it. That's an introduction to memoization and how it works. You'll recognize it any time you see an instance variable together with the ||= operator. This is probably the only place you're going to see the ||= operator in most cases. Sometimes you'll see it used when you're trying to look up a record or create a new one if the original didn't exist, that type of thing, but most of the time, with an instance variable, you're going to see it used for memoization.
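
To see that the right side really runs only once, here is a minimal sketch outside of Rails. The `Person` class and its `builds` counter are assumptions added for illustration; the counter exists only to make the caching visible.

```ruby
class Person
  attr_reader :builds

  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
    @builds = 0
  end

  def name
    @name ||= begin
      @builds += 1 # bookkeeping only: counts how often the string is built
      "#{@first_name} #{@last_name}"
    end
  end
end

person = Person.new("Chris", "Oliver")
3.times { person.name }
puts person.name   # Chris Oliver
puts person.builds # 1
```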

Where is this actually useful though? Let's say that you have a method called spammer?, and it needs to do a bunch of work to analyze the user's messages in our messaging system and determine if the user is a spammer or not. This is a case where you probably have to do a lot of work: you have to go reference all of the messages and run some analysis on them. We can simulate that just by saying sleep 5. Let's imagine that that's querying the database, running through some algorithms, and calculating whether or not the user is probably a spammer, and then at the very end it returns true or false, so let's say this returns true. If we save this, go back to our terminal, and go into the Rails console this time, we can grab that first user and say user.spammer?, and you'll notice that this takes a really long time, because we have to wait for 5 seconds before it returns true. If you're Google and you're building Gmail or something like that, or even just building your own method, you don't want to wait five seconds in order to generate your entire HTML page or do any work. This should probably happen one time, and then you cache the result, and that is exactly what memoization is great for. If we memoize this, we can run that slow query a single time and then cache the result to make referencing it faster the next time, so let's check this out.
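
The unmemoized method described above can be sketched in plain Ruby roughly like this. The plain `User` class (not an ActiveRecord model), the `seconds_taken` helper, and the shortened delay are assumptions so the script finishes quickly; the episode uses `sleep 5`.

```ruby
class User
  # Shortened stand-in for the episode's `sleep 5`.
  SLOW_WORK_SECONDS = 0.2

  def spammer?
    sleep SLOW_WORK_SECONDS # pretend this is database queries plus analysis
    true
  end
end

# Hypothetical helper: returns how long the given block took, in seconds.
def seconds_taken
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

user = User.new
first  = seconds_taken { user.spammer? }
second = seconds_taken { user.spammer? }
puts format("first call: %.2fs, second call: %.2fs", first, second)
```

Without memoization, both calls pay the full delay.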

If we were to memoize this, the trick is that you usually want to name your instance variable very similarly to the method, because instance variables are shared throughout the instance and you don't want to overlap any of those names and accidentally override one of the other values. Here we could actually say:

def spammer?
  @spammer ||= begin
    sleep 5
    true
  end
end

In the console:

reload!

user = User.first

user.spammer?

We have to wait our five seconds because this is the first time that it goes through that calculation, and we get true. But if we run it again, you'll notice that it instantly returns the true value, and that's because it didn't have to run this block again. It has already saved the value of true into the @spammer variable, so we don't have to run this block a second time, or a third time, or a fourth time, or a fifth time. You can run this method the first time, calculate the result, and then just leave it alone, and as long as that value doesn't need to change, you're safe to continue to cache it. This is nifty, and it's useful any time you need to do these calculations in a request or in real time and then save them, but you don't want to save them permanently. If you actually wanted to keep the result, you would probably save it to the database, but there are a lot of situations where you just want to cache that value in a variable.
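
One detail the example sidesteps by returning true: `||=` treats false and nil as "nothing cached yet", so a spammer? check that returns false would redo its work on every call. A common workaround, sketched here under assumptions (the plain `User` class, the `checks` counter, and the `expensive_check` helper are made up for illustration), is to test `defined?` on the instance variable instead.

```ruby
class User
  attr_reader :checks

  def initialize(spammy = false)
    @spammy = spammy
    @checks = 0
  end

  # `||=` version: a false result is never treated as cached.
  def spammer_with_or_equals?
    @spammer_or ||= expensive_check
  end

  # `defined?` version: caches any result, including false and nil.
  def spammer?
    return @spammer if defined?(@spammer)
    @spammer = expensive_check
  end

  private

  # Stand-in for the slow analysis; counts how often it runs.
  def expensive_check
    @checks += 1
    @spammy
  end
end

user = User.new(false)
2.times { user.spammer_with_or_equals? }
puts user.checks # 2
2.times { user.spammer? }
puts user.checks # 3
```

The first method ran the check on both calls; the second ran it once and then returned the cached false.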

Another example of this that you'll probably be familiar with is any time you want to connect to a third-party API. For example, if we go to the Twitter gem, let's look that up, and let's take a look at the code for setting up the Twitter client. Normally you end up with a block of code like this, so you would put that inside of a method here, let's call this twitter, and you would create a Twitter client like this. Obviously you would swap in your own keys and all of that, but every single time you want to reference the Twitter client, which might be a bunch of times for the user, you have to instantiate a new Twitter client. This is fine, but it has to do some work to set this up: you have to look up your keys, you have to save those into the config, and it has to run this block. There are quite a few method calls happening behind the scenes just to set up this Twitter client every single time you access Twitter. So you could say @twitter ||= Twitter::REST::Client.new do |config|, and that is going to cache the client so that the initialization runs just a single time. Then when you want to interact with the Twitter client, you can just access that: it will build it the one time, it won't create it again on every call, and you'll be able to reuse it a bunch. So memoization gives you a little in-memory cache that you can build with plain old Ruby code and just a single variable. That's pretty nifty: it's really, really simple code to implement, and it works just as you would expect regular Ruby code to work, but you're effectively building your own little cache.
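
Here is a sketch of that pattern that runs without the gem or real credentials. `FakeTwitterClient` is a made-up stand-in; in a real app the right side would be `Twitter::REST::Client.new do |config| ... end` with your own keys, and the instance counter is only there to show the caching.

```ruby
# Stand-in for Twitter::REST::Client so the sketch runs without the gem.
class FakeTwitterClient
  class << self
    attr_accessor :instances_built
  end
  self.instances_built = 0

  attr_accessor :consumer_key, :consumer_secret

  def initialize
    self.class.instances_built += 1
    yield self if block_given? # mirrors the gem's configuration block style
  end
end

class User
  def twitter
    @twitter ||= FakeTwitterClient.new do |config|
      config.consumer_key    = "YOUR_CONSUMER_KEY"    # placeholder value
      config.consumer_secret = "YOUR_CONSUMER_SECRET" # placeholder value
    end
  end
end

user = User.new
first_client  = user.twitter
second_client = user.twitter
puts first_client.equal?(second_client) # true
puts FakeTwitterClient.instances_built  # 1
```
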
So this is really nifty, and while you're not going to get major, major performance benefits out of it, any time you do something like this where it takes more than a second or whatever to run a calculation, caching that value in memory is going to save you time if you reference it multiple times. For example, if you're displaying a user's name 30 or 100 times on a page in a conversation, where you show their name for every single message, memoizing the name is going to save a handful of milliseconds on that page generation, because otherwise you'd be calculating their full name every single time. So there are two caveats to be aware of. Number one is that the memoized calculation still has to run the very first time you access it on each instance. For example, in Rails you're handling a request, and you're instantiating a new user on every single page. You're either looking it up, or you're creating a new user, or updating a record, or whatever. Every single time you're making a new instance of User, so this will have to run the sleep 5 on every single page view. Within that page view, you can call this method multiple times and it will only sleep for five seconds one time, instead of sleeping for five seconds every time you call the method. So in a case like this, if you want the result stored permanently, you're going to want to save it into your database, or Redis, or some other location. Memoization is really useful for stuff that's in memory that you want to cache and reference again quickly, so between page views memoization isn't going to add any speed improvement, but it will every single time you reference a thing more than once.
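
A small sketch of that first caveat, with two objects standing in for two page views. The plain `User` class and the class-level `slow_checks_run` counter are assumptions for illustration; the counter replaces the `sleep 5` so the script runs instantly.

```ruby
class User
  class << self
    attr_accessor :slow_checks_run
  end
  self.slow_checks_run = 0

  def spammer?
    @spammer ||= begin
      self.class.slow_checks_run += 1 # stands in for the slow work
      true
    end
  end
end

# Two "requests", each building its own User instance.
request_one = User.new
3.times { request_one.spammer? }

request_two = User.new
3.times { request_two.spammer? }

puts User.slow_checks_run # 2
```

Six calls triggered the slow work twice: once per instance, not once overall.
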
The second caveat shows up with something like this name method. If you did something like this, which chances are you won't, but if you said: let's create a new user with user = User.new(first_name: "Chris", last_name: "Oliver"), and you call user.name, your result is "Chris Oliver". But if you were to say user.update(first_name: "John"), this time when you call user.name you're going to get the same result as before, because it's cached in the instance variable. There's no good way to update this instance variable, nor should you add one, because memoization should really only be used for things that shouldn't change once they've been calculated. This is a weird case, but sometimes you'll see memoization used in places where it shouldn't be, like this one where you want the result to change, and memoization is bad for those situations. Don't memoize if you're doing things like this. Almost always in Rails, though, you're going to do the updating in the controller, and you'll never call the name method until the name is already settled, so you'll be able to memoize it and display the name a whole bunch of times, and each one of these in this case would be John Oliver, and it would just look like that in your views. So this could be code that you run in your views and that would be totally fine. If you're mixing those method calls and changing these variables behind the scenes, then you're going to get some odd or unexpected results, so you just have to be careful that you're memoizing things that should stay that way, at least for the lifetime of the instance that you're working on.
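
The stale-value problem can be reproduced in plain Ruby. In this sketch a plain `User` class with an ordinary setter stands in for the Rails model and `user.update`; both are assumptions so the example runs without a database.

```ruby
class User
  attr_accessor :first_name, :last_name

  def initialize(first_name:, last_name:)
    @first_name = first_name
    @last_name = last_name
  end

  def name
    @name ||= "#{first_name} #{last_name}"
  end
end

# Reading the name first, then changing it: the cached value is stale.
user = User.new(first_name: "Chris", last_name: "Oliver")
puts user.name           # Chris Oliver
user.first_name = "John" # plain setter standing in for user.update
puts user.name           # Chris Oliver

# Changing it before the first read: the memoized value is what you expect.
other = User.new(first_name: "Chris", last_name: "Oliver")
other.first_name = "John"
puts other.name          # John Oliver
```
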
So keep that stuff in mind. Memoization makes a lot of sense for things that are going to be used a bunch of times. For example, maybe you have a Twitter sync that happens in the background, and you need to make a bunch of API requests to Twitter. Absolutely memoize that, because then you can reference the client a whole bunch of times and make all your API requests without ever having to instantiate the Twitter client more than once, which is super useful. The same probably goes for name. Normally I'll do this with name, because I'm never doing that odd use case of updating the name after I've looked at it. So that is an introduction to memoization. Be aware of the caveats: it can occasionally cause some weird results if you're doing it wrong, so if you want things to change, don't use memoization for those cases. In other situations it makes perfect sense, and it can speed up your code quite a lot, because you don't have to redo all of the work on the right side of the ||= operator. Use this as you see fit, try it out, and see how it improves your code. It's going to make the biggest impact in things like background jobs, rake tasks, or whatever else needs to do a lot of somewhat repetitive work that references a method a bunch of times. I hope you enjoyed this episode, and I will talk to you next week.

Transcript written by Miguel
