Sentiment Analysis with the Sentimental Gem

Episode 117 · May 5, 2016

Learn how to run sentiment analysis against text inside your application and cache the result so you can query against it later

Gems


Transcripts

What's up guys? This episode, we're going to talk about sentiment analysis. We're going to load some tweets into a Rails application, take the text of each tweet, and check how positive or negative it is based on the words in it. Sentiment analysis basically asks: is this person saying they love this thing, hate it, or are they neutral about it? You can take any text, analyze it, and estimate how positive or negative it is. We're going to do a very rudimentary version of this using the sentimental gem, but there's a lot more you could learn, because language is complex. People might be sarcastic, and something like this may not be able to detect that, but as a rough estimate we'll get fairly good accuracy from the words in these tweets. For fun, you might search for something like "Microsoft" and see how many people say "I hate Microsoft" or "I love Microsoft". Based on words like "love" and "hate", you can determine that this word is positive and that word is negative, and every word sits somewhere on that scale.
The sentimental gem comes with a word list that you can look up. It's a pretty large file, but basically it has all these words with values that range from 1.0 to -1.0, which is a way of saying how positive or negative each word in the text you're analyzing is. For example, "love" has a value of 0.925, so it's a very positive word. The values are weighted based on how the word is used in the language: there are times when people say "love" and it's maybe sarcastic, but it's rare for people to say "epic" or "good" or "upright" in a negative way. You can scroll through these weightings and make adjustments or add your own words by downloading the file and loading your own version into the gem when you load the library. We're just going to use the built-in word list. We'll connect this to Twitter, load up some searches, and determine the sentiment for each tweet, and maybe an average across all the tweets. Let's dive into implementing this gem.
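
To make the word-list idea concrete, here's a minimal sketch in plain Ruby. It is not the gem's implementation: the scoring rule (sum the weights of known words, then compare against a threshold) is an assumption for illustration, and every weight except the 0.925 for "love" is made up.

```ruby
# Hypothetical word weights in the -1.0..1.0 range. Only "love" (0.925)
# comes from the episode; the other entries are invented for this sketch.
WORD_SCORES = {
  "love"  => 0.925,
  "good"  => 0.5,
  "hate"  => -0.75,
  "awful" => -0.5
}.freeze

# Add up the weight of every known word; unknown words contribute 0.0.
def score(text)
  text.downcase.scan(/[a-z']+/).sum(0.0) { |word| WORD_SCORES.fetch(word, 0.0) }
end

# Turn the numeric score into one of three labels.
def sentiment(text, threshold: 0.0)
  total = score(text)
  if total > threshold
    :positive
  elsif total < -threshold
    :negative
  else
    :neutral
  end
end

puts score("I love Microsoft")       # 0.925
puts sentiment("I hate Microsoft")   # negative
```

A word list like this can't see sarcasm: "I just love waiting on hold" would still come out positive here, which is the limitation mentioned above.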

Let's create a new Rails application, build a model, and sync tweets into our database using the Twitter gem. Then we'll work against the local copies of the tweets, which makes this a lot easier to test. We'll run rails new sentimental_tweets to create the application. Next, let's open up the Gemfile and add the latest version of the Twitter gem. Once the bundle command has finished, we'll plug in the Twitter client and create a model so that we can save copies of those tweets.

Now that that's finished installing, let's hop back over to the README for sentimental and see what we're going to want to save as attributes on our Tweet model. The gem parses the text that you give it, and when you call sentiment on some text, it gives you back a symbol: positive, negative, or neutral. We can save that, and it would allow for easier searching in our database, but the actual score is useful in a lot of cases, because you can adjust your queries based on the score rather than just the very broad positive, negative, and neutral labels. The score is probably the most important piece to cache. We'll cache the score and the general sentiment so that we only make that calculation one time: we save it into the database and look it up later, because the text of these tweets shouldn't be changing. We'll also set it up so that anytime the text of a tweet does change, we rerun the sentiment analysis and save those attributes on the model again.

Let's generate our model, and then we'll implement the syncing from Twitter and the sentiment analysis itself. Let's generate a scaffold called Tweet with rails g scaffold Tweet body:text sentiment score:decimal, and then run rake db:migrate.

Now we'll implement the Twitter client in order to sync some tweets down. We'll do that really simply using the Twitter gem and the search that it provides. We'll just search for "Microsoft", "Rails", "JavaScript", whatever, sync those tweets to the database, and then do sentiment analysis on them. If you head over to the configuration section in the Twitter README, it gives you an example of how to set up the client.
I'm actually going to put this inside of the Tweet model in a sync method, which is what we'll use to sync the tweets over. Instead of hard-coding all of these strings, I'm going to swap them for Rails application secrets.

app/models/tweet.rb

class Tweet < ApplicationRecord
  def sync
    client = Twitter::REST::Client.new do |config|
      config.consumer_key        = Rails.application.secrets.consumer_key
      config.consumer_secret     = Rails.application.secrets.consumer_secret
      config.access_token        = Rails.application.secrets.access_token
      config.access_token_secret = Rails.application.secrets.access_token_secret
    end
  end
end

The access tokens are the ones that come from your users who have signed in with OAuth. I'm going to hard-code some in the secrets file for myself without doing OAuth, because if you go into the apps section on Twitter you can just grab the tokens they give you there to make testing their API a little bit easier. In the future you'd actually want to pull these from the user model, or wherever you saved the OAuth tokens that you got back from something like OmniAuth.

I just pasted those keys into my secrets.yml off-screen. You can do the same, or pull them from the user model, whatever you would like to do. I'm also going to change this to be a class method so that we can call it on Tweet directly, and we'll pass in a query so that we can do client.search(query). That way, you can use this generic method to sync tweets for any query. We're not going to worry about duplicates or anything like that; we're just going to sync, sync, sync, and you can tweak this to work however you would like. We just need some data in our application to test out the sentiment analysis part. If we run rails c now, we should be able to say Tweet.sync("rails"). That should load up a whole bunch of tweets, and it does. Now we just need to go through each one of these tweets and save them into our database. Here, we're just going to say:

app/models/tweet.rb

class Tweet < ApplicationRecord
  # Class method, so it can be called as Tweet.sync("rails")
  def self.sync(query)
    client = Twitter::REST::Client.new do |config|
      config.consumer_key        = Rails.application.secrets.consumer_key
      config.consumer_secret     = Rails.application.secrets.consumer_secret
      config.access_token        = Rails.application.secrets.access_token
      config.access_token_secret = Rails.application.secrets.access_token_secret
    end

    # Save the text of every matching tweet; duplicates are not filtered out
    client.search(query).each do |tweet|
      create(body: tweet.text)
    end
  end
end

Let's rerun the rails console with the updated sync method and say Tweet.sync("rails"). This inserts a whole bunch of tweets into our database, which is exactly what we want. Now we need to go through those tweets and actually run them through sentiment analysis.

We haven't installed the sentimental gem yet, so let's open up our Gemfile, add the latest version to the bottom, and run bundle to install it. Now that that's installed, let's jump back to the README for sentimental and take a look at what we need to implement in our application. We probably want a global sentiment analyzer, because if we instantiated one and loaded the defaults every time we ran this, we might load the word list into memory a handful of times, which wouldn't be ideal. It depends on how this is implemented behind the scenes, so it might not do that, but since the gem lets you customize the word list you load, chances are that each new analyzer keeps its own copy in memory. You'd have to read the source code to double-check. What we're going to do is load the analyzer globally and have it ready to go anytime we want to analyze text inside our models. We're going to put it inside an initializer, so we'll have:

config/initializers/sentimental.rb

$analyzer = Sentimental.new 
$analyzer.load_defaults
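
The point of the initializer is to load the word list once and reuse it. Here's a minimal sketch of that load-once idea in plain Ruby, using a memoized accessor instead of a global variable. `WordList`, `Analysis`, and the `load_count` counter are hypothetical stand-ins for illustration and are not part of the sentimental gem.

```ruby
# Hypothetical stand-in for an analyzer that is expensive to build.
class WordList
  @load_count = 0

  class << self
    attr_accessor :load_count
  end

  attr_reader :scores

  def initialize
    self.class.load_count += 1      # track how many times the list is loaded
    @scores = { "love" => 0.925 }   # imagine reading a large file here
  end
end

module Analysis
  # Build the word list on first use, then return the same instance.
  def self.word_list
    @word_list ||= WordList.new
  end
end

3.times { Analysis.word_list }
puts WordList.load_count   # 1
```

Either approach keeps a single copy in memory; the global in the initializer just does the loading at boot rather than on first use.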

With that initializer in place, the analyzer will always be ready to go anytime we decide to compute the sentiment for our tweets. Let's run the sentiment analysis in the console to make sure that it's working. Let's grab the last tweet, take its body, and pass it into the analyzer, asking for either the sentiment or the score. Here, we pass in the last tweet's body, and it's marked as positive. If you want to know how positive it is, you can call the score method, which gives us basically 62.5% positive for that tweet.

Now, how should we go about adding these to our models? There's a handful of different ways to do it. We've built a sync method, so when you're syncing a tweet you could simply set the sentiment and the score immediately. The thing is, syncing isn't the only way a tweet gets created. In something like the scaffold, that approach doesn't make as much sense, because you don't want to take the text from the params and pass it into sentiment analysis every time. You might want to do it in a callback instead and say: if the body changed, go ahead and run this. And if you go for a more complex sentiment analysis method, or one that takes more time than this gem does, you might want to run it in a background job so that it happens behind the scenes. I wouldn't add this to your create or update methods, because that doesn't make for a seamless experience. You want the capturing of the sentiment to happen behind the scenes, so that whenever you interact with these objects, it automatically updates when something changes.
That's one of those cases where a callback makes sense: before_save :set_sentiment, if: :body_changed? We only want to set the sentiment if the body changed. That way, if you go and update a tweet, all you have to worry about is setting the body, and this runs automatically. This code can either update the sentiment immediately, or it could launch a background job to update the sentiment if that takes longer with a different implementation. That would make a lot more sense for the majority of cases, but in this case we're going to keep it simple and set the sentiment right there. We're doing this in before_save so that when the record does save, these values are automatically included. Here we can say:

def set_sentiment
  self.sentiment = $analyzer.sentiment(body)
  self.score = $analyzer.score(body)
end
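
The "only when the body changed" behavior can be sketched without Rails. This is a plain-Ruby stand-in, not how Active Record implements dirty tracking: `CachedTweet`, the `analyze` placeholder, and the `analysis_runs` counter are all hypothetical and exist only to show when the analysis runs.

```ruby
# Plain-Ruby stand-in for the model. `analyze` is a placeholder for the
# analyzer calls, and `analysis_runs` counts how often the analysis happens.
class CachedTweet
  attr_reader :body, :sentiment, :analysis_runs

  def initialize(body)
    @body = body
    @body_changed = true   # a new record counts as changed
    @analysis_runs = 0
  end

  def body=(new_body)
    @body_changed = true if new_body != @body
    @body = new_body
  end

  # Mirrors a before_save callback guarded by a body_changed? condition.
  def save
    set_sentiment if @body_changed
    @body_changed = false
    true
  end

  private

  def set_sentiment
    @analysis_runs += 1
    @sentiment = analyze(@body)
  end

  def analyze(text)
    text.include?("love") ? :positive : :neutral
  end
end

tweet = CachedTweet.new("I love Ruby")
tweet.save
tweet.save                   # body unchanged, analysis skipped
tweet.body = "Ruby exists"
tweet.save                   # body changed, analysis runs again
puts tweet.analysis_runs     # 2
```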

If we save the model, reload our Rails console, and run Tweet.sync again (and of course this time we need to make sure we actually pass in a search, so let's search "ruby"), let's see what happens. As you can see, we're inserting a whole bunch of tweets into the database, and the sentiment and the scores are being added automatically. Anytime it inserts one of those, it's going to say: hey, that body is new, so we're going to go and set these, and if you ever edit the body, it will rerun it through the sentiment analysis. Here we can get the last tweet with Tweet.last.sentiment and find out that it's positive. If we want the score for it, we get the score and convert it to a float, and the value for that one was 28.13%. To double-check, let's grab the body and have the analyzer rerun it and give us the score, and we see it's the exact same value. So we know that it's working right: it's saving those to the database, and we're able to use them inside our application. We can go and create scopes around those. Let's create a scope for only the positive ones, and we'll keep the query really simple, based on the sentiment column:

    scope :positive, -> { where(sentiment: :positive) }
    scope :neutral,  -> { where(sentiment: :neutral) }
    scope :negative, -> { where(sentiment: :negative) }
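
To show what these scopes select, and why caching the score allows finer queries than the three labels, here's a small sketch on in-memory data. The `Row` struct and the sample records are hypothetical stand-ins for rows of the tweets table; in the app itself the scopes run as database queries.

```ruby
# Hypothetical records standing in for rows of the tweets table.
Row = Struct.new(:body, :sentiment, :score)

rows = [
  Row.new("I love Ruby",       "positive",  0.925),
  Row.new("Ruby 2.3 released", "neutral",   0.0),
  Row.new("I hate this bug",   "negative", -0.75),
  Row.new("Pretty good gem",   "positive",  0.5)
]

# In-memory equivalent of Tweet.positive.count
positive_count = rows.count { |row| row.sentiment == "positive" }

# Counts per label, similar in spirit to Tweet.group(:sentiment).count
counts = rows.group_by(&:sentiment).transform_values(&:size)

# Filtering on the cached score gives finer cuts than the three labels.
strongly_positive = rows.select { |row| row.score >= 0.75 }.map(&:body)

puts positive_count              # 2
puts counts.inspect
puts strongly_positive.inspect
```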

If we go back to the console we can say Tweet.positive.count, and we have almost 500 tweets that are positive, 120 that are neutral, and 180 that are negative. Overall, ruby has a higher percentage of positive tweets than negative or neutral ones, which is awesome. And just like that, we're able to add sentiment analysis to determine: are people happy about this or not?

You can apply it to any text that you want. Tweets are a fun one to analyze, but you could do this with forum posts, status updates, or any user input you can imagine. It's pretty fun to play with. Maybe you're GitHub and you're wondering how many of the comments on issues are positive or negative. It's a lot of interesting stuff. If you support things like emojis, you might edit your list of positive and negative words to include the text for the emojis, so that a :thumbs_up: also counts as a positive indicator.

You can modify a lot of this to adapt to what your users are doing, because something like this only has a database of its own words that it's currently aware of, and it doesn't get updated a huge amount. It's going to depend on your application, so you might start here and then upgrade to your own word database that you can easily manage and adjust, or your own machine learning, to take this to the next level. This is a great introduction to sentiment analysis. It does a pretty good job for such a basic implementation, and it's going to do more than enough for the majority of use cases unless this becomes your core product or something.
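
The emoji idea mentioned above can be sketched in plain Ruby. This doesn't use the gem's own file format or loading API; the weights, the shortcode names, and the tokenizer are assumptions made for illustration.

```ruby
# Hypothetical base weights; only the word => score idea comes from the episode.
BASE_SCORES = { "love" => 0.925, "hate" => -0.75 }.freeze

# Assumed emoji shortcodes and weights, added alongside the regular words.
EMOJI_SCORES = { ":thumbs_up:" => 0.5, ":thumbs_down:" => -0.5 }.freeze

CUSTOM_SCORES = BASE_SCORES.merge(EMOJI_SCORES).freeze

# Tokenize so that :shortcodes: survive as single tokens.
def tokens(text)
  text.downcase.scan(/:[a-z_]+:|[a-z']+/)
end

# Sum the weights of known tokens; unknown tokens contribute 0.0.
def custom_score(text)
  tokens(text).sum(0.0) { |token| CUSTOM_SCORES.fetch(token, 0.0) }
end

puts custom_score("Nice release :thumbs_up:")   # 0.5
```

The tokenizer matters as much as the list: if it split on the colons, the shortcode would never match an entry.
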
That's the sentimental gem. I hope you enjoyed this episode. It's pretty fun to play with this stuff, and if you want to see more of this in the future, let me know in the comments below. Talk to you in the next one. Peace.

Transcript written by Miguel
