<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andy Gnias</title>
    <description>The latest articles on DEV Community by Andy Gnias (@agnias47).</description>
    <link>https://dev.to/agnias47</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F223899%2Fb0e0fc08-43dc-40ba-9768-82c6458bc5e2.jpg</url>
      <title>DEV Community: Andy Gnias</title>
      <link>https://dev.to/agnias47</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/agnias47"/>
    <language>en</language>
    <item>
      <title>Interpreting a Basic .vimrc File</title>
      <dc:creator>Andy Gnias</dc:creator>
      <pubDate>Sun, 03 May 2020 13:32:19 +0000</pubDate>
      <link>https://dev.to/agnias47/interpreting-a-basic-vimrc-file-42cn</link>
      <guid>https://dev.to/agnias47/interpreting-a-basic-vimrc-file-42cn</guid>
      <description>&lt;p&gt;If you're anything like me, you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Like Vim a lot&lt;/li&gt;
&lt;li&gt;Know that a .vimrc file can make Vim work the way &lt;strong&gt;you&lt;/strong&gt; want it to&lt;/li&gt;
&lt;li&gt;Copy-pasted a bunch of commands into your .vimrc to make it functional but don't really know what they all do&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today I decided to clean up my .vimrc to create something that's pretty bare bones, but does what I want and need it to do. Even better, I actually know what each line does!&lt;/p&gt;

&lt;h1&gt;
  
  
  TL;DR: The File
&lt;/h1&gt;

&lt;p&gt;If you're just looking for a starter .vimrc with some comments, copy-paste this file and run away.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;" From: https://github.com/AGnias47/UtilityScripts/blob/master/bash/vimrc

" Don't make efforts to make Vim VI-compatible
set nocompatible

" Turn on filetype detection
:filetype on

" Turn on syntax highlighting if more than 1 color is available
if &amp;amp;t_Co &amp;gt; 1
    syntax enable
endif

" Turn on auto-indentation for C-syntax languages
:au FileType c,cpp,java set cindent

" Show matching brackets
set showmatch

" Set one depending on terminal type
set background=dark
" set background=light

" Makes backspace behave as expected
set backspace=2

" Set the tab key to insert 4 spaces
set tabstop=4
set softtabstop=4
set shiftwidth=4
set expandtab
set smarttab

" Turn on visual wrapping
set wrap

" Hard-wrap lines at 120 characters
set textwidth=120

" Turn on highlighting for searching
set hlsearch

" Show cursor line and column position
set ruler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Details
&lt;/h1&gt;

&lt;p&gt;Below are some of the finer details of what all these commands actually do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vi-Compatibility
&lt;/h2&gt;

&lt;p&gt;A staple of any .vimrc file is &lt;code&gt;set nocompatible&lt;/code&gt;. It gives you the more useful features of Vim at the cost of making it less compatible with Vi. If you've never used Vi, or aren't sure why you would want Vi compatibility, add this to your .vimrc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Syntax
&lt;/h2&gt;

&lt;p&gt;Most of the syntax settings are explained clearly enough with the comments provided (Anything preceded by a &lt;code&gt;"&lt;/code&gt; is a comment). The only outlier here is the &lt;code&gt;if&lt;/code&gt; statement wrapping &lt;code&gt;syntax enable&lt;/code&gt;. It's based on the condition that &lt;code&gt;&amp;amp;t_Co&lt;/code&gt; is greater than 1. &lt;code&gt;t_Co&lt;/code&gt; represents the number of terminal colors, so we're saying, "turn on syntax highlighting as long as we have more than 1 color to work with".&lt;/p&gt;

&lt;h2&gt;
  
  
  Tabs, spaces, and backspaces
&lt;/h2&gt;

&lt;p&gt;Tabs can be a controversial subject. Personally, I find it easiest to go with the most common paradigm, at least in 2020, and use 4 space characters as my indentation marker. This .vimrc is set up so that hitting the tab key will auto-generate 4 spaces and not a tab character. The settings that make that happen are included below. The Vim help guide and &lt;a href="https://tedlogan.com/techblog3.html"&gt;this article&lt;/a&gt; really helped me understand them better.&lt;/p&gt;

&lt;ul&gt;
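&lt;li&gt;tabstop - Number of columns that a tab character displays as&lt;/li&gt;
&lt;li&gt;softtabstop - Number of columns inserted or removed when hitting the tab or backspace key while editing&lt;/li&gt;
&lt;li&gt;shiftwidth - Number of columns to indent when using shift operations (&lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt;) and auto-indentation&lt;/li&gt;
&lt;li&gt;expandtab - Insert spaces instead of a tab character when hitting the tab key&lt;/li&gt;
&lt;li&gt;smarttab - At the start of a line, make the tab key insert blanks according to &lt;code&gt;shiftwidth&lt;/code&gt;&lt;/li&gt;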
&lt;/ul&gt;
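
&lt;p&gt;To make the difference between a literal tab character and expanded spaces concrete, here is a small Python sketch (Python rather than Vim script, purely as an illustration). It uses &lt;code&gt;str.expandtabs&lt;/code&gt; to mimic how one tab renders at different &lt;code&gt;tabstop&lt;/code&gt; widths. The sample lines are made up for the example.&lt;/p&gt;

```python
# Illustration only: mimic how one tab character renders at different tabstop widths.
line_with_tab = "\tx = 1"       # what a file contains when expandtab is off
line_with_spaces = "    x = 1"  # what gets inserted with expandtab on and softtabstop=4

for tabstop in (4, 8):
    rendered = line_with_tab.expandtabs(tabstop)
    print(f"tabstop={tabstop}: {rendered!r}")

# The tab-indented line only matches the space-indented one when the viewer uses tabstop=4.
same_at_4 = line_with_tab.expandtabs(4) == line_with_spaces
same_at_8 = line_with_tab.expandtabs(8) == line_with_spaces
print(same_at_4, same_at_8)
```

&lt;p&gt;The space-indented line looks identical in every viewer, which is the practical argument for &lt;code&gt;expandtab&lt;/code&gt;.&lt;/p&gt;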

&lt;p&gt;The Vim help guide does make a note that setting &lt;code&gt;tabstop&lt;/code&gt; to a value other than 8 can "make your file appear wrong in many places (e.g., when printing it)", but personally I have not had issues with using a different tabstop value.&lt;/p&gt;

&lt;p&gt;The last thing to note here is &lt;code&gt;set backspace=2&lt;/code&gt;. In some instances, the backspace and delete keys will not work as you expect them to. Fortunately, this can typically be resolved through some troubleshooting with the help of the &lt;a href="https://vim.fandom.com/wiki/Backspace_and_delete_problems"&gt;Vim Backspace and delete problems Wiki&lt;/a&gt;. Working on Ubuntu, I found that the backspace key would not erase at the end of a line, and enabling this setting fixed that for me. Per the Vim help, &lt;code&gt;backspace=2&lt;/code&gt; is shorthand for the more readable &lt;code&gt;set backspace=indent,eol,start&lt;/code&gt; suggested in the guide, which should be good enough to resolve most issues for most users.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Obviously there's much more you can do with this file to improve your Vim experience. Many Vim users have shared fantastic, detailed .vimrc files. My only recommendation when using someone else's settings is to understand what their commands actually do, which you can usually find out through a Google search or with the help command in Vim (e.g. &lt;code&gt;:help some_weird_setting_I_dont_understand&lt;/code&gt;). Whether you're just starting out or have been using and loving Vim for a while, hopefully this gives you a greater appreciation of what a basic .vimrc can do for you and how it does it.&lt;/p&gt;

</description>
      <category>vim</category>
    </item>
    <item>
      <title>Analyze Data with Russian Trolls</title>
      <dc:creator>Andy Gnias</dc:creator>
      <pubDate>Mon, 30 Mar 2020 01:45:09 +0000</pubDate>
      <link>https://dev.to/agnias47/analyze-data-with-russian-trolls-3ebk</link>
      <guid>https://dev.to/agnias47/analyze-data-with-russian-trolls-3ebk</guid>
      <description>&lt;p&gt;About two years ago, FiveThirtyEight released a database of about 3 million &lt;a href="https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/"&gt;Russian Troll Tweets&lt;/a&gt;. Being really excited about the possibility of working with data that was relevant in the news, I &lt;a href="https://github.com/fivethirtyeight/russian-troll-tweets"&gt;forked the repository&lt;/a&gt;, where it sat in my GitHub for almost 2 years.&lt;/p&gt;

&lt;p&gt;Recently, I've been getting into some data science related projects, and realized I now had the potential to actually do something with this data. Nothing in this post is as impressive as anything that &lt;a href="https://fivethirtyeight.com/features/what-you-found-in-3-million-russian-troll-tweets/"&gt;other FiveThirtyEight readers were able to do after a week&lt;/a&gt;. However, I think this post could be a good starting point for beginners with some intermediate Python skills to do some cool things with data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependencies
&lt;/h2&gt;

&lt;p&gt;To run the code, you'll need the following third-party packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;matplotlib==3.2.1
wordcloud==1.6.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install them by running&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install matplotlib wordcloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reading the data
&lt;/h2&gt;

&lt;p&gt;The data I used in this project is available in the fivethirtyeight Git repo at this link: &lt;a href="https://github.com/fivethirtyeight/russian-troll-tweets"&gt;https://github.com/fivethirtyeight/russian-troll-tweets&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can clone the repository and copy the CSV files to a local directory, or fork the repository and modify it within your own GitHub account.&lt;/p&gt;

&lt;p&gt;Since the data is in CSV format, I used Python's built-in &lt;code&gt;csv&lt;/code&gt; module to read it in. You can read in all the files, or just pick one and still get some interesting results. Running on a machine with 8GB of RAM and no GPU, my computer crashed when I tried to read all the files. Not ideal. I read in 5 as a compromise, but may move to something with more processing power in the future if I want to expand this project. I put the files I was using in a directory called "data/" and read them like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import csv

host_dir = "data/"
data = list()
for filename in os.listdir(host_dir):
    with open(host_dir + filename, "r") as f:
            raw = csv.DictReader(f, delimiter=",")
            for row in raw:
                data.append(row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reads in the tweets as a list of dicts.&lt;/p&gt;
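
&lt;p&gt;If holding every row in memory is what causes trouble, one possible workaround is to stream the rows instead of building one big list. The following is a minimal sketch under that assumption, not the approach used in the rest of this post. The helper names &lt;code&gt;iter_rows&lt;/code&gt; and &lt;code&gt;count_languages&lt;/code&gt; are hypothetical, and the UTF-8 encoding is an assumption about the files.&lt;/p&gt;

```python
import csv
import os
from collections import Counter


def iter_rows(directory):
    """Yield one dict per CSV row, one file at a time, without building a list."""
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        path = os.path.join(directory, filename)
        # Assumption: the files are UTF-8 encoded
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                yield row


def count_languages(directory):
    """Tally the 'language' column across every CSV file in the directory."""
    return Counter(row.get("language") for row in iter_rows(directory))
```

&lt;p&gt;Calling &lt;code&gt;count_languages("data/")&lt;/code&gt; would then tally languages while only ever holding one row at a time. The trade-off is that anything needing the full list, such as sorting, has to be rethought.&lt;/p&gt;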

&lt;h2&gt;
  
  
  Object Oriented Tweets
&lt;/h2&gt;

&lt;p&gt;To make things simpler for me, I created a Tweet class to handle the data. It's not too complex: it just exposes each dict key of a tweet as an attribute. This isn't necessary, but the code in these examples uses this Tweet class. It wraps each key-value pair with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Tweet:
    def __init__(self, csv_dict):
        self.external_author_id = csv_dict.get("external_author_id")
        self.author = csv_dict.get("author")
        self.content = csv_dict.get("content")
        self.region = csv_dict.get("region")
        self.language = csv_dict.get("language")
        self.publish_date = csv_dict.get("publish_date")
        self.harvested_date = csv_dict.get("harvested_date")
        self.following = csv_dict.get("following")
        self.followers = csv_dict.get("followers")
        self.updates = csv_dict.get("updates")
        self.post_type = csv_dict.get("post_type")
        self.account_type = csv_dict.get("account_type")
        self.retweet = csv_dict.get("retweet")
        self.account_category = csv_dict.get("account_category")
        self.new_june_2018 = csv_dict.get("new_june_2018")
        self.alt_external_id = csv_dict.get("alt_external_id")
        self.tweet_id = csv_dict.get("tweet_id")
        self.article_url = csv_dict.get("article_url")
        self.tco1_step1 = csv_dict.get("tco1_step1")
        self.tco2_step1 = csv_dict.get("tco2_step1")
        self.tco3_step1 = csv_dict.get("tco3_step1")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The attribute descriptions for each key can be found in the &lt;a href="https://github.com/fivethirtyeight/russian-troll-tweets"&gt;fivethirtyeight/russian-troll-tweets repository's ReadMe&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The data is then converted from a list of dicts to a list of Tweet objects with the following list comprehension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tweets = [Tweet(tweet) for tweet in data]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
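
&lt;p&gt;For comparison, here is a sketch of a more generic wrapper that copies whatever columns a row happens to have onto the object. &lt;code&gt;GenericTweet&lt;/code&gt; is a hypothetical name and the sample row is made up. It is not the class used in the rest of this post.&lt;/p&gt;

```python
class GenericTweet:
    """Hypothetical variant of Tweet that copies every CSV column onto the object."""

    def __init__(self, csv_dict):
        for key, value in csv_dict.items():
            setattr(self, key, value)


# Made-up row standing in for one dict produced by csv.DictReader
row = {"author": "example_account", "language": "English", "followers": "42"}
tweet = GenericTweet(row)
print(tweet.author, tweet.followers)
```

&lt;p&gt;The explicit class above is longer, but it documents the expected fields and quietly yields &lt;code&gt;None&lt;/code&gt; for a missing column, whereas this generic version raises an &lt;code&gt;AttributeError&lt;/code&gt; if a column is absent.&lt;/p&gt;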



&lt;h2&gt;
  
  
  Analysis Mode
&lt;/h2&gt;

&lt;p&gt;Now that we've loaded and processed the data, we can start doing something useful with it! Let's start by getting some statistics on the languages used in the tweets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Languages
&lt;/h3&gt;

&lt;p&gt;We can use a Counter object to easily see what languages are used in the tweets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter
languages = Counter([t.language for t in tweets])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us keys indicating all the languages used in the tweets, and an associated value showing how many tweets were written in that language. We can get a count of all the languages used from the keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(f"\nTotal languages used: {len(languages.keys())}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also make a frequency plot of the most commonly used languages using the Counter's "most_common" method, which returns a list of the n most common entries along with their counts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt

languages_to_plot = 6
most_common_languages = languages.most_common(languages_to_plot)
language, count = zip(*most_common_languages)
figure, axes = plt.subplots()
axes.bar(language, count)
plt.title("Languages Used in Tweets")
plt.xlabel("Language")
plt.ylabel("Number of Tweets")
for i, v in enumerate(count):  # Used to plot values onto bars; centering imperfect
    plt.text(i - 0.25, v + (max(count) * 0.01), str(v))
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following plot&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--prDB1i1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jhfwwlx5xpsr7di85y9j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--prDB1i1l--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jhfwwlx5xpsr7di85y9j.png" alt="Alt Text" width="407" height="278"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Followers
&lt;/h3&gt;

&lt;p&gt;Let's take a look at how much influence Russian trolls had on Twitter. We can do this by simply sorting the list based on the followers associated with the accounts that sent out each tweet. Because the tweets are stored as Tweet objects, we'll use a lambda function to specify how to sort the tweets. The lambda function will simply return the followers attribute of the tweet cast as an integer, which will allow the sort method to arrange the tweets based on this metric. Note that I set "reverse=True" to get the accounts with the most followers first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tweets.sort(key=lambda x: int(x.followers), reverse=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I can just pull the first tweet in the list to find the account with the most followers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;highest_followed = tweets[0]
print(f"\nMax followers reached: {highest_followed.followers}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can also use this list to make a bar plot of the troll accounts with the most followers. Instead of using a Counter, I'm just appending the author and their follower count to two separate lists that will have the same index location. Note that I check to make sure the author is not already part of the list before adding them. For example, the top 5 tweets could all be written by the same author, so we need to keep walking the list until we have 5 distinct authors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;authors = list()
followers = list()
for t in tweets:
    if len(authors) == 5:
        break
    if t.author not in authors:
        authors.append(t.author)
        followers.append(int(t.followers))
figure, axes = plt.subplots()
axes.bar(authors, followers)
plt.xticks(rotation=45)
plt.title("Top Followed Accounts")
plt.xlabel("Accounts")
plt.ylabel("Number of Followers")
for i, v in enumerate(followers):  # Used to plot values; centering imperfect
    plt.text(i - 0.25, v + (max(followers) * 0.01), str(v))
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following plot&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RIP-2H5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/njrimpecct0nu9gilpko.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RIP-2H5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/njrimpecct0nu9gilpko.png" alt="Alt Text" width="407" height="343"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  English language tweets
&lt;/h3&gt;

&lt;p&gt;As you can probably tell by this post, I speak English. Therefore, I'm going to pull out just the English language tweets and do some analysis on them. I'll use the built-in filter() function to do this. My lambda function checks that the language attribute is equal to "English", and filter() applies this check to each Tweet in the tweets list. Since filter() returns a lazy filter object, I'll turn it into a list by wrapping the call in list().&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;english_tweets = list(filter(lambda x: x.language == "English", tweets))
print(f"\nPerforming analysis on {len(english_tweets)} English tweets")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
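
&lt;p&gt;The same selection can also be written as a list comprehension, which many Python programmers find easier to read. Here is a toy comparison using made-up (language, content) pairs rather than Tweet objects:&lt;/p&gt;

```python
# Made-up (language, content) pairs standing in for Tweet objects
sample = [("English", "hello"), ("Russian", "privet"), ("English", "news")]

# filter() keeps the items for which the lambda returns True
with_filter = list(filter(lambda pair: pair[0] == "English", sample))

# The equivalent list comprehension
with_comprehension = [pair for pair in sample if pair[0] == "English"]

print(with_filter == with_comprehension)
```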



&lt;p&gt;I want to do some basic analysis on the content of these tweets, so let's first use a list comprehension to get just the tweets themselves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tweet_content = [t.content for t in english_tweets]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, I'm going to split each tweet on spaces so that I can get the individual words used in the tweets. Consecutive spaces produce empty strings, which get purged in a later step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = list()
for tweet in tweet_content:
    for word in tweet.split(" "):
        words.append(word)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is great, as I now have a list containing each word used in these tweets. However, there's probably a bunch of common words in there which I don't really care about, like "a", "an", "the", etc. Also, I need to account for differences in capitalization, and punctuation for words at the end of a tweet. Let's do some data cleaning to account for all these factors.&lt;/p&gt;

&lt;p&gt;First, let's put each word in lowercase using a list comprehension&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = [word.lower() for word in words]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's clean out some words that we don't care about. I can probably find an external library to do this for me, but this project is small enough that I'm just going to create a set of words that I want to purge from my list. I'm using a set over a list because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Checking membership with&lt;br&gt;
&lt;br&gt;
&lt;code&gt;word in some_set&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
is quicker than checking with&lt;br&gt;
&lt;br&gt;
&lt;code&gt;word in some_list&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
because a set uses a hash lookup while a list is scanned item by item.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I only need 1 occurrence of each word, so it more closely resembles the formal definition of a set than a list.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
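
&lt;p&gt;The speed difference that matters here is for membership tests (&lt;code&gt;x in container&lt;/code&gt;). A rough way to see it for yourself is sketched below with made-up data. The exact timings will vary by machine, so none are quoted here.&lt;/p&gt;

```python
import timeit

# Made-up data: 1,000 candidate "common words" and a word that is not among them
candidates = [f"word{i}" for i in range(1000)]
as_list = candidates
as_set = set(candidates)
probe = "not-a-common-word"

# Both containers give the same answer...
print((probe in as_list) == (probe in as_set))

# ...but the list is scanned item by item, while the set uses a hash lookup.
list_seconds = timeit.timeit(lambda: probe in as_list, number=2000)
set_seconds = timeit.timeit(lambda: probe in as_set, number=2000)
print(f"list: {list_seconds:.4f}s  set: {set_seconds:.4f}s")
```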

&lt;p&gt;Here is the set I'm using in its entirety. You can add or remove words for your own purposes as you see fit, and I'll describe why I left some words in later in this post.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;common_words = {
    "-", "~", "&amp;amp;amp;", "a", "an", "the", "on", "to", "is", "for", "and", "of", "you",
    "in", "that", "should", "be", "from", "when", "have", "has", "was", "with", "at",
    "are", "this", "by", "it", "i", "my", "not", "your", "as", "will", "about", "all",
    "who", "they", "his", "out", "but", "up", "our", "like", ":", "|", "people",
    "he", "just", "new", "me", "get", "can", "more", "so", "what", "i'm", "do", "if",
    "or", "via", "their", "&amp;amp;", "don't", "no", "one", "over", "how", "these", "day", "2",
    "want", "back", "still", "only", "some", "says", }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's use this set to filter out those words&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = list(filter(lambda x: x not in common_words, words))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can purge punctuation with another list comprehension&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = [w.replace("?", "").replace(".", "").replace("!", "") for w in words]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally purge out any zero length strings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = list(filter(lambda x: len(x) &amp;gt; 0, words))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
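
&lt;p&gt;The cleaning steps above can also be folded into a single helper. The sketch below is one possible arrangement, not the code used for the plots in this post, and it differs in three ways: it splits on any whitespace, it trims punctuation only from the ends of each word, and it does so before the stop-word check, so that a word like "the." is also dropped. The name &lt;code&gt;clean_words&lt;/code&gt;, the tiny stop-word set, and the sample tweets are all made up for the example.&lt;/p&gt;

```python
import string
from collections import Counter

# Small made-up stop-word set; the post's common_words set would go here instead
STOP_WORDS = {"a", "an", "the", "is", "to"}

# Punctuation to trim from both ends of a word; "#" is kept so hashtags survive
TRIM_CHARS = string.punctuation.replace("#", "")


def clean_words(texts, stop_words=STOP_WORDS):
    """Lowercase, strip surrounding punctuation, then drop stop words and empties."""
    cleaned = []
    for text in texts:
        for raw in text.split():
            word = raw.lower().strip(TRIM_CHARS)
            if word and word not in stop_words:
                cleaned.append(word)
    return cleaned


sample = ["The vote is TODAY!", "Go to the polls. #Vote"]
print(Counter(clean_words(sample)).most_common(3))
```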



&lt;p&gt;Now that we have our words, we can do meaningful things with them. Let's make a bar plot of the top 10 words used. We'll filter out hashtags as they have special significance in Twitter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words_nonhashtags = list(filter(lambda x: x[0] != "#", words))
word_count = Counter(words_nonhashtags)
most_common_words = word_count.most_common(10)
word, word_count_int = zip(*most_common_words)
word = [w.title() for w in word]
figure, axes = plt.subplots()
axes.bar(word, word_count_int)
plt.xticks(rotation=30)
plt.title("Most Frequently Used Words in English Tweets")
plt.xlabel("Word")
plt.ylabel("Number of Occurrences")
for i, v in enumerate(word_count_int):  # Used to plot values; centering imperfect
    plt.text(i - 0.1, v + (max(word_count_int) * 0.01), str(v))
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following plot&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2dnygdwe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bah1n17yusj8uzml3o1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2dnygdwe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/bah1n17yusj8uzml3o1a.png" alt="Alt Text" width="402" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most of these words have some relation to American politics. Note that "Rt", or "RT", is Twitter shorthand for "Retweet". It typically marks a tweet as a repost of someone else's message, or asks followers to repost it, and thus spreads the message further. Other words, like "Her", could be innocuous, or could refer to Hillary Clinton or her "I'm with her" campaign slogan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Word Cloud Generation
&lt;/h3&gt;

&lt;p&gt;Bar graphs are cool, but word clouds are even cooler! Or outdated and tacky, but whatever, let's make one anyway!&lt;/p&gt;

&lt;p&gt;There's a fantastic &lt;a href="https://github.com/amueller/word_cloud"&gt;wordcloud library&lt;/a&gt; readily available, so we can pretty quickly and easily spin up a method to present our data in a way that's easy for those non-technically inclined to understand.&lt;/p&gt;

&lt;p&gt;The cloud itself is generated with the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import wordcloud
cloud = wordcloud.WordCloud(
    width=1000, height=500, max_words=50, background_color="white"
).generate_from_frequencies(word_count)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the width and height give a nice horizontal word cloud for a standard 8.5x11 page. I'm giving the WordCloud my full set of word counts, but limiting the cloud to 50 words with the "max_words" parameter, and setting the background to white with the "background_color" parameter. Then, I pass my Counter object, which maps each word to its frequency, to the "generate_from_frequencies" method.&lt;/p&gt;

&lt;p&gt;Finally, I plot my word cloud as a regular figure and turn the axis off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.figure(figsize=(15,8))
plt.imshow(cloud)
plt.axis("off")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And I get the following&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pdCISUpP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/czfw7u0a1j10rro1kr3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pdCISUpP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/czfw7u0a1j10rro1kr3x.png" alt="Alt Text" width="851" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty neat! With hashtags being so important in Twitter, let's do the same thing we did for words in the previous section with the hashtags included in each tweet. Code is very similar, save for the initial filtering step. This code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hashtags = list(filter(lambda x: x[0] == "#", words))
hashtag_count = Counter(hashtags)
most_common_hashtags = hashtag_count.most_common(10)
hashtag, hashtag_count_int = zip(*most_common_hashtags)
figure, axes = plt.subplots()
axes.bar(hashtag, hashtag_count_int)
plt.xticks(rotation=45)
plt.title("Most Frequently Used Hashtags in English Tweets")
plt.xlabel("Hashtag")
plt.ylabel("Number of Occurrences")
for i, v in enumerate(hashtag_count_int):  # Used to plot values; centering imperfect
    plt.text(i - 0.1, v + (max(hashtag_count_int) * 0.01), str(v))
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gives us this graph&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S2MBKzMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/j3sbsoi5nz0opvly6bon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S2MBKzMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/j3sbsoi5nz0opvly6bon.png" alt="Alt Text" width="401" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hashcloud = wordcloud.WordCloud(width = 1000, height = 500, max_words=50, background_color="white").\
generate_from_frequencies(hashtag_count)
plt.figure(figsize=(15,8))
plt.imshow(hashcloud)
plt.axis("off")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gives us this word cloud&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--77MyzelB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nf7zsufl2f7uakbyxm91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--77MyzelB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nf7zsufl2f7uakbyxm91.png" alt="Alt Text" width="851" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although nothing that was done here involved any in-depth data mining, hopefully this gives you a brief overview of how much can be done with 3rd party plotting tools and some intermediate Python functions. I do plan on taking this process a bit further. I'd like to get some insight on why certain words were included, such as determining the context of the tweets. This could be done by utilizing the "account_type" attribute, which would give a description such as "Right wing troll", or, to a more advanced degree, using natural language processing tools.&lt;/p&gt;

&lt;p&gt;You can see any future progress I make, and my current Jupyter Notebook, in my Git repository &lt;a href="https://github.com/AGnias47/russian-troll-tweets"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
    </item>
  </channel>
</rss>
