<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lorena</title>
    <description>The latest articles on DEV Community by Lorena (@lorena).</description>
    <link>https://dev.to/lorena</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F588430%2F879f28ef-f2fc-41da-a044-68989cfd22f7.JPG</url>
      <title>DEV Community: Lorena</title>
      <link>https://dev.to/lorena</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lorena"/>
    <language>en</language>
    <item>
      <title>How hyperautomation will transform business operations</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Fri, 11 Feb 2022 16:02:51 +0000</pubDate>
      <link>https://dev.to/n8n/how-hyperautomation-will-transform-business-operations-39jd</link>
      <guid>https://dev.to/n8n/how-hyperautomation-will-transform-business-operations-39jd</guid>
      <description>&lt;p&gt;Are you trying to convince your manager to invest (more) in automation? Or are you the manager who went a bit over budget with automation tools and you're wondering if it was worth it? Or maybe you're just exploring the vast space of workflow automation tools and wondering: "Is this the future of real work, or is it just fantasy?"&lt;/p&gt;

&lt;p&gt;In this post we’ll share with you the &lt;strong&gt;key facts you need to know about hyperautomation&lt;/strong&gt;: what it is, why it is important, what are examples of hyperautomation tools, how businesses can use them, and how hyperautomation is predicted to evolve in the next few years.&lt;/p&gt;

&lt;h4&gt;
  
  
  Table of Contents
&lt;/h4&gt;

&lt;p&gt;What is hyperautomation?&lt;br&gt;
How are businesses leveraging hyperautomation?&lt;br&gt;
What is the future of hyperautomation?&lt;br&gt;
     1. Orchestrated automation processes&lt;br&gt;
     2. Automation marketplaces&lt;br&gt;
     3. Vendor-agnostic hyperautomation&lt;br&gt;
     4. Infrastructure automation&lt;br&gt;
Start automating!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is hyperautomation?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.gartner.com/en/information-technology/glossary/hyperautomation" rel="noopener noreferrer"&gt;Gartner defines&lt;/a&gt; &lt;strong&gt;hyperautomation&lt;/strong&gt; as "a business-driven, disciplined approach that organizations use to rapidly identify, vet and automate as many business and IT processes as possible. Hyperautomation involves the orchestrated use of multiple technologies, tools or platforms."&lt;/p&gt;

&lt;p&gt;Examples of &lt;strong&gt;hyperautomation tools&lt;/strong&gt; are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no-code/low-code application platforms (N/LCAP)&lt;/li&gt;
&lt;li&gt;workflow automation tools (WAT)&lt;/li&gt;
&lt;li&gt;robotic process automation (RPA)&lt;/li&gt;
&lt;li&gt;Artificial Intelligence (AI) and Machine Learning (ML) &lt;/li&gt;
&lt;li&gt;chatbots and conversational agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.statista.com/statistics/1234927/worldwide-hyperautomation-enabling-software-market/" rel="noopener noreferrer"&gt;The hyperautomation-enabling software market has been rising in the past two years and is expected to reach $596bn in 2022.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How is hyperautomation different from automation?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Automation&lt;/strong&gt; refers to the accomplishment of a specific task without manual or human intervention. For example, you can use a no-code &lt;a href="https://n8n.io/workflows/791" rel="noopener noreferrer"&gt;workflow that creates tickets from form submissions&lt;/a&gt; automatically, instead of doing this manually. Automation is well-suited for &lt;a href="https://n8n.io/features-of-tasks-that-can-be-automated" rel="noopener noreferrer"&gt;repetitive, boring, regular, rule-based, software-based, and time-consuming tasks&lt;/a&gt; at small scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hyperautomation&lt;/strong&gt; refers to the combination and connection of several automated workflows, thus creating an orchestrated automation. This orchestration feature takes automation to the &lt;em&gt;hyper&lt;/em&gt; level, allowing businesses to scale individual processes. Taking the example above a step further, you can add this workflow alongside an ML model or service that detects the sentiment of user reviews, a chatbot that assists customers, an application that processes text from invoices, and a database synchronization to keep the information up-to-date -- all these forming a hyperautomated business.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do we need hyperautomation?
&lt;/h2&gt;

&lt;p&gt;The examples above highlight the two main benefits of hyperautomation: &lt;strong&gt;increased productivity&lt;/strong&gt; and &lt;strong&gt;seamless scaling of business operations&lt;/strong&gt;. Without automation orchestration, business departments risk working out of sync, thus impacting the overall progress and costs of the organization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.gartner.com/en/webinars/4007544/the-gartner-2022-predictions-hyperautomation-inclusive-of-rpa-low-code-" rel="noopener noreferrer"&gt;Gartner&lt;/a&gt; points out that hyperautomation is driven by two forces: &lt;strong&gt;operational excellence&lt;/strong&gt; and &lt;strong&gt;digital acceleration&lt;/strong&gt;. Operational excellence is reflected in profits (businesses being able to deliver faster or cheaper), whereas digital acceleration is reflected in adoption (attracting more customers at a faster pace).&lt;/p&gt;

&lt;p&gt;To get to that point, organizations can go two ways. Traditionally, they can increase productivity by hiring department-specific people or IT-skilled engineers who can set up automations. Alternatively, they can adopt hyperautomation tools that let existing employees build automations themselves. In that case, IT teams become fusion teams, where employees with different skills can directly contribute to the automation processes needed in their department.&lt;/p&gt;

&lt;p&gt;In short, hyperautomation helps organizations to save costs, increase efficiency, and overall improve their business model. On an individual level, employees whose tedious tasks are automated have more time to focus on meaningful and creative work, which in turn increases their job satisfaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  How are businesses leveraging hyperautomation?
&lt;/h2&gt;

&lt;p&gt;According to a &lt;a href="https://www.gartner.com/en/documents/4006716-gartner-s-2021-digital-business-acceleration-survey-the-speed-of-the-game-has-increased" rel="noopener noreferrer"&gt;Gartner study&lt;/a&gt;, &lt;strong&gt;businesses have on average 4 automation processes.&lt;/strong&gt; This number seems low even for small businesses, considering how many individual tasks are on the to-do lists of employees in every department. However, 80% of senior business executives say they will spend more on digital initiatives in 2022, aiming to accelerate their business (65%) and go to market faster (71%).&lt;/p&gt;

&lt;p&gt;Businesses in all industries can leverage the power of hyperautomation. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E-Commerce&lt;/strong&gt; can automate almost the entire journey, from announcing product launches, sending and analyzing emails, and issuing invoices to tracking packages, running inventories, and notifying customers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IT&lt;/strong&gt; can automate DevOps and SecOps use cases like contributions to a repository, critical incident response, or vulnerability disclosure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is the future of hyperautomation?
&lt;/h2&gt;

&lt;p&gt;Gartner foresees four trends in hyperautomation in the next few years.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Orchestrated automation processes
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"By 2024 diffuse (siloed) approach to hyperautomation initiatives will drive up initiative specific total cost of ownership by 40-fold, making adaptive governance a differentiating factor in financial performance."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Think of how many different apps and services you use in your daily job, and how many more your colleagues from other departments use as well. In fact, 78% of business professionals use three tools from different categories to accomplish their daily tasks, which is not really practical or efficient.&lt;br&gt;
In the next few years, businesses will try to turn these disparate (disconnected) automations into orchestrated (connected) hyperautomation workflows. For example, you can have one workflow that synchronizes data between Pipedrive (used by Sales) and HubSpot (used by Marketing).&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Automation marketplaces
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"By 2024, growth of “automation marketplaces” will propel 80% of the large enterprises to pivot to principles of composability to minimize operational interdependencies and maximize value of hyperautomation initiatives."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Businesses will change the way they deliver their digital products, replacing packaged applications (like individual projects or products) with composed applications (in the style of catalogs or markets). Think of these "automation marketplaces" as curated, interactive exhibitions. &lt;/p&gt;

&lt;p&gt;For example, in a workflow automation marketplace, you would not only see a list of integrations, but also sort them by industry or function, try out automation templates, and learn from supportive content, maybe even tailored to your personal role.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Vendor-agnostic hyperautomation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"By 2024, the lack of standardization and uniformity in vendor pricing structures will continue driving 40% of clients to increase hyperautomation vendor-agnostic business capabilities."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's hard to find the one tool that ticks all the boxes: affordable price, powerful functionality, intuitive UI, blazing speed. Commonly, businesses are willing to compromise on some features for the sake of simplicity (keeping their processes on one platform), at the risk of vendor lock-in.&lt;/p&gt;

&lt;p&gt;Hyperautomation tools can diminish this risk, since they make it possible to interconnect apps and services. As a consequence, businesses will move away from a commitment-based model to a consumption-based model, preferring to combine features of different tools until they match their use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Infrastructure automation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"By 2024, 40% of organizations will use managed service provider hyperautomation offerings to fill infrastructure operations gaps fortifying a foundation for TCO and scaled automation."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The COVID-19 pandemic has forced many companies to go into remote work mode. But with employees working around the world, it's challenging to ensure smooth connectivity, data security, incident response management, timely decision-making, and ongoing support and maintenance.&lt;/p&gt;

&lt;p&gt;To create a solid infrastructure for these processes, businesses will rely on hyperautomation tools. For example, you can build no-code workflows for automatic &lt;a href="https://n8n.io/blog/learn-to-automate-your-factorys-incident-reporting-a-step-by-step-guide/" rel="noopener noreferrer"&gt;incident response&lt;/a&gt; or &lt;a href="https://n8n.io/blog/database-monitoring-and-alerting-with-n8n/" rel="noopener noreferrer"&gt;database monitoring&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start automating!
&lt;/h2&gt;

&lt;p&gt;Going back to the questions in the introduction, we hope the information in this post helps you make a compelling case for automation in your workplace, rest assured that your investment in automation tools is worth every penny, and dare to explore the hyperautomation space.&lt;/p&gt;

&lt;p&gt;One more thing: 3% of hyperautomation professionals characterize their organization as having a high impact on hyperautomation governance. Are you going to be among them?&lt;/p&gt;

</description>
      <category>automation</category>
      <category>nocode</category>
    </item>
    <item>
      <title>Building a dockerized ETL pipeline for streaming tweets in Slack</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Fri, 04 Feb 2022 10:19:01 +0000</pubDate>
      <link>https://dev.to/lorena/building-a-dockerized-etl-pipeline-for-streaming-positive-tweets-2ngh</link>
      <guid>https://dev.to/lorena/building-a-dockerized-etl-pipeline-for-streaming-positive-tweets-2ngh</guid>
      <description>&lt;p&gt;One of the projects in my Data Science Bootcamp was about creating a database of tweets, along with their sentiment score, and post positive tweets in a Slack channel. This pipeline had to be orchestrated with &lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;. The pipeline looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46volx0ia8yp5ie24js1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46volx0ia8yp5ie24js1.png" alt="schema ETL pipeline" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, I'll show you how I set up each step.&lt;/p&gt;

&lt;h2&gt;
  
  
  0. Prerequisites &amp;amp; tech stack
&lt;/h2&gt;

&lt;p&gt;Here's an overview of the apps, services, and libraries I used in this project:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Apps &amp;amp; Databases&lt;/th&gt;
&lt;th&gt;Python libraries&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Twitter&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.tweepy.org/" rel="noopener noreferrer"&gt;&lt;code&gt;tweepy&lt;/code&gt;&lt;/a&gt; &amp;amp; &lt;a href="https://pypi.org/project/vaderSentiment/" rel="noopener noreferrer"&gt;&lt;code&gt;vader&lt;/code&gt;&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack&lt;/td&gt;
&lt;td&gt;&lt;code&gt;slackclient&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://account.mongodb.com/account/register" rel="noopener noreferrer"&gt;MongoDB&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pymongo.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;&lt;code&gt;pymongo&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.postgresql.org/download/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.psycopg.org/docs/install.html" rel="noopener noreferrer"&gt;&lt;code&gt;psycopg2-binary&lt;/code&gt;&lt;/a&gt; &amp;amp; &lt;a href="https://www.sqlalchemy.org/" rel="noopener noreferrer"&gt;&lt;code&gt;sqlalchemy&lt;/code&gt;&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker-Compose&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. Collecting tweets
&lt;/h2&gt;

&lt;p&gt;To collect tweets, I used the &lt;a href="https://developer.twitter.com/en/docs/twitter-api" rel="noopener noreferrer"&gt;Twitter API&lt;/a&gt; along with the &lt;code&gt;tweepy&lt;/code&gt; library.&lt;/p&gt;

&lt;p&gt;First, I &lt;a href="https://developer.twitter.com/en/docs/apps/overview" rel="noopener noreferrer"&gt;created an app on Twitter&lt;/a&gt; and got my credentials (API key and Access Token). Then, I wrote the &lt;a href="https://github.com/lorenanda/tweets-docker-pipeline/tree/main/docker-compose/tweet_collector" rel="noopener noreferrer"&gt;Python code for streaming live tweets&lt;/a&gt;, using &lt;code&gt;tweepy&lt;/code&gt; with my Twitter credentials. I chose to stream the hashtag &lt;em&gt;#OnThisDay&lt;/em&gt; (I thought it would be interesting to get a daily notification of what happened years ago) and collected the tweet text and user handle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tweepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OAuthHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;API&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tweepy.streaming&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StreamListener&lt;/span&gt;

&lt;span class="c1"&gt;# `t` is the decoded JSON payload of one streamed tweet.&lt;/span&gt;
&lt;span class="n"&gt;tweet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;screen_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# `api` is a tweepy API object created with the Twitter credentials.&lt;/span&gt;
&lt;span class="n"&gt;stream_listener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StreamListener&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stream_listener&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;track&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OnThisDay&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
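&lt;p&gt;As a minimal sketch of the extraction step, assuming each streamed tweet arrives as a JSON string with the &lt;code&gt;user.screen_name&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; fields used above, the dictionary can be built like this. The helper name &lt;code&gt;parse_tweet&lt;/code&gt; and the sample payload are made up for illustration and are not part of the original project.&lt;/p&gt;

```python
import json


def parse_tweet(raw_data):
    """Keep only the user handle and the text of one streamed tweet."""
    t = json.loads(raw_data)
    return {
        'username': t['user']['screen_name'],
        'text': t['text'],
    }


# Hypothetical payload, trimmed to the two fields used in this post.
sample = '{"user": {"screen_name": "history_bot"}, "text": "#OnThisDay something happened"}'
print(parse_tweet(sample))
```

&lt;p&gt;Keeping only these two fields keeps the stored documents small; any other field from the payload could be added to the dictionary in the same way.&lt;/p&gt;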



&lt;h2&gt;
  
  
  2. Storing tweets in MongoDB
&lt;/h2&gt;

&lt;p&gt;After collecting the tweets, I had to store them in MongoDB, a non-relational (NoSQL) database that stores data in JSON-like documents. Since the tweet data is collected as key-value pairs (JSON format), MongoDB is a good way to store this information.&lt;/p&gt;

&lt;p&gt;First, I had to create a MongoDB instance, set up a cluster, and create a database and a collection within it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create a MongoDB account&lt;/li&gt;
&lt;li&gt; Set up a cluster: &lt;em&gt;cloud.mongodb.com &amp;gt; Clusters &amp;gt; Create New Cluster&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; Create a database: &lt;em&gt;Cluster &amp;gt; Collections &amp;gt; Create Database&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; Create a collection: &lt;em&gt;Cluster &amp;gt; Collections &amp;gt; Database &amp;gt; Create Collection&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; Create a field: &lt;em&gt;Collection &amp;gt; Insert document &amp;gt; Type the field &lt;code&gt;text&lt;/code&gt; below &lt;code&gt;_id&lt;/code&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; Allow access to the database: &lt;em&gt;Project &amp;gt; Security &amp;gt; Network Access &amp;gt; IP Access List &amp;gt; Add your IP address.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; Connect to the database from your terminal:&lt;br&gt;
&lt;code&gt;mongo "mongodb+srv://YourClusterName.mongodb.net/YourDatabaseName" --username YourUsername&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Second, I wrote the &lt;a href="https://github.com/lorenanda/tweets-docker-pipeline/tree/main/docker-compose/tweet_collector" rel="noopener noreferrer"&gt;Python code for storing tweets in MongoDB&lt;/a&gt; using the &lt;code&gt;pymongo&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mongo_container&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;27018&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tweets_db&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;warning_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;TWEET: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; just tweeted: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onthisday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The host &lt;code&gt;mongo_container&lt;/code&gt; is one of the Docker containers, explained in section 5.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Performing ETL job
&lt;/h2&gt;

&lt;p&gt;The ETL (Extract, Transform, Load) job involves three actions: extracting tweets from MongoDB, analyzing their sentiment, and storing them into a new Postgres database. Here is the &lt;a href="https://github.com/lorenanda/tweets-docker-pipeline/tree/main/docker-compose/etl_job" rel="noopener noreferrer"&gt;Python code for the ETL job&lt;/a&gt;.&lt;/p&gt;
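&lt;p&gt;The three actions above can be sketched as one pass of a small driver function. This is only an illustration: &lt;code&gt;run_etl&lt;/code&gt; is a hypothetical name, and the lambdas are stand-ins for the MongoDB, VADER, and Postgres steps described in the next sections.&lt;/p&gt;

```python
def run_etl(extract, transform, load):
    """Run one ETL pass: pull a tweet, score it, store it."""
    tweet = extract()
    if tweet is None:
        return False  # nothing to process yet
    sentiment = transform(tweet)
    load(tweet, sentiment)
    return True


# Stand-ins for the real steps (assumptions for illustration only).
stored = []
done = run_etl(
    extract=lambda: {'text': 'What a great day'},
    transform=lambda tweet: 0.6,
    load=lambda tweet, sentiment: stored.append((tweet['text'], sentiment)),
)
print(done, stored)
```

&lt;p&gt;Passing the three steps in as functions keeps each one testable on its own, without a running database.&lt;/p&gt;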

&lt;h3&gt;
  
  
  3.1. Extracting tweets from MongoDB
&lt;/h3&gt;

&lt;p&gt;To extract the tweet texts from MongoDB, I again used the &lt;code&gt;pymongo&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="c1"&gt;# `db` is the pymongo database handle from step 2.&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_tweets&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;tweets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;onthisday&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Random tweet: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2. Transforming tweets with sentiment scores
&lt;/h3&gt;

&lt;p&gt;To analyze the sentiment of the tweets, I used the &lt;code&gt;VADER&lt;/code&gt; library, which returns (among other values) a compound sentiment score between -1 (most negative) and +1 (most positive).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vaderSentiment.vaderSentiment&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentimentIntensityAnalyzer&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform_tweets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;tweet_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;sia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentimentIntensityAnalyzer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tweet_sia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;polarity_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet_text&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;compound&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tweet_sia&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
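&lt;p&gt;To decide which tweets count as positive, the compound score can be mapped to a label. The sketch below uses the thresholds suggested in the VADER documentation (0.05 and -0.05); the function name &lt;code&gt;label_sentiment&lt;/code&gt; is hypothetical and was not part of the original project.&lt;/p&gt;

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return 'positive'
    if compound > -threshold:
        return 'neutral'
    return 'negative'


for score in (0.6, 0.0, -0.4):
    print(score, label_sentiment(score))
```

&lt;p&gt;A stricter threshold, such as 0.5, would forward only clearly positive tweets to Slack.&lt;/p&gt;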



&lt;h3&gt;
  
  
  3.3. Loading tweets into PostgreSQL
&lt;/h3&gt;

&lt;p&gt;To load the tweets with their sentiment scores into a Postgres database, you first need, well, a Postgres database. I installed Postgres, then created a database and a table for tweets right from the terminal:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect to Postgres: &lt;code&gt;psql&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Create a database: &lt;code&gt;createdb twitter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Go into the created database: &lt;code&gt;psql twitter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Create a table with two columns: &lt;code&gt;CREATE TABLE tweets (text varchar(280), score numeric(4,3));&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then, I wrote the Python code for inserting tweets into the &lt;code&gt;tweets&lt;/code&gt; table, using the &lt;code&gt;sqlalchemy&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_tweets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;insert_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    INSERT INTO tweets VALUES (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{tweet[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, {sentiment});
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;insert_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Tweet &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; loaded into Postgres.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
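&lt;p&gt;Building the SQL statement as a string breaks as soon as a tweet contains an apostrophe. A safer pattern is a parameterized query, where the database driver escapes the values. The sketch below shows the idea with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in for Postgres, so it runs without a database server; with &lt;code&gt;psycopg2&lt;/code&gt; the placeholder would be &lt;code&gt;%s&lt;/code&gt; instead of &lt;code&gt;?&lt;/code&gt;. The function name &lt;code&gt;load_tweet&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption for illustration).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tweets (text varchar(280), score numeric(4,3))')


def load_tweet(conn, tweet, sentiment):
    """Insert one tweet; the driver escapes the values, apostrophes included."""
    conn.execute(
        'INSERT INTO tweets VALUES (?, ?)',
        (tweet['text'], sentiment),
    )
    conn.commit()


load_tweet(conn, {'text': "It's a great day"}, 0.625)
rows = conn.execute('SELECT text, score FROM tweets').fetchall()
print(rows)
```

&lt;p&gt;With this approach the tweet text can be stored unchanged, apostrophes included.&lt;/p&gt;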



&lt;h2&gt;
  
  
  4. Extracting tweets from Postgres
&lt;/h2&gt;

&lt;p&gt;With a database of tweets and their sentiment scores in place, I had to select and extract &lt;em&gt;some&lt;/em&gt; tweets that would be sent to Slack. The query below picks the tweet with the highest score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# `pg` is the connection to the Postgres database.&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT text, score FROM tweets ORDER BY score DESC LIMIT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NEW TWEET! &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sentiment score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
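
&lt;p&gt;Converting the whole result with &lt;code&gt;str(list(query))&lt;/code&gt; puts the list and tuple brackets into the message. A minimal sketch of unpacking the row into plain values instead, again with &lt;code&gt;sqlite3&lt;/code&gt; standing in for Postgres; the function names and the sample tweets are made up for illustration.&lt;/p&gt;

```python
import sqlite3


def most_positive_tweet(connection):
    """Return (text, sentiment) of the top-scoring tweet, or None if the table is empty."""
    cursor = connection.execute(
        "SELECT text, sentiment FROM tweets ORDER BY sentiment DESC LIMIT 1"
    )
    return cursor.fetchone()


def format_message(text, sentiment):
    """Build the Slack message from plain values rather than a stringified list."""
    return f"NEW TWEET! {text}\nSentiment score: {sentiment}"


# In-memory table with two invented tweets
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tweets (text TEXT, sentiment REAL)")
connection.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [("Mondays are rough", -0.4), ("Loving this pipeline", 0.9)],
)

row = most_positive_tweet(connection)
message = format_message(*row)
```

&lt;p&gt;Selecting the &lt;code&gt;sentiment&lt;/code&gt; column together with the text also means the score shown in Slack comes from the same row as the tweet.&lt;/p&gt;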



&lt;h2&gt;
  
  
  5. Posting tweets with a Slackbot
&lt;/h2&gt;

&lt;p&gt;The last step in the pipeline is posting tweets in a Slack channel. To do this, first I &lt;a href="https://slack.com/intl/en-de/help/articles/115005265703-Create-a-bot-for-your-workspace" rel="noopener noreferrer"&gt;created a Slackbot&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Then, I wrote the Python code for posting tweets in a Slack channel, including the code from the previous step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;slack&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PG_ENGINE&lt;/span&gt;
&lt;span class="n"&gt;webhook_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WEBHOOK_SLACK&lt;/span&gt;


&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Positive tweet:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT text FROM tweets ORDER BY sentiment DESC LIMIT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NEW TWEET! &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; just tweeted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sentiment score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;blob_score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;webhook_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
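
&lt;p&gt;A Slack incoming webhook expects a JSON body with a &lt;code&gt;text&lt;/code&gt; field. The sketch below separates building that body from sending it, and adds a timeout so a hanging request cannot stall the loop. It uses the standard-library &lt;code&gt;urllib&lt;/code&gt; rather than &lt;code&gt;requests&lt;/code&gt;, and both function names are hypothetical.&lt;/p&gt;

```python
import json
import urllib.request


def build_payload(text, sentiment):
    """Return the JSON-serialisable body for a Slack incoming webhook."""
    return {"text": f"NEW TWEET! {text}\nSentiment score: {sentiment}"}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises an HTTPError for 4xx and 5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the payload needs no network access, so it is easy to check locally
payload = build_payload("Loving this pipeline", 0.9)
```

&lt;p&gt;Calling &lt;code&gt;post_to_slack&lt;/code&gt; with your own webhook URL would send the message; it is left out here so the sketch runs without network access.&lt;/p&gt;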



&lt;p&gt;And 🎉 –– here's the tweet that was posted in Slack:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.com%2Fassets%2Fimg%2Ftweetbyslackbot.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.com%2Fassets%2Fimg%2Ftweetbyslackbot.webp" alt="" width="800" height="400"&gt;&lt;/a&gt;Tweet posted by Slackbot&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Creating the Docker Compose pipeline
&lt;/h2&gt;

&lt;p&gt;The final touch of this project is &lt;em&gt;orchestration&lt;/em&gt;. The individual Python scripts for each step work when you run them manually, but the goal is to run this pipeline from beginning to end with only one command. This is where &lt;strong&gt;Docker Compose&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each of the five previous steps (or the rectangles in my messy schema) represents a &lt;a href="https://www.docker.com/resources/what-container" rel="noopener noreferrer"&gt;&lt;strong&gt;Docker container&lt;/strong&gt;&lt;/a&gt;, so in my &lt;code&gt;docker_compose.yml&lt;/code&gt; file I had five containers (services): &lt;code&gt;tweet_container&lt;/code&gt;, &lt;code&gt;postgres_container&lt;/code&gt;, &lt;code&gt;mongo_container&lt;/code&gt;, &lt;code&gt;etl_container&lt;/code&gt;, and &lt;code&gt;slackbot_container&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;For the two database containers, I used &lt;a href=""&gt;Docker images&lt;/a&gt;, since they didn't depend on custom code stored in my project folders. For the other three containers, I referenced the respective code location (&lt;code&gt;build&lt;/code&gt;) and their dependencies (&lt;code&gt;depends_on&lt;/code&gt;). For example, &lt;code&gt;tweet_container&lt;/code&gt; depends on &lt;code&gt;postgres_container&lt;/code&gt; and &lt;code&gt;mongo_container&lt;/code&gt;, since the tweets are stored in these databases.&lt;/p&gt;

&lt;p&gt;I also used &lt;a href="https://docs.docker.com/storage/volumes/" rel="noopener noreferrer"&gt;Docker &lt;code&gt;volumes&lt;/code&gt;&lt;/a&gt; to keep the data when the containers are stopped (data persistence).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tweet_container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tweet_collector/&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_container&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mongo_container&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./tweet_collector/:/app&lt;/span&gt;

  &lt;span class="na"&gt;postgres_container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresdb&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:13.0&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;5555:5432&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_USER=your_user&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=your_password&lt;/span&gt;

  &lt;span class="na"&gt;mongo_container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mongodb&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mongo&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;27018:27018&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./mongodb:/app&lt;/span&gt;

  &lt;span class="na"&gt;etl_container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;etl_job/&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_container&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mongo_container&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./etl_job/:/app&lt;/span&gt;

  &lt;span class="na"&gt;slackbot_container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;slackbot/&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mongo_container&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_container&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./slackbot/:/app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
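
&lt;p&gt;One thing to keep in mind: &lt;code&gt;depends_on&lt;/code&gt; controls the order in which containers start, but it does not wait until a database is ready to accept connections. A common workaround is a small retry loop in the Python scripts. The sketch below is one possible version; &lt;code&gt;wait_until_ready&lt;/code&gt; and &lt;code&gt;fake_connect&lt;/code&gt; are hypothetical names, and the fake connection only simulates a database that needs a moment to come up.&lt;/p&gt;

```python
import time


def wait_until_ready(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Call check() until it stops raising, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay)


# Simulated database that refuses the first two connection attempts
calls = {"count": 0}


def fake_connect():
    calls["count"] += 1
    if calls["count"] != 3:
        raise ConnectionError("database not ready yet")
    return "connected"


# The sleep function is replaced so the example finishes instantly
result = wait_until_ready(fake_connect, sleep=lambda seconds: None)
```

&lt;p&gt;In a real script, &lt;code&gt;check&lt;/code&gt; would be a function that opens a connection to Postgres or MongoDB.&lt;/p&gt;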



&lt;p&gt;Finally, here are some of the CLI commands I used for managing the Docker containers (you can find more in &lt;a href="https://docs.docker.com/engine/reference/commandline/cli/" rel="noopener noreferrer"&gt;their docs&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker images&lt;/code&gt; to list all the used images (postgres and mongo)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker ps -a&lt;/code&gt; to list all my containers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker run -v&lt;/code&gt; to mount volumes when starting a container&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker build&lt;/code&gt; to build an image from a Dockerfile&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker run&lt;/code&gt; to run the containers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that was it: my very first dockerized ETL pipeline –– a week's work and a few hours of writing packed into a 6-minute blog post.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>datascience</category>
      <category>database</category>
    </item>
    <item>
      <title>How to create animated scatterplots with seaborn and imageio</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Sun, 23 Jan 2022 18:20:02 +0000</pubDate>
      <link>https://dev.to/lorena/how-to-create-animated-scatterplots-with-seaborn-and-imageio-23bk</link>
      <guid>https://dev.to/lorena/how-to-create-animated-scatterplots-with-seaborn-and-imageio-23bk</guid>
      <description>&lt;p&gt;In this quick tutorial, I'll show you how to create an animated scatterplot using the libraries &lt;code&gt;matplotlib&lt;/code&gt; or &lt;code&gt;seaborn&lt;/code&gt; and &lt;code&gt;imageio&lt;/code&gt;. Here's the &lt;a href="https://github.com/lorenanda/animated-scatterplot/" rel="noopener noreferrer"&gt;GitHub repo of this project&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The scatterplot illustrates the relationship between life expectancy and fertility rate of the world's countries from 1960 to 2015, based on the &lt;a href="https://www.gapminder.org/tag/download-data/" rel="noopener noreferrer"&gt;Gapminder data set&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;country&lt;/th&gt;
&lt;th&gt;year&lt;/th&gt;
&lt;th&gt;population&lt;/th&gt;
&lt;th&gt;life_expectancy&lt;/th&gt;
&lt;th&gt;fertility_rate&lt;/th&gt;
&lt;th&gt;continent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Afghanistan&lt;/td&gt;
&lt;td&gt;1800&lt;/td&gt;
&lt;td&gt;3280000.0&lt;/td&gt;
&lt;td&gt;28.21&lt;/td&gt;
&lt;td&gt;7.0&lt;/td&gt;
&lt;td&gt;Asia&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The animated scatterplot is essentially a sequence of static plots, one per year, shown in quick succession. Here's how to create the animation step by step:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Create static scatterplots for each year in the data set.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The scatterplots depict &lt;code&gt;life_expectancy&lt;/code&gt; on the x axis and &lt;code&gt;fertility_rate&lt;/code&gt; on the y axis. To make the plots more informative, the size of each point reflects the country's &lt;code&gt;population&lt;/code&gt; and its color indicates the &lt;code&gt;continent&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt; 
&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatterplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;life_expectancy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fertility_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;continent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;population&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;sizes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gapminder_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gapminder_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;palette&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Set2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;center&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Life expectancy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fertility rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Export the scatterplot images to a designated folder.&lt;/strong&gt;&lt;br&gt;
You need to save all the individual scatterplots so that you can combine the images in the next step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;imageio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/path/to/folder/images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
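
&lt;p&gt;Note that &lt;code&gt;os.mkdir&lt;/code&gt; fails if a parent folder in the path does not exist yet. If that can happen in your setup, &lt;code&gt;os.makedirs&lt;/code&gt; with &lt;code&gt;exist_ok=True&lt;/code&gt; is a possible alternative. The sketch below uses a temporary directory and made-up folder names so that it does not touch any real files.&lt;/p&gt;

```python
import os
import tempfile

# Temporary base directory so the sketch does not touch real folders
base = tempfile.mkdtemp()
folder = os.path.join(base, "bootcamp", "images")

# Creates any missing parent folders and does nothing if the folder exists
os.makedirs(folder, exist_ok=True)
os.makedirs(folder, exist_ok=True)  # calling it again raises no error
```

&lt;p&gt;This also replaces the explicit &lt;code&gt;os.path.exists&lt;/code&gt; check, since an existing folder is simply left as it is.&lt;/p&gt;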



&lt;p&gt;&lt;strong&gt;3. Join the individual images in chronological order.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lifeexp_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
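
&lt;p&gt;Appending the images inside a loop over the years keeps them in chronological order automatically. If you instead rebuild the animation later from the saved files, the order in which a folder listing returns them is not guaranteed, so it may help to sort by the year in the file name. A small sketch, assuming the &lt;code&gt;lifeexp_{year}.png&lt;/code&gt; naming used here; the helper names are hypothetical.&lt;/p&gt;

```python
import re


def year_of(filename):
    """Extract the four-digit year from a name like lifeexp_1960.png."""
    match = re.search(r"_(\d{4})\.png$", filename)
    if match is None:
        raise ValueError(f"no year found in {filename}")
    return int(match.group(1))


def chronological(filenames):
    """Return the file names sorted by the year embedded in each name."""
    return sorted(filenames, key=year_of)


# File names in an arbitrary order, as a folder listing might return them
frames = chronological(
    ["lifeexp_2015.png", "lifeexp_1960.png", "lifeexp_1987.png"]
)
```

&lt;p&gt;The sorted names can then be read one by one with &lt;code&gt;imageio.imread&lt;/code&gt; and appended to the &lt;code&gt;images&lt;/code&gt; list.&lt;/p&gt;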



&lt;p&gt;&lt;strong&gt;4. Export the scatterplot sequence as a GIF.&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;fps&lt;/code&gt; (frames per second) parameter sets the speed of the animation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mimsave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scatterplot.gif&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
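
&lt;p&gt;To get a feeling for what a given &lt;code&gt;fps&lt;/code&gt; value means, you can estimate the length of the animation: with one frame per year, the duration is the number of years divided by the frame rate. A quick sketch of that arithmetic, with a hypothetical helper name:&lt;/p&gt;

```python
def gif_duration_seconds(first_year, last_year, fps):
    """Length of the animation when each year contributes one frame."""
    frame_count = last_year - first_year + 1  # both end years are included
    return frame_count / fps


# 56 frames (1960 to 2015) at 20 frames per second
duration = gif_duration_seconds(1960, 2015, fps=20)
```

&lt;p&gt;Here the animation runs for 2.8 seconds; halving &lt;code&gt;fps&lt;/code&gt; to 10 would double that to 5.6 seconds.&lt;/p&gt;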



&lt;p&gt;Now, putting everything together, here's the full code and the animated scatterplot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt; 
&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;imageio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/home/lorena/Documents/bootcamp/W1/images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1960&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scatterplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;life_expectancy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fertility_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;hue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;continent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;population&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;sizes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gapminder_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gapminder_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;palette&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Set2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;center&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;black&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#plt.title(f'inspired by Hans Rosling', loc='right', fontsize=10, color='grey', style='italic', pad=-20)
&lt;/span&gt;
    &lt;span class="c1"&gt;#plt.legend(bbox_to_anchor=(0.74, 0.85), loc='center')
&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Life expectancy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Fertility rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#plt.annotate({country}, )
&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;lifeexp_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;savefig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;imageio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mimsave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;scatterplot.gif&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrnnz9x5yhoo5vbveszd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrnnz9x5yhoo5vbveszd.gif" alt="animated scatterplot" width="432" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>tutorial</category>
      <category>python</category>
      <category>datavisualization</category>
    </item>
    <item>
      <title>5 ways to keep your skills fresh after finishing a coding bootcamp</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Sun, 28 Nov 2021 09:37:25 +0000</pubDate>
      <link>https://dev.to/lorena/5-ways-to-keep-your-skills-fresh-after-finishing-a-coding-bootcamp-22i5</link>
      <guid>https://dev.to/lorena/5-ways-to-keep-your-skills-fresh-after-finishing-a-coding-bootcamp-22i5</guid>
      <description>&lt;p&gt;One year ago at this time, I was nervously making last-minute changes to slides for my &lt;a href="https://github.com/lorenanda/speech-emotion-recognition"&gt;final project&lt;/a&gt; of a Data Science Bootcamp.&lt;/p&gt;

&lt;p&gt;Today, I work as a technical writer at a startup that is developing a low-code workflow automation tool. Though in this role I don't use my data science and Python skills on a daily basis, I still apply them occasionally in data analyses and personal projects.&lt;/p&gt;

&lt;p&gt;In this post, I'll share with you five tips for maintaining and even developing your coding skills after you're done with formal education.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Improve your school projects
&lt;/h2&gt;

&lt;p&gt;Bootcamps are fast-paced. So much so that you might barely complete some projects before the deadline, and even if you do, they won't be perfect. There will always be things left to improve, and you should take the time to do them.&lt;/p&gt;

&lt;p&gt;One way to improve your projects and coding skills is to try new models and libraries. For example, if you did classification with logistic regression, try a random forest too; if you used &lt;a href="https://www.tensorflow.org/" rel="noopener noreferrer"&gt;TensorFlow&lt;/a&gt;, now try &lt;a href="https://keras.io/" rel="noopener noreferrer"&gt;Keras&lt;/a&gt;; if you scraped a website with &lt;a href="https://beautiful-soup-4.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;BeautifulSoup&lt;/a&gt;, now do it with &lt;a href="https://scrapy.org/" rel="noopener noreferrer"&gt;Scrapy&lt;/a&gt;. You get the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Work on new projects
&lt;/h2&gt;

&lt;p&gt;Even with all the mandatory projects you'll need to complete in the bootcamp, you'll probably get a lot of ideas for others. After the bootcamp is the time to explore them!&lt;/p&gt;

&lt;p&gt;Ideally, work on real-life projects or some that have business value for the field you're targeting. There are many data sets available for marketing, finance, medicine, and other fields. Find a relevant data set and apply different models to derive insights from raw numbers. For more ideas, check out the &lt;a href="https://www.kaggle.com/" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt; data sets and competitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Code regularly
&lt;/h2&gt;

&lt;p&gt;What you do every day matters. Small things add up: actions turn into habits, and habits turn into skills. That's why it's important to code regularly. It doesn't have to be a complex project; even a 15-minute coding session or a short exercise counts.&lt;/p&gt;

&lt;p&gt;For example, you can block one hour every Saturday to practice algorithms and data structures on &lt;a href="https://leetcode.com/" rel="noopener noreferrer"&gt;LeetCode&lt;/a&gt;, &lt;a href="https://www.codewars.com/" rel="noopener noreferrer"&gt;Codewars&lt;/a&gt;, or &lt;a href="https://www.hackerrank.com/" rel="noopener noreferrer"&gt;HackerRank&lt;/a&gt;. You'll not only sharpen your coding skills, but also get a confidence boost as you progress through levels and get badges.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Keep on learning
&lt;/h2&gt;

&lt;p&gt;In data science, machine learning, and AI, research and applications are advancing fast! New papers, models, libraries, and business applications are coming out almost every day. That's why it's important to keep up with the news and advances in the field.&lt;/p&gt;

&lt;p&gt;There are many resources for this. You can read blogs (like &lt;a href="https://towardsdatascience.com/?gi=73ee6fa159ba" rel="noopener noreferrer"&gt;Towards Data Science&lt;/a&gt; and &lt;a href="https://www.datasciencecentral.com/" rel="noopener noreferrer"&gt;Data Science Central&lt;/a&gt;) and &lt;a href="https://arxiv.org/list/stat.ML/recent" rel="noopener noreferrer"&gt;papers&lt;/a&gt;, watch videos (for NLP enthusiasts I recommend the YouTube channel &lt;a href="https://www.youtube.com/c/AICoffeeBreak" rel="noopener noreferrer"&gt;AI Coffee Break with Letitia&lt;/a&gt;), listen to podcasts, and take online courses.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Document your learnings
&lt;/h2&gt;

&lt;p&gt;I find the best way to learn something is by teaching it. Explaining something to others helps you structure the learned information and identify problems or issues that are unclear. &lt;/p&gt;

&lt;p&gt;To document your learnings, you can create a blog on &lt;a href="https://lorenaciutacu.medium.com/" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; or &lt;a href="http://dev.to/lorena"&gt;dev&lt;/a&gt;, where you write about your projects. If you're into web development, you can even build your own blog (I made mine with &lt;a href="https://jekyllrb.com/" rel="noopener noreferrer"&gt;Jekyll&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;With these five tips in mind, keep on coding and learning!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>career</category>
      <category>beginners</category>
      <category>python</category>
    </item>
    <item>
      <title>6 features of tasks that can be automated 📑</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Sat, 20 Nov 2021 07:52:57 +0000</pubDate>
      <link>https://dev.to/n8n/6-features-of-tasks-that-can-be-automated-5am8</link>
      <guid>https://dev.to/n8n/6-features-of-tasks-that-can-be-automated-5am8</guid>
      <description>&lt;p&gt;You're working long hours now, six in the morning to six in the evening. Sometimes even eight in the evening, six days a week. Sometimes seven days a week. It's a long hustle but it keeps you busy.&lt;/p&gt;

&lt;p&gt;Busy but unfulfilled, because many of the things you do are plain boring, repetitive, unengaging, and could probably be done (better) by a machine. If you've found your way to this article, it means you've had it with manual work and you're ready to start automating at least part of it. Congratulations, welcome to the future of work!&lt;/p&gt;

&lt;p&gt;There are plenty of things in our daily lives that surely would rather be automated than half-heartedly accomplished by a bored human who, mind you, might even make mistakes. Workflow automation platforms like n8n enable you to automate even complex tasks with no code (or a bit of JavaScript, if you insist).&lt;/p&gt;

&lt;p&gt;But how do you decide where to begin? In this post, I present to you &lt;strong&gt;six features of tasks that can (and should) be automated&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Repetitive tasks
&lt;/h2&gt;

&lt;p&gt;Download this from here, upload it there, write this, click that. Repeat 10 times a day, 50 times a year, until the end of time. Or until you discover automation, because tasks like this shouldn't be accomplished manually anymore.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://docs.n8n.io/courses/level-one/chapter-2.html" rel="noopener noreferrer"&gt;getting your daily news&lt;/a&gt;, &lt;a href="https://docs.n8n.io/getting-started/create-your-first-workflow/daily-weather-notifications/" rel="noopener noreferrer"&gt;checking the weather&lt;/a&gt;, &lt;a href="https://n8n.io/workflows/1222" rel="noopener noreferrer"&gt;creating backups of your work&lt;/a&gt; (this is also a reminder to do it!), or &lt;a href="https://n8n.io/blog/learn-how-to-automatically-cross-post-your-content-with-n8n/" rel="noopener noreferrer"&gt;cross-posting articles on different channels&lt;/a&gt; are all activities that can be automated with no code.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Boring tasks
&lt;/h2&gt;

&lt;p&gt;"My favorite part of my job is copy-pasting data from one file into a spreadsheet!", said no one ever. More likely, you wish you could be doing anything but that. If a task is unengaging, it doesn't involve decision-making, higher-order thinking, creativity, or "the human touch", then it's a good candidate for automation.&lt;/p&gt;

&lt;p&gt;For example, a common activity in sales is collecting information about companies (like the number of employees, industry, and location) from the website of a business event, in order to create contacts or leads in a CRM. Instead of manually copy-pasting data, you can create &lt;a href="https://n8n.io/blog/how-uproc-scraped-a-multi-page-website-with-a-low-code-workflow/" rel="noopener noreferrer"&gt;a workflow that does web-scraping&lt;/a&gt;, data transfer, and even &lt;a href="https://n8n.io/workflows/1055" rel="noopener noreferrer"&gt;email validation&lt;/a&gt; for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Frequent, regular tasks
&lt;/h2&gt;

&lt;p&gt;It's the end of yet another month and you need to calculate yet another budget for your business or run an inventory of your products and orders. You need to set a reminder and block a full day to get this job done–a day when you could be doing more exciting work or even taking a holiday.&lt;/p&gt;

&lt;p&gt;If the task has to be done at the same time or interval and it involves the same sequence of steps, then it could be automated. In fact, we've already built &lt;a href="https://n8n.io/workflows/1207" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; for the use case mentioned above.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpn00wko9fl5eob4pw9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpn00wko9fl5eob4pw9r.png" alt="Workflow for creating backups on GitHub" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Rule-based tasks
&lt;/h2&gt;

&lt;p&gt;Most automation-friendly tasks are rule-based, meaning they follow a logical sequence of steps in the form of "if A, then B, else C". This is the kind of low-level decision-making that can be established by a human and, if you know that the process won't change, delegated to a machine.&lt;/p&gt;

&lt;p&gt;For example, filtering sales orders based on their value was a boring task that our friend &lt;a href="https://docs.n8n.io/courses/level-one/chapter-3.html" rel="noopener noreferrer"&gt;Nathan had to do for his team&lt;/a&gt;, before we taught him how to automate it. In the same way, you could &lt;a href="https://n8n.io/blog/no-code-ecommerce-workflow-automations/" rel="noopener noreferrer"&gt;automate your e-commerce business&lt;/a&gt;, for example by &lt;a href="https://n8n.io/workflows/1075" rel="noopener noreferrer"&gt;filtering positive and negative reviews&lt;/a&gt; or &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;issuing invoices&lt;/a&gt;.&lt;/p&gt;
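
&lt;p&gt;As a minimal sketch of such an "if A, then B, else C" rule, filtering orders by value could look like the snippet below. The order data, the field names, and the threshold of 1000 are made up for illustration, and this is plain Python rather than an n8n workflow:&lt;/p&gt;

```python
# Hypothetical rule: orders worth 1000 or more go to the "high" list,
# everything else goes to the "low" list.
HIGH_VALUE_THRESHOLD = 1000


def route_order(order, threshold=HIGH_VALUE_THRESHOLD):
    """Return the name of the list an order belongs to."""
    if order["value"] >= threshold:  # if A ...
        return "high"                # ... then B
    return "low"                     # ... else C


def split_orders(orders, threshold=HIGH_VALUE_THRESHOLD):
    """Group orders into 'high' and 'low' buckets by applying the rule."""
    buckets = {"high": [], "low": []}
    for order in orders:
        buckets[route_order(order, threshold)].append(order)
    return buckets


# Made-up sample data, only to show the rule in action.
orders = [
    {"id": 1, "value": 250},
    {"id": 2, "value": 1000},
    {"id": 3, "value": 4300},
]
buckets = split_orders(orders)
print([o["id"] for o in buckets["high"]])  # [2, 3]
print([o["id"] for o in buckets["low"]])   # [1]
```

&lt;p&gt;Once the rule is written down this explicitly, it no longer needs a human to apply it, which is what makes the task a candidate for automation.&lt;/p&gt;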

&lt;h2&gt;
  
  
  5. Software-based tasks
&lt;/h2&gt;

&lt;p&gt;Did you know that U.S. users had &lt;a href="https://www.statista.com/statistics/267309/number-of-apps-on-mobile-phones/" rel="noopener noreferrer"&gt;on average 20 apps&lt;/a&gt; installed on their mobile phones? And that organizations worldwide were using &lt;a href="https://www.statista.com/statistics/1233538/average-number-saas-apps-yearly/" rel="noopener noreferrer"&gt;on average 80&lt;/a&gt; software as a service (SaaS) applications? Now think of all the tasks that you're doing daily and how many of those involve transferring or synchronizing data between different apps, work that doesn't need any human input.&lt;/p&gt;

&lt;p&gt;For example, if you need to sync data between your CRM and a database, you can create a no-code workflow for that and let the two systems communicate with each other. Don't act as an intermediary machine; you're just making it awkward.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Time-consuming tasks
&lt;/h2&gt;

&lt;p&gt;I don't know about you, but for me, the most annoying thing about boring tasks is that they are time-consuming. Fine, life and work are not always rainbows and butterflies, sparkling with creativity and meaningful activities. I can do some brainless tasks for a while if needed, but when they start taking up hours of my precious time–that's where I draw the line and reach for n8n.&lt;/p&gt;

&lt;p&gt;For example, in one of my previous roles, I was responsible for creating reports, which involved aggregating data from different sources (Google Analytics, BigQuery, Salesforce, Postgres), calculating some custom metrics, and sending the results to management or clients. This kind of reporting could take up to two hours, every month/quarter/year–or only a few minutes once to set up &lt;a href="https://n8n.io/workflows/892" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; in n8n.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1e11fcp5xrgtxqmf8as.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1e11fcp5xrgtxqmf8as.png" alt="Workflow for running inventories on Shopify orders" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, you've learned how to identify tasks that can be automated. To sum up, automatable tasks are &lt;strong&gt;repetitive, boring, regular, rule-based, software-based, and time-consuming&lt;/strong&gt;. Keep this in mind next time you're working on something, and try to automate it!&lt;/p&gt;

</description>
      <category>automation</category>
      <category>nocode</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Detecting emotions from speech with neural networks in Python</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Fri, 12 Nov 2021 12:11:44 +0000</pubDate>
      <link>https://dev.to/lorena/detecting-emotions-from-speech-with-neural-networks-in-python-3ioe</link>
      <guid>https://dev.to/lorena/detecting-emotions-from-speech-with-neural-networks-in-python-3ioe</guid>
      <description>&lt;p&gt;During a data science bootcamp, I built a machine learning model that detects emotions from speech (pre-recorded files and live-recorded voices). The code is available on &lt;a href="https://github.com/lorenanda/speech-emotion-recognition" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;This has been one of the most challenging projects I've worked on, but also the most exciting. In this post, I'll walk you through my project: from planning and choosing a data set to building machine learning models and evaluating their performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project planning
&lt;/h2&gt;

&lt;p&gt;First and foremost, I designed a project plan, after having a brief look at the data set. From my work experience and the assignments completed in the past three months, I've learned that this step is crucial for the success of a coding project. Planning helps me (and the team) organize my ideas, break down the big project into smaller tasks, identify issues, and track the progress -- and not despair at the amount of work to be done in a short time. &lt;/p&gt;

&lt;p&gt;For this purpose, I created a simple &lt;a href="https://en.wikipedia.org/wiki/Kanban_(development)" rel="noopener noreferrer"&gt;Kanban board&lt;/a&gt; directly in the GitHub repository of my project, so that I have the code and tasks in one place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2021%2F01%2Fscreenshot_2020-12-11-lorenanda-speech-emotion-recognition.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2021%2F01%2Fscreenshot_2020-12-11-lorenanda-speech-emotion-recognition.png" alt="Project board in GitHub" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;Project board in GitHub&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To create a project board linked to a repository in GitHub:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your desired repository, click on the tab &lt;code&gt;Projects&lt;/code&gt;, then on &lt;code&gt;Create project&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Enter the &lt;code&gt;Project board name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;(Optional) Enter a &lt;code&gt;Description&lt;/code&gt; of the project and select a &lt;code&gt;Project template&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Click on &lt;code&gt;Create project&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Data set
&lt;/h2&gt;

&lt;p&gt;I used the &lt;strong&gt;RAVDESS data set&lt;/strong&gt;, which contains 1440 audio files. These are voice recordings of 24 actors (12 male, 12 female) who say two sentences in two different intensities (normal and strong) with eight intonations that express different emotions: calm, happy, sad, angry, fearful, surprised, disgusted, and neutral. There are 192 recordings for each emotion, except for neutral, which doesn't have recordings in strong intensity. &lt;/p&gt;

&lt;p&gt;To sum up, the original RAVDESS data set includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1440 recordings&lt;/li&gt;
&lt;li&gt;24 speakers&lt;/li&gt;
&lt;li&gt;12 male, 12 female&lt;/li&gt;
&lt;li&gt;2 sentences&lt;/li&gt;
&lt;li&gt;2 intensities&lt;/li&gt;
&lt;li&gt;8 intonations / emotions&lt;/li&gt;
&lt;li&gt;192 recordings for 7 emotions&lt;/li&gt;
&lt;li&gt;96 recordings for 1 emotion&lt;/li&gt;
&lt;/ul&gt;
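
&lt;p&gt;The numbers above can be double-checked with a few lines of arithmetic. This is only a sanity check; the factor of two repetitions per sentence is an assumption taken from the RAVDESS documentation and is not stated in this post:&lt;/p&gt;

```python
actors = 24
sentences = 2
intensities = 2
repetitions = 2  # assumption: each sentence is recorded twice per actor

# Seven emotions are recorded in both intensities ...
per_emotion = actors * sentences * intensities * repetitions
# ... while neutral only exists in normal intensity.
neutral = actors * sentences * 1 * repetitions

total = 7 * per_emotion + neutral
print(per_emotion, neutral, total)  # 192 96 1440
```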

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fplot_emotions.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fplot_emotions.png" alt="RAVDESS data set distribution" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;RAVDESS data set distribution&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Oversampling
&lt;/h3&gt;

&lt;p&gt;The data set was imbalanced, so I used the &lt;code&gt;RandomOverSampler&lt;/code&gt; class from the &lt;code&gt;imbalanced-learn&lt;/code&gt; library to add randomly duplicated samples to the neutral class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;oversample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/features/X.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/features/y.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 

    &lt;span class="n"&gt;oversample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RandomOverSampler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampling_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X_over&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_over&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;oversample&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_resample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;X_over_save&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_over_save&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X_over.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y_over.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_over&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/features/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_over_save&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_over&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/features/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_over_save&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oversampling added 96 new data points (randomly duplicated feature vectors of the neutral class, not new recordings), so in the end I had &lt;strong&gt;1536 samples&lt;/strong&gt; to work with.&lt;/p&gt;
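
&lt;p&gt;To make the idea concrete, here is a simplified, standard-library sketch of what random oversampling of the minority class does: it copies randomly chosen samples of the smallest class until that class is as large as the largest one. It is not the &lt;code&gt;imbalanced-learn&lt;/code&gt; implementation, and the function name and toy data are made up for illustration:&lt;/p&gt;

```python
import random
from collections import Counter


def random_oversample_minority(X, y, seed=42):
    """Duplicate random samples of the smallest class until it matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    target = max(counts.values())

    # Indices of the existing minority samples to draw copies from.
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    extra = rng.choices(minority_idx, k=target - counts[minority])

    X_over = list(X) + [X[i] for i in extra]
    y_over = list(y) + [y[i] for i in extra]
    return X_over, y_over


# Toy labels with the same class sizes as in the post: 192 vs. 96.
y = ["happy"] * 192 + ["neutral"] * 96
X = [[float(i)] for i in range(len(y))]  # placeholder feature vectors

X_over, y_over = random_oversample_minority(X, y)
print(Counter(y_over))  # Counter({'happy': 192, 'neutral': 192})
```

&lt;p&gt;Because the added rows are exact copies, it is usually safer to oversample only the training split, so that duplicates of the same sample don't end up in both the training and the test data.&lt;/p&gt;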

&lt;p&gt;Another imbalance was gender-related: there were slightly more recordings by males and in normal intensity. I didn't deal with this imbalance because it wasn't significant to my project, since I only wanted to predict the emotion. However, it would be interesting to explore in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature extraction
&lt;/h3&gt;

&lt;p&gt;There are many features that can be extracted from audio files, but I decided to work with the &lt;strong&gt;Mel-frequency cepstral coefficients (MFCCs)&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.&lt;/p&gt;

&lt;p&gt;The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal spectrum. This frequency warping can allow for better representation of sound, for example, in audio compression. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Mel-frequency_cepstrum" rel="noopener noreferrer"&gt;source&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
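
&lt;p&gt;For intuition about the "nonlinear mel scale" mentioned in the quote, here is a small sketch of one widely used Hz-to-mel formula, m = 2595 * log10(1 + f / 700). This is an illustration only: other variants of the mel scale exist, and &lt;code&gt;librosa&lt;/code&gt; uses a different one (the Slaney variant) by default.&lt;/p&gt;

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz shrink on the mel scale as the frequency grows.
for low, high in [(0, 1000), (1000, 2000), (7000, 8000)]:
    step = hz_to_mel(high) - hz_to_mel(low)
    print(f"{low}-{high} Hz spans {step:.0f} mel")
```

&lt;p&gt;The shrinking steps mirror how human hearing resolves low frequencies more finely than high ones, which is why mel-spaced bands are a common choice for speech features.&lt;/p&gt;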

&lt;p&gt;To extract the MFCCs from the audio files, I used the Python library &lt;a href="https://librosa.org/doc/latest/index.html" rel="noopener noreferrer"&gt;&lt;code&gt;librosa&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;feature_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;y_lib&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;librosa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;res_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kaiser_fast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;mfccs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;librosa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mfcc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_lib&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_mfcc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;arr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mfccs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;
            &lt;span class="n"&gt;feature_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data loaded in %s seconds.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;feature_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asarray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;X_save&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_save&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y.joblib&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_save&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;save_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_save&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Preprocessing completed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
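
&lt;p&gt;To make the averaging step in &lt;code&gt;extract_features&lt;/code&gt; more concrete, here is a minimal sketch that uses a random array as a stand-in for the output of &lt;code&gt;librosa.feature.mfcc&lt;/code&gt;, which has shape (n_mfcc, n_frames). The helper name &lt;code&gt;summarize_mfcc&lt;/code&gt; and the frame count of 130 are assumptions for illustration only:&lt;/p&gt;

```python
import numpy as np


def summarize_mfcc(mfcc_matrix):
    """Average each coefficient over time: (n_mfcc, n_frames) to (n_mfcc,)."""
    return np.mean(mfcc_matrix.T, axis=0)


# Stand-in for an MFCC matrix: 40 coefficients x 130 frames (made-up data).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(40, 130))

features = summarize_mfcc(fake_mfcc)
print(features.shape)  # (40,)
```

&lt;p&gt;Each audio file therefore ends up as a single vector of 40 numbers, regardless of how long the clip is.&lt;/p&gt;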



&lt;p&gt;The visual representation of MFCC looks like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fmfcc1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fmfcc1.png" alt="MFCC plot" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;MFCC plot&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Learning Models
&lt;/h2&gt;

&lt;p&gt;I trained three different neural network models on the MFCC features and emotion labels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-Layer Perceptron (MLP)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mlp_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mlp_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MLPClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;hidden_layer_sizes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt;
        &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;shuffle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;momentum&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convolutional Neural Network (CNN)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cnn_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;x_traincnn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;x_testcnn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Conv1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Conv1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Conv1D&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;BatchNormalization&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Activation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;softmax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categorical_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;cnn_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;x_traincnn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_testcnn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Long Short-Term Memory (LSTM)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X_train_lstm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;X_test_lstm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;lstm_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;return_sequences&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LSTM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dropout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;softmax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adam&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categorical_crossentropy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;lstm_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train_lstm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
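
&lt;p&gt;The CNN and LSTM snippets both reshape the features with &lt;code&gt;np.expand_dims&lt;/code&gt; and compile with &lt;code&gt;categorical_crossentropy&lt;/code&gt;, a loss that expects one-hot targets. The post does not show how the labels were encoded, so the sketch below is only an assumption: it takes integer labels from 0 to 7 (as produced by &lt;code&gt;extract_features&lt;/code&gt;) and one-hot encodes them with a hypothetical &lt;code&gt;to_one_hot&lt;/code&gt; helper. With integer labels, Keras' &lt;code&gt;sparse_categorical_crossentropy&lt;/code&gt; loss would be an alternative.&lt;/p&gt;

```python
import numpy as np


def to_one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows (hypothetical helper)."""
    return np.eye(num_classes)[labels]


# Toy stand-ins: 5 clips with 40 averaged MFCCs each, labels in 0..7.
X = np.zeros((5, 40))
y = np.array([0, 3, 7, 2, 3])

# Conv1D and LSTM layers expect 3D input: (samples, steps, channels).
X_seq = np.expand_dims(X, axis=2)
y_one_hot = to_one_hot(y, 8)

print(X_seq.shape)      # (5, 40, 1)
print(y_one_hot.shape)  # (5, 8)
```

&lt;p&gt;The resulting shape (40, 1) per sample matches the &lt;code&gt;input_shape&lt;/code&gt; passed to the first layer of both models.&lt;/p&gt;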

&lt;p&gt;After several iterations of tweaking the hyperparameters, I found that the models generally performed better with a low learning rate (0.001), the &lt;code&gt;adam&lt;/code&gt; optimizer, and fewer layers. All models overfit (they couldn't generalize to unseen data), but this seems to be a common issue with neural networks and with audio data.&lt;/p&gt;

&lt;p&gt;As expected, the MLP had the lowest accuracy, since it's a very basic model (a simple feed-forward artificial neural network). The CNN and LSTM had similar training accuracy (80%), but the CNN performed better on test data (60%) than the LSTM (51%). To give you some context, state-of-the-art models for speech classification have an accuracy of 70-80%, so I was quite happy with the accuracy of my CNN model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fmodels_accuracy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fmodels_accuracy.png" alt="Accuracy of different ML models" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;Accuracy of different ML models&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It was particularly interesting to compare the actual and predicted emotions to see which emotions were misclassified. From the confusion matrices of the CNN and LSTM, I noticed that both models misclassified emotions that sound similar or are ambiguous (even for humans), like sad-calm or angry-happy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Flstm_confusionmatrix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Flstm_confusionmatrix.png" alt="Confusion matrix of LSTM" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;Confusion matrix of LSTM&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fcnn_confusionmatrix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2020%2F12%2Fcnn_confusionmatrix.png" alt="Confusion matrix of CNN" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;Confusion matrix of CNN&lt;/em&gt;&lt;/p&gt;
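
&lt;p&gt;To illustrate how such a confusion matrix can be read programmatically, here is a minimal sketch that builds one with NumPy and finds the most frequently confused pair. The four labels and the actual/predicted lists are made-up toy data, not results from my models, and the helper names are hypothetical:&lt;/p&gt;

```python
import numpy as np


def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[a, p] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (actual, predicted) for the largest off-diagonal count."""
    off_diagonal = matrix.copy()
    np.fill_diagonal(off_diagonal, 0)  # ignore correct predictions
    row, col = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)
    return int(row), int(col)


labels = ["calm", "happy", "sad", "angry"]  # toy subset of emotions
actual = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [0, 0, 1, 3, 0, 0, 2, 3, 1, 3]

cm = confusion_matrix(actual, predicted, len(labels))
a, p = most_confused_pair(cm)
print(labels[a], "predicted as", labels[p])  # sad predicted as calm
```

&lt;p&gt;In practice, a library function such as &lt;code&gt;sklearn.metrics.confusion_matrix&lt;/code&gt; does the counting for you; the manual loop is only there to show what each cell means.&lt;/p&gt;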

&lt;h2&gt;
  
  
  Predictions
&lt;/h2&gt;

&lt;p&gt;The exciting part was making predictions on new data, more specifically on &lt;a href="http://www.moviesoundclips.net/" rel="noopener noreferrer"&gt;movie sound clips&lt;/a&gt; and on my own voice in real time. To record my voice, I used the Python library &lt;a href="https://python-sounddevice.readthedocs.io/en/0.4.1/" rel="noopener noreferrer"&gt;&lt;code&gt;sounddevice&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sounddevice&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.io.wavfile&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;write&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record_voice&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;44100&lt;/span&gt;  &lt;span class="c1"&gt;# Sample rate
&lt;/span&gt;    &lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# Duration of recording
&lt;/span&gt;    &lt;span class="c1"&gt;# sd.default.device = "Built-in Audio"  # Speakers full name here
&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Say something:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;myrecording&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;samplerate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Wait until recording is finished
&lt;/span&gt;    &lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/recordings/myvoice.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;myrecording&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Voice recording saved.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
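
&lt;p&gt;Because the recording uses &lt;code&gt;channels=2&lt;/code&gt;, &lt;code&gt;sd.rec&lt;/code&gt; returns an array of shape (frames, channels). If you want to work with the signal in memory rather than reading the saved file back, you may need to mix it down to one channel first. The sketch below is an optional step under that assumption, with a hypothetical &lt;code&gt;to_mono&lt;/code&gt; helper and a fake array standing in for a real recording. When the WAV file is read back with &lt;code&gt;librosa.load&lt;/code&gt;, the audio is converted to mono by default.&lt;/p&gt;

```python
import numpy as np


def to_mono(recording):
    """Average the channels of a (frames, channels) array into one signal."""
    if recording.ndim == 1:
        return recording  # already mono
    return recording.mean(axis=1)


fs = 44100   # sample rate, as in record_voice()
seconds = 3  # duration, as in record_voice()

# Fake stereo recording: signal on the left channel only (made-up data).
fake_recording = np.zeros((seconds * fs, 2), dtype="float32")
fake_recording[:, 0] = 1.0

mono = to_mono(fake_recording)
print(mono.shape)  # (132300,)
```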



&lt;p&gt;I then tested the CNN and LSTM models on pre- and live-recorded audio files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_predictions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cnn_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/models/cnn_model.h5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;lstm_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_emotion_recognition/models/lstm_model.h5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prediction_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prediction_sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;librosa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;res_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kaiser_fast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;22050&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mfccs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;librosa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mfcc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prediction_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prediction_sr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_mfcc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mfccs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expand_dims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
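    &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lstm_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# predict_classes() was removed in TensorFlow 2.6&lt;/span&gt;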

    &lt;span class="n"&gt;emotions_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;neutral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;happy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;angry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fearful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disgusted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;surprised&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;emotions_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This voice sounds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
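&lt;p&gt;The label lookup at the end of &lt;code&gt;make_predictions&lt;/code&gt; can also be written as a direct index-to-label mapping, which avoids the loop and never leaves &lt;code&gt;label&lt;/code&gt; undefined. The following is a minimal sketch, assuming the model returns one probability per class in the same order as &lt;code&gt;emotions_dict&lt;/code&gt;; the helper name &lt;code&gt;label_for&lt;/code&gt; and the plain-list input are hypothetical and not part of the original project:&lt;/p&gt;

```python
# Class labels in the same order as the keys of emotions_dict above.
EMOTIONS = [
    "neutral", "calm", "happy", "sad",
    "angry", "fearful", "disgusted", "surprised",
]


def label_for(probabilities):
    """Return the emotion whose class index has the highest probability.

    `label_for` is a hypothetical helper; `probabilities` is assumed to be
    a sequence with one value per emotion (for example one row of the
    model's prediction output converted to a list).
    """
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return EMOTIONS[best_index]


# Made-up probabilities for illustration only.
print(label_for([0.05, 0.05, 0.6, 0.1, 0.05, 0.05, 0.05, 0.05]))
```

&lt;p&gt;Raising an error on a length mismatch makes it obvious when the model and the label list get out of sync.&lt;/p&gt;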



&lt;p&gt;Both models identified the correct or plausible emotion from recorded speech!&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;It was super exciting to work on this project and I'm already thinking of improving and extending it in some ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Try other models (not necessarily neural networks).&lt;/li&gt;
&lt;li&gt;  Extract other audio features to see if they are better predictors than the MFCC.&lt;/li&gt;
&lt;li&gt;  Train on larger data sets, since 1500 files and only 200 samples per emotion are not enough.&lt;/li&gt;
&lt;li&gt;  Train on natural data, i.e. on recordings of people speaking in unstaged situations, so that the emotional speech sounds more realistic.&lt;/li&gt;
&lt;li&gt;  Train on more diverse data, i.e. on recordings of people of different cultures and languages. This is important because the expression of emotions varies across cultures and is influenced also by individual experiences.&lt;/li&gt;
&lt;li&gt;  Combine speech with facial expressions and text (speech-to-text) for multimodal sentiment analysis.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>tutorial</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Create a toxic language detector for Telegram in 4 steps 🤬</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Tue, 21 Sep 2021 08:06:38 +0000</pubDate>
      <link>https://dev.to/n8n/create-a-toxic-language-detector-for-telegram-in-4-steps-3m7o</link>
      <guid>https://dev.to/n8n/create-a-toxic-language-detector-for-telegram-in-4-steps-3m7o</guid>
      <description>&lt;p&gt;When was the last time you talked to someone online, be it friends, coworkers, or even strangers? Nowadays, you most likely do it every day. Online communication platforms like Telegram, Reddit, or Discord have made it possible for people from all over the world to connect and share their thoughts on pretty much any topic, instantly. This can be an enriching experience for users, but these platforms can also foster toxicity like cyberbullying, threats, and insults, forcing some users offline and silencing their voices.&lt;/p&gt;

&lt;p&gt;One solution to this problem comes from &lt;a href="https://jigsaw.google.com/" rel="noopener noreferrer"&gt;Jigsaw&lt;/a&gt; and Google's Counter Abuse Technology team, who developed &lt;a href="https://www.perspectiveapi.com/" rel="noopener noreferrer"&gt;&lt;em&gt;Perspective API&lt;/em&gt;&lt;/a&gt;: a free API that uses machine learning to identify toxic language in English, Spanish, French, German, Portuguese, Italian, and Russian. Toxic language is defined here as "a rude, disrespectful, or unreasonable comment that is likely to make someone leave a discussion".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FPgXW33_eg9l_te-ivDXrre6_hA6kijuiZ71heTak-Vl-VhdcvB4k9eK7lnuq-tSs_etODHy64Jy4Bj1Uj-QqwNtN2bVYehdoqyx3G-4HXI2VJ_zBGhjRiMlw01BuPN7VHb4HclI5%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FPgXW33_eg9l_te-ivDXrre6_hA6kijuiZ71heTak-Vl-VhdcvB4k9eK7lnuq-tSs_etODHy64Jy4Bj1Uj-QqwNtN2bVYehdoqyx3G-4HXI2VJ_zBGhjRiMlw01BuPN7VHb4HclI5%3Ds0" alt="Perspective API in action" width="800" height="400"&gt;&lt;/a&gt;Perspective API in action&lt;/p&gt;

&lt;p&gt;In practice, Perspective scores a phrase based on the perceived impact the text may have in a conversation. The phrase can be analyzed on different attributes: flirtation, identity attack, insult, profanity, sexually explicit, threat, and (severe) toxicity. Keep in mind though that machine learning models can only be as good as the data they're trained on. This means that they may misclassify as toxic some innocent comments (and vice versa), so the flagged comments should be reviewed by a human eye.&lt;/p&gt;

&lt;p&gt;Perspective API has been implemented by &lt;a href="https://www.perspectiveapi.com/case-studies/" rel="noopener noreferrer"&gt;several major publishers and platforms&lt;/a&gt; like Reddit, The New York Times, and DISQUS, helping them moderate online comments. At n8n, we communicate with our 16,000+ community members in the &lt;a href="http://community.n8n.io/" rel="noopener noreferrer"&gt;Discourse forum&lt;/a&gt;, on Discord, &lt;a href="https://twitter.com/n8n_io" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;, and even via &lt;a href="https://t.me/comunidadn8n" rel="noopener noreferrer"&gt;Telegram for Spanish speakers&lt;/a&gt;. We value open, &lt;a href="https://n8n.io/workflows/982" rel="noopener noreferrer"&gt;inclusive&lt;/a&gt;, and respectful communication and want to ensure that everyone has a positive experience in the n8n community – and beyond.&lt;/p&gt;

&lt;p&gt;To this end, we used the Perspective API to build the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.googlePerspective/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Google Perspective node&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;, which allows you to integrate toxic language detection in your workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow for detecting toxic language in Telegram messages
&lt;/h2&gt;

&lt;p&gt;To give you an idea of how you can use the &lt;em&gt;Google Perspective node&lt;/em&gt;, we created &lt;a href="https://n8n.io/workflows/1216" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; that detects toxic language in messages sent in a Telegram chat and replies with a warning message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FpSN-PhPuKlXONRTmsT6_6pNDUNHRwtaRfBpuCnslpZS4NsuZwfJmYl8ANTa4gcdL0JMUXvLEyJQP9k8It2AvNyWLHdJ0m3dJ1nSM_z-CprrOwjqNXTNjg-2WEriYJaDWVXsKZ-YZ%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FpSN-PhPuKlXONRTmsT6_6pNDUNHRwtaRfBpuCnslpZS4NsuZwfJmYl8ANTa4gcdL0JMUXvLEyJQP9k8It2AvNyWLHdJ0m3dJ1nSM_z-CprrOwjqNXTNjg-2WEriYJaDWVXsKZ-YZ%3Ds0" alt="Workflow for detecting toxic language in Telegram" width="800" height="400"&gt;&lt;/a&gt;Workflow for detecting toxic language in Telegram&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Telegram Trigger node&lt;/em&gt;&lt;/strong&gt; starts the workflow when a new message is sent in a Telegram chat.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Google Perspective node&lt;/em&gt;&lt;/strong&gt; analyzes the text of the message and returns a probability value between 0 and 1 of how likely it is that the content is toxic.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;IF node&lt;/em&gt;&lt;/strong&gt; filters messages with a toxic probability value above 0.7.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Telegram node&lt;/em&gt;&lt;/strong&gt; sends a message in the chat with the text "I don't tolerate toxic language" if the probability value is above 0.7.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;NoOp node&lt;/em&gt;&lt;/strong&gt; takes no action if the probability value is below 0.7. This node is optional and serves only to show that the workflow can be extended in this direction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's see how to configure each node step by step. If you haven't built an n8n workflow yet, you might want to take a look at our &lt;a href="https://docs.n8n.io/quickstart/" rel="noopener noreferrer"&gt;quickstart guide&lt;/a&gt; or take the &lt;a href="https://docs.n8n.io/courses/level-one/" rel="noopener noreferrer"&gt;beginner's course&lt;/a&gt;. This will help you understand the configuration of the nodes used in this workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Get new messages from Telegram
&lt;/h3&gt;

&lt;p&gt;First of all, you need to create a Telegram bot and get credentials. Start a chat with &lt;a href="https://telegram.me/BotFather" rel="noopener noreferrer"&gt;BotFather&lt;/a&gt; in your Telegram account and follow the instructions to create your bot and obtain its access token. Make sure you add your newly created bot to the channel you want to monitor.&lt;/p&gt;

&lt;p&gt;Then, open the &lt;em&gt;Telegram Trigger node&lt;/em&gt; and add your &lt;em&gt;Credentials Name&lt;/em&gt; and &lt;em&gt;Access Token&lt;/em&gt; in &lt;em&gt;Telegram API&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In the &lt;em&gt;Updates&lt;/em&gt; field select: &lt;em&gt;message, edited_message, channel_post,&lt;/em&gt; and &lt;em&gt;edited_channel_post&lt;/em&gt;. These update options will trigger the workflow when a text message is posted.&lt;/p&gt;

&lt;p&gt;To test if the bot works well so far, execute the &lt;em&gt;Trigger node&lt;/em&gt; and send a message to the Telegram channel. We tested this workflow with the message "You're a stupid bot! I hate you!" (we swear it's just for testing purposes, we actually think bots are pretty cool and smart). The &lt;em&gt;Telegram Trigger node&lt;/em&gt; should output the following result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FbiB_15LoriRD_BK0T2yJqtYDCadzvBOy9WZsgNWbafxnbQEsKXLuHbyFiva1Kz_umJn8Uo3tjc4xBaLFIEkFohLEUxPg__rrW0YQrprJbBKkPEDj-3qvJTYo_U_KbuYihUccdoRY%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FbiB_15LoriRD_BK0T2yJqtYDCadzvBOy9WZsgNWbafxnbQEsKXLuHbyFiva1Kz_umJn8Uo3tjc4xBaLFIEkFohLEUxPg__rrW0YQrprJbBKkPEDj-3qvJTYo_U_KbuYihUccdoRY%3Ds0" alt="Configuration of the Telegram Trigger node" width="800" height="400"&gt;&lt;/a&gt;Configuration of the Telegram Trigger node&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Analyze the toxicity of the message
&lt;/h3&gt;

&lt;p&gt;In the second step, the incoming message from Telegram has to be analyzed with Perspective. In the &lt;em&gt;Google Perspective node&lt;/em&gt; configure the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Operation&lt;/em&gt;: Analyze Content
This operation analyzes the incoming text message.&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Text&lt;/em&gt;: &lt;code&gt;{{$json["message"]["text"]}}&lt;/code&gt;
This expression selects the incoming Telegram message to be analyzed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the section &lt;em&gt;Attributes to Analyze&lt;/em&gt; you can add one or more attributes supported by Perspective that you want to be detected in the incoming message. If you don't add any attribute, all will be returned by default. For this example, the node is configured to detect profanities and identity attacks in the text, so two attributes are added with the properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Attribute Name:&lt;/em&gt; Profanity&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Score Threshold&lt;/em&gt;: 0.00
This value sets the score above which to return results. The score is a value between 0 and 1 representing the probability that the text is toxic; it doesn't reflect the intensity (how toxic the text is). For example, if you set the &lt;em&gt;Score Threshold&lt;/em&gt; at 0.5, then only messages that are at least 50% likely to be toxic are returned. If no value is set, the threshold stays at zero and all scores are returned. You can read more &lt;a href="https://medium.com/jigsaw/what-do-perspectives-scores-mean-113b37788a5d" rel="noopener noreferrer"&gt;in this article&lt;/a&gt; about what the scores mean.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the section &lt;em&gt;Options&lt;/em&gt;, you can select the &lt;em&gt;Language&lt;/em&gt; of the text input. This option is useful if you want to monitor only a specific language. If unspecified, the node will auto-detect the language. In our example, we select the &lt;em&gt;Language&lt;/em&gt; English.&lt;/p&gt;
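&lt;p&gt;For readers curious about what these settings correspond to outside of n8n, here is a rough sketch of how the same parameters might be assembled into a Perspective API request body. It assumes the field names of the API's comment analysis request (&lt;code&gt;comment&lt;/code&gt;, &lt;code&gt;requestedAttributes&lt;/code&gt;, &lt;code&gt;scoreThreshold&lt;/code&gt;, &lt;code&gt;languages&lt;/code&gt;); the helper &lt;code&gt;build_analyze_request&lt;/code&gt; is hypothetical, and the code only builds the payload without sending it or claiming to mirror the node's internals:&lt;/p&gt;

```python
import json


def build_analyze_request(text, attributes, language=None):
    """Build a request body for Perspective's comment analysis.

    `build_analyze_request` is a hypothetical helper. `attributes` maps an
    attribute name (e.g. "PROFANITY") to its score threshold. The field
    names are assumptions based on the public API documentation.
    """
    requested = {
        name: {"scoreThreshold": threshold}
        for name, threshold in attributes.items()
    }
    body = {"comment": {"text": text}, "requestedAttributes": requested}
    if language is not None:
        # Omitting "languages" leaves language detection to the API.
        body["languages"] = [language]
    return body


body = build_analyze_request(
    "You're a stupid bot! I hate you!",
    {"PROFANITY": 0.0, "IDENTITY_ATTACK": 0.0},
    language="en",
)
print(json.dumps(body, indent=2))
```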

&lt;p&gt;Now if you execute the &lt;em&gt;Google Perspective node&lt;/em&gt;, the output should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FXI2H1OnmArFhHG6PADlYp4ENKJZXe2S0RpWEtf2kLwXte19R1STMMSY8xAn0mwSv7PAcy89_ElOp_zaye6FywKjLAV4shx10m1sqWuuIFi5UK270vo6kk4iheAjXhHkGNu5tFY0j%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FXI2H1OnmArFhHG6PADlYp4ENKJZXe2S0RpWEtf2kLwXte19R1STMMSY8xAn0mwSv7PAcy89_ElOp_zaye6FywKjLAV4shx10m1sqWuuIFi5UK270vo6kk4iheAjXhHkGNu5tFY0j%3Ds0" alt="Configuration of the Google Perspective node" width="800" height="400"&gt;&lt;/a&gt;Configuration of the Google Perspective node&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Filter toxic messages
&lt;/h3&gt;

&lt;p&gt;In the third step, the toxic messages with a probability higher than 0.7 have to be filtered out. For this, you need to set up an &lt;em&gt;IF node&lt;/em&gt; with the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Value 1:&lt;/em&gt; &lt;code&gt;{{$json["attributeScores"]["PROFANITY"]["summaryScore"]["value"]}}&lt;/code&gt;
This expression selects the score value of the respective attribute.&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Operation:&lt;/em&gt; Larger&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Value 2&lt;/em&gt;: 0.7
This is the value we want to compare the score with.&lt;/li&gt;
&lt;/ul&gt;
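&lt;p&gt;The comparison made by the &lt;em&gt;IF node&lt;/em&gt; can be sketched in a few lines of Python. This is only an illustration of the logic, assuming a response shaped like the expression in &lt;em&gt;Value 1&lt;/em&gt;; the function name &lt;code&gt;exceeds_threshold&lt;/code&gt; is hypothetical, and the sample scores are the ones reported for the test message in this tutorial:&lt;/p&gt;

```python
def exceeds_threshold(response, attribute="PROFANITY", threshold=0.7):
    """Return True if the attribute's summary score is larger than threshold.

    Hypothetical helper mirroring the IF node: it reads the same path as
    the Value 1 expression and applies the "Larger" operation.
    """
    score = response["attributeScores"][attribute]["summaryScore"]["value"]
    return score > threshold


# Sample response containing only the fields used here.
sample = {
    "attributeScores": {
        "PROFANITY": {"summaryScore": {"value": 0.92}},
        "IDENTITY_ATTACK": {"summaryScore": {"value": 0.62}},
    }
}

print(exceeds_threshold(sample))                     # True
print(exceeds_threshold(sample, "IDENTITY_ATTACK"))  # False
```

&lt;p&gt;Note that a score of exactly 0.7 does not pass, because the operation is a strict "larger than" comparison.&lt;/p&gt;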

&lt;p&gt;If you execute the IF node now, it outputs the following results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FulTV-NrI_foRhtJdXSsee98X2iiamH5YXSuyMhJOXcNCnj2EDV6NF_vdH-HkobnmyMECQcxVTCZtBz4ZJ1J6Ivzslmkg9VJxEt9JfaSB2SRsCFIuizGfrR2e2bhHwXpbd80mCG6w%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FulTV-NrI_foRhtJdXSsee98X2iiamH5YXSuyMhJOXcNCnj2EDV6NF_vdH-HkobnmyMECQcxVTCZtBz4ZJ1J6Ivzslmkg9VJxEt9JfaSB2SRsCFIuizGfrR2e2bhHwXpbd80mCG6w%3Ds0" alt="Configuration of the IF node" width="800" height="400"&gt;&lt;/a&gt;Configuration of the IF node&lt;/p&gt;

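&lt;p&gt;The message "You're a stupid bot! I hate you!" scored 0.92 for profanity and 0.62 for identity attack, which means it is very likely to be perceived as profane and somewhat likely to be perceived as an identity attack. Only the profanity score exceeds the 0.7 threshold checked by the &lt;em&gt;IF node&lt;/em&gt;.&lt;/p&gt;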

&lt;h3&gt;
  
  
  4. Send a warning message to Telegram
&lt;/h3&gt;

&lt;p&gt;The final step is taking action against the toxic message. A mild action would be to just reply to the message in the Telegram channel, warning the user that "I don't tolerate toxic language!". To do this, configure the &lt;em&gt;Telegram node&lt;/em&gt; with the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;em&gt;Resource&lt;/em&gt;: Message&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Operation&lt;/em&gt;: Send Message&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Chat ID&lt;/em&gt;: &lt;code&gt;{{$node["Telegram Trigger"].json["message"]["chat"]["id"]}}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Text&lt;/em&gt;: I don't tolerate toxic language!&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;Add Field &amp;gt; Reply to Message ID&lt;/em&gt;: &lt;code&gt;{{$node["Telegram Trigger"].json["message"]["message_id"]}}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
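&lt;p&gt;To make the two expressions above more concrete, the sketch below shows how the chat ID and message ID could be pulled out of an incoming update and turned into the parameters of a reply. It assumes the classic &lt;code&gt;sendMessage&lt;/code&gt; parameter names of the Telegram Bot API (&lt;code&gt;chat_id&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;reply_to_message_id&lt;/code&gt;; newer API versions may prefer a different reply field). The helper &lt;code&gt;build_warning_reply&lt;/code&gt; and the IDs in the sample update are made up for illustration:&lt;/p&gt;

```python
def build_warning_reply(update, text="I don't tolerate toxic language!"):
    """Build reply parameters from a Telegram update.

    Hypothetical helper: it reads the same fields as the Chat ID and
    Reply to Message ID expressions in the Telegram node.
    """
    message = update["message"]
    return {
        "chat_id": message["chat"]["id"],
        "text": text,
        "reply_to_message_id": message["message_id"],
    }


# Made-up update containing only the fields used above.
sample_update = {
    "message": {
        "message_id": 42,
        "chat": {"id": -1001},
        "text": "You're a stupid bot! I hate you!",
    }
}

print(build_warning_reply(sample_update))
```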

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FwA7GLd-yBfCEzNKH4hYxGC1Y7oV46KLpObgeDiPo7lBZjTnqyc02B01Ja_gNwbFZLeh_CTPtjVqUz_VlkPHvg2PO6SW2-5qzevSlYc0F6SeDve8bUp_NYJ9pddmKrKdgLcd26_57%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FwA7GLd-yBfCEzNKH4hYxGC1Y7oV46KLpObgeDiPo7lBZjTnqyc02B01Ja_gNwbFZLeh_CTPtjVqUz_VlkPHvg2PO6SW2-5qzevSlYc0F6SeDve8bUp_NYJ9pddmKrKdgLcd26_57%3Ds0" alt="Configuration of the Telegram node" width="800" height="400"&gt;&lt;/a&gt;Configuration of the Telegram node&lt;/p&gt;

&lt;p&gt;Now the bully will be publicly admonished in Telegram (once again, sorry, bot, you're really cool):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FlcMLPPGgGij8Tr_1C1x4y7vi142E4bsL8eaC_CPHhxRV9u9KGAvYgnGMFH3IrkS8U4Mj-e2uR4RFQD0P0w-IzwlvsEmiOPl0cLtGrJ5Cx9Q3Kkkl88KD-XL4c1v-OxH01LJRyQiw%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FlcMLPPGgGij8Tr_1C1x4y7vi142E4bsL8eaC_CPHhxRV9u9KGAvYgnGMFH3IrkS8U4Mj-e2uR4RFQD0P0w-IzwlvsEmiOPl0cLtGrJ5Cx9Q3Kkkl88KD-XL4c1v-OxH01LJRyQiw%3Ds0" alt="Response to a toxic message in Telegram" width="800" height="400"&gt;&lt;/a&gt;Response to a toxic message in Telegram&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;In this post, you've learned about the challenge and importance of monitoring toxic language in online communities and how you can build a no-code Telegram bot for this purpose. The use case in this tutorial is fairly simplistic, but this kind of toxic language detector can be implemented in various platforms at scale.&lt;/p&gt;

&lt;p&gt;For example, you could tweak this workflow and connect the &lt;em&gt;Google Perspective node&lt;/em&gt; to Discord, Discourse, or DISQUS to detect toxic language in online communities and forums, or even to Gmail to filter out toxic emails. You can take different actions on toxic messages, for example forwarding them to a moderator, storing them in a database, or flagging or banning the user depending on their message scores.&lt;/p&gt;

&lt;p&gt;Here's what you can do next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Try this workflow yourself: &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;install n8n&lt;/a&gt; or sign up for a free 30-day trial on &lt;a href="https://n8n.cloud/" rel="noopener noreferrer"&gt;n8n.cloud&lt;/a&gt; ☁️&lt;/li&gt;
&lt;li&gt;  Discover &lt;a href="https://n8n.io/workflows" rel="noopener noreferrer"&gt;more workflows&lt;/a&gt; using the &lt;em&gt;Telegram (Trigger) node&lt;/em&gt; ⚙️&lt;/li&gt;
&lt;li&gt;  Join the discussion in the &lt;a href="https://community.n8n.io/c/docs-and-tutorials/6" rel="noopener noreferrer"&gt;n8n community forum&lt;/a&gt; 🗣️&lt;/li&gt;
&lt;li&gt;  Read more &lt;a href="https://n8n.io/blog/tag/tutorial/" rel="noopener noreferrer"&gt;workflow tutorials&lt;/a&gt; 💡&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nocode</category>
      <category>nlp</category>
      <category>tutorial</category>
      <category>chatbot</category>
    </item>
    <item>
      <title>How to get started with CRM automation (with 3 no-code workflow ideas) 👥</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Tue, 14 Sep 2021 07:45:48 +0000</pubDate>
      <link>https://dev.to/n8n/how-to-get-started-with-crm-automation-with-3-no-code-workflow-ideas-1ef5</link>
      <guid>https://dev.to/n8n/how-to-get-started-with-crm-automation-with-3-no-code-workflow-ideas-1ef5</guid>
      <description>&lt;p&gt;If you run a business, sell a product, or offer services, you know how important it is to nurture the relationship with your customers and gain new ones. Whether you operate alone or with a sales team, you probably also know how difficult it can be to keep track of your leads, customers, and orders.&lt;/p&gt;

&lt;p&gt;We're here to show you that you can optimize your sales and customer workflows with two keywords: &lt;strong&gt;CRM&lt;/strong&gt; and &lt;strong&gt;automation&lt;/strong&gt;. Read on to learn what exactly a CRM is, why and when you should use a CRM for your business, and how to automate three common CRM sales workflows in only a few clicks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why and when you should use a CRM&lt;/li&gt;
&lt;li&gt; 3 steps to CRM automation for the sales funnel

&lt;ul&gt;
&lt;li&gt;  Choose the right CRM for your use case
&lt;/li&gt;
&lt;li&gt;  Decide what you will automate
&lt;/li&gt;
&lt;li&gt;  Build workflows

&lt;ul&gt;
&lt;li&gt;  Capture leads from Typeform submissions
&lt;/li&gt;
&lt;li&gt;  Send reminders after meetings with prospects
&lt;/li&gt;
&lt;li&gt;  Process newly created deals based on their stage, value, and priority
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;What's next?&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;customer relationship management (CRM)&lt;/strong&gt; tool is pretty self-explanatory: it helps you manage the relationships with your customers. This means that it stores information on your customers (such as name, title, company, role), the relation they have with your company (e.g., lead, opportunity, deal), the status of the relationship (e.g., closed, open, waiting), and the monetary value of closed deals (like quote and price).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.statista.com/statistics/605933/worldwide-customer-relationship-management-market-forecast/" rel="noopener noreferrer"&gt;The CRM software market is forecast to grow to $43.5 bn in 2024&lt;/a&gt;, proving the increasing popularity and necessity of CRM tools for companies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FVLLc8EAiYXhhfRHvUiNpOGW_kPGMmCQldNOf0hYQf8Ah_hCpOTxTuzmnZySPTioCopf8jTMsc57CEs0Mwf1qk37qnMqGg6djctaChzx1pxGM-9Wz5Ob_CStaNUrlRrrw9t9U8N7T%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FVLLc8EAiYXhhfRHvUiNpOGW_kPGMmCQldNOf0hYQf8Ah_hCpOTxTuzmnZySPTioCopf8jTMsc57CEs0Mwf1qk37qnMqGg6djctaChzx1pxGM-9Wz5Ob_CStaNUrlRrrw9t9U8N7T%3Ds0" alt="Global CRM software market growth" width="800" height="400"&gt;&lt;/a&gt;Global CRM software market growth&lt;/p&gt;

&lt;p&gt;Most commonly, CRMs are used by sales teams to track their sales activities and milestones in the sales funnel, from lead to deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why and when you should use a CRM
&lt;/h2&gt;

&lt;p&gt;To better understand the value of a CRM, let's take a business example.&lt;/p&gt;

&lt;p&gt;Say you have a creative business specializing in professional corporate photography. Your &lt;strong&gt;visitors&lt;/strong&gt; can book shootings on your website &lt;a href="https://n8n.io/blog/no-code-ecommerce-workflow-automations/" rel="noopener noreferrer"&gt;(which offers plenty of opportunities for automation)&lt;/a&gt;, but you also actively network to find new &lt;strong&gt;opportunities&lt;/strong&gt;. At a business event you've attended, you've met Marketing and Branding employees from different companies (&lt;strong&gt;contacts&lt;/strong&gt;). Of these, a couple have expressed interest in your professional photo shootings for their team, qualifying them as &lt;strong&gt;leads&lt;/strong&gt; for your business. This means they could become your &lt;strong&gt;customers&lt;/strong&gt;, which is awesome! Now you need to follow up with them, make them an &lt;strong&gt;offer&lt;/strong&gt; they can't refuse, and eventually close the &lt;strong&gt;deal&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FLZfwmishdMdwdXF4JJl_30FZmTFEZeLdZlQB9Wb1Aeb1kZ2WFEbqjbKrGEVfCXnxz_Edn7g6EcbJtIKQ8yG2Red91qek0wc2O7R4GiH1mv5RfbK36QCClVLmhM-1A44urIjNU7RC%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FLZfwmishdMdwdXF4JJl_30FZmTFEZeLdZlQB9Wb1Aeb1kZ2WFEbqjbKrGEVfCXnxz_Edn7g6EcbJtIKQ8yG2Red91qek0wc2O7R4GiH1mv5RfbK36QCClVLmhM-1A44urIjNU7RC%3Ds0" alt="The sales and marketing funnel" width="800" height="400"&gt;&lt;/a&gt;The sales and marketing funnel&lt;/p&gt;

&lt;p&gt;How can you and your sales team keep track of all these steps, for each lead, while also making sure that you nurture the relationship with your (potential) customers and organize the logistics with your team of photographers? If you think a digital calendar, a paper agenda, and a spreadsheet could do the job, you wouldn't be completely wrong. Sure, they can help you organize your time and contacts, but they offer limited features, are error-prone, and become unmanageable in the long run.&lt;/p&gt;

&lt;p&gt;Here are 3 common challenges that sales-oriented teams face in their organization:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Information is spread across different sources.&lt;/strong&gt;&lt;br&gt;
You contact a customer via email, your salesperson talks to them on the phone, then you write down key information on a post-it and your colleague inserts the order details in a spreadsheet. If anyone asks something about that specific customer, you'd need to sift through emails and notes and ask several people who've come in contact with the customer--a highly inefficient process.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Information is duplicated or missing completely.&lt;/strong&gt;&lt;br&gt;
If your salespeople have back-to-back calls and meetings with leads and customers, they might forget to pass on some information, or even assume that a colleague has taken care of it. At the other extreme, two salespeople might contact the same lead because they don't have an overview of their assignments. This is how meetings get overlooked, clients get annoyed, and you don't get orders.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Salespeople are unmotivated and exhausted by repetitive tasks.&lt;/strong&gt;&lt;br&gt;
It's no secret that sales is a fast-paced and high-pressure field. But the role of a salesperson can become particularly challenging if they are often tasked with repetitive work (like sending the same email to different leads) or their work is inefficiently organized. These problems lead to a decrease in performance and, ultimately, in sales.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A CRM addresses the three problems listed above by providing you with a feature-rich integrated system that can save your business up to 10 hours of work every week. In short, you should use a CRM if you sell a product, provide a service, or deal with customers or clients in any way.&lt;/p&gt;

&lt;p&gt;If you are convinced by the advantages of a CRM, but hesitant about the costs, note that &lt;a href="https://n8n.io/blog/3-reasons-why-startups-should-invest-in-automation/" rel="noopener noreferrer"&gt;the cost of not automating&lt;/a&gt; is higher than the investment in a CRM, which ultimately increases the productivity and value of your sales team.&lt;/p&gt;

&lt;h2&gt;
  
  
  3 steps to CRM automation for the sales funnel
&lt;/h2&gt;

&lt;p&gt;Now that you've seen the advantages of using a CRM, it's time to start implementing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Choose the right CRM for your use case
&lt;/h3&gt;

&lt;p&gt;At first glance, you might feel overwhelmed by all the CRM providers available on the market. There are options for different team sizes, departments, and budgets. To help you get an overview of their features, we compiled a list of 10 of the most popular CRMs, which also come with n8n integrations that allow you to perform common CRUD (create, read, update, delete) operations on your saved contacts, companies, deals, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FQTroBlryabnuUQQNrLAJ7be2dc7ZVdsSNxNNR_QPxOtDaiSQqo9oZMAM2P0jMESJoUcXuIzFi3euSdmVSzCI_6DsJv7GzvfjpzWlho_EqOLasts3GdqPVkUXQo-OvsZxJc-q0xtj%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FQTroBlryabnuUQQNrLAJ7be2dc7ZVdsSNxNNR_QPxOtDaiSQqo9oZMAM2P0jMESJoUcXuIzFi3euSdmVSzCI_6DsJv7GzvfjpzWlho_EqOLasts3GdqPVkUXQo-OvsZxJc-q0xtj%3Ds0" alt="CRM integrations on n8n" width="800" height="400"&gt;&lt;/a&gt;CRM integrations on n8n&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;a href="https://www.agilecrm.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agile CRM&lt;/strong&gt;&lt;/a&gt; is an all-in-one CRM software for marketing, sales, and service. With the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.agileCrm/" rel="noopener noreferrer"&gt;&lt;em&gt;Agile CRM node&lt;/em&gt;&lt;/a&gt; you can manage company, contact, and deal details in your workflows.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.copper.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Copper&lt;/strong&gt;&lt;/a&gt; is a CRM integration for Google Workspace and is best suited for small and medium-sized businesses. The n8n nodes &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.copper/" rel="noopener noreferrer"&gt;&lt;em&gt;Copper&lt;/em&gt;&lt;/a&gt; and &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.copperTrigger/" rel="noopener noreferrer"&gt;&lt;em&gt;Copper Trigger&lt;/em&gt;&lt;/a&gt; provide the basic CRUD operations for companies, customer sources, leads, opportunities, persons, projects, tasks, and users.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.hubspot.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;HubSpot&lt;/strong&gt;&lt;/a&gt;'s CRM platform provides tools for social media marketing, sales, content management, and customer service. With the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.hubspot/" rel="noopener noreferrer"&gt;&lt;em&gt;HubSpot node&lt;/em&gt;&lt;/a&gt; and &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.hubspotTrigger/" rel="noopener noreferrer"&gt;&lt;em&gt;HubSpot Trigger node&lt;/em&gt;&lt;/a&gt; you can manage contacts, contact lists, companies, deals, forms, and tickets.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.intercom.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Intercom&lt;/strong&gt;&lt;/a&gt; is a conversational relationship platform which allows businesses to communicate with prospective and existing customers within their app, on their website, through social media, or via email. The &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.intercom/" rel="noopener noreferrer"&gt;&lt;em&gt;Intercom node&lt;/em&gt;&lt;/a&gt; lets you manage companies, leads, and users from the CRM.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://keap.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Keap&lt;/strong&gt;&lt;/a&gt; offers an e-mail marketing and sales platform for small businesses, including products to manage and optimize the customer lifecycle, customer relationship management, marketing automation, lead capture, and e-commerce. The &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.keap/" rel="noopener noreferrer"&gt;&lt;em&gt;Keap node&lt;/em&gt;&lt;/a&gt; and &lt;em&gt;Keap Trigger node&lt;/em&gt; allow you to manage companies, contacts, contact notes and tags, e-commerce orders and products, emails, and files.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.pipedrive.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Pipedrive&lt;/strong&gt;&lt;/a&gt; is a cloud-based sales software company that aims to improve the productivity of businesses through the use of their software. You can use the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.pipedrive/" rel="noopener noreferrer"&gt;&lt;em&gt;Pipedrive node&lt;/em&gt;&lt;/a&gt; and &lt;em&gt;Pipedrive Trigger node&lt;/em&gt; to manage activities, deals, deal products, files, leads, notes, organizations, persons, and products.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.salesforce.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Salesforce&lt;/strong&gt;&lt;/a&gt; is the &lt;a href="https://www.statista.com/statistics/972598/crm-applications-vendors-market-share-worldwide/" rel="noopener noreferrer"&gt;leading vendor in the CRM market worldwide&lt;/a&gt;. Salesforce provides customer relationship management services and also sells a complementary suite of enterprise applications focused on customer service, marketing automation, analytics, and application development. The &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.salesforce/" rel="noopener noreferrer"&gt;&lt;em&gt;Salesforce node&lt;/em&gt;&lt;/a&gt; allows you to manage over 10 different resources, such as contacts, leads, opportunities, flows, and tasks.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.salesmate.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Salesmate&lt;/strong&gt;&lt;/a&gt; is a cloud-based CRM solution that caters to small and midsize businesses across various industries. Key features include contact management, sales pipeline management, email marketing and internal chat and phone integration. The &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.salesmate/" rel="noopener noreferrer"&gt;&lt;em&gt;Salesmate node&lt;/em&gt;&lt;/a&gt; lets you manage information about activities, companies, and deals.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://www.zoho.com/crm/" rel="noopener noreferrer"&gt;&lt;strong&gt;Zoho CRM&lt;/strong&gt;&lt;/a&gt; is an online sales CRM software that manages sales, marketing, and support. With the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.zohoCrm/" rel="noopener noreferrer"&gt;&lt;em&gt;Zoho CRM node&lt;/em&gt;&lt;/a&gt; you can perform CRUD operations on accounts, contacts, deals, invoices, leads, products, purchase orders, quotes, sales orders, and vendors.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.freshworks.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Freshworks&lt;/strong&gt;&lt;/a&gt; is a cloud-based CRM that helps businesses manage their interactions with their customers and leads. The &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.freshworksCrm/" rel="noopener noreferrer"&gt;&lt;em&gt;Freshworks CRM node&lt;/em&gt;&lt;/a&gt; provides basic operations for managing sales activities, tasks, deals, and more.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Decide what you will automate
&lt;/h3&gt;

&lt;p&gt;After you've picked a CRM and explored its functionalities, you should define what you want to automate. Think of the tasks involved in every step of the sales funnel and ask yourself these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Is the task repetitive?&lt;/li&gt;
&lt;li&gt;  Is the task time-consuming?&lt;/li&gt;
&lt;li&gt;  Do you need to perform the task often and regularly?&lt;/li&gt;
&lt;li&gt;  Does the task have a high value?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you answered yes to these questions, then your task is most probably a case for automation. Once you've identified the pain points in your current manual workflows, you can start defining and designing automated workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Build workflows
&lt;/h3&gt;

&lt;p&gt;Before starting to create workflows for the tasks identified in the previous step, ask yourself one more question: do you need to automate tasks that take place only within the CRM or also between the CRM and other apps or services?&lt;/p&gt;

&lt;p&gt;For the first case, note that some of the CRMs listed above offer built-in automation functionality for simple workflows. For the second case, you can take advantage of the n8n nodes, which allow you to connect your CRM to 200+ apps or services.&lt;/p&gt;

&lt;p&gt;To help you get started, we've created &lt;strong&gt;3 workflows with HubSpot and Pipedrive&lt;/strong&gt; for automation at every step of the sales journey. Of course, you can replace the &lt;em&gt;HubSpot&lt;/em&gt; and &lt;em&gt;Pipedrive nodes&lt;/em&gt; with another CRM of your choice.&lt;/p&gt;

&lt;h4&gt;
  
  
  Capture leads from Typeform submissions
&lt;/h4&gt;

&lt;p&gt;Typeforms are a presentable and efficient way of capturing leads and feedback from your customers. For example, you can embed a typeform on your website where visitors can request a quote for your services, or one which asks them to submit their contact details in order to download gated content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://n8n.io/workflows/1223" rel="noopener noreferrer"&gt;This workflow&lt;/a&gt; is triggered when a typeform is submitted, then it saves the sender's information into HubSpot as a new contact.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FPRdrxY-rTttiDAjsMJ0i7iiDUSuXHSiojy-ylUgmc60VYL5Q9m1L9dxenQX3rA5AN1PDJLkUU3T5bz64B0EYysViR9Q7tmg_ZU6iPdptPFiUdMYhBn8Y2yJwOoZEIvp8yJN7AOTT%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FPRdrxY-rTttiDAjsMJ0i7iiDUSuXHSiojy-ylUgmc60VYL5Q9m1L9dxenQX3rA5AN1PDJLkUU3T5bz64B0EYysViR9Q7tmg_ZU6iPdptPFiUdMYhBn8Y2yJwOoZEIvp8yJN7AOTT%3Ds0" alt="Workflow for capturing leads from Typeform submissions" width="800" height="400"&gt;&lt;/a&gt;Workflow for capturing leads from Typeform submissions&lt;/p&gt;

&lt;h4&gt;
  
  
  Send reminders after meetings with prospects
&lt;/h4&gt;

&lt;p&gt;We mentioned that a common problem within sales teams is synchronization and information transfer since salespeople might forget to note down details from their conversations with leads, jumping from one call to another.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://n8n.io/workflows/1221" rel="noopener noreferrer"&gt;This workflow&lt;/a&gt; is triggered when a client meeting is scheduled via Calendly. Then, an activity is automatically created in Pipedrive, to keep track of the lead cycle. Fifteen minutes after the end of the meeting, a message is sent to the responsible salesperson in Slack, reminding them to write down their notes and insights from the meeting with the lead.&lt;/p&gt;
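&lt;p&gt;To make the timing concrete, here is a minimal Python sketch of the reminder step. It is not taken from the n8n workflow itself: the function names, the message wording, and the sample meeting time are assumptions made for illustration, and only the fifteen-minute delay comes from the description above.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# The post describes a reminder fifteen minutes after the meeting ends.
REMINDER_DELAY = timedelta(minutes=15)


def reminder_time(meeting_end: datetime) -> datetime:
    """Return the moment at which the reminder should be sent."""
    return meeting_end + REMINDER_DELAY


def reminder_text(salesperson: str, lead: str) -> str:
    """Build the reminder message (hypothetical wording, not from the workflow)."""
    return f"Hi {salesperson}, please write down your notes from the meeting with {lead}."


# Sample values, invented for this example.
meeting_end = datetime(2022, 2, 11, 15, 0, tzinfo=timezone.utc)
print(reminder_time(meeting_end).isoformat())
print(reminder_text("Alex", "Acme Corp"))
```

&lt;p&gt;In n8n itself you would not write this code: a wait step between the Pipedrive and Slack nodes plays the role of the delay.&lt;/p&gt;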

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FVkzn9lQV4jVUABY_Dr4GrJJCeyX2JhAtD2UIeu6Hag-xq0V_xptw_dRQt_461UlS1n3V03seb8qIvM7vcigLpBO0VSyKktunfTqkDi7K8PVoDR-yh8hi6L6dDji_gPyCLYPRC1Y2%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FVkzn9lQV4jVUABY_Dr4GrJJCeyX2JhAtD2UIeu6Hag-xq0V_xptw_dRQt_461UlS1n3V03seb8qIvM7vcigLpBO0VSyKktunfTqkDi7K8PVoDR-yh8hi6L6dDji_gPyCLYPRC1Y2%3Ds0" alt="Workflow for sending reminders after Calendly meetings" width="800" height="400"&gt;&lt;/a&gt;Workflow for sending reminders after Calendly meetings&lt;/p&gt;

&lt;h4&gt;
  
  
  Process newly created deals based on their stage, value, and priority
&lt;/h4&gt;

&lt;p&gt;You're reaching the bottom of the sales funnel and getting deals--good for you! From here, there are several tasks you can automate to speed up the sales process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://n8n.io/workflows/1225" rel="noopener noreferrer"&gt;This workflow&lt;/a&gt; is triggered when a new deal is created in HubSpot. Then, it processes the deal based on its type and stage.&lt;/p&gt;

&lt;p&gt;The first branching follows three cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  If the deal is closed and won, a message is sent in a Slack channel, so that the whole team can celebrate the success.&lt;/li&gt;
&lt;li&gt;  If a presentation has been scheduled for the deal, then a Google Slides presentation template is created.&lt;/li&gt;
&lt;li&gt;  If the deal is closed and lost, the deal's details are added to an Airtable table. From here, you can analyze the data to get insights into which deals don't get closed and why.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second branching follows two cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  If the deal is for a new business and has a value above 500, a high-priority ticket assigned to an experienced team member is created in HubSpot.&lt;/li&gt;
&lt;li&gt;  If the deal is for an existing business and has a value below 500, a low-priority ticket is created.&lt;/li&gt;
&lt;/ul&gt;
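&lt;p&gt;The two branchings above can be sketched as plain decision functions. The following Python snippet is an illustration only, not the workflow's actual implementation: the stage and deal-type labels are placeholder strings chosen for this example, and combinations the post does not mention (for instance, a new business deal below 500) simply return no ticket.&lt;/p&gt;

```python
from typing import Optional

# Placeholder labels chosen for this sketch; use the values from your own CRM.
CLOSED_WON = "closedwon"
PRESENTATION_SCHEDULED = "presentationscheduled"
CLOSED_LOST = "closedlost"
NEW_BUSINESS = "newbusiness"
EXISTING_BUSINESS = "existingbusiness"
VALUE_THRESHOLD = 500  # deal value mentioned in the post


def stage_action(stage: str) -> Optional[str]:
    """First branching: pick a follow-up action from the deal stage."""
    actions = {
        CLOSED_WON: "send a message to the Slack channel",
        PRESENTATION_SCHEDULED: "create a Google Slides presentation",
        CLOSED_LOST: "add the deal details to Airtable",
    }
    return actions.get(stage)


def ticket_priority(deal_type: str, amount: float) -> Optional[str]:
    """Second branching: pick a ticket priority from deal type and value."""
    if deal_type == NEW_BUSINESS and amount > VALUE_THRESHOLD:
        return "high"
    if deal_type == EXISTING_BUSINESS and VALUE_THRESHOLD > amount:
        return "low"
    return None  # cases the post does not describe


print(stage_action(CLOSED_WON))
print(ticket_priority(NEW_BUSINESS, 800))
```

&lt;p&gt;Keeping the two decisions in separate functions mirrors the workflow, where each branching is handled by its own node and can be changed independently.&lt;/p&gt;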

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FoWE5yw3Dr0itlQEWlBCP6TwfV3PTR9cgUy_zqUqtse-u33q5S2yLwX6yOZnTIufR8G0CyWCNHIAlDDjfS4VWaz66ceNdiQoBKH3HLv7uaN_b7ZHRAye22gTQ8W-KvYKV5iDjfhbw%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FoWE5yw3Dr0itlQEWlBCP6TwfV3PTR9cgUy_zqUqtse-u33q5S2yLwX6yOZnTIufR8G0CyWCNHIAlDDjfS4VWaz66ceNdiQoBKH3HLv7uaN_b7ZHRAye22gTQ8W-KvYKV5iDjfhbw%3Ds0" alt="Workflow for processing new deals created in HubSpot" width="" height=""&gt;&lt;/a&gt;Workflow for processing new deals created in HubSpot&lt;/p&gt;

&lt;p&gt;Apart from Typeform, you can also use the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.eventbriteTrigger/" rel="noopener noreferrer"&gt;&lt;em&gt;Eventbrite Trigger node&lt;/em&gt;&lt;/a&gt; to capture the contact information of people who registered for an event, or the &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.surveyMonkeyTrigger/" rel="noopener noreferrer"&gt;&lt;em&gt;SurveyMonkey Trigger node&lt;/em&gt;&lt;/a&gt; to save the responses of a survey.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;In this post, you've learned about the advantages of CRM tools, when and why you should use a CRM, and what workflows you can automate with different CRMs. You've seen how automating different processes in the sales funnel can increase your productivity and minimize the time between the first contact and a closed deal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Try these workflows yourself: &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;install n8n&lt;/a&gt; or sign up for a free 30-day trial on &lt;a href="https://n8n.cloud/" rel="noopener noreferrer"&gt;n8n.cloud&lt;/a&gt; ☁️&lt;/li&gt;
&lt;li&gt;  Discover &lt;a href="https://n8n.io/workflows" rel="noopener noreferrer"&gt;more workflows&lt;/a&gt; using CRM nodes ⚙️&lt;/li&gt;
&lt;li&gt;  Join the discussion in the &lt;a href="https://community.n8n.io/c/docs-and-tutorials/6" rel="noopener noreferrer"&gt;n8n community forum&lt;/a&gt; 🗣️&lt;/li&gt;
&lt;li&gt;  Read more &lt;a href="https://n8n.io/blog/tag/tutorial/" rel="noopener noreferrer"&gt;posts about workflow ideas&lt;/a&gt; 💡&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>automation</category>
      <category>nocode</category>
      <category>sales</category>
      <category>business</category>
    </item>
    <item>
      <title>6 e-commerce workflows to power up your Shopify store 🛒</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Fri, 03 Sep 2021 09:08:04 +0000</pubDate>
      <link>https://dev.to/n8n/6-e-commerce-workflows-to-power-up-your-shopify-store-5b60</link>
      <guid>https://dev.to/n8n/6-e-commerce-workflows-to-power-up-your-shopify-store-5b60</guid>
      <description>&lt;p&gt;The online shopping trend has been driven by increasing digitalization in the past years, and the COVID-19 outbreak has only fueled e-commerce growth. Consumers around the world are turning to online stores for pretty much all product categories, due to in-person restrictions or contamination concerns.&lt;/p&gt;

&lt;p&gt;In 2020, &lt;a href="https://www.statista.com/statistics/379046/worldwide-retail-e-commerce-sales/" rel="noopener noreferrer"&gt;retail e-commerce sales worldwide&lt;/a&gt; amounted to $4.28 trillion, and &lt;a href="https://www.statista.com/statistics/251666/number-of-digital-buyers-worldwide/" rel="noopener noreferrer"&gt;in 2021&lt;/a&gt;, over 2.14 billion people worldwide are expected to buy goods and services online.&lt;/p&gt;

&lt;p&gt;But as online shoppers have grown in numbers, so have digital shop owners. Thanks to e-commerce platforms, almost anyone can set up an online shop with only a few clicks. However, running even a small digital business also involves some manual, repetitive tasks that might add up and steal too much of your precious time. Luckily, these kinds of tasks can be automated.&lt;/p&gt;

&lt;p&gt;In this post, we'll have a look at the most popular e-commerce platforms and how to automate common e-commerce workflows in n8n.&lt;/p&gt;

&lt;h1&gt;
  
  
  The e-commerce platforms leading the global market
&lt;/h1&gt;

&lt;p&gt;Many e-commerce software platforms have emerged in the last years, but the &lt;a href="https://www.statista.com/statistics/710207/worldwide-ecommerce-platforms-market-share/" rel="noopener noreferrer"&gt;most popular ones&lt;/a&gt;, leading the market, are &lt;strong&gt;&lt;a href="https://woocommerce.com/" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;&lt;/strong&gt; and &lt;a href="https://www.shopify.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Shopify&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv423kzzi5q97ioeqpl0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv423kzzi5q97ioeqpl0g.png" alt="Market share of e-commerce software platforms worldwide in 2021" width="775" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.statista.com/statistics/950550/worldwide-ecommerce-platforms-market-share/" rel="noopener noreferrer"&gt;&lt;em&gt;Market share of e-commerce software platforms worldwide in 2021&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shopify&lt;/strong&gt; is a paid e-commerce platform that offers templates for quickly designing your online shop. With Shopify, you don't have to worry too much about technicalities and can instead focus on selling your products.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WooCommerce&lt;/strong&gt; is the second most popular e-commerce software platform as of &lt;a href="https://www.statista.com/statistics/710207/worldwide-ecommerce-platforms-market-share/" rel="noopener noreferrer"&gt;April 2021&lt;/a&gt;, owning over 23% of the market share. WooCommerce is actually an open-source WordPress plugin, making it the go-to choice for smaller and cost-conscious shop owners.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why and when you should automate your online shop
&lt;/h1&gt;

&lt;p&gt;Regardless of the platform you choose or the products you sell, there might come a time when managing your orders becomes too time-consuming and you'll probably find yourself in one of these two common situations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Your business is growing&lt;/strong&gt; and you don't have the bandwidth anymore to manage all the orders. You might consider hiring one or two people to help you out, but be aware that additional employees could mean additional responsibilities for you.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Your online shop is a side hustle&lt;/strong&gt; and you don't have much time to take care of it besides your main job. You neither want to compromise your career nor give up your hobby business.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you recognize yourself in one of these cases, you should consider automating at least part of your work. Workflow automation might sound intimidating, or like a skill reserved for the tech-savvy, but it doesn't have to be.&lt;/p&gt;

&lt;p&gt;No-code tools with a visual user interface like &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; make automation feel like child's play. You can easily combine WooCommerce and Shopify integrations with other apps or services to automate common workflows in your digital store.&lt;/p&gt;

&lt;h1&gt;
  
  
  Workflow automation ideas for Shopify
&lt;/h1&gt;

&lt;p&gt;n8n offers four nodes for Shopify and WooCommerce: &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.shopify/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Shopify node&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.shopifyTrigger/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;Shopify Trigger node&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.wooCommerce/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;WooCommerce node&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;, and &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.wooCommerceTrigger/" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;WooCommerce Trigger node&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;. They provide various operations for managing orders, products, customers, and carts. Read our docs to learn how to set up the nodes, then adjust their parameters to build workflows for your online store.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Shopify&lt;/em&gt; and &lt;em&gt;WooCommerce&lt;/em&gt; nodes open up many possibilities for automation, helping you win back time and focus on things that matter. Here are six workflow ideas you can automate for your Shopify store (and adapt for WooCommerce):&lt;/p&gt;

&lt;h2&gt;
  
  
  Promote your new products on social media
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://www.statista.com/statistics/1031962/global-social-commerce-activities-age/" rel="noopener noreferrer"&gt;survey on social commerce&lt;/a&gt; revealed that 43% of users research products online via social networks and 28% discover brands via ads on social media. This goes to show how important it is to have a presence on social media and regularly share and promote your products.&lt;/p&gt;

&lt;p&gt;To help you automate your social media activity, we created &lt;a href="https://n8n.io/workflows/1205" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; that is triggered when you create a product on your shop and automatically shares the news on your Twitter account and a Telegram channel with the message "Hey there, my design is now on a new product ✨ Visit my [shop_name] shop to get this cool [product_name] (and check out more [product_category]) 🛍️ [shop_link]".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FPfT3breNP11_HKVZtsbWEbvaQeAx6Lw9DndVq-cxhtkJd7omEgOVxzmaSp3lXU4vWbFLBXzo0McRpv3o0mUZrZQaDuKJoBcL1PqyoJ6aV3BC2Jr89Oly36Mvv9r-Dq-rFaHjiMTU%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh4.googleusercontent.com%2FPfT3breNP11_HKVZtsbWEbvaQeAx6Lw9DndVq-cxhtkJd7omEgOVxzmaSp3lXU4vWbFLBXzo0McRpv3o0mUZrZQaDuKJoBcL1PqyoJ6aV3BC2Jr89Oly36Mvv9r-Dq-rFaHjiMTU%3Ds0" alt="Workflow for social media promotion" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for social media promotion&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can make the message even more appealing to your audience by adding a &lt;em&gt;Bannerbear node&lt;/em&gt; that automatically creates template images for your new product announcements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Update customer and order details in Zoho CRM
&lt;/h2&gt;

&lt;p&gt;Once the first orders start to come in, it's time to neatly track the orders and nurture the relationship with your customers, ideally in a customer relationship management (CRM) system.&lt;/p&gt;

&lt;p&gt;The first branch of &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;this workflow&lt;/a&gt; saves customer orders from Shopify to Zoho CRM and Trello. In the &lt;em&gt;Zoho node&lt;/em&gt;, you can select the option &lt;em&gt;Create or Update&lt;/em&gt;, which updates the existing contact if one with a matching last name and email address exists, and creates a new contact otherwise. This way, you don't have to worry about duplicate contacts.&lt;/p&gt;
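&lt;p&gt;The idea behind a create-or-update (upsert) operation can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not Zoho's or n8n's implementation: contacts are kept in a plain in-memory list, and the field names &lt;code&gt;last_name&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;, and &lt;code&gt;phone&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from typing import Dict, List

Contact = Dict[str, str]


def upsert_contact(contacts: List[Contact], incoming: Contact) -> str:
    """Update the contact with the same last name and email, or add a new one."""
    for contact in contacts:
        same_name = contact["last_name"].lower() == incoming["last_name"].lower()
        same_email = contact["email"].lower() == incoming["email"].lower()
        if same_name and same_email:
            contact.update(incoming)  # keep one record, refresh its fields
            return "updated"
    contacts.append(dict(incoming))  # no match found, so store a new record
    return "created"


# Sample data, invented for this example.
crm: List[Contact] = []
print(upsert_contact(crm, {"last_name": "Doe", "email": "jane@example.com", "phone": "111"}))
print(upsert_contact(crm, {"last_name": "Doe", "email": "JANE@example.com", "phone": "222"}))
print(len(crm))
```

&lt;p&gt;Because the second call matches the first contact (the comparison ignores letter case), the list still holds a single record afterwards, which is exactly how duplicates are avoided.&lt;/p&gt;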

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FVZPu93FFd12B4IH-MmCNYumKg_yt-SxoIPvF6uKI9KjM2unE4kUTdwE0t2R3DXUwc5CUA7Wy2iwVeoB37AnQ3pKrTv1lvr7BTPfPVG-VI7yV_m1ucxoKDf6xrHulPm1oY7JOkRxy%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FVZPu93FFd12B4IH-MmCNYumKg_yt-SxoIPvF6uKI9KjM2unE4kUTdwE0t2R3DXUwc5CUA7Wy2iwVeoB37AnQ3pKrTv1lvr7BTPfPVG-VI7yV_m1ucxoKDf6xrHulPm1oY7JOkRxy%3Ds0" alt="Workflow for saving order details from Shopify to Trello and Zoho CRM" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for saving order details from Shopify to Trello and Zoho CRM&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Of course, you can replace the &lt;em&gt;Zoho node&lt;/em&gt; with another CRM, for example, &lt;em&gt;Salesforce&lt;/em&gt; or &lt;em&gt;Agile CRM&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create invoices for new orders
&lt;/h2&gt;

&lt;p&gt;A not-so-fun part of being a shop owner is paperwork like invoices. Manually writing the details of each order for each customer is not only tedious but also error-prone. So why not automate this task?&lt;/p&gt;

&lt;p&gt;The second branch of &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;this workflow&lt;/a&gt; automatically generates invoices with Harvest when an order is created in Shopify. Then, it creates a Trello card with the order information and the invoice attached.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FgQKFDQxQjHaujl4qDGx3HA7fxZzrEk9BYx7xph5d-Di69h8gWmVdZH53l_k6433ECcb61VMAzJRtFECbVnUfI1enrz-NGxKqbLLbbTtoDypbwr92lV30fpu_CmfROXWH-wkns_BB%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2FgQKFDQxQjHaujl4qDGx3HA7fxZzrEk9BYx7xph5d-Di69h8gWmVdZH53l_k6433ECcb61VMAzJRtFECbVnUfI1enrz-NGxKqbLLbbTtoDypbwr92lV30fpu_CmfROXWH-wkns_BB%3Ds0" alt="Workflow for creating invoices in Harvest from Shopify orders" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for creating invoices in Harvest from Shopify orders&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Offer coupons and discounts to high-order customers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.statista.com/statistics/1231069/leading-reasons-for-buying-products-when-shopping-online/" rel="noopener noreferrer"&gt;In 2020&lt;/a&gt;, the leading reasons why internet users around the world added a product to their online basket and purchased the item were free delivery, coupons or discounts, and reviews from other customers.&lt;/p&gt;

&lt;p&gt;Though free delivery is up to your pricing model, you can (and should) invest in the latter two incentives. For example, you can offer discounts to high-order customers.&lt;/p&gt;

&lt;p&gt;The third branch of &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;this workflow&lt;/a&gt; is triggered when a new order is created and checks if the order value is above 100 (or any value you set). If it is, the workflow sends the customer an email with a 10% discount coupon for their next order; otherwise, it sends an email thanking them for their order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2F_hXeGJ0Zy91SxF_t_mXa3GYiyuJT2RyjmfCJeN92vtND7lsuJLo7oGbvDTPCg7TJQSOEaV1E0I3xgtJ_Z9O-2qu0MyX5hz9oUVsMwp4gp8Uajn2EI9f2J3j-1G4buWZm3b0kmB4-%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2F_hXeGJ0Zy91SxF_t_mXa3GYiyuJT2RyjmfCJeN92vtND7lsuJLo7oGbvDTPCg7TJQSOEaV1E0I3xgtJ_Z9O-2qu0MyX5hz9oUVsMwp4gp8Uajn2EI9f2J3j-1G4buWZm3b0kmB4-%3Ds0" alt="Workflow for filtering high-value customers from Shopify" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for filtering high-value customers from Shopify&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For extra productivity, you can combine this and the previous two workflows into &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;one&lt;/a&gt; super-workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FrrUF6JaAVO0z9RQmXa5BMHVgsFN9q7H4IbAI7r2WOvRCOwjjtaKoAQzcyb78HNEqSbwFd4hqNHYBH9mMqKd-xgVRPE1BwT3Jq0JjP-g-OWaowQArRJoUecsFM2gdzlbKrn1l6geb%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FrrUF6JaAVO0z9RQmXa5BMHVgsFN9q7H4IbAI7r2WOvRCOwjjtaKoAQzcyb78HNEqSbwFd4hqNHYBH9mMqKd-xgVRPE1BwT3Jq0JjP-g-OWaowQArRJoUecsFM2gdzlbKrn1l6geb%3Ds0" alt="Workflow for processing new Shopify orders" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for processing new Shopify orders&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Request customers to write a review after they have received their order
&lt;/h2&gt;

&lt;p&gt;In the previous section, we mentioned that in 2020 product reviews were the third most common reason why customers bought products online. In recent years, it has become increasingly important for consumers to read up on a product, business, or service before spending any money. &lt;a href="https://www.statista.com/statistics/1020836/share-of-shoppers-reading-reviews-before-purchase/" rel="noopener noreferrer"&gt;This year&lt;/a&gt;, nearly 70% of online shoppers typically read between one and six customer reviews before making a purchasing decision.&lt;/p&gt;

&lt;p&gt;This shows the value of investing in customer experience and encouraging shoppers to write reviews about your products. To do so, you can tweak &lt;a href="https://n8n.io/workflows/1206" rel="noopener noreferrer"&gt;this workflow&lt;/a&gt; to be triggered when an order is marked as fulfilled by selecting the &lt;em&gt;Topic: Order Fulfilled&lt;/em&gt; in the &lt;em&gt;Shopify Trigger node&lt;/em&gt;. The workflow can then send the customer an email asking them to review the product they received.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run sales inventories and reports in Google Sheets
&lt;/h2&gt;

&lt;p&gt;When your online shop generates a steady flow of orders, it's necessary to keep an inventory and track your growth regularly. For small businesses, even a Google Sheet can be enough for keeping records.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://n8n.io/workflows/1207" rel="noopener noreferrer"&gt;This workflow&lt;/a&gt; is scheduled to run every week, when it gets all your Shopify orders, calculates their sales value, and stores the data in Google Sheets for you to evaluate. Additionally, it can send a message to a Slack channel (if you work together with a small team) or Telegram to inform you about your weekly sales.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FeUv1I8a5iJJfVtPYkIVgK96eEA0c1tcw7NbeGwut5yhRcJ2uXvRwbeVthn3eeW4OgXIPjPLH1kZdT1ZxHlbgEyyZoeqOiVxadLih9X6dvZOYd9p7SINMX6D4sWGWKNJs6DReWRig%3Ds0" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FeUv1I8a5iJJfVtPYkIVgK96eEA0c1tcw7NbeGwut5yhRcJ2uXvRwbeVthn3eeW4OgXIPjPLH1kZdT1ZxHlbgEyyZoeqOiVxadLih9X6dvZOYd9p7SINMX6D4sWGWKNJs6DReWRig%3Ds0" alt="Workflow for running inventories of Shopify orders" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Workflow for running inventories of Shopify orders&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  What's next?
&lt;/h1&gt;

&lt;p&gt;In this post, we talked about the growing popularity of e-commerce and possibilities for automation. Whether as a full-time business or a fun side hustle, if you want to sell goods online -- from art prints to clothing and cosmetic products -- e-commerce platforms like Shopify and WooCommerce enable you to set up an online shop with only a few clicks.&lt;/p&gt;

&lt;p&gt;Now that you've learned how to automate six common workflows in your online shop to save precious time, here's what you can do next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Try these workflows yourself: &lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;install n8n&lt;/a&gt; or sign up for a free 30-day trial on &lt;a href="https://n8n.cloud/" rel="noopener noreferrer"&gt;n8n.cloud&lt;/a&gt; ☁️&lt;/li&gt;
&lt;li&gt;  Discover more workflows using the &lt;em&gt;Shopify (Trigger)&lt;/em&gt; and &lt;em&gt;WooCommerce (Trigger)&lt;/em&gt; nodes on the &lt;a href="https://n8n.io/workflows" rel="noopener noreferrer"&gt;n8n workflows page&lt;/a&gt; -- and feel free to &lt;a href="https://docs.n8n.io/reference/contributing.html#contribute-a-workflow-%F0%9F%A7%AC" rel="noopener noreferrer"&gt;share your workflows&lt;/a&gt; as well ⚙️&lt;/li&gt;
&lt;li&gt;  Join the discussion in the &lt;a href="https://community.n8n.io/c/docs-and-tutorials/6" rel="noopener noreferrer"&gt;n8n community forum&lt;/a&gt; 🗣️&lt;/li&gt;
&lt;li&gt;  Discover more &lt;a href="https://n8n.io/blog/tag/ideas/" rel="noopener noreferrer"&gt;automation use cases&lt;/a&gt; 💡&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>automation</category>
      <category>nocode</category>
      <category>ecommerce</category>
      <category>workflow</category>
    </item>
    <item>
      <title>Web-scraping IMDb with R</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Mon, 28 Jun 2021 10:45:52 +0000</pubDate>
      <link>https://dev.to/lorena/web-scraping-imdb-with-r-26np</link>
      <guid>https://dev.to/lorena/web-scraping-imdb-with-r-26np</guid>
      <description>&lt;p&gt;You don't need to be a data scientist to work with data. In fact, you most probably use a lot of data every day in both your professional and personal life. Price comparisons, movie rankings, sales leads, and trending topics are all examples of data that is available online in some sort of table form.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why web-scraping?
&lt;/h2&gt;

&lt;p&gt;If you want to collect or analyse this data, copy-pasting each record is most likely out of the question. This manual task is not only time-consuming, but also expensive and prone to error – not to mention it's also downright boring.&lt;/p&gt;

&lt;p&gt;Here's where automatic web-scraping can help! &lt;strong&gt;Web scraping&lt;/strong&gt; is a method of automatically gathering data from websites in a structured manner and storing it in a local database or spreadsheet.&lt;/p&gt;

&lt;p&gt;There are many no-code &lt;strong&gt;tools for web scraping&lt;/strong&gt;, like browser plug-ins (e.g. &lt;a href="https://www.webscraper.io/" rel="noopener noreferrer"&gt;Webscraper&lt;/a&gt;) and software (e.g. &lt;a href="https://www.parsehub.com/" rel="noopener noreferrer"&gt;Parsehub&lt;/a&gt;). However, if you need more advanced scraping settings and have basic coding skills, I recommend the Python libraries &lt;a href="https://www.crummy.com/software/BeautifulSoup/" rel="noopener noreferrer"&gt;Beautiful Soup&lt;/a&gt; or &lt;a href="https://selenium-python.readthedocs.io/" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt;, and the R package &lt;a href="https://cran.r-project.org/web/packages/rvest/README.html" rel="noopener noreferrer"&gt;rvest&lt;/a&gt;. The latter is the one I used for scraping IMDb and you can find the commented code on &lt;a href="https://github.com/lorenanda/imdb/blob/master/imdb_top_2018.R" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Before I proceed to the fun part, note that the legality of web scraping is not clearly defined around the world, so you should check the website's terms of use before scraping it!&lt;/p&gt;

&lt;p&gt;Now let's dive in. I wanted my data analysis to answer three questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  What are the most successful movies released in 2018?&lt;/li&gt;
&lt;li&gt;  What genres do the popular movies belong to?&lt;/li&gt;
&lt;li&gt;  What is the duration of the most popular movies?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top popular movies
&lt;/h2&gt;

&lt;p&gt;I used &lt;a href="https://www.imdb.com/search/title?year=2018" rel="noopener noreferrer"&gt;IMDb&lt;/a&gt; as a reference, because it contains all the information I need. On the website, I selected the movies released between 01.01.2018 and 31.12.2018, sorted them by popularity, and limited my search to the first page, that is, the top 50 movies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rvest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.imdb.com/search/title?year=2018"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;imdb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;read_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;top 5 popular movies in 2018&lt;/strong&gt; were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;em&gt;Aquaman&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Green Book&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Bohemian Rhapsody&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Spider-Man: Into the Spider-Verse&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt; &lt;em&gt;Avengers: Infinity War&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Top movie genres
&lt;/h2&gt;

&lt;p&gt;Scraping the genre tags of each movie is pretty straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;genre_data_html&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;html_nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".genre"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;html_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_data_html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this returns a list of genres for each movie, because the movies are labeled with multiple genres. The text data needs to be cleaned a bit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#remove the \n in front of the genres&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"\n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c1"&gt;#remove the spaces between genres&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, another tricky thing is that the genres of each movie are enumerated alphabetically, not in order of importance. To simplify my work, I selected only the first genre:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#display only the first genre in the list&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;",.*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
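
&lt;p&gt;To make the three cleaning steps concrete, here is a minimal sketch that applies them to a single made-up string. The value of &lt;code&gt;raw_genre&lt;/code&gt; is a hypothetical example of what a scraped genre tag might look like, not actual output from IMDb:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;#hypothetical raw text of one genre tag (assumed shape, not real scraped output)
raw_genre &amp;lt;- "\nAction, Adventure, Fantasy            "

#step 1: remove the line break in front of the genres
no_newline &amp;lt;- gsub("\n", "", raw_genre)

#step 2: remove all spaces
no_spaces &amp;lt;- gsub(" ", "", no_newline)

#step 3: drop everything from the first comma onwards
first_genre &amp;lt;- gsub(",.*", "", no_spaces)

print(first_genre)
#[1] "Action"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that step 2 removes every space, so it would also merge the words of a genre name that contained a space. If that ever matters for your data, trimming only the leading and trailing whitespace with &lt;code&gt;trimws()&lt;/code&gt; may be a safer choice.&lt;/p&gt;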



&lt;p&gt;At a glance, I noticed that 3 out of the top 5 are action-hero movies, so I took a closer look at the &lt;strong&gt;genre distribution:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#plot the number of movies by genre&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ggplot2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;aes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;genre_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;geom_bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"purple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"green"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggtitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Number of movies by genre"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;xlab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Genre"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ylab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Number of movies"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fgenres_count.png%3Fw%3D736" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fgenres_count.png%3Fw%3D736" alt="genres_count" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My initial observation was confirmed: &lt;strong&gt;Action&lt;/strong&gt; and &lt;strong&gt;Drama&lt;/strong&gt; are the most popular genres, followed by &lt;strong&gt;Biography&lt;/strong&gt;. I guess most people enjoy, on the one hand, movies that transport them into wild worlds and simulate experiences out of the ordinary, and on the other hand, movies that depict dramatic life stories and relate to some extent to their real life.&lt;/p&gt;
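
&lt;p&gt;The plots in this post read from a data frame called &lt;code&gt;imdb_df&lt;/code&gt;, whose construction is not shown here. A minimal sketch of how it could be assembled, assuming the genres and runtimes have already been scraped into two vectors of equal length. The values below are made up for illustration, &lt;code&gt;runtime_text&lt;/code&gt; is a hypothetical name, and the column names &lt;code&gt;Genre&lt;/code&gt; and &lt;code&gt;Runtime&lt;/code&gt; are taken from the code further down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;#hypothetical scraped values (made up for illustration)
genre_data &amp;lt;- c("Action", "Biography", "Biography", "Animation", "Action")
runtime_text &amp;lt;- c("143 min", "130 min", "134 min", "117 min", "149 min")

#strip the unit and convert the runtimes to numbers
runtime_data &amp;lt;- as.numeric(gsub(" min", "", runtime_text))

#combine the vectors into one data frame, one row per title
imdb_df &amp;lt;- data.frame(Genre = genre_data, Runtime = runtime_data)
str(imdb_df)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keep in mind that &lt;code&gt;data.frame()&lt;/code&gt; stops with an error if the vectors have different lengths, which can happen when a title on the page is missing one of the fields.&lt;/p&gt;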

&lt;h2&gt;
  
  
  Top movies duration
&lt;/h2&gt;

&lt;p&gt;Next, I analyzed the &lt;strong&gt;distribution of movie duration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#plot the movies by runtime&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;barplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb_df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb_df&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;aes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;runtime_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;geom_histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"purple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"green"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggtitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Distribution of movie runtimes"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;xlab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Minutes"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ylab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Number of movies"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plot shows that the most popular titles last &lt;strong&gt;104 minutes&lt;/strong&gt; on average (median: 117 minutes). The &lt;strong&gt;longest movie&lt;/strong&gt; is &lt;em&gt;Avengers: Infinity War&lt;/em&gt; (149 minutes) and the &lt;strong&gt;shortest movie&lt;/strong&gt; (excluding TV shows) is &lt;em&gt;A.I. Rising&lt;/em&gt; (85 minutes). The bars on the left of the histogram (under 60 minutes) represent the TV shows, which pull the average below the median.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fhist_runtime.png%3Fw%3D736" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fhist_runtime.png%3Fw%3D736" alt="hist_runtime" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also analyzed the runtime distribution by genre. First, I aggregated the movies by genre:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="c1"&gt;#group movies by genre&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dplyr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_cat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;imdb_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;genre_runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_cat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_runtime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, I visualized the average duration for each genre:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight r"&gt;&lt;code&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_runtime&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;genre_runtime&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;Minutes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;genre_runtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;aes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Genre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Minutes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;geom_bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"identity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"purple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"green"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;ggtitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Mean movie duration by genre"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I found that among genres &lt;strong&gt;Biographies are longest&lt;/strong&gt; (on average &lt;strong&gt;127 minutes&lt;/strong&gt;) and &lt;strong&gt;Crimes are shortest&lt;/strong&gt; (on average &lt;strong&gt;85 minutes&lt;/strong&gt;). This is not entirely surprising, since I think that, first, it is quite a challenge to pack a lifetime into a biographical movie, and second, there's only so much nerve-racking tension a person can take while following a crime story. However, I was expecting the average duration of &lt;strong&gt;Animations&lt;/strong&gt; to be shorter than &lt;strong&gt;110 minutes&lt;/strong&gt;, because they are produced mainly for children, who have a short attention span and little patience to sit through a two-hour movie. But then again, we are talking about the most popular movies of 2018 on IMDb, which means that adults probably made up most of the audience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fmean_genre_runtime.png%3Fw%3D736" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Florenaciutacu.files.wordpress.com%2F2019%2F03%2Fmean_genre_runtime.png%3Fw%3D736" alt="mean_genre_runtime" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is a simple web scraping project that can reveal a lot of information about people's movie preferences. It would be interesting to also analyze the total gross and see which movies and genres sold best in 2018. Now &lt;strong&gt;you&lt;/strong&gt; could try to scrape and analyze this information with your preferred tool and let me know what you find out!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>webscraping</category>
      <category>r</category>
      <category>github</category>
    </item>
    <item>
      <title>5 tasks you can automate with the new Notion API ⚡</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Wed, 09 Jun 2021 15:03:56 +0000</pubDate>
      <link>https://dev.to/n8n/5-tasks-you-can-automate-with-the-new-notion-api-43i</link>
      <guid>https://dev.to/n8n/5-tasks-you-can-automate-with-the-new-notion-api-43i</guid>
      <description>&lt;p&gt;If you're into productivity and organisation hacks, you've probably heard of (and even got to love) &lt;a href="https://www.notion.so/" rel="noopener noreferrer"&gt;Notion&lt;/a&gt;, the all-in-one workspace app that allows you to take notes, create databases, manage projects, and schedule tasks–all with highly customisable designs.&lt;/p&gt;

&lt;p&gt;At n8n, we've been using Notion since day one for our internal organisation: from meeting notes and onboarding checklists to content calendars and product research. Imagine our excitement when we heard that &lt;a href="https://developers.notion.com/" rel="noopener noreferrer"&gt;Notion launched their API (beta)&lt;/a&gt;, thus opening new possibilities of using the app in a more personalised way!&lt;/p&gt;

&lt;h1&gt;
  
  
  Notion integrations
&lt;/h1&gt;

&lt;p&gt;Our developers got to work right away and &lt;a href="https://www.producthunt.com/posts/notion-n8n-integration" rel="noopener noreferrer"&gt;we launched&lt;/a&gt; two of the most awaited nodes: &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.notion/#basic-operations" rel="noopener noreferrer"&gt;&lt;strong&gt;Notion node&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://docs.n8n.io/nodes/n8n-nodes-base.notionTrigger/" rel="noopener noreferrer"&gt;&lt;strong&gt;Notion Trigger node&lt;/strong&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Notion node&lt;/strong&gt; has five basic operations that allow you to manage blocks, get and query databases, create and update records in a database, create and search for pages, and get users who are part of your workspace. The &lt;strong&gt;Notion Trigger node&lt;/strong&gt; checks at regular intervals whether a page has been added to a database and, if so, triggers a workflow.&lt;/p&gt;

&lt;p&gt;Now you can easily connect your tools to Notion to sync data and boost your productivity. We're super excited to automate some of our workflows, and even more so to see what other creative ideas the &lt;a href="http://community.n8n.io/" rel="noopener noreferrer"&gt;n8n community&lt;/a&gt; comes up with. In this article, we'll present to you &lt;strong&gt;5 Notion workflows&lt;/strong&gt; you can automate in n8n:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add candidates' profile assessment to Notion before an interview&lt;/li&gt;
&lt;li&gt;Check to-do's in Notion and notify the assignee in Slack&lt;/li&gt;
&lt;li&gt;Send notifications about new Notion notes to Mattermost&lt;/li&gt;
&lt;li&gt;Add positive feedback messages to a compliments table in Notion&lt;/li&gt;
&lt;li&gt;Add memorable articles to your Notion reading list&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Add candidates' profile assessment to Notion before an interview
&lt;/h2&gt;

&lt;p&gt;Scheduling interviews has become easier thanks to apps like Calendly, but evaluating candidates' skills and personality still requires a human touch and good psychological understanding. &lt;a href="https://humantic.ai" rel="noopener noreferrer"&gt;Humantic AI&lt;/a&gt; can complement recruiters' evaluation, by generating psychometric assessments (including &lt;a href="https://en.wikipedia.org/wiki/DISC_assessment" rel="noopener noreferrer"&gt;DISC&lt;/a&gt; and the &lt;a href="https://en.wikipedia.org/wiki/Big_Five_personality_traits" rel="noopener noreferrer"&gt;Big Five personality traits&lt;/a&gt;) from candidates' résumés.&lt;/p&gt;

&lt;p&gt;For this use case, we created &lt;a href="https://n8n.io/workflows/1107" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; that is triggered when an interview is scheduled via Calendly. The Humantic AI node retrieves the LinkedIn profile of the candidate from Calendly, creates their psychometric assessment, then the Notion node inserts this information into a dedicated page.&lt;/p&gt;

&lt;p&gt;By automating this process, you don't have to worry about your applicant database being up to date, and instead, you have more time to prepare for the meeting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2Feam2Ikgu4aGqo6mcnDKyq3viUjLGH5zs9XA-p2_5gIroCFtZ4GCTFYmHK79PtTjEplv4GyaZwB8WlfZn-vL64uNLO1h9JbL9ImDb8Y_vprrNJdFyv3vbLaOno_5hhJ3jHURhO_aY" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh3.googleusercontent.com%2Feam2Ikgu4aGqo6mcnDKyq3viUjLGH5zs9XA-p2_5gIroCFtZ4GCTFYmHK79PtTjEplv4GyaZwB8WlfZn-vL64uNLO1h9JbL9ImDb8Y_vprrNJdFyv3vbLaOno_5hhJ3jHURhO_aY" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Check to-do's in Notion and notify the assignee in Slack
&lt;/h2&gt;

&lt;p&gt;One of the most useful applications of Notion for teams is keeping to-do lists where you can assign tasks to specific people. A limitation is that the in-app or email notifications are easily overlooked and arrive out of context. To overcome this issue, you can use &lt;a href="https://n8n.io/workflows/1105" rel="noopener noreferrer"&gt;this workflow&lt;/a&gt; that regularly checks your to-do list in Notion and notifies the person in Slack when a new task is assigned to them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FcYTvsnlN0p8PbiuPd7DTC2Ravaw8WB9U4m9NIKNA2dRKlQZtHH9X77DBFasfLLhIDJXtHbY9yHVHB52MMnVmVY9IDNgWJRGunpAcX-qwRTbx3DxJMHOir6IFxe3cSnR2A1tPI6n5" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2FcYTvsnlN0p8PbiuPd7DTC2Ravaw8WB9U4m9NIKNA2dRKlQZtHH9X77DBFasfLLhIDJXtHbY9yHVHB52MMnVmVY9IDNgWJRGunpAcX-qwRTbx3DxJMHOir6IFxe3cSnR2A1tPI6n5" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, you can replace the Slack node with another messaging app of your choice. We prefer Mattermost and have built several &lt;a href="https://n8n.io/blog/5-workflow-automations-for-mattermost-that-we-love-at-n8n/" rel="noopener noreferrer"&gt;workflows around it&lt;/a&gt;, which substantially improve our communication and productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Send notifications about new Notion notes to Mattermost
&lt;/h2&gt;

&lt;p&gt;Another practical use of Notion pages is for taking notes in meetings and keeping them well-organised by topic or meeting type. After the meeting, you might first want to go get a coffee before taking care of the action items just discussed. But as often happens, even a short break can interrupt your flow, so you may easily forget what you needed to do or who you had to follow up with once you're back at your desk.&lt;/p&gt;

&lt;p&gt;Luckily, you can automate your chores away! &lt;a href="https://n8n.io/workflows/1089" rel="noopener noreferrer"&gt;This workflow&lt;/a&gt; is triggered whenever new meeting notes are added in Notion. If the property field in the notes mentions the Marketing team, a message about the new notes will be sent to the Marketing channel in Mattermost, so all team members are up to date. Now you can enjoy your long-awaited coffee break knowing that the main action items will be taken care of.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FtlVM2zUetq3H-9LuD930xgz42RQqWGQvd-JxKCQIf6P1s4oAEYLopnfZASlmk0ySFhxMgcp6FoVLfS4uZPEqrcbAerLXvYKnNT2kVSMSDNQQ8nHbYmNt6pR1y23Mk-BZq4m71Zwm" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2FtlVM2zUetq3H-9LuD930xgz42RQqWGQvd-JxKCQIf6P1s4oAEYLopnfZASlmk0ySFhxMgcp6FoVLfS4uZPEqrcbAerLXvYKnNT2kVSMSDNQQ8nHbYmNt6pR1y23Mk-BZq4m71Zwm" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Add positive feedback messages to a compliments table in Notion
&lt;/h2&gt;

&lt;p&gt;We humans are prone to &lt;a href="https://en.wikipedia.org/wiki/Negativity_bias" rel="noopener noreferrer"&gt;negativity bias&lt;/a&gt;, meaning that we remember negative experiences more strongly than positive ones. For example, in a business context, when getting feedback from customers on your product or service, you are more likely to remember the one negative review than the dozens of messages of praise. This bias can affect how you perceive your work and can dampen team spirit, so it's important to highlight the good news.&lt;/p&gt;

&lt;p&gt;For this use case, we built &lt;a href="https://n8n.io/workflows/1109" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; that analyses the sentiment of feedback messages submitted via Typeform: messages with negative sentiment are added to a Trello board, while the ones with positive sentiment are added to a compliments table in Notion, then shared in a Slack channel. Looking once in a while at the impact you have on your customers is a wonderful way to keep your team motivated and inspired!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2F-z6QPNwDy4xAVe8u865TD9-Ba03wL4LMN4uj6BcRzNqZR92bwkK960oBEhVUcv6j3HEpjhH3Fo8v4L4QwLGvOQxulSRZeQ6FyX6EsnUSq1V9vKtBne8lwapynnH7mMI_vYYJHdwd" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh6.googleusercontent.com%2F-z6QPNwDy4xAVe8u865TD9-Ba03wL4LMN4uj6BcRzNqZR92bwkK960oBEhVUcv6j3HEpjhH3Fo8v4L4QwLGvOQxulSRZeQ6FyX6EsnUSq1V9vKtBne8lwapynnH7mMI_vYYJHdwd" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Add memorable articles to your Notion reading list
&lt;/h2&gt;

&lt;p&gt;"The modern day reading list includes more than just books," writes Notion in their &lt;a href="https://handbook.wonder.me/Reading-List-cbc13be6a94343caaf2149d3704bef76" rel="noopener noreferrer"&gt;Reading List template&lt;/a&gt;, and we couldn't agree more. &lt;a href="https://wordpress.com/activity/posting/" rel="noopener noreferrer"&gt;Millions of blog posts&lt;/a&gt;, news articles, and research papers are published every day, offering information on how to solve problems, improve various skills, and make informed decisions, or simply providing food for thought. You'll most probably stumble on some articles worth sharing or saving for a later (re-)read.&lt;/p&gt;

&lt;p&gt;To help you manage your reading list, we designed &lt;a href="https://n8n.io/workflows/1110" rel="noopener noreferrer"&gt;a workflow&lt;/a&gt; that automatically adds important articles to a Notion page from Discord. When you type the &lt;a href="https://discord.com/developers/docs/interactions/slash-commands" rel="noopener noreferrer"&gt;slash command&lt;/a&gt; /[URL] in Discord, with the URL of the article you want to save, the workflow extracts the article title and adds the linked title to the reading list in your Notion page. When all is done, you get a confirmation message on Discord: "The link was added to Notion." Now you can focus only on reading and taking notes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2Fzy2CR3lyQClhHD8itnQjc1pTOApsKRH7LSqo-N63M8Z0xVE0yw6wHfSYs192NQr_GSwccSHykfqS9UTphbh8m145THKxvvq9nRr6GKxBX0w9JyyPsXU6ndmerBhy31db_VDlj0Mw" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Flh5.googleusercontent.com%2Fzy2CR3lyQClhHD8itnQjc1pTOApsKRH7LSqo-N63M8Z0xVE0yw6wHfSYs192NQr_GSwccSHykfqS9UTphbh8m145THKxvvq9nRr6GKxBX0w9JyyPsXU6ndmerBhy31db_VDlj0Mw" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Next Steps
&lt;/h1&gt;

&lt;p&gt;So, which workflow will you try first? On our blog, you can find more ideas of workflows that you can automate &lt;a href="https://n8n.io/blog/your-business-doesnt-need-you-to-operate/" rel="noopener noreferrer"&gt;in the workplace&lt;/a&gt; and &lt;a href="https://n8n.io/blog/workflow-automation-new-year-resolutions/" rel="noopener noreferrer"&gt;in your personal life&lt;/a&gt;, which you can now tweak to integrate Notion. Share your ideas with us on &lt;a href="https://twitter.com/n8n_io" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; or in the &lt;a href="http://community.n8n.io/" rel="noopener noreferrer"&gt;community forum&lt;/a&gt; — we're curious to see what you build! And if you want to get the latest content on automation with n8n, &lt;a href="https://n8n.io/blog/#subscribe" rel="noopener noreferrer"&gt;subscribe to our newsletter&lt;/a&gt; 💌&lt;/p&gt;

</description>
      <category>automation</category>
      <category>notion</category>
      <category>productivity</category>
      <category>api</category>
    </item>
    <item>
      <title>6 findings from analysing Oscars speeches with Python</title>
      <dc:creator>Lorena</dc:creator>
      <pubDate>Sun, 06 Jun 2021 11:36:44 +0000</pubDate>
      <link>https://dev.to/lorena/6-findings-from-analysing-oscars-speeches-with-python-370p</link>
      <guid>https://dev.to/lorena/6-findings-from-analysing-oscars-speeches-with-python-370p</guid>
      <description>&lt;p&gt;On the occasion of the 93rd Oscars Award Ceremony, I was curious to do some text mining on the acceptance speeches. Specifically, I analysed the speeches of the Best Directors between 1941 and 2019. I used a dataset from &lt;a href="https://www.kaggle.com/unanimad/the-oscar-award" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt; and added missing data for 2017, 2018, and 2019 directly from the &lt;a href="http://aaspeechesdb.oscars.org/" rel="noopener noreferrer"&gt;Academy Awards Acceptance Speech database&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In total, 74 Best Directors have been awarded and almost all of them gave acceptance speeches, which I analysed with Python and the NLTK library. You can find the Jupyter notebook &lt;a href="https://colab.research.google.com/drive/18lgeB3LHdXg2Ly6cjMi49W5VNlmkqZNc?usp=sharing" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Let's see what the words reveal about the Best Directors and the Oscars!&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Average speech length
&lt;/h3&gt;

&lt;p&gt;The speech of a Best Director has 104 words on average, but speeches range widely from 8 to 267 words.&lt;/p&gt;

&lt;p&gt;Here's how to calculate the number of words in a text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;words&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Speech_clean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Longest &amp;amp; shortest speeches
&lt;/h3&gt;

&lt;p&gt;The longest speech runs to 267 words and was given by Mel Gibson at the 68th Academy Awards in 1995 for his film &lt;em&gt;Braveheart&lt;/em&gt;. This guy had a looot of people to thank and seems to have used up all his words saying pretty much nothing.&lt;/p&gt;

&lt;p&gt;The shortest speech was summed up in 8 words by Delbert Mann at the 28th Academy Awards in 1955 for his film &lt;em&gt;Marty&lt;/em&gt;. I really like his efficient "I came. I won. I thanked." structured speech:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thank you. Thank you very much. Appreciate it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's how to find the longest and shortest text in a dataframe with pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;words&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
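
&lt;p&gt;Sorting the whole dataframe works, but you can also pick out the two extremes directly. Below is a minimal sketch of the same idea in plain Python, without pandas. The &lt;code&gt;speeches&lt;/code&gt; dictionary and its contents are hypothetical toy data, not the Kaggle dataset. With pandas, &lt;code&gt;idxmin()&lt;/code&gt; and &lt;code&gt;idxmax()&lt;/code&gt; on the &lt;code&gt;words&lt;/code&gt; column serve a similar purpose.&lt;/p&gt;

```python
# Hypothetical toy data standing in for the cleaned speeches.
speeches = {
    "Director A": "Thank you. Thank you very much. Appreciate it.",
    "Director B": "I want to thank everyone who worked on this film with me.",
}

# Count whitespace-separated words per speech, as str.split() does.
word_counts = {name: len(text.split()) for name, text in speeches.items()}

# Look up the keys with the smallest and largest word count.
shortest = min(word_counts, key=word_counts.get)
longest = max(word_counts, key=word_counts.get)

print(shortest, word_counts[shortest])
print(longest, word_counts[longest])
```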



&lt;h3&gt;
  
  
  3. Lexical richness
&lt;/h3&gt;

&lt;p&gt;Lexical richness is a measure of how many unique words are used in a text. It is calculated as the number of unique words divided by the total number of words. The higher the score, the richer the vocabulary–and vice-versa. Here's how to calculate lexical richness for each speech in the dataframe with Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lexical_richness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lex_rich&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;lexical_richness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Speech_clean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;directing&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
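
&lt;p&gt;One thing to watch: &lt;code&gt;set(str(text))&lt;/code&gt; builds a set of characters, so the function above, as written, divides distinct characters by total characters. If you want the word-level ratio described in the definition, the text has to be split into words first. Here is a minimal sketch of that variant. The tokenisation (lowercasing, whitespace splitting, stripping surrounding punctuation) is my assumption, and the scores it gives will differ from those produced by the snippet above.&lt;/p&gt;

```python
def lexical_richness(text):
    """Return distinct words divided by total words, rounded to 3 decimals."""
    # Split on whitespace, lowercase, and strip surrounding punctuation so
    # that "Thank" and "thank." count as the same word.
    words = [w.strip(".,!?;:\"'()").lower() for w in str(text).split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    return round(len(set(words)) / len(words), 3)


speech = "Thank you. Thank you very much. Appreciate it."
print(lexical_richness(speech))  # 6 distinct words out of 8
```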



&lt;p&gt;The speech with the highest lexical richness (0.408) is Delbert Mann's, the director of &lt;em&gt;Marty&lt;/em&gt;, awarded in 1955. This means that 40.8% of the words he used are distinct.&lt;/p&gt;

&lt;p&gt;At the other end, the speech with the lowest lexical richness (0.034) is Mel Gibson's, the director of &lt;em&gt;Braveheart&lt;/em&gt;, awarded in 1995. This means that 3.4% of the words he used are distinct.  &lt;/p&gt;

&lt;h3&gt;
  
  
  4. Longest words
&lt;/h3&gt;

&lt;p&gt;The longest words used in directors' speeches have 15 letters: &lt;em&gt;administrations&lt;/em&gt;, &lt;em&gt;cinematographer&lt;/em&gt;, and &lt;em&gt;czechoslovakian&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's how to select the longest words in a text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;long_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_speeches_tokenized&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;long_words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
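
&lt;p&gt;The threshold of 14 above only works once you already know how long the longest word is. A small sketch that finds the maximum length first, assuming the tokens are available as a plain list of strings (the &lt;code&gt;tokens&lt;/code&gt; list here is made up for illustration):&lt;/p&gt;

```python
def longest_words(tokens):
    """Return the sorted, de-duplicated tokens that share the maximum length."""
    if not tokens:
        return []
    max_len = max(len(w) for w in tokens)
    # A set comprehension removes repeated words before sorting.
    return sorted({w for w in tokens if len(w) == max_len})


# Hypothetical token list for illustration.
tokens = ["thank", "cinematographer", "you", "administrations", "cinematographer"]
print(longest_words(tokens))
```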



&lt;h3&gt;
  
  
  5. Most common words
&lt;/h3&gt;

&lt;p&gt;The top 10 most common words in all acceptance speeches are: &lt;em&gt;thank&lt;/em&gt; (201 occurrences), &lt;em&gt;much&lt;/em&gt; (56), &lt;em&gt;like&lt;/em&gt; (50), &lt;em&gt;people&lt;/em&gt; (48), &lt;em&gt;want&lt;/em&gt; (42), &lt;em&gt;would&lt;/em&gt; (30), &lt;em&gt;movie&lt;/em&gt; (26), &lt;em&gt;film&lt;/em&gt; (26), &lt;em&gt;say&lt;/em&gt; (24), and &lt;em&gt;many&lt;/em&gt; (22).&lt;/p&gt;

&lt;p&gt;Interestingly, out of these 10 words, 3 are nouns (referring to &lt;em&gt;people&lt;/em&gt; and &lt;em&gt;film&lt;/em&gt;/&lt;em&gt;movie&lt;/em&gt;), 2 express large quantities (&lt;em&gt;much&lt;/em&gt; and &lt;em&gt;many&lt;/em&gt;), and 5 are verbs that express personal feelings (&lt;em&gt;want&lt;/em&gt;, &lt;em&gt;like&lt;/em&gt;) or actions (&lt;em&gt;say&lt;/em&gt;, &lt;em&gt;thank&lt;/em&gt;). It's also worth noting that &lt;em&gt;thank&lt;/em&gt; occurs far more often than any of the other common words, which is understandable for acceptance speeches.&lt;/p&gt;

&lt;p&gt;Here's how to find the frequency distribution of words in a text with NLTK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FreqDist&lt;/span&gt;
&lt;span class="nc"&gt;FreqDist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_speeches_tokenized&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;most_common&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
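
&lt;p&gt;The top 10 above contains no function words such as &lt;em&gt;the&lt;/em&gt; or &lt;em&gt;you&lt;/em&gt;, which suggests that stopwords and punctuation were filtered out before counting; the notebook is the place to confirm the exact steps. As a rough sketch of that kind of filtering, here is a version built on &lt;code&gt;collections.Counter&lt;/code&gt; from the standard library. The tiny &lt;code&gt;STOPWORDS&lt;/code&gt; set and the sample tokens are hypothetical; a real stopword list would be much longer.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only.
STOPWORDS = {"i", "you", "to", "the", "and", "a", "so", "very"}


def most_common_words(tokens, n=10):
    """Return the n most frequent alphabetic, non-stopword tokens."""
    words = [t.lower() for t in tokens if t.isalpha()]  # drops punctuation
    words = [w for w in words if w not in STOPWORDS]
    return Counter(words).most_common(n)


tokens = "Thank you . Thank you very much . I want to thank the producers".split()
print(most_common_words(tokens, 2))
```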



&lt;h3&gt;
  
  
  6. "Thank" to...
&lt;/h3&gt;

&lt;p&gt;Ok, winners thank a lot, but who do they thank? It turns out... &lt;em&gt;you&lt;/em&gt;, but also &lt;em&gt;the Pacific Command of the United States&lt;/em&gt;, &lt;em&gt;Mr. Harry Cohn&lt;/em&gt;, &lt;em&gt;Marlon&lt;/em&gt;, &lt;em&gt;the producers&lt;/em&gt;, and &lt;em&gt;each one of them&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's how to see a word in its context with NLTK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;nltk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;
&lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_speeches&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;concordance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
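
&lt;p&gt;NLTK's &lt;code&gt;concordance&lt;/code&gt; prints its matches to the console. If you'd rather collect the context of each hit for further processing, the idea is simple enough to sketch in plain Python. The function below is my own illustration, not NLTK's implementation, and the sample sentence is made up.&lt;/p&gt;

```python
def concordance(tokens, word, width=3):
    """Return (left context, hit, right context) for each match of `word`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample sentence, already split into tokens.
tokens = "I want to thank the producers and thank you all".split()
for left, hit, right in concordance(tokens, "thank", width=2):
    print(f"{left}  [{hit}]  {right}")
```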



&lt;p&gt;That's all, folks! Could/should I have analysed anything else? Let me know what you think in the comments below ⬇️&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>jupyter</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
