<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jason Skowronski</title>
    <description>The latest articles on DEV Community by Jason Skowronski (@mostlyjason).</description>
    <link>https://dev.to/mostlyjason</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F55648%2Ff5ad13f6-da68-4b9b-aae3-3086955cb02c.jpg</url>
      <title>DEV Community: Jason Skowronski</title>
      <link>https://dev.to/mostlyjason</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mostlyjason"/>
    <language>en</language>
    <item>
      <title>Fear database changes? Get them under control with CI/CD</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Tue, 17 Dec 2019 16:15:31 +0000</pubDate>
      <link>https://dev.to/heroku/fear-database-changes-get-them-under-control-with-ci-cd-44n1</link>
      <guid>https://dev.to/heroku/fear-database-changes-get-them-under-control-with-ci-cd-44n1</guid>
      <description>&lt;p&gt;Developers often fear database changes because a mistake by anyone on your team can lead to a major outage and even data loss. The stakes are higher when changes are not backwards compatible, cannot be rolled back, or impact system performance. This can cause a lack of confidence and slow your team velocity. As a result, database changes are a common failure point in agile and DevOps. &lt;/p&gt;

&lt;p&gt;Databases are often created manually and too often evolve through manual changes, informal processes, and even testing in production. This makes your system more fragile. The solution is to include database changes in your source control and CI/CD pipeline. This lets your team document each change, follow the code review process, test it thoroughly before release, make rollbacks easier, and coordinate with software releases.&lt;/p&gt;

&lt;p&gt;Let’s look at an example of how to include database migrations in your CI/CD process and push a non-backwards-compatible database change successfully. We'll also look at testing your changes, progressive deployments, dealing with rollbacks, and a few helpful tools.&lt;/p&gt;

&lt;h2&gt;What is CI/CD?&lt;/h2&gt;

&lt;p&gt;CI/CD is a cornerstone of modern development and DevOps.&lt;/p&gt;

&lt;p&gt;CI—or Continuous Integration—is the practice of merging all working developer code into a shared repository throughout the day. Its purpose is to prevent integration problems by integrating often and early. Commonly, this integration kicks off an automated build and test.&lt;/p&gt;

&lt;p&gt;CD—or Continuous Delivery—is the practice of building, testing, and releasing software in short cycles, with the aim of ensuring that a working version of the software can be released at any time.&lt;/p&gt;

&lt;h2&gt;Is Your Database Ready For CI/CD?&lt;/h2&gt;

&lt;p&gt;There are several key requirements to having your database ready for CI/CD. First, the database must be reproducible from scratch using one or more SQL scripts. This means that in addition to a script that creates the initial version of your database, you must also maintain scripts that make all required schema updates to your database.&lt;/p&gt;

&lt;p&gt;When you create these scripts, you have two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create one script per schema object, then update the corresponding script (state based) when making changes to the object.&lt;/li&gt;
&lt;li&gt; Create one original script that creates the entire database schema. Then, create a series of individual change scripts (migration based) for changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To learn more, check out &lt;a href="https://dev.to/pesse/one-does-not-simply-update-a-database--migration-based-database-development-527d"&gt;this excellent article&lt;/a&gt; on state-based versus migration-based database updates.&lt;/p&gt;
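&lt;p&gt;To make the migration-based option concrete, here is a minimal in-memory sketch of what a migration runner does. The file names, the &lt;code&gt;runSql&lt;/code&gt; callback, and the in-memory applied set are illustrative; real tools persist the applied versions in a tracking table:&lt;/p&gt;

```javascript
// A minimal sketch of the migration-based approach (option 2): apply
// versioned change scripts in order, skipping any that were already
// applied. File names and the runSql callback are illustrative; real
// tools (Flyway, Liquibase, etc.) record the applied set in a tracking
// table in the database rather than in memory.
function pendingMigrations(allScripts, applied) {
  return allScripts
    .filter((name) => !applied.has(name))
    .sort(); // version-prefixed names (001_..., 002_...) sort correctly
}

function applyMigrations(allScripts, applied, runSql) {
  const toRun = pendingMigrations(allScripts, applied);
  for (const script of toRun) {
    runSql(script);      // e.g. shell out to psql -f <script>
    applied.add(script); // record it so the script never runs twice
  }
  return toRun;
}
```

&lt;p&gt;Because applied scripts are skipped, re-running the runner is idempotent, which is the property that makes migrations safe to execute on every deploy.&lt;/p&gt;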

&lt;p&gt;The second requirement for CI/CD is that the database schema (meaning, those scripts we just mentioned), just like your source code, must live in source control. You must treat your database schema changes as a controlled process just as you do with code.&lt;/p&gt;

&lt;p&gt;Third, always back up before performing any database migrations. If you're working with a live production database, consider a &lt;a href="https://devcenter.heroku.com/articles/heroku-postgres-follower-databases"&gt;Postgres follower database&lt;/a&gt; for your migration or upgrade.&lt;/p&gt;

&lt;p&gt;Lastly, changes that involve removing a database object, such as deleting a column as shown below, can be more difficult to deal with due to the loss of data. Many organizations develop strategies to deal with this, such as only allowing additive changes (e.g. adding a column), or having a team of DBAs that deals with such changes.&lt;/p&gt;

&lt;h2&gt;Is Your Team Ready for CI/CD?&lt;/h2&gt;

&lt;p&gt;Perhaps the best process for database changes and database CI/CD is ensuring you have a collaborative effort between DevOps and DBAs. Make sure your DBAs are part of the code review cycle; they can help to identify issues that only they may know about. DBAs have knowledge of the databases in each specific environment, including database-specific dependencies such as ETL load jobs, database maintenance tasks, and more.&lt;/p&gt;

&lt;p&gt;Be sure to consult a database SME in setting up your database for CI/CD, and in any migration process, when possible. Also be sure to follow sensible DevOps processes, such as testing your changes in a test environment, performing backups, mitigating risks, being prepared for rollbacks, and so on.&lt;/p&gt;

&lt;h2&gt;How Your CI Tool Helps With Migrations&lt;/h2&gt;

&lt;p&gt;When you create or update these scripts, and push them to source control, your CI tool (such as Jenkins or Heroku CI) will pull the changes and then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Rebuild your database to the newest version of the scripts in a test or staging environment. Since the database is being rebuilt, be sure to export the lookup/reference data, then import it back to the new schema. Although it is possible to export and import transactional data, transactional data is out of scope for this article. You can &lt;a href="https://www.isaca.org/Journal/archives/2012/Volume-1/Pages/Database-Backup-and-Recovery-Best-Practices.aspx"&gt;read more about best practices here&lt;/a&gt; if interested.&lt;/li&gt;
&lt;li&gt; Run your tests. For testing your database changes, one possible time saver is to have two sets of tests. The first set is a quick test that verifies your build scripts and runs a few basic functional tests (such as referential integrity, stored procedure unit tests, triggers, and so on). The second set includes migration of transactional data (possibly scrubbed production data) to run a more realistic full set of tests.&lt;/li&gt;
&lt;li&gt; Deploy your database changes to your production environment or another selected environment. (Depending on your migration strategy, the CI tool should also simultaneously deploy and test any code changes dependent on the database change.)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Watch Out for These Common Problems&lt;/h2&gt;

&lt;p&gt;In many cases, when you're making a simple schema addition with bidirectionally compatible code, then you can push code and database changes at the same time. This shouldn't be an issue, as rollbacks in our case will be easy and predictable. This is often true when we are dealing with microservices with simple database components.&lt;/p&gt;

&lt;p&gt;However, in many scenarios, serious problems can happen with this simplistic approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Production data may be different from test/stage data and cause unforeseen issues.&lt;/li&gt;
&lt;li&gt;  A large number of changes in both code and database schema may be in the pipeline and need to be deployed simultaneously.&lt;/li&gt;
&lt;li&gt;  CI/CD processes may not be consistent through every environment.&lt;/li&gt;
&lt;li&gt;  You may be under a zero-downtime mandate.&lt;/li&gt;
&lt;li&gt;  Even using tools that help you achieve zero downtime (such as Heroku preboot), you can end up with two versions of the code running simultaneously.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are several strategies for addressing the above issues. Some popular solutions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  If your changes are backwards-compatible, then use a tick-tock release pattern. This approach involves releasing the new database column first, then releasing the new code. You can identify problems early in this manner, with minimal production changes. Additionally, the rollback remains small and manageable, and can be accomplished with tools such as Heroku's Postgres rollback, discussed below.&lt;/li&gt;
&lt;li&gt;  If your provider supports it, use a blue/green rollout. In this pattern, an entirely new set of production servers is created side-by-side with the current production servers. Enable database synchronization and use a DNS or a proxy to cut over to the new servers/database. You can roll back by simply changing the proxy back to the original servers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;A Simple Migration Example&lt;/h2&gt;

&lt;p&gt;Let’s run through an example based on the migration scripting option explained above. Note that some frameworks (Rails, Django, ORM tools, and so on) abstract out or handle schema creation and migration for you. While the details may differ according to the framework you are using, the below example should still help you to understand these core concepts. For example, you may have a schema configuration file to include in your CI/CD process.&lt;/p&gt;

&lt;p&gt;For our example, we'll use Node.js, Postgres, and GitHub. We'll also use Heroku because it provides convenient tools including &lt;a href="https://devcenter.heroku.com/articles/heroku-ci"&gt;Heroku CI&lt;/a&gt; with deploy scripts for CI/CD, and easy Postgres rollbacks in case we make a mistake. If you need help deploying Node.js and Postgres on Heroku, &lt;a href="https://devcenter.heroku.com/articles/getting-started-with-nodejs?singlepage=true"&gt;here’s a quick walk-through&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's the pertinent code for our example. We're going to create a simple database with a single table, and a Node.js file that writes to that database table on load.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Database creation SQL (we have just one simple table):&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;id&lt;/span&gt;           &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;firstname&lt;/span&gt;    &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;lastname&lt;/span&gt;     &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;enrolled&lt;/span&gt;     &lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Node.js&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INSERT INTO users 
  (id,firstname,lastname,enrolled,created_at) 
  values ($1,$2,$3,$4,$5) &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Becky&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Smith&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;y&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once these files are checked into GitHub and our repository is attached to a Heroku app, we can enable the &lt;a href="https://devcenter.heroku.com/articles/heroku-ci"&gt;Heroku CI tool&lt;/a&gt; on the Heroku dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xOmtr3Ur--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/twqhtd8tkh07zflzni5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xOmtr3Ur--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/twqhtd8tkh07zflzni5s.png" alt="Heroku CI on the Heroku Dashboard" width="800" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The real work is done by the &lt;a href="https://devcenter.heroku.com/articles/procfile"&gt;Heroku Procfile&lt;/a&gt; and the &lt;a href="https://devcenter.heroku.com/articles/release-phase"&gt;Heroku release phase&lt;/a&gt;. Using those, we can tell the Heroku CI tool to run a database migration SQL file any time a new release is created (in other words, a successful compile). Here is the release line we need to include in the Heroku Procfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;release: bash `./release-tasks.sh`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The content of the release-tasks file includes a list of SQL scripts to run. That list is updated with each release to include the needed schema modifications. For this very simple example, it will point to just one script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="nt"&gt;-h&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;hostname&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &amp;lt;database&amp;gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &amp;lt;user&amp;gt; &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; database/migrate.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(The database password can be supplied as a Heroku environment variable.)&lt;/p&gt;
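&lt;p&gt;On Heroku, the full connection string (including the password) is exposed to the app as the &lt;code&gt;DATABASE_URL&lt;/code&gt; config var, in the form &lt;code&gt;postgres://user:password@host:port/dbname&lt;/code&gt;. As a sketch, the individual pieces can be pulled out with Node's built-in &lt;code&gt;URL&lt;/code&gt; class; the field names returned here are our own choice, not part of any library:&lt;/p&gt;

```javascript
// A sketch of splitting a Heroku-style DATABASE_URL into the parts that
// a CLI tool like psql (or a client that does not accept a connection
// string) needs. The returned field names are illustrative.
function parseDatabaseUrl(databaseUrl) {
  const url = new URL(databaseUrl);
  return {
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    host: url.hostname,
    port: url.port || "5432",        // Postgres default port
    database: url.pathname.slice(1), // strip the leading "/"
  };
}
```

&lt;p&gt;Keeping the credentials in a config var rather than in the repository means the same migration script works unchanged across staging and production.&lt;/p&gt;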

&lt;p&gt;Typically, as we are using the migration-based strategy, we would add additional migration scripts for each set of changes. For a more robust solution, we could use a tool such as Liquibase, &lt;a href="https://pypi.org/project/alembic/"&gt;Alembic&lt;/a&gt;, or &lt;a href="https://flywaydb.org/"&gt;Flyway&lt;/a&gt;. These tools add version control to your database, both generating the necessary change scripts between releases, and giving you the ability to easily roll back changes. For example, Flyway creates scripts that allow you to migrate from any version of your database (including an empty database) to the latest version of the schema.&lt;/p&gt;

&lt;p&gt;To kick off the CI tool, we make two changes: drop a required column, and change the JavaScript to no longer reference that column. First, we update the SQL code in Node.js, taking out the column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INSERT INTO users 
  (id,firstname,lastname,created_at) 
  values ($1,$2,$3,$4) &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Becky&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Smith&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;()]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we create a migrate.sql file (referenced in the Procfile above) to alter the table and remove the column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;enrolled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we commit the code change and SQL file, and watch the CI magic. First, the integration tests run. If you are using a common testing framework, the Heroku CI tool probably works with your test suite.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QyLaRqTY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/dvnf6v59ody5yl2ow49x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QyLaRqTY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/dvnf6v59ody5yl2ow49x.png" alt="Tests run and pass" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now the CI tool creates a new release and deploys the app, which kicks off the migrate.sql file. (See the middle of the image below.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jYZqVO6q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/dd7lfsdczwqa9q6sn86t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jYZqVO6q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/dd7lfsdczwqa9q6sn86t.png" alt="CI tool deploy success" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can check to see that the column was removed by inspecting the database through the Heroku CLI tool:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MXqnN1OO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/5ngr7e0phuwvpmwnohit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MXqnN1OO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://thepracticaldev.s3.amazonaws.com/i/5ngr7e0phuwvpmwnohit.png" alt="Heroku CI tool" width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It worked! There is no longer a column named 'enrolled'. Our CI tool ran our script and deleted the column.&lt;/p&gt;

&lt;p&gt;Some tools, like Liquibase, keep a detailed list of database changes. These tools allow you to easily see the last set of changes in cases like the above.&lt;/p&gt;

&lt;p&gt;Now, any time that code or an updated migrate.sql is committed in the future, the CI tool will kick off the tests. If the tests pass, this creates a new release and pushes it to staging. When there is a new release, the migrate.sql file runs against the staging database.&lt;/p&gt;

&lt;p&gt;We've taken a simple route here for demonstration purposes, but could have made this process more robust. For instance, when moving a new release to staging, we could wipe out the old version of the database, create a new one from scratch by running the original creation script plus all migration scripts, and then populate the database with any reference data, all through the Procfile and release phase. Also note that for simplicity's sake, we are not running this migration with transactions in progress. In a real-world scenario, &lt;a href="https://devcenter.heroku.com/articles/release-phase#review-apps-and-the-postdeploy-script"&gt;Heroku recommends using an advisory lock&lt;/a&gt; to prevent concurrent migrations.&lt;/p&gt;

&lt;h2&gt;How To Do Rollbacks&lt;/h2&gt;

&lt;p&gt;Even with the best planning and forethought, there will be times when you need to roll back your database. There are many approaches to rolling back failed deployments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Create a SQL file that rolls back the changes quickly. (For example, while you are in staging, use a compare utility to generate the script.) This file should be part of the deployment package so that you can quickly run the rollback if there is an error.&lt;/li&gt;
&lt;li&gt;  Roll forward (quickly push a new build that fixes the issue).&lt;/li&gt;
&lt;li&gt;  Rely on source control and labels or branches to recreate and deploy the previous version.&lt;/li&gt;
&lt;li&gt;  Restore a full backup of your database. (Use a tool that ships with your database, such as pg_restore in Postgres.)&lt;/li&gt;
&lt;li&gt;  Use a tool provided by your platform, such as &lt;a href="https://devcenter.heroku.com/articles/heroku-postgres-rollback"&gt;Heroku Postgres Rollback&lt;/a&gt; and &lt;a href="https://devcenter.heroku.com/articles/releases#rollback"&gt;Heroku Release Rollback&lt;/a&gt; for code. As the name implies, Heroku Postgres Rollback allows you to easily roll back your database to a previous point in time, quickly and confidently moving your database back to a working release.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Be aware that all these solutions come with their own challenges, such as potential loss of new data (restoring a backup or redeploying) and introducing new bugs.&lt;/p&gt;

&lt;h2&gt;Summary&lt;/h2&gt;

&lt;p&gt;Database changes and migrations can be scary, and can cause serious mistrust. However, if you place your database under CI/CD controls, you can not only confidently migrate your changes, but also move towards a better agile and DevOps experience. This can be as simple as using source control for your database schema, having a good process in place with your DevOps and DBA teams, and using your existing CI tools to test and migrate your databases. Once you establish and train your team on the new process, future changes will be smoother and more automatic than your old manual process.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>devops</category>
      <category>heroku</category>
    </item>
    <item>
      <title>How Stream Processing Makes Your Event-Driven Architecture Even Better</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Wed, 11 Dec 2019 16:29:02 +0000</pubDate>
      <link>https://dev.to/heroku/how-stream-processing-makes-your-event-driven-architecture-even-better-5ehg</link>
      <guid>https://dev.to/heroku/how-stream-processing-makes-your-event-driven-architecture-even-better-5ehg</guid>
      <description>&lt;p&gt;If you’re an architect or developer looking at event-driven architectures, stream processing might be just what you need to make your app faster, more scalable, and more decoupled.&lt;/p&gt;

&lt;p&gt;In this article—the third in a series about event-driven architectures—we will review a little of &lt;a href="https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7"&gt;the first article in the series,&lt;/a&gt; which outlined the benefits of event-driven architectures, some of the options, and a few patterns and anti-patterns. We will also review the &lt;a href="https://dev.to/heroku/scale-your-apps-with-an-easy-message-queue-on-redis-4glp"&gt;second article&lt;/a&gt;, which provided more detail on message queues and deployed a quick-start message queue using Redis and RSMQ.&lt;/p&gt;

&lt;p&gt;This article will also dive deeper into stream processing. We will discuss why you might pick stream processing as your architecture, some of the pros and cons, and a quick-to-deploy reference architecture using &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Apache Kafka&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;What is an Event-Driven Architecture?&lt;/h1&gt;

&lt;p&gt;Stream processing is a type of event-driven architecture. In event-driven architectures, when a component performs some piece of work that other components might be interested in, that component (called a producer) produces an event—a record of the performed action. Other components (called consumers) consume those events so that they can perform their own tasks as a result of the event.&lt;/p&gt;

&lt;p&gt;This decoupling of consumers and producers gives event-driven architectures several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Asynchronous—Communications between components are asynchronous, avoiding any bottlenecks caused by synchronous, monolithic architectures.&lt;/li&gt;
&lt;li&gt;  Decoupled—Components don’t need to know about one another, and can be developed, tested, deployed, and scaled independently.&lt;/li&gt;
&lt;li&gt;  Easy Scaling—Since components are decoupled, bottleneck issues can be more easily tracked to a single component, and quickly scaled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are two main kinds of event-driven architectures: message queues and stream processing. Let's dive into the differences.&lt;/p&gt;

&lt;h1&gt;Intro to Message Queues&lt;/h1&gt;

&lt;p&gt;With message queues, the original event-driven architecture, the producer places a message into a queue &lt;em&gt;targeted to a specific consumer&lt;/em&gt;. That message is held in the queue (often in first-in, first-out order) until the consumer retrieves it, at which time the message is deleted.&lt;/p&gt;

&lt;p&gt;Message queues are useful for systems where you know exactly what needs to happen as a result of an event. When an event occurs, your producer sends a message to the queue, targeted to some consumer(s). Those consumers obtain the message from the queue and then execute the next operation. Once that next step is taken, the event is removed from the queue forever. In the case of message queues, the flow is generally known by the queue, giving rise to the term “smart broker/dumb consumer”, which means the broker (queue) knows where to send a message, and the consumer is just reacting.&lt;/p&gt;
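&lt;p&gt;The queue semantics described above can be sketched in a few lines: messages come out in first-in, first-out order, and retrieving a message removes it from the queue, so each message is delivered once:&lt;/p&gt;

```javascript
// A toy in-memory illustration of message-queue semantics (not a real
// broker): FIFO ordering, and a message is deleted as soon as a
// consumer retrieves it.
class MessageQueue {
  constructor() {
    this.messages = [];
  }
  send(message) {
    this.messages.push(message); // producer enqueues at the tail
  }
  receive() {
    return this.messages.shift(); // consumer dequeues from the head;
                                  // the message is gone afterwards
  }
}
```

&lt;p&gt;Real brokers such as RabbitMQ add durability, acknowledgements, and routing on top of this basic delete-on-read model.&lt;/p&gt;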

&lt;h1&gt;Intro to Stream Processing&lt;/h1&gt;

&lt;p&gt;With stream processing, messages are &lt;em&gt;not&lt;/em&gt; targeted to a certain recipient, but rather are published at-large to a specific topic and available to all interested consumers. Any and all interested recipients can subscribe to that topic and read the message. Since the message must be available to all consumers, the message is not deleted when it is read from the stream.&lt;/p&gt;

&lt;p&gt;Producers and brokers don’t need or want to know what will happen as a result of a message, or where that message will go. The producer just sends the message to the broker, the broker publishes it, and the producer and broker move on. Interested consumers receive the message and complete their processing. Because of this further decoupling, systems with event streaming can evolve easily as the project evolves.&lt;/p&gt;

&lt;p&gt;Consumers can be added and deleted and can change how and what they process, regardless of the overall system. The producer and the broker don’t need to know about these changes because the services are decoupled. This is often referred to as “dumb broker/smart consumer”—the broker (stream) is just a broker, and has no knowledge of routing. The consumers in stream processing are the smart components; they are aware of what messages to listen for.&lt;/p&gt;

&lt;p&gt;Also, consumers can retrieve multiple messages at the same time and since messages are not deleted, consumers can replay a series of messages going back in time. For example, a new consumer can go back and read older messages from before that consumer was deployed.&lt;/p&gt;
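&lt;p&gt;A toy version of these stream semantics makes the contrast with a queue clear: the log is append-only, reading never deletes, and each consumer keeps its own offset, so a consumer added later can replay the topic from the beginning:&lt;/p&gt;

```javascript
// A toy illustration of stream semantics (not a real Kafka client):
// messages appended to a topic are never deleted on read, and every
// subscriber tracks its own offset into the shared log.
class Topic {
  constructor() {
    this.log = []; // append-only record of every published message
  }
  publish(message) {
    this.log.push(message);
  }
  subscribe(fromOffset = 0) {
    let offset = fromOffset; // each subscriber owns its position
    return {
      poll: () => (offset < this.log.length ? this.log[offset++] : null),
    };
  }
}
```

&lt;p&gt;Because reading only advances a consumer's own offset, adding a new consumer never disturbs existing ones, which is exactly the decoupling described above.&lt;/p&gt;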

&lt;p&gt;Stream processing has become the go-to choice for many event-driven systems. It offers several advantages over message queues including multiple consumers, replay of events, and sliding window statistics. Overall, you gain a major increase in flexibility.&lt;/p&gt;

&lt;h1&gt;Should You Use Stream Processing or Message Queues?&lt;/h1&gt;

&lt;p&gt;Here are several use cases for each:&lt;/p&gt;

&lt;h3&gt;Message Queues&lt;/h3&gt;

&lt;p&gt;Message queues, such as &lt;a href="https://www.rabbitmq.com/" rel="noopener noreferrer"&gt;RabbitMQ&lt;/a&gt; and &lt;a href="https://activemq.apache.org/" rel="noopener noreferrer"&gt;ActiveMQ&lt;/a&gt;, are popular. Message queues are particularly helpful in systems where you have known or complex routing logic, or when you need to guarantee a single delivery of each message.&lt;/p&gt;

&lt;p&gt;A typical use case for message queues is a busy ecommerce website where your services must be highly available, your requests must be delivered, and your routing logic is known and unlikely to change. With these constraints, message queues give you the powerful advantages of asynchronous communication and decoupled services, while keeping your architecture simple.&lt;/p&gt;

&lt;p&gt;Additional use cases often involve system dependencies or constraints, such as a system having a frontend and backend written in different languages or a need to integrate into legacy infrastructure.&lt;/p&gt;

&lt;h3&gt;Stream Processing&lt;/h3&gt;

&lt;p&gt;Stream processing is useful for systems with more complex consumers of messages such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Website Activity Tracking&lt;/strong&gt;. Activity on a busy website creates a &lt;em&gt;lot&lt;/em&gt; of messages. Using streams, you can create a series of real-time feeds, which include page views, clicks, searches, and so on, and allow a wide range of consumers to monitor, report on, and process this data.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Log Aggregation&lt;/strong&gt;. Using streams, log files can be turned into a centralized stream of logging messages that are easy for consumers to process. You can also calculate sliding window statistics for metrics, such as an average every second or minute. This can greatly reduce output data volumes, making your infrastructure more efficient.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;IoT&lt;/strong&gt;. IoT devices also produce a &lt;em&gt;lot&lt;/em&gt; of messages. Streams can handle a large volume of messages, and publish them to a large number of consumers in a highly scalable and performant manner.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Event Sourcing&lt;/strong&gt;. As described in &lt;a href="https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7"&gt;a previous article&lt;/a&gt;, streams can be used to implement &lt;a href="https://martinfowler.com/eaaDev/EventSourcing.html" rel="noopener noreferrer"&gt;event sourcing&lt;/a&gt;, where updates and deletes are never performed directly on the data; rather, state changes of an entity are saved as a series of events.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Messaging&lt;/strong&gt;. Complex and highly-available messaging platforms such as Twitter and LinkedIn use streams (Kafka) to drive metrics, deliver messages to news feeds, and so on.&lt;/li&gt;
&lt;/ul&gt;
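&lt;p&gt;The sliding window statistics mentioned in the log aggregation bullet can be sketched in a few lines of plain JavaScript. This is a toy reducer of our own (not a Kafka Streams API): it collapses raw per-request latency events into one average per one-second window, shrinking the data volume sent downstream:&lt;/p&gt;

```javascript
// Toy reducer: bucket raw events by one-second window and emit averages.
// Input: events with a timestamp in ms (ts) and a numeric value.
function windowedAverages(events, windowMs = 1000) {
  const buckets = new Map();
  for (const { ts, value } of events) {
    // Align each event to the start of its window.
    const windowStart = Math.floor(ts / windowMs) * windowMs;
    const b = buckets.get(windowStart) || { sum: 0, count: 0 };
    b.sum += value;
    b.count += 1;
    buckets.set(windowStart, b);
  }
  return [...buckets].map(([windowStart, { sum, count }]) =>
    ({ windowStart, avg: sum / count }));
}

const events = [
  { ts: 1000, value: 10 }, { ts: 1500, value: 30 }, // window starting at 1000
  { ts: 2100, value: 50 },                          // window starting at 2000
];
console.log(windowedAverages(events));
// → [ { windowStart: 1000, avg: 20 }, { windowStart: 2000, avg: 50 } ]
```

&lt;p&gt;Three raw events become two summary points here; at production volumes the reduction is far more dramatic.&lt;/p&gt;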

&lt;h1&gt;
  
  
  A Reference Architecture Using Kafka
&lt;/h1&gt;

&lt;p&gt;In our previous article, we deployed a quick-to-stand-up message queue to learn about queues. Let’s do a similar example with stream processing.&lt;/p&gt;

&lt;p&gt;There are many options for stream processing architectures, including the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Apache Kafka&lt;/li&gt;
&lt;li&gt;  Apache Spark&lt;/li&gt;
&lt;li&gt;  Apache Beam/Google Cloud Data Flow&lt;/li&gt;
&lt;li&gt;  Spring Cloud Data Flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll use the &lt;a href="https://devcenter.heroku.com/articles/event-driven-microservices-with-apache-kafka" rel="noopener noreferrer"&gt;Apache Kafka reference architecture on Heroku&lt;/a&gt;. &lt;a href="http://heroku.com/" rel="noopener noreferrer"&gt;Heroku&lt;/a&gt; is a cloud platform as a service (PaaS) that offers &lt;a href="https://devcenter.heroku.com/categories/kafka" rel="noopener noreferrer"&gt;Kafka as an add-on&lt;/a&gt;. Their cloud platform makes it easy to deploy a streaming system rather than hosting or running your own. Since Heroku provides a &lt;a href="https://github.com/heroku-examples/edm-terraform" rel="noopener noreferrer"&gt;Terraform script&lt;/a&gt; that deploys all the needed code and configuration for you in one step, it's a quick and easy way to learn about stream processing.&lt;/p&gt;

&lt;p&gt;We won’t walk through the deployment steps here, as they are outlined in &lt;a href="https://devcenter.heroku.com/articles/event-driven-microservices-with-apache-kafka" rel="noopener noreferrer"&gt;detail on the reference architecture page.&lt;/a&gt; In short, it deploys an example eCommerce system that showcases the major components and advantages of stream processing. Clicks to browse or purchase products are recorded as events and sent to Kafka.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F955zdjsilchvpn47d2by.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F955zdjsilchvpn47d2by.png" alt="eCommerce example"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a key snippet of code from &lt;a href="https://github.com/trevorscott/edm-relay/blob/master/index.js" rel="noopener noreferrer"&gt;edm-relay&lt;/a&gt;, which sends messages to the Kafka stream. It's quite simple to publish events to Kafka since it's only a matter of calling the producer API to insert a JSON object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/produceClickMessage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_PREFIX&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
     &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`topic: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
     &lt;span class="nx"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;produce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="c1"&gt;// Message to send. Must be a buffer&lt;/span&gt;
       &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
       &lt;span class="c1"&gt;// for keyed messages, we also specify the key - note that this field is optional&lt;/span&gt;
       &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="c1"&gt;// you can send a timestamp here. If your broker version supports it,&lt;/span&gt;
       &lt;span class="c1"&gt;// it will get added. Otherwise, we default to 0&lt;/span&gt;
       &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
     &lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;A problem occurred when sending our message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
     &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;message&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Success!&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A real-time dashboard then consumes the stream of click events and displays analytics. This could be useful for business analytics to explore the most popular products, changing trends, and so on.&lt;/p&gt;
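&lt;p&gt;Under the hood, a dashboard consumer like this is just a running aggregation over the click events. A toy version of the “most popular products” computation might look like this (our own illustration, not code from the reference architecture):&lt;/p&gt;

```javascript
// Tally click events by product to feed a "top products" widget.
function topProducts(events, n = 3) {
  const counts = new Map();
  for (const e of events) {
    counts.set(e.product, (counts.get(e.product) || 0) + 1);
  }
  // Sort by count, descending, and keep the top n.
  return [...counts].sort((a, b) => b[1] - a[1]).slice(0, n);
}

const clicks = [
  { product: 'poster' }, { product: 'mug' }, { product: 'poster' },
];
console.log(topProducts(clicks)); // → [ [ 'poster', 2 ], [ 'mug', 1 ] ]
```

&lt;p&gt;In the real dashboard, this tally would be updated incrementally as each event arrives from the stream rather than recomputed from scratch.&lt;/p&gt;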

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxrtxw5cv8cyruahpzdmp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxrtxw5cv8cyruahpzdmp.png" alt="EDM Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the code from &lt;a href="https://github.com/trevorscott/edm-stream/blob/master/index.js" rel="noopener noreferrer"&gt;edm-stream&lt;/a&gt; that subscribes to the topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ready&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kafkaTopics&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  
   &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
   &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
     &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Error in Kafka consumer: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="p"&gt;});&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Kafka consumer ready.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
   &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;connectTimoutId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
 &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and then consumes the message from the stream by calling an event handler for each message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Offset: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`partition: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`consumerId: edm/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DYNO&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sockets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commitMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
 &lt;span class="p"&gt;})&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reference architecture is not limited to this example storefront; it's a starting point for any web app where you want to track clicks and report on them in a real-time dashboard. It's open source, so feel free to experiment and modify it according to your own needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F8sqjtry1xa934awu6lmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F8sqjtry1xa934awu6lmt.png" alt="kafka example implementation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stream processing decouples your components so that they are easy to build, test, deploy, and scale independently, and the “dumb” broker that sits between them adds yet another layer of decoupling.&lt;/p&gt;

&lt;h1&gt;
  
  
  Next Steps
&lt;/h1&gt;

&lt;p&gt;If you haven’t already, read our other articles in this series on the &lt;a href="https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7"&gt;advantages of event-driven architecture&lt;/a&gt; and &lt;a href="https://dev.to/heroku/scale-your-apps-with-an-easy-message-queue-on-redis-4glp"&gt;deploying a sample message queue using Redis and RSMQ&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Scale Your Apps with an Easy Message Queue on Redis</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Mon, 09 Dec 2019 15:57:11 +0000</pubDate>
      <link>https://dev.to/heroku/scale-your-apps-with-an-easy-message-queue-on-redis-4glp</link>
      <guid>https://dev.to/heroku/scale-your-apps-with-an-easy-message-queue-on-redis-4glp</guid>
      <description>&lt;p&gt;If you’re a microservices developer considering communication protocols, choosing an event-driven architecture might just help you rest a little easier at night. With the right design, event-driven architecture can help you to create apps that are decoupled and asynchronous, giving you the major benefits of your app being both performant and easily scalable. &lt;/p&gt;

&lt;p&gt;We’ll create and deploy a simple, quick-to-stand-up message queue using &lt;a href="http://heroku.com/" rel="noopener noreferrer"&gt;Heroku&lt;/a&gt;, &lt;a href="https://redis.io/" rel="noopener noreferrer"&gt;Redis&lt;/a&gt;, and &lt;a href="https://github.com/smrchy/rsmq" rel="noopener noreferrer"&gt;RSMQ&lt;/a&gt;. And we’ll look at how our system works, what it can do, and some of its advantages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Message Queues vs. Streams
&lt;/h2&gt;

&lt;p&gt;One of the first, and most important, decisions is whether to use message queues or streams. In message queues, a sender places a message targeted to a recipient into a queue. The message is held in the queue until the recipient retrieves it, at which time the message is deleted.&lt;/p&gt;

&lt;p&gt;Similarly, in streams, senders place messages into a stream and recipients listen for messages. However, messages in streams are not targeted to a certain recipient, but rather are available to any and all interested recipients. Recipients can even consume multiple messages at the same time, and can play back a series of messages through the stream’s history.&lt;/p&gt;
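&lt;p&gt;The contrast is easy to see in a toy in-memory model (our own illustration, not the RSMQ or Redis API): the queue deletes each message when it is retrieved, while the stream retains messages and lets every reader keep its own position:&lt;/p&gt;

```javascript
// Queue: receiving removes the message, so exactly one recipient sees it.
class Queue {
  constructor() { this.messages = []; }
  send(msg) { this.messages.push(msg); }
  receive() { return this.messages.shift(); } // deleted on retrieval
}

// Stream: messages stay put; each reader advances its own position.
class Stream {
  constructor() { this.log = []; }
  send(msg) { this.log.push(msg); }
  reader() { let i = 0; return () => this.log[i++]; }
}

const q = new Queue();
q.send('job-1');
console.log(q.receive()); // 'job-1'
console.log(q.receive()); // undefined — gone after one read

const s = new Stream();
s.send('event-1');
const a = s.reader(), b = s.reader();
console.log(a(), b()); // both readers see 'event-1'
```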

&lt;p&gt;If these are new concepts for you, learn more in our previous article on &lt;a href="https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7"&gt;best practices for event-driven architectures&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Message Queues Are Helpful
&lt;/h2&gt;

&lt;p&gt;Message queues can be thought of as the original event-driven architecture. They drove the adoption of early event-driven designs and are still in use today. In these message queue designs, a client (or other component) traditionally creates a message when some action happens, then sends that message to a queue, targeted to a specific recipient. The recipient, which has been sitting idle waiting for work, receives (or retrieves) the message from the queue, processes it, and does some unit of work. When the recipient is done with its work, it deletes the message from the queue.&lt;/p&gt;

&lt;p&gt;This traditional path is exactly what our example below will do. It’s a simple setup, but by placing a queue between the producer and consumer of the event, we introduce a level of decoupling that allows us to build, deploy, update, test, and scale those two components independently. This decoupling not only makes coding and DevOps easier (since our components can remain ignorant of one another), but also makes our app much easier to scale up and down. We also reduce the workload on the web dynos, which lets us respond back to clients faster, and allows our web dynos to process more requests per second. This isn't just good for the business; it's great for user experience as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Example App
&lt;/h2&gt;

&lt;p&gt;Let's create a simple example app to demonstrate how a message queue works. We’ll create a system where users can submit a generic application through a website. This is a simple project you can use just to learn, as a real-world use case, or as a starting point for a more complicated project. We’re going to set up and deploy our simple yet powerful message queue using Heroku, Redis, Node.js, and RSMQ. This is a great stack that can get us to an event-driven architecture quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Heroku, Redis, and RSMQ—A Great Combination for Event-Driven
&lt;/h3&gt;

&lt;p&gt;&lt;a href="http://heroku.com/" rel="noopener noreferrer"&gt;Heroku&lt;/a&gt;, with its one-click deployments and “behind-the-scenes” scaling, and &lt;a href="https://redis.io/" rel="noopener noreferrer"&gt;Redis&lt;/a&gt;, an in-memory data store and message broker, are an excellent pair for quickly deploying systems that allow us to focus on business logic, not infrastructure. We can quickly and easily provision a Redis deployment (dyno) on Heroku that will scale as needed, and hides the implementation details we don’t want to worry about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/smrchy/rsmq" rel="noopener noreferrer"&gt;RSMQ&lt;/a&gt; is an open-source simple message queue built on top of Redis that is easy to deploy. RSMQ has several nice features: it’s lightweight (just 500 lines of javascript), it’s fast (10,000+ messages per second), and it guarantees delivery of a message to just one recipient.&lt;/p&gt;

&lt;p&gt;We’ll also follow the “&lt;a href="https://devcenter.heroku.com/articles/background-jobs-queueing" rel="noopener noreferrer"&gt;Worker Dynos, Background Jobs, and Queuing&lt;/a&gt;” pattern, which is recommended by Heroku and will give us our desired decoupling and scalability. Using this pattern, we’ll deploy a web client (the browser in the below diagram) that handles the user input and sends requests to the backend, a server (web process) that runs the queue, and a set of workers (background service) that pull messages from the queue and do the actual work. We’ll deploy the client/server as a web dyno, and the worker as a worker dyno.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbo06mzk11u5d65ggh6on.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbo06mzk11u5d65ggh6on.png" alt="Worker Dynos, Background Jobs, and Queueing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Let’s Get Started
&lt;/h3&gt;

&lt;p&gt;Once you’ve created your Heroku account and installed the Heroku CLI, you can create and deploy the project easily using the CLI. All of the source code needed to run this example &lt;a href="https://github.com/CapnMB/example-message-queue" rel="noopener noreferrer"&gt;is available on GitHub&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone https://github.com/devspotlight/example-message-queue.git  
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;example-message-queue  
&lt;span class="nv"&gt;$ &lt;/span&gt;heroku create  
&lt;span class="nv"&gt;$ &lt;/span&gt;heroku addons:create heroku-redis  
&lt;span class="nv"&gt;$ &lt;/span&gt;git push heroku master  
&lt;span class="nv"&gt;$ &lt;/span&gt;heroku ps:scale &lt;span class="nv"&gt;worker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1  
&lt;span class="nv"&gt;$ &lt;/span&gt;heroku open
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need help with this step, here are a few good resources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://devcenter.heroku.com/articles/getting-started-with-nodejs" rel="noopener noreferrer"&gt;Getting Started on Heroku with node.js&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://elements.heroku.com/addons/heroku-redis" rel="noopener noreferrer"&gt;Using Redis with Heroku&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  System Overview
&lt;/h3&gt;

&lt;p&gt;Our system is made up of three pieces: the client web app, the server, and the worker. Because we are so cleanly decoupled, both the server and worker processes are easy to scale up and down as the need arises.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Client
&lt;/h3&gt;

&lt;p&gt;Our client web app is deployed as part of our web dyno. The UI isn’t really the focus of this article, so we’ve built just a simple page with one link. Clicking the link posts a generic message to the server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F7lxrv65quqegw8fq2pts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F7lxrv65quqegw8fq2pts.png" alt="Test queue"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our Simple Web UI&lt;/p&gt;

&lt;h3&gt;
  
  
  The Web Server
&lt;/h3&gt;

&lt;p&gt;The web server is a simple Express server that delivers the web client. It also creates the queue on startup (if the queue doesn’t already exist), receives new messages from the client, and adds new messages to the queue.&lt;/p&gt;

&lt;p&gt;Here is the key piece of code that configures the variables for the queue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;rsmq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RedisSMQ&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NAMESPACE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;REDIS_PASSWORD&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and sets up the queue the first time the server runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;rsmq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createQueue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;QUEUENAME&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;queueExists&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The queue exists. That's OK.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;queue created&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a client posts a message, the server adds it to the message queue like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/job&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sending message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="nx"&gt;rsmq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;QUEUENAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Hello World at &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
   &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;});&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pushed new message into queue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Worker
&lt;/h3&gt;

&lt;p&gt;The worker, which fittingly is deployed as a worker dyno, polls the queue for new messages, then pulls those new messages from the queue and processes them.&lt;/p&gt;

&lt;p&gt;We’ve chosen the simplest option here: the code reads the message, processes it, then manually deletes it from the queue. Note that RSMQ offers more powerful options, such as “pop”, which reads and deletes a message from the queue in one step, and a “real-time” mode for pub/sub capabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;rsmq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receiveMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;QUEUENAME&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hey I got the message you sent me!&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="c1"&gt;// do lots of processing here&lt;/span&gt;
      &lt;span class="c1"&gt;// when we are done we can delete the message from the queue&lt;/span&gt;
      &lt;span class="nx"&gt;rsmq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;qname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;QUEUENAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
         &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deleted message with id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no message in queue&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
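&lt;p&gt;In a long-running worker you would typically wrap this in a polling loop. Here’s a minimal, Redis-free sketch of the same receive/process/delete cycle, using a hypothetical in-memory queue stub in place of RSMQ (names like &lt;code&gt;pollOnce&lt;/code&gt; are ours, for illustration only):&lt;/p&gt;

```javascript
// Hypothetical in-memory stand-in for RSMQ, illustrating the same
// receive -> process -> delete cycle the worker performs above.
const queue = [];
let nextId = 1;

function sendMessage(message) {
  queue.push({ id: String(nextId++), message });
}

function receiveMessage() {
  // Like RSMQ's receiveMessage: peek at the oldest message without removing it
  return queue.length > 0 ? queue[0] : {};
}

function deleteMessage(id) {
  const index = queue.findIndex((m) => m.id === id);
  if (index !== -1) queue.splice(index, 1);
}

function pollOnce() {
  const resp = receiveMessage();
  if (!resp.id) return null;  // nothing to do this tick
  // ... do lots of processing here ...
  deleteMessage(resp.id);     // only delete after processing succeeds
  return resp.message;
}

// A worker dyno would call pollOnce on a timer, e.g.:
// setInterval(pollOnce, 1000);
sendMessage("Hello World");
console.log(pollOnce()); // -> "Hello World"
console.log(pollOnce()); // -> null (queue is empty)
```

&lt;p&gt;In the real worker, the stub calls are replaced by RSMQ’s callback-based &lt;code&gt;receiveMessage&lt;/code&gt; and &lt;code&gt;deleteMessage&lt;/code&gt;, as shown above.&lt;/p&gt;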



&lt;p&gt;We could easily fire up multiple workers by using Throng, if needed. &lt;a href="https://devcenter.heroku.com/articles/node-redis-workers" rel="noopener noreferrer"&gt;Here’s a good example of a setup similar&lt;/a&gt; to ours that uses this library.&lt;/p&gt;

&lt;p&gt;Note: When you deploy the worker dyno, be sure to scale the worker processes to at least one dyno under the “Resources” tab in the Heroku Dashboard (if you haven’t already done so from the CLI) so that your workers will run.&lt;/p&gt;
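&lt;p&gt;From the CLI, that looks like this (assuming your Procfile defines a &lt;code&gt;worker&lt;/code&gt; process type):&lt;/p&gt;

```shell
# Scale the worker process type to one dyno so the worker actually runs
heroku ps:scale worker=1
```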

&lt;h2&gt;
  
  
  Running the Example
&lt;/h2&gt;

&lt;p&gt;When we deploy and start our dynos, we see our server firing up, our queue being deployed, and our worker checking for new messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fus4wrsyjmayzimjes4p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fus4wrsyjmayzimjes4p8.png" alt="Worker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And when we click the link on the client, we can see the server push the message onto the queue, and then the worker grab the message, process it, and delete it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fp7j5eu0jgb6983j7glog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fp7j5eu0jgb6983j7glog.png" alt="Client"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With our example, we’ve built a message queue that is quick to stand up yet powerful. It separates our components so that they are unaware of one another and are easy to build, test, deploy, and scale independently. This is a great start to a solid, event-driven architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;If you haven’t already, &lt;a href="https://github.com/devspotlight/example-message-queue" rel="noopener noreferrer"&gt;check out the code on GitHub&lt;/a&gt; and try it out yourself. &lt;/p&gt;

&lt;p&gt;Heroku also offers a great &lt;a href="https://devcenter.heroku.com/articles/event-driven-microservices-with-apache-kafka" rel="noopener noreferrer"&gt;event-driven reference architecture&lt;/a&gt;. You can get a running system in a single click, so it's another easy way to experiment and learn.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>node</category>
      <category>heroku</category>
    </item>
    <item>
      <title>Postgres Is Underrated—It Handles More than You Think</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Wed, 09 Oct 2019 15:04:20 +0000</pubDate>
      <link>https://dev.to/heroku/postgres-is-underrated-it-handles-more-than-you-think-4ff3</link>
      <guid>https://dev.to/heroku/postgres-is-underrated-it-handles-more-than-you-think-4ff3</guid>
      <description>&lt;p&gt;Thinking about scaling beyond your Postgres cluster and adding another data store like Redis or Elasticsearch? Before adopting a more complex infrastructure, take a minute and think again. It’s quite possible to get more out of an existing Postgres database. It can scale for heavy loads and offers powerful features which are not obvious at first sight. For example, its possible to enable in-memory caching, text search, specialized indexing, and key-value storage.&lt;/p&gt;

&lt;p&gt;After reading this article, you may want to list the features you need from your data store and check whether Postgres is a good fit for them. It’s powerful enough for most applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Adding Another Data Store is Not Always a Good Idea
&lt;/h2&gt;

&lt;p&gt;As Fred Brooks put it in &lt;em&gt;The Mythical Man-Month&lt;/em&gt;: "The programmer, like the poet, works only slightly removed from pure thought-stuff. [They] build castles in the air, from air, creating by exertion of the imagination."&lt;/p&gt;

&lt;p&gt;Adding more pieces to those castles, and getting lost in the design, is endlessly fascinating; however, in the real world, building more castles in the air can get in your way. The same holds true for the latest hype in data stores. There are several &lt;a href="http://boringtechnology.club/"&gt;advantages to choosing boring technology&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  If someone new joins your team, can they easily make sense of your different data stores?&lt;/li&gt;
&lt;li&gt;  When you or another team member come back a year later, could they quickly pick up how the system works?&lt;/li&gt;
&lt;li&gt;  If you need to change your system or add features, how many pieces do you have to move around?&lt;/li&gt;
&lt;li&gt;  Have you factored in maintenance costs, security, and upgrades?&lt;/li&gt;
&lt;li&gt;  Have you accounted for the unknowns and failure modes when running your new data store in production at scale?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although the complexity can be managed with thoughtful design, adding multiple data stores does increase it. Before reaching for another data store, it's worth investigating what additional features your existing ones can offer. &lt;/p&gt;

&lt;h2&gt;
  
  
  Lesser-known but Powerful Features of Postgres
&lt;/h2&gt;

&lt;p&gt;Many people are unaware that Postgres offers way more than just a SQL database. If you already have Postgres in your stack, why add more pieces when Postgres can do the job?&lt;/p&gt;

&lt;h3&gt;
  
  
  Postgres caches, too
&lt;/h3&gt;

&lt;p&gt;There’s a misconception that Postgres reads and writes from disk on every query, especially when users compare it with purely in-memory data stores like Redis.&lt;/p&gt;

&lt;p&gt;Actually, Postgres has a beautifully designed caching system with pages, usage counts, and transaction logs. Most of your queries will not need to access the disk, especially if they refer to the same data over and over again, as many queries tend to do.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;shared_buffers&lt;/strong&gt; configuration parameter in the Postgres configuration file determines how much memory Postgres uses for caching data. Typically it should be set to 25% to 40% of total memory, because Postgres also makes use of the operating system cache. With more memory, most recurring queries referring to the same data set will not need to access the disk. Here is how you can set this parameter from the Postgres CLI (note that changing &lt;strong&gt;shared_buffers&lt;/strong&gt; requires a server restart to take effect):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;shared_buffer&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Managed database services like Heroku offer several &lt;a href="https://www.heroku.com/postgres"&gt;plans&lt;/a&gt; where RAM (and hence cache) is a major differentiator. The free hobby version does not offer dedicated resources like RAM. Upgrade when you’re ready for production loads so you can make better use of caching.&lt;/p&gt;

&lt;p&gt;You can also use some of the more advanced caching tools. For example, check the &lt;a href="https://www.postgresql.org/docs/current/pgbuffercache.html"&gt;pg_buffercache&lt;/a&gt; view to see what’s occupying the shared buffer cache of your instance. Another tool is the &lt;a href="https://www.postgresql.org/docs/current/pgprewarm.html"&gt;pg_prewarm&lt;/a&gt; function, which ships with the standard Postgres distribution. It enables DBAs to load table data into either the operating system cache or the Postgres buffer cache, either manually or automatically. If you know the nature of your database queries, this can greatly improve application performance.&lt;/p&gt;
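&lt;p&gt;As a sketch of what this looks like in practice (the table name &lt;code&gt;users&lt;/code&gt; is illustrative):&lt;/p&gt;

```sql
-- Install the contrib extensions, then warm the cache for one table
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('users');

-- Inspect which relations occupy the most shared-buffer pages
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```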

&lt;p&gt;For the really brave at heart, &lt;a href="https://madusudanan.com/blog/understanding-postgres-caching-in-depth/"&gt;refer to this article&lt;/a&gt; for an in-depth description of Postgres caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Text searching
&lt;/h3&gt;

&lt;p&gt;Elasticsearch is excellent, but many use cases can get along just fine with Postgres for text searching. Postgres has a special data type, &lt;code&gt;&lt;a href="https://www.postgresql.org/docs/10/datatype-textsearch.html#DATATYPE-TSVECTOR"&gt;tsvector&lt;/a&gt;&lt;/code&gt;, and a set of functions, like &lt;code&gt;to_tsvector&lt;/code&gt; and &lt;code&gt;to_tsquery&lt;/code&gt;, to search quickly through text. &lt;code&gt;tsvector&lt;/code&gt; represents a document optimized for text search by sorting terms and normalizing variants. Here is an example of the &lt;code&gt;to_tsquery&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'The &amp;amp; Boys &amp;amp; Girls'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="n"&gt;to_tsquery&lt;/span&gt;   
&lt;span class="c1"&gt;---------------&lt;/span&gt;
 &lt;span class="s1"&gt;'boy'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="s1"&gt;'girl'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can sort your results by relevance depending on how often and which fields your query appeared in the results. For example, you can make the title more relevant than the body. Check the Postgres &lt;a href="https://www.postgresql.org/docs/11/textsearch-controls.html"&gt;documentation&lt;/a&gt; for details. &lt;/p&gt;
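&lt;p&gt;For instance, a hypothetical &lt;code&gt;articles&lt;/code&gt; table could weight title matches above body matches using &lt;code&gt;setweight&lt;/code&gt;, then rank results with &lt;code&gt;ts_rank&lt;/code&gt;:&lt;/p&gt;

```sql
SELECT title,
       ts_rank(
         setweight(to_tsvector('english', title), 'A') ||
         setweight(to_tsvector('english', body), 'B'),
         to_tsquery('english', 'postgres & cache')
       ) AS rank
FROM articles
ORDER BY rank DESC
LIMIT 10;
```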

&lt;h3&gt;
  
  
  Functions in Postgres
&lt;/h3&gt;

&lt;p&gt;Postgres provides a powerful server-side function environment in multiple programming languages.&lt;/p&gt;

&lt;p&gt;Try to pre-process as much data as you can on the Postgres server with server-side functions.  That way, you can cut down on the latency that comes from passing too much data back and forth between your application servers and your database. This approach is particularly useful for large aggregations and joins.&lt;/p&gt;

&lt;p&gt;What’s even better is your development team can use its existing skill set for writing Postgres code. Other than the default PL/pgSQL (Postgres’ native procedural language), Postgres functions and triggers can be written in PL/Python, PL/Perl, PL/V8 (JavaScript extension for Postgres) and PL/R.&lt;/p&gt;

&lt;p&gt;Here is an example of creating a PL/Python function for checking string lengths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;longer_string_length&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string1&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;string2&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
  &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpythonu&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Postgres offers powerful extensions
&lt;/h2&gt;

&lt;p&gt;Extensions are to Postgres what plug-ins are to many other applications. Suitable use of Postgres extensions can mean you don’t have to work with other data stores for extra functionality. There are many extensions available and listed on the main &lt;a href="https://www.postgresql.org/docs/current/contrib.html"&gt;Postgres website&lt;/a&gt;. &lt;/p&gt;

&lt;h4&gt;
  
  
  Geospatial Data
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://postgis.net/"&gt;PostGIS&lt;/a&gt; is a specialized extension for Postgres used for geospatial data manipulation and running location queries in SQL. It’s widely popular among GIS application developers who use Postgres. A great beginner’s guide to using PostGIS can be found &lt;a href="https://medium.com/@tjukanov/why-should-you-care-about-postgis-a-gentle-introduction-to-spatial-databases-9eccd26bc42b"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code snippet below shows how to add the PostGIS extension to the current database. From the OS, we run these commands to install the package (assuming you are using Ubuntu):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;add-apt-repository ppa:ubuntugis/ppa
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;postgis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, log in to your Postgres instance and install the extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;postgis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;postgis_topology&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to check what extensions you have in the current database, run this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_available_extensions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Key-Value Data Type
&lt;/h4&gt;

&lt;p&gt;The Postgres &lt;a href="https://www.postgresql.org/docs/current/hstore.html"&gt;hstore&lt;/a&gt; extension allows storing and searching simple key-value pairs. This &lt;a href="https://www.ibm.com/cloud/blog/new-builders/an-introduction-to-postgresqls-hstore"&gt;tutorial&lt;/a&gt; provides an excellent overview of how to work with the hstore data type.&lt;/p&gt;
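&lt;p&gt;A minimal sketch (the &lt;code&gt;products&lt;/code&gt; table is hypothetical):&lt;/p&gt;

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE products (id serial PRIMARY KEY, attrs hstore);

INSERT INTO products (attrs)
VALUES ('color => blue, size => M');

-- -> extracts a value by key; ? tests whether a key exists
SELECT attrs -> 'color' FROM products WHERE attrs ? 'size';
```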

&lt;h4&gt;
  
  
  Semi-structured Data Types
&lt;/h4&gt;

&lt;p&gt;There are two native formats for storing semi-structured data in Postgres: &lt;a href="https://www.postgresql.org/docs/current/datatype-json.html"&gt;JSON&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/datatype-xml.html"&gt;XML&lt;/a&gt;. JSON data can be stored either as plain text (&lt;code&gt;json&lt;/code&gt;) or in a decomposed binary form (&lt;code&gt;jsonb&lt;/code&gt;); the latter can significantly improve query performance when the data is searched. As you can see below, Postgres can cast JSON strings to native JSON values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{"product1": ["blue", "green"], "tags": {"price": 10, "discounted": false}}'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;json&lt;/span&gt;                       
&lt;span class="c1"&gt;---------------------------------------------------------------------&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"product1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;"blue"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"green"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nv"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"discounted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
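&lt;p&gt;For searching, &lt;code&gt;jsonb&lt;/code&gt; pairs well with a GIN index and the containment operator (the &lt;code&gt;events&lt;/code&gt; table below is hypothetical):&lt;/p&gt;

```sql
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
CREATE INDEX events_payload_idx ON events USING GIN (payload);

INSERT INTO events (payload)
VALUES ('{"product1": ["blue", "green"], "tags": {"price": 10, "discounted": false}}');

-- @> tests containment; the GIN index makes this fast
SELECT payload -> 'product1'
FROM events
WHERE payload @> '{"tags": {"discounted": false}}';
```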



&lt;h2&gt;
  
  
  Tips for Scaling Postgres
&lt;/h2&gt;

&lt;p&gt;If you’re considering switching off Postgres for performance reasons, first see how far you can get with the optimizations it offers. Here we'll assume you've done the basics, like creating appropriate indexes. Postgres offers many advanced features, and while these optimizations are often small changes, they can make a big difference, especially if they keep you from complicating your infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Don’t over-index
&lt;/h3&gt;

&lt;p&gt;Avoid unnecessary indexes, and use multi-column indexes sparingly. Too many indexes take up extra memory that crowds out better uses of the Postgres cache, which is crucial for performance.&lt;/p&gt;

&lt;p&gt;Using a tool like &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; might surprise you by how often the query planner actually chooses sequential table scans. Since much of your table’s row data is already cached, oftentimes these elaborate indexes aren’t even used.&lt;/p&gt;

&lt;p&gt;That said, if you do find slow queries, the first and most obvious solution is to see if the table is missing an index. Indexes are vital, but you have to use them correctly.&lt;/p&gt;
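&lt;p&gt;Checking is cheap. For a hypothetical &lt;code&gt;users&lt;/code&gt; table, prefix the query with &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; and look at whether the plan uses an index scan or a sequential scan:&lt;/p&gt;

```sql
EXPLAIN ANALYZE
SELECT * FROM users
WHERE signup_date > now() - interval '7 days';
```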

&lt;h3&gt;
  
  
  Partial indexes save space
&lt;/h3&gt;

&lt;p&gt;A partial index can save space by specifying which values get indexed. For example, suppose you want to order by a user’s signup date, but only care about the users who have completed signup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;user_signup_date&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signup_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;is_signed_up&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Understanding Postgres index types
&lt;/h3&gt;

&lt;p&gt;Choosing the right index for your data can improve performance. Here are some common index types and when you should use each one. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://www.postgresql.org/docs/current/btree-intro.html"&gt;B-tree indexes&lt;/a&gt; 
B-tree indexes are balanced trees that are used to sort data efficiently. They’re the default if you use the &lt;code&gt;CREATE INDEX&lt;/code&gt; command. Most of the time, a B-tree index suffices. As you scale, index inconsistencies can become a larger problem, so run the &lt;a href="https://www.postgresql.org/docs/11/amcheck.html"&gt;amcheck&lt;/a&gt; extension periodically. &lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.postgresql.org/docs/11/brin-intro.html"&gt;BRIN indexes&lt;/a&gt; 
A Block Range INdex (BRIN) can be used when your table is naturally already sorted by a column, and you need to sort by that column. For example, for a log table that was written sequentially, setting a BRIN index on the timestamp column lets the server know that the data is already sorted. &lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.postgresql.org/docs/11/bloom.html"&gt;Bloom filter index&lt;/a&gt; 
A bloom index is perfect for multi-column queries on big tables where you only need to test for equality. It uses a special mathematical structure called a bloom filter that’s based on probability and uses significantly less space.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;bloom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;col1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;col2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;col3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'x'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://www.postgresql.org/docs/11/textsearch-indexes.html"&gt;GIN and GiST indexes&lt;/a&gt; \
Use a GIN or GiST index for efficient indexes based on composite values like text, arrays, and JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Do You Need Another Data Store?
&lt;/h2&gt;

&lt;p&gt;There are legitimate cases for adding another datastore beyond Postgres.&lt;/p&gt;

&lt;h3&gt;
  
  
  Special data types
&lt;/h3&gt;

&lt;p&gt;Some data stores give you data types that you just can’t get in Postgres. For example, linked lists, bitmaps, and the HyperLogLog functions in Redis are not available in Postgres.&lt;/p&gt;

&lt;p&gt;At a previous startup, we had to implement a frequency cap, which is a counter for unique users on a website based on session data (like cookies). There might be millions or tens of millions of users visiting a website. Frequency capping means you only show each user your ad once per day. &lt;/p&gt;

&lt;p&gt;Redis has a &lt;a href="https://redis.io/commands/pfcount"&gt;HyperLogLog data type&lt;/a&gt; that is perfect for a frequency cap. It approximates the number of distinct elements in a set with a very small error rate, in exchange for O(1) time and a very small memory footprint. &lt;code&gt;PFADD&lt;/code&gt; adds an element to a HyperLogLog. It returns 1 if the element was likely not counted already, and 0 if it likely was.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;PFADD&lt;/span&gt; &lt;span class="n"&gt;user_ids&lt;/span&gt; &lt;span class="n"&gt;uid1&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;PFADD&lt;/span&gt; &lt;span class="n"&gt;user_ids&lt;/span&gt; &lt;span class="n"&gt;uid2&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;PFADD&lt;/span&gt; &lt;span class="n"&gt;user_ids&lt;/span&gt; &lt;span class="n"&gt;uid1&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
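&lt;p&gt;To make the frequency-cap logic concrete, here is a minimal, self-contained Python sketch. In production you would issue &lt;code&gt;PFADD&lt;/code&gt; through a Redis client; the stand-in class below only mimics its return value (using an exact set rather than a probabilistic sketch), so the capping logic can run without a Redis server.&lt;/p&gt;

```python
class FakeHyperLogLog:
    """Stand-in for a Redis HyperLogLog. Like PFADD, pfadd() returns 1 for a
    new element and 0 for one seen before. Redis does this probabilistically
    in a few kilobytes of fixed memory; this exact set is for demo only."""

    def __init__(self):
        self._seen = set()

    def pfadd(self, element):
        if element in self._seen:
            return 0
        self._seen.add(element)
        return 1


def should_show_ad(hll, user_id):
    # PFADD returns 1 the first time a user shows up today, so we
    # show the ad at most once per user per day.
    return hll.pfadd(user_id) == 1


daily_uniques = FakeHyperLogLog()  # in Redis: one key per day
print(should_show_ad(daily_uniques, "uid1"))  # True: first visit, show the ad
print(should_show_ad(daily_uniques, "uid2"))  # True
print(should_show_ad(daily_uniques, "uid1"))  # False: already capped
```

&lt;p&gt;With the redis-py client, the same check would be a call such as &lt;code&gt;r.pfadd("user_ids:2019-12-17", user_id)&lt;/code&gt; against a per-day key, with Redis keeping the memory footprint constant no matter how many users visit.&lt;/p&gt;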



&lt;h3&gt;
  
  
  Heavy real-time processing
&lt;/h3&gt;

&lt;p&gt;If you’re in a situation with many pub-sub events, jobs, and dozens of workers to coordinate, you may need a more specialized solution like Apache Kafka. LinkedIn engineers originally developed Kafka to handle high-volume user activity events such as clicks, invitations, and messages, and to let different workers consume those streams for message passing and data-processing jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instant full-text searching
&lt;/h3&gt;

&lt;p&gt;If you have a real-time application under heavy load with more than ten concurrent searches, and you need features like autocomplete, then you may benefit more from a specialized text-search solution like Elasticsearch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Redis, Elasticsearch, and Kafka are powerful, but sometimes adding them does more harm than good. You may be able to get the capabilities you need with Postgres by taking advantage of the lesser-known features we’ve covered here. Ensuring that you are getting the most out of Postgres can save you time and help you avoid added complexity and risks.  &lt;/p&gt;

&lt;p&gt;To save even more time and headaches, consider using a managed service like &lt;a href="https://www.heroku.com/postgres"&gt;Heroku Postgres&lt;/a&gt;. Scaling up is a simple matter of adding additional follower replicas, high availability can be turned on with a single click, and Heroku operates it for you. If you really need to expand beyond Postgres, the other data stores that we mentioned above, such as Redis, Apache Kafka and Elasticsearch, can all be easily provisioned on Heroku. Go ahead and build your castles in the air―but anchor them to a reliable foundation, so you can dream about a better product and customer experience.&lt;/p&gt;

&lt;p&gt;For more information on Postgres, listen to &lt;a href="https://softwareengineeringdaily.com/2019/05/06/cloud-database-workloads-with-jon-daniel/"&gt;Cloud Database Workloads with Jon Daniel&lt;/a&gt; on Software Engineering Daily.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>webdev</category>
      <category>devops</category>
      <category>database</category>
    </item>
    <item>
      <title>Best Practices for Event-Driven Microservice Architecture</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Tue, 24 Sep 2019 15:08:11 +0000</pubDate>
      <link>https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7</link>
      <guid>https://dev.to/heroku/best-practices-for-event-driven-microservice-architecture-2lh7</guid>
      <description>&lt;p&gt;If you’re an enterprise architect, you’ve probably heard of and worked with a microservices architecture. And while you might have used REST as your service communications layer in the past, more and more projects are moving to an event-driven architecture. Let’s dive into the pros and cons of this popular architecture, some of the key design choices it entails, and common anti-patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Event-Driven Microservice Architecture?
&lt;/h2&gt;

&lt;p&gt;In event-driven architecture, when a service performs some piece of work that other services might be interested in, that service produces an event—a record of the performed action. Other services consume those events so that they can perform any of their own tasks needed as a result of the event. Unlike with REST, services that create requests do not need to know the details of the services consuming the requests.&lt;/p&gt;

&lt;p&gt;Here’s a simple example: When an order is placed on an ecommerce site, a single “order placed” event is produced and then consumed by several microservices:  &lt;/p&gt;

&lt;p&gt;1) the order service which could write an order record to the database&lt;br&gt;
2) the customer service which could create the customer record, and&lt;br&gt;
3) the payment service which could process the payment.&lt;/p&gt;

&lt;p&gt;Events can be published in a variety of ways. For example, they can be published to a queue that guarantees delivery of the event to the appropriate consumers, or they can be published to a “pub/sub” model stream that publishes the event and allows access to all interested parties. In either case, the producer publishes the event, and the consumer receives that event, reacting accordingly. Note that in some cases, these two actors can also be called the publisher (the producer) and the subscriber (the consumer).&lt;/p&gt;
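&lt;p&gt;The “order placed” example can be sketched with a tiny in-memory event bus, a stand-in for a real broker such as Kafka or RabbitMQ: the producer publishes one event by topic name and never learns which services consume it.&lt;/p&gt;

```python
from collections import defaultdict


class EventBus:
    """Toy pub/sub bus: producers publish to a topic by name and every
    subscribed handler receives the event. A real system would use a
    broker; the decoupling shape is the same."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
actions = []

# Each service reacts independently; the producer knows none of them.
bus.subscribe("order_placed", lambda e: actions.append(("order_service", e["order_id"])))
bus.subscribe("order_placed", lambda e: actions.append(("customer_service", e["customer_id"])))
bus.subscribe("order_placed", lambda e: actions.append(("payment_service", e["amount"])))

bus.publish("order_placed", {"order_id": 42, "customer_id": 7, "amount": 19.99})
print(actions)
# [('order_service', 42), ('customer_service', 7), ('payment_service', 19.99)]
```

&lt;p&gt;Adding a fourth consumer (say, an email service) requires only another &lt;code&gt;subscribe&lt;/code&gt; call; the producer’s code is untouched, which is the loose coupling discussed below.&lt;/p&gt;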

&lt;h2&gt;
  
  
  Why Use Event-Driven Architecture
&lt;/h2&gt;

&lt;p&gt;An event-driven architecture offers several advantages over REST, which include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Asynchronous – event-based architectures are asynchronous and non-blocking. This allows resources to move on to the next task once their unit of work is complete, without worrying about what happened before or will happen next. Events can also be queued or buffered, which prevents consumers from putting back pressure on producers or blocking them.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Loose Coupling – services don’t need (and shouldn’t have) knowledge of, or dependencies on other services. When using events, services operate independently, without knowledge of other services, including their implementation details and transport protocol. Services under an event model can be updated, tested, and deployed independently and more easily.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy Scaling – Since the services are decoupled under an event-driven architecture, and as services typically perform only one task, tracking down bottlenecks to a specific service, and scaling that service (and only that service) becomes easy.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recovery support – An event-driven architecture with a queue can recover lost work by “replaying” events from the past. This can be valuable to prevent data loss when a consumer needs to recover.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course, event-driven architectures have drawbacks as well. They are easy to over-engineer by separating concerns that might be simpler when closely coupled; can require a significant upfront investment; and often result in additional complexity in infrastructure, service contracts or schemas, polyglot build systems, and dependency graphs.&lt;/p&gt;

&lt;p&gt;Perhaps the most significant drawback and challenge is data and transaction management. Because of their asynchronous nature, event-driven models must carefully handle inconsistent data between services and incompatible schema versions, and must watch for duplicate events. They typically do not support ACID transactions, instead offering &lt;a href="https://en.wikipedia.org/wiki/Eventual_consistency" rel="noopener noreferrer"&gt;eventual consistency&lt;/a&gt;, which can be more difficult to track and debug.&lt;/p&gt;

&lt;p&gt;Even with these drawbacks, an event-driven architecture is usually the better choice for enterprise-level microservice systems. The pros—scalable, loosely coupled, dev-ops friendly design—outweigh the cons.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use REST
&lt;/h2&gt;

&lt;p&gt;There are, however, times when a REST/web interface may still be preferable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  You need a synchronous request/reply interface&lt;/li&gt;
&lt;li&gt;  You need convenient support for strong transactions&lt;/li&gt;
&lt;li&gt;  Your API is available to the public&lt;/li&gt;
&lt;li&gt;  Your project is small (REST is much simpler to set up and deploy)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Your Most Important Design Choice – Messaging Framework
&lt;/h2&gt;

&lt;p&gt;Once you’ve decided on an event-driven architecture, it is time to choose your event framework. The way your events are produced and consumed is a key factor in your system. Dozens of proven frameworks and choices exist and choosing the right one takes time and research.&lt;/p&gt;

&lt;p&gt;Your basic choice comes down to message processing or stream processing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Message Processing
&lt;/h3&gt;

&lt;p&gt;In traditional message processing, a component creates a message then sends it to a specific (and typically single) destination. The receiving component, which has been sitting idle and waiting, receives the message and acts accordingly. Typically, when the message arrives, the receiving component performs a single process. Then, the message is deleted.&lt;/p&gt;

&lt;p&gt;A typical example of a message processing architecture is a Message Queue. Though most newer projects use stream processing (as described below), architectures using message (or event) queues are still popular. Message queues typically use a “store and forward” system of brokers where events travel from broker to broker until they reach the appropriate consumer. &lt;a href="https://activemq.apache.org/" rel="noopener noreferrer"&gt;ActiveMQ&lt;/a&gt; and &lt;a href="https://www.rabbitmq.com/" rel="noopener noreferrer"&gt;RabbitMQ&lt;/a&gt; are two popular examples of message queue frameworks. Both of these projects have years of proven use and established communities.&lt;/p&gt;
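&lt;p&gt;The point-to-point semantics can be sketched with Python’s standard-library queue: each message goes to exactly one worker and disappears once acknowledged. A real broker such as RabbitMQ adds persistence, routing, and delivery guarantees on top of this basic shape.&lt;/p&gt;

```python
import queue

# A message queue delivers each message to exactly one consumer.
q = queue.Queue()
q.put({"task": "send_email", "order_id": 1})
q.put({"task": "send_email", "order_id": 2})

processed = []
while not q.empty():
    msg = q.get()                  # delivered to this worker only
    processed.append(msg["order_id"])
    q.task_done()                  # acknowledged; a broker would delete it now

print(processed)  # [1, 2], and the queue is now empty
```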

&lt;h3&gt;
  
  
  Stream Processing
&lt;/h3&gt;

&lt;p&gt;On the other hand, in stream processing, components emit events when they reach a certain state. Other interested components listen for these events on the event stream and react accordingly. Events are not targeted to a certain recipient, but rather are available to all interested components.&lt;/p&gt;

&lt;p&gt;In stream processing, components can react to multiple events at the same time, and apply complex operations on multiple streams and events. Some streams include persistence where events stay on the stream for as long as necessary.&lt;/p&gt;

&lt;p&gt;With stream processing, a system can reproduce a history of events, come online after the event occurred and still react to it, and even perform sliding window computations. For example, it could calculate the average CPU usage per minute from a stream of per-second events.&lt;/p&gt;
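&lt;p&gt;The per-minute CPU average mentioned above is a classic sliding-window computation. A minimal sketch of the window logic (stream frameworks provide this as a built-in operator, usually time-based rather than count-based):&lt;/p&gt;

```python
from collections import deque


class SlidingAverage:
    """Sliding-window average over a stream, e.g. the mean CPU usage
    across the last `window` per-second samples."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)  # old samples fall out automatically

    def add(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


avg = SlidingAverage(window=3)
print(avg.add(10))  # 10.0
print(avg.add(20))  # 15.0
print(avg.add(30))  # 20.0
print(avg.add(40))  # 30.0, because the sample 10 slid out of the window
```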

&lt;p&gt;One of the most popular stream processing frameworks is &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Apache Kafka&lt;/a&gt;. Kafka is a mature and stable solution used by many projects. It can be considered a go-to, industrial-strength stream processing solution. Kafka has a large userbase, a helpful community, and an evolved toolset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other Choices
&lt;/h3&gt;

&lt;p&gt;There are other frameworks that offer either a combination of stream and message processing or their own unique solution. For example, &lt;a href="https://pulsar.apache.org/" rel="noopener noreferrer"&gt;Pulsar&lt;/a&gt;,  a newer offering from Apache, is an open-source pub/sub messaging system that supports both streams and event queues, all with extremely high performance. Pulsar is feature-rich—it offers multi-tenancy and geo-replication—and accordingly complex. It’s been said that Kafka aims for high throughput, while Pulsar aims for low latency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nats.io/" rel="noopener noreferrer"&gt;NATS&lt;/a&gt; is an alternative pub/sub messaging system with “synthetic” queueing. NATS is designed for sending small, frequent messages. It offers both high performance and low latency. However, NATS considers some level of data loss to be acceptable, prioritizing performance over delivery guarantees.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Design Considerations
&lt;/h2&gt;

&lt;p&gt;Once you’ve selected your event framework, here are several other challenges to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Event Sourcing&lt;/p&gt;

&lt;p&gt;It is difficult to implement a combination of loosely-coupled services, distinct data stores, and atomic transactions. One pattern that may help is &lt;a href="https://martinfowler.com/eaaDev/EventSourcing.html" rel="noopener noreferrer"&gt;Event Sourcing&lt;/a&gt;. In Event Sourcing, updates and deletes are never performed directly on the data; rather, state changes of an entity are saved as a series of events.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CQRS&lt;/p&gt;

&lt;p&gt;Event sourcing introduces another issue: since current state must be rebuilt from a series of events, queries can be slow and complex. Command Query Responsibility Segregation (&lt;a href="https://www.martinfowler.com/bliki/CQRS.html" rel="noopener noreferrer"&gt;CQRS&lt;/a&gt;) is a design pattern that calls for separate models for write (command) operations and read (query) operations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Discovering Event Information&lt;/p&gt;

&lt;p&gt;One of the greatest challenges in event-driven architecture is cataloging services and events. Where do you find event descriptions and details? What is the reason for an event? What team created the event? Are they actively working on it?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dealing with Change&lt;/p&gt;

&lt;p&gt;Will an event schema change? How do you change an event schema without breaking other services? How you answer these questions becomes critical as your number of services and events grows.&lt;br&gt;&lt;br&gt;
Being a good event consumer means coding for schemas that change. Being a good event producer means being cognizant of how your schema changes impact other services and creating well-designed events that are documented clearly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On Premise vs Hosted Deployment&lt;/p&gt;

&lt;p&gt;Regardless of your event framework, you’ll also need to decide between deploying the framework yourself on premise (message brokers are not trivial to operate, especially with high availability), or using a hosted service such as &lt;a href="https://www.heroku.com/kafka" rel="noopener noreferrer"&gt;Apache Kafka on Heroku&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
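&lt;p&gt;The Event Sourcing pattern in the list above is easy to sketch: rather than updating a row in place, every state change is appended to a log, and current state is rebuilt by replaying that log from the beginning.&lt;/p&gt;

```python
from functools import reduce

# Append-only event log for one account; nothing is ever updated or deleted.
events = [
    {"type": "AccountOpened", "owner": "alice"},
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]


def apply_event(state, event):
    """Pure function: old state plus one event yields new state."""
    if event["type"] == "AccountOpened":
        return {"owner": event["owner"], "balance": 0}
    if event["type"] == "Deposited":
        return {**state, "balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {**state, "balance": state["balance"] - event["amount"]}
    return state  # tolerate unknown event types


current = reduce(apply_event, events, None)
print(current)  # {'owner': 'alice', 'balance': 75}
```

&lt;p&gt;This is also where CQRS comes in: because replaying on every query is slow, a separate read model can cache the result and update it incrementally as each new event arrives.&lt;/p&gt;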

&lt;h2&gt;
  
  
  Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;As with most architectures, an event-driven architecture comes with its own set of anti-patterns. Here are a few to watch out for.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Too much of a good thing&lt;/p&gt;

&lt;p&gt;Be careful you don’t get too excited about creating events. Creating too many events will create unnecessary complexity between the services, increase cognitive load for developers, make deployment and testing more difficult, and cause congestion for event consumers. Not every method needs to be an event.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Generic events&lt;/p&gt;

&lt;p&gt;Don’t use generic events, either in name or in purpose. You want other teams to understand why your event exists, what it should be used for, and when it should be used. Events should have a specific purpose and be named accordingly. Events with generic names, or generic events with confusing flags, cause issues.  &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Complex dependency graphs&lt;/p&gt;

&lt;p&gt;Watch out for services that depend on one another and create complex dependency graphs or feedback loops. Each network hop adds additional latency to the original request, particularly north/south network traffic that leaves the datacenter.  &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Depending on guaranteed order, delivery, or side effects&lt;/p&gt;

&lt;p&gt;Events are asynchronous; building in assumptions about ordering or exactly-once delivery will not only add complexity but will negate many of the key benefits of an event-based architecture. If your consumer has side effects, such as adding a value in a database, then you may be unable to recover by replaying events. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Premature optimization&lt;/p&gt;

&lt;p&gt;Most products start off small and grow over time. While you may dream of future needs to scale to a large complex organization, if your team is small then the added complexity of event-driven architectures may actually slow you down. Instead, consider designing your system with a simple architecture but include the necessary separation of concerns so that you can swap it out as your needs grow.  &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Expecting event-driven to fix everything  &lt;/p&gt;

&lt;p&gt;On a less technical level, don’t expect event-driven architecture to fix all your problems. While this architecture can certainly improve many areas of technical dysfunction, it can’t fix core problems such as a lack of automated testing, poor team communication, or outdated dev-ops practices.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
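&lt;p&gt;The standard antidote to the delivery-guarantee anti-pattern above is an idempotent consumer. A sketch of deduplicating by event ID, so a replayed or duplicated event does not double-apply its side effect:&lt;/p&gt;

```python
# Idempotent consumer: remember which event IDs were already applied,
# so redelivered duplicates (common with at-least-once brokers) are no-ops.
processed_ids = set()
balance = 0


def handle(event):
    global balance
    if event["id"] in processed_ids:
        return  # duplicate delivery: skip the side effect
    processed_ids.add(event["id"])
    balance += event["amount"]


handle({"id": "evt-1", "amount": 50})
handle({"id": "evt-2", "amount": 25})
handle({"id": "evt-1", "amount": 50})  # redelivered duplicate, ignored

print(balance)  # 75, not 125
```

&lt;p&gt;In a real system the set of processed IDs would live in durable storage (and be written in the same transaction as the side effect), but the shape of the check is the same.&lt;/p&gt;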

&lt;h2&gt;
  
  
  Learn More
&lt;/h2&gt;

&lt;p&gt;Understanding the pros and cons of event-driven architectures, and some of their most common design decisions and challenges is an important part of creating the best design possible.&lt;/p&gt;

&lt;p&gt;If you want to learn more, check out this &lt;a href="https://devcenter.heroku.com/articles/event-driven-microservices-with-apache-kafka" rel="noopener noreferrer"&gt;event-driven reference architecture&lt;/a&gt;, which allows you to deploy a working project on Heroku with a single click. This reference architecture creates a web store selling fictional coffee products.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsq0d80olj7mc20cxtlmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fsq0d80olj7mc20cxtlmw.png" alt="Curated Cofee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Product clicks are tracked as events and stored in Kafka. Then, they are consumed by a reporting dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fl59ja46lwxonkxeoxkfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fl59ja46lwxonkxeoxkfo.png" alt="Button Clicks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The code is open source so you can modify it according to your needs and run your own experiments.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>PaaS versus Serverless: Which to choose in 2019?</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Wed, 11 Sep 2019 15:00:18 +0000</pubDate>
      <link>https://dev.to/heroku/paas-versus-serverless-which-to-choose-in-2019-5fep</link>
      <guid>https://dev.to/heroku/paas-versus-serverless-which-to-choose-in-2019-5fep</guid>
<description>&lt;p&gt;Serverless computing is increasingly popular, so when building a new application in 2019 should you go serverless or stick with PaaS? AWS started the serverless movement in 2014 with the introduction of its AWS Lambda service. Back then it felt as revolutionary as when Steve Jobs introduced the first iPhone. The main idea was to get rid of infrastructure management completely: you write a small piece of code, upload it, and the cloud takes care of the rest. It felt like PaaS 2.0, a bigger, better, more advanced version of PaaS.&lt;/p&gt;

&lt;p&gt;PaaS sought to make infrastructure easier to manage so that developers could focus on working within web frameworks rather than wrestling with the underlying infrastructure. The goal is to simplify deployment and operation. At the same time, PaaS platforms still offer access to the infrastructure, which provides a balance between automation and the flexibility to configure your servers. For example, Heroku today makes deploying, managing, and scaling server apps as easy as a one-line command.&lt;/p&gt;

&lt;p&gt;So which approach should you choose for building apps today? Should you make the switch to serverless? The first step is to look at all the options objectively, evaluate them for your specific situation, and make a reasoned choice. Both can solve basic development needs: delivering functionality quickly and reliably. Understanding the technical differences will help you determine which approach is best for a particular project.&lt;/p&gt;

&lt;p&gt;There are many serverless and PaaS platforms to use for our comparison. For serverless, popular options include AWS Lambda, Google Cloud Functions, Azure Functions and OpenWhisk. On the PaaS side we have Heroku, AWS Elastic Beanstalk, Google AppEngine and more. &lt;/p&gt;

&lt;p&gt;To simplify our comparison, let’s focus on AWS Lambda for serverless and Heroku for PaaS as prototypical examples. We'll try to be as fair as possible in our comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages of Serverless with Lambda
&lt;/h2&gt;

&lt;p&gt;Serverless offers some really great advantages. You don’t need to manage infrastructure or app servers, so you can focus on just coding functions. It also bills on a per-call basis rather than a per-hour basis.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;There’s no infrastructure to manage.&lt;/strong&gt;&lt;br&gt;
Just write a small piece of code, upload it to the cloud and the cloud service will do everything else. No more server setup and management. No more scaling and load balancing problems. No more troubleshooting servers and network. Sounds like a dream! While a PaaS solves some of these issues, you may still need to think about managing dynos, geo-distribution, fault tolerance and scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increase development velocity.&lt;/strong&gt;&lt;br&gt;
There is no need to write code or use frameworks for handling HTTP, parsing JSON, and so on. You just write pure business logic in a function, and Lambda does the rest for you. With Heroku, you still need to write code to implement your application server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pay only for resources that are actually used.&lt;/strong&gt;&lt;br&gt;
With any PaaS provider, you are paying for the availability of specific compute resources, such as CPU and RAM, whether or not they are in use. With Lambda you pay ONLY when you are actually using the resources: for the number of invocations, the amount of consumed resources, and the execution time. This billing model itself pushes developers to write compact and efficient code. With Heroku, you are billed for running dynos even when they are idle. You can scale down unneeded dynos, but you must run at least one to have a functioning app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrate with other AWS services.&lt;/strong&gt;&lt;br&gt;
Lambda is very well integrated with many other AWS services. It’s easier to use if all your infrastructure is running on AWS. You may even be forced to use Lambda as a way to integrate AWS services with external services, such as forwarding Cloudwatch events to a monitoring service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compute at the edge can lower average latency.&lt;/strong&gt;&lt;br&gt;
There is a completely different service from AWS called &lt;a href="https://aws.amazon.com/lambda/edge/"&gt;Lambda@Edge&lt;/a&gt;. The idea is to run custom JavaScript code on a CDN node close to the end user. It can be used to achieve lower latency for the client.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
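&lt;p&gt;The “just write a function” point is easy to see in code. A minimal Lambda-style handler in Python contains only business logic; there is no HTTP server, routing, or JSON parsing to write, because the runtime hands you an already-parsed event:&lt;/p&gt;

```python
import json


def handler(event, context):
    """AWS Lambda-style entry point: the runtime parses the trigger
    payload into `event` and invokes this function directly."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"greeting": "hello " + name}),
    }


# Locally we can invoke it like any plain function (context is unused here).
print(handler({"name": "dev"}, None))
# {'statusCode': 200, 'body': '{"greeting": "hello dev"}'}
```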

&lt;h2&gt;
  
  
  Advantages of PaaS with Heroku
&lt;/h2&gt;

&lt;p&gt;PaaS platforms offer an opinionated and standard infrastructure that makes them developer friendly. They simplified the management of the underlying infrastructure while still providing access to it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simple to use as part of regular developer workflow.&lt;/strong&gt;&lt;br&gt;
Getting your app running on Heroku requires fewer steps, and could be as simple as a &lt;a href="https://devcenter.heroku.com/articles/git"&gt;Git push&lt;/a&gt;. As your app matures, Heroku also integrates other services in a convenient package like &lt;a href="https://www.heroku.com/continuous-delivery"&gt;continuous integration and review apps&lt;/a&gt;. With Lambda, you have to package your code into a zip, upload it, and configure triggers and permissions. You are not able to expose your app to the internet without configuring API Gateway.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unlimited HTTP requests for the same price.&lt;/strong&gt;&lt;br&gt;
Heroku doesn’t bill per request, so it’s more predictable, and a single dyno can handle thousands of requests per second, depending on your code. With Lambda you are forced to use API Gateway to expose it via HTTP. It can be expensive because you are paying for every request, on top of your Lambda request and bandwidth charges. At the time of writing this post, API Gateway alone costs &lt;a href="https://aws.amazon.com/api-gateway/pricing/"&gt;$3.50 per 1M requests&lt;/a&gt;. That may not seem expensive at first glance. But don’t forget that we live in a world where an app or a service can go viral and become hugely popular overnight. The question is, will you be able to pay the AWS bill after that?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fewer long tail latency issues.&lt;/strong&gt;&lt;br&gt;
Heroku dynos run continuously and are ready for requests at any moment. You can even use the &lt;a href="https://devcenter.heroku.com/articles/preboot"&gt;preboot&lt;/a&gt; feature to ensure the dyno is ready to receive traffic before routing to it. One of the biggest potential issues with Lambda is long tail latency. When Lambda is triggered, it spins up a Firecracker-based microVM and runs your code there. This takes some time, so the first response is slower. All subsequent requests that land on that existing Lambda instance are processed without a cold start and the associated latency. In my experience, the cold start for a Java-based Lambda can be around 10 seconds, which shows why the JVM is a poor fit for this use case. With Go, the cold start in my use case got down to 1.5 seconds — a pretty significant improvement, but it’s still long.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You can run any code.&lt;/strong&gt;&lt;br&gt;
Heroku lets you deploy, run, and manage applications written in Ruby, Node.js, Java, Python, Clojure, Scala, Go, and PHP. You can even run Docker containers with any image you choose. Lambda, on the other hand, supports only a limited number of runtimes. AWS tried to address this during last year’s re:Invent with the &lt;a href="https://aws.amazon.com/about-aws/whats-new/2018/11/aws-lambda-now-supports-custom-runtimes-and-layers/"&gt;announcement of the Runtime API&lt;/a&gt;. Now, you can build a Linux-compatible binary in the programming language of your choice and run it on Lambda. However, not all programming languages allow you to do that.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;There’s no need to rewrite existing apps.&lt;/strong&gt;&lt;br&gt;
If your application is already using an application server, you usually don’t need to make many changes in order to run on Heroku. On the other hand, it may require more effort to port existing apps to Lambda — particularly those not written as serverless functions originally. That’s also assuming that Lambda supports the necessary runtime. With legacy applications, this can be a significant limitation.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You have longer execution time.&lt;/strong&gt;&lt;br&gt;
The current &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/limits.html"&gt;limit for Lambda&lt;/a&gt; execution is 15 minutes. That’s enough for many tasks, but not for all. Some specific tasks, for example, from an ETL domain, may require longer execution time than is available on Lambda. With Heroku, you can use a worker dyno to run tasks for many hours at a time. Nevertheless, it’s best practice to design your dynos to be &lt;a href="https://12factor.net/"&gt;stateless&lt;/a&gt;, so consider using a job queue and saving your work periodically.  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to choose when?
&lt;/h2&gt;

&lt;p&gt;In summary, Lambda can be a great solution when you've got a bit of code you just need to run with as little overhead as possible, for task-based background work (if you can fit it into Lambda limits), infrequently used functions, and instances where long tail latency is not a problem. It can also be the right tool if you need very fast average latency at the edge. You can easily use it for gluing AWS services together with some custom logic and for building infrastructure or automation tools. Think about tasks like decoding small videos, resizing pictures, or processing AWS events as a good fit for Lambda.&lt;/p&gt;

&lt;p&gt;On the other hand, Heroku is a good fit for new web applications because it integrates many common parts of the development lifecycle in a more convenient package, without requiring you to configure multiple services. Doing a Git push and getting a working app after a couple of minutes is a breeze. It's also better for compute operations that take longer than a few minutes or for frequently called functions. Visit the &lt;a href="https://www.heroku.com/platform"&gt;Heroku Platform description&lt;/a&gt; to learn more about how Heroku works.&lt;/p&gt;

&lt;p&gt;What is your opinion on which to choose in 2019? Let us know in the comments below.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>serverless</category>
      <category>heroku</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Identifying trolls and bots on Reddit with machine learning (Part 2)</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Fri, 09 Aug 2019 15:02:43 +0000</pubDate>
      <link>https://dev.to/heroku/identifying-trolls-and-bots-on-reddit-with-machine-learning-part-2-2p0p</link>
      <guid>https://dev.to/heroku/identifying-trolls-and-bots-on-reddit-with-machine-learning-part-2-2p0p</guid>
      <description>&lt;p&gt;Trolls and bots are widespread across social media, and they influence us in ways we are not always aware of. Trolls can be relatively harmless, just trying to entertain themselves at others’ expense, but they can also be political actors sowing mistrust or discord. While some bots offer helpful information, others can be used to manipulate vote counts and promote content that supports their agenda. Bot problems are expected to grow more acute as machine learning technologies mature. &lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/heroku/trolls-and-bots-are-disrupting-social-media-here-s-how-ai-can-stop-them-part-1-55df"&gt;the first part&lt;/a&gt; of this two part series, we covered how to collect comment data from Reddit in bulk and build a dashboard to moderate suspected trolls and bots. In this second part, we’ll show you how we used machine learning to detect bots and trolls using Python and scikit-learn. We’ll then create an API using Flask to say whether comments on Reddit are likely to be bots or trolls for use in our moderator dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background on troll and bot detection
&lt;/h2&gt;

&lt;p&gt;Troll and bot detection is a relatively new field. Historically, companies have employed human moderators to detect and remove content that’s inconsistent with their terms of service. However, this manual process is expensive, and it can be emotionally tiring for humans to review the worst content. We will quickly hit the limits of what human moderators can handle as new technologies like &lt;a href="https://openai.com/blog/better-language-models/" rel="noopener noreferrer"&gt;OpenAI GPT-2&lt;/a&gt; natural language generation are unleashed. As bots improve, it is important to employ counter technologies to protect the integrity of online communities.&lt;/p&gt;

&lt;p&gt;Several studies have been done on bot detection. For example, one researcher found competing &lt;a href="https://www.oreilly.com/ideas/identifying-viral-bots-and-cyborgs-in-social-media" rel="noopener noreferrer"&gt;pro-Trump and anti-Trump bots on Twitter&lt;/a&gt;. Researchers at Indiana University have provided a tool to check Twitter users called &lt;a href="https://botometer.iuni.iu.edu/#!/" rel="noopener noreferrer"&gt;botornot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There’s also been interesting research on online trolls. Research from Stanford has shown that just &lt;a href="https://snap.stanford.edu/conflict/" rel="noopener noreferrer"&gt;1% of accounts create 74% of conflict&lt;/a&gt;. &lt;a href="https://www.cc.gatech.edu/%7Eeshwar3/uploads/3/8/0/4/38043045/eshwar-norms-cscw2018.pdf" rel="noopener noreferrer"&gt;Researchers at Georgia Tech&lt;/a&gt; used a natural language processing model to identify users who violate norms with behavior like making personal attacks, misogynistic slurs, or even mansplaining.&lt;/p&gt;

&lt;h2&gt;
  
  
  Screening comments for moderation
&lt;/h2&gt;

&lt;p&gt;Our goal is to create a machine learning model to screen comments on the politics subreddit for moderators to review. It doesn't need to have perfect accuracy since the comments will be reviewed by a human moderator. Instead, our measure of success is how much more efficient we can make human moderators. Rather than needing to review every comment, they will be able to review a prescreened subset. We are not trying to replace the existing moderation system that Reddit provides, which allows moderators to review comments that have been reported by users. Instead, this is an additional source of information that can complement the existing system. &lt;/p&gt;

&lt;p&gt;As described in our part one article, we have created a dashboard allowing moderators to review the comments. The machine learning model will score each comment as being a normal user, a bot, or a troll.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu65an5zd9i3t2tbgh6p4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu65an5zd9i3t2tbgh6p4.png" title="Reddit Bot and troll dashboard" alt="Reddit Bot and troll dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it out for yourself at &lt;a href="https://reddit-dashboard.herokuapp.com/" rel="noopener noreferrer"&gt;reddit-dashboard.herokuapp.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To set your expectations, our system is designed as a proof of concept. It’s not meant to be a production system and is not 100% accurate. We’ll use it to illustrate the steps involved in building a system, with the hopes that platform providers will be able to offer official tools like these in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Collecting training data
&lt;/h2&gt;

&lt;p&gt;Our initial training dataset was collected from a list of known bots and trolls. We’ll combine two lists: &lt;a href="https://www.reddit.com/r/autowikibot/wiki/redditbots" rel="noopener noreferrer"&gt;393 known bots&lt;/a&gt; plus &lt;a href="https://www.reddit.com/r/botwatch/comments/1wg6f6/bot_list_i_built_a_bot_to_find_other_bots_so_far/cf1nu8p/" rel="noopener noreferrer"&gt;167 more&lt;/a&gt; from the botwatch subreddit. We’ll also use a list of 944 troll accounts from &lt;a href="https://www.reddit.com/r/announcements/comments/8bb85p/reddits_2017_transparency_report_and_suspect/" rel="noopener noreferrer"&gt;Reddit’s 2017 Transparency Report&lt;/a&gt; that were suspected of working for the Russian Internet Research Agency. &lt;/p&gt;

&lt;p&gt;We are using an event-driven architecture that consists of a process that downloads data from Reddit and pushes it into a Kafka queue. We then have a Kafka consumer that writes the data into a Redshift data warehouse in batches. We wrote a &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Producer/blob/master/kafka-export.js" rel="noopener noreferrer"&gt;Kafka producer application&lt;/a&gt; to download the comments from the list of bots and trolls. As a result, our data warehouse contains not only the data from the known bots and trolls, but also real-time comments from the politics subreddit. &lt;/p&gt;

&lt;p&gt;While Reddit comments aren’t exactly private, you may have data that is private. For example, you may have data that’s regulated by HIPAA or PCI, or is sensitive to your business or customers. We followed a &lt;a href="https://devcenter.heroku.com/articles/peering-aws-rds-aws-redshift-with-heroku" rel="noopener noreferrer"&gt;Heroku reference architecture&lt;/a&gt; that was designed to protect private data. It provides a Terraform script to automatically configure a Redshift data warehouse and connect it to a Heroku Private Space. As a result, only apps running in the Private Space can access the data.&lt;/p&gt;

&lt;p&gt;We can either train our model on a dyno directly or run a one-off dyno to download the data to CSV and train the model locally. We’ll choose the latter for simplicity, but you’d want to keep sensitive data in the Private Space.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heroku run bash &lt;span class="nt"&gt;-a&lt;/span&gt; kafka-stream-viz-jorge
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;password&amp;gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"select * from reddit_comments"&lt;/span&gt; | psql &lt;span class="nt"&gt;-h&lt;/span&gt; tf-jorge-tf-redshift-cluster.coguuscncu3p.us-east-1.redshift.amazonaws.com &lt;span class="nt"&gt;-U&lt;/span&gt; jorge &lt;span class="nt"&gt;-d&lt;/span&gt; redshift_jorge &lt;span class="nt"&gt;-p&lt;/span&gt; 5439 &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; reddit.csv
&lt;span class="nb"&gt;gzip &lt;/span&gt;reddit.csv
curl &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@reddit.csv.gz"&lt;/span&gt; https://file.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you prefer to use our training data to try it out yourself, you can download our &lt;a href="https://drive.google.com/file/d/1FDvHMLbJ8mXlsiiNnLgFCV6Yom1m_xbU/view?usp=sharing" rel="noopener noreferrer"&gt;CSV&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We now have a total of 93,668 comments from both sets of users. The ratios between the classes are fixed at 5% trolls, 10% bots, and 85% normal. This is useful for training but likely underestimates the true percentage of normal users.&lt;/p&gt;
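&lt;p&gt;As a quick sanity check on those ratios, the class balance of a labeled dataset can be computed with the standard library alone. The toy &lt;code&gt;labels&lt;/code&gt; list here is illustrative, standing in for the dataset’s real label column:&lt;/p&gt;

```python
from collections import Counter

# Toy stand-in for the dataset's label column (the real data has 93,668 rows).
labels = ["normal"] * 85 + ["bot"] * 10 + ["troll"] * 5

counts = Counter(labels)
total = sum(counts.values())
ratios = {cls: count / total for cls, count in counts.items()}
print(ratios)  # {'normal': 0.85, 'bot': 0.1, 'troll': 0.05}
```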

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F2f8eub72hrsb8gugy1z6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F2f8eub72hrsb8gugy1z6.png" title="Trolls bots ratio" alt="Trolls bots ratio"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Selecting features
&lt;/h2&gt;

&lt;p&gt;Next, we need to select features to build our model. Reddit provides dozens of JSON fields for each user and comment. Some don’t have meaningful values. For example, &lt;code&gt;banned_by&lt;/code&gt; was null in every case, probably because we lack moderator permissions. We picked the fields below because we thought they’d be valuable as predictors or to understand how well our model performs. We added the column &lt;code&gt;recent_comments&lt;/code&gt; with an array of the last 20 comments made by that user.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  no_follow&lt;/li&gt;
&lt;li&gt;  link_id&lt;/li&gt;
&lt;li&gt;  gilded&lt;/li&gt;
&lt;li&gt;  author&lt;/li&gt;
&lt;li&gt;  author_verified&lt;/li&gt;
&lt;li&gt;  author_comment_karma&lt;/li&gt;
&lt;li&gt;  author_link_karma&lt;/li&gt;
&lt;li&gt;  num_comments&lt;/li&gt;
&lt;li&gt;  created_utc&lt;/li&gt;
&lt;li&gt;  score&lt;/li&gt;
&lt;li&gt;  over_18&lt;/li&gt;
&lt;li&gt;  body&lt;/li&gt;
&lt;li&gt;  is_submitter&lt;/li&gt;
&lt;li&gt;  controversiality&lt;/li&gt;
&lt;li&gt;  ups&lt;/li&gt;
&lt;li&gt;  is_bot&lt;/li&gt;
&lt;li&gt;  is_troll&lt;/li&gt;
&lt;li&gt;  recent_comments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some fields like “score” are useful for historical comments, but not for a real-time dashboard because users won’t have had time to vote on that comment yet.&lt;/p&gt;

&lt;p&gt;We added additional calculated fields that we thought would correlate well with bots and trolls. We suspected that a user’s recent comment history would provide valuable insight: if a user repeatedly posts controversial comments with negative sentiment, perhaps they are a troll; likewise, if a user repeatedly posts comments with the same text, perhaps they are a bot. We used the &lt;a href="https://textblob.readthedocs.io/en/dev/" rel="noopener noreferrer"&gt;TextBlob&lt;/a&gt; package to calculate numerical values for each of these. We’ll soon see whether these features are useful in practice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  recent_num_comments&lt;/li&gt;
&lt;li&gt;  recent_num_last_30_days&lt;/li&gt;
&lt;li&gt;  recent_avg_no_follow&lt;/li&gt;
&lt;li&gt;  recent_avg_gilded&lt;/li&gt;
&lt;li&gt;  recent_avg_responses&lt;/li&gt;
&lt;li&gt;  recent_percent_neg_score&lt;/li&gt;
&lt;li&gt;  recent_avg_score&lt;/li&gt;
&lt;li&gt;  recent_min_score&lt;/li&gt;
&lt;li&gt;  recent_avg_controversiality&lt;/li&gt;
&lt;li&gt;  recent_avg_ups&lt;/li&gt;
&lt;li&gt;  recent_avg_diff_ratio&lt;/li&gt;
&lt;li&gt;  recent_max_diff_ratio&lt;/li&gt;
&lt;li&gt;  recent_avg_sentiment_polarity&lt;/li&gt;
&lt;li&gt;  recent_min_sentiment_polarity&lt;/li&gt;
&lt;/ul&gt;
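&lt;p&gt;To make the “diff ratio” idea concrete, here is a minimal sketch of how the average similarity between a user’s recent comments could be computed using the standard library’s &lt;code&gt;difflib&lt;/code&gt;. The function name and exact formula are illustrative; the real feature code lives in the notebooks linked below:&lt;/p&gt;

```python
from difflib import SequenceMatcher
from itertools import combinations

def avg_diff_ratio(comments):
    """Average pairwise text similarity across a user's recent comments.

    Values near 1.0 mean the comments are nearly identical text,
    which hints that the account may be a bot.
    """
    pairs = list(combinations(comments, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

bot_like = ["I am a bot, beep boop.", "I am a bot, beep boop."]
human_like = ["I disagree with this take.", "Great photo of a kitten!"]
print(avg_diff_ratio(bot_like) > avg_diff_ratio(human_like))  # True
```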

&lt;p&gt;For more information on what these fields are and how they are calculated, see the code in our Jupyter Notebooks in &lt;a href="https://github.com/devspotlight/botidentification" rel="noopener noreferrer"&gt;https://github.com/devspotlight/botidentification&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  Building a machine learning model
&lt;/h2&gt;

&lt;p&gt;Our next step is to create a new machine learning model based on this list. We’ll use Python’s excellent &lt;a href="https://scikit-learn.org" rel="noopener noreferrer"&gt;scikit-learn&lt;/a&gt; framework to build our model. We’ll store our training data in two data frames: one with the set of features to train on and a second with the desired class labels. We’ll then split our dataset into 70% training data and 30% test data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;input_x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we’ll create a &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html" rel="noopener noreferrer"&gt;decision tree classifier&lt;/a&gt; to predict whether each comment is a bot, a troll, or a normal user. We’ll use a decision tree because the resulting rules are easy to understand. The accuracy could probably be improved with a more robust algorithm like a random forest, but we’ll stick with a decision tree to keep our example simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;class_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;troll&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
         &lt;span class="n"&gt;min_samples_leaf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll notice a few parameters in the above code sample. We are setting the maximum depth of the tree to 3 not only to avoid overfitting, but also so that it’s easier to visualize the resulting tree. We are also setting the class weights so that bots and trolls are less likely to be missed, even at the expense of falsely labeling a normal user. Lastly, we are requiring that the leaf nodes have at least 100 samples to keep our tree simpler. &lt;/p&gt;

&lt;p&gt;Now we’ll test the model against the 30% of data we held out as a test set. This will tell us how well our model performs at guessing whether each comment is from a bot, troll, or normal user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crosstab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rownames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;True&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;colnames&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Predicted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;margins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a &lt;a href="https://en.wikipedia.org/wiki/Confusion_matrix" rel="noopener noreferrer"&gt;confusion matrix&lt;/a&gt; showing, for each true target label, how many of the comments were predicted correctly or incorrectly. For example, we can see below that out of 1,956 total troll comments, we correctly predicted 1,451 of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Predicted    bot        normal    troll        All
True                                 
bot          3677       585       33           4295
normal       197        20593     993          21783
troll        5          500       1451         1956
All          3879       21678     2477         28034
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In other words, the recall for trolls is 74%. The precision is lower; of all comments predicted as being a troll, only 58% really are.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recall : [0.85611176 0.94537024 0.74182004]
Precision: [0.94792472 0.94994926 0.58578926]
Accuracy: 0.917493044160662
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The overall accuracy comes to 91.7%. The model performed best for normal users, with about 95% precision and recall. It performed fairly well for bots, but had a harder time distinguishing trolls from normal users. Overall, the results look strong even for a fairly simple model. &lt;/p&gt;
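&lt;p&gt;These figures follow directly from the confusion matrix. As a sanity check, here is the arithmetic spelled out in plain Python, with the counts copied from the table above:&lt;/p&gt;

```python
# Confusion matrix from the table above: rows are true labels,
# columns are predicted labels.
matrix = {
    "bot":    {"bot": 3677, "normal": 585,   "troll": 33},
    "normal": {"bot": 197,  "normal": 20593, "troll": 993},
    "troll":  {"bot": 5,    "normal": 500,   "troll": 1451},
}
classes = ["bot", "normal", "troll"]

# Recall: fraction of each true class that was predicted correctly.
recall = {c: matrix[c][c] / sum(matrix[c].values()) for c in classes}
# Precision: fraction of each predicted class that really is that class.
precision = {c: matrix[c][c] / sum(matrix[r][c] for r in classes) for c in classes}
# Accuracy: correct predictions over all comments.
accuracy = sum(matrix[c][c] for c in classes) / sum(
    sum(row.values()) for row in matrix.values())

print(round(recall["troll"], 2), round(precision["troll"], 2))  # 0.74 0.59
print(round(accuracy, 3))  # 0.917
```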

&lt;h2&gt;
  
  
  What does the model tell us?
&lt;/h2&gt;

&lt;p&gt;Now that we have this great machine learning model that can predict bots and trolls, how does it work and what can we learn from it? A great start is to look at which features were most important.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;feature_imp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_importances_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;recent_avg_diff_ratio&lt;/span&gt;           &lt;span class="mf"&gt;0.465169&lt;/span&gt;
&lt;span class="n"&gt;author_comment_karma&lt;/span&gt;            &lt;span class="mf"&gt;0.329354&lt;/span&gt;
&lt;span class="n"&gt;author_link_karma&lt;/span&gt;               &lt;span class="mf"&gt;0.099974&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_responses&lt;/span&gt;            &lt;span class="mf"&gt;0.098622&lt;/span&gt;
&lt;span class="n"&gt;author_verified&lt;/span&gt;                 &lt;span class="mf"&gt;0.006882&lt;/span&gt;
&lt;span class="n"&gt;recent_min_sentiment_polarity&lt;/span&gt;   &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_no_follow&lt;/span&gt;            &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;over_18&lt;/span&gt;                         &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;is_submitter&lt;/span&gt;                    &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_num_comments&lt;/span&gt;             &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_num_last_30_days&lt;/span&gt;         &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_gilded&lt;/span&gt;               &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_sentiment_polarity&lt;/span&gt;   &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_percent_neg_score&lt;/span&gt;        &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_score&lt;/span&gt;                &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_min_score&lt;/span&gt;                &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_controversiality&lt;/span&gt;     &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_avg_ups&lt;/span&gt;                  &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;recent_max_diff_ratio&lt;/span&gt;           &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;span class="n"&gt;no_follow&lt;/span&gt;                       &lt;span class="mf"&gt;0.000000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Interesting! The most important feature was the average difference ratio in the text of the recent comments. This means if the text of the last 20 comments is very similar, it’s probably a bot. The next most important features were the comment karma, link karma, the number of responses to recent comments, and whether the account is verified.&lt;/p&gt;

&lt;p&gt;Why are the rest zero? We limited the depth of our decision tree to 3 levels, so we are intentionally not including all the features. Notably, the model didn’t use the scores or sentiment of previous comments to classify the trolls. Either these trolls were fairly polite and earned a decent number of votes, or the other features had better discriminatory power.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the actual decision tree to get more information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;export_graphviz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tree.dot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;feature_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;class_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;troll&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;rounded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proportion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fldk0tn490iusjdalzlvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fldk0tn490iusjdalzlvc.png" title="decision tree" alt="decision tree"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can get an idea of how this model works! You might need to zoom in to see the details. &lt;/p&gt;

&lt;p&gt;Let’s start at the top of the tree. When the recent comments are fairly similar to each other (the average difference ratio is high), then it’s more likely to be a bot. When they have dissimilar comments, low comment karma, and high link karma, they are more likely to be a troll. This could make sense if the trolls use posts of kittens to pump up their link karma, and then make nasty comments in the forums that either get ignored or downvoted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting an API
&lt;/h2&gt;

&lt;p&gt;To put our machine learning model to use, we need to make it available to our moderator dashboard. We can do that by hosting an API for the dashboard to call. &lt;/p&gt;

&lt;p&gt;To serve our API, we used &lt;a href="http://flask.pocoo.org/" rel="noopener noreferrer"&gt;Flask&lt;/a&gt;, which is a lightweight web framework for Python. When we load our machine learning model, the server starts. When it receives a POST request containing a JSON object with the comment data, it responds back with the prediction. &lt;/p&gt;
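&lt;p&gt;Here is a minimal sketch of such a Flask app. The route and the &lt;code&gt;fake_predict&lt;/code&gt; stand-in are illustrative assumptions; the real server loads the trained scikit-learn model and derives the features described earlier from the JSON fields before predicting:&lt;/p&gt;

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def fake_predict(comment):
    # Stand-in for the real model: the actual app calls clf.predict()
    # on a feature vector built from the comment JSON.
    return "Is a bot user" if comment.get("is_bot") else "Is a normal user"

@app.route("/", methods=["POST"])
def predict():
    # Parse the posted comment JSON and respond with a prediction label.
    comment = request.get_json(force=True)
    return jsonify({"prediction": fake_predict(comment)})

if __name__ == "__main__":
    import os
    # Heroku supplies the port via the environment.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "5000")))
```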

&lt;p&gt;Example request for a &lt;strong&gt;bot&lt;/strong&gt; user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"banned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"no_follow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"link_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"t3_aqtwe1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"gilded"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"AutoModerator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_verified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_comment_karma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;445850.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_link_karma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1778.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"num_comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"created_utc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1550213389.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"over_18"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Hey, thanks for posting at &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/r&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/SwitchHaxing! Unfortunately your comment has been removed due to rule 6; please post questions in the stickied Q&amp;amp;amp;A thread.If you believe this is an error, please contact us via modmail and well sort it out.*I am a bot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"downs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_submitter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"num_reports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"controversiality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"quarantine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"ups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_bot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_troll"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"recent_comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[...array of 20 recent comments...]"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response returned is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prediction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Is a bot user"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We deployed our API on &lt;a href="https://heroku.com/" rel="noopener noreferrer"&gt;Heroku&lt;/a&gt; because it makes the service very easy to run. We just create a &lt;a href="https://devcenter.heroku.com/articles/procfile" rel="noopener noreferrer"&gt;Procfile&lt;/a&gt; with a single line telling Heroku which file to use for the web server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;web: python app.py ${port}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then push our code to Heroku with git:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push heroku master
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heroku takes care of the hassle of downloading requirements, building the API, setting up a web server, routing, etc. We can now access our API at this URL and use &lt;a href="https://www.getpostman.com" rel="noopener noreferrer"&gt;Postman&lt;/a&gt; to send a test request:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://botidentification-comments.herokuapp.com/" rel="noopener noreferrer"&gt;https://botidentification.herokuapp.com/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  See it working
&lt;/h2&gt;

&lt;p&gt;Thanks to the great moderator dashboard we wrote in &lt;a href="https://dev.to/heroku/trolls-and-bots-are-disrupting-social-media-here-s-how-ai-can-stop-them-part-1-55df"&gt;part one&lt;/a&gt;, we can now see the performance of our model operating on real comments. If you haven’t already, check it out here: &lt;a href="https://reddit-dashboard.herokuapp.com/" rel="noopener noreferrer"&gt;reddit-dashboard.herokuapp.com&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu65an5zd9i3t2tbgh6p4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fu65an5zd9i3t2tbgh6p4.png" title="Reddit bot and troll dashboard" alt="Reddit bot and troll dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It streams live comments from the r/politics subreddit. You can see each comment and whether the model scored it as a bot, troll, or normal user. &lt;/p&gt;

&lt;p&gt;You may see some comments labeled as bots or trolls even when it’s not obvious why from their comment history. Keep in mind that we used a simple model to keep this tutorial easy to follow. The precision for labeling trolls is only 58%, which is why we designed it as a filter for human moderators to review.&lt;/p&gt;

&lt;p&gt;If you’re interested in playing with this model yourself, check out the code on GitHub at &lt;a href="https://github.com/devspotlight/botidentification" rel="noopener noreferrer"&gt;https://github.com/devspotlight/botidentification&lt;/a&gt;. You can try improving the accuracy of the model by using a more sophisticated algorithm such as a random forest. Spoiler alert: it’s possible to get 95%+ accuracy on the test data with more sophisticated models, but we’ll leave it as an exercise for you.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Trolls and bots are disrupting social media—here’s how AI can stop them (Part 1)</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Fri, 09 Aug 2019 14:59:25 +0000</pubDate>
      <link>https://dev.to/heroku/trolls-and-bots-are-disrupting-social-media-here-s-how-ai-can-stop-them-part-1-55df</link>
      <guid>https://dev.to/heroku/trolls-and-bots-are-disrupting-social-media-here-s-how-ai-can-stop-them-part-1-55df</guid>
<description>&lt;p&gt;Trolls and bots have a huge and often unrecognized influence on social media. They are used to influence conversations for commercial or political reasons. They allow small hidden groups of people to promote information supporting their agenda at a large scale. They can push their content to the top of people’s news feeds, search results, and shopping carts. Some say they can even influence presidential elections. In order to maintain the quality of discussion on social sites, it’s become necessary to screen and moderate community content. Can we use machine learning to identify suspicious posts and comments? The answer is yes, and we’ll show you how.&lt;/p&gt;

&lt;p&gt;This is a two-part series. In this part, we'll cover how to collect comment data from Reddit in bulk and build a real-time dashboard using Node and Kafka to moderate suspected trolls and bots. In &lt;a href="https://dev.to/heroku/identifying-trolls-and-bots-on-reddit-with-machine-learning-part-2-2p0p"&gt;part two&lt;/a&gt;, we'll cover the specifics of building the machine learning model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trolls and bots are a huge pain for social media
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Internet_troll" rel="noopener noreferrer"&gt;Trolls&lt;/a&gt; are dangerous online because it's not always obvious when you are being influenced by them or engaging with them. Posts created by Russian operatives were seen by up to &lt;a href="https://www.washingtonpost.com/news/the-switch/wp/2017/11/01/how-russian-trolls-got-into-your-facebook-feed/?noredirect=on&amp;amp;utm_term=.82754f76267b" rel="noopener noreferrer"&gt;126 million Americans on Facebook&lt;/a&gt; leading up to the last election. Twitter released a massive data dump of over &lt;a href="https://www.vox.com/2018/10/19/17990946/twitter-russian-trolls-bots-election-tampering" rel="noopener noreferrer"&gt;9 million tweets&lt;/a&gt; from Russian trolls. And it’s not just Russia! There are also accounts of trolls attempting to &lt;a href="https://www.buzzfeednews.com/article/craigsilverman/reddit-coordinated-chinese-propaganda-trolls" rel="noopener noreferrer"&gt;influence Canada&lt;/a&gt; after the conflict with Huawei. The problem even extends to online shopping where &lt;a href="https://www.forbes.com/sites/emmawoollacott/2017/09/09/exclusive-amazons-fake-review-problem-is-now-worse-than-ever/#39c475cd7c0f" rel="noopener noreferrer"&gt;reviews on Amazon&lt;/a&gt; have slowly been getting more heavily manipulated by merchants.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Social_bot" rel="noopener noreferrer"&gt;Bots&lt;/a&gt; are computer programs posing as people. They can amplify the effect of trolls by engaging or liking their content en masse, or by posting their own content in an automated fashion. They will get more sophisticated and harder to detect in the future. Bots can now create entire paragraphs of text in response to text posts or comments. &lt;a href="https://openai.com/blog/better-language-models/" rel="noopener noreferrer"&gt;OpenAI’s GPT-2&lt;/a&gt; model can write text that feels and looks very similar to human quality. OpenAI decided not to release it due to safety concerns, but it’s only a matter of time before the spammers catch up. As a disclaimer, not all bots are harmful. In fact, the majority of bots on Reddit try to help the community by moderating content, finding duplicate links, providing summaries of articles, and more. It will be important to distinguish helpful from harmful bots.&lt;/p&gt;

&lt;p&gt;How can we defend ourselves from propaganda and spam posted by malicious trolls and bots? We could carefully investigate the background of each poster, but we don’t have time to do this for every comment we read. The answer is to automate the detection using big data and machine learning. Let’s fight fire with fire!&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying bots and trolls on Reddit
&lt;/h2&gt;

&lt;p&gt;We’ll focus on Reddit because users often complain of trolls in political threads. It’s easier for trolls to operate thanks to anonymous posting. Operatives can create dozens or hundreds of accounts to simulate user engagement, likes and comments. Research from Stanford has shown that just &lt;a href="https://snap.stanford.edu/conflict/" rel="noopener noreferrer"&gt;1% of accounts create 74% of conflict&lt;/a&gt;. Over the past few months, we’ve seen numerous comments like this one in the worldnews subreddit:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Anyone else notice the false users in this thread? I recognise their language. It has very specific traits like appearing to have genuine curiosity yet backed by absurd statements. Calling for 'clear evidence' and questioning the veracity of statements (which would normally be a good thing but not under a guise). Wonder if you could run it through machine learning to identify these type of users/comments.” - &lt;strong&gt;koalefant&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/worldnews/comments/aciovt/_/ed8alk0/?context=1" rel="noopener noreferrer"&gt;https://www.reddit.com/r/worldnews/comments/aciovt/_/ed8alk0/?context=1&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F2sav2lwe343eqr8ch0c8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F2sav2lwe343eqr8ch0c8.png" title="challenge accepted" alt="challenge accepted"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are several existing resources we can leverage. For example, the &lt;a href="https://www.reddit.com/r/botwatch/" rel="noopener noreferrer"&gt;botwatch&lt;/a&gt; subreddit keeps track of bots on Reddit, true to its name. &lt;a href="https://www.reddit.com/r/announcements/comments/8bb85p/reddits_2017_transparency_report_and_suspect/" rel="noopener noreferrer"&gt;Reddit’s 2017 Transparency Report&lt;/a&gt; also listed 944 accounts suspected of being trolls working for the Russian Internet Research Agency. &lt;/p&gt;

&lt;p&gt;Also, there are software tools for analyzing Reddit users. For example, the very nicely designed &lt;a href="https://atomiks.github.io/reddit-user-analyser/" rel="noopener noreferrer"&gt;reddit-user-analyzer&lt;/a&gt; can do sentiment analysis, plot the controversiality of user comments, and more. Let’s take this a step further and build a tool that puts the power in the hands of moderators and users. &lt;/p&gt;

&lt;p&gt;In this article, the first of a two-part series, we’ll cover how to capture data from Reddit’s API for analysis and how to build the actual dashboard. In part two, we’ll dive deeper into how we built the machine learning model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a dashboard of suspected bots and trolls
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you’ll learn how to create a dashboard that identifies bots and trolls in Reddit comments in real time, with the help of machine learning. This could be a useful tool to help moderators of political subreddits identify and remove content from bots and trolls. As users submit comments to the r/politics subreddit, we’ll capture the comments and run them through our machine learning model, then report suspicious ones on a dashboard for moderators to review.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fm9hvz5pb296oe4wq3lkt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fm9hvz5pb296oe4wq3lkt.png" title="Reddit bot and troll dashboard" alt="Reddit bot and troll dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a screengrab from our dashboard. Try it out yourself at &lt;a href="https://reddit-dashboard.herokuapp.com/" rel="noopener noreferrer"&gt;reddit-dashboard.herokuapp.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To set your expectations, our system is designed as a proof of concept. It’s not meant to be a production system and is not 100% accurate. We’ll use it to illustrate the steps involved in building a system, with the hopes that platform providers will be able to offer official tools like these in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  System architecture
&lt;/h2&gt;

&lt;p&gt;Due to the high number of posts and comments being made on social media sites, it’s necessary to use a scalable infrastructure to process them. We’ll design our system architecture using an example written by the Heroku team in &lt;a href="https://blog.heroku.com/event-streams-kafka-redshift-metabase" rel="noopener noreferrer"&gt;Managing Real-time Event Streams with Apache Kafka&lt;/a&gt;. This is an event-driven architecture that will let us produce data from the Reddit API and send it to Kafka. &lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;Kafka&lt;/a&gt; makes it easy to process streaming data and decouple the different parts of our system. Reading this data from Kafka, our dashboard can call the machine learning API and display the results. We’ll also store the data in Redshift for historical analysis and for use as training data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbg2lfp4w3q2k3es4yp4y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbg2lfp4w3q2k3es4yp4y.png" title="Apache kafka" alt="Apache kafka"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Collecting data from Reddit
&lt;/h2&gt;

&lt;p&gt;Our first step is to download the comments from the politics subreddit for analysis. Reddit makes it easy to access comments as structured data in JSON format. To get recent comments for any subreddit, just request the following URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.reddit.com/r/${subreddit}/comments.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Likewise, we can access public data about each user, including their karma and comment history. All we need to do is request this data from a URL containing the username, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.reddit.com/user/${username}/about.json
https://www.reddit.com/user/${username}/comments.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To collect the data, we just looped through each comment in the r/politics subreddit, and then loaded the user data for each commenter. You can use whatever HTTP request library you like, but our examples use &lt;a href="https://github.com/axios/axios" rel="noopener noreferrer"&gt;axios&lt;/a&gt; for Node.js. Also, we’ll combine data from both calls into a single convenient data structure that includes both the user information and their comments. This will make it easier to store and retrieve each example later. This functionality can be seen in the &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Producer/blob/master/profile-scraper.js" rel="noopener noreferrer"&gt;profile-scraper.js&lt;/a&gt; file and you can learn more about how to run it in the &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Producer" rel="noopener noreferrer"&gt;README&lt;/a&gt;.&lt;/p&gt;
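&lt;p&gt;In outline, the collection loop looks something like the sketch below. The helper names are ours for illustration; the real implementation is in profile-scraper.js:&lt;/p&gt;

```javascript
// Illustrative sketch of the collection loop; see profile-scraper.js
// in the repo for the real implementation. Helper names are ours.
const commentsUrl = (subreddit) =>
  `https://www.reddit.com/r/${subreddit}/comments.json`
const aboutUrl = (username) =>
  `https://www.reddit.com/user/${username}/about.json`
const userCommentsUrl = (username) =>
  `https://www.reddit.com/user/${username}/comments.json`

// Fetch each comment in the subreddit, then the commenter's profile,
// and merge both into one record (the combined structure we store).
async function collectProfiles (subreddit) {
  const axios = require('axios') // npm install axios
  const listing = await axios.get(commentsUrl(subreddit))
  const profiles = []
  for (const child of listing.data.data.children) {
    const username = child.data.author
    const about = await axios.get(aboutUrl(username))
    const comments = await axios.get(userCommentsUrl(username))
    profiles.push({
      comment: child.data,
      user: about.data.data,
      recent_comments: comments.data.data.children
    })
  }
  return profiles
}
```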

&lt;h2&gt;
  
  
  Real-time event streaming in Kafka
&lt;/h2&gt;

&lt;p&gt;Now that the data has been collected from Reddit, we are ready to stream the comments into Kafka. Before connecting to the Kafka server you will need to &lt;a href="https://devcenter.heroku.com/articles/kafka-on-heroku#understanding-topics" rel="noopener noreferrer"&gt;create a topic&lt;/a&gt; in the Heroku dashboard. Click Add Topic and set the topic name with a single partition.&lt;/p&gt;

&lt;p&gt;To connect to the Kafka server as a &lt;a href="https://kafka.apache.org/documentation/#producerapi" rel="noopener noreferrer"&gt;Producer&lt;/a&gt; in Node.js you can use the &lt;a href="https://www.npmjs.com/package/no-kafka" rel="noopener noreferrer"&gt;no-kafka&lt;/a&gt; library with the connection information already set in the cluster created by Heroku:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Kafka&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;no-kafka&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_URL&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cert&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_CLIENT_CERT&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_CLIENT_CERT_KEY&lt;/span&gt;

&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.crt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cert&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Producer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;reddit-comment-producer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;ssl/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;certFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.crt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;keyFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you are connected to Kafka, you can send messages to the topic you created earlier. For convenience, we decided to &lt;a href="https://www.w3schools.com/js/js_json_stringify.asp" rel="noopener noreferrer"&gt;stringify the JSON messages&lt;/a&gt; before sending them to Kafka in our live streaming app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;northcanadian-72923.reddit-comments&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our repo, the sample live streaming worker code is in the &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Producer/blob/master/kafka-stream.js" rel="noopener noreferrer"&gt;kafka-stream.js&lt;/a&gt; file.&lt;/p&gt;
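&lt;p&gt;Conceptually, the worker polls Reddit and forwards each comment it hasn’t seen yet through the producer. A simplified sketch (the de-duplication and helper names here are ours; see kafka-stream.js for the real logic):&lt;/p&gt;

```javascript
// Simplified polling loop for the streaming worker. fetchComments and
// send are passed in so the loop stays testable; kafka-stream.js in
// the repo contains the production version.
const seenIds = new Set()

async function pollOnce (fetchComments, send) {
  const comments = await fetchComments('politics')
  for (const comment of comments) {
    if (seenIds.has(comment.id)) continue
    seenIds.add(comment.id)
    // Stringify before sending, matching the producer.send() call above.
    await send(JSON.stringify(comment))
  }
}
```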

&lt;h2&gt;
  
  
  Building a moderator dashboard
&lt;/h2&gt;

&lt;p&gt;Our sample dashboard is a JavaScript application based on a previous version of the &lt;a href="https://github.com/heroku/kafka-demo" rel="noopener noreferrer"&gt;twitter-display Kafka demo app&lt;/a&gt; by Heroku. We simplified the app by removing some dependencies and modules, but the general architecture remains: an Express app (server-side) to consume and process the Kafka topic, connected via a web socket with a &lt;a href="https://d3js.org/" rel="noopener noreferrer"&gt;D3&lt;/a&gt; front end (client-side) to display the messages (Reddit comments) and their classification in real time. You can find our open source code at &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Consumers" rel="noopener noreferrer"&gt;https://github.com/devspotlight/Reddit-Kafka-Consumers&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In the server-side Node app, we connect to Kafka as a simple &lt;a href="https://kafka.apache.org/documentation/#consumerapi" rel="noopener noreferrer"&gt;Consumer&lt;/a&gt;, subscribe to the topic, and broadcast each group of messages to our function which loads the prediction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;predictBotOrTrolls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;INTERVAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;constants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_TOPIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KAFKA_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;cert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.crt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./client.key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then use &lt;strong&gt;unirest&lt;/strong&gt; (an HTTP/REST request library) to send the unified data scheme from those messages to our machine learning API for real-time predictions on whether the author is a person, a bot, or a troll (more on that in the next section of this article). &lt;/p&gt;
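&lt;p&gt;As a rough sketch of that step (the message shape, env var name, and helper names are assumptions for illustration; the repo has the real implementation), each consumed message is parsed back from its stringified form and posted to the API:&lt;/p&gt;

```javascript
// Hedged sketch of forwarding consumed Kafka messages to the ML API.
// The message shape and the ML_API_URL env var name are assumptions.
function parseMessages (msgs) {
  // Each message value holds a stringified JSON comment record.
  return msgs.map((m) => JSON.parse(m.message.value.toString()))
}

// Placeholder for pushing results to the dashboard over the WebSocket.
const broadcastToDashboard = (record) => { /* see app.js in the repo */ }

function predictBotOrTrolls (msgs) {
  const unirest = require('unirest') // npm install unirest
  for (const record of parseMessages(msgs)) {
    unirest
      .post(process.env.ML_API_URL)
      .type('json')
      .send(record)
      .end((response) => {
        broadcastToDashboard({ ...record, prediction: response.body.prediction })
      })
  }
}
```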

&lt;p&gt;Finally, a WebSocket server is used &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Consumers/blob/master/viz/app.js#L43" rel="noopener noreferrer"&gt;in our app.js&lt;/a&gt; so that the front end can get all the display data in real time. Since the subreddit comments stream in real time, the scaling and load balancing of each application should be considered and monitored.&lt;/p&gt;

&lt;p&gt;We use the popular D3 JavaScript library to update the dashboard dynamically as Kafka messages stream in. Visually, a table is bound to the data stream, and it is updated with the newest comments as they arrive (newest first), along with a count of each user type detected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;d3&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;d3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DataTable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tbody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_maxSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxSize&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_maxSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_maxSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Bind data rows to target table&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tbody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;selectAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_rowData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;a href="https://github.com/devspotlight/analytics-with-kafka-redshift-metabase/blob/master/viz/src/lib/data-table.js" rel="noopener noreferrer"&gt;data-table.js&lt;/a&gt; for more details. The code shown above is just an excerpt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling out to our ML API
&lt;/h2&gt;

&lt;p&gt;Our machine learning API examines features of the comment poster’s account and recent comment history. We trained our model on features such as the user’s Reddit “karma”, the number of comments posted, whether the account is verified, and other signals we hypothesized would be useful in categorizing users. We pass this collection of features to the model as a JSON object, and the model returns a prediction for that user that we can display in our dashboard. Below are sample JSON objects (using our unified data schema) sent as requests to the HTTP API.&lt;/p&gt;

&lt;p&gt;Example for a &lt;strong&gt;bot&lt;/strong&gt; user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"banned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"no_follow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"link_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"t3_aqtwe1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"gilded"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"AutoModerator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_verified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_comment_karma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;445850.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"author_link_karma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1778.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"num_comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"created_utc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1550213389.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"over_18"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Hey, thanks for posting at &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/r&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;/SwitchHaxing! Unfortunately your comment has been removed due to rule 6; please post questions in the stickied Q&amp;amp;amp;A thread.If you believe this is an error, please contact us via modmail and well sort it out.*I am a bot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"downs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_submitter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"num_reports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"controversiality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"quarantine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"ups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_bot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"is_troll"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"recent_comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[...array of 20 recent comments...]"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response returned is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prediction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Is a bot user"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
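&lt;p&gt;For illustration, here is a Node.js sketch of how a consumer might call such an HTTP API. The endpoint URL and the &lt;code&gt;buildFeatures&lt;/code&gt; and &lt;code&gt;predictUser&lt;/code&gt; names are hypothetical, invented for this example; see the linked repository for the actual code.&lt;/p&gt;

```javascript
// Sketch of a consumer calling the prediction API. PREDICT_URL,
// buildFeatures, and predictUser are illustrative names for this article,
// not the exact code in the linked repository.
const PREDICT_URL = 'https://example.com/predict' // hypothetical endpoint

// Assemble the feature object the model expects (see the sample above).
function buildFeatures (comment) {
  return {
    no_follow: comment.no_follow,
    author: comment.author,
    author_verified: comment.author_verified,
    author_comment_karma: comment.author_comment_karma,
    author_link_karma: comment.author_link_karma,
    num_comments: comment.num_comments,
    created_utc: comment.created_utc,
    score: comment.score,
    over_18: comment.over_18,
    body: comment.body,
    controversiality: comment.controversiality,
    recent_comments: JSON.stringify(comment.recent_comments || [])
  }
}

// POST the features and return the model's prediction string,
// e.g. "Is a bot user" as in the sample response.
async function predictUser (comment, fetchFn = fetch) {
  const res = await fetchFn(PREDICT_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildFeatures(comment))
  })
  const body = await res.json()
  return body.prediction
}
```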



&lt;h2&gt;
  
  
  Run it easily using a Heroku Button
&lt;/h2&gt;

&lt;p&gt;As you can see, our architecture has many parts—including producers, Kafka, and a visualization app—which might make you think that it’s difficult to run or manage. However, we have a &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Consumers" rel="noopener noreferrer"&gt;Heroku button&lt;/a&gt; that allows us to run the whole stack in a single click. Pretty neat, huh? This opens the door to using more sophisticated architectures without the extra fuss.&lt;/p&gt;

&lt;p&gt;If you’re technically inclined, give it a shot. You can have a Kafka cluster running pretty quickly, and you only pay for the time it's running. Documentation for both local development and production deployment is in our code’s &lt;a href="https://github.com/devspotlight/Reddit-Kafka-Consumers" rel="noopener noreferrer"&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;We’d like to encourage the community to use these techniques to control the spread of trolls and harmful bots. It’s an exciting time to be alive, watching trolls attempt to influence social media while these communities develop better machine learning and moderation tools to stop them. Hopefully we’ll be able to keep our community forums as places for meaningful discussion.&lt;/p&gt;

&lt;p&gt;Check out our part-two article, “&lt;a href="https://dev.to/heroku/identifying-trolls-and-bots-on-reddit-with-machine-learning-part-2-2p0p"&gt;Detecting bots and trolls on Reddit using machine learning&lt;/a&gt;”, which dives deeper into how we built the machine learning model and how accurate it is.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>node</category>
      <category>webdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Twelve-Factor Apps: A Retrospective and Look Forward</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Wed, 17 Jul 2019 15:09:57 +0000</pubDate>
      <link>https://dev.to/heroku/twelve-factor-apps-a-retrospective-and-look-forward-4j4f</link>
      <guid>https://dev.to/heroku/twelve-factor-apps-a-retrospective-and-look-forward-4j4f</guid>
<description>&lt;p&gt;If your team is creating apps for the cloud, chances are the &lt;a href="https://12factor.net/"&gt;Twelve-Factor App methodology&lt;/a&gt; has influenced the frameworks and platforms you’re using. Popular frameworks such as Spring Boot and Magento credit the twelve factors as part of their design. Leading companies such as Heroku, Amazon, and Microsoft use and recommend the methodology. While new frameworks and methodologies are released every month, few have had the far-reaching impact of this one.&lt;/p&gt;

&lt;p&gt;Let's take a look at what these factors are all about, the story behind the creation of this methodology seven years ago, and why they are just as important today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creation of Twelve-Factor Apps
&lt;/h2&gt;

&lt;p&gt;In late 2011, &lt;a href="https://www.heroku.com/"&gt;Heroku&lt;/a&gt; co-founder Adam Wiggins knew there was a problem with the state of application development and deployment. Adam and his team had been personally involved in hundreds of apps and witnessed hundreds of thousands of app deployments into the cloud through the Heroku platform. Some of these apps were successful, taking advantage of features such as distributed architectures. But some of these apps had serious problems: they were not scalable, portable, or easy to maintain. The team recognized a common set of issues among these problem apps, and wanted to do something about it.&lt;/p&gt;

&lt;p&gt;Adam and his team came up with a set of guidelines for building successful cloud apps: guidelines that would minimize cost and time, maximize portability, enable continuous deployment, and allow scaling without changes to processes or architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Twelve-Factor Apps: A Methodology for SaaS App Development
&lt;/h2&gt;

&lt;p&gt;Let’s start with an overview of the twelve factors. You can see the full list and extra detail for each factor at &lt;a href="https://12factor.net/"&gt;12factor.net&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/codebase"&gt;Codebase&lt;/a&gt;&lt;br&gt;
Use source control. One codebase per application. Deploy to multiple environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/dependencies"&gt;Dependencies&lt;/a&gt;&lt;br&gt;
Declare and isolate dependencies. Never rely on the existence of system packages. Never commit dependencies in the codebase repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/config"&gt;Config&lt;/a&gt;&lt;br&gt;
Keep configuration separate from codebase.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/backing-services"&gt;Backing Services&lt;/a&gt;&lt;br&gt;
Treat services the app consumes (database, caching, and so on) as attachable resources. You should be able to swap your database instance without code changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/build-release-run"&gt;Build, Release, Run&lt;/a&gt;&lt;br&gt;
Deploy apps in three discrete steps: build (convert codebase into executable), release (combine build artifacts with config to create a release image), and run (use the same release image every time you launch).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/processes"&gt;Processes&lt;/a&gt;&lt;br&gt;
Processes should be stateless.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/port-binding"&gt;Port Binding&lt;/a&gt;&lt;br&gt;
Export services via port binding. Apps should be completely self-contained.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/concurrency"&gt;Concurrency&lt;/a&gt;&lt;br&gt;
Scale out by decomposing applications into individual processes that do specific jobs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/disposability"&gt;Disposability&lt;/a&gt;&lt;br&gt;
Apps should be quick to start, resilient to failure, and graceful to shut down. Expect servers to fail, be added, and change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/dev-prod-parity"&gt;Dev/prod parity&lt;/a&gt;&lt;br&gt;
Keep development, staging, and production environments as similar as possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/logs"&gt;Logs&lt;/a&gt;&lt;br&gt;
Treat logs as event streams. Write to stdout and stderr.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://12factor.net/admin-processes"&gt;Admin Processes&lt;/a&gt;&lt;br&gt;
Run admin tasks (database migrations, background jobs, cache clearing, and so on) as one-off, isolated processes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How the Twelve-Factor App Changed Application Development and DevOps
&lt;/h2&gt;

&lt;p&gt;Over the past seven years, these twelve factors have guided tens of thousands of apps to success. In fact, the methodology has worked so well in creating apps that are maintainable, scalable, and portable that many of the twelve factors have been adopted as “common sense” for cloud development and DevOps. We all now know about explicit dependency declarations, scaling out versus up, and the benefits of a code repository. However, some of these factors were radical suggestions when first introduced.&lt;/p&gt;

&lt;p&gt;These battle-tested, industry-accepted factors have even been codified by some of the most successful cloud frameworks and tools. Most modern frameworks such as &lt;a href="https://spring.io/blog/2016/09/12/springone-platform-2016-replay-12-factor-or-cloud-native-apps-what-exactly-does-that-mean-for-spring-developers"&gt;Spring &amp;amp; Spring Boot&lt;/a&gt;, &lt;a href="https://symfony.com/blog/new-in-symfony-3-2-runtime-environment-variables"&gt;Symfony&lt;/a&gt;, and &lt;a href="https://devdocs.magento.com/guides/v2.3/config-guide/deployment/pipeline/"&gt;Magento&lt;/a&gt; (Adobe) embody the twelve factors as part of their design principles. Tools such as Docker images and Heroku slugs (build/release/run), Vagrant (dev/prod parity), Puppet and Vault (configuration), and Papertrail (logging) have been created to enforce, automate, and simplify management for apps using the factors.&lt;/p&gt;

&lt;p&gt;The Heroku platform also embodies the twelve factors. For example, Heroku requires apps to be decomposed into one or more lightweight, discrete containers (dynos)—a direct manifestation of the stateless factor. Heroku also enforces languages and frameworks to use an explicit list of app dependencies (such as Ruby’s bundler), allows admin processes to be run in isolation using one-off dynos, and aggregates the output streams of all running dynos in an application so that logs can be processed as a stream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Just As Important Today As Seven Years Ago
&lt;/h2&gt;

&lt;p&gt;The Twelve-Factor App methodology continues to be just as important today as when it was first released. Millions of people visited the Twelve-Factor App website in the last year. Companies such as &lt;a href="https://aws.amazon.com/blogs/compute/applying-the-twelve-factor-app-methodology-to-serverless-applications/"&gt;Amazon&lt;/a&gt;, &lt;a href="//docs.microsoft.com/en-us/dotnet/standard/modernize-with-azure-and-containers/modernize-existing-apps-to-cloud-optimized/what-about-cloud-native-applications"&gt;Microsoft&lt;/a&gt;, &lt;a href="https://sdtimes.com/webdev/twelve-factor-app-methodology-sets-guidelines-modern-apps/"&gt;IBM, and Pivotal&lt;/a&gt; continue to use and recommend the methodology.&lt;/p&gt;

&lt;p&gt;Newer architectures, such as microservices, serverless computing, and containers, still benefit from (and often enforce) the methodology, as do most cloud deployments. AWS Lambda functions, for example, enforce factors such as stateless processes, scaling out, disposability, isolated admin processes, and backing services.&lt;/p&gt;

&lt;p&gt;The Twelve-Factor App methodology has guided an enormous number of apps, frameworks, and platforms to success over the years. Taking these factors into consideration early in your design process will help you and your team architect scalable, portable, maintainable apps.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Beginner's Guide to Using CDNs</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Tue, 25 Jun 2019 15:17:28 +0000</pubDate>
      <link>https://dev.to/mostlyjason/beginner-s-guide-to-using-cdns-4bde</link>
      <guid>https://dev.to/mostlyjason/beginner-s-guide-to-using-cdns-4bde</guid>
<description>&lt;p&gt;Websites have become larger and more complex over the past few years, and users expect them to load instantaneously, even on mobile devices. The smallest performance regressions can have big effects: just a &lt;a href="https://www.akamai.com/uk/en/about/news/press/2017-press/akamai-releases-spring-2017-state-of-online-retail-performance-report.jsp" rel="noopener noreferrer"&gt;100ms increase in page load time can drop conversions by 7%&lt;/a&gt;. With competitors just a click away, organizations that want to attract and retain customers need to make web performance a priority. One relatively simple way to do this is with content delivery networks (CDNs).&lt;/p&gt;

&lt;p&gt;In this article, we'll explain how CDNs help improve web performance. We'll explain what they are, how they work, and how to implement them in your websites.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is a CDN?
&lt;/h1&gt;

&lt;p&gt;A CDN is a distributed network and storage service that hosts web content in different geographical regions around the world. This content can include HTML pages, scripts, style sheets, multimedia files, and more. This lets you serve content from the CDN instead of your own servers, reducing the amount of traffic handled by your servers.&lt;/p&gt;

&lt;p&gt;CDNs can also act as a proxy between you and your users, offering services such as load balancing, firewalls, automatic HTTPS, and even redundancy in case your origin servers go offline (e.g. &lt;a href="https://www.cloudflare.com/always-online/" rel="noopener noreferrer"&gt;Cloudflare Always Online&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Should I Use a CDN?
&lt;/h2&gt;

&lt;p&gt;CDNs offload traffic from your servers, reducing your overall load. They are also optimized for speed and in many cases offer faster performance, which can &lt;a href="https://webmasters.googleblog.com/2010/04/using-site-speed-in-web-search-ranking.html" rel="noopener noreferrer"&gt;improve your SEO rankings&lt;/a&gt;. Because CDNs host content in data centers around the world, they move your content physically closer to your users. This can greatly reduce latency and helps avoid downtime caused by data center outages or broken routes.&lt;/p&gt;

&lt;h1&gt;
  
  
  How Do CDNs Work?
&lt;/h1&gt;

&lt;p&gt;A CDN consists of multiple data centers around the world called Points of Presence (PoPs). Each PoP is capable of hosting and serving content to users. CDNs route users to specific PoPs based on a number of factors including distance, PoP availability, and connection speed.&lt;/p&gt;

&lt;p&gt;A PoP acts as a proxy between your users and your origin server. When a user requests a resource from your website such as an image or script, they are directed to the PoP. The PoP will then deliver the resource to the user if it has it cached.&lt;/p&gt;

&lt;p&gt;But how does your content get to a PoP? Using one of two methods: pushing or pulling. Pushing requires you to upload your content to the CDN beforehand. This gives you greater control over what the CDN serves, but if a user requests content that you haven't yet pushed, they may experience errors.&lt;/p&gt;

&lt;p&gt;Pulling is a much more automatic method, where the CDN automatically retrieves content that it hasn't already cached. When a user requests content that isn't already cached, the CDN pulls the most recent version of the content from your origin server. After a certain amount of time, the cached content expires and the CDN refreshes it from the origin the next time it's requested.&lt;/p&gt;
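&lt;p&gt;The pull method is essentially a read-through cache with a time-to-live (TTL). The following is a simplified sketch of the hit/miss/expire cycle a PoP applies; it is illustrative only, not any CDN's actual implementation:&lt;/p&gt;

```javascript
// Simplified pull-through cache with TTL expiry, mimicking how a PoP
// decides between serving from cache and refreshing from the origin.
class PullCache {
  constructor (fetchFromOrigin, ttlMs, now = Date.now) {
    this.fetchFromOrigin = fetchFromOrigin // function mapping url to content
    this.ttlMs = ttlMs
    this.now = now
    this.store = new Map()
  }

  get (url) {
    const entry = this.store.get(url)
    const expired = !entry || this.now() - entry.cachedAt >= this.ttlMs
    if (!expired) {
      return entry.content // cache hit: the origin is never contacted
    }
    // cache miss or expired entry: pull the latest version from the origin
    const content = this.fetchFromOrigin(url)
    this.store.set(url, { content, cachedAt: this.now() })
    return content
  }
}
```

&lt;p&gt;Real CDNs add much more (revalidation with ETags, purging, stale-while-revalidate), but this cycle is the core of the pull method.&lt;/p&gt;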

&lt;h1&gt;
  
  
  How Do I Choose a CDN?
&lt;/h1&gt;

&lt;p&gt;While CDNs work the same way fundamentally, they differ in a number of factors including:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most CDNs charge based on the amount of bandwidth used. Some may also charge based on the number of cache hits (files served from cache), cache misses (retrievals from the origin), and refreshes. Others charge a fixed fee and allow a certain amount of bandwidth over a period of time. When comparing CDNs, you should estimate your bandwidth needs and anticipated growth to find the best deal.&lt;/p&gt;
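&lt;p&gt;To compare pricing models, a back-of-the-envelope estimate from page weight, traffic, and expected cache hit ratio is often enough. The numbers and the per-GB rate below are illustrative assumptions, not any provider's actual prices:&lt;/p&gt;

```javascript
// Back-of-the-envelope monthly CDN cost. Inputs and the per-GB rate are
// illustrative assumptions, not a real provider's pricing.
function estimateMonthlyCost ({ pageWeightMB, pageViews, cacheHitRatio, pricePerGB }) {
  const totalGB = (pageWeightMB * pageViews) / 1024
  // In this simple model only cache hits are served (and billed) by the
  // CDN; misses are billed as bandwidth on your origin instead.
  const cdnGB = totalGB * cacheHitRatio
  return { cdnGB: cdnGB, cost: cdnGB * pricePerGB }
}

// 2 MB pages, 1,000,000 views/month, 90% hit ratio, $0.08/GB:
// about 1,758 GB served by the CDN, about $140.63/month
```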

&lt;p&gt;&lt;strong&gt;Availability and Reliability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CDNs strive for 100% uptime, but perfect uptime is never guaranteed. Consider your availability needs and how each CDN supports them. Also, compare CDNs based on their PoP uptime rather than their overall uptime, especially in the regions you expect to serve. If possible, verify that your CDN offers fallback options such as routing around downed PoPs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PoP Locations (Regions Served)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Depending on where your users are located, certain PoPs can serve your users more effectively. Choose a CDN that manages PoPs close to your users, or else you'll miss out on many of the performance benefits that CDNs offer.&lt;/p&gt;

&lt;h1&gt;
  
  
  How Do I Add a CDN to My Website?
&lt;/h1&gt;

&lt;p&gt;The process of adding a CDN to your website depends on where and how your website is hosted. We'll cover some of the more common methods below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Web Hosting Provider
&lt;/h2&gt;

&lt;p&gt;If your website is hosted by a provider such as inMotion Hosting, HostGator, or 1&amp;amp;1, your provider may offer a CDN as a built-in or extra service. For example, &lt;a href="https://my.bluehost.com/hosting/help/cloudflare" rel="noopener noreferrer"&gt;Bluehost&lt;/a&gt; provides Cloudflare for free, enabled by default on all plans. You can still use a CDN if your host doesn't explicitly support one; in that case, it will likely fall under one of the following approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Content Management System (CMS)
&lt;/h2&gt;

&lt;p&gt;Content management systems (CMSes) like WordPress and Squarespace often support CDNs through the use of plugins. For WordPress, &lt;a href="https://jetpack.com/" rel="noopener noreferrer"&gt;Jetpack&lt;/a&gt; provides support for its own CDN automatically. Others such as &lt;a href="https://blog.cloudflare.com/w3-total-cache-w3tc-total-cloudflare-integrat/" rel="noopener noreferrer"&gt;W3TC&lt;/a&gt;, &lt;a href="https://wordpress.org/plugins/wp-super-cache/" rel="noopener noreferrer"&gt;WP Super Cache&lt;/a&gt;, and &lt;a href="https://wordpress.org/plugins/wp-fastest-cache/" rel="noopener noreferrer"&gt;WP Fastest Cache&lt;/a&gt; let you choose which CDN to direct users to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-Hosted
&lt;/h2&gt;

&lt;p&gt;Websites that you host yourself offer the greatest flexibility in choosing a CDN. However, they also require more setup. As an example, let's enable Google Cloud CDN for a website hosted on the Google Cloud Platform (GCP).&lt;/p&gt;

&lt;p&gt;This example assumes you have a GCP account, a domain registered with a registrar, and a website hosted in Compute Engine, App Engine, or another GCP service. If you don't already have a GCP account, &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;create one&lt;/a&gt; and log into the &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Configure your DNS records&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, the way to route your users to a CDN was to change the resource URLs in your website to point to URLs provided by the CDN. Most modern CDNs avoid this by managing your DNS records for you, letting you redirect users without requiring changes to your website.&lt;/p&gt;

&lt;p&gt;To configure Cloud DNS, view the &lt;a href="https://cloud.google.com/dns/docs/quickstart" rel="noopener noreferrer"&gt;Cloud DNS quickstart document&lt;/a&gt; and follow the instructions for &lt;a href="https://cloud.google.com/dns/docs/quickstart#create_a_managed_public_zone" rel="noopener noreferrer"&gt;creating a managed public zone&lt;/a&gt;. Don't create a new record or a CNAME record yet, since we don't yet have an IP address to point the DNS record to. In the screenshot below, we created a new zone called &lt;em&gt;mydomain-example&lt;/em&gt; for the domain &lt;em&gt;subdomain.mydomain.com&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F556y3ltatq8ll4fbbibu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F556y3ltatq8ll4fbbibu.png" alt="Example DNS zone for website"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Creating a DNS zone in Cloud DNS. © 2019 Google, LLC. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After creating the zone, update your registrar's domain settings to point to the Cloud DNS name servers. This will let you manage your domain records through Cloud DNS instead of through your registrar. For more information, visit the Cloud DNS documentation page on &lt;a href="https://cloud.google.com/dns/docs/update-name-servers" rel="noopener noreferrer"&gt;updating your domain's name servers&lt;/a&gt; or refer to your registrar's documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Enable Cloud CDN&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With DNS configured, we now need to enable the CDN itself. With Cloud CDN, a load balancer must be selected as the origin. If you don't already have a load balancer, you can follow these &lt;a href="https://cloud.google.com/load-balancing/docs/how-to" rel="noopener noreferrer"&gt;how-to guides&lt;/a&gt; to create one. For a standard HTTP/S website, &lt;a href="https://cloud.google.com/load-balancing/docs/https/setting-up-https" rel="noopener noreferrer"&gt;follow this guide&lt;/a&gt; for specific instructions. &lt;/p&gt;

&lt;p&gt;With your load balancer created, follow &lt;a href="https://cloud.google.com/cdn/docs/using-cdn#enable_existing" rel="noopener noreferrer"&gt;these instructions&lt;/a&gt; to enable Cloud CDN for an existing backend service. Once your new origin is created, select it from the origin list. You will need the IP address displayed in the &lt;strong&gt;Frontend&lt;/strong&gt; table to configure Cloud DNS, so make sure you copy it or keep this window open. The following screenshot shows an example Cloud CDN origin:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbk39yra30xk2off8mdhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbk39yra30xk2off8mdhp.png" title="Web Load Balancer" alt="Web Load Balancer"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Viewing origin details in Cloud CDN. © 2019 Google, LLC. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After retrieving your frontend IP address, return to Cloud DNS and select your zone. Create a new A record to point the domain to your origin's IP address. You can find instructions on the Cloud DNS quickstart documentation page under &lt;a href="https://cloud.google.com/dns/docs/quickstart#create_a_new_record" rel="noopener noreferrer"&gt;creating a new record&lt;/a&gt;. This is shown in the screenshot below. Optionally, you can also create a CNAME record to redirect users from a subdomain, such as &lt;em&gt;&lt;a href="http://www.yourdomain.com" rel="noopener noreferrer"&gt;www.yourdomain.com&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fecdmt9k8j8cdo2k9um5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fecdmt9k8j8cdo2k9um5z.png" title="Domain create record set" alt="Domain create record set "&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Creating a new DNS record set in Cloud DNS. © 2019 Google, LLC. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Configure your web server&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure your content is properly cached, make sure your web server responds to requests with the correct HTTP headers. Cloud CDN only caches responses that &lt;a href="https://cloud.google.com/cdn/docs/caching" rel="noopener noreferrer"&gt;meet certain requirements&lt;/a&gt;, some of which are specific to Cloud CDN. Check your web server's documentation to learn how to set these headers; &lt;a href="https://httpd.apache.org/docs/2.4/caching.html" rel="noopener noreferrer"&gt;Apache&lt;/a&gt; and &lt;a href="https://www.nginx.com/blog/nginx-caching-guide/" rel="noopener noreferrer"&gt;Nginx&lt;/a&gt; both provide caching guides with best practices.&lt;/p&gt;
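&lt;p&gt;If your origin is a Node.js app rather than Apache or Nginx, the same idea applies: set explicit caching headers on cacheable responses. Here is a minimal sketch; the one-hour TTL and the &lt;code&gt;cacheHeaders&lt;/code&gt; helper name are arbitrary choices for this example:&lt;/p&gt;

```javascript
// Build the headers a CDN needs to cache a response: an explicit
// Cache-Control with max-age (the TTL) and "public" so shared caches
// such as CDN PoPs may store it.
function cacheHeaders (maxAgeSeconds) {
  return {
    'Cache-Control': 'public, max-age=' + maxAgeSeconds,
    // Vary tells caches which request headers change the response.
    Vary: 'Accept-Encoding'
  }
}

// Example with Node's built-in http server:
// const http = require('http')
// http.createServer((req, res) => {
//   res.writeHead(200, cacheHeaders(3600)) // cache for one hour
//   res.end('hello from the origin')
// }).listen(8080)
```

&lt;p&gt;Cloud CDN, for example, looks for an explicit public Cache-Control with a max-age before it will cache a response; see the requirements linked above.&lt;/p&gt;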

&lt;p&gt;&lt;strong&gt;Step 4: Upload content to the CDN&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For most websites, you don’t need to upload anything: the CDN automatically caches resources from your origin server as people access your site. This is the “pull method” described earlier. Alternatively, Google lets you push specific content you want to host by uploading it manually.&lt;/p&gt;

&lt;h1&gt;
  
  
  How Does a CDN Impact Performance?
&lt;/h1&gt;

&lt;p&gt;To demonstrate the performance benefits of CDNs, we ran a performance test on a website hosted on the Google Cloud Platform. The site is a static, single-page website created with &lt;a href="https://startbootstrap.com/" rel="noopener noreferrer"&gt;Bootstrap&lt;/a&gt; and the &lt;a href="https://startbootstrap.com/template-overviews/full-width-pics/" rel="noopener noreferrer"&gt;Full Width Pics&lt;/a&gt; template, and consists of seven high-resolution images, courtesy of &lt;a href="https://www.jpl.nasa.gov/spaceimages/" rel="noopener noreferrer"&gt;NASA/JPL-Caltech&lt;/a&gt;. The server is a Google Compute Engine instance located in the us-east1-b region running Nginx 1.10.3.&lt;/p&gt;

&lt;p&gt;We configured the instance to allow direct incoming HTTP traffic. We also set up Google Cloud CDN for the instance. You can see a screenshot of the web page and networking timing of the site below using a waterfall chart.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3jlqhwsulfjubqx9bynz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F3jlqhwsulfjubqx9bynz.png" title="AIRS Captures Polar Vortex" alt="AIRS Captures Polar Vortex"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A waterfall chart of the test site using &lt;a href="https://developers.google.com/web/tools/chrome-devtools/" rel="noopener noreferrer"&gt;Chrome DevTools&lt;/a&gt;. © 2019 Google, LLC. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We then ran a performance test using SolarWinds Pingdom. Pingdom provides a &lt;a href="https://www.pingdom.com/product/page-speed/" rel="noopener noreferrer"&gt;page speed test&lt;/a&gt; that measures the time needed to fetch and render each element of a web page. We created two separate checks to test the origin server and CDN separately, then compared the results to see which method was faster. To maximize latency, we ran both checks from Pingdom's Eastern Asia location.&lt;/p&gt;

&lt;h2&gt;
  
  
  Origin Results
&lt;/h2&gt;

&lt;p&gt;Running a speed test on the origin server resulted in a page load time of 3.68 seconds. The time to download the first byte from the server (shown as a blue line) was 318 milliseconds, meaning users had to wait one-third of a second before their device even began receiving content. Rendering the page (indicated by the orange line) took an additional 679 ms, meaning users had to wait almost a full second to see anything on their screen. By the time the page finished rendering (green line), users had been waiting more than 3.5 seconds.&lt;/p&gt;

&lt;p&gt;Most of this delay was due to downloading the high-resolution images, but a significant amount of time was spent connecting to the server and waiting for content to begin transferring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft3811jku3aa49granasp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft3811jku3aa49granasp.png" title="Page load timeline when connecting to our test origin server1" alt="Page load timeline when connecting to our test origin server"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Page load timeline when connecting to our test origin server. © 2019 SolarWinds, Inc. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CDN Results
&lt;/h2&gt;

&lt;p&gt;With a CDN, we immediately saw a substantial improvement in load time. The entire page loaded in just 1.04 seconds, more than 2 seconds faster than the origin server. The most significant change is in the time to first byte (blue line), which dropped to just 7 ms. This means our users began receiving content almost immediately after connecting to the CDN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbvctvqrl8o7tc3t8pht3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbvctvqrl8o7tc3t8pht3.png" title="Page load timeline when connecting to Google Cloud CDN" alt="Page load timeline when connecting to Google Cloud CDN"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Page load timeline when connecting to Google Cloud CDN. © 2019 SolarWinds, Inc. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;While there wasn't a significant improvement in the DOM content load time (orange line), the connection and wait times dropped significantly. We also saw content begin to appear on the page as early as 0.5 seconds into the page load time. We can confirm this by looking at the film strip, which shows screenshots of the page at various points in the loading process. This is compared to the 1.5 seconds it took for the origin server to begin rendering content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft54s46v7aplrbe2n5fj9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft54s46v7aplrbe2n5fj9.png" title="Filmstrip" alt="Filmstrip"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Comparing the page rendering time with a CDN (bottom) and without a CDN (top). © 2019 SolarWinds, Inc. All rights reserved.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;CDNs offer a significant performance boost without much effort on the part of organizations. The biggest challenge is finding out which CDN provider to choose. If you're not sure which provider will benefit you the most, &lt;a href="https://royal.pingdom.com/benchmarking-cdns-cloudfront-cloudflare-fastly-and-google-cloud/" rel="noopener noreferrer"&gt;we benchmarked four of the most popular providers&lt;/a&gt; (Cloudflare, Fastly, AWS CloudFront, and Google CDN). While performance plays a major role in each provider's viability, we also encourage you to factor in additional features, security, and integrations offered by the CDN.&lt;/p&gt;

&lt;p&gt;After setting up your CDN, you can check the performance difference using &lt;a href="http://pingdom.com/" rel="noopener noreferrer"&gt;SolarWinds Pingdom&lt;/a&gt;. In addition to running one-time tests, you can use Pingdom to schedule periodic checks to ensure your website is always performing at its best. In addition, you can use Pingdom to constantly monitor your website's availability and usability.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally posted on the &lt;a href="https://royal.pingdom.com/a-beginners-guide-to-using-cdns/" rel="noopener noreferrer"&gt;Pingdom blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Benchmarking Popular NodeJS Logging Libraries  </title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Thu, 20 Jun 2019 15:00:13 +0000</pubDate>
      <link>https://dev.to/mostlyjason/benchmarking-popular-nodejs-logging-libraries-4kk1</link>
      <guid>https://dev.to/mostlyjason/benchmarking-popular-nodejs-logging-libraries-4kk1</guid>
      <description>&lt;p&gt;Sometimes developers are hesitant to include logging due to performance concerns, but is this justified and how much does the choice of library affect performance? &lt;/p&gt;

&lt;p&gt;Let’s run some benchmarks to find out! We ran a series of performance tests on some of the most popular NodeJS libraries. These tests are designed to show how quickly each library processed logging and the impact on the overall application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contenders
&lt;/h2&gt;

&lt;p&gt;For this test, we investigated some of the most commonly used NodeJS logging libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://www.npmjs.com/package/log4js" rel="noopener noreferrer"&gt;Log4js&lt;/a&gt; 4.0.2&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.npmjs.com/package/winston" rel="noopener noreferrer"&gt;Winston&lt;/a&gt; 3.2.1&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.npmjs.com/package/bunyan" rel="noopener noreferrer"&gt;Bunyan&lt;/a&gt; 1.8.12&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also used the following additional libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://www.npmjs.com/package/winston-syslog" rel="noopener noreferrer"&gt;winston-syslog&lt;/a&gt; 2.0.1 for syslog logging with Winston&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.npmjs.com/package/bunyan-syslog" rel="noopener noreferrer"&gt;bunyan-syslog&lt;/a&gt; 0.3.2 for syslog logging with Bunyan&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We benchmarked these libraries to test their performance, sending their logs to a console and file system. We also tested sending log info to a local rsyslog server over both TCP and UDP since it is common and probably wise to offload logs in a production environment.&lt;/p&gt;

&lt;p&gt;These tests were run using NodeJS 8.15.1.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;Our goal was to compare the performance between the logging libraries. Each library was run on its default configuration and the same system was used across all libraries and tests.&lt;/p&gt;

&lt;p&gt;Our test application logged a total of 1,000,000 log events of the phrase “Hello, world!” and it’s available on GitHub at &lt;a href="https://github.com/codejamninja/node-log-benchmarks" rel="noopener noreferrer"&gt;https://github.com/codejamninja/node-log-benchmarks&lt;/a&gt;. The application did nothing but process logs, giving us an isolated control group.&lt;/p&gt;

&lt;p&gt;We measured the results using either a single logical processor or eight (four cores with hyperthreading) to simulate a larger production server. NodeJS is often considered single threaded, but technically it's just the event loop that is single threaded; many NodeJS tasks, such as garbage collection, take place on parallel threads. It's also worth noting that the tty (terminal) was doing a good deal of work printing the logs to the screen, which would almost certainly have executed on a separate thread. That’s why it's so important to test with the multiple CPUs typically found on production systems.&lt;/p&gt;

&lt;p&gt;Also, the NodeJS file system writes are nonblocking (asynchronous). The &lt;em&gt;unblocked&lt;/em&gt; time lets us know when the code used to schedule the filesystem writes is finished and the system can continue executing additional business logic. However, the file system will still be asynchronously writing in the background. So, the &lt;em&gt;done&lt;/em&gt; time lets us know how long it took to actually write the logs to the filesystem.&lt;/p&gt;

&lt;p&gt;The hardware we used is from Amazon AWS.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Name&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;Spec&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Processors
   &lt;/td&gt;
   &lt;td&gt;Intel Core i7-7700 @ 2.80GHz (4 cores, 8 threads)
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Memory
   &lt;/td&gt;
   &lt;td&gt;32GB Ram
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Operating System
   &lt;/td&gt;
   &lt;td&gt;64-bit Ubuntu 18.04.2 LTS Server
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;NodeJS
   &lt;/td&gt;
   &lt;td&gt;8.15.1 LTS
   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Test Results
&lt;/h2&gt;

&lt;p&gt;For all tests, the results are measured in milliseconds. The smaller bars are better because it means the logs took less time to process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Console
&lt;/h3&gt;

&lt;p&gt;For the first set of test results, we benchmarked the performance of the libraries when logging to the console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4sln7j2frop2fa14dauh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F4sln7j2frop2fa14dauh.png" title="Console Log Benchmarks" alt="Console Log Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From these results, we can see additional CPUs had a significant effect on the amount of time it took NodeJS to log to the console. Winston is the clear winner for speed in multithreaded systems; however, Bunyan performed slightly better in a single-threaded system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Filesystem
&lt;/h3&gt;

&lt;p&gt;For the second set of test results, we benchmarked the performance of the libraries when writing the logs to the filesystem. Again, notice that each test result contains two times, &lt;em&gt;unblocked&lt;/em&gt; and &lt;em&gt;done&lt;/em&gt;. This is because the libraries sometimes write the logs to the filesystem asynchronously. The total time to log is the sum of these two times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F97mdvefpj6qgtex8wjjm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F97mdvefpj6qgtex8wjjm.png" title="Filesystem Log Benchmarks" alt="Filesystem Log Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After seeing how much additional CPUs affected console logs, I was very surprised to see that logging to the filesystem performed roughly the same with additional CPUs. This is most likely because the work required to write files is much less than the work required to print to a tty device, so there was less multithreaded activity happening.&lt;/p&gt;

&lt;p&gt;Log4js seemed to have the worst results writing to a filesystem, sometimes taking over 5 times the amount of time to write to the filesystem. Winston unblocked the event loop the fastest, but Bunyan finished writing to the filesystem the fastest. So, if you're choosing a log library based on filesystem performance, the choice would depend on whether you want the event loop unblocked the fastest or if you want the overall program execution to finish first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Syslog UDP
&lt;/h3&gt;

&lt;p&gt;For the third set of test results, we benchmarked the performance of the libraries when sending the logs to syslog over UDP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0sjwg9evzzieboih1jy9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0sjwg9evzzieboih1jy9.png" title="Syslog UDP Log Benchmarks" alt="Syslog UDP Log Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwcndba1ifhrpmrk2qa4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwcndba1ifhrpmrk2qa4i.png" title="Syslog UDP Log Drop Rate" alt="Syslog UDP Log Drop Rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Log4js and Bunyan both finished around the same time when using multiple CPUs; however, Log4js unblocked the event loop much sooner and performed better on a single CPU.&lt;/p&gt;

&lt;p&gt;Log4js also successfully sent all of its logs to syslog without dropping a single one. Although Bunyan had a low drop rate, it still managed to drop a few logs. I would say Log4js is a clear winner when sending logs to syslog over UDP.&lt;/p&gt;

&lt;p&gt;I had a terrible experience getting Winston to work with syslog over UDP. When it did work it took well over a minute to unblock the event loop, and took over two minutes to finish sending the logs to syslog. However, most of the times I tested it, I ran out of memory before I could finish. I am assuming that when using UDP, the library aggregates all the logs in the heap before sending them to syslog, instead of immediately streaming the logs over to syslog. At any rate, it sends the logs over to syslog over UDP in a way that does not work well when slammed with a million logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Syslog TCP
&lt;/h2&gt;

&lt;p&gt;For the fourth set of test results, we benchmarked the performance of the libraries when sending the logs to syslog over TCP. Again, notice that each test result contains two times, &lt;em&gt;unblocked&lt;/em&gt; and &lt;em&gt;done&lt;/em&gt;. This is because the libraries sometimes asynchronously send the logs to syslog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fhhvp9ne0axsnd4p8jzk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fhhvp9ne0axsnd4p8jzk6.png" title="Syslog TCP Log Benchmarks" alt="Syslog TCP Log Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F1w7v7r8ecw7j9kays5yv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F1w7v7r8ecw7j9kays5yv.png" title="Syslog TCP Log Drop Rate" alt="Syslog TCP Log Drop Rate"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since Bunyan was the only library that successfully sent logs to syslog over TCP without dropping any of them, it is the clear winner. Despite its somewhat slow performance when multiple CPUs were introduced, it still was relatively fast.&lt;/p&gt;

&lt;p&gt;Sadly I was not able to get Log4js to send logs to syslog over TCP. I believe there is a bug in their library. I consistently received the following error.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;(node:31818) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'trace' of undefined&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Winston was relatively fast when sending logs to syslog over TCP; however, it had a horrific log drop rate. Most of the logs were either dropped or corrupted. Below is an example of one of the corrupted logs syslog received from Winston. You can see that the message was cut off.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Mar 17 19:21:42 localhost /home/codejamninja/.nvm/versions/node/v8.15.1/bin/node[22463]: {"mes&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The log was supposed to look like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Mar 17 19:21:42 localhost /home/codejamninja/.nvm/versions/node/v8.15.1/bin/node[22463]: {"message": "92342: Hello, world!"}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Bunyan performed relatively well when sending logs to syslog over TCP. It did not drop a single log and unblocked the event loop very quickly. One thing that did surprise me though is that additional CPUs consistently performed worse than running on a single CPU. I am baffled by that, though this is the only scenario in which that happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;These results really took me by surprise. I was thinking there would be an overall winner, but each library performed best in different areas under different conditions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;
&lt;strong&gt;Output type&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;Winner&lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Console
   &lt;/td&gt;
   &lt;td&gt;Winston
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;File
   &lt;/td&gt;
   &lt;td&gt;Winston and Bunyan tied
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Syslog UDP
   &lt;/td&gt;
   &lt;td&gt;Log4js
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Syslog TCP
   &lt;/td&gt;
   &lt;td&gt;Bunyan
   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Winston performed best when logging to the console. Winston and Bunyan both performed best in their own ways when logging to the file system. Log4js performed the best when sending logs to syslog over UDP. Bunyan had the best results when sending logs to syslog over TCP.&lt;/p&gt;

&lt;p&gt;If you care more about throughput for syslog, then Log4js with UDP is the best output type. If you only care about unblocking the event loop, then Winston writing to a filesystem is the best. In this case, logging averaged 0.0005 ms per log event, which is blazing fast. If your typical response latency is 100 ms, then it's only 0.0005% of your total response time. That’s faster than running console.log(). As long as you don’t go overboard with too many log statements, the impact is very small.&lt;/p&gt;
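&lt;p&gt;The arithmetic behind that percentage, as a quick check:&lt;/p&gt;

```javascript
// Overhead of one log event relative to a typical response.
const perEventMs = 0.0005; // average time per log event from the test above
const responseMs = 100;    // a typical response latency
const overheadPct = (perEventMs / responseMs) * 100;
console.log(overheadPct.toFixed(4) + '% of the response is spent per log event');
```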

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;tr&gt;
   &lt;td&gt;
   &lt;/td&gt;
   &lt;td&gt;Console
   &lt;/td&gt;
   &lt;td&gt;File
   &lt;/td&gt;
   &lt;td&gt;Syslog TCP
   &lt;/td&gt;
   &lt;td&gt;Syslog UDP
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Log4js
   &lt;/td&gt;
   &lt;td&gt;24385 ms
   &lt;/td&gt;
   &lt;td&gt;31584 ms
   &lt;/td&gt;
   &lt;td&gt;N/A
   &lt;/td&gt;
   &lt;td&gt;
&lt;strong&gt;1195 ms &lt;/strong&gt;
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Winston
   &lt;/td&gt;
   &lt;td&gt;10756 ms
   &lt;/td&gt;
   &lt;td&gt;7438 ms
   &lt;/td&gt;
   &lt;td&gt;9362 ms 
   &lt;/td&gt;
   &lt;td&gt;142871 ms
   &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Bunyan
   &lt;/td&gt;
   &lt;td&gt;15062 ms 
   &lt;/td&gt;
   &lt;td&gt;4197 ms 
   &lt;/td&gt;
   &lt;td&gt;24984 ms 
   &lt;/td&gt;
   &lt;td&gt;12029 ms
   &lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Overall, I would recommend using the Log4js library with UDP output for the best performance. This will have a negligible impact on your overall response time. Tools like &lt;a href="http://loggly.com/" rel="noopener noreferrer"&gt;Loggly&lt;/a&gt; will store and organize those logs for you and alert you when the system encounters critical issues so you can deliver a great experience to your customers.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>node</category>
    </item>
    <item>
      <title>Six Strategies for Deploying to Heroku</title>
      <dc:creator>Jason Skowronski</dc:creator>
      <pubDate>Wed, 19 Jun 2019 15:37:20 +0000</pubDate>
      <link>https://dev.to/heroku/six-strategies-for-deploying-to-heroku-4b63</link>
      <guid>https://dev.to/heroku/six-strategies-for-deploying-to-heroku-4b63</guid>
      <description>&lt;p&gt;There are many ways of deploying your applications to Heroku—so many, in fact, that we would like to offer some advice on which to choose. Each strategy provides different benefits based on your current deployment process, team size, and app. Choosing an optimal strategy can lead to faster deployments, increased automation, and improved developer productivity.&lt;/p&gt;

&lt;p&gt;The question is: How do you know which method is the "best" method for your team? In this post, we'll present six of the most common ways to deploy apps to Heroku and how they fit into your deployment strategy. These strategies are not mutually exclusive, and you can combine several to create the best workflow for your team. Reading this post will help you understand the different options available and how they can be implemented effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to Production with Git
&lt;/h2&gt;

&lt;p&gt;Our first method is not only the most common, but also the simplest: &lt;a href="https://devcenter.heroku.com/articles/git"&gt;pushing code from a Git repository to a Heroku app&lt;/a&gt;. You simply add your Heroku app as a &lt;a href="http://git-scm.com/book/en/Git-Basics-Working-with-Remotes"&gt;remote&lt;/a&gt; to an existing Git repository, then use git push to send your code to Heroku. Heroku then automatically builds your application and creates a new &lt;a href="https://devcenter.heroku.com/articles/releases"&gt;release&lt;/a&gt;.&lt;/p&gt;
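&lt;p&gt;The workflow is only a couple of commands. The app name below is hypothetical; in practice, running &lt;code&gt;heroku create&lt;/code&gt; adds the remote for you.&lt;/p&gt;

```shell
# Inside your project's Git repository, add the Heroku app as a remote
# (the app name is hypothetical; `heroku create` normally does this for you).
git init -q demo-app
cd demo-app
git remote add heroku https://git.heroku.com/your-app-name.git
git remote -v

# Then a plain push triggers a build and a new release:
# git push heroku master
```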

&lt;p&gt;Because this method requires a developer with full access to manually push code to production, it's better suited for pre-production deployments or for projects with small, trusted teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple to add to any Git-based workflow&lt;/li&gt;
&lt;li&gt;Supports &lt;a href="https://devcenter.heroku.com/articles/git-submodules"&gt;Git submodules&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires access to both the Git repository and Heroku app&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GitHub Integration
&lt;/h2&gt;

&lt;p&gt;If your repository is hosted on GitHub, you can use &lt;a href="https://devcenter.heroku.com/articles/github-integration"&gt;GitHub integration&lt;/a&gt; to deploy changes directly to Heroku. After linking your repository to a Heroku app, changes that are pushed to your repository are automatically deployed to the app. You can configure automatic deployments for a specific branch, or manually trigger deployments from GitHub. If you use continuous integration (CI), you can even prevent deployments to Heroku until your tests pass.&lt;/p&gt;

&lt;p&gt;GitHub integration is also useful for automating &lt;a href="https://devcenter.heroku.com/articles/pipelines"&gt;pipelines&lt;/a&gt;. For example, when a change is merged into the master branch, you might deploy to a staging environment for testing. Once the change has been validated, you can then &lt;a href="https://devcenter.heroku.com/articles/pipelines#promoting"&gt;promote&lt;/a&gt; the app to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatically deploys apps and keeps them up-to-date&lt;/li&gt;
&lt;li&gt;Integrates with &lt;a href="https://devcenter.heroku.com/articles/pipelines"&gt;pipelines&lt;/a&gt; and &lt;a href="https://devcenter.heroku.com/articles/github-integration-review-apps"&gt;review apps&lt;/a&gt; to create a continuous delivery workflow&lt;/li&gt;
&lt;li&gt;If you use a CI service (such as &lt;a href="https://devcenter.heroku.com/articles/heroku-ci"&gt;Heroku CI&lt;/a&gt;) to build/test your changes, Heroku can prevent deployment when the tests fail&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires administrator access to the repository, so it’s only useful for repositories you own&lt;/li&gt;
&lt;li&gt;Does not support Git submodules&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Heroku Review Apps
&lt;/h2&gt;

&lt;p&gt;When introducing a change, chances are you want to test it before deploying it straight to production. &lt;a href="https://devcenter.heroku.com/articles/github-integration-review-apps"&gt;Review Apps&lt;/a&gt; let you deploy any GitHub pull request (PR) as an isolated, disposable instance. You can demo, test, and validate the PR without having to create a new app or overwrite your production app. Closing the PR destroys the review app, making it a seamless addition to your existing workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Can automatically create and update apps for each PR&lt;/li&gt;
&lt;li&gt;Supports Docker images&lt;/li&gt;
&lt;li&gt;Supports &lt;a href="https://devcenter.heroku.com/articles/private-spaces"&gt;Heroku Private Spaces&lt;/a&gt; for testing changes in an isolated environment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires both &lt;a href="https://devcenter.heroku.com/articles/pipelines"&gt;pipelines&lt;/a&gt; and &lt;a href="https://devcenter.heroku.com/articles/github-integration"&gt;GitHub integration&lt;/a&gt; to be enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying with Docker
&lt;/h2&gt;

&lt;p&gt;Docker lets you bundle your apps into self-contained environments, ensuring that they behave exactly the same both in development and in production. This also gives you more control over the languages, frameworks, and libraries used to run your app. To &lt;a href="https://devcenter.heroku.com/categories/deploying-with-docker"&gt;deploy a container to Heroku&lt;/a&gt;, you can either push an image to the &lt;a href="https://devcenter.heroku.com/articles/container-registry-and-runtime"&gt;Heroku container registry&lt;/a&gt;, or &lt;a href="https://devcenter.heroku.com/articles/build-docker-images-heroku-yml"&gt;build the image automatically&lt;/a&gt; by declaring it in your app's &lt;code&gt;heroku.yml&lt;/code&gt; file.&lt;/p&gt;
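&lt;p&gt;A &lt;code&gt;heroku.yml&lt;/code&gt; for a Docker-based build can be as small as the following sketch (the run command is illustrative; see the linked heroku.yml article for the full schema):&lt;/p&gt;

```yaml
build:
  docker:
    web: Dockerfile   # build the web process from this Dockerfile
run:
  web: node server.js # illustrative start command; defaults to the image's CMD
```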

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatically generate images, or push an existing image to the container registry&lt;/li&gt;
&lt;li&gt;Consistency between development and production&lt;/li&gt;
&lt;li&gt;Compatible with &lt;a href="https://devcenter.heroku.com/articles/github-integration-review-apps"&gt;Heroku Review Apps&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If your app doesn’t already run in Docker, you’ll need to build an image&lt;/li&gt;
&lt;li&gt;Requires you to maintain your own &lt;a href="https://devcenter.heroku.com/articles/stack"&gt;stack&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Does not support &lt;a href="https://devcenter.heroku.com/articles/pipelines-using-the-platform-api#performing-a-promotion"&gt;pipeline promotions&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Hashicorp Terraform
&lt;/h2&gt;

&lt;p&gt;Infrastructure-as-code tools like &lt;a href="https://www.terraform.io/"&gt;Hashicorp Terraform&lt;/a&gt; can be helpful for managing complex infrastructure. Terraform can also be used to deploy a Heroku app. Although it is not officially supported by Heroku, Terraform is used by many Heroku customers. &lt;a href="https://devcenter.heroku.com/articles/using-terraform-with-heroku"&gt;Using Terraform with Heroku&lt;/a&gt;, you can define your Heroku apps with a declarative configuration language called HCL. Terraform automates the process of deploying and managing Heroku apps while also making it easy to coordinate Heroku with your existing infrastructure. Plus, Terraform v0.12 now allows you to store remote state in a PostgreSQL database, which means you can run Terraform on a Heroku dyno while storing its state in a Heroku Postgres database.&lt;/p&gt;

&lt;p&gt;For an example, check out a &lt;a href="https://devcenter.heroku.com/articles/event-driven-microservices-with-apache-kafka"&gt;reference architecture&lt;/a&gt; using Terraform and Kafka.&lt;/p&gt;
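&lt;p&gt;A minimal HCL sketch of a Heroku app looks something like this (resource and attribute names follow the community Heroku provider; treat the app name as illustrative):&lt;/p&gt;

```hcl
provider "heroku" {}

resource "heroku_app" "web" {
  name   = "my-terraform-app" # illustrative; app names must be globally unique
  region = "us"
}
```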

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automates Heroku app deployments&lt;/li&gt;
&lt;li&gt;Allows you to deploy Heroku apps as code&lt;/li&gt;
&lt;li&gt;Simplifies the management of large, complex deployments&lt;/li&gt;
&lt;li&gt;Allows you to configure multiple apps, Private Spaces as well as resources from other cloud providers (e.g. AWS, DNSimple, and Cloudflare) to have a repeatable, testable, multi-provider architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires learning Terraform and writing configuration if you don’t use it already&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The 'Deploy to Heroku' Button
&lt;/h2&gt;

&lt;p&gt;What if deploying your app was as easy as clicking a button? With the &lt;a href="https://devcenter.heroku.com/articles/heroku-button"&gt;'Deploy to Heroku' button&lt;/a&gt;, it is! It’s great for taking an app for a test run with default settings in a single click, or to help train new developers.&lt;/p&gt;

&lt;p&gt;This button acts as a shortcut that lets you deploy an app to Heroku from a web browser. This is great for apps that you provide to your users or customers, such as open source projects. You can parameterize each button with different settings, such as passing custom environment variables to Heroku, using a specific Git branch, or providing OAuth keys. The only requirements are that your source code is hosted in a GitHub repository and that you add a valid &lt;a href="https://dev.to/scottw/heroku-appjson-487i-temp-slug-5009239"&gt;&lt;code&gt;app.json&lt;/code&gt;&lt;/a&gt; file to the project's root directory. We’ve even heard of one company that adds a button to the README for each of their internal services. This forces them to keep the deploy process simple and helps new hires get up to speed with how services are deployed.&lt;/p&gt;
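&lt;p&gt;As an illustrative sketch (the field values here are hypothetical), a minimal &lt;code&gt;app.json&lt;/code&gt; might look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "name": "My Sample App",
  "description": "A one-click deployable sample app",
  "repository": "https://github.com/heroku/node-js-sample",
  "keywords": ["node", "express", "sample"],
  "env": {
    "SECRET_KEY": {
      "description": "Secret used to sign session cookies",
      "generator": "secret"
    }
  },
  "addons": ["heroku-postgresql"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;env&lt;/code&gt; section is how a button passes custom environment variables, and a &lt;code&gt;generator&lt;/code&gt; of &lt;code&gt;secret&lt;/code&gt; tells Heroku to generate a random value at deploy time.&lt;/p&gt;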

&lt;p&gt;&lt;a href="https://heroku.com/deploy?template=https://github.com/heroku/node-js-sample"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nR3i8Sj1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://www.herokucdn.com/deploy/button.svg" alt="Deploy" width="147" height="32"&gt;&lt;/a&gt;A 'Deploy to Heroku' button.&lt;/p&gt;
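&lt;p&gt;Embedding the button is a one-liner in a README. The Markdown below follows the pattern from the Heroku Button documentation; the &lt;code&gt;template&lt;/code&gt; parameter points at the GitHub repo to deploy, and can be omitted when the button lives in that repo's own README:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/heroku/node-js-sample)
&lt;/code&gt;&lt;/pre&gt;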

&lt;h3&gt;Pros:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Easy to add to a project's README file or web page&lt;/li&gt;
&lt;li&gt;Easy to use: simply click the button to deploy the app&lt;/li&gt;
&lt;li&gt;Provides a template with preconfigured default values, environment variables, and parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Cons:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Does not support Git submodules&lt;/li&gt;
&lt;li&gt;Apps deployed via button do not auto-update when new commits are added to the GitHub repo from which it was deployed&lt;/li&gt;
&lt;li&gt;Not a good workflow for apps that you need to keep up to date, since buttons can only create new apps and the deployed app is not automatically connected to the GitHub repo from which it came&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Which Should I Choose?&lt;/h2&gt;

&lt;p&gt;The method you choose depends on your specific deployment process, your requirements, and your apps. For small teams just getting started, deploying with Git is likely your first method due to its simplicity. The Heroku Button is equally straightforward, letting you deploy entire apps with a single click. If you use continuous integration or release frequently, integrating with GitHub can simplify the process even further by deploying automatically when you commit your code. This is a big improvement over deploying on an IaaS system because Heroku manages the entire process automatically.&lt;/p&gt;

&lt;p&gt;As your requirements get more sophisticated, add the other strategies as needed. When your application is running in a production environment and you need quality control, you may want to add pipelines to gain the advantages of review apps, automated testing, and staging environments. If you need a custom stack, you can build one with Docker. As you add more complex infrastructure components, bring in Terraform.&lt;/p&gt;

&lt;p&gt;Advanced teams will use a combination of strategies: For example, you may choose to deploy a Docker image by creating a review app from a GitHub pull request, testing the review app, then manually deploying the final version using &lt;code&gt;git push&lt;/code&gt;.&lt;/p&gt;
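&lt;p&gt;That final manual step is just two commands with the Heroku CLI (the app name here is hypothetical, and older apps may push to &lt;code&gt;master&lt;/code&gt; rather than &lt;code&gt;main&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Point a "heroku" Git remote at an existing app, then push to deploy
heroku git:remote -a my-example-app
git push heroku main
&lt;/code&gt;&lt;/pre&gt;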

&lt;p&gt;Ready to give one of these methods a try? Sign up for a &lt;a href="https://www.heroku.com/free"&gt;free Heroku account&lt;/a&gt; and test them out.&lt;/p&gt;

</description>
      <category>deploy</category>
      <category>coding</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
