<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matthew Schwartz</title>
    <description>The latest articles on DEV Community by Matthew Schwartz (@mattschwartz).</description>
    <link>https://dev.to/mattschwartz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F83443%2Fe6903e10-0014-4da1-b914-2d9d069b992b.jpeg</url>
      <title>DEV Community: Matthew Schwartz</title>
      <link>https://dev.to/mattschwartz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattschwartz"/>
    <language>en</language>
    <item>
      <title>Jack of All Trades, Master of Some</title>
      <dc:creator>Matthew Schwartz</dc:creator>
      <pubDate>Thu, 02 Jul 2020 15:30:55 +0000</pubDate>
      <link>https://dev.to/mattschwartz/jack-of-all-trades-master-of-some-3dmn</link>
      <guid>https://dev.to/mattschwartz/jack-of-all-trades-master-of-some-3dmn</guid>
      <description>&lt;p&gt;I've been a team lead and software development manager for many years. I've coded, architected, and managed many web development projects, mostly large scale SaaS applications. There are obviously many things I look for when interviewing candidates. There's one particular quality I'd like to talk about today: jack of all trades while being a master at some.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Stack vs Single Tier
&lt;/h2&gt;

&lt;p&gt;Organizations often debate whether it's better to hire full stack developers or people specialized for each tier. There are pros and cons to both approaches.&lt;/p&gt;

&lt;p&gt;Someone dedicating themselves to one or two technologies is able to spend more time learning them. We all know how time consuming it is to learn the wide array of web technologies.&lt;/p&gt;

&lt;p&gt;But having a good understanding of the whole architecture means when problems come up that span tiers, a full stack developer is more capable of solving them.  They can be involved in more of the conversations that drive the product.  For example, there are times a back-end team needs direction from the front-end team to satisfy UI requirements.  Those front-end developers who can speak with understanding of the complete stack will be able to contribute best.&lt;/p&gt;

&lt;p&gt;Also consider the pure managerial problem of assigning developers to projects and teams.  A full stack developer will have more places they can contribute across the entire product. This opens career paths and also lets them move easily if they get bored or want to learn something new.&lt;/p&gt;

&lt;p&gt;That said, my experience has shown me the best developers are those who are true experts in a set of technologies while having a good understanding of the entire stack. They are generally the best problem solvers and also the most successful as they move up in their careers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Library vs Language
&lt;/h2&gt;

&lt;p&gt;Similarly, the most successful developers I've known have a deep understanding of the programming languages they focus on.  They grow expertise in various libraries and frameworks but know the fundamentals of the platform they're running on.&lt;/p&gt;

&lt;p&gt;Anyone remember the Prototype JS framework?  Then script.aculo.us which was built on it?  Most readers here probably don't because jQuery became more popular when it came out.  And now we have additions to JS and frameworks like React which are replacing jQuery.&lt;/p&gt;

&lt;p&gt;Libraries and frameworks come and go. Most programming languages stick around far, far longer. And the fundamental concepts that span programming languages have existed for decades. Become an expert in the foundation and everything built on top of it is both easier to learn and quicker to master.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There's a balance to being a great developer and valuable to your employer.  Don't &lt;em&gt;just&lt;/em&gt; be a generalist or you'll never be the go-to person that people can rely on.  And don't be so focused on one thing to the exclusion of everything else or your contributions will be limited and you'll spend more time catching up with the next big technology.&lt;/p&gt;

&lt;p&gt;This is just one person's opinion. YMMV.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check out my current project, SocialSentiment.io, an application which performs &lt;a href="https://socialsentiment.io/stocks/"&gt;social media sentiment analysis of stocks&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>interview</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Machine Learning: Staying on Topic</title>
      <dc:creator>Matthew Schwartz</dc:creator>
      <pubDate>Tue, 30 Jun 2020 12:57:11 +0000</pubDate>
      <link>https://dev.to/mattschwartz/machine-learning-staying-on-topic-lj</link>
      <guid>https://dev.to/mattschwartz/machine-learning-staying-on-topic-lj</guid>
      <description>&lt;p&gt;I started &lt;a href="https://socialsentiment.io/"&gt;SocialSentiment.io&lt;/a&gt; with a somewhat simplistic machine learning algorithm.  I defined a &lt;a href="https://towardsdatascience.com/a-beginners-guide-on-sentiment-analysis-with-rnn-9e100627c02e"&gt;recurrent neural network&lt;/a&gt; to perform sentiment analysis of short texts from social media.  Its purpose, and therefore its training set, is focused on the topic of stocks, companies, and their products.  It returned only one floating point value representing a single prediction for each string.  I quickly fell into a common ML natural language processing trap: new text which is off-topic returns unpredictable and unhelpful results.  Worse yet, the results don't directly indicate the text it analyzed is off-topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of the Problem
&lt;/h2&gt;

&lt;p&gt;Search a social network such as Twitter for references to the &lt;a href="https://socialsentiment.io/stocks/symbol/INTC/"&gt;Intel Corporation&lt;/a&gt;.  Many posts, probably most, refer to the company as "Intel" and not "Intel Corporation" or "Intel Corp".  Therefore you're going to search for the word "Intel".&lt;/p&gt;

&lt;p&gt;Here are some posts that recently came back which are on-topic:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    What does Apple dumping Intel mean for Mac users? 

    So turns out, I have been running all the games from the Intel gpu instead of the nvidia...

    Amazon buys self-driving car company run by former Intel Oregon exec
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Along with them come posts that aren't related to the company or its stock at all:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    It is common for different intel agencies to attach different degrees of confidence based on the manner on underlying intel...

    Goddammit man what action are YOU gonna take? You’re the chairman of the intel committee!!!
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Another example is &lt;a href="https://socialsentiment.io/stocks/symbol/GOOG/"&gt;Google / Alphabet&lt;/a&gt;.  YouTube is a company owned by Alphabet, so it is included in our social media searches.  Search social media for YouTube and the most popular posts are about music and music videos on the site.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    [Song name] Officially Sets YouTube Record For Most Views In First 24 Hours

    [Band name] Smashes YouTube Record As [song name] Soars Past 100 Million Views
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;While these are referring to YouTube, they aren't on-topic for the kinds of posts we're interested in analyzing.&lt;/p&gt;

&lt;p&gt;Since our NN is trained on posts involving general business and stock opinions, plus specific industry sentiment like computing, it naturally returns widely varying results for these off-topic texts.  These posts aren't useful to our analysis at all, so how do we ignore them?&lt;/p&gt;

&lt;h2&gt;
  
  
  Garbage In, Garbage Out
&lt;/h2&gt;

&lt;p&gt;An old coworker of mine used to respond to bug reports in his software with "Garbage in, garbage out!"  &lt;/p&gt;

&lt;p&gt;Ideally we would filter these posts out before our RNN processes them.  So we started with this approach by adding negative filters to social media searches.  Ignore "house intel" and "senate intel", for example.  This of course helped.&lt;/p&gt;

&lt;p&gt;But there are more difficult filters.  "Intel community", for example, may refer to the company or the government.  "Intel chairman" might be the board chairman or a member of the US Congress.  We don't want to ignore these posts and lose valuable information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Label Classification &amp;amp; Off-Topic Training
&lt;/h2&gt;

&lt;p&gt;We added another approach to solve this problem.  We changed our ML algorithm to perform multi-label text classification.  Instead of a binary classification returning a single floating point number between 0 and 1, we redesigned and retrained it to label texts as positive, negative, neutral, and off-topic.&lt;/p&gt;

&lt;p&gt;Our original binary classification took the typical approach of its last dense layer having a unit size of 1 with a sigmoid activation to bound the result between 0 and 1.  The redesigned model with &lt;a href="https://medium.com/towards-artificial-intelligence/keras-for-multi-label-text-classification-86d194311d0e"&gt;multi-label classification&lt;/a&gt; ended with a dense layer the size of the number of labels.  By keeping the sigmoid activation we now get a prediction of each individual label.&lt;/p&gt;

&lt;p&gt;If the prediction for every label is low for a text, meaning below some threshold we choose to rely on, then we know the model is not well trained for this particular text.  We can choose to ignore it or hang onto it for later training.&lt;/p&gt;

&lt;p&gt;We can also proactively train it on many off-topic texts which it would otherwise classify.  If the prediction for the off-topic label is high we must have previously trained it on something similar.&lt;/p&gt;
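
&lt;p&gt;One possible way to act on those per-label predictions is sketched here.  This is a minimal illustration, not the code running on SocialSentiment.io: the label order, the 0.5 threshold, the function name, and the rule that the off-topic label wins whenever it reaches the threshold are all assumptions.&lt;/p&gt;

```python
# Hypothetical label order; the post names these four labels but not their order.
LABELS = ("positive", "negative", "neutral", "off-topic")


def interpret_prediction(scores, threshold=0.5):
    """Map one row of per-label sigmoid outputs to a decision.

    Returns "unknown" when no label reaches the threshold, "off-topic" when
    that label reaches it, and otherwise the best-scoring sentiment label.
    The threshold value is an assumption; tune it against your own data.
    """
    by_label = dict(zip(LABELS, scores))
    if max(by_label.values()) >= threshold:
        if by_label["off-topic"] >= threshold:
            return "off-topic"
        # Pick the strongest of the three sentiment labels.
        sentiment = {k: v for k, v in by_label.items() if k != "off-topic"}
        return max(sentiment, key=sentiment.get)
    # Every label scored low: the model has not seen anything like this text.
    return "unknown"
```

&lt;p&gt;Texts that come back as &lt;code&gt;"unknown"&lt;/code&gt; are the ones worth setting aside for labeling and later training.&lt;/p&gt;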

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Since switching to this multi-label text classification model for our machine learning algorithm we have much more accurate results.  We still catch and predict the sentiment of too many off-topic posts.  With more training and fine tuning it'll improve over time.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nlp</category>
    </item>
    <item>
      <title>How to Add Subscription Based Throttling to a Django API</title>
      <dc:creator>Matthew Schwartz</dc:creator>
      <pubDate>Sun, 14 Jun 2020 15:55:03 +0000</pubDate>
      <link>https://dev.to/mattschwartz/how-to-add-subscription-based-throttling-to-a-django-api-28j0</link>
      <guid>https://dev.to/mattschwartz/how-to-add-subscription-based-throttling-to-a-django-api-28j0</guid>
      <description>&lt;p&gt;Python was a natural choice when I started &lt;a href="https://socialsentiment.io"&gt;SocialSentiment.io&lt;/a&gt;. It let me use the same language for both the machine learning algorithms and web development. And I had used &lt;a href="https://www.djangoproject.com/"&gt;Django&lt;/a&gt; previously for other projects. The &lt;a href="https://www.django-rest-framework.org/"&gt;Django Rest Framework&lt;/a&gt; (DRF) is a great package to quickly and easily extend a Django project to offer APIs. Today we'll look at how to extend its capabilities to support custom throttling based on user subscriptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subscription Model
&lt;/h2&gt;

&lt;p&gt;First let's define our application's &lt;a href="https://socialsentiment.io/plans/"&gt;subscription model&lt;/a&gt; and throttling requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A free tier allowing a few hundred API requests per day&lt;/li&gt;
&lt;li&gt;A low cost paid tier offering a few thousand requests per day&lt;/li&gt;
&lt;li&gt;A higher cost tier offering unlimited requests&lt;/li&gt;
&lt;li&gt;All tiers limited to 5 requests per second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a very common use case for a modern SaaS application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Throttling Class
&lt;/h2&gt;

&lt;p&gt;One great thing about Django Rest Framework is it includes many built-in options for authentication and throttling.  Each can be applied globally or to specific endpoints.  If you desire any type of dynamic throttling options you'll need to extend it. Fortunately the architecture of DRF lets you override just about any part of it.&lt;/p&gt;

&lt;p&gt;Let's start by writing a custom class that overrides DRF's &lt;code&gt;UserRateThrottle&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;rest_framework.throttling&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;UserRateThrottle&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SubscriptionRateThrottle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserRateThrottle&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Define a custom scope name to be referenced by DRF in settings.py
&lt;/span&gt;    &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"subscription"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allow_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="s"&gt;"""
        Override rest_framework.throttling.SimpleRateThrottle.allow_request

        Check to see if the request should be throttled.

        On success calls `throttle_success`.
        On failure calls `throttle_failure`.
        """&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_staff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# No throttling
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_authenticated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;user_daily_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_user_daily_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_daily_limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Override the default from settings.py
&lt;/span&gt;                &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;
                &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_daily_limit&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# No limit == unlimited plan
&lt;/span&gt;                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Original logic from the parent method...
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_cache_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Drop any requests from the history which have now passed the
&lt;/span&gt;        &lt;span class="c1"&gt;# throttle duration
&lt;/span&gt;        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;throttle_failure&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;throttle_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What we're doing is dynamically looking up the user-specific throttle at the key moment to override the default DRF picks up from your settings file.  Define a function &lt;code&gt;get_user_daily_limit&lt;/code&gt; to look up the value. If the value is stored in a database, I highly recommend using Django's cache framework for performance.&lt;/p&gt;
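
&lt;p&gt;Here is a rough sketch of what a cached &lt;code&gt;get_user_daily_limit&lt;/code&gt; could look like.  Everything in it is an assumption made for illustration: the plan names, the limits, the five minute cache lifetime, and the &lt;code&gt;lookup_plan_name&lt;/code&gt; placeholder.  To keep it runnable without a Django project it uses a tiny in-memory stand-in with the same &lt;code&gt;get&lt;/code&gt; and &lt;code&gt;set&lt;/code&gt; calls as Django's cache.  It returns 0 for an unlimited plan, which the throttle class above treats as "no limit".&lt;/p&gt;

```python
# Hypothetical plan names and limits; substitute your own subscription data.
PLAN_DAILY_LIMITS = {"free": 200, "basic": 5000, "unlimited": 0}
CACHE_SECONDS = 300  # assumed cache lifetime


class MemoryCache:
    """Tiny stand-in exposing the get/set calls of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is ignored here; a real cache backend expires the entry.
        self._data[key] = value


cache = MemoryCache()  # in a Django project: from django.core.cache import cache


def lookup_plan_name(user):
    """Placeholder for the database query that finds the user's plan."""
    return getattr(user, "plan", "free")


def get_user_daily_limit(user):
    """Return the user's daily request limit, or 0 for an unlimited plan."""
    key = f"daily_limit:{user.pk}"
    limit = cache.get(key)
    if limit is None:  # cache miss, so do the expensive lookup once
        # Unknown plans fall back to the free tier limit.
        limit = PLAN_DAILY_LIMITS.get(
            lookup_plan_name(user), PLAN_DAILY_LIMITS["free"]
        )
        cache.set(key, limit, CACHE_SECONDS)
    return limit
```

&lt;p&gt;With a cache like this, remember to clear the entry when a user changes plans, or accept that the old limit applies until the entry expires.&lt;/p&gt;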

&lt;h2&gt;
  
  
  Settings
&lt;/h2&gt;

&lt;p&gt;Next let's see what's required in &lt;code&gt;settings.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;REST_FRAMEWORK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s"&gt;'DEFAULT_AUTHENTICATION_CLASSES'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="s"&gt;'DEFAULT_PERMISSION_CLASSES'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s"&gt;'rest_framework.permissions.IsAuthenticated'&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'DEFAULT_THROTTLE_CLASSES'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s"&gt;'rest_framework.throttling.UserRateThrottle'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'app.throttling.SubscriptionRateThrottle'&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s"&gt;'DEFAULT_THROTTLE_RATES'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'5/second'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;'subscription'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;'200/day'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Here we set up two types of throttling.  The built-in &lt;code&gt;UserRateThrottle&lt;/code&gt; will handle the global 5 requests per second limit. It finds that setting in &lt;code&gt;DEFAULT_THROTTLE_RATES&lt;/code&gt; with key &lt;code&gt;user&lt;/code&gt;. Our custom throttle class is also enabled and defaults to the &lt;code&gt;subscription&lt;/code&gt; value if a user subscription isn't found.  Of course the application should be written so this never happens, but it's good to have a fallback plan if a user isn't configured properly.&lt;/p&gt;
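
&lt;p&gt;It helps to see how a rate string turns into the two numbers the custom class overrides.  The standalone function below mirrors, as far as I understand it, what DRF's &lt;code&gt;SimpleRateThrottle.parse_rate&lt;/code&gt; does.  It is a sketch for illustration and not a copy of the library code.&lt;/p&gt;

```python
def parse_rate(rate):
    """Split a rate such as '200/day' into (num_requests, duration_in_seconds)."""
    if rate is None:
        return (None, None)
    num, period = rate.split("/")
    # Only the first letter of the period matters: s, m, h or d.
    seconds = {"s": 1, "m": 60, "h": 3600, "d": 86400}[period[0]]
    return (int(num), seconds)


print(parse_rate("5/second"))  # (5, 1)
print(parse_rate("200/day"))   # (200, 86400)
```

&lt;p&gt;This is why the custom class sets &lt;code&gt;self.duration = 86400&lt;/code&gt; alongside &lt;code&gt;self.num_requests&lt;/code&gt;: together they replace the parsed &lt;code&gt;200/day&lt;/code&gt; default with the user's own daily limit.&lt;/p&gt;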

&lt;h2&gt;
  
  
  Subscriptions
&lt;/h2&gt;

&lt;p&gt;How you code and model your subscriptions is up to you. In my case I wrote static classes that define the details of each subscription tier.  A &lt;code&gt;Subscription&lt;/code&gt; model links the user to a specific plan with details such as start time, payment details, etc. &lt;/p&gt;

&lt;p&gt;The nice thing is Django and DRF don't dictate how you design your user subscriptions.  Any way you choose to model it they'll handle because you can customize every aspect of authorization and throttling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So far I only have good things to say about the flexibility of Django and DRF and the customizations they allow.  They took the right approach in offering a wide variety of built-in capabilities while allowing developers the opportunity to easily extend or override them.  It's been working great for SocialSentiment.io and &lt;a href="https://socialsentiment.io/api/v1/getting-started/"&gt;our APIs&lt;/a&gt;.  I'd like to hear how others have added their own features to Django Rest Framework in the comments below.&lt;/p&gt;

</description>
      <category>python</category>
      <category>django</category>
      <category>saas</category>
      <category>api</category>
    </item>
    <item>
      <title>Quickly find common phrases in a large list of strings</title>
      <dc:creator>Matthew Schwartz</dc:creator>
      <pubDate>Sat, 18 Jan 2020 18:47:03 +0000</pubDate>
      <link>https://dev.to/mattschwartz/quickly-find-common-phrases-in-a-large-list-of-strings-9in</link>
      <guid>https://dev.to/mattschwartz/quickly-find-common-phrases-in-a-large-list-of-strings-9in</guid>
      <description>&lt;p&gt;Python is very good at efficiently iterating over sets of data and gathering useful information.  This is often accomplished with a surprisingly short amount of code.&lt;/p&gt;

&lt;p&gt;I recently came across a use case within a Python application where I wanted to find repeated phrases in sets of social media posts.  This is an easily managed problem because these posts are relatively short, typically under 300 characters, and therefore we can process thousands directly in memory.&lt;/p&gt;

&lt;p&gt;NLTK includes text correlation utilities to solve this problem.  In my case I didn't get the results I required, so I found a simple solution that requires no libraries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ins and Outs
&lt;/h2&gt;

&lt;p&gt;First let's define the input and output.  Our function will take &lt;strong&gt;an iterable of strings&lt;/strong&gt;, a &lt;strong&gt;maximum phrase length&lt;/strong&gt; (default 3), and a &lt;strong&gt;minimum repeat count&lt;/strong&gt; (default 2).  As we'll soon see, the choice of phrase length will have a &lt;em&gt;huge&lt;/em&gt; impact on performance.  Minimum repeat count will have a smaller but still significant impact.&lt;/p&gt;

&lt;p&gt;The output will be a dictionary.  Each key will be a tuple of words which make up a found phrase.  Words returned will be all lower case.  Each value in the dictionary will be the number of times it's found.&lt;/p&gt;
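
&lt;p&gt;To make that output shape concrete, here is a small sketch that is separate from the function built in this post.  It swaps in &lt;code&gt;collections.Counter&lt;/code&gt; and skips punctuation and stop word handling.  The name &lt;code&gt;count_phrases&lt;/code&gt; and the sample texts are made up for the example.&lt;/p&gt;

```python
from collections import Counter


def count_phrases(texts, maximum_length=2):
    """Count every run of 1..maximum_length consecutive lowercased words."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for size in range(1, maximum_length + 1):
            # Slide a window of `size` words across the text.
            for pos in range(len(words) - size + 1):
                counts[tuple(words[pos:pos + size])] += 1
    return counts


counts = count_phrases(["Intel stock rises", "intel stock falls"])
# Keep only phrases seen at least twice (the minimum repeat count).
repeated = {phrase: n for phrase, n in counts.items() if n >= 2}
print(repeated)
# {('intel',): 2, ('stock',): 2, ('intel', 'stock'): 2}
```

&lt;p&gt;Note that single words come back as one-element tuples, so every key has the same type regardless of phrase length.&lt;/p&gt;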

&lt;p&gt;In many cases you'll want to &lt;strong&gt;ignore stop words&lt;/strong&gt;.  Useful lists can be found in a &lt;a href="https://gist.github.com/sebleier/554280"&gt;gist&lt;/a&gt; and the comments under it on GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stopwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"some"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"which"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"where"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  The Algorithm
&lt;/h2&gt;

&lt;p&gt;Let's start our function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_common_phrases&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;maximum_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;minimum_repeat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;phrases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;First let's &lt;em&gt;break down the texts into phrases&lt;/em&gt;. These will be tuples between 1 and &lt;code&gt;maximum_length&lt;/code&gt; in size.  For each phrase we'll count how often we find it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace separators and punctuation with spaces
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;r'[.!?,:;/\-\s]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove extraneous chars
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;r'[\\|@#$&amp;amp;~%\(\)*\"]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Remove stop words and empty strings
&lt;/span&gt;    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Look at phrases no longer than maximum_length words long
&lt;/span&gt;    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;maximum_length&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;maximum_length&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="c1"&gt;# Walk over all sets of words
&lt;/span&gt;        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You'll notice that increasing &lt;code&gt;maximum_length&lt;/code&gt; triggers more loop iterations: each additional phrase size adds another full pass over the words of every text.  &lt;strong&gt;Processing time grows roughly in proportion to it&lt;/strong&gt;, and so does the number of phrases held in the dictionary.  So set the maximum to as small a number as reasonably possible.  In my case I found 3 to be a good value.&lt;/p&gt;
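
&lt;p&gt;To get a feel for the cost, here is a minimal sketch that counts how many word windows the two loops above visit for a single text. The helper name &lt;code&gt;count_windows&lt;/code&gt; is made up for illustration and isn't part of the function; &lt;code&gt;length&lt;/code&gt; stands for the number of words left after stop words are removed.&lt;/p&gt;

```python
def count_windows(length, maximum_length):
    """Count the word windows the nested loops visit for one text.

    Hypothetical helper for illustration only; it mirrors the loop
    bounds used above without building any phrases.
    """
    largest = min(length, maximum_length)
    total = 0
    for size in range(1, largest + 1):
        # A window of `size` words can start at length - size + 1 positions
        total += length - size + 1
    return total


print(count_windows(12, 3))  # 12 + 11 + 10 = 33
print(count_windows(12, 6))  # 12 + 11 + 10 + 9 + 8 + 7 = 57
```

&lt;p&gt;Every one of those windows is also a dictionary lookup and possibly a new key, so a smaller &lt;code&gt;maximum_length&lt;/code&gt; saves memory as well as time.&lt;/p&gt;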

&lt;p&gt;Next we'll remove phrases found fewer than the minimum required number of times.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;phrases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;minimum_repeat&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;And last we &lt;strong&gt;remove sub-phrases&lt;/strong&gt; unless they are found much more frequently than their longer counterparts. I found this to be the most interesting and useful problem to solve to get quality results.  I set a threshold of 25%: a sub-phrase is dropped when its count exceeds the longer phrase's count by less than 25%, and kept when it also appears often outside the longer phrase.  For example, a sub-phrase found 4 times inside a longer phrase found 3 times differs by (4 - 3) / 3, about 33%, so both are included in the output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;longest_phrases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l_phrase&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;longest_phrases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# If the entire phrase is found in a longer tuple...
&lt;/span&gt;        &lt;span class="n"&gt;intersection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l_phrase&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;intersection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intersection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# ... and their frequency overlaps by 75% or more, we'll drop it
&lt;/span&gt;            &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;longest_phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l_phrase&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;longest_phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l_phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;found&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;longest_phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;phrases&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;phrase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;longest_phrases&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
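
&lt;p&gt;One thing to keep in mind: the set intersection above treats a phrase as contained whenever all of its words appear somewhere in the longer tuple, regardless of order or adjacency. If you'd rather match only consecutive runs of words, a stricter test could look like the sketch below. The helper &lt;code&gt;is_contiguous_subphrase&lt;/code&gt; is a hypothetical name and not part of the function above; treat it as one possible variation, not the approach used here.&lt;/p&gt;

```python
def is_contiguous_subphrase(short, long):
    """Return True if tuple `short` appears as a consecutive run in tuple `long`.

    Hypothetical helper for illustration; a stricter alternative to
    comparing the two tuples as sets.
    """
    n = len(short)
    # Slide a window of n words over `long` and compare each slice
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


longer = ('catch', 'common', 'phrases')
print(is_contiguous_subphrase(('common', 'phrases'), longer))  # True
print(is_contiguous_subphrase(('catch', 'phrases'), longer))   # False
```

&lt;p&gt;With the set-based check, &lt;code&gt;('catch', 'phrases')&lt;/code&gt; would count as contained in the longer phrase; the slice-based check only accepts words that sit next to each other in the same order.&lt;/p&gt;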



&lt;p&gt;Let's test the output.  Here's sample input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"This is the first text where I want to catch some common phrases"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"This is a second text where I hope to catch some common phrases"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"This is a third text which should catch some common phrases"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"I'm a unique string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"A post with text"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The output of this function will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'catch'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'common'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'phrases'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,):&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'catch'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'common'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We've been running this algorithm on &lt;a href="https://socialsentiment.io"&gt;SocialSentiment.io&lt;/a&gt; for a few weeks with very positive results.  We track the sentiment of social media posts which reference publicly traded companies.  This function helps us find and display frequently found phrases in those posts.&lt;/p&gt;

</description>
      <category>python</category>
      <category>todayilearned</category>
    </item>
  </channel>
</rss>
