<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eric P Green</title>
    <description>The latest articles on DEV Community by Eric P Green (@ericpgreen).</description>
    <link>https://dev.to/ericpgreen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F521875%2F4e7db276-225e-47fd-8b18-a0d62c300b34.PNG</url>
      <title>DEV Community: Eric P Green</title>
      <link>https://dev.to/ericpgreen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ericpgreen"/>
    <language>en</language>
    <item>
      <title>Analyzing the r/wallstreetbets hivemind — August 2021</title>
      <dc:creator>Eric P Green</dc:creator>
      <pubDate>Wed, 08 Sep 2021 10:06:03 +0000</pubDate>
      <link>https://dev.to/ericpgreen/analyzing-the-r-wallstreetbets-hivemind-august-2021-3gjm</link>
      <guid>https://dev.to/ericpgreen/analyzing-the-r-wallstreetbets-hivemind-august-2021-3gjm</guid>
      <description>&lt;p&gt;The activity in the Reddit r/wallstreetbets community is staggering. Each day, there are around 800 posts and 50,000 comments debating approximately 280 different stocks. But by just browsing Reddit, between the memes and degenerate gamblers, it can be hard to understand the full nature of the discussion.&lt;/p&gt;

&lt;p&gt;In this post, I’ve turned to a bit of SQL and Python to explore what’s happening in the wallstreetbets hivemind. I’ve analyzed &lt;strong&gt;stock popularity&lt;/strong&gt;, &lt;strong&gt;sophisticated but overlooked discussions&lt;/strong&gt;, and &lt;strong&gt;community influencers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re interested, here’s the &lt;a href="https://beneath.dev/examples/reddit"&gt;raw Reddit data&lt;/a&gt;, my &lt;a href="https://github.com/beneath-hq/beneath/blob/master/examples/wallstreetbets-analytics/stock-mentions/find_mentions_pipeline.py"&gt;data pipeline&lt;/a&gt;, the &lt;a href="https://beneath.dev/examples/wallstreetbets-analytics/"&gt;derived data&lt;/a&gt;, and my &lt;a href="https://github.com/beneath-hq/beneath/blob/master/examples/wallstreetbets-analytics/explore/explore.ipynb"&gt;Jupyter notebook&lt;/a&gt;. I’m using &lt;a href="https://about.beneath.dev/"&gt;Beneath&lt;/a&gt;, an open data platform I’m building, to stream and save the data.&lt;/p&gt;

&lt;p&gt;Btw, this isn’t investment advice… DYOR.&lt;/p&gt;

&lt;h2&gt;The meme stock rankings&lt;/h2&gt;

&lt;p&gt;Let’s start with the basics. What are the most discussed stocks and how have they changed over time?&lt;/p&gt;

&lt;p&gt;The stocks on wallstreetbets can be broadly bucketed into two categories: long-standing community interests and stocks tied to current events. We can see both categories in this line graph of mention share over time (for a selection of stocks):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9Ai8Y82V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/42y236j2l32kgn0dozbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9Ai8Y82V--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/42y236j2l32kgn0dozbi.png" alt="ebbs-and-flows"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href="https://chart-studio.plotly.com/~ericpgreen/8.embed"&gt;Click here to view interactive chart&lt;/a&gt;





&lt;p&gt;The long-standing wallstreetbets interests jump out: these are the lines that occupy a significant percentage of mentions over the whole time period. These include staples like Gamestop (GME) and AMC, but the community has also long been tracking Clover Health (CLOV) and AMD, the semiconductor manufacturer. It doesn’t look like wallstreetbets will lose interest in these anytime soon.&lt;/p&gt;

&lt;p&gt;On the other hand, we see stocks that spike suddenly due to specific events, such as Robinhood (HOOD) and Microvast (MVST), a lithium-ion battery manufacturer. Both went public at the end of July and received bursts of attention from the community, but the interest hasn’t lasted: as of September 1st, both stocks have a near-zero share of daily mentions.&lt;/p&gt;
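&lt;p&gt;&lt;em&gt;In case you want to reproduce the metric: here’s a minimal pandas sketch of the “share of daily mentions” calculation. It assumes a &lt;code&gt;mentions&lt;/code&gt; DataFrame with one row per day and symbol, which is an illustrative schema rather than my pipeline’s exact one.&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

# mentions: DataFrame with columns day, symbol, num_mentions (illustrative schema)

# Total mentions per day across all symbols
totals = mentions.groupby("day")["num_mentions"].transform("sum")

# Each stock's share of that day's discussion
share = mentions.assign(share=mentions["num_mentions"] / totals)

# One line per symbol over time, as in the chart above
share.pivot(index="day", columns="symbol", values="share").plot()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
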

&lt;p&gt;In this next chart, we zoom in on the most discussed stocks in the month of August.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fjqyMWYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ewmfe7ycqhb9s2a0qct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fjqyMWYX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6ewmfe7ycqhb9s2a0qct.png" alt="top10"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href="https://chart-studio.plotly.com/~ericpgreen/6.embed"&gt;Click here to view interactive chart&lt;/a&gt;





&lt;p&gt;GME and AMC have long been community favorites, and even in August, they remain the most mentioned stocks. I’ve been collecting data from wallstreetbets since March, and the two companies have been the most discussed for 4 of those 6 months. But the distribution isn’t a power law: contending stocks get significant discussion, too.&lt;/p&gt;

&lt;h2&gt;Discussions of the next big thing&lt;/h2&gt;

&lt;p&gt;The NASDAQ includes over 4000 public equities, and the NYSE over 3000, so how does the community come to rally around certain stocks? One of my hypotheses is that some initial post triggers a deep and unique discussion that ultimately leads to community-wide attention. So, let’s try to find some interesting conversations.&lt;/p&gt;

&lt;p&gt;I’ve tried to uncover some under-appreciated discussions by filtering for posts with at least 15 comments and 25 upvotes, and sorting those posts by highest average words per comment. Here are the top 10 for the month of August:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jt5zhdhp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/74b9htfi6a8qijrlrkv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jt5zhdhp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/74b9htfi6a8qijrlrkv8.png" alt="sophisticated"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href="https://chart-studio.plotly.com/~ericpgreen/12.embed"&gt;Click here to view interactive chart&lt;/a&gt;





&lt;p&gt;&lt;em&gt;If you’d like to read the discussions, the &lt;a href="https://chart-studio.plotly.com/~ericpgreen/12.embed"&gt;interactive chart&lt;/a&gt; includes a link to each post’s page on reddit.com. Note that the numbers might not reflect what you see on reddit.com because comments can be edited and deleted after the fact, and scores are continually changing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The posts above reveal fairly educated discussions about storylines that, for the most part, haven’t yet hit the wallstreetbets front page. Stocks like Lordstown Motors (RIDE), Ford (FORD), and Proterra (PRTA) have garnered little attention so far, but, in light of these deep discussions, they could be worth keeping an eye on.&lt;/p&gt;
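&lt;p&gt;&lt;em&gt;For the curious, the filter above translates to a few lines of pandas. This is a hedged sketch; the column names are assumptions, not my pipeline’s actual schema.&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# posts: DataFrame with one row per post (columns are illustrative)

# Posts with enough engagement, ranked by average words per comment
filtered = posts[(posts["num_comments"] &gt;= 15) &amp; (posts["upvotes"] &gt;= 25)].copy()
filtered["words_per_comment"] = (
    filtered["total_comment_words"] / filtered["num_comments"]
)
top10 = filtered.sort_values("words_per_comment", ascending=False).head(10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
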

&lt;p&gt;Another hypothesis I wanted to test is that the share of rocket emojis in a discussion could signal a stock’s momentum within the community. Here’s a ranking of the August posts with the highest percentage of commenters including a rocket emoji (filtered for posts with at least 25 comments):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MQ_q_yhg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3o0ktx9u5b09aknl69js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MQ_q_yhg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3o0ktx9u5b09aknl69js.png" alt="rockets"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href="https://chart-studio.plotly.com/~ericpgreen/4.embed"&gt;Click here to view interactive chart&lt;/a&gt;





&lt;p&gt;Unsurprisingly, these posts reveal a number of meme stocks that have already made it to the front page, like CLOV and WISH. But there are also companies like Hut 8 Mining (HUT) and Bitfarms (BITF) that haven’t made it there (yet?). They’re definitely worth watching.&lt;/p&gt;
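&lt;p&gt;&lt;em&gt;The rocket-emoji signal boils down to a couple of groupbys. A sketch, assuming a &lt;code&gt;comments&lt;/code&gt; DataFrame with &lt;code&gt;post_id&lt;/code&gt;, &lt;code&gt;author&lt;/code&gt; and &lt;code&gt;body&lt;/code&gt; columns (an illustrative schema):&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Flag each comment that contains a rocket emoji
flagged = comments.assign(has_rocket=comments["body"].str.contains("🚀"))

# A commenter counts once per post, no matter how many rockets they posted
per_commenter = flagged.groupby(["post_id", "author"])["has_rocket"].any()

# Share of rocket-posting commenters per post, for posts with 25+ comments
rocket_share = per_commenter.groupby("post_id").mean()
n_comments = comments.groupby("post_id").size()
ranking = rocket_share[n_comments &gt;= 25].sort_values(ascending=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
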

&lt;p&gt;Behind every post and comment is a member of the wallstreetbets community. Let’s find out which authors are leading the discussion.&lt;/p&gt;

&lt;h2&gt;The influencers of wallstreetbets&lt;/h2&gt;

&lt;p&gt;To identify influencers, I wanted to find the active authors who get the most upvotes on substantial, forward-looking posts. To that end, I’ve applied a couple of criteria. First, I’ve excluded posts labeled as a “Meme,” “Gain,” or “Loss,” which are mostly retrospective. Second, I’ve filtered for authors who have posted at least once since July 1st. One of the most popular Redditors of all time was u/DeepF***ingValue, but his last post was on April 15th, and I want this analysis to be current.&lt;/p&gt;
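&lt;p&gt;&lt;em&gt;In pandas, those criteria look roughly like this (the flair labels and column names are illustrative assumptions):&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Authors who have posted at least once since July 1st
recent_authors = posts.loc[posts["created_on"] &gt;= "2021-07-01", "author"].unique()

# Drop retrospective flairs, keep active authors, rank by average upvotes
eligible = posts[
    ~posts["flair"].isin(["Meme", "Gain", "Loss"])
    &amp; posts["author"].isin(recent_authors)
]
top_authors = (
    eligible.groupby("author")["upvotes"]
    .agg(posts="count", avg_upvotes="mean")
    .sort_values("avg_upvotes", ascending=False)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
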

&lt;p&gt;Here are the top authors since I started collecting data in March:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2Uav16iZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/39t4n8jq2l6iulwcejz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2Uav16iZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/39t4n8jq2l6iulwcejz7.png" alt="influencers"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href="https://chart-studio.plotly.com/~ericpgreen/1.embed"&gt;Click here to view interactive chart&lt;/a&gt;





&lt;p&gt;The influencers that I found can be split into two categories: the analysts and the hype men.&lt;/p&gt;

&lt;p&gt;The analysts, like the two top authors u/quantkim and u/nobjos, contribute breaking news, technical analysis, and quantitative reports. For example, u/quantkim shares articles about GameStop’s corporate turnaround, like &lt;a href="https://www.reddit.com/r/wallstreetbets/comments/oesywr/gamestop_continues_expansion_of_fulfillment/"&gt;this one&lt;/a&gt;, and has averaged 11,332 upvotes over 15 posts.&lt;/p&gt;

&lt;p&gt;Conversely, the hype men typically talk up their big positions in popular stocks. Here’s one from u/dumbledoreRothIRA about &lt;a href="https://www.reddit.com/r/wallstreetbets/comments/osaz9c/600k_yolo_on_clov_bullish_as_ever_still_diamond/"&gt;a $600k position in $CLOV&lt;/a&gt;, and one from u/lookshee laying out his &lt;a href="https://www.reddit.com/r/wallstreetbets/comments/loio0b/i_am_going_to_buy_gamestop_all_of_it_dont_upvote/"&gt;intention to buy the entirety of the GameStop company&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All the authors above clearly influence the community, so, to jump ahead of the crowd, it’d be smart to set up notifications for whenever they post.&lt;/p&gt;

&lt;h2&gt;What’s next&lt;/h2&gt;

&lt;p&gt;My analyses in this post really just scratch the surface of what you can infer from wallstreetbets data — there’s much more to do. To extend this work, I’m currently considering factoring in price movements, doing sentiment analysis, and creating a bot that mines for insights in real-time.&lt;/p&gt;

&lt;p&gt;Last week, &lt;a href="https://www.wsj.com/articles/wall-street-is-looking-to-reddit-for-investment-advice-11630056648"&gt;a Wall Street Journal article detailed that forward-thinking hedge funds are diving into the r/wallstreetbets data&lt;/a&gt;. By making this data public and queryable on Beneath, I hope I’ve made it more accessible to the everyday person!&lt;/p&gt;

&lt;p&gt;If you’re interested in any of this, come hang out in the Beneath &lt;a href="https://discord.gg/f5yvx7YWau"&gt;Discord community&lt;/a&gt;, follow me on Twitter &lt;a href="https://twitter.com/ericpgreen2"&gt;@ericpgreen2&lt;/a&gt;, or &lt;a href="https://beneath.dev/examples/wallstreetbets-analytics/"&gt;jump right into the data&lt;/a&gt; yourself 🚀🚀🚀&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Evolution of a data system</title>
      <dc:creator>Eric P Green</dc:creator>
      <pubDate>Mon, 05 Jul 2021 18:16:00 +0000</pubDate>
      <link>https://dev.to/ericpgreen/evolution-of-a-data-system-oei</link>
      <guid>https://dev.to/ericpgreen/evolution-of-a-data-system-oei</guid>
      <description>&lt;p&gt;The holy grail of data work is putting data science into production. But without an extensive data engineering background, you might not know how to build a production data system. In this post, I'll show how you can turn a machine learning model into a production data app by laying out the high-level system design of a simple Reddit analytics tool.&lt;/p&gt;

&lt;h2&gt;Let’s analyze the seriousness of Reddit posts&lt;/h2&gt;

&lt;p&gt;Reddit is a serious place for serious people, but sometimes subreddits become corrupted by miscreants who spread useless banter. To avoid such unpleasantries, we want to build a web app that can advise us of the seriousness of different subreddits.&lt;/p&gt;

&lt;p&gt;For our project, we’ll use machine learning to score the seriousness of every individual Reddit post. We’ll aggregate the scores by subreddit and time, and we’ll expose the insights via an API that we can integrate with a frontend. We want our insights to update in near real-time so we’re reasonably up-to-date with the latest posts.&lt;/p&gt;

&lt;p&gt;So that we’re clear on what the system should do, here’s the API interface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/subreddit/[name]&lt;/code&gt;: Returns a) a subreddit’s posts and their seriousness scores, b) an all-time seriousness score, and c) hourly seriousness scores for the last week&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/subreddits&lt;/code&gt;: Returns all subreddits we track and the all-time seriousness score for each&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;

&lt;h2&gt;Phase 1: building the data ingestion engine&lt;/h2&gt;

&lt;p&gt;To start, we want to extract posts from Reddit and write them into our own storage system. Our storage system will have two components: a message queue and a database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Message queue&lt;/strong&gt;: We’ll use a message queue to both store and enrich data in real-time. We’ll use RabbitMQ to keep things simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: We’ll use a database to permanently store and serve the data. Our API server will get its data from here. We’ll use Postgres, the do-it-all relational data store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With our storage system in place (in theory), let’s write the first scripts of our data pipeline (a minimal sketch of both follows the list).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reddit scraper&lt;/strong&gt;: This script polls the Reddit API every second and writes new posts to a &lt;code&gt;posts&lt;/code&gt; topic in our message queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;posts&lt;/code&gt; consumer&lt;/strong&gt;: This script reads data from the &lt;code&gt;posts&lt;/code&gt; topic and inserts it into our Postgres database.&lt;/li&gt;
&lt;/ul&gt;
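&lt;p&gt;Here’s a minimal sketch of those two scripts using the &lt;code&gt;pika&lt;/code&gt; RabbitMQ client and &lt;code&gt;psycopg2&lt;/code&gt;. The Reddit client wrapper, the table schema and the connection details are stand-ins, not production code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import time

import pika      # RabbitMQ client
import psycopg2  # Postgres client

def scraper(reddit, channel):
    """Poll Reddit every second and publish new posts to the `posts` topic."""
    while True:
        for post in reddit.fetch_new_posts():  # hypothetical Reddit API wrapper
            channel.basic_publish(exchange="", routing_key="posts",
                                  body=json.dumps(post))
        time.sleep(1)

def consumer(conn, channel):
    """Read the `posts` topic and insert each post into Postgres."""
    def on_message(ch, method, properties, body):
        post = json.loads(body)
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO posts (id, subreddit, title, created_on) "
                "VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING",
                (post["id"], post["subreddit"], post["title"], post["created_on"]),
            )
        conn.commit()
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="posts", on_message_callback=on_message)
    channel.start_consuming()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
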

&lt;p&gt;We need a way to deploy and run our code in production. We like to do that with a CI/CD pipeline and a Kubernetes cluster.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD pipeline&lt;/strong&gt;: On every git commit, we’ll build our code as a Docker container, push it to a container registry, and deploy it to Kubernetes. GitHub Actions makes this easy to set up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes cluster&lt;/strong&gt;: Kubernetes is a platform for running containerized code. Kubernetes can also store our database and Reddit credentials, and inject them into our containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll use a cloud provider to provision the message queue, database and Kubernetes cluster. We prefer managed services when they’re available, so we won’t deploy the message queue or database directly on Kubernetes.&lt;/p&gt;

&lt;p&gt;Here’s a diagram of what our system looks like so far:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---nYzNDIW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p06zvae1rmc6jfqesdp8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---nYzNDIW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p06zvae1rmc6jfqesdp8.png" alt="Phase 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all this is up and running, we need to validate that the data is flowing. An easy way to do that is to connect to our Postgres database and check that new posts are continually added. A quick query (assuming the &lt;code&gt;posts&lt;/code&gt; table above with its &lt;code&gt;created_on&lt;/code&gt; timestamp) might look like:&lt;/p&gt;
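
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import psycopg2

conn = psycopg2.connect("postgresql://...")  # your managed Postgres instance

with conn.cursor() as cur:
    cur.execute(
        "SELECT count(*) FROM posts "
        "WHERE created_on &gt; now() - interval '5 minutes'"
    )
    print(cur.fetchone()[0], "posts ingested in the last 5 minutes")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When everything looks good, we’re ready to move on.&lt;/p&gt;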

&lt;h2&gt;Phase 2: training the machine learning model&lt;/h2&gt;

&lt;p&gt;Now that we have the raw data in Postgres, we’re ready to develop our moneymaker, the &lt;em&gt;seriousness&lt;/em&gt; scoring model. For this example, we’ll keep things simple and use a Jupyter notebook that pulls historical posts from the Postgres database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Jupyter notebook&lt;/strong&gt;: Inside the notebook, we label some training data, train and assess our model, and save the model to a file. Then our production code will be able to load the file to make inferences (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
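&lt;p&gt;The core of that notebook might look like this scikit-learn sketch. The TF-IDF features, the hand-labeled examples and the file name are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import joblib

# titles: list of post titles; labels: hand-labeled 0/1 seriousness
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(titles, labels)

# Persist the trained pipeline so the production workers can load it
joblib.dump(model, "seriousness-v1.joblib")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
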

&lt;p&gt;Note that there are other ways to train a machine learning model. Fancy “MLaaS” and “MLOps” tools can help you continuously train, monitor and deploy models. If you want to integrate with one of these tools, you’ll likely connect your database to enable training, and you’ll ping an API to make an inference.&lt;/p&gt;

&lt;p&gt;Here’s our system augmented with our ML development environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8c-i0rOg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f13oe9uos0zei9dk0fle.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8c-i0rOg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/f13oe9uos0zei9dk0fle.png" alt="Phase 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Phase 3: applying the model and aggregating the scores&lt;/h2&gt;

&lt;p&gt;Now it’s time to build the workers that will apply the model to new posts and write out the resulting seriousness scores. That’s two different scripts (the enrichment worker is sketched after the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;posts&lt;/code&gt; enrichment&lt;/strong&gt;. This script consumes the Reddit &lt;code&gt;posts&lt;/code&gt; topic, applies the predictive model, and writes the data back to another topic &lt;code&gt;posts-scores&lt;/code&gt;, which will contain post IDs and seriousness scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;posts-scores&lt;/code&gt; consumer&lt;/strong&gt;. This script reads data from the &lt;code&gt;posts-scores&lt;/code&gt; topic and inserts the records into (a separate table in) our Postgres database.&lt;/li&gt;
&lt;/ul&gt;
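&lt;p&gt;A sketch of the enrichment worker, reusing the queue wiring from Phase 1 and the model file from Phase 2:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import joblib

model = joblib.load("seriousness-v1.joblib")

def on_post(ch, method, properties, body):
    """Score one post and publish the result to the posts-scores topic."""
    post = json.loads(body)
    score = float(model.predict_proba([post["title"]])[0][1])
    ch.basic_publish(exchange="", routing_key="posts-scores",
                     body=json.dumps({"id": post["id"], "score": score}))
    ch.basic_ack(delivery_tag=method.delivery_tag)

def run(channel):  # channel: a pika channel, set up as in Phase 1
    channel.basic_consume(queue="posts", on_message_callback=on_post)
    channel.start_consuming()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
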

&lt;p&gt;Next up, we want to aggregate our results by subreddit and time. We’ll use dbt, which allows us to schedule periodic SQL queries. We’ll schedule two aggregating queries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Roll up new scores&lt;/strong&gt;: We’ll run this query every five minutes. On every run, it’ll calculate the mean scores of new posts and save the results to a table &lt;code&gt;subreddit-scores-5min&lt;/code&gt; in Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute total score&lt;/strong&gt;: This is a heavier query, so we’ll only run it once a day. It will compute each subreddit’s total seriousness score (across all time) and save the results to a table &lt;code&gt;subreddit-scores-total&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With that, we have all the data that we want for our app available in Postgres. Here’s what the system looks like now:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6yu-sfTH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wis8nvudot1r8woqxn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6yu-sfTH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wis8nvudot1r8woqxn7.png" alt="Phase 3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Phase 4: completing a web app&lt;/h2&gt;

&lt;p&gt;Our last step is creating the interfaces for accessing our Reddit insights. We need to set up a backend API server and write our frontend code.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API server&lt;/strong&gt;. The API server will fetch the insights from Postgres and serve the results to the frontend. It’ll implement the routes we specified in the introduction. We’ll build the API server in Python using the FastAPI framework (a route sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frontend client&lt;/strong&gt;. The frontend will contain tables and charts for viewing and searching the insights. We’ll implement it with React and use a fancy charting library like Recharts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
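&lt;p&gt;A hedged sketch of the two routes; the &lt;code&gt;fetch_*&lt;/code&gt; helpers are hypothetical stand-ins for the actual Postgres queries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI

app = FastAPI()

@app.get("/subreddits")
async def list_subreddits():
    # All tracked subreddits with their all-time scores (subreddit-scores-total)
    return await fetch_total_scores()

@app.get("/subreddit/{name}")
async def get_subreddit(name: str):
    return {
        "posts": await fetch_scored_posts(name),    # posts joined with scores
        "all_time": await fetch_total_score(name),  # from subreddit-scores-total
        "hourly": await fetch_hourly_scores(name),  # rolled up from the 5-min table
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
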

&lt;p&gt;Deploy the API server and frontend code to Kubernetes, and we have ourselves a full stack analytics application! Here’s what the final design looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kZDIqg_R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yn9fgil1180htbnen4vp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kZDIqg_R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yn9fgil1180htbnen4vp.png" alt="Phase 4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Reviewing the stack&lt;/h2&gt;

&lt;p&gt;Our Reddit analytics app is now ready to share with the world (at least on paper). We’ve set up a full stack that spans data ingest, model training, real-time predictions and aggregations, and a frontend to explore the results. It’s also a reasonably future-proof setup. We can do more real-time enrichment thanks to the message queue, and we can do more aggregations thanks to dbt.&lt;/p&gt;

&lt;p&gt;But the system does have its limitations. For scalability, we’re limited by the throughput of Postgres and RabbitMQ. For latency, we’re limited by the batched nature of dbt. To improve, we could add BigQuery as a data warehouse, use Kafka as our message queue, and add Flink as a real-time stream processor, but these powerful systems also come at the cost of greater complexity.&lt;/p&gt;

&lt;p&gt;While there are always different tools you can use for the same job, this data system design is fairly standard. I hope it gives you perspective on what it takes to build a live analytics-centric web application.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>datascience</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Turn a Pandas DataFrame into an API</title>
      <dc:creator>Eric P Green</dc:creator>
      <pubDate>Thu, 10 Jun 2021 13:20:22 +0000</pubDate>
      <link>https://dev.to/ericpgreen/turn-a-pandas-dataframe-into-an-api-57pk</link>
      <guid>https://dev.to/ericpgreen/turn-a-pandas-dataframe-into-an-api-57pk</guid>
      <description>&lt;p&gt;Pandas DataFrames are my favorite way to manipulate data in Python. In fact, the end product of many of my small analytics projects is just a data frame containing my results.&lt;/p&gt;

&lt;p&gt;I used to dump my dataframes to CSV files and save them to GitHub. But recently, I've been using &lt;a href="https://about.beneath.dev" rel="noopener noreferrer"&gt;Beneath&lt;/a&gt;, a data sharing service I'm building, to save my dataframes and simultaneously turn them into a full-blown API with a website. It's great when I need to hand off a dataset to clients or integrate the data into a frontend.&lt;/p&gt;

&lt;p&gt;In this post, I'll show you how that works! I'm going to fetch GitHub commits, analyze them, and use Beneath to turn the result into an API.&lt;/p&gt;

&lt;h2&gt;Set up Beneath&lt;/h2&gt;

&lt;p&gt;To get started, you need to install the Beneath &lt;code&gt;pip&lt;/code&gt; module and log in with a free Beneath account. It's pretty easy and the docs already cover it. Just follow &lt;a href="https://about.beneath.dev/docs/quick-starts/install-sdk/" rel="noopener noreferrer"&gt;these steps&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Make sure to remember your username as you'll need it in a minute!&lt;/p&gt;

&lt;h2&gt;Let's analyze some data&lt;/h2&gt;

&lt;p&gt;I think GitHub activity is a fascinating, underexplored data source. Let's scratch the surface and look at commits to... Pandas! Here's a quick script to fetch the &lt;code&gt;pandas&lt;/code&gt; source code and aggregate some daily stats on the number of commits and contributors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="c1"&gt;# Get all Pandas commit timestamps
&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas-dev/pandas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    if [ -d &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; ]; then rm -Rf &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; fi;
    git clone https://github.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.git repo;
    cd repo;
    echo &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp,contributor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;;
    git log --pretty=format:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%ad,%ae&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; --date=iso
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shell&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Group by day and count number of commits and contributors
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StringIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;parse_dates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;date_parser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contributor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;commits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contributors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nunique&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename_axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the &lt;code&gt;df&lt;/code&gt; variable contains our insights. If you're following along, you can change the &lt;code&gt;repo&lt;/code&gt; variable to scrape another GitHub project. Just beware that some major repos can take a long time to analyze (I'm looking at you, &lt;a href="https://github.com/torvalds/linux" rel="noopener noreferrer"&gt;torvalds/linux&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;Save the DataFrame to Beneath&lt;/h2&gt;

&lt;p&gt;First, we'll create a new project to store our results. I'll do that from the command-line, but you can also use the web &lt;a href="https://beneath.dev/-/create/project" rel="noopener noreferrer"&gt;console&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;beneath project create USERNAME/github-fun
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just replace &lt;code&gt;USERNAME&lt;/code&gt; with your own username.&lt;/p&gt;

&lt;p&gt;Now, we're ready to publish the dataframe. We do it with a simple one-liner directly in Python (well, I split it over multiple lines, but it's still just one call):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;beneath&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;beneath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_full&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;table_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USERNAME/github-fun/pandas-commits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Daily commits to https://github.com/pandas-dev/pandas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a few things going on here. Let's go through them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;table_path&lt;/code&gt; gives the full path for the output table, including our username and project.&lt;/li&gt;
&lt;li&gt;We use the &lt;code&gt;records&lt;/code&gt; parameter to pass our DataFrame.&lt;/li&gt;
&lt;li&gt;We provide a &lt;code&gt;key&lt;/code&gt; for the data. The auto-generated API uses the key to &lt;a href="https://about.beneath.dev/docs/reading-writing-data/index-filters/" rel="noopener noreferrer"&gt;index the data&lt;/a&gt; so we can quickly filter records. By default, Beneath will use our DataFrame's index as the key, but I prefer setting it manually.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;description&lt;/code&gt; parameter adds some documentation to the dataset that will be shown at the top of the table's page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that's it! Now let's explore the results.&lt;/p&gt;

&lt;h2&gt;Explore your data&lt;/h2&gt;

&lt;p&gt;You can now head over to the &lt;a href="https://beneath.dev/?noredirect=1" rel="noopener noreferrer"&gt;web console&lt;/a&gt; and browse the data and its API docs. Mine's at &lt;a href="https://beneath.dev/epg/github-fun/table:pandas-commits" rel="noopener noreferrer"&gt;https://beneath.dev/epg/github-fun/table:pandas-commits&lt;/a&gt; (if you used the same project and table names, you can just replace my username &lt;code&gt;epg&lt;/code&gt; with your own).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6wmu2lv3cp5ccpa2bac.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6wmu2lv3cp5ccpa2bac.png" alt="explore"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also share or publish the data. Permissions are managed at the project level, so just head over to the project page and add members or flip the project settings to &lt;code&gt;public&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;Use the API&lt;/h2&gt;

&lt;p&gt;Now that the data is in Beneath, anyone with access can use the API. On the "API" tab of the table page, we get auto-generated code snippets for integrating the dataset.&lt;/p&gt;

&lt;p&gt;For example, we can load the dataframe back into Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;beneath&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;beneath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_full&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USERNAME/github-fun/pandas-commits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or we can query the REST API and get the commit info for every day in May 2021:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://data.beneath.dev/v1/USERNAME/github-fun/pandas-commits &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;index &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nv"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"day":{"_gte":"2021-05-01","_lt":"2021-06-01"}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the React hook to read data directly into the frontend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useRecords&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;beneath-react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;App&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useRecords&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;USERNAME/github-fun/pandas-commits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;index&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{"day":{"_gte":"2021-05-01","_lt":"2021-06-01"}}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the &lt;a href="https://beneath.dev/epg/github-fun/table:pandas-commits/-/api" rel="noopener noreferrer"&gt;API tab&lt;/a&gt; of my dataframe in the Beneath console to see all the ways to use the data.&lt;/p&gt;

&lt;h2&gt;That's it&lt;/h2&gt;

&lt;p&gt;That's it! We used Beneath to turn a Pandas DataFrame into an API. If you have any questions, I'm online most of the time in Beneath's &lt;a href="https://discord.gg/f5yvx7YWau" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; (I love to chat about data science, so you're also welcome to just say hi 👋). And let me know if you publish a cool dataset that I can spotlight in the featured projects!&lt;/p&gt;

</description>
      <category>python</category>
      <category>serverless</category>
      <category>datascience</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
