Oscar Leo

Posted on Mar 6, 2024

How I Extract Data From My Medium Stories

#python #medium #tutorial #datascience

Hi, my name is Oscar Leo, and I'm learning about technology, entrepreneurship, and online content creation.

It's a lot of fun, and if you agree, you should have a look at my free newsletter! 😄

In this post, I want to show you how I extract and use data from my stories on Medium.com.

Step 1: Go to stats and select a story

First, you choose a story of interest on your stats page. When you do, you only see the data from the current month, but you can change that using the monthly dropdown.

Next, you open the developer tools (for example, by right-clicking anywhere on the page and choosing Inspect).

If you change the selected month while the developer tools are open and go to Network → GraphQL → Response, you can see the JSON data from the server.

I have a folder for each story I want to look at, and I create a JSON file for each month, as shown in the screenshot below.

It's a bit too manual, but it's good enough since historical data doesn't change.
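Since the files are saved by hand, it's easy to copy the wrong response. Here's a minimal sketch of a sanity check you could run on a saved file. The function name `looks_like_stats_response` is hypothetical, and the expected shape is only inferred from the keys that the functions in Step 2 read, not from any official Medium documentation.

```python
import json


def looks_like_stats_response(payload):
    """Return True if payload has the keys the Step 2 functions read.

    Assumed shape (inferred from the code in this post):
    [{"data": {"post": {"earnings": {"dailyEarnings": [...]}},
               "postStatsDailyBundle": {"buckets": [...]}}}]
    """
    try:
        data = payload[0]["data"]
        earnings = data["post"]["earnings"]["dailyEarnings"]
        buckets = data["postStatsDailyBundle"]["buckets"]
    except (KeyError, IndexError, TypeError):
        return False
    return isinstance(earnings, list) and isinstance(buckets, list)


# A tiny hand-written payload, not real Medium data.
sample = json.loads("""
[{"data": {
    "post": {"earnings": {"dailyEarnings": []}},
    "postStatsDailyBundle": {"buckets": []}
}}]
""")

print(looks_like_stats_response(sample))
print(looks_like_stats_response({"errors": []}))
```

The first call prints `True` and the second prints `False`, so a file that fails the check is probably the wrong request.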

Step 2: Reading the data in Python

I use Python to look at the data. The first step is the following function, which lists all JSON files in my story folder.



import os


def get_files(story_folder):
    # Keep only files that end in .json, sorted by name.
    files = [f for f in os.listdir(story_folder) if f.endswith(".json")]
    files.sort()
    return files

I have one function that turns the earnings in the JSON data into (date, amount) pairs.



import pandas as pd


def get_earnings(data):
    # Convert each millisecond timestamp to a "YYYY-MM-DD" string.
    earnings = data["post"]["earnings"]["dailyEarnings"]
    earnings = [(
        str(pd.to_datetime(e["periodStartedAt"], unit="ms").date()), e["amount"]
    ) for e in earnings]

    return earnings
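The code treats the timestamps as milliseconds since the Unix epoch, which is why it passes `unit="ms"`. If you'd rather not depend on pandas for this one conversion, here's a minimal sketch of the same step with the standard library. `ms_to_date` is a hypothetical name, and the sketch assumes the timestamps are in UTC, which is also how pandas interprets them here.

```python
from datetime import datetime, timezone


def ms_to_date(ms):
    # Hypothetical helper: milliseconds since the Unix epoch -> "YYYY-MM-DD" (UTC).
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


# 1709683200000 ms is midnight UTC on March 6, 2024.
print(ms_to_date(1709683200000))  # 2024-03-06
```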

And one that works for all other stats.



def get_stats(data):
    stats = {}

    for d in data["postStatsDailyBundle"]["buckets"]:
        date = str(pd.to_datetime(d["dayStartsAt"], unit="ms").date())

        # Each bucket covers one day and one membership type.
        if date not in stats:
            stats[date] = {"member": {}, "nonmember": {}}

        stats[date][d["membershipType"].lower()] = {
            "readersThatRead": d["readersThatReadCount"],
            "readersThatViewed": d["readersThatViewedCount"],
            "readersThatClapped": d["readersThatClappedCount"],
            "readersThatReplied": d["readersThatRepliedCount"],
            "readersThatHighlighted": d["readersThatHighlightedCount"],
            "readersThatFollowed": d["readersThatInitiallyFollowedAuthorFromThisPostCount"],
        }

    return stats
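The result is a dictionary keyed by date, with one entry for members and one for non-members. To illustrate working with that shape, here's a minimal sketch that adds the two groups together for a single metric. `daily_total` is a hypothetical helper, and the numbers are invented.

```python
def daily_total(stats, metric):
    # Hypothetical helper: sum one metric over members and non-members per day.
    # A group stays an empty dict when there was no bucket for it that day,
    # so .get(metric, 0) counts it as zero.
    return {
        date: sum(
            groups.get(group, {}).get(metric, 0)
            for group in ("member", "nonmember")
        )
        for date, groups in stats.items()
    }


# Invented numbers in the shape get_stats returns.
example_stats = {
    "2024-03-05": {
        "member": {"readersThatRead": 12, "readersThatViewed": 30},
        "nonmember": {"readersThatRead": 3, "readersThatViewed": 41},
    },
    "2024-03-06": {
        "member": {"readersThatRead": 7, "readersThatViewed": 15},
        "nonmember": {},
    },
}

print(daily_total(example_stats, "readersThatViewed"))
# {'2024-03-05': 71, '2024-03-06': 15}
```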

I call these three functions from the following main function, which returns a dictionary with all the data for that story.



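import json
import os


def get_story_stats(story_folder):
    files = get_files(story_folder)
    story_stats = {}

    for file in files:
        # os.path.join works whether or not story_folder ends with a slash.
        with open(os.path.join(story_folder, file)) as f:
            data = json.load(f)[0]["data"]
        story_stats = {**story_stats, **get_stats(data)}
        for date, amount in get_earnings(data):
            # Guard against an earnings date that has no stats bucket.
            story_stats.setdefault(date, {})["earning"] = amount

    return story_stats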

Here's what it looks like in my Jupyter Notebook.

Now, I run my analysis or create a data visualization.
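As one example of a first analysis step, here's a minimal sketch that flattens the nested dictionary into one row per day, a shape that's easy to pass to `pd.DataFrame` or write to a CSV file. `to_rows` and the column names are hypothetical, the numbers are invented, and the earnings are in whatever unit Medium returns.

```python
def to_rows(story_stats):
    # Hypothetical helper: one flat dict per date, sorted chronologically.
    rows = []
    for date in sorted(story_stats):
        day = story_stats[date]
        row = {"date": date, "earning": day.get("earning", 0)}
        for group in ("member", "nonmember"):
            for metric, value in day.get(group, {}).items():
                row[f"{group}_{metric}"] = value
        rows.append(row)
    return rows


# Invented numbers in the shape get_story_stats returns.
example = {
    "2024-03-06": {
        "member": {"readersThatRead": 7, "readersThatViewed": 15},
        "nonmember": {"readersThatRead": 1, "readersThatViewed": 9},
        "earning": 120,
    },
    "2024-03-05": {
        "member": {"readersThatRead": 12, "readersThatViewed": 30},
        "nonmember": {},
        "earning": 250,
    },
}

rows = to_rows(example)
print(rows[0]["date"], rows[0]["member_readersThatViewed"])  # 2024-03-05 30
print(sum(row["earning"] for row in rows))  # 370
```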

Example: Visualizing my story data

I will cover the code for creating my story data visualization in another post, but here's what it looks like.

I have created visualizations like this one for my most successful stories, allowing me to detect patterns and differences quickly.

Conclusion

With a few minutes of manual labor, you can extract data from your stories quickly (until there's a better alternative).

With access to raw data, you can better understand why some of your stories perform better than others.

And you can create beautiful data visualizations that tell you everything you need to know in one look.

That's it, see you next time.

Top comments (1)

Nishu Jain • Sep 13 '24

Great article, Oscar!

Check out mediumapi.com. I think you'll love it!
