Extracting Publication Data from Substack

#web #substack #browser #api

If you have a Substack account, like me, you have probably wondered how your articles are stored and how to fetch them manually. This short post tries to explain that.

The endpoint that returns the JSON object describing a post is /api/v1/drafts/${post_id}, where post_id is an integer that identifies your post.

curl https://${account_name}.substack.com/api/v1/drafts/${post_id}

The headers used for this request:

Accept: */*
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: en-US
Cookie: ${your_cookie}
Priority: u=1, i
sec-ch-ua: 
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
sec-fetch-dest: 
sec-fetch-mode: cors
sec-fetch-site: same-origin
sec-gpc: 1
User-Agent: ${your_user_agent}
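
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not a reference client: the function names build_draft_request and fetch_draft are made up for this post, and it assumes that the cookie and user agent are the headers that matter (the sec-* headers are left out on that assumption). Accept-Encoding is also left out on purpose, because urllib does not decompress responses on its own.

```python
import json
import urllib.request


def build_draft_request(account_name, post_id, cookie, user_agent):
    """Build (but do not send) the GET request for one draft.

    Hypothetical helper: the URL layout comes from the post, the
    reduced header set is an assumption.
    """
    url = f"https://{account_name}.substack.com/api/v1/drafts/{int(post_id)}"
    headers = {
        "Accept": "*/*",
        "Accept-Language": "en-US",
        "Cookie": cookie,          # session cookie copied from the browser
        "User-Agent": user_agent,  # same user agent as the browser session
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_draft(account_name, post_id, cookie, user_agent, timeout=30):
    """Send the request and decode the JSON response into a dict."""
    request = build_draft_request(account_name, post_id, cookie, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_draft("myaccount", 123456, cookie, user_agent) with your own values should return the decoded JSON object if the session cookie is still valid; otherwise urlopen raises an HTTPError.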

If the request succeeds, the response is a JSON object with a large number of fields. Your post is stored in the draft_body field as stringified JSON.

{
   ...
   "draft_body": "",
   ...
}
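
Because draft_body is a string that itself contains JSON, it has to be decoded a second time. A minimal sketch, assuming the response has already been parsed into a Python dict (parse_draft_body is a name invented for this example):

```python
import json


def parse_draft_body(draft):
    """Decode the stringified JSON stored in the draft_body field.

    Returns None when the field is missing or empty, as in a draft
    that has no body yet.
    """
    raw_body = draft.get("draft_body")
    if not raw_body:
        return None
    return json.loads(raw_body)
```

The result is a regular Python object (typically a dict) that can be inspected or written back out with json.dumps.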

The body of your post can now be extracted. It is a JSON object that follows the SubstackPost Document Model. If you want more information about it, you can check the DeepWiki page on it or the "Substack document format" notes from can3p/substack-api-notes.
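
As an illustration of pulling values out of that object, here is a small sketch that collects the plain text of each top-level block. It assumes a ProseMirror-style tree in which nodes may carry a "content" list of children and text nodes carry a "text" string. That shape is an assumption on my side, so check it against your own draft_body before relying on it. The names iter_text and block_texts are made up for this example.

```python
def iter_text(node):
    """Yield every text fragment under a node, depth first.

    Assumes nodes are dicts with an optional "text" string and an
    optional "content" list of child nodes.
    """
    if not isinstance(node, dict):
        return
    text = node.get("text")
    if isinstance(text, str):
        yield text
    for child in node.get("content") or []:
        yield from iter_text(child)


def block_texts(document):
    """Return the joined text of each non-empty top-level block."""
    blocks = []
    for block in document.get("content") or []:
        text = "".join(iter_text(block))
        if text:
            blocks.append(text)
    return blocks
```

Nodes without text (images, dividers and so on) are simply skipped, which is usually what you want when you only need the prose.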

Anyway, I needed to extract a few values from it recently, and I thought this might help someone one day.

Cover Image by Tim Wildsmith on Unsplash