This post was originally published on my blog, jacklyons.me
Just recently I was asked to scrape a WordPress blog for a client so they could audit all of their posts. Naturally, my first thought was to just export all the posts. However, after a quick Google search I stumbled upon the WordPress REST API. The API lets you make direct requests to a WordPress site that has the REST API enabled and retrieve a list of blog posts as JSON.
Give it a try right now. Punch this into your browser and you should get a list of my 10 most recent blog posts:
https://jacklyons.me/wp-json/wp/v2/posts
It's that easy! Inside each post object there is a huge amount of data. You can extract things like post date, post status, and much more. The API documentation states that you can only retrieve a maximum of 100 posts per request. In this post I'll show you how to create a function that will get all your posts in a single go! This can be helpful when the site you're scraping has hundreds or thousands of posts.
Below I created a super simple HTML snippet that you can copy and paste into a basic HTML file. Note that I'm using some modern browser and ES2017 features so you'll have to use Chrome or Firefox. Also, it may take a little while if you are scraping a site with a few hundred or thousand posts.
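As a rough sketch of the idea (a minimal version, not necessarily the exact snippet), the function below requests 100 posts at a time and keeps going until it has read every page. The names getAllPosts and fetchFn are my own for illustration, and it assumes the site reports its page count in the X-WP-TotalPages response header. You can drop it inside a script tag in a basic HTML file.

```javascript
// Sketch: collect every post from a WordPress site, 100 per request.
// `getAllPosts` and `fetchFn` are illustrative names, not part of the WordPress API.
// `fetchFn` defaults to the browser's fetch and can be swapped out for testing.
async function getAllPosts(siteUrl, fetchFn = fetch) {
  const perPage = 100; // maximum page size the REST API allows
  const endpoint = `${siteUrl.replace(/\/+$/, '')}/wp-json/wp/v2/posts`;
  const posts = [];
  let page = 1;
  let totalPages = 1; // replaced by the real value after the first response

  while (page <= totalPages) {
    const response = await fetchFn(`${endpoint}?per_page=${perPage}&page=${page}`);
    if (!response.ok) {
      throw new Error(`Request for page ${page} failed with status ${response.status}`);
    }
    // Assumption: the page count arrives in the X-WP-TotalPages header.
    totalPages = Number(response.headers.get('X-WP-TotalPages')) || 1;
    posts.push(...(await response.json()));
    page += 1;
  }
  return posts;
}
```

To try it, call something like getAllPosts('https://jacklyons.me').then(posts => console.log(posts.length)) and give it time to finish. The requests run one after another, which is slower than firing them in parallel but keeps the code simple and is gentler on the server.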
If you have any questions, comments or feedback to improve, please just leave a comment :)
Discussion (9)
You can also get the x-wp-totalpages by making a HEAD request for the posts URL (/wp-json/wp/v2/posts/). This will return all the headers for the request and none of the content. If you need the total number of posts, there's another header you can get, x-wp-total.
To anyone stumbling across this post and trying the above code. Make sure you give enough time for the posts to be pulled. I kept thinking it wasn't working when it was actually still loading them all.
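The HEAD request tip above could look something like the sketch below. The names getPostTotals and fetchFn are hypothetical, and the per_page=100 query parameter is my own addition so that the page count matches a 100-posts-per-request loop.

```javascript
// Sketch: read the post and page totals without downloading any posts.
// `getPostTotals` and `fetchFn` are illustrative names, not part of the WordPress API.
async function getPostTotals(siteUrl, fetchFn = fetch) {
  // Assumption: per_page=100 is added so totalPages reflects 100-post pages.
  const endpoint = `${siteUrl.replace(/\/+$/, '')}/wp-json/wp/v2/posts/?per_page=100`;
  // A HEAD request returns only the response headers, no body.
  const response = await fetchFn(endpoint, { method: 'HEAD' });
  if (!response.ok) {
    throw new Error(`HEAD request failed with status ${response.status}`);
  }
  return {
    totalPosts: Number(response.headers.get('X-WP-Total')),
    totalPages: Number(response.headers.get('X-WP-TotalPages')),
  };
}
```

Knowing the totals up front also makes it easy to show progress while the posts load, which helps with the "is it still working?" problem.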
Wow! this is great! can i do this in react?
in componentDidMount(){} i guess.
regards
Sure if you wanna do this in react just pop it in a lifecycle hook. Let me know if you have any issues :)
Awesome Jack!
Quick question: How can I authenticate to access a secure site that I have a login and account for?
Not sure tbh ... but I'm sure some googling might provide an answer :) Otherwise you would need to make the content publicly accessible
Wow, a wealth of data. Thank you!
I am trying to fetch data from my free WordPress blog but it's not working
Can you share your code or the error message?