
Robin Moffatt

Some of my favourite public data sets

Readers of a certain age and RDBMS background will probably remember the Northwind, HR, or OE databases, or quite possibly not just remember them but still be using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples, but it’s boring as heck if you’re building an example and don’t want to use the same data set for the 100th time.

I’ve written before about one of my favourite resources for mocking data, Mockaroo, and how you can even use it to stream mock data into Kafka. Other mock data generators for Kafka include kafka-connect-datagen and Voluble.

Sometimes though, you just want some real, live, warts-and-all data. Fortunately, there has been a real shift among governments and public bodies in recent years towards open data. Here is a list of some of my favourite (UK-centric) resources. Many have a mix of live and static datasets.

  • Transport for London (TfL) - Great source of data about the capital’s transport system, including lots of live feeds

  • Network Rail - a nice feed of data all about the UK rail network. I had fun with this data here :)
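
To give a flavour of what working with one of these live feeds looks like, here is a minimal Python sketch. It assumes the TfL Unified API's tube line-status endpoint (`https://api.tfl.gov.uk/Line/Mode/tube/Status`) and its `name` / `lineStatuses` / `statusSeverityDescription` fields, none of which are spelled out above, so check the current TfL documentation before relying on them. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: TfL Unified API line-status endpoint (not named in the post itself).
TFL_STATUS_URL = "https://api.tfl.gov.uk/Line/Mode/tube/Status"


def summarise_line_status(payload):
    """Map each line name to the list of its status descriptions."""
    summary = {}
    for line in payload:
        statuses = [
            status.get("statusSeverityDescription", "Unknown")
            for status in line.get("lineStatuses", [])
        ]
        summary[line.get("name", "Unknown")] = statuses
    return summary


def fetch_line_status(url=TFL_STATUS_URL, timeout=10):
    """Fetch the live feed over HTTP and summarise it (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return summarise_line_status(json.load(response))


# Hand-written sample in the assumed response shape, so the sketch runs offline.
SAMPLE = [
    {"name": "Victoria",
     "lineStatuses": [{"statusSeverityDescription": "Good Service"}]},
    {"name": "Central",
     "lineStatuses": [{"statusSeverityDescription": "Minor Delays"},
                      {"statusSeverityDescription": "Part Closure"}]},
]

if __name__ == "__main__":
    for name, statuses in summarise_line_status(SAMPLE).items():
        print(f"{name}: {', '.join(statuses)}")
```

Swapping `summarise_line_status(SAMPLE)` for `fetch_line_status()` would pull the real thing, and polling that on a timer is about the simplest way to turn a public API into a stream of events.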

What are your go-to sources for real data? Comment below or let me know on Twitter.

Top comments (3)

Simon Aubury

Worth highlighting the Splitgraph Data Delivery Network. It has 40,000 public datasets. It's essentially a PostgreSQL proxy, so you can access it with any PostgreSQL client.

Julian Nicholls • Edited

My favourite of the UK government data pages is Highways Agency Roadworks.

I have an app running here which I update each week with the latest data. Code on Github.

I've also done a local travel app here, based on the data at Transport API.

Joel Gregory

I loved the idea! Thanks for all you do for the community!