There are several reasons you may want to obtain municipal data.
You can present them nicely on your website in the form of an interactive map, or you can perform some geospatial analysis, like finding distances to the closest facilities.
Or maybe you just want to play with the F# language on real, tangible datasets?
In this post, I explain how to get any data from OpenStreetMap, convert it to geojson, and parse it for the places you want, so you can take them and have further adventures with F# and real data.
You can download the notebook here
* You still have to generate the geojson file on your own, as described in the post.
Open Street Map / Open City Data
OpenStreetMap is an excellent source of city data that are absent from, or not covered well enough by, the open data portals maintained by city governments.
This is especially useful for small cities and villages (that don't have a data portal at all) or for quickly getting data that have potential commercial value.
Be aware that, in contrast to open city data, OSM data are not complete and hence not perfect for many business scenarios. The data are most often injected by plain users, which puts the OSM data schema robustness ...at risk.
Even if you are among the lucky citizens whose data portal shines (please check the ~200 geojson files for Rostock (Germany): https://www.opendata-hro.de/dataset/?res_format=GeoJSON), you can still find OSM valuable and complementary to it.
Examples
I'm using data for London, as it is a huge city that still lacks some publicly available open data. Also, the data volume itself requires special steps not needed for smaller areas. I parse rivers, addresses, shops, and leisure; however, any facility for any city can be obtained in the same (or almost the same) manner.
What I present here is no rocket science, just grabbing data, converting to geojson, and parsing. I'm sharing it because it can be tedious for someone who has never worked with geojson in C#/F#.
Yes, you could write an idiomatic functional parser or a "type provider" as well; however, I'm not sure it would make working with these data more pleasant and approachable to the average programmer (taking into account the OSM data schema flaws).
Getting data from Open Street Map
Pythonistas have plenty of OSM (and all the relevant geospatial) libraries at their disposal, and there are some libraries for dotnet as well.
However, I like to work primarily with raw geojson files as it is a well-known, human-readable format so I can go through the file to understand how to process the wanted data.
Go to OSM and you will see some map tiles.
You have at least two options to select the city that is relevant to you:
- scroll to cover the wanted area and press export -> overpass API
- or type the name of the wanted area and then click 'export'.
Be aware that these options have size limits: you cannot download very large areas this way. For example, I was not able to download the data for the whole of London.
In such scenarios, you can download the appropriate dataset from one of the many sources of pre-generated files. For London you can find it here:
https://download.geofabrik.de/europe/great-britain/england/greater-london.html
Converting to geojson
After the download, we have a ~1.5 GB OSM file.
To convert it to geojson I'm using an npm package called osmtogeojson. For those who don't work with nodejs on a daily basis, it is very simple: just install the tool globally (npm install -g osmtogeojson).
https://github.com/tyrasd/osmtogeojson
For the London file I get an out-of-memory error, so I have to run the program with a larger Node.js memory limit, following the clue from the documentation.
The conversion takes ~1-2 minutes for such a large file.
For smaller cities it can be simpler to download the data in *.shp format and convert it to geojson via mapshaper.org, https://mygeodata.cloud/, ArcGIS, or many, many other options.
Now that we have a file, we need something to process it with.
In my case, it is the .NET Interactive extension for Visual Studio Code.
.NET Interactive
You can now open the downloaded notebook in Visual Studio Code.
The notebook (*.dib) I provide expects the geojson file to be in the same place (otherwise please adjust the path accordingly).
The very first notebook cell loads the geojson file and parses it to get the items. Almost 2 million items are processed in 6 seconds. Not that bad.
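For readers without the notebook at hand, a loading cell of this kind could look roughly like the sketch below. It is not the notebook's actual code: the function names and the file name in the usage comment are hypothetical, and it assumes the converted file is a regular GeoJSON FeatureCollection with a top-level "features" array.

```fsharp
open System.IO
open System.Text.Json

/// Returns the items of the top-level "features" array of a parsed FeatureCollection.
let featuresOf (doc: JsonDocument) : JsonElement list =
    doc.RootElement.GetProperty("features").EnumerateArray() |> Seq.toList

/// Parses a geojson file from disk (streamed, so no giant string is built).
let loadFeatures (path: string) : JsonElement list =
    use stream = File.OpenRead path
    // The document is deliberately not disposed here:
    // the returned JsonElement values point into its memory.
    let doc = JsonDocument.Parse(stream)
    featuresOf doc

// Usage (hypothetical file name, adjust to your own file):
// let features = loadFeatures "greater-london.geojson"
// printfn "%d items" (List.length features)
```

Keeping the items as raw JsonElement values means nothing has to be mapped to types up front, which suits a schema as loose as OSM tags.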
Now get some city goodies, starting with ...water.
But only rivers that have some name assigned.
This ...TryGetProperty.. from the JSON deserializer is not perfect here and could definitely be wrapped somehow, but here I just want to show how things work without a DSL.
The output of the code depends on the extensions you have installed. Unless you have some renderer extension from the marketplace, you will see just raw text.
My favorite one is the Unfolded Map Renderer extension created by Taras Novak; however, it has been made private recently (unless you are a supporter, as I am). You can still use one of his many other extensions available here:
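As an illustration of such a wrapper, here is a minimal sketch. The helper name tryProp is hypothetical, and it assumes that each feature keeps its OSM tags as flat string values under "properties" (for example "waterway" and "name"); check your own file, as the layout depends on how the conversion was run.

```fsharp
open System.Text.Json

/// Some value when `key` exists as a string under the feature's "properties", otherwise None.
let tryProp (key: string) (feature: JsonElement) : string option =
    match feature.TryGetProperty("properties") with
    | true, props when props.ValueKind = JsonValueKind.Object ->
        match props.TryGetProperty(key) with
        | true, v when v.ValueKind = JsonValueKind.String -> Some(v.GetString())
        | _ -> None
    | _ -> None

/// Keeps only features tagged waterway=river that also carry a name.
let namedRivers (features: JsonElement list) : JsonElement list =
    features
    |> List.filter (fun f ->
        tryProp "waterway" f = Some "river" && Option.isSome (tryProp "name" f))
```

Returning an option keeps the filter readable and avoids exceptions for the many features that simply lack a given tag.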
https://marketplace.visualstudio.com/publishers/RandomFractalsInc
If you want to store the result of your filter in a separate geojson file, you can write it out and then host it on your website, or open and play with it in any other tool like geojson.io.
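One possible way to write such a file is sketched below; the function name saveFeatures and the output file name in the usage comment are made up for illustration. It wraps the selected items in a new FeatureCollection using Utf8JsonWriter from System.Text.Json.

```fsharp
open System.IO
open System.Text.Json

/// Wraps the given features in a FeatureCollection and writes it to `path`.
let saveFeatures (path: string) (features: JsonElement seq) : unit =
    use stream = File.Create path
    use writer = new Utf8JsonWriter(stream)
    writer.WriteStartObject()
    writer.WriteString("type", "FeatureCollection")
    writer.WriteStartArray("features")
    for f in features do
        f.WriteTo(writer)          // copies the original feature JSON unchanged
    writer.WriteEndArray()
    writer.WriteEndObject()
    writer.Flush()

// Usage (hypothetical names): saveFeatures "rivers.geojson" rivers
```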
You can query whatever facility you want, using as many functions from the collection modules as you want.
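As a sketch of what such a query might look like for shops, the snippet below filters grocery shops and counts items per shop type with plain List functions. The function names are hypothetical, and it assumes the OSM "shop" tag is stored as a flat string under "properties".

```fsharp
open System.Text.Json

/// Some value when `key` exists as a string under the feature's "properties", otherwise None.
let tryProp (key: string) (feature: JsonElement) : string option =
    match feature.TryGetProperty("properties") with
    | true, props when props.ValueKind = JsonValueKind.Object ->
        match props.TryGetProperty(key) with
        | true, v when v.ValueKind = JsonValueKind.String -> Some(v.GetString())
        | _ -> None
    | _ -> None

/// Supermarkets, optionally together with the much smaller convenience stores.
let groceryShops (includeConvenience: bool) (features: JsonElement list) =
    features
    |> List.filter (fun f ->
        match tryProp "shop" f with
        | Some "supermarket" -> true
        | Some "convenience" -> includeConvenience
        | _ -> false)

/// Number of features per shop type, most common first.
let shopCounts (features: JsonElement list) : (string * int) list =
    features
    |> List.choose (tryProp "shop")
    |> List.countBy id
    |> List.sortByDescending snd
```

Switching to another facility is then mostly a matter of changing the tag name and the values in the filter.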
Random addresses
What about addresses? Well, I rarely need to process them this way, as municipal portals typically provide them or they are available via Open Addresses. Apparently London offers neither (I believe this is because of the private sector etc.).
Addresses in OSM are (to the best of my knowledge) NOT standalone objects; they are just attributes of associated objects.
This can end up with many objects having the same address and, more importantly, they can be either points or polygons. And we most likely want unified shapes, like points.
First, let's get them with a familiar approach that additionally removes duplicates.
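A rough sketch of such a step follows. It assumes the usual OSM address tags "addr:street", "addr:housenumber" and "addr:city" are present as flat string properties; the Address record and the function names are hypothetical and only serve the illustration.

```fsharp
open System.Text.Json

/// Some value when `key` exists as a string under the feature's "properties", otherwise None.
let tryProp (key: string) (feature: JsonElement) : string option =
    match feature.TryGetProperty("properties") with
    | true, props when props.ValueKind = JsonValueKind.Object ->
        match props.TryGetProperty(key) with
        | true, v when v.ValueKind = JsonValueKind.String -> Some(v.GetString())
        | _ -> None
    | _ -> None

type Address = { Street: string; Number: string; City: string option }

/// An address needs at least a street and a house number; the city is optional.
let tryAddress (feature: JsonElement) : Address option =
    match tryProp "addr:street" feature, tryProp "addr:housenumber" feature with
    | Some street, Some number ->
        Some { Street = street; Number = number; City = tryProp "addr:city" feature }
    | _ -> None

/// Pairs each address with the first feature that carries it and drops the repeats.
let distinctAddresses (features: JsonElement list) : (Address * JsonElement) list =
    features
    |> List.choose (fun f -> tryAddress f |> Option.map (fun a -> a, f))
    |> List.distinctBy fst
```

Records compare structurally in F#, so List.distinctBy on the Address value is enough to collapse several objects that share one address.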
Are these 113k addresses the correct number of real addresses? I don't think so, but it is still a useful dataset for experiments.
Using NetTopologySuite for geospatial analysis
We have already done a lot, but in order to do more we have to introduce a dedicated library/package. In dotnet it is NetTopologySuite.
You can achieve a lot from the geospatial perspective with this library; here we just want to get centroids for polygons. Additionally, I'm getting rid of unwanted properties, keeping only street, number and city.
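To give an idea of the centroid step in isolation, here is a small sketch written as a plain F# script (run with dotnet fsi) rather than as a notebook cell. The toPoint helper and the hand-made square are hypothetical stand-ins; reading the real features with the geojson reader is left out.

```fsharp
#r "nuget: NetTopologySuite"

open NetTopologySuite.Geometries

let factory = GeometryFactory()

/// Reduces any geometry to a single point: points stay as they are,
/// everything else (e.g. a building polygon) is replaced by its centroid.
let toPoint (geometry: Geometry) : Point =
    match geometry with
    | :? Point as p -> p
    | other -> other.Centroid

// A 2 x 2 square (closed ring) standing in for a building outline.
let square =
    factory.CreatePolygon(
        [| Coordinate(0.0, 0.0); Coordinate(2.0, 0.0); Coordinate(2.0, 2.0)
           Coordinate(0.0, 2.0); Coordinate(0.0, 0.0) |])

let center = toPoint square
printfn "centroid: %f %f" center.X center.Y   // the middle of the square, (1, 1)
```

Note that a centroid can fall outside an oddly shaped polygon; for address points used in experiments this is usually acceptable.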
Important remark!
NetTopologySuite doesn't work nicely with .NET Interactive; it will hang forever unless you are using an old interactive version.
So if you want to run the rest of the cells, you have to temporarily roll back to the mentioned version:
There is a workaround suggested by the .NET Interactive team to make it work with the latest version; I will update the post as soon as I know how to apply it.
We started to process data with System.Text.Json.
NetTopologySuite has an equivalent package for processing geojson called NetTopologySuite.IO.GeoJSON4STJ. However, here I'm using the regular one, as "STJ" doesn't work properly with my advanced processing (not covered here). Hopefully everything will be unified at some point.
*Let's pick some random addresses.*
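A simple way to draw such a sample is sketched below; the function name sample and its parameters are hypothetical. It shuffles the list by attaching a random key to every item and keeps the first few, with a fixed seed so that a run can be repeated.

```fsharp
/// Picks up to `count` random items from `items`; the same seed gives the same picks.
let sample (seed: int) (count: int) (items: 'T list) : 'T list =
    let rng = System.Random(seed)
    items
    |> List.map (fun item -> rng.Next(), item)   // attach a random sort key
    |> List.sortBy fst                           // shuffle by that key
    |> List.map snd
    |> List.truncate count

// Usage (hypothetical name): let someAddresses = sample 42 1000 addresses
```

Sorting the whole list is not the cheapest way to sample, but for a few hundred thousand items it is fast enough and easy to read.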
Please note that when using a visualization library built on WebGL (like unfolded), displaying hundreds of thousands of items is very smooth.
What next
I presented how to obtain and process geojson municipal data.
Actually, it should be called "preprocessing", as I haven't touched any processing that is interesting from the user's perspective. Now we could connect this with actual "domain" processing, join it with other data sources, and add true F# expressiveness on top of that.
But this is for another story.
Top comments (4)
Really nice work.
Great to see you describing these data filtering tricks for the F# community. I've never done any F# myself, but I remember giving a workshop using the supermarket example myself way back in 2012.
Here I was describing the use of the "osmosis" tool, which lets you apply some filters to OSM data. osmconvert also does this. Folks might find that approach more useful if the geojson conversion you describe is too heavyweight (out-of-memory errors etc). Instead take the "osm.pbf" data, and use a separate command line tool to filter it down to the shop=supermarket tag before converting to the less compact geojson.
"The data are most often injected by plain users, which puts the OSM data schema robustness ...at risk".
Contributed by folks like you and me! If you didn't try contributing to OpenStreetMap yet, give it a go. e.g. maybe you can spot some little individual data fixes that can be made when doing this kind of filtering. In fact supermarkets are a good example of data that should be pretty complete and well mapped across the UK I would expect. They're an important type of POI, which are not too numerous for the community manage to reach a good >99% complete level. As you've mentioned in your code samples, you might also want to include shop=convenience in a supermarket analysis (depending on use case) which will pick up much smaller shops, but... mapping of these will be less complete. We need more folks joining in mapping such small shops in some areas of London!
The London OpenStreetMap community (of mappers and data users) get together in a pub quite often. Follow OSMLondon if you fancy it!
Thanks for the hints how to work better with OSM tooling as I barely know it. I can see that your notes will be valuable during my next adventures.
I wanted to work directly with the whole geojson just to find out what it contains through plain text search. That is how I chose supermarkets, seeing that the dataset is not only large but also very tangible. Apparently a good choice, as you did the same.
My primary goal was to enable playing with the F# language on tangible, day-to-day data and to switch between them immediately (for example by changing the property name in the filter in the F# notebook). However, if they/I decide to stick to a particular dataset, then using the suggested way to convert will be very handy.
Indeed, where I live (Wrocław) there are also a lot of small shops missing, which is a pity, as nearby facilities can matter for applications that improve/analyze neighborhoods etc.
I'm preparing other samples for London and I joined the group, thanks for sharing!
Hey, there is also OsmSharp github.com/OsmSharp/core for reading/writing from/to different OSM formats.