<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stefan Wagner</title>
    <description>The latest articles on DEV Community by Stefan Wagner (@waquner).</description>
    <link>https://dev.to/waquner</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F170061%2F4127d23e-6ae0-4be0-b405-15beb3374a78.jpg</url>
      <title>DEV Community: Stefan Wagner</title>
      <link>https://dev.to/waquner</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/waquner"/>
    <language>en</language>
    <item>
      <title>POC Custom Street View for Vienna</title>
      <dc:creator>Stefan Wagner</dc:creator>
      <pubDate>Tue, 04 Apr 2023 20:32:52 +0000</pubDate>
      <link>https://dev.to/waquner/poc-custom-street-view-for-vienna-2hii</link>
      <guid>https://dev.to/waquner/poc-custom-street-view-for-vienna-2hii</guid>
      <description>&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;p&gt;A POC of a Street View client using open data: &lt;a href="https://webkappa.karten.wien"&gt;https://webkappa.karten.wien&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;In 2020 my beautiful hometown, the City of Vienna, Austria ("Stadt Wien"), completed mapping its streets with lidar scanners and cameras mounted on cars.&lt;/p&gt;

&lt;p&gt;The good thing: &lt;strong&gt;the collected data is completely Open Data&lt;/strong&gt;, so you can create your own "Street View" based on it, completely independent of Google. Even more useful things can be built from the laser scanner data; the City of Vienna plans to use it e.g. for traffic planning.&lt;/p&gt;

&lt;p&gt;I tried to create a "Street View" client based on the provided images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data
&lt;/h2&gt;

&lt;p&gt;The images were taken by 6 cameras, roughly 250 megapixels in total.&lt;/p&gt;

&lt;p&gt;Data can be requested via an online tool - after some hours or days of internal processing, you get a link to a .tar file containing images and metadata:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--djYKQbio--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4d86qij2wtngv0mgf7vy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--djYKQbio--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4d86qij2wtngv0mgf7vy.png" alt="directory structure of example tar file" width="323" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The file &lt;code&gt;image_meta.txt&lt;/code&gt; is a TSV of all images taken for the requested set:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8ciW8oms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hgd6kdlwkfum4u8mcm03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8ciW8oms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hgd6kdlwkfum4u8mcm03.png" alt="image_meta.txt TSV" width="880" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains the following columns (unused columns are omitted):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;trajectory_id&lt;/code&gt; ID of trajectory (part of folder name)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sensor_id&lt;/code&gt; ID of the camera (essential to determine the camera's direction, i.e. its face in the cubemap - see below)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;x_m/y_m&lt;/code&gt; Position of the car (projected coordinates, EPSG:31256)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rz_rad&lt;/code&gt; Z-rotation = facing of the car in radians&lt;/li&gt;
&lt;/ul&gt;
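
&lt;p&gt;Parsing that TSV is straightforward - a minimal sketch (the column handling is an assumption; the real file contains more columns than listed above):&lt;/p&gt;

```javascript
// Minimal sketch: parse image_meta.txt into one record per image.
// Column names follow the list above; the real file has more columns.
function parseImageMeta(tsv) {
  const lines = tsv.trim().split('\n');
  const header = lines.shift().split('\t');
  return lines.map(function (line) {
    const values = line.split('\t');
    const record = {};
    header.forEach(function (name, i) { record[name] = values[i]; });
    return record;
  });
}
```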

&lt;p&gt;The photos are stored in such a way that a "cubemap" consisting of 6 single photos can be assembled (negative and positive X, Y, Z) - the "sensor" (camera) IDs map to cubemap faces as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sensor id ending in 0: positive Y&lt;/li&gt;
&lt;li&gt;ending in 1: positive Z&lt;/li&gt;
&lt;li&gt;ending in 2: positive X&lt;/li&gt;
&lt;li&gt;ending in 3: negative Z&lt;/li&gt;
&lt;li&gt;ending in 4: negative X&lt;/li&gt;
&lt;li&gt;ending in 5: negative Y&lt;/li&gt;
&lt;/ul&gt;
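
&lt;p&gt;As a minimal sketch (the short face names px/nx/... follow the filename scheme used later in this post):&lt;/p&gt;

```javascript
// Sensor-to-face mapping from the list above, indexed by the
// last digit of the sensor id.
const FACE_BY_LAST_DIGIT = ['py', 'pz', 'px', 'nz', 'nx', 'ny'];

function cubemapFace(sensorId) {
  return FACE_BY_LAST_DIGIT[sensorId % 10];
}
```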

&lt;p&gt;The unfolded cubemap would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HF99n3Oy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/huoes9zkennbq50k1pd8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HF99n3Oy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/huoes9zkennbq50k1pd8.jpg" alt="cubemap positions" width="600" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each photo has a resolution of 7130x7130 pixels and is about 5 MB, so the complete cubemap is up to 60 MB. Even about &lt;strong&gt;100 meters&lt;/strong&gt; of street therefore amounts to about &lt;strong&gt;2 GB of raw photo data&lt;/strong&gt; (one cubemap every ~3 meters). That's why for this POC the images are &lt;strong&gt;scaled down&lt;/strong&gt; to 2048x2048 later.&lt;/p&gt;
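
&lt;p&gt;A quick back-of-the-envelope check of these numbers:&lt;/p&gt;

```javascript
// One cubemap every ~3 meters, up to 60 MB per cubemap,
// over 100 meters of street.
const mbPerCubemap = 60;
const metersPerCubemap = 3;
const streetMeters = 100;
const totalGB = (streetMeters / metersPerCubemap) * mbPerCubemap / 1024;
// totalGB is roughly 2
```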

&lt;h2&gt;
  
  
  Compiling the data
&lt;/h2&gt;

&lt;p&gt;Enough theory, let's get physical.&lt;/p&gt;

&lt;p&gt;A node.js script was created to do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve list of images from &lt;code&gt;image_meta.txt&lt;/code&gt; TSV&lt;/li&gt;
&lt;li&gt;Scale images down to 2048x2048 using graphicsmagick and rename to a unique filename (&lt;code&gt;&amp;lt;hash of position&amp;gt;_{p,n}{x,y,z}.jpg&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Transform position of cubemap to EPSG:4326&lt;/li&gt;
&lt;li&gt;Create a JSON file for each cubemap to store meta data (e.g. facing)&lt;/li&gt;
&lt;li&gt;Save position, image filenames to postgres/postgis database&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The backend
&lt;/h2&gt;

&lt;p&gt;Now that we have the data in place, we need a small backend to provide the frontend with 2 pieces of information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get all possible positions to mark them on a map&lt;/li&gt;
&lt;li&gt;Get the nearest position based on a given coordinate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As all the information for that is already in our database, a small Express-based backend was set up quickly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One API route just queries &lt;code&gt;SELECT array[st_y(position), st_x(position)] as ll FROM cubemaps&lt;/code&gt; and returns the positions.&lt;/li&gt;
&lt;li&gt;The second route does basically the same, but orders by distance to the given position using &lt;code&gt;st_distance&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
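
&lt;p&gt;The ordering of the second route could look like this (a sketch - the exact SQL in the POC may differ; $1/$2 are node-postgres placeholders for longitude/latitude):&lt;/p&gt;

```javascript
// Same columns as the first route, but ordered by distance to the
// given point, assuming positions are stored as EPSG:4326 points.
function nearestPositionQuery() {
  return [
    'SELECT array[st_y(position), st_x(position)] as ll',
    'FROM cubemaps',
    'ORDER BY st_distance(position, st_setsrid(st_makepoint($1, $2), 4326))',
    'LIMIT 1',
  ].join(' ');
}
```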

&lt;h2&gt;
  
  
  The frontend
&lt;/h2&gt;

&lt;p&gt;A full screen map based on &lt;strong&gt;leaflet&lt;/strong&gt; was set up. It initially retrieves all possible positions and displays them as blue circles using &lt;code&gt;L.circle&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ddEgSAlX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0zzgwh6fraiqkbc11y12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ddEgSAlX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0zzgwh6fraiqkbc11y12.png" alt="Base map with positions" width="636" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A click on the map gets the nearest position from the backend and opens a panorama viewer based on &lt;strong&gt;three.js&lt;/strong&gt; and &lt;strong&gt;panolens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eik1uCeQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dokvynvhzxt2oppxdqqe.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eik1uCeQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dokvynvhzxt2oppxdqqe.gif" alt="Panoroma based on Cubemap" width="600" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A small map in the bottom right corner shows the current facing of the panorama viewer and also all other possible positions within 500 meters.&lt;/p&gt;
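
&lt;p&gt;The 500 meter filter for the mini map can be sketched with a plain haversine distance (assuming [lat, lng] pairs in EPSG:4326 - filtering server-side with st_distance would work just as well):&lt;/p&gt;

```javascript
const EARTH_RADIUS_M = 6371000;

function toRad(deg) {
  return deg * Math.PI / 180;
}

// Great-circle distance in meters between two [lat, lng] pairs.
function haversineMeters(a, b) {
  const dLat = toRad(b[0] - a[0]);
  const dLng = toRad(b[1] - a[1]);
  const h = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRad(a[0])) * Math.cos(toRad(b[0])) * Math.pow(Math.sin(dLng / 2), 2);
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Keep only the positions within maxMeters of the center.
function positionsWithin(center, positions, maxMeters) {
  return positions.filter(function (p) {
    return maxMeters >= haversineMeters(center, p);
  });
}
```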

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0ljMEIoH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oxkcl2gjzusrqywshh33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0ljMEIoH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oxkcl2gjzusrqywshh33.png" alt="Mini map with positions" width="305" height="308"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jyLy22Qm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l80vxikych25czpkczlz.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jyLy22Qm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l80vxikych25czpkczlz.gif" alt="Mini map animated" width="120" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Navigation is done by clicking the positions there. You can also navigate by clicking inside the cubemap, but that part isn't finished yet and is kind of buggy right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where can I see it?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_TCEs_v7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vqp1nv58eydw9q3j7elo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_TCEs_v7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vqp1nv58eydw9q3j7elo.png" alt="Result" width="880" height="659"&gt;&lt;/a&gt; Here (It's german, sorry): &lt;a href="https://webkappa.karten.wien"&gt;https://webkappa.karten.wien&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Still todo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cleanup, cleanup, refactor, refactor&lt;/li&gt;
&lt;li&gt;Think about mobile version&lt;/li&gt;
&lt;li&gt;Find memory leaks (switching between cubemaps is done pretty dumb)&lt;/li&gt;
&lt;li&gt;Import more Cubemaps (Unfortunately the download tool seems to be overloaded currently)&lt;/li&gt;
&lt;li&gt;Make navigation within the panorama possible (like Google does it...)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Thanks!
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Creator and maintainers of &lt;a href="https://github.com/Leaflet/Leaflet"&gt;leaflet&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Creator and maintainers of &lt;a href="https://pchen66.github.io/Panolens/"&gt;panolens.js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anita Graser for exploring the 2020 sample data in her &lt;a href="https://anitagraser.com/2021/09/25/exploring-viennas-street-level-lidar-kappazunder-data-sample/"&gt;blog article&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The City of Vienna for making this data open!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>opendata</category>
      <category>node</category>
    </item>
    <item>
      <title>Archive PostGIS location data and keep it queryable with AWS Athena</title>
      <dc:creator>Stefan Wagner</dc:creator>
      <pubDate>Wed, 23 Nov 2022 20:40:10 +0000</pubDate>
      <link>https://dev.to/waquner/backuparchive-postgis-location-data-and-keep-it-queryable-3hfe</link>
      <guid>https://dev.to/waquner/backuparchive-postgis-location-data-and-keep-it-queryable-3hfe</guid>
      <description>&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;AWS Athena is great :)&lt;/p&gt;

&lt;h2&gt;
  
  
  The task
&lt;/h2&gt;

&lt;p&gt;Back up tens of millions of car location records stored in PostgreSQL/PostGIS, but keep them easily queryable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side condition:&lt;/strong&gt; Use as little code as possible&lt;/p&gt;

&lt;h2&gt;
  
  
  The input
&lt;/h2&gt;

&lt;p&gt;A PostgreSQL/PostGIS table with more or less the following structure&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id: integer
timestamp: timestamp
location: postgis.geometry(Point, 4326)
speed: double
carid: integer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the time of starting the task, the table held tens of millions of rows, bloating indexes and making queries very slow.&lt;/p&gt;

&lt;p&gt;The data is essential, but accessed rarely, especially for data older than a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The postgres query
&lt;/h2&gt;

&lt;p&gt;After some struggling with Athena data types, I finally got the following query to back up the data and created a view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE VIEW v_locations_athena as 
SELECT 
to_char(TIMESTAMP, 'YYYY-MM-DD HH24:MI:ss') AS "timestamp" /* timestamp in a format Athena understands */,
postgis.st_x(LOCATION) AS longitude /* Longitude */, postgis.st_y(LOCATION) AS latitude /* Latitude */,
speed,
carid
FROM locations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The export
&lt;/h2&gt;

&lt;p&gt;A simple bash script was created that runs the query above, exports the result to CSV, gzips it, and transfers it to an AWS S3 bucket&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql -c "\copy (SELECT * from v_locations_athena where timestamp &amp;lt; '2022-08-01')  TO '/tmp/locations_2022-08-01.csv' DELIMITER ',' CSV"
gzip /tmp/locations_2022-08-01.csv
aws s3 cp /tmp/locations_2022-08-01.csv.gz s3://bucketname/locations/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Athena table
&lt;/h2&gt;

&lt;p&gt;On Athena, I created a table with the same structure as the exported CSV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create EXTERNAL TABLE IF NOT EXISTS locations_live.locations (
  `timestamp` timestamp,
  `longitude` double,
  `latitude` double,
  `speed` float,
  `carid` int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://bucketname/locations/'
TBLPROPERTIES ('has_encrypted_data'='false'); 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--diV93KhU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z03b4nupt1d9314suzaz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--diV93KhU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z03b4nupt1d9314suzaz.png" alt="Athena table in Athena mgmt interface" width="358" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: use the same delimiter in both the Athena table definition and the CSV export!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Athena query
&lt;/h2&gt;

&lt;p&gt;Now that everything is in place, we can start querying the data - and since we use Athena, &lt;strong&gt;without setting up any kind of server!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example to query all locations within 100 meters of a specific point within a given time period:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT
*
FROM locations_live.locations 
where timestamp &amp;gt;=cast('2022-01-01' as timestamp) and timestamp &amp;lt; cast('2022-02-01' as timestamp)
and st_distance(to_spherical_geography(st_point(longitude, latitude)), to_spherical_geography(st_point(12.3456, 49.12345))) &amp;lt; 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Athena result
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c-qLxSz_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1albrqeeop8n51ndw1hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c-qLxSz_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1albrqeeop8n51ndw1hz.png" alt="Athena result showing location data" width="880" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, tens of millions of rows were scanned in under 30 seconds without any server setup (ignore the speed column, it had the wrong unit :-) ) - and all that for about &lt;strong&gt;$0.006&lt;/strong&gt;&lt;/p&gt;
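
&lt;p&gt;For reference, the pricing math (the 1.2 GB scan size is an assumed value for illustration, not the real size of this query):&lt;/p&gt;

```javascript
// Athena bills roughly $5 per TB of data scanned.
function athenaQueryCostUSD(scannedGB) {
  const usdPerTB = 5;
  return (scannedGB / 1024) * usdPerTB;
}
```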

&lt;h2&gt;
  
  
  The cronjob
&lt;/h2&gt;

&lt;p&gt;Now all I need to do is upload a CSV e.g. every month and delete the old data from the database, and I'm able to query old data at a very low price (~$0.0245 per GB per month in the EU)&lt;/p&gt;
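
&lt;p&gt;What the cronjob could generate, parameterized by cutoff date (a sketch using the placeholder paths and bucket from above):&lt;/p&gt;

```javascript
// Build the three export commands for a given cutoff date.
// "not (timestamp >= cutoff)" keeps exactly the rows older than the cutoff.
function exportCommands(cutoff) {
  const file = '/tmp/locations_' + cutoff + '.csv';
  const copy = "\\copy (SELECT * from v_locations_athena " +
    "where not (timestamp >= '" + cutoff + "')) " +
    "TO '" + file + "' DELIMITER ',' CSV";
  return [
    'psql -c "' + copy + '"',
    'gzip ' + file,
    'aws s3 cp ' + file + '.gz s3://bucketname/locations/',
  ];
}
```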

&lt;h2&gt;
  
  
  The Good
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Low pricing for storage (~ &lt;strong&gt;$0.0245 per GB per month&lt;/strong&gt; in the EU)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No effort or cost to set up server&lt;/strong&gt; / query infrastructure&lt;/li&gt;
&lt;li&gt;Low pricing for queries (~ &lt;strong&gt;$5 per TB scanned&lt;/strong&gt;) - so the example query above costs only about &lt;strong&gt;$0.006&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bad
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If this were personal customer data (it's not), it would cause GDPR trouble - even if an EU region is selected for S3 and Athena, Amazon is a US company, so unfortunately a no-go in terms of GDPR.&lt;/li&gt;
&lt;li&gt;Query times vary a bit&lt;/li&gt;
&lt;li&gt;Query &lt;strong&gt;results are automatically stored to S3&lt;/strong&gt; again - so pay attention to that!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gzip/compress your data&lt;/strong&gt; - scanning is faster and pricing is lower (you always pay per scanned byte, so less data -&amp;gt; lower price). Additionally, the S3 upload is faster. So it's a win/win/win situation :)&lt;/li&gt;
&lt;li&gt;pay attention to &lt;strong&gt;timestamp formats&lt;/strong&gt; - this bugged me a lot!&lt;/li&gt;
&lt;li&gt;Athena has different SerDe (Serialization/Deserialization) options - I chose a simple CSV, but for future use I will definitely play with one of these: &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/serde-about.html"&gt;https://docs.aws.amazon.com/athena/latest/ug/serde-about.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The outlook
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Leverage partitioning (&lt;a href="https://docs.aws.amazon.com/athena/latest/ug/partitions.html"&gt;https://docs.aws.amazon.com/athena/latest/ug/partitions.html&lt;/a&gt;) by splitting the CSV by year/month/day(?) to speed up queries and reduce costs even more&lt;/li&gt;
&lt;li&gt;I started doing the same for a table with heavy use of jsonb fields - interesting new problems there :)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>athena</category>
      <category>postgres</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
