MatteoDiPaolo for One Beyond

Node.js for GIS: from google locations to GeoJSON using Streams

Dealing with huge files has always been a challenging task. The memory consumption this kind of processing requires is something to take into account regardless of the language we are using, and Node.js is no exception.

Let's see how Node's streams can make this task bearable even for a process with minimal memory available. Specifically, we'll take advantage of streams to run a process that converts a Google Takeout Location History JSON into a GeoJSON.

The problem

Our input is an array of locations that does not follow any Geographic Information System standard, so we want to convert it into one that does.

Google Takeout Location History input example:

{
  "locations": [
    {
      "timestampMs": "1507330772000",
      "latitudeE7": 419058658,
      "longitudeE7": 125218684,
      "accuracy": 16,
      "velocity": 0,
      "altitude": 66,
    }
  ]
}

GeoJSON output example:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [ 12.5218684, 41.9058658 ]
      },
      "properties": {
        "timestamp": "2017-10-06T22:59:32.000Z",
        "accuracy": 16,
        "velocity": 0,
        "altitude": 66
      }
    }
  ]
}

The transformation we want to perform is quite straightforward: we would like to apply the following function to each entry of the locations array.

const toGeoJson = googleTakeoutLocation => ({
  type: 'Feature',
  geometry: {
    type: 'Point',
    // E7 values are degrees multiplied by 10^7; GeoJSON expects [longitude, latitude]
    coordinates: [
      googleTakeoutLocation.longitudeE7 / 10000000,
      googleTakeoutLocation.latitudeE7 / 10000000,
    ]
  },
  properties: {
    timestamp: new Date(Number(googleTakeoutLocation.timestampMs)),
    accuracy: googleTakeoutLocation.accuracy,
    velocity: googleTakeoutLocation.velocity,
    altitude: googleTakeoutLocation.altitude
  }
})

This could be achieved using a simple Array.map(). However, if we try to process a 2GB Google Takeout Location History JSON in order to apply a map() over the locations array, we are going to face the following outcome:

  • Error message: Cannot create a string longer than 0x3fffffe7 characters
  • Error code: ERR_STRING_TOO_LONG

We are dealing with a file that is far too large to be loaded into memory at once: Node.js cannot buffer the whole file for us because of its size.
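
For context, this is roughly what the naive, in-memory approach looks like (file names here are just placeholders): it works for small exports, but throws the error above on a multi-gigabyte one.

const fs = require('fs');

// Naive approach: read the whole file into a single string...
const raw = fs.readFileSync('./location-history.json', 'utf8');

// ...parse it all at once and map over the locations array in memory
const { locations } = JSON.parse(raw);
const geoJson = {
  type: 'FeatureCollection',
  features: locations.map(toGeoJson)
};

fs.writeFileSync('./locations.geojson', JSON.stringify(geoJson));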

The solution

The only way of dealing with these huge files is a divide and conquer approach. Instead of loading them into memory all at once, we are going to create a stream of data that flows from the input file to the output one. This technique allows us to manipulate small chunks of data at a time, resulting in slow but reliable processing that is not going to eat up all our memory.

Node.js streams are the best tool to implement this technique. They allow us to create different pipes through which our data will flow, steered and manipulated according to our needs.

There are four stream (pipe) types (a toy example follows the list):

  1. Readable: data emitters, a given data source becomes a stream of data.
  2. Writable: data receivers, a given stream of data ends up into a data destination.
  3. Transform: data transformers, a given data stream is mutated into a new one.
  4. Duplex: data emitters and receivers at the same time.
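
To make the first three roles concrete, here is a tiny self-contained sketch, unrelated to our converter: a Readable emits a few numbers, a Transform doubles them and a Writable logs them.

const { Readable, Transform, Writable, pipeline } = require('stream');

// Readable: emits a fixed list of numbers as a stream
const source = Readable.from([1, 2, 3]);

// Transform: doubles every item that flows through it
const doubler = new Transform({
  objectMode: true,
  transform(chunk, _enc, done) { done(null, chunk * 2); }
});

// Writable: consumes the stream, logging each item
const sink = new Writable({
  objectMode: true,
  write(chunk, _enc, done) { console.log(chunk); done(); }
});

pipeline(source, doubler, sink, err => err && console.error(err));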

To accomplish our goal, we will rely on:

  • One readable stream (pipe) in order to get the data out of the Google Takeout Locations JSON.
  • A set of different transform streams (pipes) in order to modify our locations.
  • One writable stream (pipe) in order to store mutated locations into a GeoJSON output file.

Here is how the different pipes of our stream processing approach are going to look:

[Diagram: the input file flowing through the seven pipes described below and into the output file]

Let's see why we need so many pipes and what role each of them plays (a sketch of how they can be built follows the list):

  1. [Read] fileToStream → Input file to stream.
  2. [Transform] streamParser → Consumes text and produces a stream of data items corresponding to high-level JSON tokens.
  3. [Transform] streamPicker → A token item filter: it selects objects from the stream, ignoring the rest, and produces a stream of objects (the locations field in our case).
  4. [Transform] streamArrayer → Assumes the input token stream represents an array of objects and streams out those entries as assembled JavaScript objects (the locations array entries in our case).
  5. [Transform] streamGeoJsoner → Transforms Google Takeout locations into GeoJSON locations.
  6. [Transform] streamStringer → Stringifies GeoJSON locations.
  7. [Write] streamToFile → Stream to output file.
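
For reference, here is one way pipes 1 to 4, 6 and 7 could be built, using Node's fs streams and the stream-json package; treat it as a sketch under that assumption (file names are illustrative) rather than the exact code from the linked repository. Pipe 5, streamGeoJsoner, is shown right after.

const fs = require('fs');
const { Transform } = require('stream');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

// 1. Read the huge input file in small chunks
const fileToStream = fs.createReadStream('./location-history.json');

// 2. Tokenise the raw JSON text
const streamParser = parser();

// 3. Keep only the tokens under the "locations" key
const streamPicker = pick({ filter: 'locations' });

// 4. Re-assemble each array entry into a { key, value } JavaScript object
const streamArrayer = streamArray();

// 6. Stringify each GeoJSON feature, wrapping everything in a FeatureCollection
const streamStringer = new Transform({
  objectMode: true,
  transform({ key, value }, _enc, done) {
    const prefix = key === 0 ? '{"type":"FeatureCollection","features":[' : ',';
    done(null, `${prefix}${JSON.stringify(value)}`);
  },
  flush(done) {
    done(null, ']}');
  }
});

// 7. Write the result to the output file
const streamToFile = fs.createWriteStream('./locations.geojson');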

The actual transformation to GeoJSON happens at point five and it looks like this:

const { Transform } = require('stream');

let count = 0;

const streamGeoJsoner = new Transform({
  objectMode: true,
  transform({ key, value }, _, done) {
    // "value" is a single Google Takeout location assembled by streamArrayer
    const geoJsonFeature = toGeoJson(value);
    done(null, { key: count++, value: geoJsonFeature });
  }
});

As you can see, we are implementing our own transform pipe in order to deal with the objects coming from pipe number 4 (streamArrayer) and to apply to them the mutation defined above in the article (toGeoJson).

Now that we have all the pieces (pipes) in our hands, it is time to connect them and make our data flow through them. We are going to do that using Node's pipeline utility as follows:

const { pipeline } = require('stream');

pipeline(
  fileToStream,
  streamParser,
  streamPicker,
  streamArrayer,
  streamGeoJsoner,
  streamStringer,
  streamToFile,
  // the final callback is required: it reports either success or the first error raised by any pipe
  err => err ? console.error('Pipeline failed', err) : console.log('Pipeline succeeded')
);

Running the above pipeline is what gets us to our goal: any Google Takeout Location History JSON, no matter how big it is, can be translated into a GeoJSON while avoiding huge memory consumption.
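
As a side note, if you prefer async/await over the callback style shown above, Node.js (v15 and later) also exposes a promise-based pipeline in the stream/promises module; a minimal sketch:

const { pipeline } = require('stream/promises');

const run = async () => {
  // resolves once every pipe has finished, or rejects with the first error
  await pipeline(
    fileToStream,
    streamParser,
    streamPicker,
    streamArrayer,
    streamGeoJsoner,
    streamStringer,
    streamToFile
  );
  console.log('GeoJSON file written');
};

run().catch(console.error);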

If you are interested in the whole code you can find it here. What follows is the outcome of the described solution on 5 different input files: check out the logs and have a look at each file's size and processing time.

[Screenshot: logs for the 5 input files, showing each file's size and processing time]
