
Timothy Spann 🇺🇦

Originally published at datainmotion.dev

QuickTip: Ingesting Google Analytics API with Apache NiFi

Design your query / test the API here:

https://ga-dev-tools.appspot.com/query-explorer/

Building this NiFi flow is trivial.

Add your URL with tokens from the Query Explorer console.
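
As a rough sketch of what that URL looks like, the Python below assembles a Core Reporting API v3 request from the same parameters that appear in the example results later in this post. The `build_ga_url` helper and the `access_token` query parameter are assumptions for illustration; copy the actual URL and token from Query Explorer.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in the example response's "selfLink".
GA_V3_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_url(view_id, metrics, start_date, end_date, access_token=None):
    """Assemble a query URL (hypothetical helper, not part of NiFi)."""
    params = [
        ("ids", view_id),
        ("metrics", ",".join(metrics)),
        ("start-date", start_date),
        ("end-date", end_date),
    ]
    if access_token:
        # Assumption: the token is passed as a query parameter.
        params.append(("access_token", access_token))
    # Keep ':' and ',' unescaped so the URL matches the form shown in the results.
    return GA_V3_ENDPOINT + "?" + urlencode(params, safe=":,")


url = build_ga_url(
    "ga:33",
    ["ga:users", "ga:percentNewSessions", "ga:sessions"],
    "30daysAgo",
    "yesterday",
)
print(url)
```

The resulting string is what you would paste into the URL property of the processor that makes the HTTP call.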

You will need to reference the JRE that NiFi is using and its cacerts file if you don't want to build your own trust store. The default cacerts password for JDK 8 is changeit. No, really.
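
To make the trust store step concrete, here is a minimal sketch that looks for the bundled cacerts file under a Java home directory. It assumes the usual layouts (`jre/lib/security/cacerts` in a JDK 8 install, `lib/security/cacerts` in a standalone JRE or a newer JDK). `find_cacerts` is a hypothetical helper, and the path on your system may differ.

```python
import os
from pathlib import Path

# Relative locations where cacerts is usually found (assumed layouts).
CANDIDATES = (
    Path("jre") / "lib" / "security" / "cacerts",  # JDK 8 install
    Path("lib") / "security" / "cacerts",          # JRE, or JDK 9 and later
)


def find_cacerts(java_home):
    """Return the first existing cacerts path under java_home, or None."""
    for relative in CANDIDATES:
        candidate = Path(java_home) / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # JAVA_HOME is an assumption; point this at the JRE NiFi actually runs on.
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        print(find_cacerts(java_home))
    else:
        print("JAVA_HOME is not set")
```

The path it prints, together with the changeit password, is what you would supply as the trust store in NiFi.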

Here are our results in clean JSON.

Here are some attributes NiFi shows.

Example JSON Results

{
  "kind": "analytics#gaData",
  "id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "query": {
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "ids": "ga:33",
    "metrics": [
      "ga:users",
      "ga:percentNewSessions",
      "ga:sessions"
    ],
    "start-index": 1,
    "max-results": 1000
  },
  "itemsPerPage": 1000,
  "totalResults": 0,
  "selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "profileInfo": {
    "profileId": "333",
    "accountId": "333",
    "webPropertyId": "UA-333-3",
    "internalWebPropertyId": "33",
    "profileName": "monitorenergy.blogspot.com/",
    "tableId": "ga:33"
  },
  "containsSampledData": false,
  "columnHeaders": [
    {
      "name": "ga:users",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    },
    {
      "name": "ga:percentNewSessions",
      "columnType": "METRIC",
      "dataType": "PERCENT"
    },
    {
      "name": "ga:sessions",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    }
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}
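
As an illustration of how this response can be read outside NiFi, the sketch below pairs each entry in `columnHeaders` with its value in `totalsForAllResults` and converts it according to `dataType`. The trimmed `SAMPLE` payload and the `summarize_totals` helper are illustrative, and the type mapping (INTEGER to int, everything else to float) is an assumption.

```python
import json

# Trimmed copy of the response above; only the fields used here are kept.
SAMPLE = """
{
  "columnHeaders": [
    {"name": "ga:users", "columnType": "METRIC", "dataType": "INTEGER"},
    {"name": "ga:percentNewSessions", "columnType": "METRIC", "dataType": "PERCENT"},
    {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"}
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}
"""


def summarize_totals(payload):
    """Map each metric name to its total, converted from the string form."""
    totals = payload.get("totalsForAllResults", {})
    summary = {}
    for header in payload.get("columnHeaders", []):
        name = header["name"]
        if name not in totals:
            continue
        raw = totals[name]
        # Assumption: INTEGER columns parse as int, all other types as float.
        summary[name] = int(raw) if header["dataType"] == "INTEGER" else float(raw)
    return summary


print(summarize_totals(json.loads(SAMPLE)))
```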

You should see far more data, depending on what your Google Analytics property is tracking. From here you can use QueryRecord or another record processor to automatically convert, query, or route this data. You can infer a schema, or build a permanent one and store it in Cloudera Schema Registry. I recommend the permanent schema if this is a frequent process.
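
If you do build a permanent schema, one possible starting point is to derive an Avro record schema from `columnHeaders`. The sketch below is only an example under stated assumptions: the record name `GaTotals`, the helper names, and the mapping from Google Analytics data types to Avro types are all illustrative choices. Avro field names cannot contain `:`, so `ga:users` becomes `ga_users`.

```python
import json
import re

# Assumed mapping from Google Analytics dataType values to Avro primitive types.
GA_TO_AVRO = {"INTEGER": "long", "PERCENT": "double", "FLOAT": "double"}


def avro_field_name(ga_name):
    """Replace characters Avro does not allow in names with underscores."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", ga_name)
    # Avro names must start with a letter or underscore.
    return name if re.match(r"[A-Za-z_]", name) else "_" + name


def schema_from_headers(column_headers, record_name="GaTotals"):
    """Build an Avro record schema (as a dict) from columnHeaders entries."""
    fields = [
        {
            "name": avro_field_name(header["name"]),
            # Unknown types fall back to string rather than guessing.
            "type": GA_TO_AVRO.get(header["dataType"], "string"),
        }
        for header in column_headers
    ]
    return {"type": "record", "name": record_name, "fields": fields}


headers = [
    {"name": "ga:users", "columnType": "METRIC", "dataType": "INTEGER"},
    {"name": "ga:percentNewSessions", "columnType": "METRIC", "dataType": "PERCENT"},
    {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"},
]
print(json.dumps(schema_from_headers(headers), indent=2))
```

Because the field names are sanitized, the incoming JSON keys would need to be renamed to match before a record reader could apply a schema like this.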

Download a reference NiFi flow here:

https://github.com/tspannhw/flows

References:

https://developers.google.com/analytics/devguides/reporting/core/v4

https://developers.google.com/analytics
