
Introduction to Metabase

Peter Gundel ・ 4 min read

When you start a company, setting up a business intelligence (BI) solution is rarely the first thing the founding team implements. Now that I'm past the step of implementing one myself, I think it is something that should be done in the very early days to build a culture of measuring everything.

The main reason it wasn't implemented from the beginning at my company was that no one had experience with a good solution yet. Reports were therefore run manually against the database in raw SQL, or built into the internal admin area to make them a bit more accessible (without graphs, of course, because that would have cost too much time). As you might guess, this was quite some work and couldn't be changed or extended quickly.

In the fall of 2015, AWS QuickSight was announced. It seemed like a very good solution for us, as our databases are hosted on AWS. In August 2016 we got access to the preview version. It wasn't really easy to use, and the user management was not very good. Reports and data sources weren't shared across users, or I didn't find the right way to do it. In most cases we use UUIDs as our primary keys, and AWS QuickSight had no support for UUIDs back then. This led to me not putting more time and effort into AWS QuickSight. When it was finally released, I took another look, but I still wasn't satisfied with the look and feel. (This is very subjective and might be different if I gave it another try today!)

Around the time of the AWS QuickSight release, my former CTO at fotograf.de, Marco Beinbrech, showed me their BI solution, an open source software called Metabase. He told me that it is super simple to set up and easy to use for all team members (especially for non-technical team members 😉).

And indeed, he was right. Metabase provides different ways to deploy it anywhere. Most of our servers were on Heroku then, and Metabase offers a one-click deployment to Heroku. Pretty neat! But at that time we were also taking our first steps with Kubernetes, and Metabase seemed to be a perfect fit for playing around with it. So we wrote the Kubernetes objects for it and voilà, it was up and running. No hassle at all!

Further setup

A collection of things you should consider when setting up Metabase

Using the production PostgreSQL user creates a security risk because that user can change data in the database. As Metabase is only for reporting, and therefore only reads data, we created a read-only PostgreSQL user for our production database.
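As a sketch, such a read-only user can be created in PostgreSQL like this (the role, password, and database names are placeholders; adjust the schema to your setup):

```sql
-- Read-only role for Metabase: can connect and SELECT, nothing else
CREATE ROLE metabase_readonly WITH LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE production TO metabase_readonly;
GRANT USAGE ON SCHEMA public TO metabase_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO metabase_readonly;
-- Make sure tables created in the future are readable too
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO metabase_readonly;
```

The `ALTER DEFAULT PRIVILEGES` line matters: without it, the role would not see tables added after it was created.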

It is not a good idea to give Metabase direct access to your production database. When you create huge and slow queries, it will directly affect your production servers. Fortunately, AWS RDS makes it super easy to create a read replica of your production database.
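If you manage your infrastructure from the command line, creating such a replica is a single AWS CLI call (the instance identifiers here are placeholders):

```shell
# Create a read replica of the production instance for reporting
aws rds create-db-instance-read-replica \
  --db-instance-identifier production-replica \
  --source-db-instance-identifier production-db
```

You then point Metabase at the replica's endpoint instead of the primary.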

Metabase needs a database of its own that it can read from and write to, in order to manage data about users, reports, and so on. To make this database persistent across Kubernetes deployments, we created another database on our RDS instance. This way, the Metabase data is also covered by our current backup procedure. You just have to provide the PostgreSQL connection details to the Metabase deployment in Kubernetes and it works out of the box. Then you can create and rebuild the cluster without losing any Metabase-internal data.
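The connection details are passed via Metabase's `MB_DB_*` environment variables. A sketch of the relevant part of the Deployment manifest (the host, database, and secret names are illustrative):

```yaml
# Excerpt from a Metabase Deployment: application database on RDS
containers:
  - name: metabase
    image: metabase/metabase
    env:
      - name: MB_DB_TYPE
        value: postgres
      - name: MB_DB_HOST
        value: my-instance.abc123.eu-central-1.rds.amazonaws.com
      - name: MB_DB_PORT
        value: "5432"
      - name: MB_DB_DBNAME
        value: metabase
      - name: MB_DB_USER
        valueFrom:
          secretKeyRef:
            name: metabase-db
            key: user
      - name: MB_DB_PASS
        valueFrom:
          secretKeyRef:
            name: metabase-db
            key: password
```

Keeping the credentials in a Kubernetes Secret rather than in the manifest itself means the manifest can live in version control.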

We use Google Apps at my company, and Metabase has a Google OAuth login built in. So you just have to follow the guide, and all your colleagues can register with their Google Apps accounts.

User management can be handled with user groups and collections (folders for your reports in Metabase). Marco gave me the tip to set up collections and user groups according to the department structure. This way you can implement user access on a collection level very easily.

First steps

I highly recommend reading the Metabase docs. They are really good, especially for the first steps.

In Metabase, reports are called questions. So to create your first report, you click on New Question, select the database and table, add filters, specify what you want to see (e.g. count of users), and add a grouping (e.g. created_at by month). All of that happens in a simple click interface that can be used by many users, not only the SQL pros in the company. (It still lets you write custom SQL queries.)
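The example question above corresponds to SQL you could also run via Metabase's custom SQL mode; the table and column names here are assumptions:

```sql
-- Count of users per month, grouped by created_at
SELECT date_trunc('month', created_at) AS month,
       count(*) AS user_count
FROM users
GROUP BY 1
ORDER BY 1;
```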

I love it! 😊

Finally, you can put multiple questions on a dashboard and have a nice overview of what's going on in the company.

Conclusion

All in all, Metabase was really easy to set up for the tech team. If we had used Heroku, it would have been even easier. And most importantly, it is really easy and fun to use for all team members. 😊

Thanks, Marco, for all the tips 😊 In future posts I will go into detail on how we create advanced reports at store2be.


Hi there, we're store2be, a Berlin-based startup that builds a SaaS-enabled marketplace for short-term retail space. If you like what we are posting, you might want to check out the store2be tech page or follow our Medium channel.

Posted on Apr 6 '17 by Peter Gundel (@peterfication), CTO at store2be.com in Berlin. Ruby ninja with JS experience, interested in Rust and Kubernetes.

Discussion

Metabase is not just for startups. At my company we use both Metabase and Power BI for different purposes: the former for personalized insights, alerts, and data exploration; the latter for more formal reporting for executives and managers, as well as for tracking company goals.

 

Nice, cool to know :) I will try to have a look at Power BI as well.

 

Our team's DBA uses it as a daily monitoring tool :D
He handles almost 30 DBs: Oracle, MSSQL, and MariaDB.

 

Does Metabase store the whole database in a local file, a cache, or in memory to speed up queries, or does it go to the database on every request?

 

When I started using Metabase, I think there was no cache, so every request hit the database. The current version has a cache, but it caches the query results, not the database itself. I mitigated this issue by creating a read replica of my database.