Doug Black

Posted on Oct 20, 2018

Things I Learned Building an Analytics Engine

#analytics #sidehustle #sideprojects

Oh man, am I excited. This side project has been a while coming.

I just released Engauge Analytics (https://engaugeanalytics.com/), a web analytics engine that lets you get meaningful data from your website without stalking your users like some other large entities (cough, cough, Google and Facebook, cough, cough).

What’s more, the paid tiers of Engauge allow you to get full, automated SEO evaluations of your sites, AND machine-learning driven content evaluations that help you know how your audience will react to your content BEFORE THEY VISIT.

It’s the geekiest thing I’ve ever made, and I’m crazy excited for it. But it was PAINFUL to make.

Here are some of the lessons I learned while building it.

Think About What You're Creating Before Building It

I was deep into building the UI for the app, with the analytics engine up and running in testing. A friend of mine got me a call with a friend who worked on product at a major company.

I walked him through the product from start to finish, using fancy words like "proprietary".

His first question: "So who are you making this for?"

Great question. I hadn't really thought about it.

So, side projects are fun and challenging. The biggest lesson I learned up front: if you're going to market a side project, think about who you're marketing it to.

Ask For Feedback

Two weeks before release, I invited a number of users to test it in alpha and give feedback. The feedback was awesome, and led to a lot of common-sense features that I had missed.

Some of it was a little tough to hear. This was my baby, and I was incubating it from scratch. Asking folks for feedback on how that baby should be raised was sometimes painful.

But with every suggestion came an opportunity to grow the app. Not every suggestion got implemented, but some of them showed me that the app would go nowhere without them.

The Data Got Big Quick

So, the way the app is set up, I knew the database would get big quick. I just didn't realize HOW big.

It was so big with the alpha users that it crashed the entire thing. I had to quickly scale, and think about scaling in a much bigger way.

Scaling the server and DB was easy, but I'm still not totally pleased with the result. Any feedback for a relational database that scales large and gives quick queries?

Enjoy the Journey

Maybe you've been here, too: "Side project is done, now what's next? Let's build something else! A space monitor API that looks at dark matter, utilizing....webpack!"

This seems to be the first side project in a long while (maybe ever) that hasn't given me the itch to move on immediately. This is built, and I want to see it grow and succeed and scale like crazy.

I have to admit, I'm still really learning about it. I'll post some more, maybe some how-tos, on how I built some of this. But, I know there are still areas of growth in this that I haven't even touched. Do I copyright it? Patent the algorithms? Scale to a different database structure? Multi-tenancy?

Time to keep learning!

Top comments (4)

Adrian B.G. • Oct 21 '18

Any feedback for a relational database that scales large and gives quick queries?

You can migrate to MariaDB and scale your DB. It has multiple storage engines and supports master-master replication, sharding, or traditional master-slave replication. With its columnar storage, I think it can handle petabytes of data.

But all the big analytics players use simpler key-value (columnar) solutions so they can scale horizontally. Collecting events and running crunching jobs to aggregate and enrich them works better than squeezing performance out of a SQL query.

Side projects are fun for us devs; the problem arises when we want to make money from them. Then comes all the stuff we do not want to handle, from laws to marketing, from customer support to hosting bills.
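To make the "collect events, then crunch them" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the event fields (`site`, `path`, `ts`), the hourly bucket, and the in-memory `Counter` standing in for a real rollup table. It is not how Engauge or any particular analytics product works.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw pageview events, as an append-only collector might store them.
EVENTS = [
    {"site": "a.example", "path": "/", "ts": 1540080000},
    {"site": "a.example", "path": "/", "ts": 1540080900},
    {"site": "a.example", "path": "/pricing", "ts": 1540083700},
    {"site": "b.example", "path": "/", "ts": 1540083800},
]


def hour_bucket(ts):
    """Truncate a Unix timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:00Z")


def rollup(events):
    """Batch job: count pageviews per (site, path, hour)."""
    counts = Counter()
    for event in events:
        key = (event["site"], event["path"], hour_bucket(event["ts"]))
        counts[key] += 1
    return counts


def pageviews(rollup_table, site, path, hour):
    """Dashboard read: a single key lookup, no joins and no table scan."""
    return rollup_table.get((site, path, hour), 0)


ROLLUP = rollup(EVENTS)
```

The point of the design is that the expensive work happens once, in the batch job, and the dashboard only ever does key lookups against the small rollup.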

Doug Black • Oct 21 '18

Thank you! I went with Percona off the bat for this one...it's just soooooo fast.

Tell me more about the key-value solutions! This may be just what I'm looking for!

Adrian B.G. • Oct 21 '18

At an abstract level:

If you get rid of the relationships and use simple documents, you can shard better with specialized stores like Cassandra.

Sharding a SQL database usually requires getting rid of relationships and joins. Even when it does not, it adds overhead, because queries have to fetch and group data from different shards in a cascading effect.

If the sharding algorithm has to take data relationships into account and keep related data as local as possible, you will end up with hot spots and unbalanced shards.

I'm not saying it is impossible to scale SQL; I'm saying it will be harder and more expensive. If you can afford Spanner from Google, a big Vitess setup, or 5-8 big servers behind Galera, go ahead!

Bottom line: if you want to go beyond a few TB of data, I would suggest rethinking your structure in a columnar way that is less SQL-ish.
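A rough Python sketch of the hot-spot problem, under made-up assumptions: four shards, one hypothetical "big-site" producing 900 of 1,000 events, and an MD5-based hash as the placement function. Real systems use their own partitioners, so treat this only as an illustration of the trade-off.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a string key to a shard using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical workload: one busy site produces most of the events.
events = [("big-site", i) for i in range(900)]
events += [(f"small-site-{n}", i) for n in range(10) for i in range(10)]

# Locality-first: shard on the site, so all rows for a site stay together.
by_site = Counter(shard_for(site) for site, _ in events)

# Spread-first: shard on site plus event id, so the busy site is split up.
by_event = Counter(shard_for(f"{site}:{i}") for site, i in events)

print("rows per shard, keyed by site: ", dict(by_site))
print("rows per shard, keyed by event:", dict(by_event))
```

Keyed by site, one shard ends up holding at least the 900 rows of the busy site. Keyed by event, the load evens out, but any query for a single site now has to touch every shard.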

Doug Black • Oct 21 '18

This is a great place to start! Thank you!