Hi Christine! That's an amazing stat (100 terabytes of data from 15 billion emails) - how do you think about growing Nylas' API? There are so many older API methodologies, and I'd imagine it's hard to extract that data from old emails.
Horizontal scaling, basically. :) Our biggest scaling bottleneck is our data storage. We've scaled that out by using "email account" as our sharding key, and each email account lives on a specific database cluster. Services talking to the database clusters then multiplex connections using a MySQL proxy service called ProxySQL: proxysql.com/
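For readers who want to picture the account-as-sharding-key idea, here is a minimal sketch. It is not Nylas' actual code: the `ShardDirectory` class, the cluster names, and the "assign new accounts to the least-loaded cluster" rule are all assumptions made for illustration. The only part taken from the thread is that every email account is pinned to one specific database cluster.

```python
class ShardDirectory:
    """Hypothetical lookup table that pins each email account to one cluster."""

    def __init__(self, clusters):
        if not clusters:
            raise ValueError("at least one cluster is required")
        self._clusters = list(clusters)
        self._assignment = {}                       # account_id -> cluster name
        self._load = {c: 0 for c in self._clusters}  # accounts per cluster

    def cluster_for(self, account_id):
        """Return the cluster for an account, assigning one on first sight."""
        cluster = self._assignment.get(account_id)
        if cluster is None:
            # Assumed placement policy: pick the cluster with the fewest accounts.
            # Ties go to the cluster listed first.
            cluster = min(self._clusters, key=lambda c: self._load[c])
            self._assignment[account_id] = cluster
            self._load[cluster] += 1
        return cluster


# Made-up cluster names, purely for the example.
directory = ShardDirectory(["mysql-cluster-a", "mysql-cluster-b"])
first = directory.cluster_for("alice@example.com")
second = directory.cluster_for("bob@example.com")
again = directory.cluster_for("alice@example.com")  # same cluster as `first`
```

Because the assignment is sticky, every query for a given account can be routed to a single cluster, which is what makes adding clusters a straightforward way to add capacity.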
It's not rocket science, but there's plenty of fun stuff to figure out when you're dealing with so much data and request volume!
Another fun fact, we don't actually use microservices at all; our application architecture is still fairly monolithic. We have different services which each have their own dedicated capacity and provisioning logic, but all services talk to the same database clusters and share a significant amount of code, like ORM models.
We're hoping to loosen the coupling between our services & our database clusters in the future by putting something like a Kafka data bus in between services that need to do extra processing on mail data, so we can fan out to more services & keep the mail ingestion logic lean. If that kind of project is exciting to you I'd love to talk. :)
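To illustrate the fan-out idea, here is a tiny in-memory sketch. It does not use Kafka or any real client library; `MailEventLog` and the group names are hypothetical. It only imitates the property that matters here: ingestion appends each event once, and every downstream service reads the same log at its own pace through its own offset.

```python
from collections import defaultdict


class MailEventLog:
    """Hypothetical append-only log with one read offset per consumer group."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)  # group name -> next index to read

    def publish(self, event):
        """Called by mail ingestion; it never needs to know who is listening."""
        self._events.append(event)

    def poll(self, group):
        """Return events this group has not seen yet and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:]
        self._offsets[group] = len(self._events)
        return batch


log = MailEventLog()
log.publish({"account": "alice@example.com", "type": "message_created"})
log.publish({"account": "bob@example.com", "type": "message_created"})

search_batch = log.poll("search-indexer")   # gets both events
analytics_batch = log.poll("analytics")     # also gets both, independently
search_again = log.poll("search-indexer")   # empty, nothing new since last poll
```

Adding another consumer is just another group name, so the ingestion path stays unchanged as more services are attached.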
There are some more details in this talk if you're curious:
Hi Christine. I'm a high school student currently working on an open source project that scaled so large we are in need of horizontal scaling. My collaborators and I have recently opted to use RabbitMQ mainly because of how simple and easy it is to use. Out of curiosity, what are some of the reasons you want to use Kafka over other message brokers?