This page is an ode to the middle: properly scaled, efficient, and manageable systems. It pays homage to building services that are useful and profitable, and that won't leave you crying at midnight.
Who I am: I am Camille Clayton. I have 8 years of experience in system design and engineering, with some application development, user training, documentation writing, and a few other twists and turns along the way. Through this article, I hope to share what I've learned about building human-centered systems, question the status quo of monolith vs. microservices, and share a bit of love for well-rounded systems!
What Is Unique About a Medium System
- A medium system has essential business functions, clear business value, and known limits on user input and interaction. Its external constraints are known and manageable. This does not apply to public, user-facing apps with an unbounded number of users that must scale under unexpected load; see the Slashdot effect.
- A medium system may need to scale dramatically, but usually at planned or known times, like financial systems during tax season.
- A medium system may once have been a large-scale system, especially in government, finance, scientific research, and other industries that adopted computing early. Because user counts and logic needs in these industries stay relatively static, they may not need major increases in system resources, especially for internal systems.
Know Your Environment
- If you aren't at the scale of Google or Amazon, where you're processing millions or even billions of transactions per second, you can likely get away with a simpler design pattern. This will save you time, energy, and headaches.
- If you are near that scale or actively facing scaling problems, read Google's and Amazon's whitepapers on their design choices. There are clear reasons why these large, complex, global-scale systems exist, and the papers also show when and where those designs are worth adopting. Ex: CDNs (Content Delivery Networks).
Be Willing to Sacrifice
- Not every service, piece of code, or server you stand up will still be useful a year or more down the road.
- Work to develop a process to sunset services, deprecate systems, and annotate documentation when these items are no longer in use.
- Keeping your system lean makes it easier to develop new features and minimizes the upgrading and patching you spend on underused or unused systems.
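A sunsetting process starts with knowing what is idle. Here's a minimal sketch of flagging sunset candidates from a service inventory, assuming you already record an owner and a last-used date for each service. The `Service` fields, the one-year threshold, and the example services are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Service:
    name: str
    owner: str
    last_used: date  # hypothetical: last request, job run, or login on record


def sunset_candidates(services: list[Service], today: date, idle_days: int = 365) -> list[Service]:
    """Return services with no recorded use in the last `idle_days` days, oldest first."""
    cutoff = today - timedelta(days=idle_days)
    idle = [s for s in services if s.last_used < cutoff]
    return sorted(idle, key=lambda s: s.last_used)


if __name__ == "__main__":
    inventory = [
        Service("billing-api", "payments", date(2024, 5, 1)),
        Service("legacy-report-job", "analytics", date(2022, 11, 3)),
        Service("old-upload-site", "web", date(2023, 1, 15)),
    ]
    for s in sunset_candidates(inventory, today=date(2024, 6, 1)):
        print(f"{s.name} (owner: {s.owner}) idle since {s.last_used}")
```

Running it lists `legacy-report-job` and then `old-upload-site`. Treat the output as a conversation starter with the owning teams, not an automatic shutdown list.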
Design the Systems You Need
- It's easy to be fascinated by new tooling, paradigms, and technologies, but the reality is that most problems already have a well-established software solution.
- By relying on known technologies, and ones you and your team are comfortable with, you will significantly reduce your internal friction and ramp up time. In the long term, this gives you a runway for future innovation.
- Dan McKinley talks more about relying on foundational technologies in their essay "Choose Boring Technology."
Choose a Queueing Service
- Pick one, understand its quirks, set up alerts on it, and never let anyone enable a second queueing service.
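Once you've picked your one queue, alert on the same few signals every time. A minimal sketch, assuming you can read queue depth, the age of the oldest message, and the dead-letter count from whatever queue you chose; how you fetch those numbers depends on the product, and `QueueStats` and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    depth: int              # messages waiting to be processed
    oldest_age_s: float     # age of the oldest unprocessed message, in seconds
    dead_letter_count: int  # messages that exhausted their retries


# Hypothetical thresholds; tune them to your own traffic.
MAX_DEPTH = 10_000
MAX_OLDEST_AGE_S = 300.0
MAX_DEAD_LETTERS = 0


def check_queue(name: str, stats: QueueStats) -> list[str]:
    """Return a human-readable alert for each threshold the queue breaches."""
    alerts = []
    if stats.depth > MAX_DEPTH:
        alerts.append(f"{name}: depth {stats.depth} > {MAX_DEPTH}")
    if stats.oldest_age_s > MAX_OLDEST_AGE_S:
        alerts.append(f"{name}: oldest message {stats.oldest_age_s:.0f}s old")
    if stats.dead_letter_count > MAX_DEAD_LETTERS:
        alerts.append(f"{name}: {stats.dead_letter_count} dead-lettered messages")
    return alerts


if __name__ == "__main__":
    print(check_queue("orders", QueueStats(depth=12_000, oldest_age_s=42.0, dead_letter_count=3)))
```

Alerting on the age of the oldest message, not just depth, catches a stalled consumer even when traffic is low and the queue looks short.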
You Don't Need 2 Databases
- If you're deciding whether to expand your schema or add a second database, you likely have a different underlying need. In that case, you should hire a DBA or a data consultant.
- Unless you're writing millions of transactions a second and can accept desynchronized data in exchange for faster writes, you're better off reevaluating your current schema.
- Look at optimization technologies like Redis for caching, or schema management tools such as Flyway or Liquibase.
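Evolving one schema in small, tracked steps is usually cheaper than running a second database. Below is a minimal sketch of versioned, append-only migrations in the spirit of Flyway or Liquibase, using Python's built-in `sqlite3`. It is not either tool's API, and the `customers` table, the `schema_version` table, and the migration list are hypothetical.

```python
import sqlite3

# Ordered, append-only schema history: (version, SQL). Hypothetical example tables.
# Never edit a migration that has already run; add a new one instead.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customers_email ON customers (email)"),
]


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply migrations newer than the recorded version; return the versions applied.

    Expects a connection opened with isolation_level=None so the explicit
    BEGIN/COMMIT statements below control the transactions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    current = conn.execute("SELECT COALESCE(MAX(version), 0) FROM schema_version").fetchone()[0]
    applied = []
    for version, sql in MIGRATIONS:
        if version <= current:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(sql)  # SQLite DDL is transactional, so this rolls back on failure
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        applied.append(version)
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(conn))  # first run applies every migration
    print(migrate(conn))  # second run has nothing left to do
```

Because each migration and its version row commit together, a failed step rolls back cleanly and rerunning `migrate` picks up where it left off.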
Automation is Essential
- The longer your team takes to decide on an automation platform, or if they split into multiple platforms, the longer each revision to infrastructure and code will take.
- Automation is a keystone for the functionality or dysfunctionality of your teams. Automation platforms should be boring, foundational tools that let you quickly create higher-level products with predictable, repeatable results.
- If your tooling is not repeatable, treat that as a roadblock and prioritize clearing it.
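Repeatability mostly comes down to idempotence: running the same automation twice should leave the system in the same state. Here's a minimal sketch of the plan/apply pattern used by declarative infrastructure tools, with a plain dict standing in for real infrastructure; the keys and values are hypothetical.

```python
from typing import Optional

# (verb, key, value); value is None for deletes.
Action = tuple[str, str, Optional[str]]


def plan(desired: dict[str, str], current: dict[str, str]) -> list[Action]:
    """Compute only the changes needed to move `current` to `desired`."""
    actions: list[Action] = []
    for key, value in sorted(desired.items()):
        if key not in current:
            actions.append(("create", key, value))
        elif current[key] != value:
            actions.append(("update", key, value))
    for key in sorted(current.keys() - desired.keys()):
        actions.append(("delete", key, None))
    return actions


def apply_plan(actions: list[Action], current: dict[str, str]) -> dict[str, str]:
    """Return a new state with the planned actions applied; `current` is not mutated."""
    state = dict(current)
    for verb, key, value in actions:
        if verb == "delete":
            del state[key]
        else:
            state[key] = value
    return state


if __name__ == "__main__":
    desired = {"app_servers": "3", "log_retention_days": "30"}
    current = {"app_servers": "2", "old_cron_host": "1"}
    first = plan(desired, current)
    print(first)
    state = apply_plan(first, current)
    print(plan(desired, state))  # a second run has nothing to do
```

The empty second plan is the property to protect: if rerunning your automation ever produces new changes on its own, it isn't repeatable yet.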
Everything You Add Will Need to Be Maintained
- Whenever a system, service, app, or site is being developed, questions need to be asked. Who will maintain this new thing? What team will it be dedicated to, and will it be shared amongst development groups? Will it be owned by a platform team?
- This should be decided early on, documented, and sent out via email, with each party explicitly confirming.
- Deciding these things after something has been implemented can devolve into months-long turf wars that no one wants or needs.
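Those answers are easier to enforce when they live in a service catalog instead of an inbox. A minimal sketch of a check you could run before a new service ships, assuming a hypothetical catalog where each entry records the owning team, who confirmed ownership, and where the decision is written down:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    name: str
    owning_team: Optional[str] = None   # team accountable for maintenance
    confirmed_by: Optional[str] = None  # who explicitly accepted ownership
    decision_doc: Optional[str] = None  # link to where the decision is documented


REQUIRED_FIELDS = ("owning_team", "confirmed_by", "decision_doc")


def ownership_gaps(entries: list[CatalogEntry]) -> dict[str, list[str]]:
    """Map each service name to the ownership fields it is still missing."""
    gaps = {}
    for entry in entries:
        missing = [field for field in REQUIRED_FIELDS if not getattr(entry, field)]
        if missing:
            gaps[entry.name] = missing
    return gaps


if __name__ == "__main__":
    catalog = [
        CatalogEntry("billing-api", "payments", "payments-lead", "https://wiki.example.com/billing-api"),
        CatalogEntry("new-dashboard", owning_team="analytics"),
    ]
    for name, missing in ownership_gaps(catalog).items():
        print(f"{name} is missing: {', '.join(missing)}")
```

A CI job could fail whenever `ownership_gaps` returns anything, so the ownership question gets answered before launch rather than after.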
Know Your Users
- Listen to your users. Figure out what they're using and whether they have any pain points.
- Happy users will let you test new systems out and innovate accordingly. Unhappy users will keep you pinned down and reworking until it's right.
Work With Your Stakeholders
- In consulting, this is usually an external role. In medium to large companies, it's usually internal and goes by many names: Product Manager, Project Manager, Application Owner, and more. Regardless of title, this person or group holds the keys between your system and its end users. Having them on your team's side is vital to keeping objectives focused and clear, and to aligning your business and teams for success.
- If you're an engineer, an architect, or otherwise involved in the day-to-day of building and maintaining a system, developing a good working relationship with this person is crucial. It helps you get buy-in for the hard stuff, like refactoring and removing tech debt, and gives you room to innovate beyond changing the colors of buttons.
- Invest in your stakeholders' work happiness, since they hear from management and users directly. They often know about additional aggregate pain points, even if they don't bring them up at every sprint review. Even if you can't work on those issues today, sorting out the longer-term ones can help keep them from becoming active fires down the road, and can let you mitigate them incrementally when there's time between sprints.
Share Your On-Call Rotation
- Historically, on-call was the responsibility of Ops folks. Throughout the 2010s, blog posts and thinkpieces debated whether Devs should be part of the rotation and whether SREs should guide how rotations are formed. Having worked through various rotations since 2014, I've come to a simpler conclusion: everyone should be part of the rotation, including non-technical managers and analysts.
- I call this the Town Crier Method, and my key insight is that if everyone is getting woken up for issues, those issues are more likely to get long-term resolutions. This even includes folks who can't directly resolve the issue but can help contact additional engineers and document the incident. No one wants to be responsible for a non-functional system.
- There is also camaraderie to be found when a wider group helps resolve issues, and it makes critical incidents visible to stakeholders and non-technical team members in a clear and immediate way.
- Finally, the larger an on-call rotation is, the lighter the yearly impact on everyone's free time. In a weekly rotation with a primary and a backup, each person is on call for roughly 73 days a year on a 10-person team, but only about 36 days a year on a 20-person team.
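The rotation arithmetic is easy to check. A minimal sketch, assuming a 52-week year, exactly two people on call each week (a primary and a backup), and shifts spread evenly across the team:

```python
WEEKS_PER_YEAR = 52
DAYS_PER_WEEK = 7


def on_call_days_per_person(team_size: int, slots_per_week: int = 2) -> float:
    """Average days per year each person spends on call.

    Assumes a weekly rotation with `slots_per_week` people on call at once
    (here a primary and a backup) and shifts spread evenly across the team.
    """
    if team_size < slots_per_week:
        raise ValueError("need at least as many people as on-call slots per week")
    total_person_weeks = WEEKS_PER_YEAR * slots_per_week
    return total_person_weeks * DAYS_PER_WEEK / team_size


if __name__ == "__main__":
    for size in (10, 20):
        print(f"{size} people: {on_call_days_per_person(size):.1f} days on call per year")
```

That works out to 72.8 days for 10 people and 36.4 for 20. Because the yearly load is fixed, doubling the rotation always halves everyone's share.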
Photo by Marvin Meyer on Unsplash