The principles for writing software are the same, but the development process itself is quite different for small versus large applications. Reading, fixing, and building on a small codebase is easier and more forgiving than on a large one. I have worked on small client projects, greenfield applications, and large-scale systems serving thousands of users. The difference is that, when there is already a fully operational machine running, mistakes are more costly and downtimes harder to recover from. Therefore, processes, guardrails, and constant alignment become part of everyday development.
Most of the concepts I cover here are applicable to any part of web development, but as a frontend engineer, my focus and examples will stay in that domain, especially in the modularization topic.
Stating the challenge: multiple people working on a very large codebase
Having a disorganized codebase is a problem in itself, but the problem multiplies when there are dozens of people working on the same project, which is often the case for large-scale applications. It can be developers from the same team, engineers from other teams, external contributors, or cross-functional staff engineers. Without some level of coordination, the following problems tend to emerge:
| Problem | Description | Strategy |
|---|---|---|
| Regression | A team ships a feature or refactors code that unknowingly breaks another team's work | Unit tests for logic, integration tests for module contracts, and E2E for critical user journeys, all enforced in CI |
| Code duplication | Similar features are implemented independently across teams, leading to diverging logic | Shared modules for common components, hooks, and utilities |
| Inconsistency | Standards around formatting, code style, and tooling are not enforced uniformly | CI guardrails: linters, formatters, shared tsconfig, path alias enforcement |
| Unintended Changes | A developer modifies another team's module without awareness, causing unintended behaviour changes | CODEOWNERS with enforced review from the owning team on every change (and a responsive review process) |
| Obscurity | Onboarding, patterns, and code conventions are unclear or scattered across different places | Central documentation covering setup, architecture decisions, and code style |
| Staging overwrite | A branch deployment to staging silently overrides another team's active deployment | Per-branch preview environments instead of a shared staging instance |
This is by no means an exhaustive list of the challenges of working on a large-scale application. These are some of the most common issues I have personally come across over the years, along with strategies that have worked. But everything in software development is a trade-off, so no solution is a 100% fit for every case.
That said, I’ll discuss these issues and the strategies for solving them in three groups: modularization, documentation, and CI guardrails and CD practices.
Modularization
In “A Philosophy of Software Design”, Ousterhout describes two ways of programming: strategic and tactical. Simply put, tactical programming focuses on shipping features and fixing bugs, while strategic programming focuses on protecting the “long-term structure of the system”. Both approaches are widely used in the industry, but the author argues that, although slower in the beginning, strategic programming pays off, as tactical programming accumulates debt that slows teams down over the long term.
Applying a good, well-thought-out architecture is one of the principles of strategic programming, and one of the areas I chose to focus on is modularization.
What seems like a straightforward matter of developing features becomes hard when dozens of engineers and multiple teams are all committing to the same repository. Without proper coordination, this can cause issues such as code duplication, feature overwrite, tightly coupled modules, and regressions with each change.
What's worth clarifying is that modules in frontend development look different from those in backend development. Usually, everything related to the application lives in a single codebase, as it needs to be bundled, compiled, and run in the client. Libraries are loaded and bundled together, and communication between modules happens via shared state, props, or event patterns at runtime. Common architecture styles such as layered, microservices, and MVC are adapted to the frontend.
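To make the event pattern concrete, here is a minimal sketch of a typed event bus that lets one module react to another's events without importing its internals. The event names and `AppEvents`/`EventBus` types are illustrative, not part of any real library:

```typescript
// A minimal typed event bus: modules communicate through events
// instead of importing each other's internals. Names are illustrative.
type AppEvents = {
  "checkout:completed": { orderId: string };
  "search:performed": { query: string };
};

class EventBus {
  private handlers: Record<string, Array<(payload: unknown) => void>> = {};

  on<K extends keyof AppEvents>(
    event: K,
    handler: (payload: AppEvents[K]) => void
  ): void {
    (this.handlers[event] ??= []).push(handler as (payload: unknown) => void);
  }

  emit<K extends keyof AppEvents>(event: K, payload: AppEvents[K]): void {
    for (const handler of this.handlers[event] ?? []) handler(payload);
  }
}

// The search module can react to checkout events without knowing
// anything about the checkout module's implementation.
const bus = new EventBus();
const received: string[] = [];
bus.on("checkout:completed", ({ orderId }) => received.push(orderId));
bus.emit("checkout:completed", { orderId: "A-1" });
```

In practice, a framework's state management or an established pub/sub library would play this role; the point is that the contract between modules is the event type, not a direct import.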
Our modules - not to be confused with UI components - are units of business logic, each grouping all the code that belongs to a single domain concern. Clear module boundaries matter.
Modularization is the process of organizing your codebase into loosely coupled, self-contained pieces of code. Total independence between them is impossible, so the goal is to keep dependencies explicit and minimal. It is worth noting that the word 'module' is overused in JavaScript - ES modules, npm packages, and CommonJS modules all use the same term. Here, I use it to mean a domain-ownership unit.
To understand how we apply it in frontend, let’s take an example.
Consider this folder structure (modules by feature):
```
src/
  features/
    checkout/
      components/
      hooks/
      utils/
      types/
      constants/
      index.ts
    search/
      components/
      hooks/
      utils/
      types/
      index.ts
  shared/
    components/
    hooks/
    utils/
```
And this one (modules by type):
```
src/
  components/
    checkout/
    search/
    shared/
  hooks/
    checkout/
    search/
    shared/
  utils/
    checkout/
    search/
    shared/
  types/
    checkout/
    search/
  constants/
    checkout/
    search/
```
Although they look similar, the first makes the separation of responsibilities explicit in the file system (Modules by Feature). If team A owns the Checkout flow, and team B owns the Search, they know which folder they can change and which folders require coordination with other teams. The second structure also allows for that (Modules by Type), but feature-specific code is scattered across the codebase, which leads to more mistakes and slower cleanups.
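In the feature-based layout, each feature's `index.ts` can act as the module's public API, so its internals stay private by convention. A sketch, where the specific exports and the `@/features` alias are assumptions:

```typescript
// src/features/checkout/index.ts - the checkout module's public surface.
// Anything not re-exported here is considered internal to the module.
export { CheckoutPage } from "./components/CheckoutPage";
export { useCheckout } from "./hooks/useCheckout";
export type { CheckoutState } from "./types";

// Other teams consume the module only through this barrel:
// import { useCheckout } from "@/features/checkout";
```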
The diagram below illustrates the difference in how code is organized under each approach.
In Modules by Type, when a codebase grows to cover ten or twenty features, the components/ folder might contain hundreds of files. Some of those components belong to the checkout flow, some to the search experience, and some are shared across features. Finding everything related to a single feature requires jumping between components/, hooks/, utils/, and types/ and knowing which files in each folder belong to which feature.
In Modules by Feature, by contrast, finding everything related to checkout means opening one folder. Deleting a feature means deleting one folder. With this structure, it is also possible to set up CODEOWNERS, which is important when multiple teams work in the same codebase. Whenever team A changes code under team B’s responsibility, team B is notified and added as a required reviewer, and the PR cannot merge without their approval (provided that branch protection rules are configured to require code owner approval).
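For instance, a CODEOWNERS file for the feature-based layout above could look like this (the team handles are hypothetical):

```
# Each team owns its feature folder; shared code is owned by a platform team.
/src/features/checkout/  @org/checkout-team
/src/features/search/    @org/search-team
/src/shared/             @org/frontend-platform
```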
On top of that, code duplication can be avoided by using Shared Modules (utilities, components, or anything that might be useful for the entire system). But discovery is the hard part - which is why clear module ownership and documentation also matter here. That's where documentation becomes the connective tissue that makes modular structure legible at scale.
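Module boundaries can also be enforced mechanically. A sketch using ESLint's built-in `no-restricted-imports` rule to forbid deep imports into another feature's internals; the `@/features` path alias is an assumption about the project's setup:

```javascript
// eslint.config.js (flat config) - modules may only be consumed
// through their public index.ts, never via deep imports.
export default [
  {
    rules: {
      "no-restricted-imports": [
        "error",
        {
          patterns: [
            {
              group: ["@/features/*/*"],
              message: "Import features only through their public index.ts",
            },
          ],
        },
      ],
    },
  },
];
```

Plugins such as eslint-plugin-boundaries offer richer versions of the same idea, but even this core rule keeps accidental coupling out of PRs.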
Documentation
Everybody says we need good documentation, but the stakes are higher when there are hundreds of people looking at your code. In a large company, documentation is a necessity: it decentralizes knowledge, enables async work, and reduces information silos.
Overall, the benefits it can bring are:
- Faster onboarding: make newcomers more autonomous when running your project
- Consistency: keeping patterns, code conventions, and architecture decisions documented for future reference
- Centralization of information: everything related to the project (Confluence pages, observability links, localization documents, etc.) can live in the repository for easy access
- AI-readiness: in addition to the “classic” benefits, agents can read through documents to implement features that comply with guidelines.
Documentation is, admittedly, tedious to write and tends to go stale quickly. For that reason, some advocate for less documentation, especially if the same intent can be expressed through the code itself. Code should speak for itself through clear naming and structure. There are, however, things that code cannot document: processes like onboarding, releases, and deployments, or decisions that need to be more visible than what lives inside the codebase.
Keeping documentation alive is a collaborative task. There is no single responsible person: every team member needs to be proactive about keeping it current. When I start at a new company, I write down everything I need to get the project running - API keys, third-party access, software to install - and use it to update the onboarding documentation if it's outdated (and it usually is). I revisit it every time I help someone else get set up.
As AI tools mature, documentation becomes a two-way investment: repository-level documentation gives coding agents the context they need, and AI-powered tools can speed up the documentation update process, for example via Atlassian Automation or a GitHub Actions webhook.
CI guardrails and CD practices
Modules give structure, and documentation makes that structure legible, but neither survives at scale without enforcement.
Verbal agreements matter, but they don't stick on their own - people forget. The volume is too high and context switches too frequent for anyone to reliably catch everything: commit message conventions, PR descriptions, test coverage thresholds, review processes, and so on. Automated enforcement is what actually keeps things consistent, and we apply it by creating CI guardrails.
For example, before a PR can merge to main, we can add a check that enforces a minimum overall coverage threshold, and a *diff coverage* check to ensure newly introduced code is covered too. We can also run linter checks - which catch potential bugs and enforce patterns - and formatter checks for consistent code style. If those checks don’t pass, the change is not integrated into the main code. This ensures that every change meets a defined quality baseline.
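As a sketch, a minimal GitHub Actions workflow enforcing these checks might look like the following; the tool choices (Prettier, ESLint, Vitest) are illustrative, not prescriptive:

```yaml
# Hypothetical CI guardrails: every check must pass before a PR can merge.
name: ci
on:
  pull_request:
    branches: [main]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx prettier --check .    # consistent code style
      - run: npx eslint .              # potential bugs and enforced patterns
      - run: npx vitest run --coverage # tests, with thresholds set in vitest config
```

For the checks to actually block merging, they also need to be marked as required status checks in the branch protection rules.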
With these automated checks, human review is no longer needed for mechanical steps - which is important at scale, where manual gatekeeping becomes a bottleneck. Instead of having to check formatting or whether it has enough tests, engineers can focus on business logic implementation and architectural decisions.
After passing all the checks, it’s time to deploy to a staging environment. It shouldn’t be a problem to deploy to staging - or so it seems. In practice, when multiple teams are testing different features simultaneously, a single shared environment becomes a source of constant conflict.
For that reason, when it comes to the CD part of the process, it is crucial to have multiple staging environments. What usually happens is a per-branch preview: you can deploy your branch, and it creates a unique URL. No shared staging instance means no overwrite, no "who deployed what" confusion, and no blocked QA because two teams need to test at the same time.
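Hosts like Vercel or Netlify create such previews automatically on every PR; a hand-rolled equivalent with GitHub Actions might look like this sketch, where the deploy command and secret name are assumptions:

```yaml
# Hypothetical per-branch preview: every PR gets its own deployment URL.
name: preview
on:
  pull_request:

jobs:
  preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - name: Deploy branch preview
        run: npx vercel deploy --yes --token "$VERCEL_TOKEN"
        env:
          VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
```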
Conclusion
The practices covered here - modularization, documentation, and automated enforcement - are what make a large codebase workable in the long run.
That said, the investment goes beyond tooling and processes. Large companies dedicate entire roles to it: platform engineers who ensure others can ship faster, and system architects who oversee architecture across domains. Beyond roles, there are dedicated processes - like writing and approving RFCs - to govern changes that affect multiple teams.
The investment is ongoing and never fully finished. But the alternative, in which teams step on each other, knowledge stays locked in people's heads, and regressions ship, costs far more.
