Spotify Backstage has revolutionized Platform Engineering by centralizing the Developer Experience (DX) into Internal Developer Portals (IDPs). However, when you scale an IDP to serve thousands of developers in complex enterprise environments, concurrency issues, permission bottlenecks, and critical security vulnerabilities start to emerge.
Today, I want to share three advanced architectural patterns we designed to solve these problems at their root, along with our recent Open-Source contributions to the community.
1. SRE Watchdogs: Preventing "TCP Hangs" in Catalog Synchronization
The Problem:
When building a Custom Entity Provider to sync thousands of repositories from Azure DevOps or users from MS Graph/EntraID, the external APIs are often unstable. If an ingestion request stalls (a "TCP hang"), the awaited call never settles, and the stuck sync run keeps holding its connections and task slot inside Backstage, leading to widespread unavailability and request queuing.
The Solution (Decorator Pattern & Mutex):
Instead of modifying the ingestion code directly, we applied the Decorator design pattern to wrap our providers with an SRE Watchdog. This "watchdog" injects a Mutex (mutual exclusion) and a strict Timeout (e.g., 15 minutes).
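// SRE Watchdog Wrapper
import { EntityProvider, EntityProviderConnection } from '@backstage/plugin-catalog-node';
import { Logger } from 'winston'; // assumed: the Logger type used by the provider
// withTimeoutAndMutex is our internal helper (import omitted)

// EntityProvider declares no refresh(), so the wrapped provider must expose one
type RefreshableProvider = EntityProvider & { refresh(logger: Logger): Promise<void> };

export class ResilientEntityProviderWrapper implements EntityProvider {
  constructor(private readonly inner: RefreshableProvider, private readonly timeoutMs: number) {}

  // Required by the EntityProvider interface
  getProviderName(): string {
    return this.inner.getProviderName();
  }

  async connect(connection: EntityProviderConnection): Promise<void> {
    // Safely initializes the connection
    await this.inner.connect(connection);
  }

  async refresh(logger: Logger): Promise<void> {
    // Injects Mutex and Timeout to prevent silent Azure API hangs
    return withTimeoutAndMutex(
      () => this.inner.refresh(logger),
      this.timeoutMs,
      "AzureDevOpsSyncTimeout"
    );
  }
}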
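With the watchdog in place, a network bottleneck or an unstable third-party API surfaces as a timeout error on a single sync run instead of a silent hang that can degrade the whole IDP. Note that a timeout only stops waiting: unless the provider also aborts the underlying request, that request may keep running in the background.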
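The withTimeoutAndMutex helper is not shown in this post. As a rough illustration only, here is a minimal sketch of how such a helper might look, assuming an in-process lock keyed by name and Promise.race for the deadline. The lock set, the error messages, and the choice to reject overlapping runs (rather than queue them) are assumptions made for the example.

```typescript
// Hypothetical sketch of a timeout + mutex helper (not the production code).
const runningTasks = new Set<string>();

async function withTimeoutAndMutex<T>(
  task: () => Promise<T>,
  timeoutMs: number,
  name: string,
): Promise<T> {
  // Mutex: refuse to start while a run with the same name is still in flight.
  if (runningTasks.has(name)) {
    throw new Error(`${name}: previous run still in progress`);
  }
  runningTasks.add(name);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${name}: timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first wins. The losing promise is NOT cancelled.
    return await Promise.race([task(), deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
    runningTasks.delete(name);
  }
}
```

One design choice worth weighing: this sketch releases the lock as soon as the timeout fires, so a hung request may still be alive when the next run starts. Passing an AbortSignal down to the HTTP client would close that gap.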
2. Zero-Touch RBAC: From Static YAML to Dynamic Conditional Policies
The Problem:
Backstage utilizes a robust permissions framework, but the community standard frequently relies on creating static Roles within extensive YAML files. In a dynamic organization, manually maintaining the pairing between developers, squads, and software components simply does not scale.
The Solution (Push-Down SQL):
We designed a Zero-Touch RBAC architecture. We abolished manual configuration by replacing static identity checks with dynamic conditional authorization policies (createCatalogConditionalDecision).
Backstage now mirrors hierarchies from Microsoft Entra ID (formerly Azure Active Directory) and maps resource ownership at query time: the ownership condition is translated into a catalog query filter instead of being checked entity by entity in application code (Push-Down SQL). A user is granted or denied administrative access over a software entity based solely on their Identity Provider "Claims," resulting in zero infrastructure friction and automated governance.
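To make the push-down idea concrete, here is a deliberately simplified model. It is not the Backstage API: the types, the function names, and the owner_ref column are all hypothetical and exist only to show how a decision built from claims can become a query filter.

```typescript
// Simplified model of a conditional decision; NOT the Backstage API.
// Every name here (IdpClaims, ConditionalDecision, owner_ref, ...) is hypothetical.
type IdpClaims = { sub: string; groups: string[] };
type ConditionalDecision = { result: 'CONDITIONAL'; ownedByAnyOf: string[] };

// Map Identity Provider claims to catalog-style entity refs.
function ownershipRefs(claims: IdpClaims): string[] {
  const user = `user:default/${claims.sub.toLowerCase()}`;
  const groups = claims.groups.map(g => `group:default/${g.toLowerCase()}`);
  return [user, ...groups];
}

// The policy does not answer allow/deny per entity; it returns a condition.
function decide(claims: IdpClaims): ConditionalDecision {
  return { result: 'CONDITIONAL', ownedByAnyOf: ownershipRefs(claims) };
}

// Push-down: the condition becomes a parameterized WHERE fragment,
// so filtering happens in the database rather than in application code.
function toWhereClause(decision: ConditionalDecision): { sql: string; params: string[] } {
  const placeholders = decision.ownedByAnyOf.map(() => '?').join(', ');
  return { sql: `owner_ref IN (${placeholders})`, params: decision.ownedByAnyOf };
}

const where = toWhereClause(decide({ sub: 'Alice', groups: ['Team-Payments'] }));
console.log(where.sql, where.params);
```

In Backstage itself, the equivalent step is typically returning createCatalogConditionalDecision from the policy with an ownership condition such as catalogConditions.isEntityOwner. The catalog backend then applies it as a filter.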
3. Open Source Contribution: Zero-Leak Policy (Mitigating SSRF in the Scaffolder)
The Problem:
Within the Backstage ecosystem, the http:backstage:request template action (maintained by the excellent RoadieHQ team) is widely used for HTTP integrations. However, we noticed that it lacked native guardrails when templates received dynamic inputs, leaving it open to Server-Side Request Forgery (SSRF) and Confused Deputy attacks. A malicious user could exploit the Scaffolder to scan ports on the internal network or to call state-changing methods on confidential endpoints.
The Solution:
Today, I took the lead in patching this vector and submitted an official Pull Request to the community's Open Source repository.
We injected coreServices.rootConfig directly into the HTTP module's constructor and created a parameterized Zero-Leak Policy via app-config.yaml. Now, Platform Administrators can enforce a strict security Whitelist:
- scaffolder.http.allowedMethods: To restrict accidental deletions (e.g., blocking DELETE).
- scaffolder.http.allowedHosts: To guarantee that Scaffolder HTTP requests only reach authorized hosts, effectively isolating the network infrastructure.
Here is a glimpse of the architecture we injected into the action's handler:
// 🛡️ ZERO-LEAK POLICY: SSRF and Confused Deputy Mitigation
const allowedMethods = config?.getOptionalStringArray('scaffolder.http.allowedMethods');
const allowedHosts = config?.getOptionalStringArray('scaffolder.http.allowedHosts');

if (allowedMethods && !allowedMethods.includes(method)) {
  throw new Error(
    `Security Policy Violation: HTTP method '${method}' is not allowed. ` +
      `Allowed methods: ${allowedMethods.join(', ')}.`
  );
}

if (allowedHosts) {
  const requestUrl = new URL(input.path);
  if (!allowedHosts.includes(requestUrl.hostname)) {
    throw new Error(
      `Security Policy Violation: Host '${requestUrl.hostname}' is not in the allowed list.`
    );
  }
}
# app-config.yaml Hardening
# Hosts are compared by exact hostname, so entries carry no leading dot
scaffolder:
  http:
    allowedMethods: ['GET', 'POST', 'PUT']
    allowedHosts: ['dev.azure.com', 'api.myinternal.system']
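The host check shown above compares hostnames exactly, so a list entry only matches a request whose hostname is identical to it. If wildcard-style entries with a leading dot are desired, one option is a suffix-aware matcher. The sketch below is an illustration under that assumption. isHostAllowed is a hypothetical helper, not part of the action or of the Pull Request.

```typescript
// Hypothetical helper: exact match by default, subdomain match for '.suffix' entries.
function isHostAllowed(rawUrl: string, allowedHosts: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    // Relative or malformed URLs cannot be verified, so they are rejected.
    return false;
  }
  return allowedHosts.some(entry => {
    const rule = entry.toLowerCase();
    return rule.startsWith('.')
      ? hostname.endsWith(rule) // '.dev.azure.com' matches 'org.dev.azure.com'
      : hostname === rule; // plain entries must match the hostname exactly
  });
}
```

Matching on the dot-prefixed suffix, rather than on a bare substring, is what keeps look-alike hosts such as dev.azure.com.evil.io out. Note that a '.dev.azure.com' entry in this sketch does not match the apex host dev.azure.com itself, which would need its own entry.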
4. Giving Back to the Community: Open-Sourcing Our Modules to NPM
Beyond architectural improvements, we strongly believe in the open-source philosophy. To help other organizations struggling with similar challenges, we have decoupled our internal solutions and officially published them to the NPM registry!
You can check out our new standalone Backstage packages:
- 📦 @leooelx/plugin-catalog-backend-module-azure-autodiscovery: A resilient Entity Provider for Azure DevOps that automatically discovers repositories while bypassing strict rate limits via Audit Logs.
- 📦 @leooelx/plugin-scaffolder-environment-matrix-field: A dynamic React extension for the Backstage Scaffolder allowing users to inject environment-specific variables and secrets seamlessly.
Conclusion
Implementing an IDP like Backstage is only the first step. The true challenge of Platform Engineering lies in ensuring that the platform operates continuously, invisibly, and inviolably under the highest standards of governance (SRE & CyberSec).
If you also work in Platform Engineering or want to debate resilient architectures in enterprise Node/React ecosystems, let's connect! Feel free to share your thoughts on how your teams handle permissions and resilience in your IDPs.