This article was originally published on bmf-tech.com.
Overview
In this article, we will explain the definition, necessity, implementation, and decision-making for introducing platform engineering.
Platform engineering is a specialized field that abstracts and provides complex infrastructure and operational processes as a common foundation, allowing developers to focus solely on application development. It fundamentally resolves issues such as cognitive load on developers, bottlenecks in development speed, and lack of standardization, which become apparent as organizations scale.
This article first explains the issues that platform engineering addresses and the components required to achieve it (IDP, Golden Path). Next, it clarifies the specific benefits gained from implementation and the differences from related methodologies such as DevOps and SRE. Furthermore, it outlines the four roles necessary for success (selection, abstraction, self-service, operational responsibility) and the required skill set, and finally presents the criteria for when to introduce it.
Definition of Platform Engineering
Platform engineering is a specialized field that abstracts and provides complex infrastructure and operational processes as a "common foundation (platform)" so that developers can focus on application development.
By utilizing the platform, development teams can efficiently build, deploy, and manage applications.
Platform engineering streamlines an organization's software development process and aims to improve developer productivity.
Why Platform Engineering is Necessary
As organizations grow and the number of development teams and services increases, the following issues may become apparent.
Increasing Cognitive Load on Developers
With the spread of DevOps, developers have taken on the responsibility of "building and operating themselves." However, the areas to be managed continue to expand, including cloud services, containers, orchestration, security, and compliance. Developers must acquire detailed knowledge of infrastructure and manage complex configurations in addition to their primary application development. This increasing cognitive load leads to developer fatigue and decreased productivity.
Bottlenecks in Development Speed
When developers are forced to issue tickets to the operations team and wait for approval every time they provision or deploy infrastructure, lead times for deployment are prolonged. Constraints such as "it takes days to prepare the environment" and "must wait for the weekly release window for production deployment" delay time-to-market and increase the risk of missing business opportunities.
Lack of Standardization and Complexity of Maintenance
When multiple teams independently build their own tools, CI/CD pipelines, and workflows, it becomes difficult to ensure consistent quality and security across the organization. As individualization progresses, situations arise where "no one understands this configuration" when a person in charge resigns or transfers. Additionally, multiple teams are redundantly solving similar technical challenges, resulting in significant waste from an organizational perspective.
Operational Instability and Lack of Governance
When security and compliance requirements are added retroactively, significant friction occurs in the development process. When different security settings and monitoring methods are adopted by each team, it becomes impossible to enforce governance across the organization, increasing the risk of vulnerabilities and incidents.
Platform engineering emerged to fundamentally solve these issues. By abstracting complex infrastructure and operational processes and providing a common foundation that allows developers to develop safely and quickly through self-service, it improves productivity and reliability across the organization.
Components of Platform Engineering
The following components are required to realize a platform.
Internal Developer Platform (IDP)
A toolset that abstracts technical complexity and realizes self-service.
The main functions of IDP are as follows:
- Service Catalog: A list of available services and templates
- Self-Service Provisioning: A mechanism that allows developers to create and manage resources themselves
- Integrated Documentation: Aggregates how to use the platform and best practices
- Observability Dashboard: Centrally checks the state of applications
Representative tools for building IDP include:
- Backstage (by Spotify): A portal integrating service catalog and documentation
- Port: A platform that allows building IDP with no code
- Kratix: A framework for building platforms on Kubernetes
- Humanitec: An IDP specialized in application configuration management
Golden Path (Paved Road)
Standardized templates and workflows guaranteed to be safely and quickly deployable.
Examples of Golden Path include:
- Application Templates: Pre-configured project structures and CI/CD pipelines
- Infrastructure Templates: Standard configurations of Terraform modules and cloud resources
- Deployment Patterns: Implemented workflows such as Blue/Green deployment and Canary release
- Security Settings: Configurations with built-in encryption, access control, and audit logs
Developers do not need to worry about "how to build" and can naturally follow best practices by adhering to the Golden Path. This allows even inexperienced developers to build high-quality and secure systems.
Benefits of Platform Engineering
By introducing platform engineering, organizations can enjoy many benefits.
1. Improved Development Speed and Efficiency
With a self-service platform, developers can reduce the time spent on requests to other departments and waiting for approvals. They can autonomously perform infrastructure provisioning, deployment, and environment setup, shortening time-to-market.
2. Reduced Cognitive Load on Developers
By abstracting operational complexities such as complex infrastructure, cloud configurations, security, and scaling, the platform reduces the cognitive load on developers.
Developers can work along the Golden Path, enabling development that naturally follows best practices. This improves developer productivity.
3. Enhanced Security and Compliance
In platform engineering, security is not an afterthought but is built into the platform from the design stage (secure by design).
The platform automatically applies consistent security policies, access controls, and compliance checks across all services.
4. Improved Consistency and Quality through Standardization
By providing standardized templates called Golden Paths, consistent development methods are realized across the organization.
This prevents the proliferation of disparate methods for each project, reducing the complexity of maintenance. Additionally, onboarding new members becomes smoother, allowing even inexperienced developers to achieve high-quality and secure configurations.
5. Observability and Operational Stability
Integrated observability ensures insights into middleware and application performance. This allows issues to be detected and resolved before impacting end-users.
6. Cost Optimization
Platform engineering achieves efficient resource utilization and reduces operational costs. By centralizing and automating common development processes, redundant work and waste of human resources can be eliminated. As a result, it is possible to reduce costs while improving profitability.
Differences Between Platform Engineering and DevOps
DevOps is "culture and practices," while platform engineering is "specific implementation and solutions."
While DevOps refers to frameworks and methods aimed at "integrating development and operations," platform engineering is a specialized field that implements and provides those methods as specific mechanisms (tools and foundations).
DevOps is a method and practice where development teams also take on operational responsibilities under the principle of "You build it, you run it." However, as the areas to be managed, such as infrastructure and security, continue to expand, there is a challenge of excessive cognitive load on developers.
On the other hand, platform engineering is an implementation and solution that abstracts complex operational processes as a "self-service common foundation" while maintaining the ideals of DevOps. It provides an "environment" as a product where developers can release safely and quickly without deep expertise, enhancing the development experience.
Differences Between Platform Engineering and SRE
SRE focuses on "production reliability," while platform engineering focuses on "developer productivity."
The customer of SRE (Site Reliability Engineering) is the "end-user," and it is responsible for ensuring the "reliability (stable operation)" of the service.
In contrast, the customer of platform engineering is "internal developers," and it aims to improve development productivity by providing a platform.
Four Roles of Platform Engineering
The aforementioned components (IDP, Golden Path) indicate "what to create," but these four roles define "what principles and mindset to promote." Platform engineering brings value to the organization by promoting it with the same mindset as "product development" based on the following four roles.
1. Selection as a Product
Define the platform not as mere "infrastructure" but as a "product" with internal developers as customers. The principle is to reduce developers' cognitive load and focus on value delivery, strategically providing "choices that lead to success in the shortest time" rather than freedom to do anything.
In practice, both product and selection are important. While being customer (developer) centric, it is necessary to clearly delineate "what not to do" if it does not align with organizational governance. As a Golden Path, integrate multiple tools into one easy-to-use workflow for most standard needs. Additionally, as Rails, build new common foundations to fill gaps, such as batch processing and notifications, as a new common foundation.
2. Software-Based Abstraction
A strong engineering stance is required: "If you don't write software, it's not platform engineering." The principle is to provide an excellent developer experience (Developer Experience) through automation and abstraction to enhance developer satisfaction and productivity.
In practice, providing a stable API is important. Abstract the complexity of OSS and vendor products behind the scenes as a "stable API" that is easy for the app side to use. Also, provide components that enhance reliability, such as caching, sharding, and sidecars, to improve reliability and performance without changing the app side code. Furthermore, establish a mechanism for managing metadata and visualization that can centrally manage ownership, cost, access rights, etc. Note that building an IDP (portal) is not mandatory, and priorities should be determined according to the "pain points" on the ground.
3. Self-Service with Guardrails
Provide an environment where developers can work autonomously while incorporating "guardrails" that ensure safety and governance. The principle is to balance freedom and constraints, realizing a mechanism that allows developers to develop quickly and safely, raising the productivity of the entire organization from beginners to power users.
In practice, enable developers to provision environments on their own through UI, CLI, and CI integration. At the same time, realize self-service with guardrails by incorporating "guardrails" in advance to prevent fatal configuration errors and cost increases. Additionally, user observability is important to provide a debugging experience where developers can immediately determine "whether it's their code or a platform issue" when something goes wrong.
4. Operational Responsibility as a Foundation
It is not the end to provide tools, but to take responsibility for the entire lifecycle thereafter. Recognize that the reliability of the platform itself is the foundation of the business, and prioritize operational stability.
In practice, take End-to-End responsibility for the entire operation, including not only self-written code but also the behavior of dependent OSS and cloud vendors. While promoting self-service, build a "support system" and "positive culture" that responds to advanced production failures and diverse technical questions, realizing the democratization and support of operations. Furthermore, by habitualizing anomaly detection and proactive response, and continuously managing the SLO/SLA of the platform itself, maintain operational discipline that protects reliability as the foundation of the business.
Skill Set of Platform Engineers
Platform engineers are involved in both development and operational responsibilities, requiring a diverse set of soft and technical skills.
Soft Skills
The following soft skills are considered important for successful platform engineering.
- Collaboration and Communication: Work closely with software developers, DevOps, and Site Reliability Engineering (SRE) teams to understand their needs and reflect them in platform features. The platform is a "product," and dialogue with developers as customers is important.
- Problem-Solving Ability: Tackle complex infrastructure challenges and continuously improve platform features. Find optimal solutions within technical constraints.
- Empathy for Developers: Developers are the main customers. Focus on Developer Experience, designing the platform to reduce friction and support productivity. Think from the developer's perspective and build a platform that is "wanted to be used."
Technical Skills
Platform engineers should be familiar with the following technical areas.
- Infrastructure as Code: Manage infrastructure provisioning as code and automate it. Build highly reproducible infrastructure through declarative configuration management.
- Containers and Orchestration: Manage containers, deploy microservices, and automate scaling. Efficiently operate distributed systems.
- CI/CD Pipeline: Build continuous integration and delivery. Design a system where code is automatically tested, built, and deployed when developers push code.
- Monitoring and Observability: Visualize system health and detect problems early. Define SLI/SLO and set alerts appropriately.
- Security Practices: Understand security principles such as encryption, access control, secret management, and vulnerability scanning, and incorporate them into the platform. Reflect the secure-by-design concept in implementation.
- Programming Skills: Develop automation tools and CLI. Not only combine existing tools but also develop unique tools to solve platform-specific challenges.
When to Introduce Platform Engineering
Platform engineering is not suitable for all organizations. Introducing it incurs the cost of a "dedicated team to nurture the platform as a product," so it is necessary to assess the organization's growth stage and technical maturity before making a decision.
In the case of small teams or simple configurations, the overhead (management cost) of building a platform may exceed the benefits obtained. It is important to carefully judge whether it is worth investing in the platform as a "product" from the perspective of "is development speed clearly declining due to dependence on the operations team?" and "are there multiple teams that would be saved by standardization?"
Conclusion
Platform engineering strengthens the competitiveness of the entire organization by reducing developers' cognitive load and improving development speed and productivity. Its true value is not in the technical implementation itself but in the organizational mindset change that arises from viewing the platform as a "product" and considering developers as customers.
The platform should be provided as a product, continuously reflecting feedback from developers, and designed considering the entire service lifecycle. By accepting this change and practicing the principles of platform engineering, developer productivity improvement and strengthening of the organization's overall competitiveness are realized.
References
- www.oreilly.co.jp - プラットフォームエンジニアリング―成功するプラットフォームとチームを作るガイドライン
- www.gartner.com - プラットフォームエンジニアリングとは?
- learn.microsoft.com - What is platform engineering?
- newrelic.com - プラットフォームエンジニアリングとは?基本をわかりやすく解説
- platformengineering.org - What is platform engineering?
- payara.fish - What is Platform Engineering – An Overview without Buzzwords
- www.gartner.com - Platform Engineering That Empowers Users and Reduces Risk
- cloud.google.com - Google Cloud のプラットフォーム エンジニアリングで移行
- cloud.google.com - Light the way ahead: Platform Engineering, Golden Paths, and the power of self-service
- cloud.google.com - Laying the foundation for a career in platform engineering
- github.com - プラットフォーム エンジニアリングとは
- internaldeveloperplatform.org - Internal Developer Platform
- www.thoughtworks.com - The evolution of platform engineering
- www.youtube.com - Maturing your platform engineering initiative - Nicki Watt PlatformCon 2024
Top comments (0)