Can we improve the reliability of a system by applying various performance engineering techniques at different stages of the development process?
This is a look at how a solid Performance Engineering strategy, one that uses Reliability principles and DevOps ideals, can complement and strengthen current or proposed performance initiatives.
These approaches aim to deliver better business cohesion, reliability and velocity. To achieve this, we can apply methodologies from Performance Engineering using "Shift Left" and "Move Right" approaches that extend traditional performance testing techniques. Traditional performance testing typically consists of:
- A mechanism to run load against an application or system
- A way of measuring how it performed
- A way of comparing the results against what we believe is the ideal state
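As a minimal sketch of those three pieces, the Python below runs load with a thread pool, measures per-request latency, and compares the 95th percentile against a target. `hypothetical_request`, the request counts and the 0.5 s target are illustrative assumptions rather than part of any particular tool; real load tools add ramp-up, pacing and distributed load generators.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def hypothetical_request() -> None:
    """Hypothetical stand-in for one call to the system under test."""
    time.sleep(0.001)


def timed_call(target: Callable[[], None]) -> float:
    """Execute one request and return its latency in seconds."""
    start = time.perf_counter()
    target()
    return time.perf_counter() - start


def run_load(target: Callable[[], None], total_requests: int, concurrency: int) -> list[float]:
    """1. Run load: issue `total_requests` calls across `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: timed_call(target), range(total_requests)))


def p95(latencies: list[float]) -> float:
    """2. Measure: 95th percentile latency (needs at least two samples)."""
    return statistics.quantiles(latencies, n=100, method="inclusive")[94]


def meets_target(latencies: list[float], target_p95_s: float) -> bool:
    """3. Compare: is the measured p95 within the ideal-state target?"""
    return p95(latencies) <= target_p95_s


if __name__ == "__main__":
    latencies = run_load(hypothetical_request, total_requests=200, concurrency=20)
    print(f"p95={p95(latencies):.4f}s, within target: {meets_target(latencies, target_p95_s=0.5)}")
```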
Each area of performance within the DevOps model has its part to play: they all relate, in some shape or form, to the principles of building, defining and maintaining a reliable system.
Each performance execution and analysis activity should be guided by the Engineering Efficiency, DevOps and Reliability principles that apply to software development.
Reliability Engineering (RE) attempts to predict and prevent the risk of failure, whether of a single component or of an entire system of services.
Performance Engineering (PE) holds that we should start earlier in the SDLC to get faster feedback, but it also extends into Operations and Support, using real-world data to build and update the performance models (scripts and analysis).
Performance Testing (PT) is about determining what the performance of an application is (baselining), or comparing it to how you believe it should perform (delta analysis), under various conditions and situations in the 'test' environment.
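To illustrate the difference, here is a small sketch: a stored summary of one run is the baseline, and `delta_analysis` compares a later run against it. The `RunSummary` fields and the default 10% tolerance are assumptions chosen for illustration, and baseline latency and throughput are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Headline results from one performance run (fields are illustrative)."""
    p95_ms: float
    throughput_rps: float
    error_rate: float


def delta_percent(baseline: float, current: float) -> float:
    """Percentage change of `current` relative to a non-zero `baseline`."""
    return (current - baseline) / baseline * 100.0


def delta_analysis(baseline: RunSummary, current: RunSummary, tolerance_pct: float = 10.0) -> list[str]:
    """Return the metrics that moved in the wrong direction by more than `tolerance_pct`."""
    regressions = []
    # Latency: higher is worse.
    if delta_percent(baseline.p95_ms, current.p95_ms) > tolerance_pct:
        regressions.append("p95_ms")
    # Throughput: lower is worse.
    if delta_percent(baseline.throughput_rps, current.throughput_rps) < -tolerance_pct:
        regressions.append("throughput_rps")
    # Error rate: compared multiplicatively so a zero baseline does not divide by zero.
    if current.error_rate > baseline.error_rate * (1 + tolerance_pct / 100.0):
        regressions.append("error_rate")
    return regressions


# Baselining: record the current performance as the reference point.
baseline = RunSummary(p95_ms=200.0, throughput_rps=500.0, error_rate=0.001)

# Delta analysis: compare a later run against that baseline.
current = RunSummary(p95_ms=240.0, throughput_rps=480.0, error_rate=0.001)
print(delta_analysis(baseline, current))  # ['p95_ms']
```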
PE looks to incorporate 'Agile' methodologies and use them in conjunction with 'DevOps' ideals to provide an improved approach, one that adds value rather than hindering delivery velocity.
We can do this by adopting a shift left / move right approach that incorporates cloud-first performance automation. This can reduce the feedback cycle (a velocity increase) and catch bottlenecks and bugs early on (a reliability increase).
PE is about applying processes and strategies at each step of the SDLC, with specific actions and options that can be applied within each vertical.
The idea is that performance is a consideration at each step of the software lifecycle: metrics captured in Dev, Test, Deploy and Operations are used to refine the next performance cycle.
Traditional performance testing is quite often done within the test phase and entails a big-bang approach, using many pods/VMs to generate load against an application or system.
| Advantages | Disadvantages |
|---|---|
| Simulates real-world conditions as closely as possible | Often an integrated (shared) environment, which can affect results |
| Integrated tests execute against multiple components at once | Data is often 'test' data, which could affect behaviour/results |
| Tools can replicate thousands (if not more) of users | Replicating 'Prod' environments can be expensive |
| Extensive metrics/reports from the tooling | Finding the root cause when diagnosing issues can be complex |
| | Commercial tooling can be expensive to operate |
We can attempt to address these drawbacks using a combination of PE, RE and DevOps principles and methodologies.
A "Shift Left" approach reduces the SDLC feedback loop so that potential system and environment issues are uncovered and rectified early. Benefits include:
- Foster developer engagement and contribution
- Reduce development costs
- Detect and eliminate bottlenecks early
- Find bugs and performance issues earlier
- Speed up time-to-market
- Build more trust in your applications and infrastructure
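A shift-left check can be as small as a performance assertion that runs alongside unit tests on a developer machine or in CI. This sketch assumes a hypothetical `build_report` function and a 50 ms budget; both stand in for real application code and a budget agreed by the team.

```python
import timeit
from typing import Callable


def build_report(rows: int) -> str:
    """Hypothetical function under test; stands in for real application code."""
    return "\n".join(f"row {i}" for i in range(rows))


def best_time_per_call(func: Callable[[], object], repeats: int = 5, number: int = 10) -> float:
    """Best-of-`repeats` average time per call, in seconds.

    Taking the minimum reduces noise from other work on a shared CI runner.
    """
    timings = timeit.repeat(func, repeat=repeats, number=number)
    return min(timings) / number


def test_build_report_within_budget() -> None:
    # Illustrative budget (an assumption): 10,000 rows should render in under 50 ms.
    budget_s = 0.050
    elapsed = best_time_per_call(lambda: build_report(10_000))
    assert elapsed < budget_s, f"build_report took {elapsed * 1000:.1f} ms (budget {budget_s * 1000:.0f} ms)"
```

With a runner such as pytest, the `test_` prefix lets this run with the rest of the unit suite. Budgets should be generous, since timings on shared runners vary between runs.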
A "Move Right" approach extends testing to include user feedback and metrics from your production environment. These can then be used to update the performance model accordingly. Benefits include:
- Improved user experience
- Tests more closely match the actions expected of your users
- Teams have more involvement in, and ownership over, how performance information is presented back
- Design hypotheses are evaluated
- Assumptions are reflected upon and appropriate action can be taken
- Various performance management options
- Many different tools for changing traffic flows, which can alter performance
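One way to feed production data back into the performance model is to derive the workload mix from production request records and size load scenarios from it. The sketch below assumes the records have already been reduced to a list of endpoint names; the endpoints and user counts are hypothetical.

```python
from collections import Counter


def workload_mix(production_requests: list[str]) -> dict[str, float]:
    """Share of production traffic per endpoint, most frequent first."""
    counts = Counter(production_requests)
    total = sum(counts.values())
    return {endpoint: count / total for endpoint, count in counts.most_common()}


def virtual_users_per_endpoint(mix: dict[str, float], total_users: int) -> dict[str, int]:
    """Split a virtual-user budget across endpoints in proportion to the production mix.

    Rounding means the per-endpoint figures may not sum exactly to `total_users`.
    """
    return {endpoint: round(share * total_users) for endpoint, share in mix.items()}


# Hypothetical sample of endpoint names taken from production access logs.
sample = ["/search"] * 6 + ["/product"] * 3 + ["/checkout"]
mix = workload_mix(sample)
print(virtual_users_per_endpoint(mix, total_users=100))
# {'/search': 60, '/product': 30, '/checkout': 10}
```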
Performance metrics from each environment (Dev/Test/Prod) are used to determine whether performance is within SLO limits.
This lets us easily record and understand local (component) and integrated (end-to-end) metrics to provide better performance transparency. These are then compared to the ideal state.
These SLOs can be enforced through SLIs (SLI specifications and SLI implementations), with the results compared against our error budget to measure tolerance.
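The error budget arithmetic can be sketched as follows, assuming a request-based SLI where a "good" event is a request that succeeded within its latency threshold. The function and field names are illustrative and are not the API of any specific monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudgetReport:
    sli: float                 # measured fraction of good events
    allowed_bad_events: float  # error budget for the window, in events
    consumed: float            # fraction of the error budget used (> 1.0 means exhausted)


def error_budget(good_events: int, total_events: int, slo_target: float) -> ErrorBudgetReport:
    """Compare a request-based SLI with its SLO target and report error-budget consumption.

    Assumption: a "good" event is a request that succeeded within its latency threshold.
    """
    if total_events <= 0:
        raise ValueError("no events in the measurement window")
    bad_events = total_events - good_events
    sli = good_events / total_events
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad > 0:
        consumed = bad_events / allowed_bad
    else:
        # A 100% target leaves no budget: any bad event exhausts it.
        consumed = 0.0 if bad_events == 0 else float("inf")
    return ErrorBudgetReport(sli, allowed_bad, consumed)


# Example: 99.9% SLO over 10,000 requests, 4 of which were bad.
report = error_budget(good_events=9_996, total_events=10_000, slo_target=0.999)
print(f"SLI={report.sli:.4f}, budget={report.allowed_bad_events:.0f} events, consumed={report.consumed:.0%}")
# SLI=0.9996, budget=10 events, consumed=40%
```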
The aim is to obtain a current-state view of our application's performance in each environment and at each stage of the SDLC, and to compare it against the business performance expectations defined in the SLO and enforced through the SLI. Example metrics include:
- API / UI response times
- DB transaction times
- Pod / VM scaling events
- CPU use / Network activity / Memory usage
All of these could be defined and compared using SLIs.
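A minimal way to express such SLI specifications is as named upper limits that each environment's measurements are checked against. The metric names, thresholds and measurements below are hypothetical placeholders, and every metric is assumed to be "lower is better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliSpec:
    """A hypothetical SLI specification: a metric and its upper limit from the SLO."""
    metric: str
    max_value: float


# Illustrative thresholds only; real values come from the agreed SLOs.
SPECS: tuple[SliSpec, ...] = (
    SliSpec("api_p95_ms", 300.0),
    SliSpec("db_txn_p95_ms", 50.0),
    SliSpec("scaling_events_per_hour", 5.0),
    SliSpec("cpu_percent", 75.0),
    SliSpec("memory_percent", 80.0),
)


def breaches(measurements: dict[str, float], specs: tuple[SliSpec, ...] = SPECS) -> list[str]:
    """Return the metrics that exceed their limit or were not measured at all."""
    return [
        spec.metric
        for spec in specs
        if spec.metric not in measurements or measurements[spec.metric] > spec.max_value
    ]


environments = {
    "Dev": {"api_p95_ms": 120.0, "db_txn_p95_ms": 20.0, "scaling_events_per_hour": 0,
            "cpu_percent": 40.0, "memory_percent": 55.0},
    "Test": {"api_p95_ms": 340.0, "db_txn_p95_ms": 45.0, "scaling_events_per_hour": 2,
             "cpu_percent": 70.0, "memory_percent": 60.0},
}
for env, measured in environments.items():
    print(env, breaches(measured))
# Dev []
# Test ['api_p95_ms']
```

Treating an unmeasured metric as a breach is a deliberate choice in this sketch: an SLI that is not being collected cannot be verified.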
A subset of the performance suite can be used to poke test (performance smoke test) the application after deployment. A degraded performance run could then trigger a rollback.
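A post-deployment gate along those lines might look like the sketch below. The scenario callables, the baseline figures, the 20% degradation tolerance and the `rollback` callback are all assumptions; in practice the rollback would be delegated to your deployment tooling.

```python
from typing import Callable


def poke_test(
    scenarios: dict[str, Callable[[], float]],
    baseline_p95_ms: dict[str, float],
    max_degradation_pct: float,
    rollback: Callable[[], None],
) -> bool:
    """Run a smoke subset of the performance suite after a deployment.

    Each scenario callable is assumed to run a short load and return its p95 latency in ms.
    If any scenario degrades beyond the tolerance, `rollback` is called and False is returned.
    """
    for name, run_scenario in scenarios.items():
        limit = baseline_p95_ms[name] * (1 + max_degradation_pct / 100.0)
        measured = run_scenario()
        if measured > limit:
            print(f"{name}: p95 {measured:.0f} ms exceeds limit {limit:.0f} ms, rolling back")
            rollback()
            return False
    return True


# Hypothetical smoke subset: canned results stand in for real short load runs.
smoke_subset = {"login": lambda: 180.0, "search": lambda: 260.0}
baseline = {"login": 170.0, "search": 200.0}
healthy = poke_test(smoke_subset, baseline, max_degradation_pct=20.0,
                    rollback=lambda: print("rollback triggered"))
```

Stopping at the first degraded scenario keeps the gate fast; a variant could run every scenario and report all breaches before deciding.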
A balanced performance strategy, applied at each stage of the SDLC and guided by RE principles, provides a more well-rounded verification process. In turn, this can foster a culture of empathy, encourage collaboration, reduce delivery cycle duration and mitigate the chance of deploying underperforming software.