DEV Community: Audacia

Engineer Experience: Why & How to Measure It | Audacia Insights

Audacia — Tue, 16 Jun 2026 09:00:00 +0000

Engineer Experience examines how people, processes and tools affect productivity amongst software engineering teams. Based on a Tech Talk delivered by Technical Director Richard Brown, this blog explores the ‘why’ and ‘how’ of measuring Engineer Experience.

What Is Engineer Experience?

According to GitHub, engineer experience, sometimes called developer experience or DevEx, refers to the systems, technologies, processes and cultures that influence the effectiveness of software development.

Engineers are typically the most expensive resource on any software project. When engineers constantly face bottlenecks, blockers, inefficient tooling, slow laptops or build queues that stretch for hours, the entire team slows down and struggles to deliver value to end users. Whereas the more efficiently a developer is able to work, the better value for money an organisation can offer.

Bringing engineer experience front and centre means recognising that investing some time, effort or money can unblock engineers to deliver much faster. The return on investment is often relatively high compared to the input.

Why Measure Engineer Experience?

Often, surveys can reveal what metrics can’t. Metrics like DORA measurements, build times and deployment frequency provide useful data. However, without asking engineers directly, organisations can miss potential blockers and inefficiencies. The best way to measure engineer experience is through surveys that ask engineers themselves about their day-to-day work.

Measuring engineer experience delivers four key benefits:

Identify inefficiencies

Surveying engineers helps identify inefficiencies by allowing organisations to drill down into process details to find what causes them. For example, metrics might show that deployments reach production only once per month, but survey answers can reveal why.

Evaluate tools and technologies

Engineers working directly with software, frameworks and tech stacks are best positioned to say whether their tools are fit for purpose or their tech stack is up to date. Engineering leaders might struggle to keep pace with every new development, especially in AI tools where capabilities proliferate rapidly, but engineering teams usually maintain deep knowledge in their specific areas. Asking them about the tools and technologies they use surfaces insights about how effective those choices are for solving daily problems.

Recognise pain points

Job satisfaction matters enormously in any role, and in software development the best engineers are in very high demand, making it crucial to avoid having people dissatisfied in their roles. Whether it stems from being unable to use particular technologies, waiting three hours every time they queue a build, or working in silos with no visibility of other teams, a few stacked frustrations can prompt engineers to look elsewhere. Identifying pain points means they can be addressed before they become retention issues.

Build engagement

People genuinely appreciate being asked their opinion and having it valued. Engineering leaders who mandate decisions — frameworks, IDEs, product licences — without input quickly leave people feeling disempowered and lacking buy-in. Asking for opinions and taking them on board creates genuine engagement and commitment to organisational goals.

Best Practices for Engineer Experience Surveys

Several practices help organisations run effective engineer experience surveys and extract maximum value from the results.

Ask about things within your control

Only ask questions about factors that can actually be changed. For example, if an organisation only uses JavaScript and there are constraints that mean this cannot be changed, asking about satisfaction with the programming language only draws attention to dissatisfaction without offering a path to improvement. Instead, focus on areas where action is possible.

Focus on what’s important to your organisation

Template surveys and example questions exist online and make excellent starting points, but every organisation differs. Use templates as foundations, then add, remove and modify questions to ensure the survey speaks to specific organisational needs and priorities.

Be consistent

Running a single survey provides a pulse check on engineer sentiment, but the real value emerges from running questionnaires at a regular cadence. Instead of changing questions between surveys, maintain a high proportion of consistent questions as a baseline. This allows organisations to track trends and determine whether changes implemented between surveys have influenced engineer sentiment. For example, average scores moving from 3.5 to 4 out of 5 between surveys suggests that any change implemented in-between was positive and worth continuing.

Include role-specific questions

Most organisations employ different types of engineers — software engineers, test engineers, cloud engineers, data engineers, data scientists etc. Include universal questions aimed at everyone as well as subsets of questions targeting individual roles. For example, ask testers about splits between automation and manual testing, developers about unit testing and cloud engineers about cloud provider choices or infrastructure-as-code toolchains.

Consider anonymity carefully

Anonymous surveys encourage honest, robust feedback, but they can be harder to action. Without follow-up questions or context, it becomes difficult to fully understand the motivation behind comments. For example, in an organisation of 100 engineers, if 97 rate documentation as fantastic but three suggest room for improvement, then the big picture suggests that documentation is a low priority pain point. However, if those three engineers all make up one team, then the feedback becomes very different and targeted improvement for that specific team becomes necessary.

One solution can be to maintain anonymity whilst also collecting role, team or project information. This preserves anonymity (teams are typically larger than three people) whilst enabling identification of systematic trends within specific roles or teams.

Balance quantitative and qualitative feedback

Quantitative questions, scoring from one to five, enable easy trend measurement and quick identification of pain points (typically the lowest-scoring questions). Qualitative free-text responses provide the crucial context that makes feedback actionable. Someone scoring something one out of five is useful, but someone scoring one out of five and explaining why is far more valuable.

Follow through with change

Actioning feedback is the most critical practice in this list. At Audacia we break results down by project and meet with lead developers to identify next steps. These decisions feed into sprint retrospectives or targeted improvement initiatives.

We never make surveys mandatory, but by demonstrating that feedback leads to real change engagement stays high. Without this follow-through engagement rates drop as people gradually disengage from the process. If feedback can’t be actioned because of constraints such as finite time and budget, then clearly communicate those. Opening this conversation makes people aware of leadership constraints, whilst maintaining trust.

Key Survey Areas

At Audacia, we structure our survey questions around these key areas chosen to reflect what matters most to our engineers and the organisation. These areas can change year-on-year depending on organisational activity. Every business is different, and the areas you prioritise should reflect that.

Documentation and knowledge sharing

These practices address how quickly engineers can onboard to new projects or code areas. For consultancies, this becomes especially critical because project teams work in entirely different business domains and build very different software, making knowledge silos easy to form.

Can engineers search documentation to answer questions about how code works or why architectural decisions were made?
Or must they ask specific individuals, creating bottlenecks?

Technical standards

Technical standards should help organisations ship better code faster. Surveys provide a pulse check on whether standards are helping or hindering. Too many standards, poorly discoverable standards or frequently changing standards slow teams down rather than speed them up. On the other hand, clear, concise, easily discoverable standards speed teams up and improve code quality.

Agile processes and project delivery

Determine whether agile processes help teams work effectively, or whether they fail in practice.

Are stand-ups efficient or do they stretch to two hours because discussion is too detailed?
Do retrospectives happen?
Do teams action retrospective items?
Are teams continuously improving?

Additionally asking about project delivery can surface inefficiencies. For example, builds taking two hours because of massive queues, or deployments only being possible at specific times because they rely on one person.

Observability

This matters particularly when supporting software in production.

When bugs appear or production goes down, how do teams find out?
Do customers call to report problems, or do alerts arrive proactively, enabling teams to inform customers about issues and mitigation plans?
When bugs occur, can teams diagnose them quickly?

Strong observability, seeing exceptions logged and tracking paths through systems, enables rapid bug diagnosis, faster patches and reduced impact on end users.

Quality

This topic covers broad territory.

Do code reviews add value or slow teams down?
Are automated tests written?
Are those tests run regularly?
Do they deliver valuable feedback?
Do tests catch regression bugs?

Practices like code reviews and automated testing policies are important, but they can easily become bottlenecks. Asking engineers about the value they receive identifies where policies fail to add value and simply slow people down.

Tools, technologies, and practices

These often have the biggest impact on engineer job satisfaction. Engineers spend most days writing code, living in their IDEs, working with programming languages and frameworks. An engineer could face frustration if they work with outdated frameworks approaching end-of-support whilst colleagues on other teams use the latest frameworks. Identifying these pockets of dissatisfaction — particularly when responses are sliced by team or project — enables targeted improvements.

AI tools and adoption

Engineer sentiment around AI is increasingly worth measuring on its own terms. Attitudes vary widely from enthusiastic adopters to sceptics, and understanding where your teams sit helps inform decisions around tooling and training.

Useful questions to ask include:

Do engineers feel positively about the role of AI in engineering generally?
Do they think the organisation is adopting AI at the right pace?
Are engineers satisfied with the AI tools available to them?
Do they have sufficient quota within those tools to use them in their day-to-day work?

Conclusion

Engineer experience surveys provide a valuable mechanism for gathering feedback that metrics alone cannot capture. By customising surveys to organisational priorities and running them consistently, organisations can measure progress, identify systematic issues and make targeted improvements. This ongoing practice of measuring and improving engineer experience leads to better delivery outcomes, higher engineer satisfaction and stronger retention.

Watch the full talk

Author

Richard Brown is the Technical Director at Audacia, where he is responsible for steering the technical direction of the company and maintaining standards across development and testing.

AI-Assisted Engineering: Building the Foundations for Adoption and Scale

Audacia — Tue, 28 Apr 2026 07:30:00 +0000

Based on a Tech Talk delivered by Technical Director Richard Brown, this blog explores how to successfully implement AI-assisted engineering.

Rolling out AI to engineering teams successfully depends on three interconnected phases: establishing solid foundations, executing a structured rollout and committing to continuous improvement. Each phase builds on the previous one to create sustainable AI adoption that delivers real value. These observations come from our internal AI deployment at Audacia and supporting other organisations through similar transformations.

The stages of AI assisted engineering - foundations, rollout and improvement represented in a diagram.

What are the foundations of AI-assisted engineering?

Before rolling out AI-assisted engineering, three foundations need to be in place: a clear AI governance policy that defines what people can and can't do, solid delivery practices that AI can amplify rather than expose, and documented coding standards that can be fed directly into AI tools as context.

AI Governance

Research into AI adoption across organisations reveals that providing guardrails and governance strongly correlates with successful AI adoption. On the other hand, adoption rates drop when organisations impose no constraints. Without defined boundaries, people become nervous and uncertain about what they're allowed to do.

A key first step to laying the foundations for AI is publishing an AI policy, which removes this ambiguity. The policy should clarify acceptable use – which tools are approved, licensing requirements, privacy controls and IP indemnity policies – as well as defining overarching principles that must be followed when using AI tools. The principles we use at Audacia are:

Accountability: People remain responsible for the quality and accuracy of their work, including any AI generated output used.

Fairness: AI models can reflect biases in their training data, so output should be reviewed carefully wherever it influences decisions that affect people.

Maintainability: AI-generated code should meet the same standards for readability and structure as anything written by a human.

Privacy: AI tools must handle data appropriately, with clear retention policies and care taken over what is shared with external models.

Security: AI tools warrant particular scrutiny – tools must meet appropriate security standards, and applications built with AI capabilities need proper security testing.

Transparency: AI involvement should be clear to all stakeholders/consumers, whether in code, content or decisions.

Solid Delivery Practices

The annual DORA (DevOps Research and Assessment) report recently focused on AI-assisted engineering and its impact across the software industry. The key takeaway that stands out is that AI serves as an amplifier; robust, mature practices accelerate the positive impact of AI, and poor processes provide a blocker to successful adoption.

In practice this means that if work flows smoothly through the pipeline, code review and pair programming processes are robust, testing is comprehensive and deployments are quick, AI will generally make those things even better. However, if single points of failure exist and bottlenecks plague your processes, then AI will amplify those problems too. Making sure solid foundations exist before introducing AI ensures that the right practices get amplified.

Many of these solid delivery foundations represent established good practice. For example, small batch sizes matter – including committing often, submitting small pull requests, breaking down user stories to smaller chunks, and reducing lengthy feature branches and troublesome merge conflicts. Additionally, unit testing becomes critical when generating more code faster – a comprehensive automated test suite ensures increased velocity doesn't introduce regression bugs.

Lastly, recognising that software engineering represents just one part of the overall delivery lifecycle is crucial. Speeding up software engineers achieves nothing if bottlenecks exist elsewhere in the lifecycle – if test environments aren't fit for purpose or requirements can't be defined quickly enough, faster developers simply wait for requirements or work on the wrong things. Ultimately, the entire development lifecycle needs examination, not just isolated parts.

Documented Standards

AI-assisted engineering demands documented standards – without them, there is nothing to govern the code AI generates. If your organisation doesn't have established coding standards, this is a good opportunity to document them.

Coding standards need to be communicated effectively to AI tools. Having Copilot or Claude generate extensive code that engineers must immediately rewrite or refactor to fit coding standards creates inefficiency. To address this, your organisational standards should be part of the context used by large language models generate content. Technologies like MCP servers allow coding standards to be pulled into the LLM context window, helping the model understand what 'good' looks like.

Ideally, these foundational pieces should be in place before rollouts begin – that's not always feasible, so laying the foundations and rollout can happen in parallel. However, before rollout, at a minimum, the overall software development lifecycle needs examination, and people need information about what they are and aren't allowed to do with AI.

How do you roll out AI tools across an engineering team?

Rolling out AI tools requires choosing the right tool for each team's environment, with data security and longevity in mind, and establishing benchmarks to evaluate new tools consistently. Equally important is role-specific training that sets realistic expectations – using AI effectively is a skill that doesn't automatically come with technical ability.

Choosing the Right Tools

The obvious first step to rolling out AI is giving people access to the right AI tools – such as GitHub Copilot, Claude Code, etc. Choosing the right tools for the right team requires meeting engineers where they spend most of their time. For example, if developers work primarily in VS Code, Copilot often makes sense because of deep integration. Data security should also inform this choice – tools must have appropriate terms of service, clear data retention policies and transparent privacy policies.

Beyond the initial selection, tool choice needs to account for longevity. The AI landscape moves quickly, and the right tool today might not be the right tool in six months. This makes ongoing assessment a responsibility in itself – staying aware of new models and services as they emerge rather than treating the initial rollout as a settled decision.

To help with ongoing tool assessment organisations can establish benchmarks to test against. Measuring tools against dimensions like correctness (does it meet the requirement given), autonomy (how much rework is required), quality (does it generate good code), even token usage (how efficiently does it get to a solution), gives teams a reliable basis for comparison. A representative codebase or application – something that genuinely reflects the organisation's work – makes this practical. For example, when a new model or plugin ships it can be tested against something real rather than abstract criteria.

Training and Upskilling

Alongside providing access to tools, ensure guidance is provided on how best to use them. Whether called prompt engineering, vibe coding, or something else, using AI as a software engineer is a skill in itself – and the best software engineers aren't necessarily the best at using AI, and vice versa.

Any training should present a balanced attitude to AI. Opinions tend to be polarised – hype at one end, dismissal at the other – with the truth somewhere in the middle. AI has limitations, and sometimes stepping outside the AI loop entirely is needed. Without these realistic expectations, the first roadblock an engineer faces can lead to disillusionment and disengagement.

Making training role-specific matters too. General organisational training is useful, but software engineers, test engineers and other teams each need specific guidance. Like tooling, meeting people where they are is essential. Finally, training must evolve. As new employees join and tools update, the ability to deliver ongoing, up-to-date guidance is what turns a one-off rollout into lasting capability.

How do you sustain and improve AI adoption over time?

Sustaining AI adoption long-term comes down to culture, clarity and measurement. Building communities that encourage open knowledge sharing, defining where AI works well and where human judgement is still needed, and tracking delivery velocity and quality all ensure AI continues to add value rather than becoming shelfware or a crutch.

Building Communities

Embedding AI adoption into an organisation’s culture, through building passionate communities, is what will drive AI forward and continuously improve how it gets used. As part of this, AI adoption shouldn't be seen as a top-down mandate – instead everyone should have a voice. Newer engineers in particular grow into the industry as AI natives. This technology moves so fast and is so new that everyone has valid opinions worth hearing.

These communities should also promote knowledge sharing. Whether through existing communities of practice, online forums, Slack channels, Teams channels, etc., building communities around AI creates space for sharing success stories, discussing challenges and asking questions. To avoid knowledge silos forming, employ a "no stupid questions" culture, creating a space where everyone learns from each other. Otherwise, instead of moving forward as an organisation, only localised progress occurs.

Developing Use Cases

Developing specific use cases helps teams understand the boundaries of AI. There are some cases AI struggles with, and some it handles incredibly well. Tasks which are generally repetitive, somewhat tedious work can be effectively offloaded to AI agents. For example, AI has proven reliable for upgrading codebases from one framework version to another, assuming the upgrade path is well-documented.

However, understanding boundaries remains important. Where does manual intervention still matter? When developing use cases involving AI agents, how much freedom and control should those agents have? Where does a human need to remain in the loop? Defining these boundaries ties back to acceptable use policies.

Tracking Progress

Tracking key metrics is important in ensuring AI is being used effectively, but also that individuals aren’t over-depending on AI without critical thinking. Maintaining the ability to solve novel problems without becoming overly dependent is crucial. Key metrics to track include delivery velocity and quality – as AI adoption increases, hopefully the amount of value delivered increases too, and quality should, at minimum, remain constant.

Summary

Successful AI adoption in engineering requires more than tool access. It starts with clear policies, strong delivery practices and documented standards, moves into carefully chosen tools and realistic training, and succeeds long-term through knowledge-sharing communities, well-defined use cases and consistent progress tracking.

Above all, AI-assisted engineering is not a one-off endeavour. Treating adoption as a continuous process of revisiting foundations, refreshing training and reassessing tooling as the landscape shifts, is what separates organisations that sustain value from those where initial momentum fades.

Author

Richard Brown is the Technical Director at Audacia, where he is responsible for steering the technical direction of the company and maintaining standards across development and testing.

Watch the Tech Talk

A look at Microsoft Fabric

Audacia — Mon, 30 Mar 2026 09:00:00 +0000

Running data functions at large organisations might look like one set of tools for ingestion, another for storage, something else for transformation, a separate analytics layer, and a BI platform bolted on top. Each being the right choice at the time, however, collectively, they've become a problem.

This can end up with a setup where:

Data engineers spend their time preparing and packaging data to hand off to analytics teams
Analysts build reports in tools that sit outside the engineering environment, often working from copies or extracts rather than a single source of truth
Data scientists operate in yet another silo, pulling data into notebooks and models that live separately from everything else

Handoffs can lead to delays, and each disconnected tool can add governance complexity as well as creating a significant cumulative cost - in time, money, and organisational friction.

The industry has been moving towards platform consolidation for years, and the major cloud providers have all made progress in this direction. Microsoft's entry with Fabric represents an attempt to bring the entire data lifecycle, from raw ingestion to executive dashboard, into a single, unified environment.

For those evaluating where Fabric fits, the challenge is understanding what Fabric actually changes, who benefits most, and whether the strategic shift is worth pursuing.

Understanding the spectrum

Before assessing any platform, it helps to step back and consider where your organisation sits with regards to data structuring. Different businesses need different levels of sophistication, and understanding the different requirements helps clarify where Fabric adds value and where simpler solutions might still serve you well.

At the most straightforward level, simple databases serve a clear and important purpose. Relational databases like SQL Server or PostgreSQL handle structured data storage and retrieval for individual applications effectively. If your needs are transactional, such as powering a web application, managing customer records, or supporting a single product, a well-designed database does the job without unnecessary complexity. Many teams start here, and for contained use cases, there's no reason to move beyond it.

As organisations grow and the demand for cross-functional reporting increases, data warehouses become the natural next step. Platforms like Azure Synapse Analytics, Snowflake, or Google BigQuery are designed to aggregate data from multiple sources into a structured, optimised environment built for analytical queries. This is the traditional backbone of enterprise business intelligence where data is extracted from operational systems, transformed into consistent schemas, and made available for reporting and analysis. For organisations that need reliable, governed analytics across departments, a data warehouse remains a solid foundation.

The challenge arises when the warehouse alone is no longer enough. Modern data demands often include unstructured data, real-time streaming, machine learning workloads, and self-service analytics - none of which a traditional warehouse handles natively. This is where organisations start layering in additional tools such as a lakehouse for unstructured data, a Spark environment for data science, a separate streaming platform for real-time use cases, and a BI tool on top. Each addition solves a problem, but each also introduces another integration point, another security model to manage, and another team boundary to navigate.

Unified platforms like Microsoft Fabric sit at the far end of this spectrum. Rather than asking organisations to assemble their own stack from best-of-breed components, Fabric brings storage, engineering, warehousing, data science, real-time analytics, and business intelligence together in a single environment. For those operating at scale, with multiple data teams and increasingly complex requirements, the cost of maintaining a fragmented stack can become harder to justify.

Understanding where your organisation sits on this spectrum matters because the value of Fabric depends heavily on context. An organisation running a handful of straightforward reporting use cases may find a warehouse and Power BI perfectly sufficient. An organisation juggling data engineering, science, streaming, and BI workloads across five different platforms will feel the consolidation benefits immediately.

Microsoft Fabric: The umbrella explained

Fabric can be easy to misunderstand if you approach it as simply another Microsoft product release. In reality, it's an umbrella platform - a unified SaaS offering that brings together multiple previously separate data services under a common foundation.

At the base of everything sits OneLake, Fabric's unified data layer. OneLake acts as a single storage foundation for your entire organisation's data, regardless of whether that data is structured, semi-structured, or unstructured. Every service within Fabric reads from and writes to OneLake, which means there's one copy of the data, one set of access controls, and one lineage trail. This is the architectural decision that makes the rest of the consolidation possible. A shared data layer means the services built on top of it genuinely share a foundation, rather than simply being co-located.

Built on top of that foundation, Fabric consolidates several core services that organisations have traditionally sourced and managed independently.

Data Factory handles data integration and orchestration. If you're currently running ETL or ELT pipelines to move data between systems, Data Factory provides that capability natively within Fabric. It connects to a wide range of source systems and allows you to build, schedule, and monitor data movement and transformation workflows without reaching for a separate integration tool.

Data Engineering provides a Spark-based environment for large-scale data processing. Data engineers can work with notebooks and Spark jobs directly within the Fabric environment, processing large volumes of data without needing a standalone Spark cluster or a separate Databricks workspace. The data they process lives in OneLake, immediately accessible to every other service.

Data Warehousing delivers a T-SQL-based analytical data warehouse. For organisations with teams skilled in SQL, this provides a familiar interface for building and querying structured analytical models without the need to provision and manage separate warehouse infrastructure.

Data Science supports machine learning and advanced analytics workloads. Data scientists can build, train, and deploy models within the same environment where the data engineering and warehousing work happens. This reduces the friction that typically exists when models need to move between teams or when data needs to be extracted into separate science environments.

Real-Time Analytics addresses streaming and event-driven data. For organisations working with IoT data, application telemetry, or any use case that requires near-instant insight from data as it arrives, this service provides real-time ingestion and querying capabilities natively within the platform.

Power BI, already the dominant enterprise BI tool in many organisations, is integrated directly into Fabric rather than sitting alongside it as a separate product. Reports and dashboards connect directly to data in OneLake, with no need to extract, export, or duplicate data into a separate BI layer.

Data Activator adds an automation layer, allowing organisations to set up alerts and trigger actions based on data conditions. Rather than building custom monitoring solutions, teams can define rules that respond automatically when data meets certain thresholds or patterns.

Comparable tools for each of these capabilities exist elsewhere in the market. Where Fabric distinguishes itself is in the shared foundation. These services share OneLake, share a security model, share a governance framework, and share a licensing structure. They were built on a common platform rather than bundling together tools.

Data consolidation

For data teams, the most compelling argument for Fabric is often operational rather than technical. The way most large organisations currently work with data involves a series of handoffs between teams that can create unnecessary friction.

Consider a workflow of:

A data engineering team builds and maintains pipelines that ingest raw data from source systems, transform it, and load it into a warehouse or lakehouse. Once the data is structured and validated, it's made available, often through a separate access layer or export process, to an analytics team.
The analytics team then builds reports and dashboards in a BI tool like Power BI, Tableau, or Looker.
If a data science team is involved, they'll often pull data into yet another environment to build models, the outputs of which may then need to be fed back into the warehouse for the analytics team to report on.

This also creates an opportunity for data to drift out of sync, for definitions to diverge, and for governance to become fragmented. With each of these handoffs also being a potential point of failure, introducing latency, and requiring coordination between teams who may be using different tools, different interfaces, and different mental models of the same data.

Fabric's consolidation directly addresses this. When Power BI sits within the same platform as the data engineering and warehousing layers, the gap between "data is ready" and "report is built" shrinks dramatically. An analyst building a Power BI report in Fabric is working directly with data in OneLake, the same data the engineering team just processed, governed by the same access controls, with the same lineage. There's no export, no separate connection to configure and no waiting for data to appear in a different system.

Similarly, when data scientists work within the same environment, they can access the data they need directly rather than extracting it into a standalone notebook server or requesting access through a separate process. They work on the same platform, with the same data, subject to the same governance. The output of their models can be written back to OneLake and immediately consumed by BI reports or downstream applications.

Organisational roles and specialisms remain important in this model. Data engineering, analytics, and data science are distinct disciplines with distinct skills, and Fabric doesn't change that, however it does reduce the friction between them. Teams still specialise, but they collaborate on a shared platform rather than throwing work in silos between disconnected tools.

The governance implications are equally significant. In a fragmented stack, security and access controls need to be configured and maintained separately across each tool; data lineage is difficult to track end-to-end when data passes through multiple systems; and compliance reporting requires pulling information from multiple audit logs. However, in Fabric, a single security model covers the entire lifecycle. Access controls set at the OneLake level apply consistently whether the data is being accessed by an engineer in a Spark notebook, an analyst in Power BI, or a scientist in a machine learning experiment.

For organisations operating in regulated industries such as financial services, healthcare or the public sector, this unified governance model creates a significant reduction in compliance risk and audit complexity.

Considerations

Understanding what Fabric offers is a useful starting point. The harder work is deciding whether and how to adopt it. There are several strategic dimensions worth considering:

Market positioning

Fabric exists within a competitive landscape. Databricks offers a strong lakehouse platform with deep data science capabilities, while Snowflake provides a mature, cloud-agnostic data warehousing experience, and AWS has its own suite of data services. Each has genuine strengths, and the right choice depends on the specific context. Fabric's distinctive advantage lies in its breadth and native integration with the Microsoft ecosystem. If your organisation already runs on Azure, uses Microsoft 365, and has Power BI embedded across business teams, Fabric offers a consolidation path that leverages existing investments and skills. Organisations whose stacks are primarily built on AWS or GCP will need to weigh that integration benefit against the switching costs involved.

Migration reality

No large organisation is going to rip and replace its entire data infrastructure overnight, and Fabric doesn't require that. A more realistic approach is phased adoption - identifying workloads where consolidation delivers the most immediate value and starting there. Power BI teams that currently connect to external data sources can be a logical first choice, with data engineering teams managing complex pipeline orchestration across multiple tools being another. Starting with high-friction, high-visibility workloads can help to build internal confidence and demonstrate value before committing to broader migration.

Skills and team readiness

Fabric lowers certain barriers, such as analysts can do more without engineering support, and the shared environment reduces the need for manual handoffs. At the same time, adopting any new platform requires an investment in learning. Teams will need to understand OneLake's storage model, the nuances of each service within Fabric, and how governance works across the unified environment. Planning for this upskilling alongside the technical migration is essential.

Governance and compliance

For organisations in regulated sectors, Fabric's unified security and lineage model is a significant draw. Having a single place to manage access controls, audit data movement, and trace lineage from source to report simplifies compliance in a way that fragmented stacks struggle to match.

Platform maturity

Fabric is still evolving. Some components are more mature than others and Microsoft continues to ship updates and new capabilities at pace. Early adopters should be prepared for a platform that is moving quickly, with all the opportunity and occasional rough edges that brings. Evaluating Fabric today means accepting that some features may still be maturing while recognising that Microsoft's investment and trajectory suggest significant development ahead.

Summary

The fragmented data stack served its purpose for a long time. It allowed organisations to adopt best-of-breed tools for each stage of the data lifecycle and build capabilities incrementally. But the operational and strategic costs of maintaining that fragmentation are growing, and the expectations placed on data teams - to deliver faster, govern better, and do more with less - are only increasing.

Microsoft Fabric represents a credible path towards consolidation. By bringing the full data lifecycle under one roof, sharing a common data layer, and unifying governance across every workload, it addresses many of the friction points that data teams deal with daily.

Whether Fabric is the right move for an organisation depends on the current stack, their team's capabilities, and overall strategic direction. For data leaders already embedded in the Microsoft ecosystem and feeling the strain of a fragmented infrastructure, it can be a good option to evaluate.

Author

Chris is a Lead Data Scientist, with a background in astrophysics, and has over 4 years’ experience in providing data strategies insights using computational models and machine learning methodology. Chris has worked with a number of organisations across industries to successfully deliver AI projects, from PoC development and use case validation, through to model training and maintenance.

Testing AI: How to Effectively Evaluate LLMs

Audacia — Mon, 23 Mar 2026 10:00:00 +0000

Traditional software testing rests on a basic assumption that given the same input, the system produces the same output. A test case defines expected behaviour, and a test passes or fails based on whether the output matches. This assumption – deterministic behaviour with verifiable correctness – is the foundation on which decades of quality assurance practices have been built.

However, this can break down with large language models. An LLM may produce a different response to the same prompt on successive runs. Its outputs are sensitive to context, prompt phrasing, temperature settings and the interaction between retrieved documents and parametric knowledge. It can produce responses that are fluent, confident and completely wrong - a failure mode that traditional testing has no framework for detecting. And unlike a conventional software bug, which typically manifests consistently and can be reproduced, AI system failures are often probabilistic, context-dependent and difficult to predict.

For engineering leaders, this creates a new problem. Organisations are deploying LLM-powered features at pace, such as customer-facing chatbots, internal knowledge assistants, AI-augmented search, automated document processing, coding assistants and increasingly autonomous agentic workflows. However, the testing and evaluation practices for these systems are struggling to keep up.

The World Quality Report 2025, surveying over 2,000 senior executives across 22 countries, found that hallucination and reliability concerns are now among the top barriers to generative AI adoption in quality engineering, cited by 60% of respondents - a challenge that barely registered two years ago.

This article looks at what testing looks like for AI systems, why it is fundamentally different from traditional software testing, and how organisations can build the evaluation capability required to deploy LLMs responsibly.

Why Traditional Testing Fails for AI Systems

The differences between testing traditional software and testing AI systems are not differences of degree but of kind.

In conventional software, correctness is binary. A function either returns the right value or it does not. Test cases can enumerate expected input-output pairs, and 100% pass rates are achievable and expected. The system under test is deterministic - run the same test twice, get the same result. And when a test fails, the failure is reproducible, allowing engineers to diagnose and fix the root cause.

Little of these properties hold for LLM-powered systems. There is no single "correct" response to most natural language queries. A question about company policy might have multiple valid phrasings, levels of detail and degrees of nuance. The system is non-deterministic by design (temperature and sampling parameters introduce controlled randomness). And failures, such as hallucinations, reasoning errors, safety violations and biased outputs, may occur intermittently, triggered by specific combinations of context, phrasing and retrieved information that are difficult to anticipate or reproduce.

This means testing AI systems is an evaluation discipline rather than a verification discipline. Instead of asking "does this pass or fail?", organisations must ask "how well does this system perform across a range of scenarios, and is the distribution of performance acceptable for our use case?" This requires statistical thinking, domain-specific quality criteria and continuous evaluation rather than one-off test suites.

The Hallucination Problem: Scale and Consequences

Hallucination - where an LLM generates content that is fluent and confident but factually incorrect or unsupported by source material - is the most visible failure mode and the one that most concerns enterprise adopters.

Vectara's Hallucination Leaderboard, which benchmarks LLMs for factual consistency in summarisation tasks, found that even frontier reasoning models, including GPT-5, Claude Sonnet 4.5, Grok-4, and DeepSeek-R1, all exhibited hallucination rates exceeding 10% on their updated, more challenging benchmark. The recently released Gemini-3-pro demonstrated a 13.6% hallucination rate and did not make the top-25 list.

These are the best available systems, evaluated on a straightforward summarisation task, not adversarial conditions or edge cases.

The academic community is also grappling with how to define and categorise hallucinations consistently. The HalluLens benchmark, presented at ACL 2025, identified a fundamental challenge in existing benchmarks often conflating hallucination with factuality, despite these being distinct problems requiring different evaluation approaches. HalluLens proposes a taxonomy distinguishing between extrinsic hallucinations (where generated content deviates from or contradicts source material the model had access to) and intrinsic hallucinations (where the model contradicts its own earlier outputs). This distinction matters for enterprise applications because the mitigation strategies differ, with extrinsic hallucination being a retrieval and grounding problem, while intrinsic hallucination is a consistency and reasoning problem.

The real-world consequences of inadequate hallucination testing are already visible and increasingly costly.

Air Canada lost a legal case after its chatbot fabricated a bereavement discount policy that did not exist – the airline was held liable for the AI's invention.
New York City's public-facing chatbot provided illegal advice to business owners about regulatory requirements.
And a GPTZero analysis of over 4,000 papers accepted at NeurIPS 2025 found that dozens contained fabricated AI-generated citations – invented authors, titles and journals that passed peer review undetected.

These incidents share a common root cause in systems being deployed without adequate evaluation of their failure modes under realistic conditions.

What LLM Evaluation Looks Like

Practitioners are converging on a multi-dimensional evaluation approach that moves well beyond traditional pass/fail testing. The emerging consensus spans at least seven dimensions: accuracy, safety, bias, hallucination, robustness, latency and security. Each requires different evaluation methods, and the relative importance of each dimension varies by use case – a customer service chatbot has different critical dimensions than a code generation tool or a medical information system.

Benchmark suites

Benchmark suites are the most familiar evaluation approach, adapted from academic AI research. Standardised benchmarks test model capabilities across reasoning, knowledge, coding and other dimensions. However, generic benchmarks have significant limitations for enterprise use. Many models now saturate standard benchmarks like MMLU (exceeding 90% accuracy), which has driven the development of harder alternatives. More fundamentally, a model's score on a general benchmark tells you little about how it will perform on your specific domain, data and use cases. Organisations deploying LLMs need domain-specific evaluation datasets that reflect the actual questions their users ask, the documents their RAG systems retrieve, and the edge cases their particular deployment will encounter.

LLM-as-judge approaches

LLM-as-judge approaches use one language model to evaluate the outputs of another. This approach is both practical and scalable, allowing automated evaluation of thousands of responses without human reviewers, with tools like DeepEval and RAGAS making this accessible. But the approach does have an inherent risk. If both the generating model and the evaluating model are prone to hallucination, they may reinforce each other's errors, creating what researchers describe as a "hallucination echo chamber." Effective LLM-as-judge implementations mitigate this through multi-model consensus (using several different models as judges), structured evaluation rubrics that constrain the judge's assessment to specific, verifiable dimensions and periodic calibration against human judgement.

Red-teaming and adversarial testing

Red-teaming and adversarial testing deliberately probe the system for failure modes. This includes testing for prompt injection (where adversarial inputs manipulate the model's behaviour), safety violations (where the model produces harmful or inappropriate content), and edge cases where the model's confidence exceeds its accuracy. Red-teaming is particularly important for customer-facing AI systems, where an adversarial user may deliberately attempt to exploit the system. The EU AI Act explicitly requires adversarial testing for general-purpose AI models, making this a compliance requirement rather than a best practice.

Human evaluation

Human evaluation remains essential for high-stakes use cases. Automated metrics cannot fully capture whether a response is genuinely helpful, appropriately nuanced, or safe in context. Human evaluation is expensive and slow, which makes it impractical for comprehensive testing, but it serves a critical role in calibrating automated evaluation systems and validating performance on the most important and sensitive scenarios.

Continuous evaluation in production

Continuous evaluation in production closes the loop. Unlike traditional software where testing occurs before deployment, AI systems require ongoing monitoring because their performance depends on inputs that cannot be fully anticipated. This includes tracking hallucination rates on real user queries, monitoring for distribution shift (where the types of questions users ask diverge from what the system was evaluated on), and collecting user feedback to identify failure patterns that pre-deployment testing missed.

Testing RAG Systems: Where Retrieval Meets Generation

Retrieval-augmented generation (RAG), where an LLM's responses are grounded in documents retrieved from an organisational knowledge base, is the most common enterprise LLM deployment pattern. It is also where testing becomes particularly nuanced, because failures can originate in the retrieval step, the generation step or the interaction between the two.

A RAG system can fail in several distinct ways. The retrieval component may return irrelevant documents, missing the information needed to answer the query. It may return relevant documents but rank them poorly, burying the critical information below less relevant content. The generation component may ignore the retrieved context and rely on its parametric knowledge instead, producing a plausible but ungrounded answer. Or it may hallucinate details that are not present in any of the retrieved documents, fabricating specifics while appearing to cite its sources.

Testing RAG systems therefore requires evaluating each component independently and the system as a whole. Retrieval quality can be measured through precision (what proportion of retrieved documents are relevant?) and recall (what proportion of relevant documents are retrieved?). Generation quality requires checking faithfulness (does the response accurately reflect the retrieved content?), relevance (does the response actually answer the question?) and completeness (does it include all pertinent information from the retrieved documents?).

The challenge is that these evaluations require ground-truth datasets specific to the organisation's knowledge base and user queries. Off-the-shelf benchmarks do not test whether your RAG system correctly answers questions about your company's policies, products or processes. Building these evaluation datasets, such as curating representative questions, establishing correct answers and maintaining them as the knowledge base evolves, is one of the most labour-intensive but essential aspects of AI testing. Enterprise research has found that content quality and organisation within the knowledge base itself often has a larger impact on RAG performance than the choice of model or retrieval architecture, which means testing must extend to the data layer, not just the AI components.

Testing Agentic AI: The Next Frontier

The testing challenge compounds further as organisations move from simple question-answering systems to agentic AI – systems that can plan multi-step tasks, use tools and take actions in the real world. An agentic workflow might involve an AI system that receives a customer request, retrieves relevant information from multiple sources, reasons about the best course of action and executes a series of steps (updating a database, sending a communication, triggering a workflow) with minimal human intervention.

Testing agentic systems requires evaluating not just the quality of individual outputs but the correctness of entire decision chains. Does the agent correctly decompose a complex task into appropriate sub-tasks? Does it select the right tools for each step? Does it handle errors and unexpected conditions gracefully? Does it know when to escalate to a human rather than proceeding autonomously?

These questions go beyond hallucination testing into territory that more closely resembles integration testing and end-to-end workflow validation. However, with the added complexity that the system's behaviour is non-deterministic and its decision-making is opaque.

The real-world consequences of inadequate agentic AI testing have already surfaced: in one widely reported incident, an autonomous AI coding agent deleted a company's primary database during a self-directed "cleanup" operation, violating a direct instruction prohibiting modifications. The root cause was not a hallucination but a reasoning failure, where the agent decided that a database cleanup was appropriate despite an explicit code freeze instruction, and no separation existed between test and production environments.

For engineering leaders, agentic AI testing demands a combination of traditional integration testing principles (test the workflow end-to-end, validate boundary conditions, verify error handling) with AI-specific evaluation (assess the quality of the agent's reasoning, its compliance with guardrails and its behaviour under adversarial or unexpected conditions). Sandbox environments with realistic but non-production data become essential, as does the ability to replay and analyse the agent's decision chain after the fact.

The Regulatory Dimension

The regulatory environment is adding both urgency and specificity to AI testing requirements.

The EU AI Act, now entering enforcement, establishes graduated testing obligations based on risk classification. High-risk AI systems, which include those used in employment, credit decisions, education and critical infrastructure, require comprehensive testing for accuracy, robustness, cybersecurity and non-discrimination before deployment, with ongoing monitoring obligations thereafter.

General-purpose AI models face model evaluation requirements including adversarial testing. Organisations deploying LLM-powered features must be able to demonstrate that they have tested their systems against these criteria – a compliance requirement that many have not yet begun to address.

The UK's approach differs in structure but converges in its implications. Rather than prescriptive legislation, UK regulators are applying existing regulatory frameworks, through the FCA, ICO, CMA and sector-specific regulators, to AI systems within their remit. The ICO's guidance on AI and data protection, for instance, requires organisations to demonstrate that AI systems processing personal data are accurate, fair and transparent. The practical effect is similar in that organisations must be able to evidence that they have evaluated their AI systems' behaviour against relevant quality and safety criteria.

The EU Cyber Resilience Act adds another layer for AI-powered software products, requiring that products be developed according to secure-by-design principles, free from known exploitable vulnerabilities and supported by ongoing security updates. For AI systems that interact with external inputs (user queries, retrieved documents, API calls), this implies testing for adversarial inputs, prompt injection and data leakage – categories that traditional security testing does not cover.

Building AI Testing Capability

Perhaps the most practical challenge facing engineering leaders is where AI testing capability should sit organisationally and what skills it requires.

AI evaluation requires a blend of competencies. It requires understanding of ML evaluation methodology, such as benchmark design, statistical analysis of non-deterministic outputs and evaluation metric selection. It also requires domain expertise to define what "correct" means for specific use cases – a question that is ultimately a business judgement rather than a technical one. As well as this, it requires prompt engineering capability to design effective evaluation prompts and adversarial test cases. And lastly, it requires the infrastructure skills to build and run evaluation pipelines at scale, integrate monitoring into production systems, and maintain evaluation datasets as the system and its usage evolve.

Some organisations are embedding this capability within existing QA teams, extending their remit to encompass AI evaluation alongside traditional testing. Others are building dedicated AI quality or AI evaluation functions, sometimes within ML engineering teams, sometimes as standalone roles. Neither approach has emerged as clearly superior. The right answer depends on the organisation's AI maturity, the scale and criticality of its AI deployments, and whether the dominant challenge is evaluation methodology (which favours ML expertise) or integration with existing quality processes (which favours QA expertise).

What is clear is that there is a skills gap. The World Quality Report found that 50% of organisations lack AI/ML expertise, unchanged from the prior year, and that generative AI has emerged as the single most in-demand skill for quality engineers (63%), ahead of core quality engineering fundamentals (60%).

PractiTest's State of Testing 2026 data reinforces this from the practitioner perspective. Testing professionals who actively use AI tools are significantly less anxious about their future and earn a measurable salary premium, suggesting that the market is already pricing in AI evaluation capability.

From Optional to Essential

The window during which AI testing could be treated as an emerging discipline is closing. Organisations are deploying LLM-powered systems into production, customers and employees are interacting with them daily, and the failure modes are documented and increasingly expensive.

The hallucination rates are quantified, with even frontier models exceeding 10% on rigorous benchmarks. The regulatory requirements are specific, with the EU AI Act mandating testing that most organisations cannot yet perform. And the deployment patterns are growing more complex, with RAG systems compounding retrieval and generation failures, while agentic workflows are introducing autonomous decision-making with real-world consequences.

The Veracode research on AI-generated code security showed the same pattern – newer, larger models do not produce more secure code, highlighting that these are not problems that will be solved with the next model release. Instead, teams require exploration and investment into testing capability, evaluation infrastructure and the organisational capacity to assess and manage the risks inherent in deploying probabilistic systems.

Author

Richard Brown is the Technical Director at Audacia, where he is responsible for steering the technical direction of the company and maintaining standards across development and testing.

Why AI Governance is Key to Scaling AI

Audacia — Mon, 16 Mar 2026 10:00:00 +0000

Governance is the aspect of AI that most reliably triggers resistance from delivery teams. The perception, which can often be well-founded in experience, is that governance means delays, committees, paperwork and risk management leading to blockers.

This perception is understandable but can lead to significant risk. Understandable, because many organisations have governance frameworks that aren’t necessarily suited to the iterative, experimental nature of AI development. However the absence of governance does not eliminate risk, but rather it means that risks are often uncovered in production, where the consequences are most severe and the cost of remediation is highest.

The organisations that are scaling AI successfully have resolved this tension - not by choosing between speed and governance, but by fundamentally rethinking what governance means in the context of AI. They have made it proportionate, embedded and automated, with the evidence showing that this approach can help to accelerate delivery, not slow it down.

The Cost of Ungoverned AI

McKinsey's 2025 State of AI survey found that 51% of organisations report at least one negative AI-related incident in the past 12 months. The most commonly cited incidents involved inaccuracy, followed by compliance failures, reputational damage, privacy breaches and unauthorised actions by AI systems.

These are risks affecting the majority of organisations deploying AI at any meaningful scale, and they are growing. The average organisation is now actively managing around four types of AI risk, up from approximately two in 2022, with inaccuracy, cybersecurity, privacy and regulatory risk most frequently addressed. Explainability – the ability to understand and explain why an AI system produced a particular output – stands out as a risk that many organisations experience but fewer have robust controls for.

Deloitte's 2026 State of AI in the Enterprise report adds a governance dimension specific to the emerging agentic AI frontier, with only one in five companies having a mature governance model for autonomous AI agents. As AI systems move from answering questions to taking independent action, the governance gap becomes a genuine operational risk.

The business case for governance is fundamental to building the organisational trust required to scale AI beyond pilots. Without governance, boards can hesitate to approve production deployment, business stakeholders can question the reliability of AI outputs, regulators can ask questions that cannot be answered, and individual AI initiatives that might otherwise create value remain confined to sandboxes because teams don’t have the confidence to release them.

The EU AI Act: A New Regulatory Baseline

The most significant regulatory development for enterprise AI is the EU AI Act – the first comprehensive AI legislation globally. Its phased implementation timeline is now well underway and directly affects any organisation operating in or serving EU markets.

The Act entered into force on 1 August 2024. Within this, prohibited AI practices – including social scoring and certain forms of biometric categorisation – have been banned since February 2025. Obligations for general-purpose AI (GPAI) models, including transparency and documentation requirements, became applicable in August 2025. The penalty regime is now active, with fines of up to €35 million or 7% of global turnover for prohibited practices, and up to €15 million or 3% for other infringements.

The most consequential deadline for enterprises is August 2026, when the comprehensive compliance framework for high-risk AI systems takes effect. This covers AI used in areas including biometrics, critical infrastructure, education, employment, essential services, law enforcement and border management. Organisations deploying AI in these domains will need to demonstrate risk management systems, data governance measures, technical documentation, human oversight mechanisms and conformity assessments.

For UK organisations, the Act has extraterritorial reach - if the output of an AI system is used within the EU, the obligations apply regardless of where the provider is based. Any UK enterprise with EU customers, operations or supply chain connections should therefore understand and plan for compliance.

The UK Approach: Principles-Based but Tightening

The UK has deliberately chosen a different path from the EU's prescriptive legislation. As of early 2026, the UK has not adopted a single cross-economy AI law. Instead, it relies on existing sector regulators to apply current frameworks to AI within their domains – a principles-based, outcomes-focused approach.

In financial services – the UK sector furthest advanced in AI adoption – this approach is well-articulated. The FCA confirmed in December 2025 that it will not introduce AI-specific rules, citing the technology's rapid evolution. Instead, it relies on existing frameworks including the Consumer Duty, Senior Managers and Certification Regime (SM&CR), and operational resilience requirements. FCA's position is that these technology-agnostic frameworks already cover the key risks associated with AI deployment – accountability, transparency, consumer protection and resilience.

The Bank of England and FCA's third survey of AI in UK financial services, published in November 2024, found that 75% of firms are already using AI, with a further 10% planning to adopt within three years. Foundation models account for 17% of use cases, though most deployments remain low materiality. Lloyds' 2025 Financial Institutions Sentiment Survey reported that 59% of institutions now see measurable productivity gains from AI, up from 32% a year earlier.

But "principles-based" does not mean "relaxed." The FCA's Chief Data Officer has noted that advances in AI may require modified approaches to firm risk management and governance, and that regulation will need to adapt. The Treasury Committee published a report on AI in financial services in January 2026, examining both opportunities and risks. And the UK government appointed two AI Champions for financial services – signalling that regulatory attention is intensifying.

For organisations outside financial services, the landscape is less codified but no less important. The ICO's existing guidance on automated decision-making under UK GDPR applies to any AI system that processes personal data. Sector-specific regulators in healthcare (MHRA, CQC), energy (Ofgem), and other domains are developing their own positions. And the UK government's AI Opportunities Action Plan, published in early 2025, signals a direction of travel toward greater expectations around safety, transparency and accountability – even without prescriptive legislation.

The practical implication for UK enterprises is that the absence of an AI-specific law does not mean the absence of regulatory obligation. Existing frameworks already create accountability for AI outcomes, and the direction of travel – both domestically and through the extraterritorial reach of the EU AI Act – is clearly toward greater scrutiny.

Three Principles for AI Governance

The organisations succeeding with AI governance share three design principles that distinguish their approach from the heavyweight, process-oriented governance models that have historically frustrated delivery teams.

Proportionate governance

Proportionate governance calibrates the level of oversight to the level of risk. Not every AI application carries the same risk profile. A model that recommends internal knowledge articles requires a fundamentally different governance posture than a model that makes credit decisions or informs clinical diagnoses.

A practical risk-tiering framework – typically three or four tiers – allows low-risk use cases to move quickly with lightweight review, while high-risk applications receive the scrutiny they demand. The key dimensions for tiering include: the impact on individuals if the model produces an incorrect output, the regulatory sensitivity of the domain, the degree of human oversight in the workflow and the nature of the data being processed (particularly personal or sensitive data). This approach avoids the bottleneck of treating every AI initiative as though it were mission-critical.

Embedded governance

Embedded governance builds compliance checks into the development process rather than imposing them as a gate at the end. This includes bias testing as part of model evaluation, data privacy assessments as part of pipeline design, explainability requirements as part of model selection and risk assessment as part of use case approval.

When governance is embedded, it does not create a bottleneck at deployment. Instead, it prevents the far more costly rework that comes from discovering compliance issues after a model has been built, tested and handed to the operations team. The shift is from governance as a stage gate to governance as a continuous practice – present throughout the development lifecycle, not concentrated at a single approval point.

Automated governance

Automated governance leverages tooling to enforce standards without human bottlenecks. Automated checks for data quality thresholds, model performance metrics, bias indicators and audit logging can be built into CI/CD pipelines, ensuring that governance is consistently applied without requiring manual review for every model update or retraining cycle.

Cisco's AI Readiness Index found that 97% of the most AI-ready organisations ("Pacesetters") deploy AI at the scale and speed necessary to realise value, compared to just 41% overall – and that 84% of these Pacesetters have comprehensive change management plans, versus 35% of all companies. This highlights that governance and speed are not in tension for the most advanced organisations, they can in fact be mutually reinforcing.

Building the Governance Framework: Components

For organisations looking to establish or strengthen their AI governance, several components form the foundation.

An AI risk register and use case inventory

Before governance can be applied proportionately, the organisation needs visibility into what AI is being used, where and at what risk level. This sounds quite simple, but many organisations – particularly those where AI adoption has been bottom-up and decentralised – lack a comprehensive view of their AI estate. The inventory should capture each use case, its risk tier, its data sources, its intended users and its current lifecycle stage.

Clear roles and accountability

Governance requires named individuals accountable for AI risk. In the UK financial services context, the SM&CR already provides this structure – the Senior Manager responsible for AI outcomes is personally accountable. Outside regulated sectors, the principle still applies: someone senior must own AI governance, with authority to approve, escalate or halt deployments based on risk assessment.

Model documentation standards

Each AI model in production should be accompanied by documentation covering its purpose, training data, performance metrics, known limitations, bias assessments and monitoring arrangements. This documentation serves multiple purposes – it enables effective oversight, supports regulatory compliance, facilitates knowledge transfer when team members change and provides the audit trail that boards and regulators increasingly expect.

Monitoring and incident management

Governance does not end at deployment. Production AI systems require ongoing monitoring for model drift (degradation in performance as real-world data diverges from training data), data quality issues, emerging biases and unexpected behaviours. A clear incident management process – defining how AI-related issues are detected, escalated, investigated and remediated – is essential, particularly given how many organisations have already experienced at least one negative AI incident.

Regular review and adaptation

The governance framework itself should evolve. The regulatory landscape is changing rapidly – the EU AI Act's high-risk obligations take effect in August 2026, UK regulatory expectations continue to sharpen, and the technology itself is advancing at pace. A governance framework designed for today's AI capabilities will need updating as agentic systems, multimodal models and new deployment patterns continue to evolve.

Governance as Competitive Advantage

It is tempting to view governance as a cost centre – an overhead imposed by regulators and risk committees that can add little to the value AI delivers.

However, governance is what gives the board confidence to approve production deployment, as well as allow the use of AI in customer-facing and decision-critical contexts rather than confining it to internal experimentation. And it prevents the compliance complexity when regulatory expectations tighten.

BCG's research found that AI leaders follow a 10-20-70 resource allocation: 10% to algorithms, 20% to technology and data and 70% to people and processes – the category that includes governance, change management and organisational readiness. With the organisations investing most heavily in governance the same ones generating the most value from AI.

The lack of governance can be one of the main reasons that AI projects stall. It can lead to eroding trust, increased compliance risk, rework, ultimately keeping promising AI initiatives confined to sandboxes. However, if governance is built it in from the start, proportionate to risk, embedded in the development lifecycle and automated where possible, it can be the element that leads to production success.

Author
Chris is a Lead Data Scientist, with a background in astrophysics, and has over 4 years’ experience in providing data strategies insights using computational models and machine learning methodology. Chris has worked with a number of organisations across industries to successfully deliver AI projects, from PoC development and use case validation, through to model training and maintenance.

Managing Hidden Waterfalls in Legacy Modernisation Projects

Audacia — Mon, 09 Mar 2026 08:30:00 +0000

Why agile delivery fails in legacy heavy environments without structural preparation.

Agile remains the dominant model for modern software delivery for good reasons. Iterative development, fast feedback loops and the ability to adapt to new information are essential in complex, evolving systems. However, when agile is introduced into legacy-heavy organisations without accounting for institutional constraints, its effectiveness can diminish over time. What begins as an agile programme can often shift imperceptibly into a sequential delivery model beneath the surface.

In 2019, 29% of organisations reported using waterfall delivery models. By 2022, that figure had risen to 43%. Not because teams chose to abandon agile, but because the environments they were delivering into quietly forced the shift.

Teams start with discovery and prototyping, iterate rapidly and validate assumptions early. But as delivery progresses, unaddressed constraints begin to emerge, such as undocumented legacy behaviours, regulatory edge cases or operational workarounds that were never captured as formal requirements. At this point, the legacy system reasserts itself as a source of truth.

Agile ceremonies may continue, but the programme becomes more reactive. The goal can subtly shift from solving user problems to reproducing historical behaviour. What remains is a hybrid model becoming agile in appearance, yet waterfall in substance - the hidden waterfall.

Legacy Replacement as High-Risk

The data on legacy modernisation is consistent. These programmes fail more often, and more visibly, than greenfield initiatives.

A review of ERP project outcomes by KPC Team places failure or severe underperformance rates between 55% and 75%, depending on scope and definition. With ERP Today highlighting testing, data quality and scope volatility as common points of failure.

Data migration projects carry even higher risk. According to Oracle, over 80% of data migration initiatives either overrun, underdeliver or fail entirely. Most often due to undocumented dependencies and inadequate validation. Factors such as schema drift, semantic inconsistencies and legacy entanglement are reported as persistent blockers to successful transformation.

Most digital transformation efforts in large organisations are not pure greenfield builds. They are legacy replacement or coexistence programmes - subject to all the structural, technical and operational complexity that entails.

Why Legacy Becomes the Specification

A common misstep in legacy modernisation is the assumption that existing systems simply encode outdated implementations of known requirements. In practice, legacy systems carry decades of organisational memory, much of it undocumented.

Research in requirements engineering reveals several persistent patterns in legacy systems:

Business logic is embedded in code rather than documentation
Exceptions are handled through hidden branches or procedural workarounds
User behaviours evolve around system constraints, becoming de facto requirements

When delivery teams attempt to define new system requirements without examining these embedded behaviours, they quickly encounter gaps. At that point, the legacy platform is no longer a background dependency - it becomes the only available reference model.

Studies published in IJACSA and Springer show that late discovery of implicit requirements is a leading cause of rework. In legacy replacement programmes, these “requirements” were never made explicit because they were never formally captured.

This is a structural outcome of relying on systems that evolve without parallel investment in shared knowledge.

Water-Scrum-Fall

In 2011, Forrester introduced the term “Water-Scrum-Fall” to describe hybrid delivery models in which agile practices are embedded between upfront planning and downstream release governance. More than a decade later, this pattern persists, and if anything, it has increased.

CIO Dive reported in 2022 that 43% of organisations still use waterfall models, up from 29% in 2019, with compliance, assurance and funding structures cited as the main reasons. KnowledgeHut’s 2025 State of Agile found that agile adoption is now stagnating or reversing in many enterprise environments, with hybrid models becoming the norm.

These regressions are rarely ideological. Most organisations want to be agile. But delivery becomes sequential by structure.

Why Waterfall Reappears by Default

Several factors can pull agile programmes toward waterfall behaviours:

Funding cycles require fixed scope and budget commitments before discovery
Governance models rely on stage gates, rather than continuous assurance
Supplier contracts focus output completion over outcome delivery
Compliance processes are serial in nature, with formal sign-offs and audit trails

In the public sector, the National Audit Office has repeatedly highlighted how legacy estates, inflexible procurement and capacity gaps create barriers to agile working. The State of Digital Government Review 2025 confirms that many central government services still rely on systems more than two decades old, with modernisation constrained by high operational risk and fragile dependencies.

In this environment, teams may adopt agile practices within their sprint cycles, but the programme remains governed by linear constraints, creating hidden waterfalls.

Recognising Hidden Waterfalls Before They Set In

Hidden waterfalls rarely announce themselves. They emerge gradually, often masked by functioning agile rituals. But several indicators can signal that a programme has shifted from iterative delivery to sequential progression:

Sprint goals are increasingly defined by legacy parity rather than user outcomes. Backlog items begin to reference "the old system does X" as the primary acceptance criterion, rather than solving a validated user need.
Discovery stops but requirements keep growing. The team completed a discovery phase early in the programme, but new requirements continue to surface from legacy behaviours that were never formally captured. Each one is treated as an exception rather than evidence of a structural gap.
Release planning compresses into a single milestone. Despite iterative development, the programme converges on a single go-live date with limited rollback options, often driven by contract, funding or political commitments rather than technical readiness.
Testing becomes regression-dominant. The majority of test effort shifts toward proving that the new system reproduces existing behaviour, rather than validating that it meets redefined needs.
Stakeholder confidence depends on sign-off, not evidence. Progress is measured by stage-gate approvals and documentation completeness rather than working software, user feedback or operational metrics.

None of these indicators are necessarily failures in their own right. However, when several appear together, they suggest the programme has structurally reverted to sequential delivery, regardless of the methodology it reports.

Modernisation Approaches

Projects that aim to replace legacy systems in a single release, often called “big bang” delivery, assume considerable risk. These programmes concentrate delivery dependencies, limit rollback options, and make data migration a single-point failure.

Incremental modernisation strategies can offer a more resilient alternative. Patterns such as parallel run, feature toggles, coexistence architectures and the strangler fig pattern allow systems to be evolved rather than replaced outright.

In one survey, 79% of developers said the strangler pattern reduced project risk, primarily because it isolates change and supports rollback. Incremental delivery also aligns better with governance and assurance frameworks - supporting progressive certification, staged user validation and controlled data migration.

In regulated environments, these approaches can reduce disruption and support operational continuity. They also provide decision-makers with clearer evidence of progress and outcomes at each stage.

Balancing Ambition with Legacy Constraints

Preparing for hidden waterfalls is not an argument for replicating legacy systems. It is a call to interrogate them more rigorously, and to distinguish between what must be retained and what can be rethought.

This involves:

Identifying which behaviours are regulatory, contractual or operationally essential
Separating business-critical rules from historical conveniences
Defining the minimum viable increment that preserves service capability while allowing change
Designing systems that support evolution rather than frozen replication

These assessments are less suited to be completed through workshops or documentation. Instead they require early and direct engagement with legacy systems, their data models, codebases, interface behaviours and operational roles.

The Role of AI in Surfacing Constraints

AI-assisted tooling offers practical support in navigating legacy complexity. When applied responsibly, these tools can help teams:

Analyse code to extract business rules and logic paths
Identify unused or redundant code segments
Map dependency chains and integration points
Generate automated tests to capture existing system behaviours

In environments where documentation is sparse and institutional memory has faded, these tools can reduce the time and effort needed to understand legacy systems. Oracle’s whitepaper notes that poor understanding of legacy code is a major cause of data migration failure, an area where AI-driven code analysis can make a measurable difference.

However, it is important to view AI as an enabler, not a decision-maker. Tools can help surface logic and dependency, but they can struggle to decide which behaviours remain relevant or valuable. That task requires domain knowledge, user insight and human judgement.

Preparing for Hidden Waterfalls

Effective preparation involves a combination of technical, governance and delivery decisions:

Map structural constraints early, particularly data, regulatory and legacy integrations
Treat legacy systems as evidence, not default specifications
Select modernisation approaches that allow co-existence and rollback
Align governance and assurance models to tolerate incremental delivery
Use AI tools to reduce manual analysis effort and highlight legacy dependencies

The objective is not necessarily to eliminate all waterfall elements but to make them visible and manageable. Programmes that fail to do this often discover late in delivery that they are operating under assumptions that no longer hold, or that were never articulated to begin with.

Why This Step Is Important

Hidden waterfalls are a predictable outcome of unaddressed structural constraints that agile methods alone are not enough to resolve.

Acknowledging this reality early allows teams to structure programmes that are responsive, transparent and recoverable. It enables more realistic delivery planning, supports operational continuity and improves trust between teams and stakeholders.

This step becomes particularly important when delivery timelines are fixed, data quality is uneven or regulatory scrutiny is high.

Author

Matt Cross is a Lead Business Analyst at Audacia. Matt has a background in leading requirements workshops, defining acceptance criteria for requirements and supporting stakeholders throughout the project lifecycle – on both consultancy and development projects across engineering, data, AI and cloud.

Serverless Architectures: Designing for Scale, Simplicity and Resilience

Audacia — Mon, 19 Jan 2026 08:30:00 +0000

Modern applications face a constant tension between competing architectural demands: systems must scale efficiently, remain highly available, perform well under load and be maintainable without excessive operational overhead.

This blog, adapted from a Tech Talk by Principal Software Engineer, Luke Mitchell, explores cases where serverless architectures can address these requirements by shifting infrastructure management to cloud providers, allowing development teams to focus on building features rather than managing servers.

Scaling Strategies: Horizontal vs Vertical

Understanding scaling approaches provides the foundation for appreciating serverless benefits. Vertical scaling involves adding resources to a single instance - more CPU, RAM or storage. A down-side of this approach is that it is limited by a single point of failure. Meaning when that machine goes down, the entire service becomes unavailable.

Horizontal scaling takes a different approach by adding more instances of the same machine. This design provides built-in fault tolerance because multiple machines handle requests simultaneously. Therefore, if one machine fails, others continue serving traffic. This redundancy makes horizontal scaling more resilient than vertical scaling, though horizontal scaling can include more complexity in orchestration and load distribution.

Case Study 1: From Monolithic Functions to Distributed Processing in Azure

The overhaul of this file processing system demonstrates how serverless patterns transform architecture. The initial implementation, as shown above, used a single Azure Function that continuously ran, grabbing files from an SFTP server, processing each entry sequentially and writing results to a Snowflake database. This design had several limitations: it scaled only vertically, created a single point of failure and left no clear recovery path when errors occurred mid-process.

To mitigate these limitations the system architecture was refactored, as shown above – after which responsibilities were split across multiple components. The initial function now simply reads the file and splits each entry into individual messages on a storage queue. As messages arrive, a second function automatically scales up to process multiple instances in parallel. This distribution transforms sequential processing into concurrent execution, dramatically improving throughput speed.

As well as speed, the switch to a serverless architecture also improves fault tolerance. If a function instance fails mid-processing, the message automatically returns to the queue for retry. Messages that consistently fail move to a poison queue for manual investigation, preventing problematic entries from blocking the entire pipeline. Unlike the original architecture, the need to track processing state within files or implement complex restart logic is eliminated because the queue handles these concerns automatically.

Case Study 2: Serving Static Content at Scale in AWS

Traditional web server architecture requires substantial infrastructure, as can be seen in the diagram above. Firstly, requests made by users are distributed by an application load balancer across EC2 instances deployed in multiple availability zones. Next, auto-scaling groups monitor traffic and adjust instance counts accordingly, adding capacity during peaks and removing it during lulls to control costs. Each virtual machine incurs a cost whether actively serving requests or sitting idle.

The serverless alternative simplifies this substantially. Static files reside in an S3 bucket, with CloudFront serving as the access point. CloudFront operates as a content distribution network with edge locations worldwide. When users request content, they receive it from the nearest edge location rather than travelling back to the origin region. This geographic distribution reduces latency significantly for global audiences.

This serverless approach has benefits for performance, maintainability and scalability:

S3 stores files across multiple availability zones by default and if one zone becomes unavailable, requests route to files in other zones without manual intervention.
CloudFront caches content at edge locations, reducing origin server load and improving response times.
The entire stack scales to handle traffic spikes without configuration changes or capacity planning.

Case Study 3: API Infrastructure Without Servers in AWS

API servers typically follow similar patterns to web servers: virtual machines behind load balancers, deployed across availability zones for resilience. This infrastructure requires ongoing maintenance - operating system patches, image updates, capacity planning and monitoring.

API Gateway provides a serverless alternative, acting as a unified entry point for API traffic. It integrates directly with numerous AWS services: Lambda functions for compute, DynamoDB for database access, SQS for message queuing and SNS for publish-subscribe patterns. This integration flexibility enables varied architectural patterns without managing underlying infrastructure. This makes initial start-up easier than it would be using a virtual machine.

The publish-subscribe model through SNS demonstrates particular power. A single message can fan out to multiple subscribers - perhaps a Lambda function sending notifications to Slack while simultaneously queuing work for asynchronous processing. This pattern enables event-driven architectures where services respond to events without tight coupling between components.

This approach is also accompanied by benefits:

Multi-availability zone deployment happens by default.
The platform automatically handles failover and scaling without explicit configuration.
Updates don't require creating new machine images or coordinating rolling deployments across instances.
The pay-per-use model means costs align directly with actual usage rather than provisioned capacity.

Trade-offs and Considerations

Serverless architectures introduce their own considerations. For example, cold starts, the latency when a function first initialises, can affect user experience through lagging. This can be mitigated at an additional cost through provisioned concurrency (keeping a specified amount of Lambda expressions always running) to keep functions warm. This trade-off matters most for latency-sensitive applications where milliseconds count.

Additionally, cost efficiency should be considered, which depends on scale. Serverless platforms charge per request, making them economical for variable workloads. At extremely high sustained volumes, dedicated infrastructure may become more cost-effective. However, this typically occurs only at the scale of major internet services.

Some use cases still favour traditional servers. Such as long-running processes, which don't map cleanly to function execution models. For example, server-side rendering requires a server to generate HTML dynamically, which S3 and CloudFront cannot provide. Static site generation or pre-rendering can address some of these scenarios, but pure static hosting has SEO limitations without additional tooling.

A learning curve exists for both serverless and non-serverless approaches. To use servers, understanding load balancers, auto-scaling groups and virtual machine maintenance requires expertise. On the other hand, serverless architectures require different knowledge - message queues, function composition and event-driven design. Teams should evaluate their existing skills and strategic direction when choosing approaches.

Conclusion

Serverless architectures deliver meaningful advantages in scalability, performance, fault tolerance and maintainability. By abstracting infrastructure management, they enable teams to focus on application logic rather than operational concerns. While not universal solutions, they provide compelling benefits for most modern applications, particularly those with variable traffic patterns or limited operations resources. The examples demonstrate that serverless patterns often simplify rather than complicate architecture, delivering better results with less overhead.

Watch the Tech Talk

Putting the CD Back into CI/CD: A Guide to Continuous Deployment

Audacia — Mon, 12 Jan 2026 08:30:00 +0000

Putting the CD Back into CI/CD: A Guide to Continuous Deployment
Many organisations talk about CI/CD, but the reality is that most have achieved continuous integration (CI) without continuous deployment (CD). Embracing both CI and CD represents a fundamental shift in how software reaches production and how teams approach risk, quality and delivery.

This blog, adapted from a Tech Talk by Principal Software Engineers Luke Mitchell and Akeel Ahmed, explores two distinct pathways to achieving true continuous deployment: trunk-based development with ephemeral environments, and Git Flow with structured release management. Both approaches can deliver frequent, reliable releases, but they require different technical infrastructure, and cultural and organisational readiness.

The Current State of Continuous Deployment

Continuous integration has become standard practice through automated testing, code reviews and build pipelines. However, the journey from merged code to production often remains batched and infrequent.

The reasons are varied: legacy approval processes inherited from waterfall methodologies, lack of confidence in automated testing, concerns about deployment risk or simply the complexity of managing multiple environments. Yet the benefits of genuine continuous deployment – faster feedback loops, reduced integration risk and the ability to respond rapidly to business needs – make it worth pursuing.

Trunk-Based Development

At the core of trunk-based development is a single principle: the main branch should always be in a deployable state.

The Fundamentals

Unlike Git Flow's multiple long-lived branches, trunk-based development maintains a single main branch. Developers work on short-lived feature branches, typically lasting hours or days rather than weeks, before merging back to main. Each merge triggers an automated pipeline that can deploy directly to production.

This approach demands discipline. Small, focused commits become essential. Code reviews must happen synchronously – within 10 to 15 minutes of raising a pull request. The entire team must prioritise getting code through the pipeline over starting new work.

Ephemeral Environments

One of the most powerful enablers of trunk-based development is the use of ephemeral environments. Rather than maintaining static QA and staging environments where multiple developers' changes intermingle, each feature branch spawns its own temporary environment.

When a developer pushes their branch, the pipeline automatically provisions cloud infrastructure and deploys their changes to an isolated environment. This provides several advantages:

Isolation of changes: Bugs discovered during testing are definitively linked to the development stage.

Parallel development: Developers and testers can work simultaneously without interference, removing bottlenecks from the development and QA processes.

Cost efficiency: Environments are taken down automatically after the code merges to main, ensuring resources are only consumed when needed.

Production parity: Each ephemeral environment can mirror production configuration, reducing environment-specific issues.

The workflow is streamlined: Development happens on the feature branch, code review occurs when the developer opens a pull request, QA testing happens in the ephemeral environment once the PR is approved, and upon successful testing, the code merges to main and the ephemeral environment is deleted.

Release Strategy for Trunk-based Development

Merging to main doesn't necessarily mean immediate production deployment, though it could. Many teams create release candidate branches automatically upon merge to main. These branches can then be deployed to UAT or production based on business requirements.

Teams take different approaches to deployment timing. Some deploy every merge to production immediately – true continuous deployment. Others batch a few tickets together, deploying the most recent release candidate branch that contains all the desired changes. The key principle remains constant: all code in main is production-ready, and the organisation decides when to deploy based on business needs, not technical readiness.

To track what's currently live, many teams maintain a production branch, merging their release candidate branches into it after deployment. This provides a valuable snapshot of the live environment, simplifying rollbacks and hotfixes by maintaining a known good state to return to. Teams requiring additional safeguards sometimes create rollback candidate branches automatically before each production deployment, though this adds complexity that not all teams need.

Feature flags provide an additional layer of deployment control that works with both trunk-based development and Git Flow. They're particularly valuable in trunk-based development, where code deploys to production frequently, by controlling feature visibility independently of code deployment.

The Benefits

Risk mitigation: pinpointing bugs becomes easier in smaller, recent releases.
Early client/user feedback: a clients’ vision can change or become clearer when presented with something concrete – it’s best to know as early as possible.
Reactive to change: small releases reduce the amount of time and difficulty it takes to get feedback and implement changes.

The Cultural Requirements

Trunk-based development requires significant cultural change. It demands trust that developers will maintain quality, that automated tests are comprehensive and that the team will respond quickly to production issues.

It also requires scaling back bureaucratic approval processes. Change Advisory Boards can be antithetical to continuous deployment if every change is scrutinised. The governance must shift from manual approval gates to automated quality gates and rapid response capabilities.

Full team ownership becomes paramount. From junior developers to tech leads, everyone shares responsibility for production stability. This shared accountability, combined with the practice of deploying small changes frequently, reduces risk compared to large, infrequent releases.

Git Flow

Not every organisation can adopt ephemeral environments immediately. Infrastructure constraints, compliance requirements or existing tooling may necessitate static environments. Git Flow provides a structured approach to continuous deployment within these constraints.

The Git Flow Model

Git Flow employs multiple long-lived branches with specific purposes:

Main reflects production and is updated only with tested, stable releases.
Release branches are cut from develop to deploy to production.
Develop serves as the integration branch for ongoing development.
Feature branches are created from develop for new functionality.
Bugfix branches are short-lived branches created from develop or release to fix defects, and are merged back into their source branch once resolved.
Hotfix branches are created from main for urgent production fixes, and merged back into main and develop.

This structure provides clear separation between development, testing and production code. Whilst feature branches can be short-lived with good continuous integration practices, the methodology naturally supports more structured release cycles.

Release Planning and Management

Success with Git Flow depends heavily on release planning and management. Rather than ad-hoc deployments, teams batch related user stories into planned releases. This upfront planning – tagging stories with release identifiers early in the sprint – provides predictability for stakeholders whilst still enabling frequent releases.

The workflow operates in distinct phases:

Development phase: Developers merge feature branches to develop, which automatically deploys to a shared QA environment for testing.

Release preparation: When all features for a release are complete and QA-tested, a release branch is created from develop.

UAT phase: The release branch is deployed to UAT for stakeholder testing. Crucially, no new features are added during this phase – only bug fixes and refinements.

Production deployment: After successful UAT, the release branch deploys to production and merges back to main, providing a live reflection of production in the codebase.

Managing Hotfixes

Git Flow excels at handling production issues whilst development continues. Hotfix branches are created from main, tested independently, and deployed to production without disrupting the develop branch or ongoing releases.

A practical versioning approach helps manage this: if release 1.0 is in production and a bug is discovered, create hotfix branch 1.1, deploy it to production, then merge it back to both main and develop to keep everything aligned.

The Advantages of Structure

Git Flow's structure provides several benefits for teams and stakeholders:

Stability: The main branch always reflects production, reducing confusion about what code is live.

Visibility: Clear branching structure makes it easy to understand what features are in which release.

Control: Product owners and project managers have explicit control over what gets released and when.

Transparency: Every merge, tag and deployment is logged, providing an audit trail for accountability.

This structure particularly benefits larger teams where multiple developers work on the same codebase simultaneously. The isolation between branches provides clearer separation of concerns and reduces the risk of unstable code reaching production.

Choosing Your Approach

In a simple analogy, trunk-based development can be imagined as multiple passengers in different taxis heading to the same destination, whereas Git Flow involves a group of passengers on a bus – going through each checkpoint together. The decision between the two strategies depends on infrastructure capabilities and business requirements.

Trunk-Based Development Fits When:

You have cloud infrastructure that supports ephemeral environments
Your team is comfortable with high deployment frequency
Automated testing provides high confidence
There's organisational trust in the development team
Small, incremental releases align with business needs
You want to minimise the feedback loop between development and production

Git Flow Fits When:

You have static environments that can't easily be replicated
Releases need stakeholder approval or coordination
Compliance requires structured release documentation
Larger teams benefit from clear branch isolation
Business prefers predictable, planned release schedules
You're transitioning from traditional release processes

Neither approach is inherently superior. Both can achieve continuous deployment if implemented well. The key is matching the approach to your context and executing it with discipline.

Making It Work: Practices

Regardless of which strategy you choose, a few practices are essential for successful continuous deployment:

Automated Testing as a Foundation

Quality gates must be automated and comprehensive. Unit tests, integration tests and UI tests should run automatically on every commit. These tests become your confidence in deployment – they must be reliable and fast.

Synchronous Code Reviews

Code reviews can't be allowed to become bottlenecks. Establishing the expectation that pull requests receive attention within 15 minutes keeps code flowing.

Communication and Collaboration

Continuous deployment requires continuous communication. Development teams and testers must collaborate closely, using tools like Slack, Teams or Azure DevOps to stay coordinated. Early feedback loops with product owners and clients help ensure that frequent releases deliver what stakeholders want.

Monitoring and Observability

When deploying frequently, you must know immediately if something goes wrong. Comprehensive monitoring, alerting and logging become essential. The ability to quickly diagnose and resolve production issues provides the confidence to deploy often.

Next Steps

Whether through trunk-based development's simplicity or Git Flow's structured approach – start small. Each increase in deployment frequency teaches lessons about improvements that can be made to testing, automation, monitoring or process.

Moving from continuous integration to genuine continuous deployment represents a significant evolution in development maturity. It requires technical investment in automation and infrastructure, cultural change in how teams approach quality and risk, and organisational trust in development practices.

Watch the Tech Talk

The Building Blocks of AI Governance: Policies, Principles & People

Audacia — Mon, 15 Dec 2025 09:27:13 +0000

This blog post has been adapted from this Tech Talk by Chris Bentley, Lead Data Scientist at Audacia. Find the full video at the end of the article.

Navigating AI’s Expanding Landscape

Increasingly, AI is being woven into the fabric of modern engineering. Whether it's enterprise models like ChatGPT, off-the-shelf cloud tools or bespoke machine learning pipelines.

However, with every new capability comes new risk. As AI capabilities grow, so does the chance of unintended consequences: discrimination, security vulnerabilities or even loss of control over powerful systems. The solution is to ensure governance is considered at every part of the pipeline; it should be a robust, evolving framework grounded in clear principles, backed by thoughtful policies and shaped by the right people - enabling them to take ownership and drive responsible outcomes.

This article sets out a practical foundation for technology leaders looking to implement or update AI governance.

What We Mean by AI (And Why It Matters for Governance)

Before discussing governance, it helps to define what we mean by "AI."

Rule-based systems: The earliest AI was entirely programmatic - explicit rules codified by humans to mimic decision-making in well-understood domains.
Machine learning: A huge leap forward. Algorithms learn patterns from data to make predictions or decisions without explicitly coded rules for every scenario.
Deep learning: A subset of machine learning that uses multi-layered neural networks to capture complex patterns in vast datasets.
Generative AI: At the innermost core sits generative AI: deep learning models trained on massive datasets to produce new content - text, code, images or audio.

Most of the governance debate today is triggered by generative AI. However, in practice, governance concerns apply across AI in all its forms - not just generative AI. Whether you're building a tailored fraud detection model or experimenting with ChatGPT prompts, the same foundational risks around ethics, security and control still apply.

A Working Definition of AI Governance

At its heart, AI governance is a framework of guidelines, processes and practices to ensure AI systems are:

Ethical: respecting human values and avoiding harm
Safe: robust, reliable, aligned with your organisation's attitude to risk
Transparent: open to inspection, traceable and explainable

And crucially, governance should span the full lifecycle - from initial scoping and development to deployment, monitoring and daily use.

In a simple analogy, AI governance is like the rules of the road. Your business context sets the landscape, your developers are the drivers, and AI is the vehicle. Governance provides the signposts, traffic lights and certifications to ensure you reach your destination (the use case) safely - without crashing the system or harming bystanders along the way.

Why Governance Matters More Than Ever

Ignoring AI governance comes with very real consequences:

Zillow: Their machine learning system for home buying was trained on outdated market data. Without ongoing governance to detect drift or continuously fine tune the model with new data, the model consistently overbid, racking up losses of over $500 million and forcing layoffs and program shutdowns. (InsideAI News, 2021)
Samsung: Engineers pasted proprietary code into ChatGPT to debug problems, unaware of the implications. The result was uncontrolled exposure of intellectual property, forcing an emergency ban on AI use. (Forbes, 2023)

Additionally, a UK poll revealed one in five companies experienced data leaks due to ungoverned GenAI use. Meanwhile, 92% of Fortune 500 firms already use ChatGPT - sometimes through informal "shadow AI", where employees independently adopt tools without IT or legal signoff. ( Reuters, 2024)

Governance isn't there to slow teams down. It's your best route to:

Build trust and adoption, internally and with customers
Mitigate operational, legal and financial risk
Ensure your AI systems are auditable, reproducible, and scalable
Shorten time to production through clear, standardised practices
Attract top technical talent who care about ethical, forward-thinking engineering.

Core Principles: The Ethical Backbone

The starting point for any governance framework is a set of core principles. These are underlying high-level ethical and operational guidelines - your non-negotiables for how AI gets built and used.

Here are six example principles split into two core areas:

Oversight & Integrity

Accountability: Define clear roles and ownership for AI usage. Know who's responsible for model outcomes and empower leaders to take corrective action.
Ethics: Align AI with moral and societal values. Actively guard against bias or discriminatory outcomes.
Transparency: Make your AI systems understandable. This allows effective auditing and builds technical literacy across teams.

User Rights & Protection

Security: Guard against unauthorised access and misuse. Protect systems from compromise.
Privacy: Safeguard personal and sensitive data. Stay compliant with GDPR and evolving global standards.
Control: Give users and your organisation the means to override or restrict AI outputs to stay aligned with human judgement.

These principles are deliberately broad, they form the basis of many governance policies, which are then narrowed down with specific organisational context.

How Governance Changes Shape Up the Stack

Governance doesn't look the same at every level. Consider a typical AI project lifecycle with your users/project managers at the centre, and your engineers/data scientists embedded in the process:

We can approach this lifecycle at three different levels:

Built systems: Built from scratch by data science teams. Here governance focuses on model development standards, data selection (to mitigate bias or toxicity) and hands-on monitoring.
Cloud services: Plug-and-play frameworks where your team provides data and tweaks. You're responsible for due diligence on service choice, feeding clean data and verifying outputs comply with standards.
AI products: Tools like Copilot or ChatGPT. Governance shifts to vetting vendors, understanding their transparency commitments and educating employees on approved use.

Governance doesn't diminish as we move up the stack, it simply changes shape.

Turning Principles into Policy

So how do you move from abstract principles to something actionable?

A practical first step is an overarching AI governance policy document, tailored to your existing organisation. This might include:

A formal statement of your principles
Tables of roles & responsibilities
Clear implementation guides with examples
Checklists for assessing risks and impacts before adoption
Standards for monitoring, auditing and escalation paths

Good policy documents are clear and accessible, as well as easy to update and dynamic. Additionally, avoid over-complication that restricts workflows and innovation - this massively reduces the likelihood of widespread adoption.

Mitigating the Biggest Risks

Build policies that actively spot and reduce risks. Examples include:

Shadow AI exposure: Keep a registry of approved (and banned) tools. Provide sanctioned alternatives - e.g. enterprise-grade ChatGPT - to steer teams away from unsafe workarounds.
Model drift & stale data: As Zillow discovered, failing to monitor changing data can be ruinous. Bake regular model reviews into your policy.
Sensitive inputs: Guardrails (both technical and policy-based) to stop developers pasting IP into consumer SaaS tools.
Privacy leaks: Ensure privacy reviews are a standard step in your deployment process.

Consider introducing lightweight artifacts like audit checklists or readiness questionnaires. You may want to build these into your governance policy document or introduce them as separate tools to keep them more dynamic. These integrate governance without adding heavy process that stifles engineering momentum.

People: The True Drivers of Governance

No policy lives in a vacuum. Successful AI governance comes down to people.

Dedicated roles: Many organisations appoint an AI governance lead - often a data scientist or architect with a passion for responsible AI. They bridge exec strategy and daily developer practice.
Defined responsibilities: Make sure everyone knows how governance relates to their role, and who to go to with questions.
Periodic training: Keep teams current on new models, new regulations, and what policies mean for their work.
Open culture: Foster spaces to raise concerns, suggest improvements, and discuss AI ethics. This not only improves adoption - it makes your governance framework stronger and more relevant.

Staying Dynamic in a Fast-Moving World

AI isn't standing still. It took ChatGPT two months to hit 100 million users - the fastest ever for a consumer application. (The Guardian, 2023) Meanwhile, model sizes are growing by orders of magnitude, costs to train are plummeting and multi-agent systems are pushing us closer to artificial general intelligence.

In practice, this means:

Review your policies often: use strict version control, iterate quickly.
Monitor your deployed systems: stay alert for unanticipated changes, especially if using enterprise models updated outside your control.
Keep learning: from regulatory frameworks (like the UK's pro-innovation principles or ISO 42001) to evolving global standards, staying informed is essential.

Where to Start (Wherever You Are)

Not every organisation is at the same stage. If you're only just exploring AI:

Audit your workflows. Where could AI realistically help? Is shadow AI already creeping in?
Use this to shape your first minimal governance guardrails.

If you're sporadically using AI via contractors or pilot projects:

Identify who might lead your governance efforts. Do you have data specialists who can step up? What lessons from past projects can be formalised into your first policies?

If you're mature in AI use but light on governance:

Start documenting your implicit standards. Turn them into explicit, auditable principles and policies. Avoid slowing innovation but build the right checks so your systems stay robust and ethical.

In Closing

AI governance boils down to three pillars: policies, principles and people. Get them right and you're unlocking AI's potential in a way that's safe and aligned with your values.

Watch the full Tech Talk:

Building a Tech Radar: A Practical Guide for Technology Leaders

Audacia — Mon, 01 Dec 2025 11:21:39 +0000

This blog post has been adapted from this Tech Talk by Richard Brown, Technical Director at Audacia.Find the full video at the end of the article.

Technology leaders regularly make decisions that shape the technical direction of organisations. Each choice regarding frameworks, languages or tools influences how systems are built and maintained. The challenge is keeping those decisions aligned across organisations.

One of the most effective ways to achieve that alignment is by using a Tech Radar.

What is a Tech Radar?

At its simplest, a Tech Radar is a visual representation of technology choices within your organisation. It shows where each tool, framework or platform sits in terms of maturity and adoption.

A Tech Radar makes it easy to answer key questions: what should we be adopting? What's being trialled? What’s emerging? What’s being phased out?

The concept originated at Thoughtworks, but every organisation can adapt it to fit their own priorities.

Why Tech Radars are Important

Without an explicit method for tracking technology choices, teams can drift, and tools can be selected based on convenience or habit rather than strategy. Over time, this results in inconsistency and risk.

A Tech Radar forces deliberate conversations about technology. It provides a shared view of what’s recommended, what’s under evaluation, and what’s no longer safe to use.

For technology leaders, the benefits are clear:

Knowledge sharing In large organisations or consultancies, there is a danger that teams can become siloed. A radar combats this by makes choices visible. It becomes a central reference point for engineers starting new projects or exploring unfamiliar tools. *2. Risk management * Technologies age, licensing models change, maintainers move on. When a framework is deprecated or a tool becomes risky, the radar is the single source of truth that makes this clear. *3. Future direction * By looking at what’s being trialled or assessed, you can see where the organisation is heading. It informs hiring, training, and investment decisions.

Anatomy of a Tech Radar

A radar has two main elements:

Segments (or quadrants): Categories of technology. These can be as traditional as ‘Languages & Frameworks’ or as precise as ‘Customer Experience Platforms’.
Rings: Levels of maturity. Thoughtworks use:

Hold: Do not use (either immature or on the way out).
Assess: Watch closely.
Trial: Test in a controlled way.
Adopt: Recommended for general use.

At Audacia, we adapted this structure to reflect the way our projects are structured and the services we deliver.

For example, our Cloud & DevOps quadrant spans everything from CI/CD pipelines to infrastructure-as-code tools. Data & AI merges data engineering with machine learning, because these disciplines are often inseparable in practice. Additionally, we renamed ‘Hold’ to ‘Avoid’. If a tool has been deprecated or poses security risks, its position in the ‘Avoid’ ring makes this explicit. New teams no longer waste time rediscovering the same issues.

This tailoring is important. Choose quadrants and rings that reflect how you work, rather than adopting someone else’s model. For some organisations, that might mean categories around business domains rather than tools. Copying another company’s structure rarely gives good results.

How To Manage a Tech Radar

A Tech Radar is not a one-off exercise. It must be living and breathing.

At Audacia, we’ve built a lightweight internal web application to host our Tech Radar. It’s updated continuously, with a clear owner for each quadrant. Every blip includes context: where it’s used, why it’s recommended or avoided and who to talk to. Because the radar is visible, it keeps everyone informed and turns technical direction from implicit knowledge into a documented, shared resource.

We also make it easy for engineers to suggest updates through a feedback mechanism. This lowers the barrier to contribution, ensuring the radar reflects the collective expertise of the organisation. People on the ground often see trends earlier than leadership.

Ensure that your Tech Radar is reviewed regularly – a stale radar is worse than no radar. Additionally, make changes visible to everyone. When a significant shift occurs, such as a major library deprecation, communicate it internally beyond the radar itself.

Common Questions

How long should you keep a deprecated technology on the radar?

It depends on how widely it was used. If it was central to past projects, leave it visible in the ‘Avoid’ ring for longer so that future teams know to avoid it.

Should cost factor into a decision?

A Tech Radar provides visibility on what is and is not recommended, which could be for several reasons: technical, cost, licensing or others. Therefore, if licensing costs make a tool unviable, that should be reflected in its position.

How can we build one?

You can start with a simple shared document. Over time, invest in a web-based version that supports filtering, links and feedback. Several open-source templates exist, but building your own allows the flexibility to match your structure.

A Tech Radar is a conversation starter, a knowledge-sharing tool and a governance mechanism. The radar is also a way of making risk visible, such as licensing changes, security vulnerabilities or shifts in community support. It keeps technology choices deliberate and visible, ensuring that decisions made today continue to serve the organisation.

For technology leaders, it is one of the simplest, most effective ways to set direction, manage risk and explain choices.

If you don’t already have one, now is the time to start.

Watch the full Tech Talk:

Brick by Brick: How to Define the Right System Requirements

Audacia — Mon, 17 Nov 2025 08:46:09 +0000

This blog post has been adapted from this Tech Talk by Matt Cross, Lead Business Analyst at Audacia. Find the full video at the end of the article. 

Successful software projects are rarely the result of chance. They emerge from a disciplined approach to understanding the problem, structuring requirements and aligning teams. These same principles – clarity of vision, scope control and collaboration – are as essential to replacing a medieval Lego castle as they are to delivering a complex software system.

This article explores three core challenges in defining effective system requirements:

Stacking Wisely – Managing Scope and Priorities
The Picture on the Box – Visualising Requirements
All Hands on Bricks – Engaging Stakeholders

1. Stacking Wisely: Managing Scope and Priorities

Every project begins with a list of ideas, but the reality of time, cost and risk forces a tough question: what belongs in the first release? Delivering everything at once may sound appealing, but it often leads to instability, wasted effort and missed opportunities for learning from real users.

Start with the problem

Before anything else, ensure decision-makers agree on what problem is being solved. A simple framework of three questions is effective:

Who is experiencing the problem?
What is the problem?
Why does it matter?

In a castle-building analogy, merchants face a market constrained by poor foundations – limiting growth and trade. In software terms, a system too unstable for additional features prevents teams from adapting to market demand. A clearly articulated problem provides a lens through which features can be assessed.

Avoid the trap of over-simplification with clear principles

A narrow focus on the immediate problem can produce a quick win but risks long-term frustration, and features that do not consider scalability, flexibility or future needs will soon require rebuilding. For example, a market expansion may succeed, but if entry gates remain too small or foundational choices too rigid, growth will be limited.

Defining a short set of principles alongside your problem statement helps safeguard against short-sighted decisions, by providing context to what’s important when evaluating features. These principles should influence how requirements are prioritised and where trade-offs are acceptable. They may include long-term architectural goals, user experience priorities or operational resilience. For instance, if future phases will introduce significant load or new user groups, early design choices – like implementing role-based access or building with modularity in mind – can ensure the system is prepared to accommodate tomorrow’s needs, even during MVP scoping. You can read our eight over-arching Engineering Principles here.

Simplify prioritisation

Initial prioritisation benefits from a binary ‘in or our’ filter to quickly define what makes the MVP. Once that shortlist is clear, apply a MoSCoW analysis (Must, Should, Could, Won’t) to balance value and effort. This two-step approach reduces ambiguity, prevents scope creep and clarifies which items belong to future phases.

2. The Picture on the Box: Visualising Requirements

Software is intangible, and unlike a Lego set, there is no picture on the box to show you what it should look like. Without clear visualisation, requirements can be misinterpreted, dependencies overlooked and effort wasted. The ‘picture on the box’ challenge is to make these intangible elements visible, so we can determine which bricks are needed and where they should be stacked. Techniques such as user story mapping and wireframes can help to achieve this.

User Story Mapping

User story mapping is a practical and collaborative way to present a backlog as a visual, flowing representation of a product. Their purpose is to present the epics, features and user stories in the backlog as a chart – almost like a process diagram – showing both the big picture and the details of what is being built.

Think of it as a layered map:

The top row outlines high-level epics that represent the largest goals
Under each epic, features group related functionality into logical processes
Finally, user stories are the individual steps required to complete a task

This tree-like structure should be laid out to trace a start to finish path through the application from the user’s perspective. This exercise makes it easier to separate essential functionality from outdated processes that can be discarded.

The process is inherently collaborative. Working through the journey step-by-step exposes gaps, validates assumptions and challenges unnecessary complexity. Story maps can also support later tasks, such as prioritisation, dependency mapping and planning iterative releases.

Wireframes

When interface and user experience are central to a feature, simple wireframes add clarity. Even rough sketches highlight layout, workflows and user paths, making discussions concrete. The introduction of Generative AI tools and basic creative software such as Figma mean wireframes have become more time-effective and accessible – rather than resorting to underwhelming paint drawings, or expensive UX design.

These lightweight diagrams can help stakeholders quickly critique, refine and align on functionality before development begins. For example, a wireframe of a market interface might expose usability issues, such as layouts unsuited to mobile devices, or prompt the addition of features like search and filters. Teams can incorporate these visual guides into user stories, enhancing shared understanding among developers, testers and stakeholders.

3. All Hands on Bricks: Engaging Stakeholders

Building a system requires the efforts of many hands. Different groups bring different experiences, priorities and assumptions. Without careful management, these differences can derail a project.

Recognising Stakeholder Motivations

Not all stakeholders see change in the same light:

Some value the legacy system and fear losing familiar tools.
Others are eager for innovation and push for ambitious features.
Some may be sceptical, based on past experiences.
Others may simply lack understanding of what is possible.

Recognising these underlying motivations helps tailor communication and build trust.

Fostering Collaboration

Early workshops set the tone. Use structured icebreakers and open-ended questions to encourage contributions. For example, ask participants to share what excites them about the project and what they see as possible risks. Ensure quieter voices are heard by directly inviting input, so no critical knowledge is overlooked.

Stakeholder engagement is more than extracting requirements – it is about building ownership. People who contribute to the design process are more likely to support and champion the final product.

Clear and Consistent Communication

Projects benefit from a central knowledge base that includes:

Problem statements and project principles
Glossaries of terms and roles
Guidance on processes like testing and agile practices
Tools and dashboards for visibility on progress

Provide introductory sessions, especially when stakeholders are unfamiliar with agile delivery. Record these sessions and keep materials accessible for new team members joining mid-project.

Conclusion

Defining the right system requirements is a structured, collaborative process. This process involves stacking wisely, creating the ‘visual on the box’ through techniques such as wireframes and story maps and engaging stakeholders through thoughtful communication, consistent knowledge sharing and structured workshops.

When approached this way, requirements gathering can transform from standard documentation to a system foundation, built brick by brick, that is resilient, user-focused and scalable.

Watch the full Tech Talk:

Delivering Greenfield Projects: Getting the Foundations Right

Audacia — Mon, 03 Nov 2025 09:02:10 +0000

How to get the first line of code - and everything that follows - right.

Without legacy constraints, greenfield projects allow teams to bake in modern practices, cloud-native architectures and a developer-first culture from day one. Done well, those early choices compound into faster releases, easier scaling and happier teams for years. Done poorly, they can create tomorrow’s technical-debt problems. This article discusses how development teams can lay solid foundations when starting from scratch.

Summary:

Automate early. A CI/CD pipeline, test automation and Infrastructure as Code (IaC) from sprint 0 lock in speed and quality for the life of the product.
Design for cloud and APIs. Cloud-first, API-first and security-by-design principles align with the UK Government’s Technology Code of Practice and give future teams flexibility at scale.
Balance YAGNI with extensibility. Build “just enough” architecture - simple, loosely coupled services that are easy to extend later, not an over-engineered fortress.
Invest in observability, documentation and culture. Practices like structured logging, Architecture Decision Records (ADRs), documentation and peer code review are far cheaper to embed before the codebase grows.

Blank Slates - Opportunities & Challenges:

Greenfield projects are often described by engineers as a “nirvana” – no legacy code, no entrenched constraints, the freedom to choose the best technologies.

But they also carry risk: without legacy constraints, teams might under-invest in necessary structure or, conversely, over-engineer because everything is possible.

For IT leaders, a greenfield initiative (be it a new digital product, a spin-off system, or a major rewrite separated from legacy) is an opportunity to set an example for how software should be done. It’s the chance to incorporate lessons learned from past projects and evolving industry practices.

One key guiding principle is “first build the right thing, then build it right” – meaning even with a perfect technical foundation, you must ensure the product meets user needs. Greenfield teams should still follow agile, user-centric development to validate they’re building a valuable product.

But our focus here is on building it right - the engineering practices and infrastructure that form the foundation.

Core Foundations:

CI/CD Pipelines:

Automated pipelines build the code, run tests, and (if tests pass) deploy to a test environment, as well as deploying further to production for continuous delivery. By doing this from the start, every code commit goes through a consistent, repeatable process, and developers become accustomed to fast feedback. It also enforces good habits: if a certain practice (like running unit tests) is mandated by the pipeline, it will become part of the team’s routine.

The pipeline can essentially be the embodiment of your process – add static analysis, security scans, etc., early on, so the team gets immediate feedback and quality is baked in.

Many UK startups attribute their rapid scale-up to having CI/CD from the get-go. For example, when Monzo began, they invested heavily in automation and tooling, which allowed them to deploy small changes frequently, catching issues early and scaling their operations without a hitch as user numbers grew.

Architecture and Design Principles:

Choose an architecture that will support future needs without over-complicating, for instance, favouring a microservice or modular monolith architecture.

Starting with microservices can allow for independent teams to work in parallel and deploy independently, fueling rapid feature development. The key is loose coupling – design components or services with clear responsibilities and interfaces, so that the system can evolve.

Gartner suggests that reducing coupling and complexity at design time increases future changeability. Apply known design principles (SOLID, high cohesion, etc.) to avoid complexity as features grow.

Cloud-Native & Infrastructure Automation:

The UK government Technology Code of Practice states technology projects should “use cloud first”. Essentially all greenfield projects should be cloud-first. Cloud provides on-demand resources, managed services and scalability that a new project can leverage instead of reinventing.

Use Infrastructure as Code (IaC) – tools like Terraform or AWS CloudFormation – to script your environments. This ensures you can replicate environments, do disaster recovery easily, and treat infrastructure setup as part of your codebase.

Additionally, consider using platform services to accelerate development (databases, messaging, authentication services). A greenfield project can save time by not building commodity components themselves. For example, why build your own identity service if you can use Azure Entra ID or Auth0? This frees your team to focus on unique business logic.

DevSecOps:

Security and compliance must be foundational, not an afterthought. Incorporate security design (threat modeling, secure defaults) and compliance requirements early. For instance, if building a healthcare app in the UK, ensure from day one that the data model and hosting comply with NHS Digital standards for patient data.

Implement security controls (encryption, secure secret storage, logging) as part of the initial build. Use automated static code analysis and dependency vulnerability scanning in your CI pipeline. It’s easier to build a secure product from scratch than to retrofit one later.

The Technology Code of Practice emphasises “Make things secure” and “Make privacy integral” as key points. Teams can focus on setting up monitoring and alerting with security in mind. For instance, define what constitutes suspicious behaviour in your system and plan how you’d detect it.

Open and Accessible Development:

Adopting an open-first approach can have many benefits. Using open-source components (with due diligence) accelerates development. Hosting your code in repositories where collaboration is easy (GitHub, GitLab) and possibly open-sourcing parts of it can attract community contributions, as well as potentially make new team additions easier due to visibility.

It’s an important opportunity to also consider accessibility from the start (especially important for public-facing services) – follow WCAG guidelines from the outset so you aren’t scrambling to fix accessibility later (for example, use semantic HTML, proper ARIA tags, etc., in web apps).

Building accessible and inclusive technology is not only a legal requirement for some (e.g. public sector must meet certain accessibility standards), but also expands your potential user base.

Team and Process Foundations:

Greenfield doesn’t mean “no process”; rather, it means the chance to implement lightweight agile processes that fit the team and purpose.

Define how the team will collaborate, for example a Scrum or Kanban approach. Set up a backlog with user stories, define a Definition of Done (including testing, documentation, etc.), and use an agile project tool (Jira, Trello, Azure Boards) from the start to track work, ensuring transparency.

Encourage practices like pair programming or peer code reviews from early on – these habits catch defects early and spread knowledge. Also, instill an engineering culture that aligns with your values. For example, if innovation is key, ensure people have time for spikes/proof-of-concepts; if reliability is crucial, emphasise TDD (Test-Driven Development) from the outset.

A strong positive culture set in a small initial team can scale with the product. For example, Monzo’s engineering principles, such as “make changes small and often” and “leave things better than you found them”, were set early and helped maintain quality and speed even as the engineering team scaled 60% in a year.

Building for Scale (but not over-building):

One risk in greenfield projects is over-engineering by trying to anticipate every future need. It’s important to strike a balance. Design an architecture that can scale out if needed, but don’t implement features or complexity you don’t need yet.

A useful concept is YAGNI (You Ain’t Gonna Need It) from agile: defer work on future hypothetical requirements until they are more certain. For example, you might foresee that the system could need a more complex sharding mechanism when it has millions of users, but if you’re at prototype stage with 100 users, don’t implement sharding now; just design the data access in a way that adding sharding later is possible (e.g. via an abstraction layer).

Another useful concept is building for extensibility, not general scalability, because trying to accommodate every potential future use makes systems complex and harder to maintain. Instead, build something that works for the known requirements, but with clean separation of concerns and with the ability to extend.

For instance, if building a payment processing service, you might design it to handle credit cards initially but make sure the way you implement doesn’t hardcode specifics that would prevent adding PayPal later – perhaps use a strategy pattern for payment methods. You wouldn’t, however, implement PayPal support from day one if it’s not needed – you’d just ensure you won’t have to rewrite everything when adding it.

Setting Up Environment & Tools:

On a practical level, ensure developers on a greenfield project have a frictionless environment.

This might include using containerisation (Docker) so that setting up the development environment is quick and matches production as much as possible. Many teams create a one-click onboarding script so a new dev can get the whole system running locally or in a personal dev environment in the cloud within minutes. Quick onboarding is a sign of good foundations.

As well as this, teams can implement source control best practices (feature branching or trunk-based dev, code reviews on pull requests, etc.) from day one. These practices are easier to establish when products are new, rather than fix later.

Monitoring & Observability from the Start:

An often overlooked foundation is observability.

Instrument the new application with logging, metrics and tracing early on. If you only add monitoring after going live, you might find you lack crucial insight into the system’s behaviour.

Instead, incorporate libraries for structured logging and alerting, choose a metrics collection system (such as cloud-native ones like AWS CloudWatch or Azure Monitor if on those clouds), and possibly integrate a tracing system (like OpenTelemetry) if you have a microservices architecture.

This “observability by design” means when you flip the switch to production, you can see what’s happening and catch performance issues or errors proactively.

Take Aways:

Codify Best Practices: Document and implement engineering best practices for the project early. For example, establish a rule that all new code must have unit tests and must be peer-reviewed. Put this in a Wiki for the project, helping to ensure consistency and making information easily accessible.

-** Observe Performance:** Even if you don’t need to handle millions of users on day one, set up the ability to do performance testing. Create baseline performance tests (throughput, response time) for key operations and include them in your pipeline (even if they only run nightly). This way, you catch any egregious performance issues early. It also provides a baseline to compare as features are added.

Keep Documentation Up-to-date: Start maintaining minimal but useful documentation. For example, a README for how to run the system, an Architecture Decision Record (ADR) log where you record key decisions (e.g. “Decision 1: We chose PostgreSQL over MySQL because...”). These ADRs help future team members understand why things were done. Also, document interfaces and APIs as you create them – possibly by adopting an API-first approach (write API spec first, then code). This avoids the scenario where 2 years in, nobody remembers why X was done or how Y works. Tools like Swagger and OpenAPI can be useful here in providing up to date API documentation automatically.
Prototype to Validate Architecture: If you’re trying a novel architecture or new technology in the greenfield, do a quick prototype of the riskiest part. For instance, if you plan event-driven microservices, prototype a couple of services and the messaging between them to ensure it behaves as expected. This can surface integration challenges or learning curves early, allowing you to adjust the plan while risks are low.

In short, a greenfield build is a once-only chance to future-proof your product. By baking in cloud-native infrastructure, automated CI/CD, observability, solid security and a culture of small, high-quality changes from sprint 0, teams can help to avoid tomorrow’s technical-debt problems and drive improved developer velocity.

Rhys Smith, Principal Software Engineer