DEV Community: Neighbourhoodie

NH:STA S01E06 Reproducible Builds

Maddy — Wed, 15 Jul 2026 08:00:00 +0000

This post is part of our series on our work for the Sovereign Tech Agency (STA), formerly the Sovereign Tech Fund. Our introduction post explains why and how we are contributing to various Open Source projects.

In this episode we discover how the Reproducible Builds project helps engineers and users ensure the safety of software packages. We also catch up with Alex Feyerke and Jacoba, who worked on the project, to find out what sets this project apart.

Introducing Reproducible Builds

Suspicion can be a good starting place for personal security on the web. Habits like double-checking the sender of that tracking link before clicking (amazon, not arnazon!) are great, but they can only serve us if we know what to look for. Spotting the equivalent sign of danger in the thousands of lines of code that make up the cool app you downloaded could be… challenging, even if you could be confident you’d recognise it when you came across it.

Software packages, which have been put together especially for your machine’s operating system and specs, are incredibly convenient and have become the standard way many of us interact with new desktop apps. Unfortunately, as is often the case, convenience introduces risks. It turns out, a compiled software binary — the ready-to-install version of a piece of software — is a really good place to hide some malicious code.

Reproducible Builds focuses on exactly this problem: helping people ensure that the binary on their machine contains exactly and only what the author intended. To achieve this, the project defines process commandments and publishes and maintains a wealth of tools to support them in the development process.

The “magic” these approaches and tools contribute to the build process is “complete determinism” — they help ensure the same program, run twice, regardless of hardware, produces the exact same artifact, byte by byte. You may already anticipate that some of the process-side stuff includes practices like not using dates that are logged in the build, for example.

By implementing Reproducible Builds’ practices, you can make the requirements for a successful malicious attack — like a supply chain attack — exceedingly difficult. For starters, you’d need to breach two infrastructures and make the exact same change in both. Not easy, right? That’s exactly what the project had in mind.

Our contributions

In the Reproducible Builds collaboration our team was able to work on more future-facing aspects of the project. Communication — especially making it easier for more people to understand and participate in the project — was what Reproducible Builds needed to become that much more resilient in 2025.

As happens with contribution-based projects, landing pages and tools can find themselves out-of-step with one another in terms of updates. The maintainer team was happy to have our help ensuring their website and documentation reflect the most up-to-date project status, and convey the impact it can have for teams who are considering implementing it.

Contributor documentation

First up, our team found opportunities to make the contributor documentation more approachable and informative. Our goal was to ensure it would help first-time visitors understand what the project is about, give an overview of the parts that make it up and highlight the project’s contribution needs, from coding to writing to donating.

Success stories

It’s become something of a common anecdote that someone is prepared to spend the time or money necessary on digital security only after an incident. Reproducible Builds actively publishes stories that highlight the benefits of using their tools from the outset. We pulled a collection of success stories onto their own page, to help make an executive case to decision-makers to invest in their security through Reproducible Builds.

Website enhancements

Lastly, we wanted to enhance the information flow and usability of the project’s website, and prominently link the newly updated contributor documentation and success stories in a more intuitive navigation.

Reflections from the team

What Surprised You the Most While Working on Reproducible Builds?

Alex: What was interesting about this is that the project does have code, but is primarily a process. That’s a really uncommon and challenging thing to communicate, so even though website updates seem pedestrian at first, restructuring the content for this project needed special attention.

Jacoba: Their team is super fast! When we pinged them they were friendly and very responsive, so working with them was a pleasure. They also meet once a year in-person for a few days, which they’ve been doing since 2015; I think that’s quite unique.

Alex: Yeah that’s right, they do in-person meetings. The project has been going for more than 10 or so years now and they have done a lot of traveling and presenting. Their team really values communication so it was great to help with that.

Conclusion

We’re very grateful to the Reproducible Builds team for welcoming our curiosity and contributions. We hope you were delighted to come and explore our website updates and that our efforts freed you up to work on moving the project even further ahead.

You can read more about our work with the Sovereign Tech Agency and the projects we’ve worked on with their support on our blog.

Keep up to date with future projects by joining our newsletter.

You can join the program

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Many companies and products have crucial dependencies on small and potentially understaffed and underfunded FOSS projects. We can help identify and improve the ones that are important to your business. We’re always curious and love to help. Book a call with us today!

NH:STA — Series Finale

Maddy — Wed, 08 Jul 2026 14:47:58 +0000

The Sovereign Tech Agency has awarded the latest round of contracts for their Tech Resilience program, designed to support projects over a 4-year horizon. Unfortunately, Neighbourhoodie was not selected.

We are incredibly proud of the work we have achieved in our time with the STA (neé STF) since 2023. We are doubly proud to have set up a system that works and is now replicated and multiplied across more supporting companies and corresponding FOSS projects.

We’d like to congratulate the fresh organisations who will be starting a new journey in supporting Open Source.

During our tenure, we have helped illustrious projects like:

systemd
PHP
PyPI
GNOME
Servo
Reproducible Builds
Log4J
Yocto
OpenPGP.js
Sequoia-PGP
Prefix.dev
Apache CouchDB
Apache PouchDB
Poppler
Typo3
phpseclib

And we covered technologies as diverse as:

JavaScript
TypeScript
C
PHP
Python
Java
Erlang
Elixir
Make
HTML/CSS
Android App Development
Sphinx
Blueprint
OpenTelemetry
SQLite
Ansible
Buildbot
Jenkins
…and we updated more than 200 CVEs for NIST

We have write-ups pending for some of the projects above which we will release when they are ready, stay tuned for those.

We thank all participants and projects for putting their trust in us 🙏🙏🙏

This leaves us to say, if you 🫵 need help with FOSS project maintenance of any type — for example to identify and improve the projects crucial to your business — do get in touch.

NH:STA S01E05 Log4j

Maddy — Wed, 08 Jul 2026 08:00:00 +0000

In this episode, we look at migrating test suites, cover that infamous vulnerability and speak to Alba Herrerías Ramírez and Julia Krüger about their work on the project.

Introducing Apache Log4j

Apache Log4j is an open source Java library for logging. It’s among the most well known, if not the most well known, of its kind, something its 300+ contributors would attest to. In 2022, 49% of devs worldwide when surveyed responded that they use Java, and if Log4j is the logging utility of choice, you have a rough idea of how widely deployed this library is. It may not be the only reason you’ve heard of it — but we’ll get to that later.

Logging — documenting events including messages, errors or data in the sequence they occurred in either your operating system or software — is indispensable for devs and QA teams when it comes to understanding actual behaviour and reproducing incidents. When it’s installed on your Java application, Log4j lets you see your app’s processes and get helpful messages about them.

As hinted, if you’re already familiar with Log4j, your first introduction may not have had anything to do with a need to find a logging library for your own project. Log4j was famously the subject of Log4Shell, a notable catalyst for governments to invest more in open source resilience.

CVE-2021-44228, better known as Log4Shell, was reported in November 2021. It had lain undetected since 2013; long enough for the version with the vulnerability to circulate onto the world’s biggest servers, arguably potentially affecting most of us. We’re indebted to the Apache team who worked 24/7 in order to eliminate the threat this vulnerability posed quickly. It’s also hard not to be grateful for the impact it had outside of just this project — it’s no coincidence that the Sovereign Tech Agency was successfully founded and funded the following year.

Our contributions

If you’ve been following our work, you’ll know that we’ve gotten quite practiced at joining and assisting teams quickly. To be independent and get up to speed without many resources or help from the core team, we start at the beginning and get the app running locally using contributor instructions. Their docs were excellent. We had no notes.

The Log4j project uses JUnit for their testing suites — a tool loved for its unit tests in particular. The Log4j codebase contained a mixture of JUnit4 and JUnit5, so their team asked if we could take over the migration they started. We had our scope: we’d focus on testing, and deliver the upgrade to JUnit5. This also meant we didn’t need to fully learn the entire codebase to contribute. If you’ve done a migration, you’ll know it can be time-consuming, so we were excited to take this off the team’s hands. Not only can we contribute to raising the codebase quality even further and strengthening the project, it’s unlikely to be the kind of work that introduces a big bug someone needs to take care of later. Double win!

Every time the Neighbourhoodie team switches to a new language we’re not already experts in, we need to get familiar with the idiosyncrasies of the environment that these languages have around them. Refactoring is a great way to get familiar with a project, and adding test coverage even more so. In open source projects, extensive test coverage both keeps projects resilient and also makes them more approachable for contributors.

With Log4j, we needed to learn how the tests work, and how the libraries work so we could ensure we completed the migration without breaking changes. In all, five members of our team worked across 14 modules on a dozen PRs. At the time of writing, most of our PRs have been merged, introducing around ~5700 lines of refactored code.

Reflections from the team

Alba and Julia sat down with us to talk more about their experience working on Log4j.

What Surprised You the Most about this Project?

Alba: For me this project was particularly great because I dealt with the management side. Of course I did a lot of development, but I managed the engineering team. As we saw, working with five people meant there was quite a heavy coordination and management load. I was very happy that so many people joined and so many people could contribute to the project while we were working on it.

Julia: It was great to have so many people working on a project at the same time. If someone encountered a problem, chances were high that somebody else already worked out a solution a day prior for something similar. In most of our STA projects we’ve worked in smaller groups, so this was refreshing.

What was Especially Challenging about this Project?

Alba: Definitely working with a huge codebase, structuring the work with a big working group, and managing contributions with our workflow. At times the modules being refactored were related, so changes needed to be made simultaneously. Contributors outside of Neighbourhoodie also started working on the modules we were working on. Some of those changes were merged before ours, and in one case we did end up with quite a few conflicts. Sorting out the conflicts was a bit of a challenge, but it was great to see people so keen to contribute.

Julia: Agreed, there were a lot of things happening with this project. Since we’re in the EU and the project’s maintainer base is global, timezones also played a role; sometimes you’d wake up to hundreds of changes. You need to check what was merged overnight in case it affects the tests you’re writing. It’s a very active project, which means there was always something to do and we didn’t spend a lot of time waiting for responses. We’re particularly thankful to a handful of the maintainers based in Poland, who reviewed a lot of PRs in our timezone while we were at it.

Conclusion

Maintainers of the package will hopefully enjoy our work most as they breathe a sigh of relief and move off JUnit4. We’re grateful to the team for their support when we needed it, and we’re very proud we could contribute to reduced technical debt in one of the internet’s most deployed projects.

Find out more about the work we do by visiting the Neighbourhoodie Blog.

NH:STA S01E04 systemd

Maddy — Wed, 15 Apr 2026 08:00:00 +0000

This post is part of a series on our work for the Sovereign Tech Agency. Our first post in the series explains why and how we are contributing to various open source projects.

This episode takes a closer look at DNS lookups in Linux, discover how the team fixed some parser bugs and talk to Neighbourhoodie’s James Coglan about the impact of extending test.

About the project

systemd is a critical component of most modern Linux distributions. At its core it is an init system, the component that starts up, monitors and manages all the other processes that run as the system boots up. Using a declarative configuration language, services can register themselves to be started when some other system event occurs, such as when multi-user functionality or networking becomes available, or when some other service starts or stops.

As well as the core init system, systemd includes a suite of other tools that provide functionality needed by many applications, such as log management and various network services. One such service is resolved, a DNS resolver. Applications use this whenever they need to look up information about a domain name, for example to translate a name like example.com into an IP address like 23.192.228.84, or look up where the mail servers for a certain email address are. Needless to say, a computer connected to the internet needs to do these things very frequently, and so it is critical that the DNS resolver is robust, performant, and secure. The DNS protocol involves talking to arbitrary other servers on the internet, and processing input that may not be trustworthy, so it’s important that a malicious DNS server cannot exploit resolved by, for example, sending it malformed responses.

Just like most open source projects, systemd is never “finished” and there is always more that can be done to improve things. Though the project already maintains a high quality bar and is very robust, the maintainers wanted some help adding extra test coverage and finding edge cases, particularly in the resolved component. The project uses the Meson build system, which can be configured to output code coverage reports using LCOV. This made it easy for us to identify source code modules and functions that lacked any test coverage, so we could focus our effort on those code paths where adding tests would have the biggest impact.

Our contributions

In network protocols, the code that deals with parsing and generating messages is often a source of security vulnerabilities, especially in C codebases where a parsing mistake can often result in invalid memory access that an attacker can exploit. We quickly homed in on the DNS message parser as a critical component and set about adding extensive unit tests for most of the DNS message types that resolved knows how to handle. Just as important as checking it parses any valid message correctly, is checking that malformed messages are rejected. We added plenty of checks for different ways that DNS messages and record types can be malformed, to make sure the parser reports errors for all of them and doesn’t mistakenly accept broken inputs.

Having thoroughly tested the parser, we then added tests for the serialiser that encodes DNS record structures into network messages. This makes sure that all the output produced by resolved is valid, and that it never emits a message that it itself could not parse. During this work, we identified and fixed a small number of bugs in DNS message handling:

Fixed a couple of parsing edge cases relating to how resolved interprets the content of OPT records, which are used to convey metadata such as whether a client or server supports DNSSEC.
Identified a scenario where resolved could generate invalid DNS messages, if it was somehow asked to send information about a domain name with segments longer than the 63-byte limit. It was determined there was no known way to exploit this, and the behaviour was corrected a few months later.

After this, we moved on to add test coverage for many of the key data structures and functions used for DNS logic, for example the functions that compare every DNS record type for equality, or that apply CNAME/DNAME redirects to IP address lookups, or that manage the system’s DNS cache. Over the course of a few weeks, we made a significant impact on the test coverage of resolved:

We added over 10,000 lines of unit test code to the project.
We increased the lines of code executed during tests from 16% to 52%.
We increased the number of functions invoked during tests from 21% to 65%.

Reflections from the team

Here’s a short interview with Neighbourhoodie developer James Coglan, who worked on the systemd project, reflecting on how testing makes it easier for new maintainers to approach a project:

What was the most surprising thing working on this project?

James: Projects written in C, especially system software, have a bit of a reputation for being impenetrable and difficult for newcomers to get started with. However, I found the systemd codebase remarkably easy to pick up -- it has a clearly documented build and test process and detailed guides for contributing to the project, what their code style conventions are, etc.

The implementation of the DNS functionality is also pretty clear and unsurprising; once you know how DNS works it’s relatively easy to figure out what the code is doing and where to find particular bits of functionality. Much of it is written in a way that I could add plenty of tests without needing to modify the source code at all, which is not often the case when code is written without tests.

What was especially challenging about this project?

Going into the project, I had a basic understanding of what DNS is and what it does, but not a detailed understanding of the protocol. It has evolved a great deal over time and now encompasses an awful lot of different types of information, and I needed to learn how all of these is represented in DNS messages.

This involved reading many of the dozens of standards documents that define the protocol, all the different types of DNS messages, what the format and validation rules for each one are, etc. Also, a lot of internet protocols end up working slightly differently in actual implementations compared to what the spec says, and so programs like resolved have to deviate from the spec in places to
be compatible with the bugs in various companies’ routers.

Fortunately, the basic DNS message format is fairly simple and it’s quite easy to just send messages to existing DNS servers on the internet and see how they respond. This way, you can get an idea of how it looks in the real world and validate your understanding of the specs, and I relied a lot this sort of experimentation to check various things.

Did you learn anything on this project that could be helpful for other open source teams building critical Linux components?

Making it possible for newcomers to contribute to the project, and to do so safely without fear of breaking things is critical to a project’s resilience. The easier you can make it to write and run tests, the easier it becomes to avoid introducing bugs into critical code paths. People new to the project, who don’t yet know all its details, are much more free to experiment and make changes when they know they can’t accidentally break existing functionality.

Even if you don’t have a lot of testing in place, it pays to write code in a way that makes it easy to do later. A lot of the resolved codebase consists of simple functions that you can call with some input and check what they return. You don’t need to spin up a whole DNS server or configure any network interfaces, you can just invoke the message parser with a buffer and check how it responds.

Making it simple to test specific functions without a lot of complex setup has many benefits for maintainability beyond testing, but writing tests is a good way to make sure more of your code follows this principle.

Conclusion

Good test coverage is one of those things that when it’s done right, people don’t notice it -- everything just keeps working as expected. It’s only when things are not adequately tested that it becomes noticeable as random things break more easily.

By improving the test coverage in the resolved service, we’ve helped the project’s maintainers continue to update the software with confidence, and made it easier for new maintainers to join the project. Everyone can keep working efficiently without being slowed down by uncertainty or manual testing, keeping the project healthy and end users happy.

You can join the program

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Find out more about the work we do by visiting the Neighbourhoodie Blog.

NH:STA S01E03 Yocto

Maddy — Wed, 08 Apr 2026 08:00:00 +0000

This post is part of a series on our work for the Sovereign Tech Agency. Our first post in the series explains why and how we are contributing to various open source projects.

In this instalment of the series, we learn why the team dug into more than 200 CVEs from as far back as 1998, and what else they did to help both the Yocto project and Linux security more broadly.

About the project

You may not have heard of Yocto, but you have most certainly used Yocto, at least indirectly. Yocto lets you build a bespoke Linux distribution for your embedded devices. Many “smart home” or otherwise smart appliances likely run a variant of Linux under the hood (that you never see or interact with) and chances are good that the manufacturer used Yocto to create and maintain that bespoke Linux distribution. Their users and supporters are a veritable who is who of the technology industry: Intel, AMD, ARM, Cisco, Microsoft, Siemens, Texas Instruments, Dell, LG, Qualcomm, you name them.

Our contributions

Taking a look at CVEs

First, we looked at security. By allowing to package any and all packages from the Linux world into a binary distribution image, Yocto immediately inherits all CVEs (Common Vulnerabilities and Exposures) of the Linux kernel itself and all of its packages. This comprises a majority of all Open Source CVEs. CVEs are the industry standard for managing security vulnerabilities in software and we are all better off for having a system that lets us manage security issues, large and small, in a structured way. However, there are a number of issues with CVEs:

Not all CVE information is complete. Often a project releases a new version which addresses a CVE, but the original report is not updated to denote the fix version. In practice, the affected software is now listed as vulnerable forever. That’s obviously not great.
Not all CVE information is actually correct. Sometimes affected versions and/or fix versions do not actually match reality, leading automated CVE checks to show that certain software is vulnerable when it is not. Occasionally, two software projects share the same name (by accident usually) and one’s CVEs are now attributed to the other.
Some CVEs are so old that any information on them is difficult or impossible to find. In one case, we had to go back to a PDF published in 1993!

Finally, nobody has time to go through all CVEs and correct these shortcomings. They do not only affect Yocto but all Linux distributions and consumers of the affected software packages. Any affected project now has to maintain their own “override list” to say which CVE is not actually applicable and why. In the security compliance world, this wastes countless hours of work.

Well, nobody had time until now: we set out and in a first triage went through a whopping 221 CVEs from 1998 to 2023 and cleared them all up. We reported a sizable number of CVE corrections to NIST (The United States National Institute of Standards and Technology, which maintains the central CVE database), and we have already had a majority of our reported CVEs fixed for everybody, not just Yocto. We are extremely proud of this work.

Streamlining tracking and graphics

The second part is no less important: usability, being a meta-Linux distribution comes with a lot of challenges that go beyond just packaging the Linux kernel and a workable userland. By allowing users to customise every part of each bespoke Linux distribution, it is imperative that any Linux distribution that can be created with Yocto is a functioning Linux distribution. To manage this immense complexity, Yocto relies on an extensive build farm and reporting from it allows any contributor and user to see where any of the desired guarantees might not hold, so they can be fixed.

However, this system has been built ad-hoc, over time and by various teams and individuals and it is not a coherent whole, so we set out to help. That’s the first part of our work: improve the tracking and visualisation of automated builds and performance tests.

In particular, we:

Improved the performance test result pages by overhauling its graphs subsystem and design, adding graph annotations to make the data clearer, unifying styles for a cohesive project look and making it easy to get an overview of the overall project health.
Improved the user experience, design and typography of the overgrown autobuilder index page test results and added custom filters and date range selections.
Improved the CVE statistics view organised by release, added data annotations and inlined details to improve the overall user experience.
Unified the multi-year and single-year project metrics views:
- Dynamically fetching chart data when the time scale changes (and speed up load times significantly).
- Connected all charts so they show the same date range, making comparisons easier.
- Unified styles with the other metrics views for a project cohesive look
- Addressed various further shortcomings with the shown project charts to make all data visualisations easier to grasp and more useful.

Reflections from the team

Here’s a short interview with Neighbourhoodie developers Alba Herrerías Ramírez and Jacoba Brandner, who run our STF programme and worked on Yocto with the entire rest of the team:

What was the most surprising thing working on this project?

Alba: We didn’t have to be security experts to do the CVE work and we still got a lot of CVEs corrected. Also, the Yocto infrastructure is HUGE, but documented very well. And their communication is awesome, they helped us a lot getting started.

What was especially challenging about this project?

Alba: We were making technology choices for a different team with specific requirements. Of course everything needed to be Open Source software (shout out to Apache ECharts), but also the tooling needed to be long-lasting and low-maintenance. In the end we believe we found a good set that helped the project long-term.

Yocto is using an email-based development workflow, including sending patches. Our team usually works on GitHub or other similar systems, so this was a bit of an adjustment for us.

Jacoba adds: CVE triage is a lot of somewhat tedious archival work that gets lonely very quickly. We switched to a pair “programming” model and that helped keep morale up a lot better.

What lessons did you learn on this project that will benefit the project going forward, or that you will bring into future projects?

Jacoba: We were early in establishing the whole STF process workflow and we learned here that early on, we could have saved some time by requesting help for development setups of their infrastructure from Yocto. We were initially working under the assumption that we should not make more work for the projects, but we took it a bit too far. The Yocto folks were always happy and quick to respond to our questions.

Alba: This was the biggest STF project we worked on at that time and it was great that we could find things to work on, which means you don’t have to be an expert to contribute to projects of this size. It’s a huge confidence boost for our team.

Conclusion

In summary, this was a lot of work and sometimes it wasn’t all glorious digging out patches from arcane software archives (anyone remember SourceForge?), but we are happy to have helped both Yocto and the wider Linux ecosystem.

You can join the program

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Find out more about the work we do by visiting the Neighbourhoodie Blog.

NH:STA S01E02 OpenPGP.js

Maddy — Wed, 01 Apr 2026 08:00:00 +0000

This post is part of a series on our work for the Sovereign Tech Agency (STA). Our first post in the series explains why and how we are contributing to various open source projects.

About the project

OpenPGP.js is a pure, Open Source OpenPGP implementation written in JavaScript. Its main use-case is enabling PGP workflows in web-based email systems, but as JavaScript is available on almost all devices these days, its utility is universal.

Our contributions

We started out by introducing a fuzz testing suite to the project. Fuzz testing is a form of unit testing, but instead of relying on manually crafted input and comparing it to the desired output, fuzz testing generates a near infinite number of permutations for input data to find rare implementation bugs. For security-related software, this is an important aspect of a complete automated testing suite.

We then focussed on making the project more approachable for new contributors by:

improving the documentation for first-time contributors
adding a high-level description of the project’s architecture
and improving the general contribution guidelines.

Finally, we started work on migrating certain core modules from JavaScript to TypeScript, to make crucial parts of the project more type-safe.

Reflections from the team

Here’s a short interview with Neighbourhoodie developer Alba Herrerías Ramírez, who runs our STF programme and worked on OpenPGP.js:

What was the most surprising thing working on this project?

Alba: I’m not sure if it’s ‘surprising’ but something I found pleasant was their user documentation, it’s great, I would like to see more projects paying this detail to docs.

What was especially challenging about this project?

OpenPGP.js have been planning to release v6 for a long time and our work got stuck in the middle (since they requested us to base our work in the v6 branch). We needed to accommodate the project’s timelines.

Conclusion

In summary, we could play to our strengths here and help a web-based project and we could build upon our work with Sequoia-PGP. There is lots to be done on the OpenPGP.js project and we hope we get another chance at helping them along.

You can join the program

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Find out more about the work we do by visiting the Neighbourhoodie Blog.

NH:STA S01E01 Sequoia-PGP

Maddy — Wed, 25 Mar 2026 09:00:00 +0000

This post is part of a series on our work for the Sovereign Tech Agency (STA). Our first post in the series explains why and how we are contributing to various open source projects.

In our first project with the STA — and first episode of the season — we take a closer look at sequoia-git and use our frontend skills to help the project.

About the project

Sequoia-PGP is an OpenPGP (Open Pretty Good Privacy) implementation in Rust. Its focus is on safety and correctness by using a memory-safe language. PGP has been the backbone for many encryption tasks for decades and this Rust implementation takes this ecosystem into the future.

While work on the core library is progressing with sufficient speed, the lead maintainers are responsible for many ancillary tasks that are important to help the rest of the world to get on board. By relieving them of these tasks, we created space for the project to concentrate on what they do best: write security software without distractions.

Our contributions

Our first task was to review the sequoia-git subproject. Sequoia-git builds on top of git and allows you to cryptographically make sure that code changes you might incorporate into your software come from a trusted set of developers. We gave it a spin and presented the team with a comprehensive report where we believe the tool could be improved in terms of setup, first run, documentation, error handling and overall useability.

Secondly, we developed from scratch a contributing guide for Sequoia-PGP. Open Source projects thrive on the contributions they receive and constantly recruiting and onboarding new developers is a core requirement for any Open Source project maintainer. To make this easier on everybody, we ourselves became first-time contributors and wrote up everything we had to learn to get started. Now anyone who’d like to get started helping out the project can walk in our footsteps and won’t require as much direct help from the current maintainers, so they can focus on other important tasks.

Next, we provided Sequoia-PGP with a modern Frontend Design and Reusable Styling. Our prime goal here was to produce a system that was easy to maintain for many years by people who are not primarily web developers. This led us to eschew many of the modern best-practices designed for folks who do web development day in and day out, but these projects often come with a high learning curve and have to receive regular updates. By going back to web development foundations and minimal tooling, we achieved a modern, better maintainable and better looking website for Sequoia-PGP. This again removed significant time away from the core maintainers while making the project more approachable for newcomers.

In addition to the successful completion of these milestones, this was also our very first project with the Sovereign Tech Agecy and aside from working on the project itself, we also established the blueprint for all following projects. We thank the Sequoia-PGP team for their patience while we worked out the system as we went along.

Conclusion

In summary, we are very happy we managed to help Sequoia-PGP on a more sustainable path for their very important mission. We learned a lot about the OpenPGP ecosystem as a result, and as it turned out, not a moment too soon. Tune in next time when we cover our work on the OpenPGP.js project.

You can join the program

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Find out more about the work we do by visiting the Neighbourhoodie Blog.

Neighbourhoodie and The Sovereign Tech Agency

Maddy — Wed, 18 Mar 2026 09:00:00 +0000

We’ve been doing something incredibly exciting for the last couple of years, that we’re getting around to sharing on DEV: Neighbourhoodie is an official Implementation Partner of the Sovereign Tech Agency (STA), formerly the Sovereign Tech Fund, and their Tech Resilience Program.

Hold up, what does all that mean? Let’s back up a little:

In 2021, a security issue (or vulnerability) was made public in the Log4j library. This is business as usual in the software world, security vulnerabilities are found every day, they are being reported, issues are fixed, new releases come out and then everyone can update.

That time, things went a little different: the vulnerability was disclosed without anyone having a chance to fix it, let alone update their systems to the latest version.

In addition, this vulnerability affected anyone running Log4j in a way that anyone anywhere can run unverified code on a target system. This is 10 out of 10 bad.

Two more compounding factors made this issue even worse:

It turns out, Log4j is used everywhere. Not a day goes by where anyone doing any digital work is not using a system that uses Log4j. Small companies, big companies, governments, everyone is using Log4j.
Log4j was maintained by a very small team that was working part time and unpaid.

One of many government agencies, the German Bundesamt für Sicherheit in der Informationstechnik (Federal agency for IT security) classified this as an “extremely critical threat situation”.

In response to this, the German government founded the Sovereign Tech Agency to run programs that help avoid issues like this in the future.

One of these programs is the Tech Resilience Program, formerly called the Bug Resilience Program:

The Bug Resilience Program proactively increases the resilience of open source software infrastructure and empower small and medium-sized open source projects. The goal is to lower their risk of harboring bugs and improve their capacity to respond to bugs as they are discovered. The program provides services to OSS projects, such as helping projects deal with technical debt, working on known security issues, performing code security audits to reduce high-risk vulnerabilities, as well as offering a bug & fix bounty platform to discover, responsibly report, and fix bugs.

So far so good, but where does Neighbourhoodie come into the picture?

How it works

As mentioned above, we are an Implementation Partner. That means we work directly with Open Source projects and help them be more resilient in the face of security issues. We are collaborating with projects in the open and help them address their highest need issues. These vary widely from directly working on the code to improving processes, to internal documentation that helps onboard more folks, and sometimes it means taking on a bunch of chores to free up the core maintainers to focus on high-value work.

Here’s an (abbreviated for clarity) outline of the kind of work we are doing with the projects:

Analysis
1. Organisational Preparation
2. Technical Preparation
  1. Review software dependencies
  2. Review project: Code & Tests, including CI
Improvement
1. Testing
  1. Add test coverage
  2. Introduce or expand automated testing
  3. Increase testing matrix
2. Release Engineering
  1. Stable versioning
  2. Automate releases
  3. Audit access control for release automation
3. Software Development
  1. Help fix high-impact issues
  2. Help review outstanding contributions
  3. Improve documentation for first time contributors
  4. Improve contribution guidelines more generally
  5. Improve developer experience
  6. Help recruit more contributors

While all this is very abstract, we’ll share some of the concrete work we have done with actual projects in later blog posts.

Neighbourhoodie are honoured to have already worked on diverse and essential projects, you can more about on our blog:

We’ll be publishing these stories here on DEV in the coming weeks.

How can you join?

The Sovereign Tech Agency’s Tech Resilience Program is open for your applications.

We can also help you directly

Many companies and products have crucial dependencies on small and potentially understaffed and underfunded Open Source projects. We can help identify and improve the ones that are important to your business. Our friendly sales team is happy to help. Book a call with us today!

How to Sync Anything: Building a Sync Engine from Scratch — Part 3

Maddy — Wed, 11 Mar 2026 14:08:40 +0000

Last time we learned how to efficiently decide what needs syncing. This time we will learn how to version our data.

Let’s jump right in!

In the example from part two, with stories updating after they’ve been created, and with apps being able to request a list of current stories at any time, we already saw how to deal with different versions of a story in our calculation of a delta for an app to download.

This third part explores different version schemes and discusses their respective trade-offs. Before we go into them, let’s briefly look at what kind of technique we are looking for.

Given two objects with the same ID, we need to be able to tell which one came after the other, so we know which one is the most recent one.

If we have more than two, we need to be able to put them into an ordered list, so we know which ones come after others, and so we can know which one is the most recent, or latest. And we need to be able to tell if a more recent version of an object is an ancestor of an older one.

Increasing Integers

The first and most natural idea for denoting versions of a document is increasing integers. That’s just a fancy way of saying “Version 1”, “Version 2”, “Version 3”, and so on.

The advantage here is that these version numbers are very easy to understand for humans and computers alike.

So what’s the problem?

Imagine the client device and the server updating one of our story objects independently. Our story is at “Version 1”. The server then creates “Version 2” because that’s the next in the series.

The client also creates a “Version 2”, but we have no way of knowing whether the contents of the versions are the same. When we try to synchronise the client and the server now, we run into trouble.

We need to find something better to describe our versions.

Timestamps

The next common idea for denoting versions of a document is a timestamp, a snapshot of a clock on some device, Sat Jul 23 02:16:57 2005, or 12569537329. For example, we could have story objects with a property updatedAt that we assign the timestamp of the last update to. And when we first create the object, we also record the current time.

Aside: there are many formats for timestamps with properties ranging from easy to read for humans to easy to process for computers. Whichever you choose depends on a set of tradeoffs for your application and a discussion of the merits of one format against another is outside of the scope of this article. If you just want to pick one that’s generally good, go with ISO 8601.

When we have two story objects with the same ID, we can compare their timestamps to figure out which one is the latest version of our story. As time has the convenient property of monotonically increasing, in other words: clocks only ever go forward, and this will forever be true.

Or will it? Before you go on, we’d like you to read this fantastic collection of falsehoods that programmers believe about time and the sequel. We’ll wait right here.

Hold music is playing

Say we are editing our address book on our phone. First we add a contact’s new phone number. Phone numbers are icky to type so once we are done and compare it to where they wrote it down, we see we swapped two digits.

Under the hood we recorded the current timestamp of the phone into our object versions, so we know which one was the latest. We fix the swapped digits quickly, no harm done. Or is there?

As we’ve seen in The Falsehoods, that is not always true. In fact, even companies with near infinite engineering and operations resources, like Google, can run into trouble with this.

There are a number of things that could conceivably happen, like a rogue time server telling the phone’s clock to adjust to an earlier time, or a daylight savings time switch (lucky they happen at night!); even Apple’s iPhones kept having issues with alarms set for January 1st for a while.

In our example, this means the phone number with the typo will survive as the latest edit to our object, and not the correct one. This is not what we wanted.

Whatever the exact scenario, this is plausible and has documented occurrences in small and large-scale systems: you can’t rely on timestamps to guarantee the order of two items, even if the timestamps were generated on the same device as they lead to data loss.

Ok. How can we improve on timestamps?

Before we find out, we need to introduce one more new concept: conflicts.

Embracing Conflicts

To illustrate conflicts we need to expand our example a little bit. Imagine our app is not just a consumption app for people who want to read our stories, but it is part of a content management system (CMS) where story authors can update their stories.

Our system as designed so far already works in this case; we can have the server request latest story updates from the app of all authors, and it can then send updates to any apps of our readers.

Now imagine this happening: an editor uses the CMS to edit a story by an author to fix a typo, while the author updates the same story with latest developments and we are using a timestamp to signify which version is the latest.

Even if we now have two devices with perfectly synchronised clocks (which, as we’ve learned already, we can’t guarantee we have), this shows a larger problem: what happens if two people make changes to the same object on two different devices roughly at the same time?

It doesn’t help that we know which one was updated after the other, either we pick the latest story developments and keep the typo, or vice versa. This is traditionally called a conflict.

DUN. DUN. DUN

If you are anything like me (which you are probably not, but bear with me), you don’t like the idea of conflicts. I am, as they call it, conflict averse. And while embracing conflicts might be a decent strategy in real life, in computing, we are usually trained that conflicts are bad. Very bad.

That is until we start learning about distributed systems. That’s a mighty term for a lot of ideas and concepts, but it usually boils down to: you have two or more computers connected by a network of some sort and you are trying to make it so some piece of data looks the same on both computers.

Now the tricky part is this: either computer can fail at any time, and the network can have a multitude of failure scenarios (c.f. The Fallacies of Networked Computing) that range from making the other computer appear very far away, or very busy, to turned off.

Distributed systems is a large field with a lot of applications, but that must not scare us. On the one hand, we do have a distributed system: two devices and a server, all connected over the internet.

This fits the “two or more computers connected by a network” definition. But instead of introducing a whole new field of computing into our discussion, we’ll just learn a few best practices from distributed systems and then be on our way, so no worries!

As opposed to single-machine computing, which is what we usually do, where we’ve learned that conflicts are a bad thing™, in distributed computing conflicts are a natural state of data. That doesn’t mean we can just ignore them because they are “natural”. It means we cannot ignore the fact that these things exist, or try to build up an illusion that these things don’t exist. We have to embrace conflicts.

We’ve already seen we can create a conflict, now we can look how to resolve one. Maybe in our scenario above, the typo fix is in the second paragraph and the new developments in the story are at the bottom.

If we are storing a story paragraph by paragraph, for example, we could have a conflict resolution procedure that checks if the two edits we have are in separate paragraphs, and if yes, updates our object with both new versions of the respective paragraphs and then stores the result back into our object.

This is commonly referred to as a merge. If you’ve used git or other source code version control systems (especially ones of the distributed variety), you might have seen merge strategies that do exactly that.

In this case, a computer can decide what the right final, non-conflicted version of our story is.

But now imagine both author and editor fixed the typo, but the author decided to use a different word altogether, while the editor just corrected the spelling, and both are doing this at the same time.

In that case, it is impossible for a computer to know which is the correct version. We could apply a policy in our app that in case of a conflict the author’s changes should take precedent. Or editors get the preferred treatment. Whichever it is, this can be a viable strategy for specific apps.

As mentioned before, these are all examples, and we are trying to become experts in sync more generically. So we need a better solution.

And here is a bit of a bummer: there are some types of conflicts that no computer can ever solve. We will need human intervention. There is no way around this, but at the end of the day, our brains are a lot more sophisticated than computers and conflict resolution is where this plays out.

Despite this, I hope you feel a little more comfortable with conflicts.

As an added bonus, now we are equipped to learn how to improve on timestamps for versioning our objects.

Vector Clocks

If you start searching for techniques to solve the issues with timestamps from a wall clock, you will eventually find a mention of vector clocks, for a variant, see Lampert Timestamps. A vector clock also works with timestamps — though its source is not a wall clock, like we humans use, but a so-called logical clock.

When using vector clocks as version specifiers, we not only send logical timestamps with our objects, but also the state of the logical clock, so that later, when it is time (pun intended) to compare two objects, we can calculate which one came first. Great!

Just one more thing.

Remember when we talked about the author and the editor of a story fixing a typo and adding to the story at the same time? We now know how to detect and how to resolve this conflict.

But what if the typo fix both create result in the same text? With timestamps or vector clocks, a conflict is still generated, and our subsequent merge procedure has got nothing to do. So we can handle this too, but since the resulting story is the same, wouldn’t it be nice to not create a conflict here in the first place?

I realise this example is a little bit contrived, but I hope by staying within our example project, we can see the problem here. Where the problem would actually make our lives harder is when we don’t have a single sync server, but a cluster of sync servers, which we might need for scaling to higher and higher loads of our system.

Explaining all this takes time, and this tutorial is already getting long, so we are going to skip the details of this. I hope you forgive me. If you don’t, I hope this gif of a squirrel makes up for it:

Content Addressable Versions

To solve the problem of same edits generating conflicts, we need to look at another technique: Content Addressable Versions. The idea here is this: take the contents of an object, pass it through a hash function, and use the resulting hash as the version. If we now make the same change on two devices, we don’t generate a conflict. Hooray!

We have one old problem though: Content Addressable Versions are not ordered. There is no way to take two of those versions and know from their value which came before the other. This is unfortunate, but we can solve this.

Instead of replacing the version with each object update, we keep an ordered list of versions along with the document. When we update the object, we put that Content Addressable Version at the top of the list. Then we can know which version came before another by traversing our list.

Before this is generally useful, we must add one pragmatic trade-off. If we update our objects a lot, our list of versions gets very long, and we are storing a lot of data, and moving said data across devices, and at some point we are losing all the benefits we covered when we discussed deltas above.

As a result, we make this list a bounded list, e.g., it can have at most “X” entries. We can even make this length adjustable depending on our application. Anything between 10 and 1000 is reasonable for general purpose applications, so we’ll have to play with our app to see what’s best.

The trade-off here is avoiding unnecessary conflicts at the expense of a little more storage space, and we get to decide how much storage we want to spend for this benefit.

You may think that the list of versions is rather clunky compared to a neat solution like Vector Clocks. But being able to avoid conflicts where possible becomes a priority when our server becomes a cluster of servers that all work independently to handle our application load and we want to present a consistent set of data to our users.

Storing Conflicts Efficiently

And then, finally, one last thing. When we are storing our ordered list of versions, it looks like this, simplified to use numbers instead of our Content Addressable Versions:

We have five versions; now we are creating a conflict. In order to signify we have two versions that are in conflict, we put them in a sub-list at the top of our list:

We now know that version “6” and version “5” are in conflict. Now imagine instead of resolving the conflict first, we create another one:

Now we have versions “7” and “6” in conflict, and that conflict being in conflict with version “5”. Heavy stuff!

We are representing versions in a tree structure. It allows us to:

store conflicts efficiently, and
resolve them recursively, so regardless of how many conflict we have, we can always get to a non-conflicted state eventually.

Putting it All Together

Here is a summary of our algorithm, without the explanation of why we do all the things, summing up all of the above:

There are two devices, A and B, that want to sync data from A to B.
Read high watermark checkpoints from both devices, if they exist.
Start reading the update sequence on A from the recorded high watermark or from the beginning, if none exists.
- for each ID/version pair:
- is it a delete?
  - Yes: store the delete locally (see below) on B.
- do we have the ID/version pair already locally on B?
  - No: fetch ID/version from A and store it locally (see below) on B.
- store high watermark of current update sequence on A and B.

Store it Locally:

Check version list to see if the new version is a direct successor of the currently latest version on B.
- Yes: add the new version to the data store on B.
- No: create a conflict.

Note: I’m glossing over a few details here that involve tracking of deleted conflicts, which are outside of the scope of this tutorial.

Conclusion

Thank you for reading! I hope you learned a lot about the pitfalls of synchronising data reliably and that you are not discouraged to tackle your own sync solutions based on the principles we’ve explored here.

With all the knowledge you’ve gained here, I hope you feel empowered to build your own sync engine!

Please leave any feedback in the comments on Mastodon.

Without you knowing, I explained to you the why and how of the CouchDB Replication Protocol, which allows seamless peer-to-peer data sync between any number of peers, including all the scenarios we’ve explored above.

The CouchDB Replication Protocol is implemented in CouchDB itself, so that covers our server component. Then there is the PouchDB project implementing the same protocol in JavaScript targeted at Browser and Node.js applications; that covers your clients and dev servers.

Addendum

There are two related technologies I’d be amiss to not mention: Operational Transforms and CRDTs.

Operational Transforms

If you’ve ever seen collaborative text editing on Etherpad, Google Docs, or similar, that is powered by a technology called Operational Transforms. It is designed to let any number of people collaborate on text at the same time. It can deal with a certain level of network instability, but generally, it requires clients to be connected at all times. If you go and keep editing, your changes can be integrated later, but not indefinitely later.

In addition, it is designed for text, not for generic objects, so for true offline capabilities of generic data objects, Operational Transforms are less useful. Check them out if you need a solution for mostly connected text.

CRDTs

Conflict-Free Replicated Data Types or CRDTs are specialised data structures designed for use in distributed systems and they have a lot of the properties we’ve discussed in this tutorial, but most notably, they do not have a concept of conflicts. How great is that?! Very, in fact.

Alas, everything in programming is trade-offs, so what do we trade for being able to have conflict-free data structures? Well, they are specialised data structures, like sets and counters, and not generic object representations like JSON. So, we’ll have to buy into a whole world of these specialised data structures, and maybe we have a hard time mapping our application objects to them. If you think your application can benefit from CRDTs, I strongly encourage you to try them out, but be aware of the trade-offs.

An earlier version of this post used to be published on the now defunct hood.ie blog. That earlier version had received reviews from Naomi Slater, Katharina Hößel and Jake Archibald. My thanks to them.

Discover more ways to work with CouchDB — the syncing database — on the Neighbourhoodie blog, where we regularly publish tips, guides and tutorials.

It’s time for the 2025—2026 Annual Apache CouchDB User Survey!

Maddy — Wed, 04 Feb 2026 15:01:32 +0000

CouchDB is an open source database made by a global team of volunteers. It’s shaped by its users and community through everything from contributions to project feedback — and now is the best time to get involved.

📝 Be a participant in the 2025—2026 Annual Apache CouchDB User Survey

You’ll have the opportunity to let us know what you use most, love most and need most from your CouchDB.

We’ll also be back here when we’re ready to share the insights. If you’re curious what others have to say or want to learn what they have running alongside CouchDB in their stack, sign up for our newsletter to get the results as they come out.

We can’t wait to hear from you!

How to Sync Anything: Building a Sync Engine from Scratch — Part 2

Maddy — Wed, 19 Nov 2025 08:00:00 +0000

In this part, we will learn how to efficiently find out what data needs to be synchronised.

Say we have a news app that runs on mobile devices and a server that publishes new stories. Blogs and RSS are good real-world examples.

The scenario is this: our app starts for the first time and there are no stories available for the user to read on the device. So, the app asks the server to send the latest set of stories to the device. End result: our users get to read some stories.

Later, when our apps starts for the second time, your app asks for the latest set of stories.

And here is the first interesting bit: we don’t want to send the app the stories it already has.

Why?

Four reasons:

They are already on the device, so it’s redundant
It costs us server processing time
It costs our users bandwidth
It adds response-time latency to the experience

As a general rule of thumb, in user experience design: don’t let the user wait unnecessarily or they will get frustrated and stop using your app, your store, whatever.

So what’s the solution?

We need to request new stories from the server in a way that says “I’ve got all stories up to this point. Give, give me anything that’s new”.

We’ll call the difference between “all stories” and “stories that exist on the client” the delta (i.e. difference).

The delta is what we need to get from the server.

Calculating the Delta

To ask for the delta efficiently, we need two ingredients:

Our app needs to store its state somewhere, so it knows what stories it already has. We’ll call this the high watermark (for reasons that will become clear soon). In a native app, this could be stored in a device-local database or file. In a website or web app, this could live in browser storage systems like localStorage, or IndexedDB.
Our server needs to look up the list of all stories, sorted by when they were published. It needs to be able to do do this efficiently. And it needs to be able to send back any range of stories, from any specified start date to the present moment.

A naive implementation on the server side would be something equivalent to retrieving all stories from the database, sorting them by date in memory, before sending the result to the app. If the app sends a high watermark, the server would only send stories that come after the high watermark.

While this definitely works, it is not very efficient.

All stories have to be loaded from the database, sorted and finally filtered to match the range the client is interested in.

While a server with only a few apps requesting only a handful of stories can easily do this, we should be looking for optimisations here. Something that will work with large data sets.

How? Traditionally, we would add an index to the database.

The Index

An index has two things going for it:

Order: things will be stored in index order on disk, so reading things (e.g. stories sorted by publish date) in that order is very efficient.
Support for ranges: reading only part of an index from any point in the index range (e.g. all stories after a certain date) is very efficient.

Maybe the database already has an auto-increment integer as a primary key. If so, each new story gets an ID integer that is one higher the previous ID.

If this is the case, the server is already prepared and the app only needs to store the highest ID it gets from the server in the initial request. The app can then send that ID as the high watermark for subsequent requests.

When the app receives the new batch of stories, it stores the new highest ID as the next high watermark, and so on.

Here’s what this looks like:

Updates

Now, let’s imagine our stories are sometimes updated. This is fairly common. Typos need to be fixed, corrections posted, new story developments need to be added, and so on.

In this case, using an auto-incrementing ID is not a good solution.

Not only do new stories need to get a new ID, or more precisely a higher ID, but also: updates to a story need to get a higher ID, so that they are included in the calculation of the next delta.

To solve this, we need a new table: updates.

This updates table provides a secondary index that we use for recording updates only. If you add one record for every update, pointing back to the corresponding story that was updated, the auto-incrementing ID functions as an update sequence.

Now, instead of passing back the story ID as a high watermark, the client can pass back the last update sequence it got. (We have to tell the client the latest update sequence with every request for this to work.)

When the server receives an update sequence from the client as the high watermark, all it needs to do is send back every story that corresponds to every update with a higher update sequence ID in the updates table.

In this example, the client first receives five stories and the update sequence ID of “5”. When the client sends the “5” back as the high watermark, the server notices that there's an update record with ID “6” and sends back the only the corresponding story, which happens to be the story with ID “3”. The client also now gets the update sequence “6”, which is recorded locally as the high watermark.

In our case, our app might choose to handle this information by adding an “updated” marker to the list of stories, so our user knows the story has been updated with new information, even if it was previously read.

So far so good!

But we need to handle one more case for updates: what happens when a story has been updated twice?

This is how it would look in our current system:

This looks very similar to the previous image, with one difference: the delta includes story three twice. Thing is, we only really need the latest update. That previous update will be discarded by the client, so we're wasting server resources, bandwidth, and UX latency sending it.

In our example scenario, this isn’t too bad. But imagine we're downloading thousands of objects with hundreds of updates.

Not great.

So what do we do about it?

The update sequence index we’re using is called a composite index. This means more than one data item defines the index range and how it is sorted. In our case, we have an auto-increasing change ID plus a static story ID.

We need to make one more change to our composite index to solve this multiple update problem.

Let’s make it so that the story ID column in the updates table is unique. Every time we try to write to that table, if we see there’s an existing entry with the story ID we’re about to record a change for, we delete the old row.

If we do this, there will only ever be one update record per story, and it will always correspond to the latest update.

Here’s what that looks like:

What happened here?

Between the first and second requests, two changes were made to story three. Because the story ID column on the updates table is unique, we only have the final update recorded, which is why the update with ID “6” is missing.

The end result? Our client only receives the final update for story three.

Deletes

There’s one last thing before we move on: deletes.

In our app, we want to treat deletes as updates, so that deletes can be sent to the client like any other sort of change.

Let’s take the same example we’ve already been using and instead imagine that between the first and second client requests, story four is deleted from the server.

In this case, we’d want to record this deletion in the updates table, along with an update ID. We can then send this information back to the client as if it were a new story, updating the high watermark as we do so.

Here’s what that would look like:

At this point, the app has several choices. The simplest is probably just to delete the story locally.

The Road to CRUD

Congratulations! Now we know how to communicate new stories, story updates, and story deletions to our app. In other words, we can sync Create, Read, Update, and Delete (CRUD) operations.

And here’s a cool thing: we’ve done this generically enough that a story can be any sort of object we might want.

In this post, we’ve spoken about different versions of the same object. For example, using 3, 3*, and 3** to refer to the first, second, and third versions of story three. We’re also talking about versions when we talk about the deleted and non-deleted version of story 4.

This concept — versions — is very important. So we’ll take a closer look at that in post three of this three part miniseries.

Discover more ways to work with CouchDB — the syncing database — on the Neighbourhoodie blog, where we regularly publish tips, guides and tutorials.

How to Sync Anything: Building a Sync Engine from Scratch — Part 1

Maddy — Wed, 12 Nov 2025 08:00:00 +0000

There’s an old saying I paraphrased in this by now ancient tweet[sic]:

“Friends don’t let friends build their own {CRYPTO, SYNC, DATABASE}.” — @janl on September 24th, 2014

What do I mean by that?

Well, it’s very hard to get these things right. Additionally, not getting them right will mean a lot of unhappy developers and end-users. In other words, these things are best left to the experts.

However.

Brought to its logical conclusion, we’ll end up with a situation where nobody is doing crypto, sync or databases anymore, because no new people learn these things. So there are exceptions.

You can be an exception.

Let’s become experts in one of these things! Specifically, for the purposes of this series of posts: sync.

Or, to put it another way: let’s look at a bunch of different problems and solutions that relate to making data available offline.

I’m picking a specific field (web frontend / backend sync) by way of example, but know that the same concepts apply to any two or more sites you want to synchronise.

The Future

When I first wrote this article, Progressive Web Apps (PWAs) were just being announced as maybe becoming a thing in the future.

That future has long since arrived. What has not arrived is the logical next step that comes after caching your static assets: how to make your data available when a web app is not connected to a server.

You already know how to store your website’s assets in a ServiceWorker cache. You already have some idea about how to store server-delivered content in IndexedDB. And you might know how to locally store user-entered data (think an address book, note taking, favourites, and so on) in IndexedDB or localStorage so you can push it to the server when a network connection becomes available.

We can already see a few different scenarios:

server always pushes (news)
client always pushes (notes)
both push (social, email, or multi-device access of individual notes, and all forms of group data sharing.)

Let’s have a look at the different techniques required to make this all work. We’ll see which applies to which use-case while we go through everything, step-by-step.

The Scene: Background Sync

In his introductory talk about PWAs, Jake Archibald shows an example of a chat application. Towards the end, he uses the Background Sync API to send new messages “later”, whenever the browser thinks it has an actual internet connection (as opposed to having a network but not thinking there’s an internet connection).

This is a great UX experience: the interaction is done as far as the user is concerned. There is still an indicator that the message hasn’t reached the recipient yet, but there is no need to keep the user waiting for any network operations.

In the same talk, Jake explained that the browser (and sometimes mobile operating system) mechanics that determine whether a device is online are not very useful. That’s because they only cover the connection from the device to the wifi router or cell tower. But there are a lot of steps between there and the final web server. For example: ISP routers, transparent proxies, satellite uplinks, just to name a few.

Now imagine we are sending a message within a Background Sync event as shown in Jake’s talk, but we’re on a fast train that goes through a bunch of tunnels in fast succession. Or we are on a conference or hotel wifi. Or we’re a large event with thousands of other people.

Our phone might get a request/response going, and as a result, wakes up Background Sync and tries to send our message. But by the time the message gets sent, we don’t get any requests going because the network became unavailable again. Background Sync will then re-try at a later time. Which is great, and exactly what we want. We don’t have to worry about whether the message is sent, because we know it will be, eventually.

Now. Let’s look behind the scenes.

The Cycle of (the) Web

Even though there are many hops at which things can go wrong in a HTTP request/response cycle, there are generally two parts: the request and the response.

Each part can fail, so we have to account for the following scenarios:

the request fails
the request makes it, but the response fails

Scenario one is neatly covered by Background Sync, but what about scenario two?

Imagine we are sending a message. The server accepts it and sends it onwards to the final recipient. Then, the server responds to the device, saying it handled the message correctly.

If that response fails, Background Sync will consider the message sending request as failed and will re-try it later. At that time, the server accepts the message and sends it to the recipient again. And now we’ve created a mess, because the recipient is left wondering why there are two messages that are exactly the same.

And if we are really unlucky, our recipient will get an infinite stream of the same message because the same problem is happening over and over and over again.

Let’s see how we can avoid this.

There are solutions for this exact problem that we can employ in the server portion of our app. But we’ll get to that later. For now, let’s look at how to solve this more generically, so that we can solve the same problem when it comes up in very different contexts, not just when sending messages.

Identity

In programming, we generally work with bits of data wrapped up in objects. If we want to be able to refer to an object at a later time, we must be able to reference it unambiguously. Sometimes that means giving something a name (like an auto incrementing number) and sometimes we can derive an identity from a number of uniquely identifying properties of the object.

For example: in an address book, assuming no two people have the same name (a bad assumption, but we’ll get to this shortly), the identity (or ID) of an object could be derived from the combination of the first name and the last name. That would be an example of what is known in database circles as a natural key.

The advantage of a natural key is that you don’t need to store any extra data on the object to be able to refer to it later. It also means your data is easier to de-duplicate.

Natural Key Disadvantages

There are two disadvantages to using natural keys: changes in the natural key and the need for uniqueness.

Key Changes

Say your natural key is somebody’s email address. Email addresses change, and you need to be able to deal with this change.

For example, if you update someone’s email address, but another object was using that as the natural key to refer to your object, that second object will need to update its reference.

While this type of change might be infrequent, other natural keys can change more frequently.

This is opposed to an opaque key, or surrogate key, that is a number (like an account number, a social security number, a phone number) or string of random characters (e.g. a UUID). Every time you’ve seen an auto incrementing ID column in an SQL database, that is a surrogate key.

Uniqueness

In our address book example, when your natural key is derived from last name and the first name, we may hit a problem. What if you know two people named Jane Smith? Our natural key is not unique, so if we try to look up “Jane Smith”, we can’t be sure which object to return.

Surrogate Key Advantages

Surrogate keys have the advantage of being agnostic to any changes in our objects’ data.

The disadvantage is that they are opaque, so there’s no natural relationship between an object’s ID and its data. The ID 43135 doesn’t tell you a whole lot about a user record for Jane Smith.

While these IDs are usually only used by computers, sometimes they leak out into the real world. Sometimes we even expect people to remember them and manipulate them. Not ideal. In addition, they make debugging and logging harder, as devs are now forced to map IDs to objects’ natural data to see anything useful.

Moving Forwards

Since we are looking at how to make data available offline, and since that means storing data on an end-user device and a server, we have multiple copies of all our data. So we need to be sure that any operations can unambiguously reference that data.

So, when creating a new object that we give it a unique ID. And we can’t guarantee this with natural keys. A new Jane Smith could be added on the server and the user’s device while they can’t talk to each other because one is offline. That would spell all sorts of trouble for us.

Unless any of our data lends itself to be a natural key without the disadvantages, we use surrogate keys.

Specifically we use UUIDs, because they have the convenient property that even though you still can’t guarantee the uniqueness of natural data on disconnected devices, the chance of assigning the same UUID to different objects on two or more devices is so unlikely, we practically don’t even have to consider it a possibility.

With that sorted, we have a new problem. We have multiple sets of data, which may or may not diverge from each other in various ways. In my next post, I’ll show you how to figure out what’s changed, what needs syncing, and in what direction.

Discover more ways to work with CouchDB — the syncing database — on the Neighbourhoodie blog, where we regularly publish tips, guides and tutorials.