If you're tired of users finding errors before you do, a library called Scientist has some answers.
When we release new code, we test it with as much rigor as we can, but it's very hard to replicate the full range of data, workflows, and environments in which users run our software.
This makes early users, in effect, canary testers, like the canaries coal miners brought into the mines for early warning of hazardous gases.
The problem is, we never want a user to become a "dead canary" by encountering a bug we could have spared them from, if only something else had found it first.
Scientist
The Scientist library offers a solution. Using Scientist, the new code is deployed alongside the old, and Scientist runs an experiment in which it executes the legacy code as well as an experimental new version, then compares the two results.
Regardless of whether the results of these two methods match, the result of the legacy version is used and returned to the caller, meaning that the user is shielded from any issues caused by the new version.
This means that errors in the new routine can be found without users ever seeing them. If a user's data exposes an error or logic gap in the new version of the code, the user remains completely unaware of it and keeps using the software exactly as it worked prior to the update.
Instead, the results of the comparison are sent to a result publisher, which can log to a number of destinations and allows the development team to tweak the new routine before going live with the feature.
This allows us to rewrite or expand portions of our code, ship the new version alongside the old, and then compare the implementations against live production data. Once we've collected enough data to be satisfied with the new routine, we can issue a release that removes Scientist and the legacy routine from the equation.
Using Scientist
An example in C# using Scientist .NET is listed below:
var legacyAnalyzer = new LegacyAnalyzer();
var newAnalyzer = new RewrittenAnalyzer();

var result = Scientist.Science<CustomResult>("Experiment Name",
    experiment =>
    {
        experiment.Use(() => legacyAnalyzer.Analyze(resume, container));
        experiment.Try(() => newAnalyzer.Analyze(resume, container));
        experiment.Compare((x, y) => x.Score == y.Score);
    });
In this example, we call Scientist.Science<CustomResult>, declaring that we expect a result of type CustomResult back from the method invocation. From there, we give Scientist a name for the experiment (available to the result publisher) and tell it to Use a legacy implementation; this method's return value is always what gets returned to the caller. We can also declare one or more candidate implementations to compare against via the Try method. Finally, we can define a custom means to Compare the two results for equality.
Note that Scientist .NET will, by design, run the control and candidate routines in a random order.
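A single experiment can also pit several candidates against the control. The sketch below illustrates this; it assumes the named Try overload so the publisher can tell the candidates apart, and otherAnalyzer is a hypothetical second rewrite:

var result = Scientist.Science<CustomResult>("Multi-Candidate Experiment",
    experiment =>
    {
        // The control's return value is always what the caller receives.
        experiment.Use(() => legacyAnalyzer.Analyze(resume, container));

        // Two competing rewrites, each compared against the control.
        experiment.Try("rewrite-v1", () => newAnalyzer.Analyze(resume, container));
        experiment.Try("rewrite-v2", () => otherAnalyzer.Analyze(resume, container));

        experiment.Compare((x, y) => x.Score == y.Score);
    });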
Testing with Scientist
Scientist can also be used inside unit tests to compare a legacy and a refactored way of doing things. In such cases, you wouldn't want the refactored version to exhibit different behavior, so you could rely on a custom result publisher to fail the unit test whenever a mismatch occurs.
Result Publishing
After each experiment runs, its results are sent to the configured result publisher. A custom result publisher is defined below for reference:
using System.Text;            // StringBuilder
using System.Threading.Tasks; // Task
using GitHub;                 // IResultPublisher, Result<T, TClean>
using Shouldly;               // ShouldBeFalse assertion

public class ThrowIfMismatchedPublisher : IResultPublisher
{
    public Task Publish<T, TClean>(Result<T, TClean> result)
    {
        if (result.Mismatched)
        {
            // Build a message describing every observation that disagreed
            // with the control, then fail via a Shouldly assertion.
            var sb = new StringBuilder();
            sb.AppendLine($"{result.ExperimentName} had mismatched results: ");
            foreach (var observation in result.MismatchedObservations)
            {
                sb.AppendLine($"Value: {observation.Value}");
            }
            result.Mismatched.ShouldBeFalse(sb.ToString());
        }
        return Task.CompletedTask;
    }
}
This will allow Scientist to run in such a way that mismatches throw exceptions, which is useful only in unit testing scenarios (for production scenarios you'd want to log to a log file, an error reporting service, a database, etc.).
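For production, a publisher along these lines might be more appropriate. This is a minimal sketch; the console output stands in for whatever real destination you choose, and it reuses the usings shown above (minus Shouldly) plus System:

// Records mismatches instead of throwing, so production traffic
// is never interrupted by a failed experiment.
public class LoggingResultPublisher : IResultPublisher
{
    public Task Publish<T, TClean>(Result<T, TClean> result)
    {
        if (result.Mismatched)
        {
            foreach (var observation in result.MismatchedObservations)
            {
                // Replace Console with your log file, error service, or database.
                System.Console.WriteLine(
                    $"{result.ExperimentName} mismatch - Value: {observation.Value}");
            }
        }
        return Task.CompletedTask;
    }
}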
Whichever publisher you choose can then be provided to Scientist by setting:
Scientist.ResultPublisher = new ThrowIfMismatchedPublisher();
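Putting the pieces together in a test might look like the following sketch. The xUnit [Fact] attribute is an assumption, and resume and container mirror the inputs from the earlier example:

[Fact]
public void RewrittenAnalyzerMatchesLegacyAnalyzer()
{
    // Any mismatch will throw via the publisher, failing this test.
    Scientist.ResultPublisher = new ThrowIfMismatchedPublisher();

    var legacyAnalyzer = new LegacyAnalyzer();
    var newAnalyzer = new RewrittenAnalyzer();

    Scientist.Science<CustomResult>("Analyzer Comparison",
        experiment =>
        {
            experiment.Use(() => legacyAnalyzer.Analyze(resume, container));
            experiment.Try(() => newAnalyzer.Analyze(resume, container));
            experiment.Compare((x, y) => x.Score == y.Score);
        });
}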
Summary
Scientist has a lot of value for cases where you want to replace an application piece by piece, try a performance improvement, or make other forms of incremental change. Scientist is not, however, suited for scenarios where the code produces an external side effect such as modifying a database, sending an E-Mail, or making a state-changing external API call, since both the legacy routine and the new routine will run.
The library is available in a wide variety of languages, from Ruby to .NET to JavaScript to R to Kotlin and others. Additionally, the core concepts of the library are simple and can be implemented in any language.
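To make that last point concrete, here is a minimal, hypothetical sketch of the core pattern with no library at all: run both implementations, compare, report, and always return the control's result.

// Hand-rolled experiment: the candidate can never affect the caller.
public static T Experiment<T>(
    string name,
    Func<T> control,
    Func<T> candidate,
    Func<T, T, bool> compare,
    Action<string> publish)
{
    var controlResult = control();
    try
    {
        var candidateResult = candidate();
        if (!compare(controlResult, candidateResult))
        {
            publish($"{name}: mismatch ({controlResult} vs {candidateResult})");
        }
    }
    catch (Exception ex)
    {
        // Even a candidate crash is merely reported.
        publish($"{name}: candidate threw {ex.Message}");
    }
    return controlResult;
}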
Ultimately, Scientist is a very helpful library for comparing old and new versions of your codebase and for giving your users a buffer between bleeding-edge features and unwittingly serving as a dead canary.
Top comments (3)
I like this concept and pattern! Testing with production data without risking user experience.
I wonder if there is a way to extend this to functions that have external impact. I guess that would depend heavily on the application, so it might be tricky to build a generic solution.
Thanks for bringing this to my attention. :)
Absolutely! I've focused strongly on quality for myself and my team this year and discovering Scientist has been perhaps the largest impact for our team, competing only with snapshot testing using Jest and Snapper.
I've applied to give a presentation on this subject in mid January at a large conference and I'm hopeful I'll be able to share it with more people, because it really is an interesting way of thinking.
You're spot on about the external impact aspects of this. While Scientist is best suited for pure functions or things without an external impact, if you follow it to the extreme you start looking at using mock objects in production for your experiments so that they use a FakeEmailSender, for example.
Either that, or Scientist forces you to structure your application so that it has a lot of pure functions that can be tested. In the E-Mail example, you could build an E-Mail object in one method and send it in another, then use Scientist to test the building of that object.
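A hypothetical sketch of that split (EmailMessage, the builders, emailSender, and order are all illustrative names):

// Building the message is pure, so Scientist can safely run it twice.
var email = Scientist.Science<EmailMessage>("Email Builder Rewrite",
    experiment =>
    {
        experiment.Use(() => legacyBuilder.BuildEmail(order));
        experiment.Try(() => newBuilder.BuildEmail(order));
        experiment.Compare((x, y) => x.Subject == y.Subject && x.Body == y.Body);
    });

// The side effect happens exactly once, outside the experiment.
emailSender.Send(email);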
I suppose that expanding Scientist's scope forces you to follow the Single Responsibility Principle as well as encouraging Dependency Inversion.
I may write more on advanced Scientist use cases in the future.
Great! Looking forward to hearing more from you on the topic. Would be interesting to hear about Jest and Snapper also.