DEV Community: Duncan Mackenzie

Handling duplicate events from Stripe in your webhook endpoint

Duncan Mackenzie — Sun, 09 Jun 2024 22:13:27 +0000

In my recent post, detailing how I handle order fulfillment for my Stripe integration, I missed an important part of reacting to Stripe webhooks. The documentation explains that your endpoint may be called multiple times for a single event, but I don’t handle that in my code.

It also points out the ordering of events is not guaranteed, but that doesn’t matter in my case since I’m only handling one event type.

The issue with duplicate events

Every time I receive an event, my original implementation would push a new message into a queue and the order fulfillment would continue. If I receive a duplicate CheckoutSessionCompleted event, it isn’t a terrible problem; a customer might receive multiple emails from me with their photo. Each one would be a slightly different link though and, in addition to looking sloppy, it could cause them to worry they were charged multiple times. In many scenarios, this could be a larger problem; shipping two physical items, or signing them up for multiple digital purchases.

For all of those reasons, I’ve updated my code. Now, when I receive an event, I check a Cosmos DB container to see if I’ve handled it before. The lookup is keyed on the checkout session ID, which is the same across duplicate deliveries.

(from Functions.cs)

OrderData data = await CreateOrderData();

bool exists =
    await data.checkForExisting("checkoutComplete",
        order.SessionID);

if (!exists)
{
    await queueClient.SendMessageAsync(message);
}
else
{
    log.LogInformation(
        $"Duplicate Event: {order.EventID}");
}

await data.insertLogItem("checkoutComplete",
    order.SessionID, order.EventID, exists);

If I have, I skip pushing a message into the incoming order queue and return a 200 OK result. If I haven’t seen it before, I go ahead and push the message in. Either way, I then log (insertLogItem) that I processed this event, flagging duplicates, just in case I’m interested in the future.
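The flow above can be sketched in a few lines. This is a minimal TypeScript illustration rather than the C# from Functions.cs: the logItems and orderQueue arrays are in-memory stand-ins for the Cosmos DB container and the incoming order queue, and handleCheckoutCompleted is a name made up for this example.

```typescript
// Illustrative stand-ins only: arrays instead of the Cosmos DB container
// and the incoming order queue.
interface LogItem {
    functionName: string;
    checkoutSessionID: string;
    eventID: string;
    duplicate: boolean;
}

const logItems: LogItem[] = [];
const orderQueue: string[] = [];

// True when this function has already logged the given checkout session.
function checkForExisting(functionName: string, checkoutSessionID: string): boolean {
    return logItems.some(
        (item) =>
            item.functionName === functionName &&
            item.checkoutSessionID === checkoutSessionID
    );
}

// Mirrors the handler's flow: check, queue only if new, always log.
function handleCheckoutCompleted(sessionID: string, eventID: string): boolean {
    const exists = checkForExisting("checkoutComplete", sessionID);
    if (!exists) {
        orderQueue.push(sessionID);
    }
    logItems.push({
        functionName: "checkoutComplete",
        checkoutSessionID: sessionID,
        eventID,
        duplicate: exists,
    });
    return !exists; // true when a message was queued
}
```

Calling handleCheckoutCompleted twice with the same session ID queues one message and writes two log entries, the second flagged as a duplicate.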

This required adding a data storage class to my project, where I encapsulated all the initialization of Cosmos DB and handling both the check and insert steps.

(from OrderData.cs)

public async Task insertLogItem(string functionName,
    string sessionID, string eventID, bool duplicate)
{
    LogItem item = new LogItem()
    {
        function = functionName,
        checkoutSessionID = sessionID,
        eventID = eventID,
        duplicate = duplicate
    };

    await this.c_functionLog.CreateItemAsync<LogItem>(
        item, new PartitionKey(functionName));
}


public async Task<Boolean> checkForExisting(
    string functionName,
    string checkoutSessionID)
{
    string query = "SELECT c.id " +
        "FROM c WHERE c.checkoutSessionID =" +
        "@checkoutSessionID AND " +
        "c.function=@functionName";

    QueryDefinition q = new QueryDefinition(query)
        .WithParameter("@checkoutSessionID", checkoutSessionID)
        .WithParameter("@functionName", functionName);

    using (FeedIterator<LogItem> feedIterator
        = this.c_functionLog.GetItemQueryIterator<LogItem>(q))
    {
        if (feedIterator.HasMoreResults)
        {
            FeedResponse<LogItem> response
                = await feedIterator.ReadNextAsync();

            if (response != null && response.Count > 0)
            {
                return true;
            }
        }
    }
    return false;
}

This is not a perfect solution.

My Azure Function could be running on multiple threads (and/or servers), handling multiple requests at once, and therefore potentially posting duplicate messages.
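To make that race concrete, here is a small deterministic simulation in TypeScript (not the C# used in this project). It assumes two handlers, labelled A and B purely for illustration, that both run their check before either has written a log entry. The arrays are in-memory stand-ins for the log container and the queue.

```typescript
// Deterministic simulation of two handlers racing on the same session.
// The arrays are in-memory stand-ins, not real Cosmos DB or queue calls.
function simulateInterleavedHandlers(sessionID: string): string[] {
    const handled: string[] = [];
    const queued: string[] = [];

    // Step 1: both handlers ask "have I seen this?" before either
    // one has written its log entry.
    const handlerASawExisting = handled.indexOf(sessionID) !== -1;
    const handlerBSawExisting = handled.indexOf(sessionID) !== -1;

    // Step 2: each handler acts on its own, now stale, answer.
    if (!handlerASawExisting) {
        queued.push(`A:${sessionID}`);
        handled.push(sessionID);
    }
    if (!handlerBSawExisting) {
        queued.push(`B:${sessionID}`);
        handled.push(sessionID);
    }
    return queued;
}
```

Both handlers see "not handled yet", so the returned queue contains two messages for the same session, which is exactly the duplicate the check was meant to prevent.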

I could avoid this by using a semaphore/locking mechanism, so that checking for an existing log entry, pushing the incoming order message, and adding a new log entry all happened as an isolated transaction. Doing this would reduce my Function’s ability to handle a high load of requests though, and while I don’t expect that to matter in my specific case, it seems like a bad pattern that someone may copy.

Instead, I’m going to go with a ‘belt and suspenders’ model (multiple preventative methods to avoid an issue), by adding another similar check to the second and third functions.

I could also force these functions to process messages one at a time through some configuration options. This would reduce their scalability, as discussed for the first function, but that is less of an issue for these stages in the order fulfillment.

Having written the check and insert as discrete functions makes it easy to repeat the pattern in each of the two other methods.

OrderData data = await CreateOrderData();

bool exists = await data.checkForExisting(
    "processOrder", order.SessionID);

if (exists)
{
    log.LogInformation(
        $"Duplicate Event: {order.SessionID}");
    return;
}

//do all the work

await data.insertLogItem(
    "processOrder", order.SessionID,
    order.EventID, exists);

Note that I had planned this in my head from the start of making these changes, which was why I used the function name as the partition key, and then the appropriate unique identifier as the id.

After adding these checks to the other functions, I will remove them from the initial HTTP handler. My goal with that webhook endpoint is to return as fast as possible; these data calls are quick, but they still add time. Adding multiple entries to the incoming Order queue doesn’t have any negative impact, as long as the processOrder function knows to skip duplicates.

Wrapping up

These modifications have the positive side effect of giving me a nice, easy to read, log file of all the function executions.

As you build out your own webhook handlers, refer back to the Stripe webhook documentation for notes on this and other implementation details to be aware of.

Visual Regression Testing using Playwright and GitHub Actions

Duncan Mackenzie — Wed, 22 May 2024 15:14:18 +0000

This last weekend I was visiting my mom up in Canada, and she asked me for the URL to my website because she wanted to see all my recent photos. She brought it up on her laptop, and for the first time in months, I saw my site in ‘light’ mode. And there was a flaw in how some of the text was rendered. I, like many developers, have been only doing the simplest form of testing, which is to view my site in my own primary browser on my own machine. The site switches automatically to light or dark based on the settings of the user’s OS, and in my case that is always dark mode. I quickly checked in a fix, after switching my own OS settings to light mode and then going through and testing a few of the pages. Later, I did more extensive testing in light mode and found some color contrast issues as well.

Hmph. How embarrassing.

I realized I needed a better solution, to catch unintended bugs introduced as I update the styles and layout of my site. I could just promise myself that I would test all my future changes in multiple browsers, in both color schemes, and on a few mobile devices as well, but none of that is easy or even likely to happen.

What I needed was Visual Regression Testing

This form of testing is an automated way to test the rendered appearance of your site (or application) in a variety of situations, against a baseline set of images. After each change you make, new images are generated and if there is a difference between the new images and the baseline, these tests will fail. If you run visual regression testing as part of a CI (continuous integration) process, a failure would block deployment into production. If it turns out the change is intentional (if I changed my site’s CSS for example), then I can generate new baseline images and the next run of the tests will succeed.
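The core comparison can be sketched as follows. This is a simplified TypeScript illustration, assuming both images are raw RGBA buffers of the same size and that any change in a pixel counts as a difference; real tools use more forgiving comparisons, and the function names here are made up for the example.

```typescript
// Fraction of pixels that differ between two same-sized RGBA images.
// Each pixel occupies 4 consecutive bytes (red, green, blue, alpha).
function diffRatio(baseline: Uint8Array, current: Uint8Array): number {
    if (baseline.length !== current.length || baseline.length % 4 !== 0) {
        throw new Error("Images must be RGBA buffers of the same size");
    }
    const pixelCount = baseline.length / 4;
    if (pixelCount === 0) {
        return 0;
    }
    let differing = 0;
    for (let i = 0; i < baseline.length; i += 4) {
        const same =
            baseline[i] === current[i] &&
            baseline[i + 1] === current[i + 1] &&
            baseline[i + 2] === current[i + 2] &&
            baseline[i + 3] === current[i + 3];
        if (!same) {
            differing++;
        }
    }
    return differing / pixelCount;
}

// A test passes when the differing fraction is within the allowed ratio.
function matchesBaseline(
    baseline: Uint8Array,
    current: Uint8Array,
    maxDiffRatio = 0
): boolean {
    return diffRatio(baseline, current) <= maxDiffRatio;
}
```

With the default maxDiffRatio of 0, a single changed pixel fails the comparison. Raising the ratio tolerates small changes, at the cost of also hiding small regressions.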

I had recently been looking at a cool project that generated screenshots of someone’s personal website every day, so I knew that Playwright, in a GitHub action, could at least get me part of the way to a solution. A few minutes later, I found the docs and exactly the info I needed. The installation steps for Playwright even included creating a basic GitHub action that would run on every push or pull request. It was all so easy, that I had screenshots being generated within 30 minutes of starting to investigate the idea.

Building a solution using Playwright and GitHub actions

Fine tuning everything to get exactly the results I wanted took many test runs and check ins, but I’ll go through each step here and maybe it will save you some time in the future.

First, I wanted to test dark and light mode, as that was the exact issue I ran into, so in the Playwright config file (playwright.config.ts), I defined a copy of each browser with the addition of colorScheme: 'dark'.

  /* Configure projects for major browsers */
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      name: 'chromium dark',
      use: { ...devices['Desktop Chrome'], colorScheme: 'dark' },
    }
  ],

I then made about 10 tests that covered all sorts of page types, each of which was a simple ‘visit this page and take a screenshot’.

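import { test, expect, type Page } from '@playwright/test';

test('Visual Diff Small Album', async ({ page }) => {
    await visualDiff(page, '/albums/fall-trail-walk/');
});

// Visit the page and compare a full-page screenshot against the baseline
async function visualDiff(page: Page, url: string) {
    await page.goto(url);
    await expect(page).toHaveScreenshot({ fullPage: true });
}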

To get these to work, I added a config section to start up Hugo’s built in web server and set up the default base URL to match.

  /* Shared settings for all the projects below.
     See https://playwright.dev/docs/api/class-testoptions. */
  use: {
    /* Base URL to use in actions like `await page.goto('/')`. */
    baseURL: 'http://localhost:1313',
  },
  /* Run your local dev server before starting the tests */
  webServer: {
    command: 'npm run start',
    url: 'http://localhost:1313',
    reuseExistingServer: !process.env.CI,
  },

I also set up my list of browsers with chromium, firefox, webkit, mobile chrome, and mobile safari (plus dark mode variations of them all). Running the tests locally (npx playwright test) generated baseline images (stored in a <test file name>-snapshots folder next to the test), and every run from that point on compared the new results against the baseline (and failed if they were different).

That was it. I honestly thought I was done at this point, so I did a commit and pushed the results up to GitHub.

Matching the OS of your test runner with your local dev environment

The new GitHub action ran automatically, even on the push that added it, and it failed. All of the tests ran on the server and failed with the error that no baseline images existed. The issue was that the GitHub action definition (yml file) specified that the tests should run on an Ubuntu Linux runner (the cloud virtual machine used to run the tests), so Playwright was looking for files like <test name>-1-<browser>-linux.png. I had generated the baselines by running the tests on my Windows machine, so the baseline images had filenames like <test name>-1-<browser>-win32.png. You could fix this by creating unique filenames per test and passing them into toHaveScreenshot, so that they would match regardless of the OS used, or you could do what I did and change the OS of the GitHub action to windows-latest, so the CI tests run on a machine with the same OS as my local machine.

jobs:
  test:
    timeout-minutes: 60
    runs-on: windows-latest
    steps:
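The filename mismatch described above can be modelled in a few lines of TypeScript. The helper names and the test name slug are hypothetical; the code only mimics the <test name>-1-<browser>-<platform>.png pattern from the error and is not Playwright's own implementation.

```typescript
// Hypothetical helper that mimics the filename pattern described above:
// <test name>-<index>-<browser>-<platform>.png
function snapshotFileName(
    testName: string,
    index: number,
    browser: string,
    platform: string
): string {
    return `${testName}-${index}-${browser}-${platform}.png`;
}

// True when a committed baseline exists for the platform the runner uses.
function hasBaseline(
    committedFiles: string[],
    testName: string,
    browser: string,
    runnerPlatform: string
): boolean {
    const wanted = snapshotFileName(testName, 1, browser, runnerPlatform);
    return committedFiles.indexOf(wanted) !== -1;
}
```

Baselines committed from a Windows machine carry the win32 suffix, so a lookup for the linux variant finds nothing, while a lookup for win32 succeeds.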

I pushed that change up, and the newly updated action ran automatically.

And failed again, this time with an actual visual difference instead of just an error message. Each failed test run produces a zip file version of Playwright’s HTML based report, so I could download it and see the issue.

Hiding parts of your page that change all the time

All my pages have a little bit of info below the content: the most recent GitHub commit that led to this specific build of the site. This string, the commit ID, is going to change all the time, so that’s a problem. You can set the % of difference that is acceptable between screenshots, so I could fix this by setting it to be ok with a small % difference, but that would also mean that a small difference in some other part of the site layout would be ignored. Another option, which is provided by Playwright, is to supply a special bit of CSS to hide volatile elements when taking screenshots.

I made this CSS file, which hides the GitHub commit string, and modified my tests to include it.

.gitinfo {
    display: none;
}

async function visualDiff(page: Page, url: string) {
    await page.goto(url);
    await expect(page).toHaveScreenshot({
        fullPage: true, stylePath: "tests/screenshot.css" });
}

Once again, with everything working fine locally, I pushed my changes up and waited confidently for the tests to run. As a side note, each run of these tests, with all the browsers was taking a long time, over 10 minutes. I decided to look at improving that once I had everything working.

Ensuring your time zones match

Well, you can imagine, the tests failed. Not on the about page, but on one of my blog posts. Turns out the image taken on the server was showing June 4th, while my baseline was June 5th. My blog posts have a UTC published date & time, but display it using the browser’s local settings. Depending on the timezone of the machine they are viewed on, you could see a different day.
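Here is a small TypeScript sketch of that effect, using the standard Intl.DateTimeFormat API. The timestamp is a made-up example rather than one of my actual posts, and displayDate is a name invented for this illustration.

```typescript
// Format a UTC timestamp as a calendar date in a given IANA time zone.
function displayDate(utcIso: string, timeZone: string): string {
    return new Intl.DateTimeFormat("en-US", {
        timeZone,
        month: "long",
        day: "numeric",
    }).format(new Date(utcIso));
}

const publishedUtc = "2024-01-15T02:30:00Z"; // hypothetical publish time
const shownInLosAngeles = displayDate(publishedUtc, "America/Los_Angeles"); // "January 14"
const shownInUtc = displayDate(publishedUtc, "UTC"); // "January 15"
```

The same instant lands on different calendar days, so two machines in different time zones render different text and the screenshots no longer match.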

Turns out there is a fix for that too: I can set the timezone in the Playwright config file.

  /* Shared settings for all the projects below.
     See https://playwright.dev/docs/api/class-testoptions. */
  use: {
    /* Base URL to use in actions like `await page.goto('/')`. */
    baseURL: 'http://localhost:1313',
    timezoneId: "America/Los_Angeles",
    trace: 'on-first-retry',
  },

By setting it to my timezone, Pacific time, the server should match my locally generated baselines. I committed this change and pushed it up. Still oddly optimistic this was going to be successful.

Handling lazy loaded images

It wasn’t. The git commit was hidden, the dates on the page matched, but it turns out the images in my blog posts were rendering inconsistently. There were two issues causing this, one was that loading images can take a moment, so the test needed to wait longer, and the other was that my images are set to loading=lazy, which means without anyone scrolling them into view, they wouldn’t even try to load.

A few Google searches found some similar issues, and I was able to add a quick ‘find all images and scroll to each one’ bit of code. I could also explicitly write code to wait for the images to be ready, but that didn’t seem to be necessary, so I went with just this.

async function visualDiff(page: Page, url: string) {
    await page.goto(url);
    // Trigger loading of all images
    for (const img of await page.locator('//img').all()) {
        await img.scrollIntoViewIfNeeded();
    }
    await expect(page).toHaveScreenshot({
        fullPage: true, timeout: 10000,
        stylePath: "tests/screenshot.css" });
}

Surprisingly, that was the final issue. Everything worked exactly as planned with this last change. The tests were still taking a long time though, and I knew that would be annoying when I wanted to do a content update.

It works, but it could be better

With everything working I started to look at a few issues that occurred to me while I was running these tests over and over again.

Making a minimal set of tests for your CI build

First, the set of tests was taking way too long. This is definitely a judgement call, more tests mean more coverage in terms of catching issues, but when I update content I often want to check my site until I can see the deployment happened without any issues. Right now, the wait is already about 4-5 minutes for the build and deploy, adding 10+ more minutes was not appealing.

To reduce the time, I created a smaller set of tests (minimum.spec.ts) and modified the GitHub action to run just those tests and just against two browsers (chromium and chromium dark). This reduces the time to just a few minutes, but I’m testing fewer pages on fewer browsers.

- name: Run Playwright tests
  run: npx playwright test minimum.spec.ts
      --project 'chromium' --project 'chromium dark'

Don’t bother running these tests if only content has changed

It occurred to me that, for a pure content update (a new blog post or new photo album), there was no need to run these tests at all. They are currently designed to hit pages that don’t change as I publish content and are testing layout/style issues, so if I’m just adding a new post, this is just a waste of time. Luckily, GitHub Actions has a nice configuration option, paths-ignore, that lets me specify a pattern (content/** in my case) that shouldn’t trigger the action. If I make a change to anything outside of that pattern, the tests will still run, even if content is updated at the same time, so it is all good.

on:
  push:
    branches: [ main, master ]
    paths-ignore:
      - 'content/**'
  pull_request:
    branches: [ main, master ]
    paths-ignore:
      - 'content/**'
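The rule this configuration relies on can be modelled like this in TypeScript. It is a simplified sketch that assumes plain directory prefixes instead of the glob patterns GitHub Actions actually supports, and shouldRunWorkflow is a hypothetical name.

```typescript
// Simplified model of a paths-ignore filter: the workflow is skipped only
// when every changed file falls under an ignored directory prefix.
// Real filters use glob patterns; plain prefixes are an assumption here.
function shouldRunWorkflow(changedFiles: string[], ignoredPrefixes: string[]): boolean {
    const isIgnored = (file: string): boolean =>
        ignoredPrefixes.some((prefix) => file.indexOf(prefix) === 0);
    // Any file outside the ignored prefixes is enough to trigger a run.
    return changedFiles.some((file) => !isIgnored(file));
}
```

A push that only touches files under content/ is skipped, while a push that mixes a new post with a layout change still runs the tests.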

Ok, so now the length of the test is less of an issue when I’m updating content. This has me wondering if I should go back to a broader set of tests on more browsers. Something to look at in the near future.

Testing interactions

My site is boring, not much JavaScript and few clickable elements beyond links, but I do have a feature on my photo gallery pages that shows or hides a ‘buy’ button on a click. The bug that started this whole journey was with one of those buy buttons, so I need to test them. Fortunately, Playwright is primarily designed for doing automated web testing, so clicking a button and waiting for the page to change is all built-in.

    await page.click("#gallery > div.availableForPurchase");
    await page.waitForSelector("body.showBuyButtons");
    await expect(page).toHaveScreenshot(
        screenshot, { fullPage: true, timeout: 50000,
        stylePath: "tests/screenshot.css"});

I have other features, like image lightboxes, that could be tested if I want to increase the coverage of my tests.

What’s next?

This is a really limited set of tests, only covering a few types of content, and not testing key features like my home page. Right now, I’m doing this limited amount of coverage because I’m hitting my live content. Pages like my Blog post list, my homepage, or tag pages would be changing all the time and would keep causing the tests to fail. The right way to handle this would be to set up a test configuration that rendered all the same pages, but with a static set of content. Then I could test every single layout or markdown element, with no worries that content updates would invalidate the results.

Instead of doing nothing until I could do things the ‘right way’ though, the current set of tests is a great start. It was completely free, it took about 3 hours of work start-to-finish, and I’m slightly less likely to unintentionally break my website. I’d call that a win.

A Defensive Approach to Engineering Quality

Duncan Mackenzie — Sat, 11 May 2024 17:00:00 +0000

As an engineering manager, I have a certain amount of control over what ends up on our sites and in our systems. Not complete control, my voice is only one in the room, as it should be. One of the most important ways I can contribute to the quality of our sites is by saying no. Blocking or pushing back on requests that are contrary to our goals can be as important as what we decide to do.

The goal is not to prevent change. Change is great, but randomness ruins plans. If you build a hundred features and ten of them don’t fit into the product vision or the technical architecture, those ten will have a disproportionate impact on the team over the life of the product.

The pressure to do one-off features or changes is continual, but it is also quite normal. The stakeholder asking for this change isn’t a bad person, they just have a different perspective. While they may be able to understand how much effort is involved in their request, they are unlikely to understand the long-term costs in terms of maintainability. Often it is difficult for the engineering team to articulate this cost. For a single change it isn't significant, but all these changes add up over time.

Death by a thousand cuts

One concrete example that has been a constant in my projects is teams asking for us to add third party scripts. There are a few reasons why this happens, but in my experience, the most common is a request to add a set of tracking code to a number of pages. Someone in the company is running an advertising campaign, and your site is the destination. Everything is nearly done and it’s days away from going live on Facebook and Twitter. No one looped engineering in before now, of course. An email comes in, often having bounced around, in search of the person who runs the site:

Hey Duncan, I hear you are the guy to ask about getting something onto the site. We are about to kick off a huge ad campaign across fifteen countries to drive folks to the new Azure solutions content. It’s using ads on Facebook and Twitter, so we need to add tracking pixels for both of those to the attached list of ten pages. I have included the zip files of javascript for both sites, if you could get it up on the site by tomorrow that would be great!

Thanks, Sam

Couple of things to unpack here. Sam hasn't done anything wrong, there's no evil intent. They didn't know to loop engineering in, and they don't know if adding two scripts to a set of pages is hard or at all controversial. They have a job to do and this is just one step.

The simple solution

Why not just do it? The business wants it, and our site exists to serve the business, so shouldn't we just make a ticket and get it done?

Let’s follow that path for a bit to see where it gets us (hint, simple changes generally aren't that simple).

To close the work item and hit Sam's deadline, we open the page, follow the instructions from each third-party and put all the script references in. Test it, seems to work, no errors, we see network calls to a bunch of places, done. Ship it, close the ticket.

Hard to push back against what is probably an hour's work, right?

Most big sites aren't a bunch of individual pages though, so you'd probably be editing a common template. To add these scripts only to the requested pages, you'll need a little IF statement (if url == ___ or url == ___ ...). Now we are running a bit of logic on all our pages. It's tiny though, still hard to really raise a red flag. Your developer “Spidey-sense” is starting to tingle, but the work is done, Sam is happy.

A few weeks later, another email comes in:

Hey Duncan, thanks so much for helping out with that last request, the campaign is going great. We are trying out something new now, we’ve picked a mix of pages (some from the last campaign and some new ones) and we are adding LinkedIn to the mix. Super excited to see how that compares to the other campaign that is running. Anyway, the new ads start tomorrow and there is a lot of $$ being spent here, so if you could get these new scripts up that would be great. Scripts and a list of pages attached like last time. Thanks so much! Owe you a coffee :)

Sam

A new work item is created, maybe ends up with a new developer. If they saw the code from the last request, they might just extend it with some more if statements, to be consistent. Or maybe they’ll go a different way, it’s hard to predict, especially with a busy team.

One possible path is that these requests keep trickling in over the years, various bits of code are added, and we end up with a block of terrible code running on all our pages. Regardless of how many times this happens though, even doing it once can have unintended consequences.

Months after we add some third-party scripts, we end up looking at this page and we see an issue.

Choose your adventure time, the page is...

Performing poorly. The third-party script must work in any situation, so it loads its own copy of jQuery and two other scripts you already have on the page.
Failing. Turns out the third-party script was updated and now has a conflict with something else on the page.
Dropping cookies. Most ad tracking scripts are going to drop cookies and the exact behavior could change over time. You carefully thought through privacy and GDPR for your scripts. Did you re-evaluate for every update to some ‘random ad platform’ script?

Every case described above gets worse over time. Two years from now, you do an audit of all the cookies you drop. Will you catch every one-off page that does its own special thing? If the Ad campaign ends after two months, do you remember to remove this code, or does it live on forever?

A better way to handle these requests

Third party script requests are just one example and I am sure you have your own set of examples. I've seen similar requests for one-off redirects, custom variations on styles or headers, etc. The common element is that it is a special-case request, seemingly one-off, and disconnected from any larger vision or roadmap. We know it’s a bad idea to hack this in, despite pressure or temptation to just do it. It is hard to quantify the negatives though, which puts the developer in a difficult spot.

I could tell you to ‘just say no’ to anything like this, and sometimes I do that myself, but that's not always reasonable or even appropriate. First, we should try to understand if this is a genuine business need. Are these ad campaigns important? Do we need to track their results?

Next, we take a step back from the specifics of the request and turn it into the actual business need. Sam asked us to add these scripts, but what they really want is to report on the effectiveness of their ad campaign. The script might be the right approach, but it’s always good to turn any request that has jumped ahead to implementation back into a description of the real goal. Can we accomplish it within the existing capabilities of our platform? We have analytics on the site, could we add distinct tracking ids as query strings to the ads and report on them that way?

In the end, if this is needed and there isn’t any existing way to handle it, then we are back to the original request. At this point, we still have options that reduce the negative aspects of adding these scripts. Talk to Sam, or the set of Sams who work on ad campaigns. Are we planning to do a lot of these? Generally going all-in on the same three platforms?

If so, dig into the docs for each platform, so you can determine the best way to implement their scripts. Are the scripts the same for every campaign? Add an option in your site admin to turn “Advertising Scripts” on/off for a set of pages. If that type of self-serve config is too much work, test adding the scripts to every page. I know that sounds bad, but it might be a better solution than some sort of complicated configuration to control where they appear.

Now that we've decided to do this right, it’s a platform capability not a one-off hack. You can test it in depth, include it in your performance and privacy plans, and take it into account when you are making changes.
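As a rough illustration of what that capability might look like, here is a TypeScript sketch of a config-driven lookup that replaces hard-coded URL checks in a template. Everything in it is hypothetical: the AdCampaign shape, the adScriptsForPage function, and the idea of storing an expiry date with each campaign so that the scripts stop loading when it ends.

```typescript
// Hypothetical shape for a managed "Advertising Scripts" setting.
interface AdCampaign {
    campaignName: string;
    scriptUrls: string[];
    pages: string[];   // page paths that should load the scripts
    expires: string;   // ISO date after which the scripts stop loading
}

// Returns the scripts a page should load at the given time,
// skipping expired campaigns and never listing a script twice.
function adScriptsForPage(path: string, campaigns: AdCampaign[], now: Date): string[] {
    const urls: string[] = [];
    for (const campaign of campaigns) {
        const active = now.getTime() <= new Date(campaign.expires).getTime();
        if (active && campaign.pages.indexOf(path) !== -1) {
            for (const url of campaign.scriptUrls) {
                if (urls.indexOf(url) === -1) {
                    urls.push(url);
                }
            }
        }
    }
    return urls;
}
```

Because the pages and end date live in configuration, an audit can list every page that loads an advertising script, and a finished campaign drops off without anyone having to remember to delete code.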

Handling these requests properly is a lot more work than a hack, in the short term, but it is the only sustainable option. We need to deliver on business requests, but remember that a stable, maintainable, and quality site is also a business priority. Sacrificing the long-term health of your system for a never-ending set of day-to-day requests is not a workable solution.

I understand this can be a hard sell.

You will be asked to “just take a short-cut this one time”. Sometimes you won't have the support needed to win the battle and you will have to do it. Try to minimize the impact of this type of work by:

Setting a time to revisit the implementation (to remove or improve it).
Categorizing and documenting "one-off" requests. If you can point at an ongoing pattern, it’s easier to justify doing a proper solution.
Writing it up when a one-off solution results in an issue. It seems wrong to say, "I told you so", but it’s more support for your push back on these requests. Provide some clear data though; don't expect anyone else to remember or to make the connection to the original request.

Policies as an appeal to an external authority

Finally, the best advice I can give you is to create written guidelines/policies. Even if you wrote it, being able to say "we don't add third-party scripts to our pages outside of a defined feature" is a wonderful way to steer the conversation in the right direction.

My personal approach to this specific issue is to say, “We don’t do that”, suggest how they can use our existing capabilities to accomplish at least some of their goals, and if it turns out to be a high enough priority then we move to planning out the proper ‘real’ way to provide a sustainable solution.

If urgency, as in Sam's original request with a day’s notice, forces a short-term solution, we pro-actively plan to revisit/remove it at a specific date. I don't always get what I want of course, and some things slip through without my knowledge, but the fewer of these that happen the better off my team and system will be.

The Benefits of Peer Feedback

Duncan Mackenzie — Mon, 26 Feb 2024 00:03:11 +0000

It was just performance review season at my current job, so I spent a lot of time reading, writing, and talking about reviews. Around this time, someone asked me if (as a manager) I found peer feedback useful. While my answer is 100% yes, I have some thoughts on why that is, what makes feedback particularly useful, how I like to use it as a manager, and ways in which it can be less helpful.

First, what are we talking about? In my current role, and at my previous one at Microsoft, we go through an annual performance review. As part of that process, the employee writes up a self-review, the manager reads it and writes up their review of the person. With that input, discussions happen across the whole org to come to a consistent view of employee performance and assign the appropriate ratings. Peer feedback, or just any feedback outside of your manager’s, is a core part of the process; employees can request feedback, or another employee can submit feedback unprompted. Either way, that feedback is visible to the manager and to the employee (this might not be the case at every company, feedback might be only visible to the manager for example).

Even if you don’t have a system for this at your company, feedback flows in through less formal channels. After a project, you decide to send a message to someone’s manager to let them know how much you appreciated that employee’s work, or to comment on some issues you ran into with them.

How does this help your manager?

When writing up a review for an employee, I am attempting to explain why they deserve a specific performance rating. I have my opinion, but I need to back it up with some facts.

I might say that Susan is a deep expert in a specific area, and then include several examples of projects that illustrate that expertise. Feedback from another person is a wonderful way to bolster my own comments, adding in a different perspective. It may surprise you, depending on your relationship with your manager, but in the review process they are often seen as being your advocate. You are part of their team; they have a closer relationship with you than the other people in the discussion. This is great, but it also means their comments about you are treated with a bit of skepticism. Having a comment from another employee helps validate the point I’m trying to make.

Make your feedback as specific as possible

This gets to the first piece of specific advice I have for feedback writers; provide specific examples in your feedback, backing up your comments. If you write a comment to me, saying Susan was really helpful, that’s good. If you provide details describing how she was helpful, that’s amazing.

Consider these two pieces of feedback:

“Susan has been a great team member for all of 2023. She worked with me on a number of projects, and I found her to be a wonderful partner! Excited to see her continue to do great on the team in 2024.”

vs.

“Susan and I worked together on multiple projects throughout 2023. Her deep knowledge of React was extremely helpful, as she both delivered her own pieces of work, and guided me through a bunch of really challenging problems in my code. She was also a continual positive influence on the project, keeping us focused on the user, escalating issues, and helping to get them resolved. In one particular case, we realized the design wasn’t going to work in a mobile view, and she worked with the designer to come up with a new plan that would solve the problems without a lot of rework. Without Susan, it would have been exceedingly difficult to finish this project on time.”

The first example could be helpful if I needed to back up a general assertion that Susan is a positive addition to the team, but it doesn’t do much more than that. The second, which is longer and would have taken a bit more effort to write, gives me direct feedback on why Susan is good to work with, and specific examples that I can bring into my review. It also phrases the positive comments around areas that are particularly important in reviewing an employee. Susan is an expert, she helps improve the quality of other employees’ work as well, and her involvement makes projects more likely to succeed. I don’t depend on quotes to tell the story, but I would add in a line or two from this piece of feedback after stating my opinion.

As the feedback writer, put the time in to give details, and try to tie them into what makes a good employee. There is no need to reference specific company principles or job ladder descriptions, just real examples that you think make this employee someone you’d like to see stick around and succeed.

Write feedback when you have the thought, not when it gets close to review time

Related to the above, specific examples are easier to come by when the work is fresh in your mind. If you are wrapping up a project with another employee and think to yourself “they really did great on that” … go write them some feedback. Just get some critical assistance on your own work? Send out feedback thanking that person and explaining what they did and how it helped. You won’t do it all the time, but the more often you remember to do this, the better for the employee and the company.

When you celebrate the completion of a project, include everyone who helped make it happen. Even if it isn’t direct feedback, being included in the announcement of a feature ship, or mentioned in the release notes of an update, is another concrete example that the employee and their manager can refer to in reviews.

Feedback from outside the team carries more weight

Just like your manager is seen as an advocate, your teammates are seen as a bit closer to friends than other employees in the company. Feedback from a teammate is valuable, especially as it helps show mentoring or support provided within the team, but feedback from some other team is more impactful. That person over on some other team has little connection with the employee, so their feedback comes across as both more impartial and as an example of a broader impact on the company. It’s the best!

Getting this kind of feedback can take some additional effort. For example, if you get some positive feedback on Slack from another team, such as “OMG, you completely saved the day here, thanks so much!”, first take a screenshot or copy it into your running ‘brag doc’, then reply privately with something like “No worries, I think it is really important for our team to help with … if you have a chance, I’d love to get some direct feedback through the system for future review cycles”. I know that may seem cringey and many people are reluctant to ask for it, but the person you are asking is an employee too, so they know how important this type of feedback can be. They won’t always do it; people get busy, and it can slip their mind. Unless it was a large project, I wouldn’t push or nag them too much (or maybe not at all), but I would hang on to that original bit of feedback to reference.

Set the example in the feedback you provide

For all these points, the best way to encourage it to happen is to start by doing it yourself. Write timely feedback for people who you see doing excellent work, especially outside of your team, and use specific examples to illustrate your points. I have received feedback after a project and had it remind me that I could do the same, perhaps not for that person specifically, but that I had some positive comments in my head about various coworkers that would be good to write down. Avoid doing this as a favor or feeling an obligation to do this for all your teammates. A good rule is, if you don’t have some specific examples of work or actions to reference, you shouldn’t write any feedback at all at this point.

What’s in it for the feedback writer?

I’ve explained how your manager can use these comments and what makes for helpful feedback, but writing this material takes effort and a bit of time. Why would someone bother to do that if they aren’t your friend? I can’t speak for everyone who writes feedback, but I can suggest some good motivations. If you see an employee doing impactful work, it is in your best interest for them to be rewarded for that. The company will benefit (which hopefully is something that helps you out as well), and they’ll stick around to keep doing that same work, even on projects with you! As individual employees across a company, we don’t have a lot of direct control over the broader set of people we have to work with, but this is one way to influence things. Highlight the type of work you personally find valuable, and it will happen more. Following that same logic, if you aren’t particularly impressed with someone’s work, don’t give them glowing feedback just to be nice.

What about negative feedback?

In the past, at Microsoft, feedback was only visible to your manager, and we used to receive a more balanced set of comments. Still mostly positive, because that was the main reason people decided to write feedback, but often with a few “areas for improvement” thrown in. That could be helpful, but in practice that feedback was available to the manager through 1:1s and other channels, and since it was somewhat confidential, you wouldn’t quote it in the employee’s review anyway.

Eventually the feedback system became fully open, where whatever someone wrote about you was visible to both the employee and their management chain. I have no data to support this, but it did seem to result in more positive feedback, and a reluctance to raise any negative feedback beyond vague “keep doing what you are doing” type of comments. I’m fine with this myself, because negative feedback between teammates is difficult, and having it formally submitted can add pressure that results in no comments at all. If you do see an issue, then mention it to their manager. If you are also providing positive feedback when appropriate, this should produce a balanced view. You can also just mention something to the employee, which can be challenging, but if done politely and privately could be helpful for them. Just try to give them the appropriate framing for your feedback: “You did great work on that presentation, but I felt you reacted a bit defensively and shut down some of the questions raised in the meeting” is more specific and will be received better than “you kind of messed up in the Q&A part”.

Summing it up

To make this a bit easier to remember and act on, I’ll wrap it up in some quick bullet points.

Feedback helps your manager to validate your excellent work.
Specific examples and details help more than just vague positive comments.
Feedback from others, especially outside of the team, is seen as more valuable than your own comments or your manager’s.
Write feedback when it is fresh in your mind, don’t wait until review time.
Encourage others to write useful, detailed feedback by doing so yourself.

Finish what you started

Duncan Mackenzie — Thu, 19 Oct 2023 14:51:45 +0000

Software development projects can only have an impact if they make it to production, so instead of having forty partially done bits of work, you should always prioritize having something actually done.

This may seem obvious, yet teams often don’t take the right actions to make it happen. If they really wanted to ship a feature, they’d focus on it, instead of having the team working individually on a variety of work. It can seem more efficient to have independent work streams, less co-ordination required, less chance of idle developers, but it can also result in a large set of projects that are all inching towards 90-95% complete, instead of getting something out the door.

Here are some distinct steps we can take to increase the chance of getting our features out into the hands of our users.

Break the work down into the smallest shippable unit

Priorities change, teams get moved around onto different work, so if we want to ship something we should divide it into the smallest useful set of functionalities we can. You could be building a search feature, and have plans to add filters, exportable results, fancy Boolean operators, but ship out that basic text search first. Then add one more bit of functionality, then another. This approach provides a whole host of benefits.

You provide value to your users as quickly as possible.
You start to get feedback and data on the feature faster, which could influence what you build next.
If the team must switch to something else, the feature is live, there’s no long running branch to get out of date with the rest of the code.
You get that nice little bit of positive reinforcement from seeing your work in production.

I had this exact policy on one of my teams. Everyone agreed it was the right approach, so we specified that each project would focus on shipping the smallest usable set of features first. Yet, often plans would come in that went way beyond this for the first release. After giving that feedback to folks a few times, I finally asked someone why they kept trying to pile more functionality into a given release instead of just considering that to be ‘part 2’. Turns out it was fear, fear that we’d ship the first feature, then never get back to this project.

That’s an understandable concern; in fact, the ability to switch the team to something else is a benefit of this process. By repeatedly demonstrating that you will go back to valuable features and continue to ship improvements, trust in the process can build and there will be less desire to slip more into each release. There is one possibility to consider, however: the first release of a feature, with the minimum set of functionalities, may end up successful enough that further work isn’t a high priority. That’s a good thing. You avoided building more than you needed.

Always describe your features in terms of the user

You want to build small chunks of usable functionality and ship them, but how do we figure out what’s useful? The best way is to think in terms of the user. What are their goals, what are we providing for them, and would releasing the feature at this state meet those goals? Back to our search example. If the site or app doesn’t have search, and you think it’s necessary, would a basic text search be useful on its own? If so, then that’s the first thing to ship. If, due to the nature of your content, search would be useless without some minimal set of filters, then that becomes the first release.

Be very precise about the definition of the feature at this point, don’t make it “deliver a search experience” or “users can search the site”. That’s not helpful in deciding when you are done. Instead, write out detailed scenarios (user stories if that format appeals to you) such as “a user can type in ‘cat’ and get back entries for any page that has the letters ‘cat’ in the title, ordered by most recent”. When you are testing your work, it is hard to say if you’ve delivered a search experience, but easy to say “yes, searching on ‘cat’ brings back the entries that match”. In practice, I’d suggest you get even more detailed. What does it mean to get back entries, what information should be returned about each one, do you need pagination (not for the first release, just return a max number of items), etc.
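As a rough illustration, a scenario written this precisely translates almost directly into code and a check. The Python sketch below is only one reading of the ‘cat’ example: the Page type, the search_titles function, the case-insensitive match, and the default cap of ten results are all assumptions made for illustration, not part of any real project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    # Hypothetical shape for a searchable entry.
    title: str
    published: date


def search_titles(pages, query, max_results=10):
    """Return pages whose title contains `query`, newest first, capped at `max_results`."""
    needle = query.lower()  # assumption: matching ignores case
    matches = [page for page in pages if needle in page.title.lower()]
    matches.sort(key=lambda page: page.published, reverse=True)
    # No pagination in the first release, just a maximum number of items.
    return matches[:max_results]


pages = [
    Page("Cat care basics", date(2023, 1, 5)),
    Page("Dog training", date(2023, 3, 1)),
    Page("Concatenating strings", date(2023, 6, 12)),
]

titles = [page.title for page in search_titles(pages, "cat")]
print(titles)
```

Because the scenario says “the letters ‘cat’ in the title”, the substring match also returns “Concatenating strings”. Whether that is what you want is exactly the kind of question a detailed scenario brings up before you ship.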

Personally, I don’t think you need high-fidelity designs at this point, you are defining the desired result and breaking down the vague concept into something shippable. Once that’s decided, someone can start working up the UX in parallel with other team members starting to build. If you do have designs created though, they should be focused on exactly what you are planning to ship, not some mythical future state that you hope to get to. Just like the rest of the work, design should be done for the smallest shippable unit and then iterated on.

Work in parallel, but don’t obsess about wasted work

Efficiency is great, but our goal is to ship. If someone starts building the UX for showing search results, and then when the API is finished there are some little tweaks they must make, that can feel like rework or waste. If you follow that logic far enough, you will end up waiting to build any one part until all the dependencies are complete. If you are going to do that, then unless you want people waiting around, you should put only one person on this feature, or have the other team members work on something else while they are waiting. Now we have a 5-person team, building five features instead of focusing on one.

Some multi-tasking is normal, people will finish work at various times, and sometimes tasks will be blocked, but I’d recommend just having people pick up small bugs or issues from the backlog instead of starting into other features.

If you have work happening in parallel, then go ahead and build out the UX based on the expected set of data and then change it if you need to. Build the API and then when it turns out you need to tweak the input parameters, it’s ok if that means updating what you’ve built. Remember, you haven’t shipped yet, you are building this together.

If you want to avoid rework, then talk to each other more, document your plans. Provide a simple API endpoint right away that returns a random set of results, so the UX can be completed while the real API logic is being finished. Going the other way, provide a simple no-style test page that calls the API, so the API person/team can ensure their code is behaving as expected.
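A placeholder endpoint like that can be very small. Here is a minimal Python sketch, assuming the real API will return a query plus a list of title/url entries; the stub_search name, the response shape, and the sample titles are all made up for illustration, and in practice you would wire the function into whatever web framework you already use.

```python
import json
import random

# Hypothetical sample data; the real API would query an index or database.
SAMPLE_TITLES = [
    "Cat care basics",
    "Dog training",
    "Concatenating strings",
    "Getting started",
    "Release notes",
]


def stub_search(query, max_results=5, rng=random):
    """Placeholder for the real search API: ignores relevance, returns random entries."""
    count = rng.randint(0, min(max_results, len(SAMPLE_TITLES)))
    picks = rng.sample(SAMPLE_TITLES, k=count)
    return {
        "query": query,
        "results": [
            {"title": title, "url": "/pages/" + title.lower().replace(" ", "-")}
            for title in picks
        ],
    }


# The UX work can start against this response shape right away.
print(json.dumps(stub_search("cat"), indent=2))
```

Since the count is random, the stub sometimes returns no results at all, which gives the UX a chance to handle the empty state early.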

Pick the right size of team

There are limits to how many people can work on a single feature before they start just getting in each other’s way. In the search example, I could see 2-3 people easily, with a search form needed, a page of search results, and an API. If the API was complex or involved some new database work, maybe that’s another person. If you have ten people on your team though, you can split up and take on a few features at the same time. In every case though, remember that work means nothing unless it is completed, so if the search feature is almost ready to ship and just needs a bit of help to get it into production, the rest of the team should be happy to assist. Teams can be fluid, people aren’t Tetris blocks, and project sizes will vary.

What about shipping the right features?

Reading this post, you might think I’m advocating for speed more than anything else… ship ship ship. That’s not the case though. Deciding what features to build, and what mix of functionality goes into the first, second, and third releases is still an important part of the product development process. How you turn those decisions into code is my focus, and making sure you realize that ten features 90% done isn’t as useful to the customer as one feature they can use.

Related to this, deciding to abandon a project, or pause it, is another way to ‘finish what you started’. If a feature was a great idea and a top priority when you thought it was going to take a month of work, and it starts to slip into a second or third month, you can change your mind and just stop. This is another benefit of scoping your projects as small as possible, if you decide to stop, you aren’t losing out on as much work.

What about a long-term vision?

I find it extremely useful to think about the future state of a feature, even while advocating for shipping only the smallest possible first step. Having at least a vague idea of the end state, with some of the steps in between being a bit clearer, will help in your development decisions. You might change your vision, but looking ahead could help you choose a better path from the start. In our search example, if you think pagination is a definite ‘fast follow’ (a term for a feature that isn’t required for the initial ship, but you plan to release quickly after), then you could consider that when building the API or the underlying data structure. Sometimes a long-term vision is essential to explain the project to the rest of the management chain, but you should also be clear that you’ll be building it in stages.

Why don’t we do all development this way?

If I’m so convinced that delivering small shippable units is the right way to ship software, then why don’t I always do it? I can personally think of three situations where this happens, even with the best of intentions.

First, sometimes you just make a mistake and think “this is small enough that we can do all the features in the first release” even though it could be broken down. In this case, you need to remind yourself that even if a project doesn’t need to be broken up, there are benefits to coding it this way. Then, if you judged wrong and the full project is taking a long time, you should be able to ship something sooner.

Second, there is a fear that a minimum feature set will be poorly received by our customers. This case is a bit trickier, because it could be true. “We’ve been demanding search for years and this is what they give us?” is not a successful launch. There are a couple of solutions to help in that situation. One is to provide visibility into your plan, show that you have more releases planned (assuming you feel they are actually going to happen), describe this as a beta or preview to help set expectations, or perhaps you need to redefine the ‘minimum shippable unit’ to take into account what your users expect. Even in that last case, I’d still build it in smaller pieces, release it to a subset of users or behind a flag (a mechanism to restrict who sees a new feature, while still having it deployed), so that you are shipping something much quicker.
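For anyone who hasn’t worked with flags, here is a minimal Python sketch of the idea, assuming a percentage rollout plus an explicit allow list. The is_enabled function and its parameters are hypothetical; real projects usually rely on a flag service or configuration system rather than a hand-rolled check like this.

```python
import hashlib


def is_enabled(flag_name, user_id, rollout_percent, allow_list=frozenset()):
    """Decide whether a deployed feature is visible to this user."""
    if user_id in allow_list:
        # Internal testers or preview customers always see the feature.
        return True
    # Hash the flag and user together so each user lands in a stable bucket (0-99).
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


# The code is deployed for everyone, but only some users see the new search.
if is_enabled("basic-search", "user-42", rollout_percent=10):
    print("show the search box")
else:
    print("keep the existing page")
```

Hashing instead of picking randomly means a given user gets the same answer on every request, so the feature doesn’t appear and disappear for them as the rollout percentage stays fixed.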

Finally, and this is one of the most difficult situations to find yourself in, it can be very hard to break up a complete system migration (moving from one tech stack to another for example) into small pieces. It is possible, but it will often end up looking like a longer process and a lot more work to do incremental releases. Experience suggests that, because it’s safer and because code that gets into production is getting validated and tested more often, the longer incremental release process won’t be any slower. It will seem longer when planning though, so it can be difficult to pick that path or convince your organization to take that path.

Where to go from here

If you’d like to do more development in this style, but don’t have the agreement of the rest of the organization, I’d suggest one of two approaches. Either convince them that it is worth trying on a limited scope (a few features, the set of work planned for an upcoming quarter, or the work being done by one specific development team), or alternatively go ahead and do it behind the scenes.

The first path is nice, because you can be open about only building the full plan for each small release, and hopefully everyone will agree it is working great. The second path is more complicated. You’ll have to have a larger release planned, with release dates and a committed set of features. You can then implement it in small pieces, building and releasing each phase completely behind a flag, and never shipping anything to the real users until the end. That removes many of the benefits, but it can get your developers used to working in this mode and produces a more stable result because each phase had to be shippable. It also gives you the option, at multiple points in the development process, to ship a limited version of the full project, if you can convince your stakeholders it is valuable at that time.

The importance of visibility to individuals, teams, and companies

Duncan Mackenzie — Mon, 25 Sep 2023 00:10:16 +0000

I’m often asked if I have any advice for other software engineers, and the absolute number one piece of guidance I give is “make sure your work is visible”. Is it the most important thing? No, the most important things are doing an excellent job and being a positive part of your team, but right after those I would say is “making sure people are aware of the work you are doing”. I get quite a bit of push back on this idea, with comments such as:

I prefer to fly under the radar.
I don’t want to brag.
I’ve already done the work, why do I also have to tell people about it?

I think many people who resist trying to increase the visibility of their work misunderstand the reasons why we are doing this, and why it is important.

Does it help you as an individual? Sure! If your manager and your company understand all the work you are doing, it could positively affect your career. That is one of the benefits of making your work visible, but if that isn’t appealing to you or if self-promotion feels wrong, then let’s start with the other reasons.

It helps the manager or company understand how their money is being spent

I know that for salaried employees the link between their time and money is a bit vague, but it is definitely true that your company has paid you a sum of money for your work. Shouldn’t they understand what they received for that money? Even if this was Star Trek and you weren’t being paid, there are a certain number of people who only have so much time in their day, so that finite resource is being ‘spent’ on your work. You aren’t bragging, you are increasing awareness of the work that was done and ideally what impact it produced. This will inform future decisions, determine what kind of work should be done next, what teams and individuals should be encouraged to continue, or perhaps even given more funding. If there is a 5 person team, with only 10% of their work visible, and another team that has 50% of their work visible, it will seem like that second team is delivering a lot more value. Decisions will be made based on that information.

In some cases, the result of making everything visible is to discover that specific work is not useful. This is a good result too, although it may not feel like it when it is your project being discussed, because then you and the rest of the team end up focused on the work that matters.

It ensures you get to spend time on the work that matters

I often hear complaints, sometimes from the same people who push back on the idea of promoting their work, that the company focuses on the wrong things, that flashy features often get more attention than critical infrastructure work, or that “they only see a small bit of the work I do”. All of this is often true, but what are you doing to fix it? If you want time to be dedicated to focusing on performance work or infrastructure, or whatever you think is important, then make sure people know the impact of previous work in those areas. Without visibility into the work being done, it may seem like everything is just doing fine with no work needed. Future plans will continue to ignore these areas, assuming they take no effort and no attention.

It encourages others to focus on impactful work

Some work, like a new UX feature, is ‘self-documenting’ as everyone can see the results, but even then, it is important to explain the impact of the work. If you want the company (or team) to focus on the right things, then make sure they are aware of what that is. By building a habit and a culture around the impact of work, we can avoid a situation where critical work is avoided because it is less visible and less rewarding.

It is easy for a manager to understand a new UX feature on a web site, it is harder to understand a deep restructuring of the request pipeline to improve performance. It is up to you to explain the value of this work, ideally with metrics showing the impact. If you are lucky, maybe your manager will do this work for you, selling the higher ups on your excellent work, but by documenting what you’ve done, why it is important, and how it turned out you are making their job so much easier.

Putting this into practice

Making work visible isn’t just posting a bunch of slack messages congratulating yourself on shipping something, or sending emails that just announce that the work has completed. The medium used is probably specific to your company, but you should craft a story that includes:

The problem you were trying to solve.
What you did.
The impact produced.
What’s next?

You should explain all of this for a broad audience and make the overall message into a nice self-contained package.

Years ago, my boss at the time, Jeff Sandquist, responded to an email of mine with some great advice. I had sent him a quick note, on top of a long thread where the dev team was discussing a set of performance improvements. My message was essentially “we did some good stuff”. He asked me to send him a ‘forwardable email’, which I didn’t understand at all.

After a discussion, it all clicked for me. He wanted a message that explained the work in a way that he could send to other people in the company, and they would understand the value of what we had done. This is harder than what I was used to doing, but way more useful. By wrapping our work in the context necessary to understand it, including #s (and maybe even visuals like screenshots or graphs), it could reach a much broader audience. By increasing people’s understanding of your impact, you get to influence the conversation about what work is important.

This takes work, but it is essential. Back to the idea of “understanding what is being paid for”, we should always be able to explain the value of work. If we deploy a new feature, we have to be able to explain why it is useful and what impact it has had. I would encourage thinking through the message you’ll send, before and during the actual project. If your draft message includes some cool stats, you will know that you need to be tracking that data, and this often means being able to get a baseline before your work goes live. The worst feeling is when you’ve delivered something that you know is super impactful, but you don’t have any data to show it. A major benefit of thinking through this message from the start is that you might realize that you can’t explain why this work is important, which could allow you to change your plans before putting in all the effort.

By the way, I’m talking about all of this in terms of software engineering, but that’s just because it is my background. This applies to nearly every form of work I can imagine. In most jobs, you are spending time on various projects, and the short form of this whole post is “make people aware of what you are doing and why it is important”. I’m also focused on this from the point of view of the individual doing the work, but all of the same concepts apply for a manager, making the work of their team visible.

Finally, it really is helpful for you as an individual

You aren’t a stealth bomber, and management isn’t the enemy, so stop flying under the radar. In a perfect world, people would just do the work and be recognized for it, but in reality you have to work to ensure that work is seen and understood. Don’t fake it, and don’t take credit for other people’s work (in fact, working hard to ensure other people’s work is visible is a great idea); we aren’t trying to game the system. We just want everyone to be aware of what is being done. If you see someone whose work is always getting recognized more than yours, consider what they might be doing to make that happen and learn from it.

Rewards are a message

Duncan Mackenzie — Tue, 19 Sep 2023 00:40:16 +0000

I was a manager at Microsoft for about 18 years and for quite a bit of it I was a ‘manager of managers’, which means I often spent time working with my team on how to properly handle annual rewards. I’m no longer at Microsoft, but I drafted this up back in 2021, so I thought I would try to finish it and get it live.

The TL;DR of all of those discussions could be distilled down into “Rewards are a way to send a message to your team, what message do you want to send?”.

Small side note… At Microsoft folks often refer to managers as either M1, M2, or M3. I heard these terms for years without having any idea what they meant, until I became a M2 and it made sense. M1 is a manager with ICs (Individual Contributors or non-managers) under them, M2 is a manager of managers, and a M3 is a manager of managers of managers. This can continue up I assume, but M3 is where I was for the past few years, although I am now a M2. This could be a common term at other companies, but for me it was a mystery for a bit.

What are rewards?

Depending on your particular company, employees will be rewarded with some mix of

a raise to their base pay
a bonus
some type of equity (stock grants or stock options)

This is often an annual event, but it could be more frequent. I could write pages on the difference between those three kinds of rewards and why you would increase/decrease one vs. another, but for the purpose of this article I believe we can just lump them all together as ‘rewards’.

There is a limited amount of money for each year’s rewards, so like any budgeting exercise the manager must decide how to distribute funds across their team. How much money there is in total is, like the types of rewards above, not particularly relevant to this discussion. We can go super simple here and imagine a situation where we had ten employees and a total rewards budget of one hundred dollars.

We could give every employee the same reward, ten dollars, or we could give one employee one hundred dollars and the other nine nothing. Assume there are no restrictions. Now, with our problem space defined, we are in the same spot as most managers. How do we determine the rewards for everyone?

Consider the message

As an employee, you might not have much visibility into this process, you just know that last year you received a certain set of rewards and now you are about to find out this year’s. Looking at it from this point of view is critical.

If the amount is lower, you will feel like your manager (or your company) thinks you did a worse job than last year. If it is higher, they must feel you did a better job.

That is it. That is the key to understanding how this year’s rewards will be received. You can make this as complicated as you want, trying to quantify their impact and determine exactly what they did (and you should be able to document and explain their greater or lower impact to the team/company), but in the end you will be sending a message. They will receive that message no matter what other feedback you have delivered over the year. If you tell me I did awesome, but give me lower rewards, then I know you think I did not do as well as last year. If you tell me I had less impact this year and need to improve, but then give me increased rewards, I will be very confused.

As a manager, I suggest you go through your team many times, first assessing their impact (at Microsoft we explicitly liked to look at this as a holistic discussion, did they deliver on their work, did they help their peers or other teams, etc.), and then going back and considering the message you are sending. The two should line up. This is not suggesting that you make sure you are rewarding someone more than last year. Their rewards should be completely determined by the impact they had on the team this year, but if you want to know how these rewards will be received then look to see if it is more, less, or about the same as last year. In my experience, about the same is seen as ‘less’ because people expect to see at least some increase year over year. The degree of difference matters as well; a little bit up or down is sending less of a message than a significant increase.

Peanut-buttering

Most new managers immediately say to me ‘but everyone did a great job, can’t I just give them all high rewards?’. Given a fixed budget, what this will do is give them all the middle of the possible reward range. I heard this referred to once as ‘peanut buttering’ and that name has stuck with me for years, spreading the rewards equally across the team. There is a positive to this, treating the team like a real team, that rises and falls together, but only if the rewards budget is tied to the team’s impact. If more money is available when the team has done highly impactful work than when they do just ‘good’ work, everyone will be rewarded more when the team delivers. ‘A rising tide lifts all boats’. If, on the other hand, the rewards available are fixed, then equally distributing the rewards each year just means that everyone gets medium rewards all the time. To go back to the point of this article, what message does that send?

Using our earlier example of 100 dollars for 10 team members, if the 100 dollars is fixed and we give out equal rewards, that is 10 dollars for everyone. Next year, the team really pulls together and ships an impressive set of features… and we give them 10 dollars again. Do you think they will be happy? I am honestly not sure. If I understood that my salary and rewards were fixed when I signed up, then I might be ok with this. It does remove rewards as a motivator to deliver more impact though, and while that is not everyone’s only motivation, it is part of it for most people.

Either way though, if you feel everyone did exactly as good a job (adjusting for their seniority, we should not expect the same things out of a year-one team member as the experienced lead), then equal rewards is the right answer. That is not usually the case though, some people have a more positive or negative impact on the team and the results than others, and we should make sure their rewards reflect that. It is also worth noting that at Microsoft, I would not have been allowed to give out equal rewards to the whole team, some differentiation between high and low performers was required.

High rewards for some require cuts to others

This is where it starts to get hard though. Remember that we have a fixed total budget. So, if one person really contributed more to the team, and we want to reward them appropriately, that decreases the amount of money left for everyone else. But what if everyone else also did better than last year, but just not as good as that one person?

You may end up having to give them lower rewards, just to make the budget work out. Now you are not sending the message you want. With companies, such as Microsoft, getting rid of annual raises and/or decreasing bonuses, this is going to happen a lot. Everyone at these tech jobs is still being paid very well, but if there is only a slight change year over year, they will feel like they are doing poorly. If you factor inflation into this, they are having their compensation reduced, although I do not feel like inflation impacts high earners as much as most people.
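To make the arithmetic concrete, here is a minimal sketch using the 100 dollars and 10 team members from the earlier example. The 30 dollar reward for the top contributor is a made-up figure, chosen only to show how quickly the remaining shares drop below the equal split.

```python
def remaining_share(budget, team_size, top_reward):
    """Amount left per person for the rest of the team after one
    person receives top_reward out of a fixed budget."""
    others = team_size - 1
    return (budget - top_reward) / others


equal_share = 100 / 10  # everyone gets the same amount
after_top = remaining_share(100, 10, 30)  # hypothetical 30 dollar top reward

print(f"Equal split: {equal_share:.2f} each")
print(f"After a 30 dollar top reward: {after_top:.2f} each for the other nine")
```

Even if the other nine people improved on last year, each of them ends up below the 10 dollars an equal split would have given them.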

Promotions as a proxy for rewarding high impact

Ok, so we want to reward someone for their contributions, but if we give them super high rewards, we will have to give other people less than we feel they deserve. It is a tricky situation, and it can lead some managers to consider a promotion as a form of reward. The employee will feel recognized, they will get some amount of compensation boost, and often promos are part of a different budget than the individual ratings we give out. So, it will work, but is it the right thing to do? Promotions are about impact, but they are also about capability. We promote someone from a junior to a senior, not just because they did some excellent work, but because we feel they are contributing like a senior employee. For me, that means being very self-directed, able to break up and drive larger bodies of work, and most importantly to be able to guide other members of the team to produce better results. If an employee meets all the criteria for a higher level, it is perfectly reasonable to use that promotion as part of their recognition for the year; we must work with the budget we have after all. Ideally, they would also get a high rating for their impact because that would be accurate, but you could shoot a little lower knowing that they will be happy with the promotion. It is not perfect, but it happens.

What you should never do is promote someone who is not ready for a higher level, just as a replacement for a regular performance reward.

At that point, you have created a problem for them, in that they will be judged in the future against criteria they are not ready to meet, and a problem for your team in that they are seeing someone in a senior role that does not seem to be playing that role. This can happen right at the time someone is hired, the interview process determines they are not ready to be a senior, but meeting their salary expectations requires that level. Personally, I wish salary ranges and titles were decoupled, but that would be a hard system to manage.

What does it all mean?

My conclusion from all of this is not some clear guideline on how to reward employees, but that this type of review process has many fundamental flaws, so the goal is to work within it as best you can. Recognize the right people based on the results they produced, manage people’s levels based on their demonstrated behaviors, and reward everyone as fairly as you can.

While I do think one improvement would be to shift the budget based on the impact delivered by the whole team (therefore raising the pool of rewards for everyone on a high impact team), the danger would be that critical, but low visibility, teams doing operations, long-term maintenance, etc. would be unfairly penalized.

If anyone else has a better plan, you should author a book, and try to get it onto the desk of upper management at all these companies.

Don't depend on Referrer info

Duncan Mackenzie — Sat, 10 Jul 2021 20:25:00 +0000

Over the years, many people have asked me to give them data on, or act based on, the site that a user is coming from when they hit our pages.

Can we find out exactly how many people came to this page from that tweet?
If someone comes to the site after searching Google for X, can we redirect them to ?
Can we show a message to anyone who ends up on the site after a redirect from the old site?

There is a common misconception that we can always determine that information through the referrer (the HTTP_Referer header). In reality, this is often not available, and there is nothing that can be done to change that fact. I’ll suggest some alternative paths at the end of this article, but depending on the situation there might not be any real options.

A quick overview of the HTTP_Referer header

Imagine a user is visiting a random web page out in the world, say http://www.example.com/foo.html, and clicks a link on that page to visit my site at https://www.duncanmackenzie.net. The source of that traffic is useful to a site owner (it can help you understand the impact of various marketing activities, for example), so a header was defined to allow browsers to pass that source link along. At the most basic level, when a user clicks a link to page B on page A, the request for that new page (B) will contain a header (HTTP_Referer, and yes that is a spelling mistake on the word referrer) telling the new page that the user was coming from page A. One common use of this header was to look for referrals from Google or another search engine, and then extract the search term from the URL. If I go to Google.com and search for Duncan Mackenzie Microsoft, then the URL of my page of results is:

https://www.google.com/search?q=Duncan+Mackenzie+Microsoft&source=hp&ei=X-rkYJHyOeKc0PEPrO6hsAE&iflsig=AINFCbYAAAAAYOT4b4Cz7MUaxenq5zP_Ktnl9jqHwrhp&oq=Duncan+Mackenzie+Microsoft&gs_lcp=Cgdnd3Mtd2l6EAMyBggAEBYQHjoOCC4QsQMQxwEQowIQkwI6CAgAELEDEIMBOgIIADoFCAAQsQM6CwguELEDEMcBEKMCOggILhDHARCjAjoFCC4QsQM6BQgAEMkDOgUIABCSAzoCCC46CAguEMcBEK8BOggILhCxAxCTAjoICC4QsQMQgwE6BwgAELEDEAo6BQguEJMCOgcILhAKEJMCOgQIABAKOgQILhAKOggIABAWEAoQHlC9E1iiNWCBN2gEcAB4AIABdogBsQ-SAQQyNy4xmAEAoAEBqgEHZ3dzLXdpeg&sclient=gws-wiz&ved=0ahUKEwjRnZrKz8_xAhViDjQIHSx3CBYQ4dUDCAo&uact=5

In the olden days, it was common for analytics software to extract the q=Duncan+Mackenzie+Microsoft bit out of that URL. Now you know what search terms led the user to your site and by the absence of a start=10 query in there, you can also determine the user found this link on the first page of their search results. Great info and very helpful to the site owner, but it is also a gaping security/privacy hole for the user.
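As a rough illustration of what that older analytics software was doing, here is a minimal Python sketch. The function name is hypothetical, and it assumes the referrer is a Google-style URL with q and start query parameters as described above.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer):
    """Return (search terms, result offset) from a search referrer URL,
    or None when the URL carries no q parameter."""
    query = parse_qs(urlsplit(referrer).query)
    if "q" not in query:
        return None
    terms = query["q"][0]  # parse_qs already turns '+' into spaces
    start_raw = query.get("start", ["0"])[0]
    start = int(start_raw) if start_raw.isdigit() else 0  # 0 means first page
    return terms, start


print(search_terms_from_referrer(
    "https://www.google.com/search?q=Duncan+Mackenzie+Microsoft&source=hp"
))
print(search_terms_from_referrer("https://www.google.com/"))
```

The second call returns None, which is what this kind of code sees today when only the origin is passed along.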

Security and Privacy implications of referral information

The privacy issues inherent in a referral URL might not be immediately obvious when we are looking at random queries on Google, but it doesn’t take long to construct a more concerning scenario. Imagine I’m on a forum focused on LGBTQIA+ topics, and for whatever reason (based on where I live or even just personal preference) I wouldn’t want people to know that I frequent that particular site. Now, as part of some discussion on that forum, someone suggests a great product over on Amazon.com and provides a link… by clicking that link, I’m giving Amazon some information I probably didn’t intend to share. Should that matter and would Amazon do anything with that info? It honestly doesn’t matter, the key here is: some information, private to you, is being shared without you intending it to be.

An easy example of the security side of this is a bit harder to find, but it is not uncommon for sites to put some type of identifying information about you or your session into the URL (I’ve seen ?display_name=Duncan or ?uid=<guid> in URLs on various sites before) and now that information is being sent out to some random untrusted site just because you clicked on a link. Once again, you have no way of knowing if that information on its own is a real security issue, but it is being transmitted and that’s an issue.

The types of issues I’ve described were spotted pretty quickly by security/privacy experts and by the browser teams themselves, and the end result is that (with modern browsers) full referral information is rarely transmitted from one site to another. This is due to something known as the referrer policy, and more specifically to the default policy browsers have implemented. I’m not going to go into a full explanation of this topic and how to use it on your pages; my focus in this article is what this means for you if you are interested in looking at referral information.

Going back to our Google search example, one of the links on that search result page was to a post on my blog. If you open up the network tab of your browser dev tools and then click on the link, you can see exactly what Referer header is being passed. In any modern browser, this header will be:

referer: https://www.google.com/

Just the origin is being passed, even though that is not the URL I was on before clicking the link, and this is missing all the useful info like the query terms that brought me to this results page. This is ‘by design’, as Google has set a meta tag on the search result pages:

<meta content="origin" name="referrer">

The value of “origin” for the referrer meta tag (note that the meta tag is correctly spelled, even though the HTTP header is not) tells the browser to only pass along the root URL for the site when navigating. I mentioned this would be the case in any modern browser, and that’s because support for referrer policy was only added to browsers starting back in 2016. If I jump over to Internet Explorer and try the same steps, the referer header has a lot more info:

Referer:
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiq76HJ1s_xAhWIup4KHcInCfUQFjAFegQIFxAD&url=https%3A%2F%2Fwww.duncanmackenzie.net%2Fblog%2Fdocs-overview%2F&usg=AOvVaw3CauyB6Ji_GOgSFFQDqfoC

IE ignores the meta tag telling it to only include the origin, but if you look at that URL you will see it is still missing all the interesting query parameters such as my search terms. This is a special bit of coding done by Google, they know the URL is going to get sent in this case, so when an IE user clicks on a search result, they go through a redirect to clear some of that useful info out of the URL.

Default referrer policy

In most browsers, the default policy, if the page doesn’t specify one, is strict-origin-when-cross-origin. This means that if the link is to another origin (so from a page on mycoolwebforum.com to amazon.com, for example) then only the origin of the source site is sent. That is still information and (depending on the site the user is on) it could be sensitive, but it is exposing a lot less than the full URL might. The ‘strict’ part of that policy also covers the specific case where the user is on a secure page (a https link) and navigates to a non-secure page (a http url) in which case no referrer value is passed at all.
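The rules of that default policy can be sketched in a few lines of Python. This is a simplified model, not how any browser actually implements it: the function names are made up, and it compares scheme and host only, ignoring details such as default ports and credentials in the URL.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Scheme plus host (and port, if present) of a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def referrer_sent(source_url, target_url):
    """Approximate the Referer value a browser sends under
    strict-origin-when-cross-origin, or None if nothing is sent."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    if source.scheme == "https" and target.scheme == "http":
        return None  # secure page to non-secure page: no referrer at all
    if origin_of(source_url) == origin_of(target_url):
        return source_url.split("#", 1)[0]  # same origin: full URL, minus any fragment
    return origin_of(source_url) + "/"  # cross-origin: only the origin


print(referrer_sent("https://mycoolwebforum.com/thread/42", "https://www.amazon.com/item"))
print(referrer_sent("https://mycoolwebforum.com/thread/42", "http://example.com/"))
print(referrer_sent("https://mycoolwebforum.com/thread/42", "https://mycoolwebforum.com/thread/43"))
```

Only the last call, a link within the same site, passes the full source URL.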

Ok, so what does this all mean?

I’d call this the “TL;DR” (too long; didn’t read), except that I put it at the end.

If you are running a site and you want to be the most privacy conscious you can be, add a referrer meta tag value of no-referrer or same-origin. In the first case, link clicks will never include the source URL, and in the second, links that go outside of the current origin will not include the source. Essentially, doing the second means that you can still use this data to understand user journeys on your site.

As a web developer though, the main takeaway from all of this is that in most cases you cannot determine the page a user is coming from when they visit your site. You can, in some cases, determine the origin (so you can tell this is traffic from Google, Bing, Twitter, etc.), but not in every case depending on the way the source site is configured.

If you are building a feature that requires you know the source of some incoming traffic, then you should change the incoming URL. Adding a query param of ?traffic_source=<foo.com> would work or you can use campaign IDs (details depend on your analytics, but you’ll have seen examples like WT.mc_id in URLs all over the web) that have the advantage of already being supported by many types of analytics software.
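Here is a minimal sketch of reading that value on the receiving side, in Python. The traffic_source parameter name comes from the example above; the function name and the fallback to the referrer’s host are my own assumptions, and the fallback is best effort only since the header is often missing.

```python
from urllib.parse import parse_qs, urlsplit


def traffic_source(request_url, referer_header=None):
    """Prefer an explicit traffic_source query parameter; otherwise fall
    back to the host of the Referer header when one was sent."""
    query = parse_qs(urlsplit(request_url).query)
    if "traffic_source" in query:
        return query["traffic_source"][0]
    if referer_header:
        return urlsplit(referer_header).hostname
    return None  # no reliable way to know where this visit came from


print(traffic_source("https://www.duncanmackenzie.net/blog/?traffic_source=foo.com"))
print(traffic_source("https://www.duncanmackenzie.net/blog/", "https://www.google.com/"))
print(traffic_source("https://www.duncanmackenzie.net/blog/"))
```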

If you have no control over the source page/link, then honestly, you are out of luck. You might pick up some referrer information from older browsers, but you cannot depend on this to work in most cases.

Do as little work as you can

Duncan Mackenzie — Fri, 07 May 2021 04:04:51 +0000

One of the keys to scalability in software systems is to reduce the amount of work you do per action. In a website, think of this as the amount of processing and data transfer done per request. The smaller that set of work is, the better. This applies to all sorts of systems and at all layers of the stack, but I am going to focus on the website scenario because it is where I spend most of my time.

For every bit of work you do as part of your web application, you should be thinking about how to do it less. You have two main approaches to do this, but you can end up using them both depending on your situation. Caching, where we hold onto the output of some work so that we can reuse it and avoid doing the work again, and pre-processing, where we do work in advance so that we don’t have to do it ‘in real time’.

Our sample site

As I go through each of these concepts, we’ll focus on a fictional site that gives information about vehicles. The site works by pulling the make, model, and year out of the URL and rendering the appropriate information to the user. The site has this set of pages:

A home page (that is a list of vehicle manufacturers, the make),
A page per manufacturer (with a list of models),
A page per model (with a list of years that model was produced), and
A page per specific vehicle, such as a 2011 Toyota Rav4

Thinking of this as URLs, which I love to do, we have:

/ (the home page)
/Toyota (and /Mazda, /Audi, /Nissan, etc.)
/Toyota/Rav4 (the model page)
/Toyota/Rav4/2011 (the specific vehicle page)

Let’s assume that behind the scenes, we have a database that has all the vehicle information in it, so the basic logic of the site would be a route for each of the four types of pages, the URL would provide the parameters and then the code would make an appropriate request to the database.

Something like the pseudo-code below

Home page (/)

SELECT DisplayName, UrlSafeName FROM Make ORDER BY DisplayName
Loop through and list out the manufacturers, with links to //

Manufacturer page (/Toyota/)

Pull name from URL
Check DB to see if it is a valid make, if not then 404
Pull list of models from the database: SELECT model.DisplayName, model.UrlSafeName FROM Model INNER JOIN Make ON Model.makeID=Make.ID WHERE Make.UrlSafeName= ORDER BY model.DisplayName
Display list of models, with each one being a link to the model page

And so on for the next two levels.
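For anyone who prefers something runnable, the manufacturer page logic could be sketched like this in Python with an in-memory SQLite database. The table and column names follow the pseudo-code above; the function name, the sample rows, and the use of SQLite are assumptions for illustration only.

```python
import sqlite3


def models_for_make(conn, make_url_name):
    """Return [(DisplayName, UrlSafeName), ...] for a make,
    or None when the make is unknown (the caller would return a 404)."""
    make = conn.execute(
        "SELECT ID FROM Make WHERE UrlSafeName = ?", (make_url_name,)
    ).fetchone()
    if make is None:
        return None
    return conn.execute(
        "SELECT Model.DisplayName, Model.UrlSafeName FROM Model "
        "INNER JOIN Make ON Model.makeID = Make.ID "
        "WHERE Make.UrlSafeName = ? ORDER BY Model.DisplayName",
        (make_url_name,),  # value pulled from the URL, passed as a parameter
    ).fetchall()


# Tiny sample database so the sketch can run on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Make (ID INTEGER PRIMARY KEY, DisplayName TEXT, UrlSafeName TEXT);
CREATE TABLE Model (ID INTEGER PRIMARY KEY, makeID INTEGER, DisplayName TEXT, UrlSafeName TEXT);
INSERT INTO Make VALUES (1, 'Toyota', 'Toyota');
INSERT INTO Model VALUES (1, 1, 'Rav4', 'Rav4'), (2, 1, 'Corolla', 'Corolla');
""")

print(models_for_make(conn, "Toyota"))
print(models_for_make(conn, "NotAMake"))
```

Note that every call runs two queries against the database, which is exactly the per-request work the rest of this article tries to remove.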

Caching in all its wonderful forms

In its simplest form, caching is simply hanging onto something so that you can re-use it, but this can happen at many different layers in a web application. We can do client caching, where we add headers to our response telling the client (the user’s browser for example) that the page is valid for the next hour. In that case, if a user visited the homepage of the site, clicked on Toyota, then Rav4, then 2011 the server would end up needing to make as many as ten database queries to return all these results. If the user then came back to the homepage twenty minutes later and clicked on Toyota again, their browser wouldn’t need to make any new requests as it (likely) still has the page cached. On the server, we’ve avoided making those ten extra database queries. Now if they came back again a few days later, the browser knows their local copy is too old and would make new requests resulting in new database requests on the server.
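A small Python sketch of that client caching scenario is below. The Cache-Control max-age directive is the standard way to say how long a response stays valid; the helper names are hypothetical and the freshness check is a simplification of what a browser really does.

```python
ONE_HOUR = 60 * 60  # seconds


def cache_headers(max_age_seconds=ONE_HOUR):
    """Response header telling the client how long the page is valid."""
    return {"Cache-Control": f"max-age={max_age_seconds}"}


def browser_can_reuse(fetched_at, now, max_age_seconds=ONE_HOUR):
    """True while the locally cached copy is still within max-age."""
    return (now - fetched_at) < max_age_seconds


print(cache_headers())
print(browser_can_reuse(fetched_at=0, now=20 * 60))  # twenty minutes later
print(browser_can_reuse(fetched_at=0, now=3 * 24 * ONE_HOUR))  # a few days later
```

The twenty minute revisit is served from the local copy, while the visit a few days later goes back to the server.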

This is great, but it only addresses the idea of a single user visiting multiple pages. If we instead have thousands of users visiting a given page, every one of those requests is generating database requests. What you need to ask yourself as a developer is “are those requests necessary?”. In other words, do we think the results of those queries are changing so fast that we need to retrieve the data on every request? Assuming, as would be the case in our example, this data is not changing that quickly, we are doing too much work. We have two ways to solve this with caching. We could cache the results from the database query on the server, changing our logic for the home page (as an example) to be more like:

Is my saved copy of the list of manufacturers still fresh enough (retrieved within the last hour, for example)?
If so, just use it, otherwise make a query to the database and then save it in memory.
Display the list on the page.
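Those steps could be sketched in Python roughly as follows. The class name and the loader callback are hypothetical; in a real site the loader would be the database query for the list of manufacturers.

```python
import time


class TimedCache:
    """Keeps one value in memory and reloads it once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._loader = loader  # function that does the expensive work
        self._clock = clock  # injectable so the sketch is easy to test
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()  # the only place the database is hit
            self._loaded_at = now
        return self._value


def load_makes():
    print("querying the database...")
    return ["Audi", "Mazda", "Nissan", "Toyota"]


makes = TimedCache(ttl_seconds=3600, loader=load_makes)
makes.get()  # first call runs the query
makes.get()  # second call reuses the saved copy
```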

Caching has let us take our database transactions for the homepage from one call per visitor to one per hour. Assuming thousands of page views per hour, we’ve reduced the load on our database by a massive amount. We’ve also made the load ‘fixed’, in that it doesn’t increase as our traffic does.

There is still work on every request though, we still must check the cache and do our loop to render the list onto the page. These are both simple bits of work, but it is still work, and so our goal should be to get rid of it. Some webservers let us do ‘output caching’, where the server will hang onto the rendered HTML output and if someone requests that page again (within a set time window) the server will just return the HTML without having to run our code.

That’s still work though, we’ve just moved it from our code to the web server. Just asking for the page and getting a response puts load on the server and we don’t want that.

This is where a CDN (Content Delivery Network) can come in. We point our domain name to a CDN provider like Akamai and tell it to use our server as the source (called the origin in the CDN world). Using the same type of cache headers we used for client caching, we tell the CDN that our homepage is valid for an hour. The CDN will make a request back to our server whenever its copy expires, but that should only be once an hour, so now we are down to one hit on our server per hour, per unique URL (because each page has different content, so the CDN caches each one independently). You might point out here that each request is still causing work, but it is to the CDN’s servers and those aren’t ours, so we don’t really care.
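To illustrate the “one hit per hour, per unique URL” behavior, here is a toy model of a CDN edge in Python. It is only a sketch of the idea, with made-up names, and says nothing about how Akamai or any other CDN is actually built.

```python
class EdgeCache:
    """Toy CDN edge: caches each URL separately and only goes back
    to the origin when its copy of that URL has expired."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._entries = {}  # url -> (expires_at, body)

    def get(self, url, now):
        entry = self._entries.get(url)
        if entry is None or now >= entry[0]:
            body = self._fetch(url)  # the only traffic our server sees
            self._entries[url] = (now + self._ttl, body)
            return body
        return entry[1]


origin_hits = []


def origin(url):
    origin_hits.append(url)
    return f"<html>page for {url}</html>"


edge = EdgeCache(origin)
for second in range(0, 3600, 60):  # a visitor every minute for an hour
    edge.get("/Toyota", now=second)
    edge.get("/Toyota/Rav4", now=second)

print(len(origin_hits))  # one origin request per unique URL
```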

That’s awesome and about as far as caching is going to get us. We could play with the duration of the cache (should it be one hour or twelve?) but the basic reduction is there. To get to an even better state, we need to look at pre-processing.

Pre-processing

In our caching example, whenever the CDN needed a new copy of any given URL, it would request it from the source, which in turn would result in one or more database queries. It’s a minor amount of work, but it is still work. That once-per-hour request is putting load on our database and will be a tiny bit slower getting back to the user.

Our goal is to get rid of as much work as we can though, so that is where pre-processing would come in. What if, instead of ever hitting the database, we generated the HTML for a page ahead of time, whenever the underlying data changed? If we updated the information for a 2011 Toyota Rav4, we would generate the HTML that /Toyota/Rav4/2011 should return and push that updated content up to the webserver.

Now, when the CDN needs that page, it asks the web server for that page and the server just hands the HTML back with no database call. It’s not zero work, but we are getting close. These days, we use the term ‘static’ to describe a site like this, where you build the content only when needed and never in response to a user request. Combined with a CDN, this approach reduces the overall amount of work happening on every request down to as close to zero as you can get. Instead of running database queries whenever anyone asks for a page, we run them only once to generate the output ahead of time.
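A minimal sketch of that pre-processing step in Python might look like the following. The function names, the HTML template, and the folder layout (one index.html per URL path) are all assumptions; the idea is simply that this code runs when the data changes, not when a user asks for the page.

```python
from html import escape
from pathlib import Path


def render_vehicle_page(make, model, year, details):
    """Build the full HTML for one vehicle page."""
    title = escape(f"{year} {make} {model}")
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><p>{escape(details)}</p></body></html>"
    )


def publish_vehicle(output_root, make, model, year, details):
    """Write the page to <output_root>/<make>/<model>/<year>/index.html
    so the web server can return it without running any code."""
    page_dir = Path(output_root) / make / model / str(year)
    page_dir.mkdir(parents=True, exist_ok=True)
    target = page_dir / "index.html"
    target.write_text(render_vehicle_page(make, model, year, details), encoding="utf-8")
    return target


# Called whenever the data for a vehicle is updated.
print(publish_vehicle("site_output", "Toyota", "Rav4", 2011, "Updated vehicle details"))
```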

I used the example of a vehicle site because back in the early 2000s, the MSN Autos site used this exact technique. They pre-rendered all their vehicle info pages whenever they updated their data. It was a great idea at the time, and it still is!

There are other benefits beyond just performance and scalability with this approach, we’ve also simplified the system that is running in production. If our database is down, or slow, it won’t impact the live site. If the code that generates updated HTML fails, that will prevent updates (which could be important depending on your scenario), but it won’t cause any errors or outages on the site.

Putting it all together

All the various levels of optimization discussed above are not different paths you can take; you can do them all in any combination. Caching at the CDN is more impactful than caching on an individual user’s machine, but the local cache can still improve the performance for that user, so do both. If your page generation code, even though we only run it occasionally, is making the same database queries repeatedly, you should consider caching the results as you go to make the process faster.

I’ve written about this before in other contexts, but if you want a fast site, make it as static as you can, use a CDN, and avoid doing any more work than necessary.

Creating an effective bug report

Duncan Mackenzie — Sun, 25 Apr 2021 21:53:32 +0000

When I’ve had to contact a company’s technical support through a form, I provide a ridiculously detailed description of what the issue is, when it happens, the specific troubleshooting that narrows it down, and (if it is allowed by the form) screenshots. I suspect the people on the other end aren’t used to that, because I know it is not what happens most of the time when people submit issues to me.

It’s fun to point out that a message like “Hey folks, is it normal for the layout on some pages to be messed up on my phone?” isn’t useful, but perhaps it is more helpful to explain precisely why.

What’s the benefit of an effective bug report?

There are two sides to this, one is the efficiency of the interaction between the bug reporter and the (hopeful) bug fixer and the other is the actual likelihood that the issue will ever be fixed. In the first case, it is a bit like the idea of “no hello”… if you don’t provide enough information right from the start, there will need to be an ongoing back-and-forth.

User in a support thread: Hey folks, is it normal for the layout on some pages to be messed up on my phone?

Helpful Dev Team Member: No, that doesn’t seem normal. What page are you seeing this on?

U: Some of the pages in the .NET docs

D: Can you give a specific URL or URLs?

U: Sure, I’m seeing it on < >

D: That page seems fine to me, on my phone and on the couple other devices I’ve checked so far. What do you mean by “messed up” and what phone are you using?

U: Oh, it’s a Samsung, and the text is just not lining up right

… (imagine this continues for 5 or 6 more messages)

Imagine if each of these messages had a few hours or days between them, as people could be checking in on the thread infrequently, and perhaps time zones mean that the two people are unlikely to be online at the same time. If, in the original message, the user had tried to supply enough data to reproduce the problem, this could work a lot differently.

U: Hey folks, I was viewing this page (link) on my Samsung Galaxy 9 and it looked messed up. Here is a screenshot and I’ve circled the area where a few of the lines of text are misaligned. Here is a screenshot of another similar page (link) showing the expected alignment. Does anyone know if this is an issue in that page, or the site?

D: Thanks for all the info, I looked at the links and screenshots you sent, and I was able to figure out the problem. A fix is being deployed today after a quick review.

That’s an ideal outcome of course, but even if the problem can’t be fixed that quickly, a clear set of steps to reproduce the problem (often called “repro steps”) is the best way to help move the problem forward. Less time is wasted, for everyone, and the fix could happen much quicker.

Why do we need “repro steps”?

The first part of fixing a problem is to make sure you understand it and can see it in action. If you can’t see the problem happen consistently, you can’t be sure you’ve fixed it. This is not a software development concept; it is true with nearly everything. If someone tells me the faucet is dripping, I’m going to want to see the drip happening before I try to fix it… then when I look and see it not dripping, I can feel confident that I resolved the problem. With a bug report, I always make sure I can ‘test’ the bug by reproducing it, because then I know that I can follow those same steps to make sure I’ve fixed it in the end. We are talking about submitting and creating bug reports here, not fixing them, but ideally the developer would create a test that covers the same issue as the bug. That test would fail before a fix is in place and then pass.

As a user, you want your issue to be resolved, and you are in the best position to help with that. You’ve seen the issue in action right in front of you, that’s why you are filing the bug. If you want the most efficient path to solving the problem, figure out a clear set of steps to reproduce the problem and provide those when you submit the issue. Screenshots, especially if you can highlight what you are reporting, are extremely valuable, and for a web site, including a link is critical.

If just going to a page isn’t enough to see the issue, then describe the series of steps that will get you to the right point.

Is it possible to provide too much information?

I’d rather have too much detail than too little, but I also feel that the bug reporter shouldn’t be expected to try to track down every possible variable or to try to fix the problem. I’m guilty of this sometimes: I see an issue, I write up the detailed repro steps, then I fiddle with dev tools or browse through the code and try to produce ideas on what needs to be changed. I assume that most of the time, the actual experts are going to ignore my thoughts and just focus on the repro steps, but I could be adding confusion to the discussion.

For a user, you should consider anything beyond accurate and detailed repro steps to be ‘extra’. I’ve had people tell me a lot of information without including the repro steps and that’s frustrating, because I can tell they really want to help and they are willing to put the time in, but their effort was spent on the wrong things. Telling me you found a broken link and were able to repro that on three machines and two browsers is great if you also tell me what link and where you found it.

Human Nature and Incomplete Issues

I mentioned two sides to this, the efficiency of interaction and the likelihood an issue will be fixed. If you assume someone will ask for all the missing info, even if it takes a lot longer, then the only benefit to a complete bug report is efficiency. In reality though, developers are human, and when they see an incomplete issue where it could take many back-and-forth conversations to get the details they need, they will avoid it in favor of one that they can pick up and work on right now with no delay. Your issue, which could be more important than anything else in the queue, could end up being ignored because it is unclear.

A detailed bug report, with clear repro steps, is a win for everyone.

Supporting and managing "Citizen Development"

Duncan Mackenzie — Mon, 29 Mar 2021 00:40:17 +0000

In any organization, users will end up creating their own tools, outside of the official engineering process. This is a good thing, as these ‘citizen developers’ are often closer to the work and are addressing a need that they can see better than upper management, but the lack of any structure can create a lot of issues for the business. I’ve been on both sides of this discussion a few times in my career so I thought I’d write up my thoughts on how this can work.

In the grand tradition of recipe blogs, I’m going to start this article with a story that I find particularly relevant.

The spread of RSS on MSDN

Back in 2005, RSS feeds were all the rage. People used them to get content updates from tiny blogs and massive news sites. MSDN (the Microsoft Developer Network), where I was an author and a member of the development team, had a few “official” feeds for top-level categories like “VB” or “SQL Server”, but there were hundreds of additional feeds for more specific topics like “Windows Server Deployment”, posted on the overview page for a specific sub-topic.

As more teams created these feeds and they became popular, we started to get bug reports about broken RSS. Since I was a dev who had done a lot of coding on RSS and worked on the system that created our top-level feeds, the team would usually assign these bugs to me. As I dug into them, I found out that the authors were handwriting all these RSS files (which are just XML) in Notepad and publishing them manually. Hundreds of people hand editing XML each week to add and remove links was bound to have a high error rate, even with a group of technical writers, and malformed XML was common. I pointed out the errors, shared a link to an RSS feed validator, and assigned the bugs to the authors.

The engineering team had a few discussions about adapting the official system, which was based on product metadata, to support the very granular and custom-made categorizations that these feeds needed. Engineering’s capacity was full of higher priority work for quite a long way out, so this couldn’t happen anytime soon. I was in complete agreement with this decision, as doing this right would involve adding new types of metadata and some sort of feed designer. This would be a huge project in the MSDN system of the time, which was a mix of databases, Microsoft Word based authoring and file publishing through an FTP-like push system. We let the authors know that no solution was coming, and everyone seemed to find that reasonable and the discussion ended there.

Jump forward a few days to the weekend, it was Sunday and one of my kids had fallen asleep on me, “trapping” me on the couch with my laptop. I had an idea that could help these authors and would also make a fun article and code sample to post online. I created a simple Windows Forms app that provided a rich editor experience over RSS files. You opened an existing file, or created a new empty one, then you added/edited/removed items like any line-of-business data editing application.

With very restrictive data entry, the app produced valid RSS and made the weekly update much easier. I sent it out to an author so they could test it, explaining that it was just a replacement for a text editor, and the rest of their workflow would be unchanged. I wrote a quick post about this a few days later, but I didn’t think too much about it; I had helped out some folks and hopefully reduced the number of invalid-XML bugs our users would run into. The author used the app for their weekly updates, loved it, and started sharing it around with other teams.

No good deed

After some time, someone in the content world wrote a nice email to my great-grand-boss (the head of MSDN engineering) along with their PM counterpart, praising this tool and how many hours it was saving their team.

This is where things started to go a bit sideways.

The PM lead was confused: why did we release a tool that wasn’t on the roadmap when we all agreed we couldn’t take on this RSS problem right now? I don’t know exactly what happened after that, but within days I was called to a meeting with all the top folks and everyone in between in my management chain. The various leads had decided that building this tool, outside of the official process, was unacceptable. All the authors were sent an email telling them to uninstall the app and to go back to using a text editor. Management explained to me that an unplanned app like this could create future issues, technical debt, etc. I argued about the relative simplicity and negligible risk of the app with no success.

Eventually management raised something they saw as a bigger issue than the app itself. By working on this item, which we had all agreed earlier was a low priority, I was ignoring more important work. I was excited for a moment: here was the root of the issue, and it was simply a bit of confusion that I could clear up easily. I explained that I had built this on the weekend, so it wasn’t instead of other work; I wasn’t ignoring the priority list, and so there was no need to worry. The engineering and PM leads didn’t think this changed anything: if I was going to code on the weekend, I should be working on the next item in my queue.

I left that meeting a bit unsure of all of this, but not wanting to cause trouble, I took down an article with the RSS tool code and removed the link to download the tool itself. I decided that I would still do my normal technical blogging, but I would carefully avoid work-related topics in the future.

Some days later, my boss emailed me to let me know there had been more discussion about this. The PM lead had gone through my blog and saw that I had been writing sample code, little apps, and articles about coding for years. None of it was related to work, unlike the RSS tool, but it still concerned them. If I had time in my off-hours to code, I should be working on MSDN items. My boss said I was no longer allowed to write about technical topics, but I didn’t have to take down all my old posts.

It shouldn’t matter, but I would like to point out that I was not at all behind in my day job, in fact I was one of the more productive devs on the team. I’m not saying it would make it acceptable, but I would have found this reaction to my writing more understandable if I was having trouble hitting my deadlines at work.

I could have pushed back on this decision, and I’m not sure what would have happened if I had ignored the ‘no coding outside of work’ rule, but I wasn’t confident enough to try it. So, I left the team at the first opportunity and moved to Channel 9, where my new boss was quick to reassure me that it was not only ok for me to blog and code in the public space, but it was awesome.

The tables are turned

Flash forward 15 years, I am an engineering manager, and folks are discussing how we should handle ‘unofficial’ tools and development by people outside of the engineering team. My hypocrisy-level was high as I went through an initial set of feelings, including “don’t these people have their own work to do?”. Luckily, I kept most of those reactions to myself.

People create internal tools, especially unofficial ones, because they see a need. They want to make their job easier, make the team more effective, or produce better results. These are all virtuous goals, so we don’t want to stop this innovation. There are genuine issues and concerns with unofficial, unplanned, and unregulated software development though, so what we need is a way forward that enables this type of tool creation while minimizing the negative impact.

A way forward

One of the reasons why it is hard to know how to handle ‘citizen development’ in your organization is that it covers a massive range of projects. No one objects to a useful Excel workbook to help people make budget plans, but what if that workbook grows to contain hundreds of macros and becomes the key to the budget planning system for the entire team? Before I left MSDN, I wrote up a proposal for how to decide if a project needed to go through the official software development process. It wasn’t well received, and I no longer have it around, but I think the idea still has merit. There are some general rules that apply to all tools and utilities, around setting expectations and keeping track of things, but how we handle things beyond that will depend on the tool’s role in the business, the risk it represents, and the impact it will have on the team that owns it.

Determining the level of concern

My criteria when evaluating a tool or system are not a precise metric, but they cover these main areas:

What’s the business impact if this tool fails or is otherwise not available?
Any security or privacy concerns.
Sensitivity of data being handled.
The ongoing impact on the team who creates or owns it.

Let’s walk through a few examples to see how this could work, starting with my original RSS tool.

The FeedWriter app

The tool worked only locally, against a file that the user would still have to upload and publish through their regular process, which removes a lot of the concern about security and privacy. An author could have some information in there that was private, but they still had complete control over the actual publishing process. The tool couldn’t update the production site itself, for instance.

If the tool was broken or went away, the users would just go back to using Notepad or another editor, so the business impact would be low. The old process could be slower and more error prone, but it would be returning to a state that was previously determined to be acceptable. There could be an issue, after enough time, where users only know how to do this work using the tool and not manually, making it into a critical piece of the process.

In terms of ongoing impact to me, the creator of the app, it seemed like it would be low, as the scope of it was so constrained that it would be unlikely to have a lot of bugs. This could change though, if I kept adding features over time, which is not unusual.

There was one major issue with this app though, one that I wouldn’t be willing to accept in my current role even though it didn’t occur to me at the time. Users installed my RSS tool from a link on my personal website, and it was set up to auto-update. Letting people inside the organization install software from an untrusted source, that updates without any oversight, is a big security risk. What if I left the company and decided to update the app to slip bits of profanity into the generated RSS files? Or someone was able to update the binaries up on my site to deliver a malicious payload to a bunch of internal Microsoft employee machines?

So, while this tool overall is minimal risk, it should at least be hosted internally, and its source stored in a place where our internal processes could see it. Expectations would also need to be carefully set that this was not official, even though someone from the dev team created it.

A TOC maker for a Microsoft Docs repo

On Docs, we have the idea of table-of-contents (TOC) files, that contain a list of all the sub-topics in an area. Keeping this up to date with all the articles in a folder seems like a manual task that folks would appreciate some help with. A script that runs against your local copy of the repo and generates the TOC file? Similar in many regards to the RSS tool, it only works locally and doesn’t publish anything itself, making it a low-risk tool. If it breaks, you can always update the TOC file manually, so the business impact is low, and it seems simple enough that I doubt it would put much of a bug-fixing and maintenance burden on the creator. As above, my only concern would be keeping the source in a trusted location and not sending our authors out to some third-party site to get the script.
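To make the idea concrete, here is a minimal sketch of what such a local TOC script could look like. It is an illustration only: the function names (`build_toc`, `first_heading`), the rule of using each file's first `# ` heading as its title, and the `name`/`href` output shape are all assumptions for this example, not a description of any real Docs tooling.

```python
from pathlib import Path


def first_heading(markdown_text: str, fallback: str) -> str:
    """Return the text of the first level-1 heading, or the fallback."""
    for line in markdown_text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def build_toc(folder: Path) -> str:
    """Build a TOC listing for every .md file directly inside folder.

    Hypothetical output shape: one name/href pair per article.
    Titles are written as-is; a real tool would need to quote
    characters that are special in the target file format.
    """
    entries = []
    for path in sorted(folder.glob("*.md")):
        if path.name.lower() == "toc.md":
            continue  # never list the TOC file inside itself
        title = first_heading(path.read_text(encoding="utf-8"), path.stem)
        entries.append(f"- name: {title}\n  href: {path.name}")
    return "\n".join(entries) + "\n"
```

The function only returns text. Writing it to a file, and reviewing and publishing that file, would still go through the author's normal workflow, which is what keeps a tool like this low risk.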

A new validation “Check” that runs in GitHub against new pull requests

The Microsoft Docs system is based around Git repositories, so every content update is through a pull request. One terrific way to avoid publishing broken or low-quality content is through automatic validation: GitHub runs checks as part of the PR process, and if the content fails one or more checks, publishing can be blocked. The engineering team has a few of these checks, but imagine if an author decided to make a new one, ensuring that content always had their product name as .NET and not .net or .Net. Great idea as it helps avoid inconsistencies, and the team would love it.
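The core of a check like that can be very small, which is part of why it is so tempting to build. The sketch below is a hypothetical version of just the text-matching part. The function name `find_casing_errors` and the matching rule are assumptions for illustration, and it says nothing about how a real check is registered with GitHub or given access to a repository.

```python
import re

# ".net" in any casing, as long as the dot is not preceded by a word
# character (so "example.net" is ignored) and "net" is a whole word
# (so ".network" is ignored).
PRODUCT_NAME = re.compile(r"(?<!\w)\.net\b", re.IGNORECASE)


def find_casing_errors(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, found_text) for each mis-cased product name."""
    errors = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PRODUCT_NAME.finditer(line):
            if match.group() != ".NET":
                errors.append((line_number, match.start() + 1, match.group()))
    return errors
```

A wrapper would fail the pull request whenever this list is non-empty. Even in this tiny form, the questions below still apply: a pattern that is slightly too broad could block publishing for content that is actually fine.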

Going through this list though, I would have a few concerns. What happens if this ‘check’ breaks and fails to run or returns incorrect results? Is publishing blocked because the check couldn’t pass? Since the check runs on pull requests, it can access content that isn’t published yet, which could include sensitive information. Are we confident it couldn’t accidentally leak that info? It needs some sort of access token or API key to work with our repository, is the creator storing and managing that authentication key safely (not checking it into the code, giving it only the minimal level of access needed)? If the creator leaves, can we update the app if needed or at least remove it from the repo to avoid blocking publishing?

A few more concerns than the previous two examples, so some oversight is needed, and the potential for blocking people’s work needs to be addressed.

A public API to let customers know about the latest content on Docs

This goes all the way back to our original RSS feature on MSDN: people like to be notified about updated or added content. It’s a common request, so what if a helpful person created a public API that scraped the site to discover new items, and then published that API out to the world? The app is only dealing with public information, so no risk there, but our previous examples were tools for other employees to use and this is targeting customers. My level of concern goes up just because of that. If our customers are hitting this API, then it is a big deal if it is throwing errors or just stops updating. It is accessible on the public internet, so it is a security risk.

This would be a production app, and the organization would need to treat it like one.

An app that generates content and publishes it automatically

Similar to the RSS tool, the TOC utility, and even the API discussed above, what if we decide to have a page listing all the recently updated content on a part of our site? If we built this as a utility that scanned your local files, figured out what was new and spit out a text file, that would be handy and minimal risk. All the regular publishing workflow would apply: an author would look at the list and put it into a new markdown file, and someone would have to approve it before publishing. It’s always tempting to get rid of time-wasting manual steps though, so what if the tool went right ahead and updated the file using a pull request against GitHub? Not bad, still needs approval, so I wouldn’t be too concerned. That pull request, though, could get in the way of keeping this file up to date, so why not simplify things with an automatic approval?

Every little step described above raises the level of risk, so even though the goal is the same, the level of concern could be widely different.

Setting the minimum bar

While only a few applications may be of high concern, there are some minimum standards we could set for any tool. If a tool would cause at least some negative impact when it stops working, then leaving it completely unmanaged would be a mistake.

As a baseline, I would suggest we track these applications, even if it is through some mutually agreed upon location like a list on a Wiki page or a SharePoint site. In that list, we could describe the app, give links/instructions to access it, and contact information if there is an issue. If there is source code involved, that source code should be in a company owned location, where it would be possible for someone to take it over if the original owner left the company or team. The same goes for hosting, even in the case of a Power BI dashboard or a OneNote file, putting it into the shared team location means that access can be controlled centrally as needed. If there are cloud resources involved, they should be in a team/company subscription, not a personal one. Finally, there should be some level of documentation, even if it just explains what the tool does, where the code is, and how to file an issue or contribute an update. I would not recommend hosting the code, documentation, or resources for these tools alongside the work of the engineering team though. If the source is in the same Azure DevOps instance as the engineering team’s work, there could be an assumption that they are ready and able to support this app if needed.
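A wiki page is enough for this list, but it can help to see the baseline written down as a structure. The sketch below is one hypothetical way to model an entry. The class name, field names, and the choice of which fields count as required are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    """One entry in the shared list of unofficial tools (illustrative fields)."""
    name: str
    description: str
    access_instructions: str   # links or steps to get to the tool
    contact: str               # who to reach if there is an issue
    source_location: str = ""  # company-owned repo; empty if no code is involved
    documentation: str = ""    # what it does, where the code is, how to file issues


def missing_baseline_items(record: ToolRecord) -> list[str]:
    """Return the names of baseline fields that are still blank."""
    required = ["description", "access_instructions", "contact", "documentation"]
    return [name for name in required if not getattr(record, name).strip()]
```

Calling `missing_baseline_items` on an entry gives a quick list of what still needs filling in before the tool is shared more widely.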

Messaging and setting expectations

To make any of this work, so that people can do this type of development, and it doesn’t result in chaos or a ton of unplanned issues, it is key to set the proper expectations. What level of support is the app creator promising, if any? If we have a link to “report an issue”, that implies someone is going to react to that issue. If the tool goes down, will someone work on getting it back up immediately, that day, or within a few weeks?

The organization needs to be clear about its position on this type of work. Is it allowed or even encouraged? Is it ok if it takes someone away from their ‘day job’? Do you need to ask first before rolling out a new project or only if it meets certain criteria? If the org does decide to encourage these projects, is there at least a need to document the tool or inform someone that you are building it? If an organization decides to completely forbid this type of work, it will still happen, but just more quietly.

As a tool creator, you need to be clear about what you are building, who it is for, and what your commitment level is. Are you open to feature requests? Do you have tons of time to spend on this or is it just something you built for yourself and people need to take it ‘as is’? Are you open to other people contributing?

Finally, the organization and the creators need to set the right expectations with all the users. If it’s an unofficial tool, is it ok to use it? Should they be afraid that it is unsupported or that they could get in trouble for using it?

Moving from unofficial to official

When a ‘citizen developed’ app becomes widely adopted and mission critical to the organization, it may ‘graduate’ to an official app. At this point, the business will expect the engineering team to take it on like any other project. This is a common, and almost inevitable situation, and one of the reasons why the engineers may react negatively to these side projects. If you knew that any random app, developed outside of your existing architecture by someone who doesn’t necessarily build production systems, could become your problem then you’d be a bit grumpy about them too. Taking on another person or team’s project is a complicated topic, and deserves its own article at some point, but the short answer is that it should not be done lightly. You should create a plan, shared with all parties, and allocate time to understand the tool and its current state. Next, document all the steps that are needed to bring this system into the engineering team’s world. This includes:

Moving the code and any resources.
Adjusting permissions to control who can update and release it.
Creating or moving a CI/CD workflow.
Creating, moving, or updating the documentation.

The intent of all these steps is to make the tool’s new status clear to everyone, including the original author. Without that clarity, the system will be in a vague in-between state where the engineering team is treating it like a production system, but the original contributors and users are still acting like nothing has changed.

Finally, as early in this ‘graduation’ process as possible, ensure everyone understands how this new project will impact the team’s capacity. No tool is without ongoing cost, even if it was created by someone else, so taking this on will reduce the time the team has to work on other priorities.

You need a test environment

Duncan Mackenzie — Sun, 21 Mar 2021 08:44:00 +0000

Depending on the size of your organization, you may be working in a world with anywhere from one (production) to many (staging, PPE, Dev) different environments, but often none of those are an accurate real-world test environment. For a personal or small-company site, you may only have a production environment, and any change you want to make is being made in production. Many huge companies/teams have complex rollout plans and staging environments but are often missing a true test environment that is a real match for production. Microsoft Docs, for example, has multiple production instances, so we can do safe rolling deployments, and we have staging and PPE (pre-production environments, often used for testing new code before it is even ready to be staged). We don’t have a live environment that matches production closely enough to be used for performance testing though: behind the same CDN, running on the same hardware, and without any authentication to corrupt the test results.

Using a secondary environment for performance testing

There are multiple reasons to run more than one environment including functional testing and safe rollout, but to do performance testing requires a much closer match to production than what I often see.

Let’s say you want to make a change to how your JavaScript is loaded. You can test this in your local dev environment, to make sure it seems to work and nothing is broken. You could do some performance testing, but your local environment is nothing like production in performance terms. Even something as small as domain lookup and SSL negotiation could make a seemingly great perf update have no real-world benefit or even do harm. So, you push your changes to the PPE or Dev environment, where it is up on a server that is hopefully close to production. PPE is only available behind your corporate authentication and doesn’t have a CDN in front of it though, so it isn’t really going to show you what will happen in production either. For functional testing, this is probably good, although you could still miss issues that will only happen in the actual configuration of the live site. For performance testing, PPE isn’t much better than testing on your local box.

A/B testing is often used to get some real-world data on a performance change, which is accurate and useful, if the testing framework itself doesn’t create a change in the experience. A client-side experimentation framework that modifies something about the page after load is not a valid comparison. An A/B test is still in production as well, so you are assuming that your test won’t have any major negative impact.

What you want is to create a complete copy of your environment including the CDN configuration, SSL certificates, DNS entries and more. If you pushed the live build to this environment, it should produce the exact same test results as your production site. You might need to send a bit of fake traffic to it, to get content cached by the CDN as it would be in production, but otherwise this is your test bed to try out any performance work. You are also going to want a rapid deployment cycle, because you will be making many tiny changes, testing, and then changing them again.
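The “bit of fake traffic” step can be as simple as requesting a list of pages a couple of times before you start measuring. Here is a minimal sketch using only the standard library. The function names and the two-pass default are assumptions, and it only warms whichever CDN edge location your requests happen to reach, so treat it as a starting point rather than a guarantee that content is cached.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_status(url: str) -> int:
    """Request a URL and return the HTTP status code.

    urlopen raises an exception for 4xx/5xx responses, so a broken
    page stops the warm-up instead of being silently skipped.
    """
    with urlopen(url, timeout=10) as response:
        return response.status


def warm_cache(
    urls: Iterable[str],
    passes: int = 2,
    fetch: Callable[[str], int] = fetch_status,
) -> dict[str, list[int]]:
    """Request each URL `passes` times so the CDN has a chance to cache it."""
    results: dict[str, list[int]] = {}
    for url in urls:
        results[url] = [fetch(url) for _ in range(passes)]
    return results
```

The `fetch` parameter is there so the loop can be exercised without touching the network; in normal use you would call `warm_cache` with just the list of URLs for your test site.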

Creating a test site for my personal blog

I like to play with performance tuning, and I use my personal site as a place to experiment with a ton of minor changes. Up until this weekend though, I had just been pushing my updates to production, because:

The impact of a mistake was low (I don’t get that much traffic), and
I’m arrogant enough to think my local tests would catch anything major before I deployed.

Neither of these is a valid reason. A personal site is part of your personal brand, a showcase for your skills. If you were a professional editor, and your personal site had some typos, would that give visitors the right impression? As for my lack of traffic, you never know when someone is going to be looking, and if you only get a small number of visitors, why risk giving one of them a bad experience? Finally, as I suspect most developers know, anyone can miss something and deploy a breaking change that they thought was completely safe.

With all of that in mind, I decided that I needed a proper test version of my site. My blog is a static site, generated using Hugo, hosted in Azure blob storage behind an Azure CDN profile, so to create a copy I needed to create a set of new resources in Azure and Azure DevOps, including:

a blob storage account, set up to host a static web site
a CDN endpoint, with the blob storage account as the origin
a custom domain (stage.duncanmackenzie.net) on that CDN endpoint
an SSL certificate for that custom domain (automatically created and managed by just flipping a switch on the CDN endpoint)
a CI pipeline that is triggered by a different branch in my blog repo and pushes the built site to the new blob storage account

Including a few delays, and looking up various things on Docs, this was up and running in about an hour. Everything I did could have been scripted/templated though, which would have had two benefits: I could spin up another environment with nearly no work at all, and it would be easier to keep the two instances exactly in sync. I didn’t do it though, because I don’t have any plans to need more of these environments and because I didn’t think of it until about halfway through the steps.

There is one slight difference between the two sites though: I added a bit of conditional logic to the robots.txt template in my blog theme to add a site-wide Disallow line if the site was generated for anything other than www.duncanmackenzie.net. This change makes robots.txt on stage look like this:

User-agent: *
Disallow: /

The Disallow statement instructs search engines not to crawl the site, which should keep it from polluting my search engine results and competing with the production site. I could go farther and add a noindex flag to every page, but I wanted to keep the changes between the sites to the absolute minimum.
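The real change lives in a Hugo template, which isn’t reproduced here, but the conditional itself is simple enough to sketch in a few lines of Python. The function name `render_robots_txt` is made up for this illustration, and since the rest of the production file isn’t shown, the sketch emits only the User-agent line for production.

```python
from urllib.parse import urlparse

PRODUCTION_HOST = "www.duncanmackenzie.net"


def render_robots_txt(base_url: str) -> str:
    """Return robots.txt content for a site generated with the given base URL."""
    host = urlparse(base_url).hostname
    lines = ["User-agent: *"]
    if host != PRODUCTION_HOST:
        # Any non-production build tells crawlers to stay away entirely.
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"
```

Keying the rule off “anything other than production” rather than “is stage” means any future copy of the site is blocked from crawling by default.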

One day later, and having this site has been amazingly freeing. Being able to iterate through random thoughts and experiments with no fear of breaking the live site led to about ten deployments in a couple of hours. I was able to try:

putting critical CSS inline on the homepage (a success!),
moving my tiny bit of JavaScript inline (slower!),
switching to only system fonts (minor benefit, but I didn’t like the look of it), and
then finally rewriting my template to get rid of the need for JavaScript altogether (a 25% improvement in time to Visually Complete).

A different approach using Cloudflare Workers

Matt Hobbs wrote up this great guide to an alternate way to do some quick performance testing on your site, without creating a secondary environment, and I love it. For my blog, a secondary environment was quick and easy to set up, but doing the same for Microsoft Docs (or for a site you don’t own) would be much harder. Following this path, you use workers to manipulate the responses coming from the live production site. You write scripts to apply your changes, and then test the result.

You should definitely check it out, and I may end up using it for Docs, but it is worth noting that the final result is not a true copy of your environment. In my case, I’m using the Azure CDN for example, but this setup would be going through Cloudflare. As a way to test a variety of changes without a lot of infrastructure though, this is a great option.

Do you really need a test environment?

There is a hierarchy of needs in terms of what is necessary for your web project. I believe a full-blown production copy site for testing is useful, but for a production site I would put it lower in priority than a safe deployment process.

Outside of work, I’ve often seen WordPress sites that had no way to test any change they made, including upgrading the entire version of their software. At the time, I don’t think I was as disturbed by this as I am now, but it is a terrible situation to be in. There are two different angles to this: reliability and workflow. If you need to update your homepage, is there any way to get it running where you can show people and review it? If you need to update a critical plug-in or the version of WordPress, can you do it in a staging site, confirm it works 100% and then swap that with the production site… all with the ability to roll back if something is broken? If you can’t do either of these things, you should fix that before you worry about a mirror site for testing performance updates, but keep it on your roadmap!