The good, bad and ugly about Canary functions

The Good

Synthetic monitoring in CloudWatch went into preview on 25.11.2019 and was released in 2020.

Synthetic monitoring is a way to test the availability of a service from the user's point of view. You schedule a script, at intervals as short as 1 minute, to measure response time in milliseconds, check status codes, log in to a specific service, check for broken video URLs, and much more.

In my latest project I even plan to use it for DynamoDB, DocumentDB and RDS monitoring. Almost anything is possible, from simple GET, PUT and POST requests to crawling a website, testing for broken URLs, UI testing, A/B testing, and so on.

In this article I will present a non-production example of a canary function that tests a RESTful API.

Let us begin with the REST API. First we define an async function, loadBlueprint, which loads each URL from a list. The page object provides an interface for interacting with a single tab in the headless Chromium browser. We define a list with one URL, hard-code the username and password, and use the login function to get a token from Cognito. The token is then used to authenticate the user's request to the RESTful API: we add an Authorization header with the value 'Bearer ' + bearerToken. The loadUrl function contains the logic that determines whether the service is healthy.

1.1. Function example

	const loadBlueprint = async function() {
	  // Log in to Cognito and use the returned ID token as the bearer token
	  const response = await login(USERNAME, PASSWORD);
	  const bearerToken = response.AuthenticationResult.IdToken;
	  const urls = ['https://probkotestov.io/api-users/setting'];

	  // Set screenshot option
	  const takeScreenshot = true;

	  /* Disabling default step screenshots taken during Synthetics.executeStep() calls.
	   * The step is used to publish metrics on the time taken to load DOM content, but
	   * screenshots are taken outside executeStep to allow the page to load completely with domcontentloaded.
	   * You can change it to load, networkidle0 or networkidle2, depending on what works best for you.
	   */
	  syntheticsConfiguration.disableStepScreenshots();
	  syntheticsConfiguration.setConfig({
	    continueOnStepFailure: true,
	    includeRequestHeaders: true, // Enable if headers should be displayed in HAR
	    includeResponseHeaders: true, // Enable if headers should be displayed in HAR
	    restrictedHeaders: [], // Values of these headers will be redacted from logs and reports
	    restrictedUrlParameters: [], // Values of these URL parameters will be redacted from logs and reports
	  });

	  // Open a tab in the headless browser and attach the headers to every request
	  let page = await synthetics.getPage();
	  await page.setExtraHTTPHeaders({
	    'accept': 'application/json',
	    'Authorization': 'Bearer ' + bearerToken,
	  });
	  for (const url of urls) {
	    await loadUrl(page, url, takeScreenshot);
	  }
	};

The login function is simple. The Cognito client ID is specified in the call, and the username and password are hard-coded. For production I suggest using Secrets Manager. Canaries ship with a version of the AWS SDK, so obtaining a secret is straightforward once you add the GetSecretValue permission to the canary role. KMS permissions and a key resource-based policy may also be required, because best practice is to encrypt secrets.

	// Authenticate against Cognito and return the raw initiateAuth response.
	// Errors propagate to the caller and fail the canary run.
	async function login(email, password) {
	  const cognito = new AWS.CognitoIdentityServiceProvider();
	  return cognito.initiateAuth({
	    AuthFlow: 'USER_PASSWORD_AUTH',
	    ClientId: 'XXXXXXXXXXXXXXXXXXX',
	    AuthParameters: {
	      USERNAME: email,
	      PASSWORD: password,
	    },
	  }).promise();
	}
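
The Secrets Manager suggestion above could look roughly like the sketch below. It is a minimal sketch under assumptions, not the code used in this article: the secret name `canary/api-user` and its JSON keys `username` and `password` are hypothetical, and the client is passed in as a parameter so the function can be tried outside a canary. In a canary you would pass `new AWS.SecretsManager()` from the bundled `aws-sdk` (v2), whose `getSecretValue(...).promise()` call returns the secret in `SecretString`.

```javascript
// Sketch: read the canary's username and password from Secrets Manager
// instead of hard-coding them. The secret name and JSON keys are hypothetical.
const SECRET_ID = 'canary/api-user'; // assumption: replace with your own secret name

// secretsManager is expected to behave like AWS.SecretsManager from aws-sdk v2,
// i.e. getSecretValue(params).promise() resolves to an object with SecretString.
async function getCredentials(secretsManager, secretId = SECRET_ID) {
  const result = await secretsManager.getSecretValue({ SecretId: secretId }).promise();
  if (!result.SecretString) {
    throw new Error(`Secret ${secretId} has no SecretString`);
  }
  // Assumption: the secret is stored as JSON like {"username":"...","password":"..."}
  const { username, password } = JSON.parse(result.SecretString);
  if (!username || !password) {
    throw new Error(`Secret ${secretId} is missing username or password`);
  }
  return { username, password };
}
```

Inside loadBlueprint this would replace the hard-coded constants: `const { username, password } = await getCredentials(new AWS.SecretsManager());` followed by `login(username, password)`.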


We can execute multiple steps, such as GET, POST and PUT requests, to verify that the service is available. This also tests the availability of the services that the main service depends on, such as the database (enabling services). Below is the loadUrl function, which performs an authenticated GET request against the URL, returns once the DOM content has loaded, and takes a screenshot (not necessary for a REST API, but a nice feature). The navigation times out after 30 seconds, which is adjustable.

	const loadUrl = async function(page, url, takeScreenshot) {
	  let stepName = null;
	  let domcontentloaded = false;

	  try {
	    stepName = new URL(url).hostname;
	  } catch (error) {
	    const errorString = `Error parsing url: ${url}. ${error}`;
	    log.error(errorString);
	    /* If we fail to parse the URL, don't emit a metric with a stepName based on it.
	       It may not be a legal CloudWatch metric dimension name and we may not have an alarm
	       set up on the malformed URL stepName. Instead, fail this step, which will
	       show up in the logs, fail the overall canary and alarm on the overall canary
	       success rate.
	    */
	    throw error;
	  }

	  await synthetics.executeStep(stepName, async function() {
	    const sanitizedUrl = syntheticsLogHelper.getSanitizedUrl(url);

	    /* You can customize the wait condition here. For instance, use 'networkidle2' or 'networkidle0' to load the page completely.
	       networkidle0: Navigation is successful when the page has had no network requests for half a second. This might never happen if the page is constantly loading multiple resources.
	       networkidle2: Navigation is successful when the page has no more than 2 network requests for half a second.
	       domcontentloaded: Fired as soon as the page DOM has been loaded, without waiting for resources to finish loading. Can be combined with an explicit wait afterwards.
	    */
	    const response = await page.goto(url, { waitUntil: ['domcontentloaded'], timeout: 30000 });
	    if (response) {
	      domcontentloaded = true;
	      const status = response.status();
	      const statusText = response.statusText();

	      log.info(`Response from url: ${sanitizedUrl} Status: ${status} Status Text: ${statusText}`);

	      // If the response status code is not a 2xx success code
	      if (status < 200 || status > 299) {
	        throw new Error(`Failed to load url: ${sanitizedUrl} ${status} ${statusText}`);
	      }
	    } else {
	      const logNoResponseString = `No response returned for url: ${sanitizedUrl}`;
	      log.error(logNoResponseString);
	      throw new Error(logNoResponseString);
	    }
	  });

	  // Wait for 15 seconds to let the page load fully before taking the screenshot.
	  // A plain timer is used because page.waitFor is deprecated in newer Puppeteer versions.
	  if (domcontentloaded && takeScreenshot) {
	    await new Promise((resolve) => setTimeout(resolve, 15000));
	    await synthetics.takeScreenshot(stepName, 'loaded');
	    await resetPage(page);
	  }
	};
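
The GET above goes through the browser page. For the POST and PUT requests mentioned earlier, one option is an HTTP step rather than a page navigation. The sketch below only builds the request options for such a step. `buildRequestOptions` and `isSuccessStatus` are hypothetical helpers, not part of the Synthetics library, and the sketch assumes a Synthetics Node.js runtime that provides `synthetics.executeHttpStep(stepName, requestOptions, callback)`, so check your runtime version before relying on it.

```javascript
const { URL } = require('url');

// Hypothetical helper: turn a URL, HTTP method, bearer token and optional JSON
// payload into options in the style of Node's https.request, plus a body field.
function buildRequestOptions(url, method, bearerToken, payload) {
  const parsed = new URL(url);
  const body = payload === undefined ? '' : JSON.stringify(payload);
  const headers = {
    accept: 'application/json',
    Authorization: 'Bearer ' + bearerToken,
  };
  if (body) {
    headers['Content-Type'] = 'application/json';
    headers['Content-Length'] = Buffer.byteLength(body);
  }
  const defaultPort = parsed.protocol === 'https:' ? 443 : 80;
  return {
    hostname: parsed.hostname,
    port: parsed.port ? Number(parsed.port) : defaultPort,
    protocol: parsed.protocol,
    path: parsed.pathname + parsed.search,
    method: method.toUpperCase(),
    headers,
    body,
  };
}

// Same 2xx rule as in loadUrl, for use inside a step callback.
function isSuccessStatus(statusCode) {
  return statusCode >= 200 && statusCode <= 299;
}
```

In a canary, the options could then be passed along the lines of `await synthetics.executeHttpStep('update-setting', buildRequestOptions(url, 'PUT', bearerToken, { theme: 'dark' }), callback)`, where the callback rejects when `isSuccessStatus(res.statusCode)` is false. The step name and payload here are placeholders.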


2. The bad and ugly — COSTS:

2.1. Assumption: We will deploy one canary with a 1-minute execution interval:

NumberOfExecutions [60 canary runs per hour * 24 hours per day * 30 days per month] = 60 * 24 * 30 = 43200

PricingOneRunFrankfurt = $0.0016

Alarm pricing = $0.10 per alarm

The total cost for one canary with one alarm is 43200 * $0.0016 + $0.10 = $69.22 per month. I would suggest being careful with canaries, because the cost goes up with each canary you add.
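
The arithmetic above can be reproduced, and tried with other intervals, in a few lines. This is a minimal sketch: `monthlyCanaryCost` is a hypothetical helper, the prices are the figures quoted in this article, and a 30-day month with one alarm per canary is assumed. Charges for the resources behind a canary, such as storage and logs, may apply on top and are not included.

```javascript
// Prices quoted in the article (Frankfurt); check current pricing before relying on them.
const PRICE_PER_RUN_USD = 0.0016;
const PRICE_PER_ALARM_USD = 0.1;

// Estimate the monthly cost of canaries that run every `intervalMinutes`,
// each with one alarm, assuming a 30-day month.
function monthlyCanaryCost(intervalMinutes, canaryCount = 1, daysInMonth = 30) {
  const runsPerCanary = (60 / intervalMinutes) * 24 * daysInMonth;
  const perCanary = runsPerCanary * PRICE_PER_RUN_USD + PRICE_PER_ALARM_USD;
  return Number((perCanary * canaryCount).toFixed(2));
}

console.log(monthlyCanaryCost(1));     // 69.22, the figure from the text
console.log(monthlyCanaryCost(5));     // the same canary every 5 minutes
console.log(monthlyCanaryCost(1, 10)); // ten canaries at a 1-minute interval
```

Lengthening the interval from 1 to 5 minutes cuts the run charges to a fifth, which is usually the first knob to turn before removing canaries.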

3. How it looks in the console:

The whole code is available below.

	const { URL } = require('url');
	const synthetics = require('Synthetics');
	const log = require('SyntheticsLogger');
	const syntheticsConfiguration = synthetics.getConfiguration();
	const syntheticsLogHelper = require('SyntheticsLogHelper');
	const AWS = require('aws-sdk');
	AWS.config.update({
	  region: 'eu-central-1',
	});
	// Username and password should be fetched with the Secrets Manager GetSecretValue API call.
	// Don't use this approach in production
	const USERNAME = 'martin.nanchev@example.com';
	const PASSWORD = 'probkoTesto123!';

	// Performs the GET request for each URL
	// The request is authenticated using the bearer token
	const loadBlueprint = async function() {
	  const response = await login(USERNAME, PASSWORD);
	  const bearerToken = response.AuthenticationResult.IdToken;
	  const urls = ['https://probkotestov.io/api-users/setting'];

	  // Set screenshot option
	  const takeScreenshot = true;

	  /* Disabling default step screenshots taken during Synthetics.executeStep() calls.
	   * The step is used to publish metrics on the time taken to load DOM content, but
	   * screenshots are taken outside executeStep to allow the page to load completely with domcontentloaded.
	   * You can change it to load, networkidle0 or networkidle2, depending on what works best for you.
	   */
	  syntheticsConfiguration.disableStepScreenshots();
	  syntheticsConfiguration.setConfig({
	    continueOnStepFailure: true,
	    includeRequestHeaders: true, // Enable if headers should be displayed in HAR
	    includeResponseHeaders: true, // Enable if headers should be displayed in HAR
	    restrictedHeaders: [], // Values of these headers will be redacted from logs and reports
	    restrictedUrlParameters: [], // Values of these URL parameters will be redacted from logs and reports
	  });
	  // Start a headless Chrome browser tab
	  // Set the required headers
	  let page = await synthetics.getPage();
	  await page.setExtraHTTPHeaders({
	    'accept': 'application/json',
	    'Authorization': 'Bearer ' + bearerToken,
	  });
	  // Load each URL
	  for (const url of urls) {
	    await loadUrl(page, url, takeScreenshot);
	  }
	};

	// Reset the page in between
	const resetPage = async function(page) {
	  try {
	    await page.goto('about:blank', { waitUntil: ['load', 'networkidle0'], timeout: 30000 });
	  } catch (ex) {
	    synthetics.addExecutionError('Unable to open a blank page ', ex);
	  }
	};

	// For each URL, waits until the DOM content is loaded and takes a screenshot
	// If it fails, the error is thrown
	const loadUrl = async function(page, url, takeScreenshot) {
	  let stepName = null;
	  let domcontentloaded = false;

	  try {
	    stepName = new URL(url).hostname;
	  } catch (error) {
	    const errorString = `Error parsing url: ${url}. ${error}`;
	    log.error(errorString);
	    /* If we fail to parse the URL, don't emit a metric with a stepName based on it.
	       It may not be a legal CloudWatch metric dimension name and we may not have an alarm
	       set up on the malformed URL stepName. Instead, fail this step, which will
	       show up in the logs, fail the overall canary and alarm on the overall canary
	       success rate.
	    */
	    throw error;
	  }

	  await synthetics.executeStep(stepName, async function() {
	    const sanitizedUrl = syntheticsLogHelper.getSanitizedUrl(url);

	    /* You can customize the wait condition here. For instance, use 'networkidle2' or 'networkidle0' to load the page completely.
	       networkidle0: Navigation is successful when the page has had no network requests for half a second. This might never happen if the page is constantly loading multiple resources.
	       networkidle2: Navigation is successful when the page has no more than 2 network requests for half a second.
	       domcontentloaded: Fired as soon as the page DOM has been loaded, without waiting for resources to finish loading. Can be combined with an explicit wait afterwards.
	    */
	    const response = await page.goto(url, { waitUntil: ['domcontentloaded'], timeout: 30000 });
	    if (response) {
	      domcontentloaded = true;
	      const status = response.status();
	      const statusText = response.statusText();

	      log.info(`Response from url: ${sanitizedUrl} Status: ${status} Status Text: ${statusText}`);

	      // If the response status code is not a 2xx success code
	      if (status < 200 || status > 299) {
	        throw new Error(`Failed to load url: ${sanitizedUrl} ${status} ${statusText}`);
	      }
	    } else {
	      const logNoResponseString = `No response returned for url: ${sanitizedUrl}`;
	      log.error(logNoResponseString);
	      throw new Error(logNoResponseString);
	    }
	  });

	  // Wait for 15 seconds to let the page load fully before taking the screenshot.
	  // A plain timer is used because page.waitFor is deprecated in newer Puppeteer versions.
	  if (domcontentloaded && takeScreenshot) {
	    await new Promise((resolve) => setTimeout(resolve, 15000));
	    await synthetics.takeScreenshot(stepName, 'loaded');
	    await resetPage(page);
	  }
	};

	exports.handler = async () => {
	  return await loadBlueprint();
	};

	// Get the JWT token from Cognito
	// Errors propagate to the caller and fail the canary run
	async function login(email, password) {
	  const cognito = new AWS.CognitoIdentityServiceProvider();
	  return cognito.initiateAuth({
	    AuthFlow: 'USER_PASSWORD_AUTH',
	    ClientId: 'XXXXXXXXXXXXXXXXXX',
	    AuthParameters: {
	      USERNAME: email,
	      PASSWORD: password,
	    },
	  }).promise();
	}


**Summary:** AWS canaries let you run tests and find issues with a system before your users do. They can surface a problem sooner than CloudWatch metrics and they show you its impact. The problem with CloudWatch monitoring alone is that when you receive a DiskQueueDepth alarm, you don't know how the spike affects end users. Canaries give you better visibility into the application.

Be careful with the number of canaries and the interval between the runs.

IMPORTANT NOTES: The canary requires a role with permissions to access the AWS services it uses. In production I deployed canaries in a VPC with DNS resolution and support enabled, which allows me to determine the health of private endpoints.
