<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yogesh Manware</title>
    <description>The latest articles on DEV Community by Yogesh Manware (@ynmanware).</description>
    <link>https://dev.to/ynmanware</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F414437%2F33f8d6aa-64df-47e8-89f0-758af34683b0.jpg</url>
      <title>DEV Community: Yogesh Manware</title>
      <link>https://dev.to/ynmanware</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ynmanware"/>
    <language>en</language>
    <item>
      <title>REST API Testing With CucumberJs (BDD)</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Wed, 16 Sep 2020 04:48:32 +0000</pubDate>
      <link>https://dev.to/ynmanware/bdd-for-rest-api-using-cucumber-js-2pol</link>
      <guid>https://dev.to/ynmanware/bdd-for-rest-api-using-cucumber-js-2pol</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;BDD is a very powerful tool for both non-technical and technical people. &lt;/p&gt;

&lt;p&gt;In this article, I will demonstrate how to set up and run Cucumber to test REST APIs.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is BDD really?
&lt;/h1&gt;

&lt;p&gt;BDD is short for Behaviour-Driven Development.&lt;br&gt;
BDD is a way for software teams to work that closes the gap between business and technical people by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encouraging collaboration across roles to build shared understanding of the problem to be solved&lt;/li&gt;
&lt;li&gt;Working in rapid, small iterations to increase feedback and the flow of value&lt;/li&gt;
&lt;li&gt;Producing system documentation that is automatically checked against the system’s behaviour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We do this by focusing collaborative work around concrete, real-world examples that illustrate how we want the system to behave. We use those examples to guide us from concept through to implementation.&lt;/p&gt;
&lt;h1&gt;
  
  
  What is Cucumber?
&lt;/h1&gt;

&lt;p&gt;Cucumber is a tool that supports Behaviour-Driven Development (BDD). Cucumber reads executable specifications written in plain text and validates that the software does what those specifications say. The specifications consist of multiple examples, or scenarios. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;Scenario&lt;/span&gt; &lt;span class="nx"&gt;Outline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;contact&lt;/span&gt;
    &lt;span class="nx"&gt;Given&lt;/span&gt; &lt;span class="nx"&gt;A&lt;/span&gt; &lt;span class="nx"&gt;contact&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;When&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="nx"&gt;send&lt;/span&gt; &lt;span class="nx"&gt;POST&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;directory&lt;/span&gt;
    &lt;span class="nx"&gt;Then&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="kd"&gt;get&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(This scenario is written using the Gherkin grammar.)&lt;br&gt;
Each scenario is a list of steps for Cucumber to work through. Cucumber verifies that the software conforms to the specification and generates a report indicating ✅ success or ❌ failure for each scenario. &lt;/p&gt;
&lt;h2&gt;
  
  
  What is Gherkin?
&lt;/h2&gt;

&lt;p&gt;Gherkin is a set of grammar rules that makes plain text structured enough for Cucumber to understand. Gherkin documents are stored in .feature text files and are typically versioned in source control alongside the software. &lt;/p&gt;
&lt;h2&gt;
  
  
  How does Gherkin's .feature file glue to your code?
&lt;/h2&gt;

&lt;p&gt;We write step definitions for each step from Gherkin's feature file. Step definitions connect Gherkin steps to programming code. A step definition carries out the action that should be performed by the step. So step definitions hard-wire the specification to the implementation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Feature
&lt;/h3&gt;

&lt;p&gt;A feature is a group of related scenarios. As such, it will test many related things in your application. Ideally, the features in the Gherkin files closely map onto the features in the application, hence the name.&lt;br&gt;
Scenarios are comprised of steps, which are ordered in a specific manner:&lt;/p&gt;

&lt;p&gt;Given – These steps are used to set up the initial state before you do your test&lt;br&gt;
When – These steps are the actual test that is to be executed&lt;br&gt;
Then – These steps are used to assert on the outcome of the test&lt;/p&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;I have created a simple REST API to manage a directory. I can create a contact, modify it, read it and delete it. I have written BDD tests to make sure all features work as designed. &lt;/p&gt;
&lt;h3&gt;
  
  
  Setup NodeJs Project
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Install Following Dependencies
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; "dependencies": {
    "axios": "^0.20.0"
  },
  "devDependencies": {
    "cucumber": "^6.0.5",
    "cucumber-html-reporter": "^5.2.0"
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Create directory.feature file at src/features
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@directory-service
Feature: Directory Service
  In order to manage directory
  As a developer
  I want to make sure CRUD operations through REST API works fine

  Scenario Outline: create a contact
    Given A contact &amp;lt;request&amp;gt;
    When I send POST request to /directory
    Then I get response code 201

    Examples:
      | request                                                                                              |
      | {"id":99,"name":"Dwayne Klocko","email":"Rene30@hotmail.com","phoneNumber":"1-876-420-9890"}          |
      | {"id":7,"name":"Ian Weimann DVM","email":"Euna_Bergstrom@hotmail.com","phoneNumber":"(297) 962-1879"} |

  Scenario Outline: modify contact
    Given The contact with &amp;lt;id&amp;gt; exist
    When I send PATCH request with a &amp;lt;secondaryPhoneNumber&amp;gt; to /directory
    Then I get response code 200

    Examples:
      | id | secondaryPhoneNumber                       |
      | 99 | {"secondaryPhoneNumber": "(914) 249-3519"} |
      | 7  | {"secondaryPhoneNumber": "788.323.7782"}   |

  Scenario Outline: get contact
    Given The contact with &amp;lt;id&amp;gt; exist
    When I send GET request to /directory
    Then I receive &amp;lt;response&amp;gt;

    Examples:
      | id | response                                      |
      | 99 | {"id":99,"name":"Dwayne Klocko","email":"Rene30@hotmail.com","phoneNumber":"1-876-420-9890","secondaryPhoneNumber": "(914) 249-3519"}         |
      | 7  | {"id":7,"name":"Ian Weimann DVM","email":"Euna_Bergstrom@hotmail.com","phoneNumber":"(297) 962-1879", "secondaryPhoneNumber": "788.323.7782"} |

  Scenario Outline: delete contact
    Given The contact with &amp;lt;id&amp;gt; exist
    When I send DELETE request to /directory
    Then I get response code 200

    Examples:
      | id |
      | 99 |
      | 7  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Create directory.js in src/steps
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const {Given, When, Then, AfterAll, After} = require('cucumber');
const assert = require('assert').strict
const restHelper = require('./../util/restHelper');

Given('A contact {}', function (request) {
    this.context['request'] = JSON.parse(request);
});

When('I send POST request to {}', async function (path) {
    this.context['response'] = await restHelper.postData(`${process.env.SERVICE_URL}${path}`, this.context['request']);
})

Then('I get response code {int}', async function (code) {
    assert.equal(this.context['response'].status, code);
});

When('I send PATCH request with a {} to {}', async function (phoneNumberPayload, path) {
    const response = await restHelper.patchData(`${process.env.SERVICE_URL}${path}/${this.context['id']}`, JSON.parse(phoneNumberPayload));
    this.context['response'] = response;
})

Given('The contact with {int} exist', async function (id) {
    this.context['id'] = id;
})

When('I send GET request to {}', async function (path) {
    const response = await restHelper.getData(`${process.env.SERVICE_URL}${path}/${this.context['id']}`);
    this.context['response'] = response;
})

Then(/^I receive (.*)$/, async function (expectedResponse) {
    assert.deepEqual(this.context['response'].data, JSON.parse(expectedResponse));
})

When('I send DELETE request to {}', async function (path) {
    const response = await restHelper.deleteData(`${process.env.SERVICE_URL}${path}/${this.context['id']}`);
    this.context['response'] = response;
})

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Create a service that does actual REST calls
&lt;/h3&gt;

&lt;p&gt;You can use any HTTP client; I used axios. &lt;/p&gt;
&lt;h3&gt;
  
  
  To run the test and generate report
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i
"./node_modules/.bin/cucumber-js -f json:cucumber.json src/features/ -r src/steps/ --tags '@directory-service'"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In this command, parallel is used to run three scenarios concurrently. &lt;/p&gt;

&lt;p&gt;That's all. I mean that is the gist of BDD with Cucumber and Gherkin. &lt;/p&gt;

&lt;p&gt;Here is a sample cucumber report. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwfy1lw5uuwi9vzs4qqzk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwfy1lw5uuwi9vzs4qqzk.png" alt="Alt Text" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Sharing Data Between Steps
&lt;/h2&gt;

&lt;p&gt;You will most likely need to share data between steps. Cucumber provides an isolated context for each scenario, exposed to the hooks and steps as &lt;code&gt;this&lt;/code&gt;, known as the World. The default World constructor is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function World({ attach, log, parameters }) {
  this.attach = attach
  this.log = log
  this.parameters = parameters
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: you must not use arrow functions in steps if you want to access the World as &lt;code&gt;this&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const {setWorldConstructor} = require("cucumber");

if (!process.env.DIRECTORY_SERVICE_URL) {
    require('dotenv-flow').config();
}

class CustomWorld {
    constructor({parameters}) {
        this.context = {};
    }
}
setWorldConstructor(CustomWorld);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Following are some handy libraries that I used during this demo. &lt;/p&gt;

&lt;h1&gt;
  
  
  .env file
&lt;/h1&gt;

&lt;p&gt;I have used the dotenv-flow npm package to store environment-specific variables. &lt;br&gt;
Refer: &lt;a href="https://github.com/kerimdzhanov/dotenv-flow" rel="noopener noreferrer"&gt;https://github.com/kerimdzhanov/dotenv-flow&lt;/a&gt; &lt;/p&gt;

&lt;h1&gt;
  
  
  Setup Mock REST API
&lt;/h1&gt;

&lt;p&gt;I have set up a mock REST API using the json-server npm package.&lt;br&gt;
Refer: &lt;a href="https://github.com/typicode/json-server" rel="noopener noreferrer"&gt;https://github.com/typicode/json-server&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;For Cucumberjs - &lt;a href="https://github.com/cucumber/cucumber-js" rel="noopener noreferrer"&gt;https://github.com/cucumber/cucumber-js&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Source Code - &lt;a href="https://github.com/ynmanware/nodejs-bdd/tree/v1.0" rel="noopener noreferrer"&gt;https://github.com/ynmanware/nodejs-bdd/tree/v1.0&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;In summary, BDD lays the groundwork for collaboration among all stakeholders. Using tags, you can run different sets of BDD suites for DEV, SIT, UAT and even PROD through build pipelines. This setup can be really effective with a CI/CD practice: it can speed up the development and deployment cycle while keeping basic quality checks in place.  &lt;/p&gt;

</description>
      <category>javascript</category>
      <category>node</category>
      <category>bdd</category>
      <category>cucumberjs</category>
    </item>
    <item>
      <title>setImmediate() vs setTimeout() vs process.nextTick()</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Sat, 04 Jul 2020 13:44:20 +0000</pubDate>
      <link>https://dev.to/ynmanware/setimmediate-settimeout-and-process-nexttick-3mfd</link>
      <guid>https://dev.to/ynmanware/setimmediate-settimeout-and-process-nexttick-3mfd</guid>
      <description>&lt;p&gt;NodeJS provides three ways to call asynchronous functions &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;setImmediate()&lt;/li&gt;
&lt;li&gt;setTimeout()&lt;/li&gt;
&lt;li&gt;process.nextTick()&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I am writing this blog to explain the basic and advanced usage of these functions. &lt;/p&gt;

&lt;h2&gt;
  
  
  setImmediate()
&lt;/h2&gt;

&lt;p&gt;Use setImmediate() when you want to execute some function asynchronously, but as soon as possible and after finishing the current block. &lt;/p&gt;

&lt;p&gt;When you run the following code, the callback function passed to setImmediate() is executed immediately after the last line in this code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

setImmediate(() =&amp;gt; {
        console.info('2. Execution of Callback Function');
    });
    console.info('1. Execution of Main Module Ends');


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Console&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Execution of Main Module Ends&lt;/li&gt;
&lt;li&gt;Execution of Callback Function&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  setTimeout()
&lt;/h2&gt;

&lt;p&gt;Use setTimeout() when you want to execute some function asynchronously, after a specified delay and after finishing the current block. &lt;/p&gt;

&lt;p&gt;When you execute this code, the callback function passed to setTimeout() is invoked after the last line in this code and after the specified delay. &lt;/p&gt;

&lt;p&gt;There is one important point though: it is not guaranteed that the callback passed to setTimeout() is invoked exactly after the specified delay. The reason is explained later on this page. &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

   setTimeout(() =&amp;gt; {
        console.info('2. Execution of Timeout Callback Function');
    }, 10);
    console.info('1. Execution of Main Module Ends');


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Console&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Execution of Main Module Ends&lt;/li&gt;
&lt;li&gt;Execution of Timeout Callback Function&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;So far so good. The above information is enough for basic usage of these functions. &lt;/p&gt;

&lt;p&gt;Let's dive deep into the NodeJS event loop to see how these functions differ from each other and from process.nextTick().&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpnoc5gamugo0sdj6f4lb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpnoc5gamugo0sdj6f4lb.jpg" alt="Alt EventLoop"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Phases Overview (from NodeJS documentation)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Timers
&lt;/h4&gt;

&lt;p&gt;In this phase, all timers and intervals are registered and tracked. Node holds a queue of timers and goes through all active timers one by one. As soon as a timer expires, its callback function is added to the queue that is executed in the Poll phase.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the reason callback is not executed immediately.  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  2. Pending Callbacks
&lt;/h4&gt;

&lt;p&gt;Executes I/O callbacks deferred to the next loop iteration.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Idle, Prepare
&lt;/h4&gt;

&lt;p&gt;Only used internally.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Poll
&lt;/h4&gt;

&lt;p&gt;Most of the execution is done in this phase. This is where the JavaScript code you have written in your file executes. &lt;br&gt;
Node will go through the queue and execute all callbacks synchronously from oldest to newest until the queue is empty.&lt;/p&gt;

&lt;p&gt;It also retrieves new I/O events; executes I/O related callbacks (almost all with the exception of close callbacks, the ones scheduled by timers, and setImmediate()); node will block here when appropriate.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Check
&lt;/h4&gt;

&lt;p&gt;setImmediate() callbacks are invoked here. &lt;/p&gt;

&lt;h4&gt;
  
  
  6. Close Callbacks
&lt;/h4&gt;

&lt;p&gt;some close callbacks, e.g. socket.on('close', ...)&lt;/p&gt;

&lt;p&gt;Note that each phase has its own queue that gets executed before Node moves on to the next phase. One iteration or cycle of this loop is known as a 'tick'.&lt;/p&gt;

&lt;p&gt;Now let's switch back to our main topic. &lt;/p&gt;

&lt;h4&gt;
  
  
  setImmediate() vs setTimeout()
&lt;/h4&gt;

&lt;p&gt;setImmediate() and setTimeout() are similar, but behave in different ways depending on when they are called.&lt;/p&gt;

&lt;p&gt;setImmediate() is designed to execute a script once the current Poll phase completes. Execution of this callback takes place in Check phase (5).&lt;/p&gt;

&lt;p&gt;setTimeout() schedules a callback function to be run after a minimum threshold in ms has elapsed. The expiry of timer is checked in Timer phase (1) and execution of callback happens in Poll phase (4). &lt;/p&gt;

&lt;h2&gt;
  
  
  process.nextTick()
&lt;/h2&gt;

&lt;p&gt;As per the NodeJS documentation, process.nextTick() is not technically part of the event loop. Instead, the nextTickQueue is processed after the current operation is completed, regardless of the current phase of the event loop.&lt;/p&gt;

&lt;h4&gt;
  
  
  process.nextTick() vs setImmediate()
&lt;/h4&gt;

&lt;p&gt;We have two calls that are similar as far as users are concerned, but their names are confusing.&lt;/p&gt;

&lt;p&gt;process.nextTick() fires immediately on the same phase&lt;br&gt;
setImmediate() fires on the following iteration or 'tick' of the event loop&lt;br&gt;
In essence, the names should be swapped. process.nextTick() fires more immediately than setImmediate(), but this is an artifact of the past which is unlikely to change.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;From NodeJS documentation: -&amp;gt; We recommend developers use setImmediate() in all cases because it's easier to reason about.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is an example putting all of these functions together: &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

    setTimeout(() =&amp;gt; {
        console.info('4. Execution of Timeout Callback Function'); 
    }, 10);
    setImmediate(() =&amp;gt; {
        console.info('3. Execution of Immediate Callback Function'); 
    });
    process.nextTick(() =&amp;gt; {
        console.info('2. Execution of NextTick Callback Function');
    })
    console.info('1. Execution of Main Module Ends');


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Console&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Execution of Main Module Ends&lt;/li&gt;
&lt;li&gt;Execution of NextTick Callback Function&lt;/li&gt;
&lt;li&gt;Execution of Immediate Callback Function&lt;/li&gt;
&lt;li&gt;Execution of Timeout Callback Function&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Refer to the NodeJS documentation for more information: &lt;a href="https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/" rel="noopener noreferrer"&gt;https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>eventloop</category>
    </item>
    <item>
      <title>AWS Elasticsearch - Reindexing With Zero Downtime Programmatically</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Thu, 02 Jul 2020 20:47:54 +0000</pubDate>
      <link>https://dev.to/ynmanware/aws-elasticsearch-reindexing-with-zero-downtime-programmatically-4435</link>
      <guid>https://dev.to/ynmanware/aws-elasticsearch-reindexing-with-zero-downtime-programmatically-4435</guid>
      <description>&lt;p&gt;Technology is changing faster than ever, there could be few more variations to do certain things or will evolve in future. Following is my opinion and others may disagree. So, take it with a grain of salt.&lt;/p&gt;

&lt;h5&gt;
  
  
  Scenario
&lt;/h5&gt;

&lt;p&gt;Elasticsearch (ES) is used to store an extremely high volume of data for a limited duration. In a greenfield project, there are generally quite a few moving parts and relentless requirement changes. Changing the ES schema or field mapping is one of those. Elasticsearch allows adding new fields, but it does not allow changing data types or renaming fields without reindexing. When the data is huge, reindexing can take some time (minutes, at times) and hence cause some downtime. Downtime is not acceptable for highly available applications, especially on the read side.  &lt;/p&gt;

&lt;p&gt;Using an index alias, the switch to a new index can happen within milliseconds. &lt;/p&gt;

&lt;h5&gt;
  
  
  High Level Design
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0fvTCR6n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3m119nl9tr13vklx1oqz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0fvTCR6n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3m119nl9tr13vklx1oqz.jpg" alt="Alt High Level Design"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is required that the &lt;strong&gt;Data Retriever&lt;/strong&gt; is always up and running and returns consistent data for the given index at any point in time.&lt;/p&gt;

&lt;h5&gt;
  
  
  Initial Setup
&lt;/h5&gt;

&lt;p&gt;Create two aliases on day one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write_order_agg pointing to order_agg_v1&lt;/li&gt;
&lt;li&gt;read_order_agg pointing to order_agg_v1 &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is that neither the Data Processor nor the Data Retriever knows the real index; all they have is an alias to it.&lt;/p&gt;

&lt;h5&gt;
  
  
  Here are the steps for reindexing
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Stop &lt;strong&gt;Data Processor&lt;/strong&gt; 

&lt;ul&gt;
&lt;li&gt;This is an optional step, required if the processing logic changes &lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Create new index with new mapping - order_agg_v2&lt;/li&gt;
&lt;li&gt;Update write_order_agg alias to point it to this index and remove link to order_agg_v1&lt;/li&gt;
&lt;li&gt;Deploy and start the updated &lt;strong&gt;Data Processor&lt;/strong&gt; (optional)&lt;/li&gt;
&lt;li&gt;Copy (reindex) documents from order_agg_v1 to order_agg_v2&lt;/li&gt;
&lt;li&gt;Update read_order_agg alias to point to order_agg_v2 and remove its link to order_agg_v1&lt;/li&gt;
&lt;li&gt;Delete order_agg_v1 (it is recommended to execute this step manually after making sure all is good with the new index)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Following are a few code snippets that can be used to automate the above steps using the Elasticsearch client (JavaScript).&lt;/p&gt;

&lt;h5&gt;
  
  
  Create Client
&lt;/h5&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const esClient = new Client({
                   node: esHost,
                 });
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h5&gt;
  
  
  Create New Index With Mapping
&lt;/h5&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;esClient.indices.create({index: indexName, body: mapping, include_type_name: true});
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h5&gt;
  
  
  Add and Remove Alias at the same time
&lt;/h5&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;esClient.indices.updateAliases({body: actions})

where actions is
 const actions = {
            actions: [{
                remove: {
                    index: 'order_agg_v1',
                    alias: 'write_order_agg'
                }
                add: {
                    index: 'order_agg_v2',
                    alias: 'write_order_agg'
                }
            }]
        };
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h5&gt;
  
  
  Reindex (Copy Documents)
&lt;/h5&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;esClient.reindex({
            waitForCompletion: true, // make sure you wait until it completes
            refresh: false,
            body: {
                source: {
                    index: 'order_agg_v1'
                },
                dest: {
                    index: 'order_agg_v2',
                    type: 'doc'
                }
            }
        })
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Automating these steps comes in handy when there is a significantly high number of indexes. &lt;/p&gt;

&lt;p&gt;More information on the Elasticsearch API:&lt;br&gt;
&lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html"&gt;https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.npmjs.com/package/elasticsearch"&gt;https://www.npmjs.com/package/elasticsearch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inspired by: &lt;a href="https://engineering.carsguide.com.au/elasticsearch-zero-downtime-reindexing-e3a53000f0ac"&gt;https://engineering.carsguide.com.au/elasticsearch-zero-downtime-reindexing-e3a53000f0ac&lt;/a&gt;&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>aws</category>
      <category>node</category>
    </item>
    <item>
      <title>Springboot vs NodeJS with Kafka</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Wed, 24 Jun 2020 04:48:44 +0000</pubDate>
      <link>https://dev.to/ynmanware/springboot-vs-nodejs-with-kafka-5fnj</link>
      <guid>https://dev.to/ynmanware/springboot-vs-nodejs-with-kafka-5fnj</guid>
      <description>&lt;p&gt;Recently, I got an opportunity to write a microservice using NodeJS that consume messages from Kafka, transforms it and produce to another topic. &lt;/p&gt;

&lt;p&gt;However, I had to go through the interesting phase of convincing fellow developers and other stakeholders why we should use a NodeJS-based microservice instead of Spring Boot. &lt;/p&gt;

&lt;p&gt;There are a few existing microservices, written in NodeJS / Python over the span of the last 2 to 3 years, that are integrated with Kafka. A few libraries were tried, and apparently the best at the time (kafka-node) was chosen. These services do not work as expected and occasionally drop messages.&lt;/p&gt;

&lt;p&gt;I have been following the KafkaJS npm package; it looks modern and promising, so I proposed it. &lt;/p&gt;

&lt;p&gt;With a little extra effort, I developed a proof of concept. My goal was to address all the concerns raised by other developers who had a bad experience with the NodeJS + Kafka stack. &lt;/p&gt;

&lt;p&gt;Here is the high level design -&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AIZHj447--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sa22lpvh5zidta0fxtsd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AIZHj447--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sa22lpvh5zidta0fxtsd.jpg" alt="Alt Design"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The primary responsibilities of the microservice are: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Consume JSON messages&lt;/li&gt;
&lt;li&gt;Transform the JSON into multiple small JSON objects&lt;/li&gt;
&lt;li&gt;Produce them to multiple Kafka topics based on some conditions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I compared SpringBoot- and NodeJS-based versions of the microservice.&lt;br&gt;
Following are my observations.&lt;/p&gt;

&lt;p&gt;Of course, it is a well-known fact that NodeJS consumes far fewer resources than Java, but I had to include these details as well to emphasise that it really makes sense to use NodeJS. &lt;/p&gt;

&lt;h3&gt;
  
  
  NodeJS based Microservice
&lt;/h3&gt;

&lt;h4&gt;
  
  
  CPU Utilisation
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bSJlcO-C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sgbf53zc8ad63d9fxiva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bSJlcO-C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sgbf53zc8ad63d9fxiva.png" alt="Alt CPU"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory Utilisation
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5IoH6iTP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/z2t8ooypm2vdlhky1vom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5IoH6iTP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/z2t8ooypm2vdlhky1vom.png" alt="Alt Memory"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SpringBoot based Microservice (similar load)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  CPU Utilisation
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YMByvEo7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/umr17eayquwzegkh1vp8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YMByvEo7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/umr17eayquwzegkh1vp8.png" alt="Alt CPU"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory Utilisation
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xRniAqvv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yw6u9yvckhii67zqy6sk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xRniAqvv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/yw6u9yvckhii67zqy6sk.png" alt="Alt Memory"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  The Java application requires more than six times the resources of the NodeJS application, and the monthly AWS bill scales accordingly.
&lt;/h5&gt;

&lt;p&gt;I used the streaming feature, consuming one message at a time, to keep it simple; batch processing requires extra care and love. &lt;br&gt;
Throughput can be increased by adding more partitions. &lt;/p&gt;
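&lt;p&gt;The one-message-at-a-time flow can be sketched as a plain sequential loop. This is a generic illustration, not the KafkaJS API; the in-memory &lt;code&gt;messages&lt;/code&gt; array and &lt;code&gt;handleMessage&lt;/code&gt; are stand-ins for the real consumer and handler.&lt;/p&gt;

```javascript
// Sequential processing: each message is fully handled (and, in a real
// consumer, its offset committed) before the next one is taken. This gives
// natural backpressure; more partitions add parallelism across consumers.
const messages = [
  { partition: 0, offset: 0, value: 'evt-1' },
  { partition: 0, offset: 1, value: 'evt-2' },
  { partition: 0, offset: 2, value: 'evt-3' },
];

const processed = [];

async function handleMessage(msg) {
  // stand-in for the real work (transform, write to a store, etc.)
  processed.push(msg.value.toUpperCase());
}

(async function consume() {
  for (const msg of messages) {
    await handleMessage(msg); // never more than one message in flight
  }
  console.log(processed.join(',')); // prints "EVT-1,EVT-2,EVT-3"
})();
```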

&lt;h4&gt;
  
  
  The following are some of the concerns that were raised, along with my responses
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;KafkaJS may not be reliable in the long run&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;KafkaJS usage is steadily increasing, and it has very supportive developers and an active community. It is unlikely to go away in the near future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;There are a few open issues in the library&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;There are open issues in all well-established technologies, including Java and SpringBoot, so this alone cannot be grounds for rejecting the proposal. The POC proved that the functionality we need works fine. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Does KafkaJS support the particular version and implementation of Kafka that we are using?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This was verified in the POC. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Is Consumer/Producer Rebalancing supported?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;When a consumer/producer goes down, another processor should attach itself to the partition and consume/produce the messages. This was verified as part of the POC.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Does it recover when a broker goes down and another instance comes up?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;When one of the brokers goes down, the consumer application should be able to re-establish the connection with the new instance. This was also verified in the POC.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To prove that the KafkaJS library is good enough, I prepared demo consumer/producer microservices and ran them for 3 to 4 days. In that time, these services processed thousands of messages without dropping a single one across all of the failure situations. &lt;/p&gt;

&lt;p&gt;Finally, the POC paved the way for KafkaJS in our tech stack. I really appreciate my team, and everyone who raised concerns, for making the POC all the more convincing.  &lt;/p&gt;

&lt;p&gt;In the end, I also believe that however good the platform and technology are, it is up to the developer to write the code well and take care of corner cases. Development cannot always be plug and play :). &lt;/p&gt;

&lt;p&gt;Refer to the following links for more information on KafkaJS&lt;br&gt;
&lt;a href="https://kafka.js.org/docs/getting-started"&gt;https://kafka.js.org/docs/getting-started&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/tulios/kafkajs"&gt;https://github.com/tulios/kafkajs&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.npmjs.com/package/kafkajs"&gt;https://www.npmjs.com/package/kafkajs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>java</category>
      <category>kafka</category>
      <category>kafkajs</category>
    </item>
    <item>
      <title>Microservices - Exception Handling - Circuit Breaker Pattern</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Wed, 24 Jun 2020 02:01:00 +0000</pubDate>
      <link>https://dev.to/ynmanware/exception-handling-circuit-breaker-pattern-in-microservices-3moa</link>
      <guid>https://dev.to/ynmanware/exception-handling-circuit-breaker-pattern-in-microservices-3moa</guid>
<description>&lt;p&gt;I have been working on microservices for years. I am writing this post to share my experience and what I consider best practices around exception handling. Note that these may not be perfect and can be improved. &lt;/p&gt;

&lt;p&gt;I am working on an application that contains many microservices (&amp;gt;100). It is an event-driven architecture: an event is processed by more than one processor before it reaches a store (like Elastic Search) or other consumer microservices.&lt;/p&gt;

&lt;p&gt;One microservice receives events from multiple sources and passes them to AWS Lambda Functions based on the type of event. There could be more Lambda Functions or microservices along the way that transform or enrich the event. &lt;/p&gt;
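&lt;p&gt;Such routing can be sketched as a type-to-handler map. The event types and handlers below are hypothetical, purely for illustration; in the real system each handler would invoke the corresponding AWS Lambda Function.&lt;/p&gt;

```javascript
// Map each event type to its downstream processor (plain stubs here; the
// real system would invoke AWS Lambda Functions instead).
const handlers = {
  ORDER_CREATED: (evt) => ({ ...evt, enriched: true }),
  ORDER_SHIPPED: (evt) => ({ ...evt, notified: true }),
};

function dispatch(event) {
  const handler = handlers[event.type];
  if (!handler) {
    // unknown types are surfaced, not silently dropped
    throw new Error(`No handler registered for type: ${event.type}`);
  }
  return handler(event);
}

const result = dispatch({ type: 'ORDER_CREATED', id: 42 });
console.log(JSON.stringify(result)); // {"type":"ORDER_CREATED","id":42,"enriched":true}
```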

&lt;p&gt;Here is a small part of my Architecture&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4qw2NKNy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/121ch04y3w9hvoyds5cj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4qw2NKNy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/121ch04y3w9hvoyds5cj.jpg" alt="Alt MS Design"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Microservices have many advantages, but they come with a few caveats as well, and exception handling is one of them. If exceptions are not handled properly, you might end up dropping messages in production; the operational cost can then exceed the development cost, and managing such applications in production is a nightmare. &lt;/p&gt;

&lt;p&gt;The following is the high-level design that I suggested and applied in most of the microservices I implemented. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pYxLy3cG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mkuil5l8bj072gaspuil.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pYxLy3cG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mkuil5l8bj072gaspuil.jpg" alt="Alt Circuit Breaker Patter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is important to make sure that a microservice does NOT consume the next event if it knows it will be unable to process it. Instead, the microservice should retry, wait, recover, and raise an alert if required. AWS Lambda re-processes the event if the function throws an error, and I have leveraged this feature in some of the exception handling scenarios. It is crucial for each microservice to have clear documentation that covers the following information, along with other details:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All possible exceptions&lt;/li&gt;
&lt;li&gt;Happy flow logs&lt;/li&gt;
&lt;li&gt;Errors and explanation in detail &lt;/li&gt;
&lt;li&gt;Type of errors - Functional / Recoverable / Non-Recoverable / Recoverable on retries (restart)&lt;/li&gt;
&lt;li&gt;When to set an Alert&lt;/li&gt;
&lt;li&gt;Memory and CPU utilisation (low/normal/worst)&lt;/li&gt;
&lt;li&gt;Add metrics for each type of error &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have these details in place, supporting and monitoring the application in production will be effective and recovery will be quicker. &lt;/p&gt;
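&lt;p&gt;The retry/wait/recover behaviour described above can be sketched with a small helper. This is a minimal sketch with made-up names (&lt;code&gt;withRetries&lt;/code&gt;, &lt;code&gt;flakyInsert&lt;/code&gt;); a production version would add jitter, metrics and alerting.&lt;/p&gt;

```javascript
// Retry a recoverable operation with exponential backoff; rethrow once the
// budget is exhausted so the caller (e.g. the Lambda runtime) re-delivers
// the event instead of dropping it.
async function withRetries(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === attempts) throw err; // give up: raise an alert upstream
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Simulated flaky downstream call: fails twice, then succeeds.
let calls = 0;
async function flakyInsert() {
  calls += 1;
  if (calls < 3) throw new Error('transient failure');
  return 'stored';
}

withRetries(flakyInsert, { attempts: 5, baseDelayMs: 10 })
  .then((res) => console.log(`${res} after ${calls} calls`)); // stored after 3 calls
```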

</description>
      <category>design</category>
      <category>microservices</category>
      <category>exceptionhandling</category>
    </item>
    <item>
      <title>Compressing and Decompressing Messages (Java/NodeJs) while working with AWS Kinesis</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Tue, 23 Jun 2020 23:56:50 +0000</pubDate>
      <link>https://dev.to/ynmanware/compressing-and-decompressing-messages-java-nodejs-while-working-with-aws-kinesis-4nn7</link>
      <guid>https://dev.to/ynmanware/compressing-and-decompressing-messages-java-nodejs-while-working-with-aws-kinesis-4nn7</guid>
<description>&lt;p&gt;We use an AWS Kinesis stream for event sourcing. AWS Kinesis has a record size limit of 1MB. Given this restriction and a combination of Java- and NodeJS-based microservices, we needed a way to compress and decompress messages across platforms. Both NodeJs and Java have inbuilt support for compression and decompression, and both use the Gzip format.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xvneErdn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u41920fdamkx8jjxvaqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xvneErdn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u41920fdamkx8jjxvaqs.png" alt="Alt Observation for JSON messages"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following are examples in Java and NodeJs. It is possible to compress in Java and decompress in NodeJs, and vice versa.&lt;/p&gt;

&lt;h1&gt;
  
  
  Compress and decompress in Java
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipUtil {

    private static final String payload = "Compressing messages and decompressing messages";

    public static void main(String[] args) throws Exception {
        String encodedStr = compress(payload);
        System.out.println("Compressed String: " + encodedStr);
        String decodedStr = decompress(encodedStr);
        System.out.println("Decompressed String: " + decodedStr);
    }

    public static String decompress(String str) throws Exception {
        byte[] compressed = Base64.getDecoder().decode(str);
        // copy the raw decompressed bytes; reading line by line would drop newlines
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = gis.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            String outStr = out.toString("UTF-8");
            System.out.println("Decompressed String length : " + outStr.length());
            return outStr;
        }
    }

    public static String compress(String str) throws Exception {
        System.out.println("Original String Length : " + str.length());
        ByteArrayOutputStream obj=new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(obj);
        gzip.write(str.getBytes("UTF-8"));
        gzip.close();
        String base64Encoded = Base64.getEncoder().encodeToString(obj.toByteArray());
        System.out.println("Compressed String length : " + base64Encoded.length());
        return base64Encoded;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Compress and decompress in NodeJS
&lt;/h1&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const zlib = require('zlib'); //inbuilt in NodeJs
const sizeof = require('object-sizeof'); //npm

const input = "Compressing messages and decompressing messages";

(async function () {
    // compress
    console.info(`String size: ${sizeof(input)}`);
    let buffer = await zlib.deflateSync(input);
    const compressedString = buffer.toString('base64');
    console.info(`compressed String size: ${sizeof(compressedString)}`);
    // decompress
    buffer = await zlib.unzipSync(Buffer.from(compressedString, 'base64'));
    console.info(`decoded string : ${buffer.toString()}`);
})()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>aws</category>
    </item>
    <item>
      <title>Real Time Event Processing with AWS Kinesis + Lambda + ElasticSearch</title>
      <dc:creator>Yogesh Manware</dc:creator>
      <pubDate>Mon, 22 Jun 2020 13:19:14 +0000</pubDate>
      <link>https://dev.to/ynmanware/real-time-event-processing-with-aws-kinesis-lambda-elasticsearch-5fga</link>
      <guid>https://dev.to/ynmanware/real-time-event-processing-with-aws-kinesis-lambda-elasticsearch-5fga</guid>
<description>&lt;p&gt;I am using AWS Kinesis, Lambda (NodeJs) and ElasticSearch in my architecture. Each Kinesis shard is priced per hour, and the requirement is to process millions of events per hour. It is an event-driven architecture, and the expectation is near-real-time processing.&lt;br&gt;
While AWS Kinesis has excellent throughput, Elastic Search insert/update operations can be time-consuming if you have a considerable number of fields to be indexed.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Few Facts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One Kinesis Stream Shard provides a capacity of 1MB/sec data input and 2MB/sec data output. One shard can support up to 1000 PUT records per second. &lt;a href="https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html"&gt;Kinesis Data Streams Quotas and Limits&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#multiple-workers-threads"&gt;Elastic Search documentation advises to use multithreading&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-limits.html"&gt;AWS Elastic Search Limitation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
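&lt;p&gt;These limits make shard sizing a simple calculation. As a rough worked example (the event rate and average event size below are assumptions, not figures from this system):&lt;/p&gt;

```javascript
// Estimate shards needed from both quota dimensions and take the max.
const eventsPerHour = 2000000; // assumed load
const avgEventSizeKb = 5;      // assumed average event size

const eventsPerSecond = eventsPerHour / 3600;                  // ≈ 556
const mbPerSecond = (eventsPerSecond * avgEventSizeKb) / 1024; // ≈ 2.7 MB/sec

const shardsByRecords = Math.ceil(eventsPerSecond / 1000); // 1000 PUTs/sec per shard
const shardsByThroughput = Math.ceil(mbPerSecond / 1);     // 1 MB/sec ingest per shard
const shards = Math.max(shardsByRecords, shardsByThroughput);

console.log({ shardsByRecords, shardsByThroughput, shards }); // { shardsByRecords: 1, shardsByThroughput: 3, shards: 3 }
```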
&lt;h2&gt;
  
  
  High Level Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7xrBE-Qs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/27or72s8hq7fx12wvwoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7xrBE-Qs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/27or72s8hq7fx12wvwoa.png" alt="Alt Design"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are N shards and a high-capacity Elastic Search cluster. One Kinesis shard can trigger one Lambda at any point in time.  &lt;/p&gt;

&lt;p&gt;Using concurrency and bulk updates, real-time processing can be achieved with an optimal number of Kinesis stream shards.   &lt;/p&gt;
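&lt;p&gt;For the bulk-update side, Elastic Search's &lt;code&gt;_bulk&lt;/code&gt; API takes a newline-delimited body: one action line followed by one document line per record. A minimal sketch of building that body (the index name and documents are made up; the actual request would be sent with an HTTP client or the official client library):&lt;/p&gt;

```javascript
// Build the newline-delimited JSON body that Elastic Search's _bulk API
// expects: an action metadata line, then the document, for each record.
const docs = [
  { id: 'EVT001', value: 42 },
  { id: 'EVT002', value: 43 },
];

function buildBulkBody(indexName, documents) {
  return documents
    .map((doc) =>
      JSON.stringify({ index: { _index: indexName, _id: doc.id } }) +
      '\n' +
      JSON.stringify(doc))
    .join('\n') + '\n'; // _bulk bodies must end with a newline
}

const body = buildBulkBody('events', docs);
console.log(body.split('\n').length - 1); // 4 lines: 2 action lines + 2 documents
```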

&lt;p&gt;I designed a Lambda to consume a batch of 50 events at a time; the size of one event is between 5KB and 1MB. It inserts the 50 records concurrently. &lt;/p&gt;

&lt;p&gt;Note that the concurrency, batch size and bulk record size need to be tuned based on the event size and the Elastic Search cluster's capacity/limits. You might also need to review the Lambda's memory requirement based on the concurrency and the batch/event size. &lt;/p&gt;

&lt;p&gt;The following code snippet is an example of concurrent processing in NodeJs using the async library. &lt;/p&gt;

&lt;p&gt;It processes about 100 events, in groups of 50 at a time. &lt;br&gt;
Assuming it takes 1 second to insert one record into Elastic Search, all the records are inserted in ~3 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const async = require('async');

// can process 50 events at any point of time
const MAX_CONCURRENCY = (process.env &amp;amp;&amp;amp; process.env.MAX_CONCURRENCY) || 50;

const data = new Map();

// create dummy events
for (let i = 0; i &amp;lt;= 100; i++) {
    data.set(`EVT00${i}`, [{a: `value ${i}`}, {a: `value 2a ${i}`}]);
}

async function process(payload) {
    const startTime = new Date();
    const functions = [];
    payload.forEach((feeds, id) =&amp;gt; {
        console.log(`processing the feed for ${id}`);
        functions.push(processEvent.bind(null, feeds, id));
    });
    const result = await async.parallelLimit(functions, MAX_CONCURRENCY);
    console.info('processing is complete');
    const filteredResult = result.filter(element =&amp;gt; element.data).map(element =&amp;gt; element.data);
    console.info(JSON.stringify(filteredResult, null, 2));
    console.log(`Total time taken ${new Date() - startTime} millis`);
}

async function processEvent(arrayOfFeeds, id) {
    return new Promise((resolve, reject) =&amp;gt; {
        // simulates //Elastic Search API call for each group of messages
        setTimeout(() =&amp;gt; {
            console.log(`processing ${id} : ${JSON.stringify(arrayOfFeeds)}`)
            resolve({'data': {a: `aggregated value - must be an es update record ${id}`}});
        }, 1000);
    });
}

setImmediate(() =&amp;gt; process(data));

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



</description>
      <category>kinesis</category>
      <category>elasticsearch</category>
      <category>lambda</category>
    </item>
  </channel>
</rss>
