<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: anandsunderraman</title>
    <description>The latest articles on DEV Community by anandsunderraman (@anandsunderraman).</description>
    <link>https://dev.to/anandsunderraman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F238533%2F4c09d2e3-bc8b-49a6-9932-202dd80c3259.png</url>
      <title>DEV Community: anandsunderraman</title>
      <link>https://dev.to/anandsunderraman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anandsunderraman"/>
    <language>en</language>
    <item>
      <title>Navigating Microservices Code Repos</title>
      <dc:creator>anandsunderraman</dc:creator>
      <pubDate>Thu, 20 Jan 2022 02:03:22 +0000</pubDate>
      <link>https://dev.to/anandsunderraman/navigating-microservices-code-repos-1cmk</link>
      <guid>https://dev.to/anandsunderraman/navigating-microservices-code-repos-1cmk</guid>
      <description>&lt;p&gt;This post is a repost from my personal blog &lt;a href="https://thecodesphinx.com/2022/01/19/code-repos/"&gt;The Code Sphinx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This happened when I was working for one of my previous employers.&lt;br&gt;
I had just joined the company and was working on a microservice that consumed a REST API exposed by another microservice.&lt;br&gt;
There was a JIRA ticket I was working on, and I was not sure about the data model exposed by this REST API call. At that point in time the adoption of &lt;a href="https://swagger.io/specification/"&gt;OpenAPI / Swagger&lt;/a&gt; / &lt;a href="https://raml.org/"&gt;RAML&lt;/a&gt; was just beginning. Being new, I was wondering whom I should reach out to.&lt;br&gt;
Just then my colleague, who had joined a month before me, sent me a link to the repository that implemented this REST API.&lt;br&gt;
He also pointed me to the Crucible tool, where I could see the code reviews for this repo.&lt;br&gt;
That was an "Aha!" moment for me. It opened up new avenues to learn and to make new connections across teams.&lt;br&gt;
Until then I had never explored a code repository that I did not work on.&lt;br&gt;
But now, since I was seeking an answer to my question, that was motivation enough to explore this code repository.&lt;br&gt;
It was like solving a puzzle, and it kept me hooked until I found the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Options to explore a new code repo
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;ReadMe / Wiki Documentation&lt;/li&gt;
&lt;li&gt;Read the code&lt;/li&gt;
&lt;li&gt;Unit Tests&lt;/li&gt;
&lt;li&gt;Running the app&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ReadMe / Wiki Documentation
&lt;/h3&gt;

&lt;p&gt;The general expectation is that any good code repository has a good readme that talks about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What does the project / codebase do?&lt;/li&gt;
&lt;li&gt;How to set it up to run on a local machine?&lt;/li&gt;
&lt;li&gt;How to contribute to it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The above are generally true for well-maintained open source repositories.
If there is a readme with accurate information then look no further, literally!!&lt;/p&gt;

&lt;p&gt;Many organizations use other products to maintain internal documentation. A popular choice is &lt;a href="https://www.atlassian.com/software/confluence"&gt;Confluence&lt;/a&gt;. &lt;code&gt;Confluence&lt;/code&gt; has a good search capability. A quick search for the repo name or the project name should list &lt;code&gt;Confluence&lt;/code&gt; pages that mention this repo. This might also give insight into the context of the project and how it fits into the organization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read the code
&lt;/h3&gt;

&lt;p&gt;This should be the most obvious choice. But the irony is, there is nothing obvious about reading source code. Source code is the ultimate source of truth.&lt;br&gt;
It takes a lot of experience to try and figure out the flow of control in an app just by reading the code.&lt;br&gt;
I am no expert but I do try poking around the code to understand what certain parts of the code do.&lt;/p&gt;

&lt;p&gt;A quick way to think of an app is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bootstrapping / App initialization / Startup&lt;/li&gt;
&lt;li&gt;Dependencies&lt;/li&gt;
&lt;li&gt;Points of integration into the app&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Bootstrapping / App initialization / Startup
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Spring Boot Application: Look for an Application.java file or a class with the &lt;code&gt;@SpringBootApplication&lt;/code&gt; annotation&lt;/li&gt;
&lt;li&gt;Node.js: Look for &lt;code&gt;index.js&lt;/code&gt; or look at &lt;code&gt;package.json&lt;/code&gt; to see what starts the app.&lt;/li&gt;
&lt;li&gt;Go: Look for &lt;code&gt;main.go&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
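
&lt;p&gt;A quick way to locate these entry points is to search for them from a terminal. The commands below are just an illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Spring Boot: find the class carrying the annotation
grep -rl "@SpringBootApplication" --include="*.java" .

# Node.js: the "main" field or the "start" script shows what boots the app
cat package.json

# Go: find the main package entry point
find . -name main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;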

&lt;h4&gt;
  
  
  Dependencies
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Spring Boot Application: Look at the application.properties or application.yml file (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Node.js / Go: Look for environment files, or check the startup / bootstrap file to see which configuration files it loads.&lt;/li&gt;
&lt;li&gt;Look at the infrastructure code to see which environment variables are set and where.&lt;/li&gt;
&lt;/ol&gt;
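
&lt;p&gt;For example, in a Spring Boot app a quick glance at the application YAML usually reveals the external systems the service depends on. The snippet below is purely hypothetical, just to show what to look for:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spring:
  datasource:
    url: jdbc:postgresql://orders-db:5432/orders   # a relational database dependency
  kafka:
    bootstrap-servers: broker-1:9092               # a message broker dependency
inventory-service:
  base-url: http://inventory-service:8080          # a downstream REST dependency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;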

&lt;h4&gt;
  
  
  Points of integration
&lt;/h4&gt;

&lt;p&gt;An app can have multiple types of integration.&lt;br&gt;
The most common ones are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;REST API&lt;/li&gt;
&lt;li&gt;Event driven interactions&lt;/li&gt;
&lt;li&gt;Database (DB) interactions&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  REST API
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Spring controllers in a Java Spring Boot based application&lt;/li&gt;
&lt;li&gt;Routes / middleware in an Express-based Node.js application&lt;/li&gt;
&lt;li&gt;Search the code for a REST API &lt;code&gt;path&lt;/code&gt; and figure out which handler serves it&lt;/li&gt;
&lt;li&gt;Search the code for controllers&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Event driven interactions
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Search the code for "Listeners" or "Publishers"&lt;/li&gt;
&lt;li&gt;Search for code references for the queue / topic name&lt;/li&gt;
&lt;li&gt;Based on the code references for the queue or topic name, find the handlers that &lt;code&gt;publish&lt;/code&gt; or &lt;code&gt;subscribe&lt;/code&gt; to messages.&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Database (DB) interactions
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Search the code / application configuration for the connection string to the database.&lt;/li&gt;
&lt;li&gt;See if there are any &lt;code&gt;.sql&lt;/code&gt; files in the code base that define the DDL.&lt;/li&gt;
&lt;li&gt;See if the code uses any SQL migration tool and what scripts it runs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the DB connection string can be obtained, one can easily connect to the &lt;code&gt;dev&lt;/code&gt; instance of the DB using a client, explore the tables and get a feel for the data model that this app interacts with.&lt;/p&gt;

&lt;p&gt;Again, all of this is a bit of a shot in the dark. But over time and with experience one can narrow down the areas to look at and inspect.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unit Tests / Integration Tests
&lt;/h3&gt;

&lt;p&gt;So what do we do when the instructions in the readme are not accurate, or in the worst case there is no readme at all?&lt;br&gt;
Many times I have come across libraries that do not have great documentation for their API.&lt;br&gt;
In such situations I read through the unit tests to see how the library / API is being used.&lt;br&gt;
The hope is that the developer cares about the code and has tested all the functionality that the API / library has to offer.&lt;br&gt;
This is like a backdoor to documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the app
&lt;/h3&gt;

&lt;p&gt;The last resort is to run the app.&lt;br&gt;
This is where the fun and exciting part begins.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;One must be careful not to get into a rabbit hole trying to get the app running. This effort must be time-boxed.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have worked across technologies like Java, Go and Node.js.&lt;br&gt;
So as a developer I am familiar with how to run a Java app or a Go app or a Node.js app.&lt;br&gt;
I just attempt to get the app running, fingers crossed.&lt;br&gt;
More often than not there are errors running the app, and this is where the exciting part comes in.&lt;br&gt;
I start to resolve the errors one by one.&lt;br&gt;
More often than not the errors are related to the application configurations.&lt;br&gt;
It is a matter of figuring out which configuration is missing or what needs to be tweaked to overcome that error.&lt;br&gt;
It is like being a detective solving the clues one by one.&lt;/p&gt;

&lt;p&gt;The first step is to clone the repository on your local machine.&lt;br&gt;
The next step is to download the dependencies for your application.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spring Boot Application using Maven
&lt;code&gt;mvn clean install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Node.js Application
&lt;code&gt;npm install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Golang Code
&lt;code&gt;go mod download&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Java Spring Boot Applications
&lt;/h4&gt;

&lt;p&gt;I have been lucky that my organizations have always been able to provide me a license to use &lt;a href="https://www.jetbrains.com/idea/"&gt;IntelliJ IDEA&lt;/a&gt;.&lt;br&gt;
With IntelliJ, I just click on run on the &lt;code&gt;Application.java&lt;/code&gt; file to try and start the app.&lt;br&gt;
This creates a run configuration for me. More often than not the app would not run for me.&lt;br&gt;
The errors would be singing a tune like&lt;br&gt;
&lt;code&gt;Could not instantiate a bean&lt;/code&gt;&lt;br&gt;
A Spring Boot application is all about following the beans.&lt;br&gt;
Looking at the bean, one can see what properties it depends upon and try to correct the configurations.&lt;br&gt;
If the app depends on a DB or an instance of a message broker, start up a local version of the DB or message broker and point your app to the local instance.&lt;br&gt;
Eventually the app gets up and running.&lt;/p&gt;
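
&lt;p&gt;For example, if the app expects a database and a message broker, local instances can be started with Docker. The images and ports below are just an illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# a local Postgres to stand in for the real database
docker run -d --name local-db -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:13

# a local RabbitMQ to stand in for the real message broker
docker run -d --name local-broker -p 5672:5672 rabbitmq:3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;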

&lt;h4&gt;
  
  
  Node.js Applications
&lt;/h4&gt;

&lt;p&gt;Look for the startup file specified in &lt;code&gt;package.json&lt;/code&gt;&lt;br&gt;
The convention is that it should be index.js.&lt;br&gt;
So I would run &lt;code&gt;node index.js&lt;/code&gt;.&lt;br&gt;
Again I follow the same process.&lt;br&gt;
There would be errors starting up the app and it is a matter of reading / deciphering the errors to find out what configurations / environment variables need to be tweaked to get the app running.&lt;/p&gt;

&lt;h4&gt;
  
  
  Golang Applications
&lt;/h4&gt;

&lt;p&gt;Look for &lt;code&gt;main.go&lt;/code&gt;.&lt;br&gt;
If you have a &lt;code&gt;GoLand&lt;/code&gt; license, just click on &lt;code&gt;Run&lt;/code&gt; next to the &lt;code&gt;main&lt;/code&gt; function.&lt;br&gt;
On the command line, run &lt;code&gt;go run main.go&lt;/code&gt;.&lt;br&gt;
Follow the process of deciphering the error messages and tweaking the configurations / environment variables to get the app running.&lt;/p&gt;

&lt;h4&gt;
  
  
  Docker
&lt;/h4&gt;

&lt;p&gt;Look out for the Dockerfile if one exists.&lt;br&gt;
If there is a Dockerfile, attempt to build the image and run it.&lt;br&gt;
The Dockerfile also provides clues on what environment variables need to be set.&lt;/p&gt;
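
&lt;p&gt;A minimal sketch of that workflow (the image name and the environment variable are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t my-service .

# the ENV / ARG instructions in the Dockerfile hint at what needs to be passed here
docker run --rm -p 8080:8080 -e APP_ENV=local my-service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;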

&lt;h2&gt;
  
  
  What's in it for me
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;You are not waiting on someone to walk you through the code.&lt;/li&gt;
&lt;li&gt;There is a sense of achievement and satisfaction in getting an app running.&lt;/li&gt;
&lt;li&gt;It validates my credentials as a Software Engineer.&lt;/li&gt;
&lt;li&gt;It helps validate assumptions I had about the app, or it helps me understand things I did not know about the app.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Real life experience
&lt;/h2&gt;

&lt;p&gt;In my job as a tech lead, I joined a team that managed 20 or so microservices. It had a mix of &lt;code&gt;Java&lt;/code&gt; and &lt;code&gt;Node.js&lt;/code&gt;, and a mix of &lt;code&gt;REST API&lt;/code&gt; and &lt;code&gt;Event Driven&lt;/code&gt; architectures. I had a choice: wait for someone to walk me through them, or do it by myself. I chose the latter and understood the challenges the team was facing much better. It improved my ability to lead the team and make good architectural decisions.&lt;/p&gt;

</description>
      <category>microservices</category>
      <category>java</category>
      <category>node</category>
      <category>go</category>
    </item>
    <item>
      <title>Growing into a tech lead</title>
      <dc:creator>anandsunderraman</dc:creator>
      <pubDate>Fri, 09 Oct 2020 12:58:24 +0000</pubDate>
      <link>https://dev.to/anandsunderraman/growing-into-a-tech-lead-93g</link>
      <guid>https://dev.to/anandsunderraman/growing-into-a-tech-lead-93g</guid>
<description>&lt;p&gt;It has been almost 2 years since I assumed the role of a technical leader.&lt;br&gt;
I felt it would be a good time to reflect on my experiences and learnings.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building Trust
&lt;/h1&gt;

&lt;p&gt;Growing into a tech lead role from a software engineering role was a different and challenging experience.&lt;br&gt;
Being promoted to tech lead from within a team is one experience, but joining as the tech lead of an already established team is a different ball game altogether. Mine was the latter case: I joined as the tech lead for an already established team.&lt;br&gt;
So my first task was to build trust, and the only way to build trust is to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Observe how the team works&lt;/li&gt;
&lt;li&gt;Work along with the team to understand what they have to go through&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Investing time in best practices
&lt;/h1&gt;

&lt;p&gt;I take inspiration from cricket, and the best example of a leader is Mahendra Singh Dhoni.&lt;br&gt;
My observation has been that he has focussed on the process and the results just followed.&lt;br&gt;
Similarly, in a software engineering team it is wise to invest time in best practices.&lt;br&gt;
Inculcating these practices and processes cannot happen overnight and can take a long time, sometimes even months, because of other priorities.&lt;br&gt;
One just needs the patience to pursue them, and the results will show over a period of time.&lt;/p&gt;

&lt;h1&gt;
  
  
  Collaboration
&lt;/h1&gt;

&lt;p&gt;Being a tech lead is a huge responsibility, but that does not mean you have to know the ins and outs of everything.&lt;br&gt;
It is ok to say you don't know something and rely on advice from your team members to make the right decision.&lt;br&gt;
Your way need not be the HIGHWAY; a collaborative design has the following advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Multiple diverse perspectives are accounted for.&lt;/li&gt;
&lt;li&gt;Rather than the tech lead owning the design, the team owns the design.&lt;/li&gt;
&lt;li&gt;The team knows why certain decisions were made during the design.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Enable your team
&lt;/h1&gt;

&lt;p&gt;You might not know every technology out there, but your experience can help the team look in the right direction.&lt;br&gt;
Many a time I am surprised how, knowing nothing about a technology, I have still been able to look in the right place just based on experience. Help team members with tips on troubleshooting.&lt;/p&gt;

&lt;h1&gt;
  
  
  Delegation
&lt;/h1&gt;

&lt;p&gt;There are times when I am overwhelmed with multiple things to do.&lt;br&gt;
I have made the mistake of taking it all on myself and working overnight to get things done.&lt;br&gt;
But very recently I have taken a different approach.&lt;br&gt;
I make a list of things I need to do and then inspect whether some of them can be delegated to folks on my team.&lt;br&gt;
This has two benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I am no longer overwhelmed&lt;/li&gt;
&lt;li&gt;The team also learns and grows.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Handling Stress
&lt;/h1&gt;

&lt;p&gt;Initially I would plan my day, but then unexpected issues would derail my plans and I would always be playing catch-up.&lt;br&gt;
There is no single solution to this predicament.&lt;br&gt;
All you can do is be mentally prepared for everything, and then there is no stress at all!!&lt;br&gt;
I personally reflect on what the "unexpected" issues are and try to see if I can invest time in better processes or practices to avoid them in the future.&lt;/p&gt;

&lt;h1&gt;
  
  
  Be Fearless
&lt;/h1&gt;

&lt;p&gt;Being fearless is another way of saying don't be afraid of failures. We all make mistakes. The important thing is to own them and take corrective action. I am lucky to be on a team where I am not penalized when I own a mistake. I believe a good leader is one who can talk about his/her failures so that others can learn from them.&lt;/p&gt;

&lt;h1&gt;
  
  
  Communication
&lt;/h1&gt;

&lt;p&gt;Communication is key, especially in times like these when we are remote.&lt;br&gt;
One rule of thumb I learnt early on: if you find yourself typing huge paragraphs in a chat message, it is time to get on a call and talk it out!!&lt;br&gt;
If your team is to succeed, you have to communicate clearly.&lt;br&gt;
When asking your team to work on a ticket or a story, explain why it needs to be worked on, so that they have full context and can provide a better solution.&lt;/p&gt;

&lt;h1&gt;
  
  
  Learn
&lt;/h1&gt;

&lt;p&gt;Always be open to learning.&lt;br&gt;
Just because I am a tech lead does not mean I cannot learn from my junior team members. Many a time they are in the weeds of the problem and will come up with a better solution than you had imagined!!&lt;/p&gt;

&lt;p&gt;Continuing to learn and grow ...&lt;/p&gt;

</description>
      <category>leadership</category>
    </item>
    <item>
      <title>Running ELK (Elastic Logstash Kibana) on Docker</title>
      <dc:creator>anandsunderraman</dc:creator>
      <pubDate>Mon, 21 Sep 2020 03:50:04 +0000</pubDate>
      <link>https://dev.to/anandsunderraman/running-elk-elastic-logstash-kibana-on-docker-14he</link>
      <guid>https://dev.to/anandsunderraman/running-elk-elastic-logstash-kibana-on-docker-14he</guid>
<description>&lt;p&gt;ELK (Elasticsearch, Logstash, Kibana) is a set of software components that are part of the Elastic stack.&lt;/p&gt;

&lt;h1&gt;
  
  
  What does ELK do ?
&lt;/h1&gt;

&lt;p&gt;To explain in layman's terms, this is what each of them does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch&lt;/strong&gt; is primarily a data store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logstash&lt;/strong&gt; is a data parsing software that stores the data in Elasticsearch in a desired format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kibana&lt;/strong&gt; is the UI that can be used to query / visualize the data that is stored in Elasticsearch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To get an in-depth understanding of what they do and how they work I would recommend &lt;a href="https://dev.to/lisahjung/beginner-s-guide-to-elasticsearch-4j2k"&gt;Beginner's Guide To Elastic Search&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  How do I run ELK ?
&lt;/h1&gt;

&lt;p&gt;Since each of the above components is a separate piece of software, one way of running them is to head to the installation instructions and run each one of them separately.&lt;/p&gt;

&lt;p&gt;An easier and a more convenient way to run them would be using Docker.&lt;/p&gt;

&lt;p&gt;Most likely, if you find yourself experimenting with this stack, you will want to run all three of them together. What better way to achieve that than using &lt;code&gt;docker&lt;/code&gt; and &lt;code&gt;docker-compose&lt;/code&gt;?&lt;/p&gt;

&lt;h2&gt;
  
  
  Docker Compose
&lt;/h2&gt;

&lt;p&gt;At the time of writing this post I was experimenting with ELK stack version &lt;code&gt;6.6&lt;/code&gt;. Hence the following &lt;code&gt;docker-compose.yml&lt;/code&gt; refers to image versions &lt;code&gt;6.6&lt;/code&gt;.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
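
&lt;p&gt;A minimal sketch of what such a &lt;code&gt;docker-compose.yml&lt;/code&gt; can look like, assuming the official &lt;code&gt;6.6&lt;/code&gt; images (the service names, ports and mounted paths are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
  logstash:
    image: docker.elastic.co/logstash/logstash:6.6.0
    volumes:
      - ./logstash-conf:/usr/share/logstash/pipeline
    environment:
      - ELASTIC_HOST=elasticsearch:9200
    ports:
      - "5044:5044"
    depends_on:
      - elasticsearch
  kibana:
    image: docker.elastic.co/kibana/kibana:6.6.0
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;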


&lt;p&gt;Notice that the compose file references a directory named &lt;code&gt;logstash-conf&lt;/code&gt;. This directory contains a &lt;code&gt;logstash&lt;/code&gt; configuration file that dictates how the data is to be parsed.&lt;/p&gt;

&lt;p&gt;The contents of this file would be:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
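
&lt;p&gt;Putting together the pieces explained in the sections below, the full configuration file looks roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input {
    beats {
        port =&amp;gt; "5044"
        codec =&amp;gt; "json"
    }
}

filter {
  json {
    source =&amp;gt; "message"
  }
}

output {
    elasticsearch {
        hosts =&amp;gt; "${ELASTIC_HOST}"
        index =&amp;gt; "%{[fields][project]}-%{[fields][application]}-%{+YYYY.MM.dd}"
        codec =&amp;gt; json
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;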


&lt;h3&gt;
  
  
  What are we configuring in Logstash ?
&lt;/h3&gt;

&lt;p&gt;The following section says we will get the input for logstash via &lt;code&gt;beats&lt;/code&gt;, which is another piece of software in the &lt;code&gt;Elastic&lt;/code&gt; stack that I will attempt to explain in another post.&lt;/p&gt;

&lt;p&gt;We configure Logstash to receive data on port &lt;code&gt;5044&lt;/code&gt; and we expect the data to be in &lt;code&gt;json&lt;/code&gt; format&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input {
    beats {
        port =&amp;gt; "5044"
        codec =&amp;gt; "json"
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we state that we are using the &lt;code&gt;json&lt;/code&gt; plugin in logstash and attempt to extract &lt;code&gt;json&lt;/code&gt; data from the &lt;code&gt;message&lt;/code&gt; field in our log message. I know this sounds a bit cryptic, but I hope you will take the leap of faith with me on this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;filter {
  json {
    source =&amp;gt; "message"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally we have the output. We are basically passing the data on to Elasticsearch, storing it in an index whose name is defined by &lt;code&gt;"%{[fields][project]}-%{[fields][application]}-%{+YYYY.MM.dd}"&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;output {
    elasticsearch {
        hosts =&amp;gt; "${ELASTIC_HOST}"
        index =&amp;gt; "%{[fields][project]}-%{[fields][application]}-%{+YYYY.MM.dd}"
        codec =&amp;gt; json
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to run it ?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Start &lt;code&gt;docker&lt;/code&gt; on your local machine&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;docker-compose up&lt;/code&gt; in the directory where you have the &lt;code&gt;docker-compose.yml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How do I navigate to Kibana ?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Point your browser to &lt;code&gt;http://localhost:5601/&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;Note this is based on the port &lt;code&gt;5601&lt;/code&gt; provided for the &lt;code&gt;kibana&lt;/code&gt; image on the &lt;code&gt;docker-compose.yml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The main goal of this tutorial was to demonstrate how to get the ELK stack running using docker.&lt;/p&gt;

</description>
      <category>docker</category>
    </item>
    <item>
      <title>JSON Logging in Spring Boot Applications</title>
      <dc:creator>anandsunderraman</dc:creator>
      <pubDate>Fri, 18 Sep 2020 02:28:48 +0000</pubDate>
      <link>https://dev.to/anandsunderraman/json-logging-in-spring-boot-applications-2j33</link>
      <guid>https://dev.to/anandsunderraman/json-logging-in-spring-boot-applications-2j33</guid>
<description>&lt;p&gt;Application logs are like a black box. When troubleshooting issues, the first thing I look at is the application logs. It is therefore essential to put thought into what we log, in order to be able to identify issues based on the logs.&lt;/p&gt;

&lt;p&gt;Today we have a class of applications called log aggregation systems. Log aggregation is useful when we have a lot of microservices and we want to trace logs across them.&lt;/p&gt;

&lt;p&gt;I would like to demonstrate how to log application logs in a JSON format.&lt;/p&gt;

&lt;h1&gt;
  
  
  Advantages of JSON Logging
&lt;/h1&gt;

&lt;p&gt;A traditional Java application log line looks something like the following&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2020-09-17 21:56:10.740  INFO [Orders:restartedMain::] o.s.b.w.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 8080 (http) with context path ''
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one were to search these logs on a unix box, one would do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat application.log | grep &amp;lt;your-search&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I recall days when I shipped logs like these to Sumo Logic (another log aggregation system) and used regular expression gymnastics to arrive at an optimal search.&lt;/p&gt;

&lt;p&gt;Someone came up with a different approach: why not store the logs in a more searchable format, since they are primarily used for searching? JSON happens to lend itself easily to being searched, and hence JSON logging. Another name for this is structured logging, because the logs have a well defined structure that can later be used for searching.&lt;/p&gt;

&lt;p&gt;The same application log displayed above in a JSON format would look like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "@timestamp": "2020-06-17T14:41:11.174-04:00",
  "@version": "1",
  "message": "Tomcat initialized with port(s): 8080 (http)",
  "logger_name": "org.springframework.boot.web.embedded.tomcat.TomcatWebServer",
  "thread_name": "restartedMain",
  "level": "INFO",
  "level_value": 20000
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This now makes it easier to search the logs by say &lt;code&gt;time&lt;/code&gt; or &lt;code&gt;level&lt;/code&gt; or even &lt;code&gt;thread_name&lt;/code&gt;.&lt;/p&gt;
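
&lt;p&gt;For example, with a command line tool like &lt;code&gt;jq&lt;/code&gt; one can filter structured logs by any field (a minimal illustration, assuming one JSON object per line):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat application.log | jq 'select(.level == "INFO") | .message'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;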

&lt;h1&gt;
  
  
  JSON Logging in Spring Boot Application
&lt;/h1&gt;

&lt;p&gt;To log in a JSON format one needs to include two dependencies. If you use maven for dependency management, the dependencies would be included as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;net.logstash.logback&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;logstash-logback-encoder&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;6.4&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;ch.qos.logback&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;logback-classic&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;1.2.3&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we are using the library &lt;a href="https://github.com/logstash/logstash-logback-encoder"&gt;logstash-logback-encoder&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can then configure the JSON encoder in logback with the following snippet&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;encoder class="net.logstash.logback.encoder.LogstashEncoder"&amp;gt;
    &amp;lt;providers&amp;gt;
        &amp;lt;timestamp&amp;gt;
            &amp;lt;timeZone&amp;gt;EST&amp;lt;/timeZone&amp;gt;
        &amp;lt;/timestamp&amp;gt;
        &amp;lt;pattern&amp;gt;
            &amp;lt;pattern&amp;gt;
                {
                "level": "%level",
                "service": "orders",
                "traceId": "%X{X-B3-TraceId:-}",
                "spanId": "%X{X-B3-SpanId:-}",
                "thread": "%thread",
                "class": "%logger{40}",
                "message": "%message"
                }
            &amp;lt;/pattern&amp;gt;
        &amp;lt;/pattern&amp;gt;
        &amp;lt;stackTrace&amp;gt;
            &amp;lt;throwableConverter class="net.logstash.logback.stacktrace.ShortenedThrowableConverter"&amp;gt;
                &amp;lt;maxDepthPerThrowable&amp;gt;30&amp;lt;/maxDepthPerThrowable&amp;gt;
                &amp;lt;maxLength&amp;gt;2048&amp;lt;/maxLength&amp;gt;
                &amp;lt;shortenedClassNameLength&amp;gt;20&amp;lt;/shortenedClassNameLength&amp;gt;
                &amp;lt;rootCauseFirst&amp;gt;true&amp;lt;/rootCauseFirst&amp;gt;
            &amp;lt;/throwableConverter&amp;gt;
        &amp;lt;/stackTrace&amp;gt;
    &amp;lt;/providers&amp;gt;
&amp;lt;/encoder&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let us assume you want to write traditional logs to the console and JSON logs to a file. We can configure the logback XML as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;configuration&amp;gt;
    &amp;lt;include resource="org/springframework/boot/logging/logback/console.xml"/&amp;gt;
    &amp;lt;appender name="stdout" class="ch.qos.logback.core.ConsoleAppender"&amp;gt;
        &amp;lt;encoder&amp;gt;
            &amp;lt;pattern&amp;gt;%d{yyyy-MM-dd HH:mm:ss.SSS} %5p [Orders:%thread:%X{X-B3-TraceId}:%X{X-B3-SpanId}] %logger{40} - %msg%n
            &amp;lt;/pattern&amp;gt;
        &amp;lt;/encoder&amp;gt;
    &amp;lt;/appender&amp;gt;
    &amp;lt;appender name="fileout"
              class="ch.qos.logback.core.rolling.RollingFileAppender"&amp;gt;
        &amp;lt;File&amp;gt;./logs/orders.log&amp;lt;/File&amp;gt;
        &amp;lt;rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy"&amp;gt;
            &amp;lt;maxIndex&amp;gt;8&amp;lt;/maxIndex&amp;gt;
            &amp;lt;FileNamePattern&amp;gt;./logs/orders.log.%i
            &amp;lt;/FileNamePattern&amp;gt;
        &amp;lt;/rollingPolicy&amp;gt;
        &amp;lt;triggeringPolicy
                class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy"&amp;gt;
            &amp;lt;MaxFileSize&amp;gt;128MB&amp;lt;/MaxFileSize&amp;gt;
        &amp;lt;/triggeringPolicy&amp;gt;

        &amp;lt;encoder class="net.logstash.logback.encoder.LogstashEncoder"&amp;gt;
            &amp;lt;providers&amp;gt;
                &amp;lt;timestamp&amp;gt;
                    &amp;lt;timeZone&amp;gt;EST&amp;lt;/timeZone&amp;gt;
                &amp;lt;/timestamp&amp;gt;
                &amp;lt;pattern&amp;gt;
                    &amp;lt;pattern&amp;gt;
                        {
                        "level": "%level",
                        "service": "orders",
                        "traceId": "%X{X-B3-TraceId:-}",
                        "spanId": "%X{X-B3-SpanId:-}",
                        "thread": "%thread",
                        "class": "%logger{40}",
                        "message": "%message"
                        }
                    &amp;lt;/pattern&amp;gt;
                &amp;lt;/pattern&amp;gt;
                &amp;lt;stackTrace&amp;gt;
                    &amp;lt;throwableConverter class="net.logstash.logback.stacktrace.ShortenedThrowableConverter"&amp;gt;
                        &amp;lt;maxDepthPerThrowable&amp;gt;30&amp;lt;/maxDepthPerThrowable&amp;gt;
                        &amp;lt;maxLength&amp;gt;2048&amp;lt;/maxLength&amp;gt;
                        &amp;lt;shortenedClassNameLength&amp;gt;20&amp;lt;/shortenedClassNameLength&amp;gt;
                        &amp;lt;rootCauseFirst&amp;gt;true&amp;lt;/rootCauseFirst&amp;gt;
                    &amp;lt;/throwableConverter&amp;gt;
                &amp;lt;/stackTrace&amp;gt;
            &amp;lt;/providers&amp;gt;
        &amp;lt;/encoder&amp;gt;
    &amp;lt;/appender&amp;gt;
    &amp;lt;root level="info"&amp;gt;
        &amp;lt;appender-ref ref="fileout" /&amp;gt;
        &amp;lt;appender-ref ref="stdout" /&amp;gt;
    &amp;lt;/root&amp;gt;
&amp;lt;/configuration&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a complete spring boot application with json logging please refer to sample applications &lt;a href="https://github.com/anandsunderraman/structured-logging-tutorial/tree/master/orders"&gt;orders&lt;/a&gt; or &lt;a href="https://github.com/anandsunderraman/structured-logging-tutorial/tree/master/shipping"&gt;shipping&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;JSON logging can be achieved in other languages / frameworks as well, such as Node.js and Python.&lt;/p&gt;

&lt;p&gt;For Node.js, libraries like &lt;a href="https://github.com/winstonjs/winston"&gt;winston&lt;/a&gt; can be used.&lt;/p&gt;
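
&lt;p&gt;As a rough sketch, a &lt;code&gt;winston&lt;/code&gt; logger can be configured to emit JSON like so (the field names below are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  // timestamp + json produces structured log lines similar to the logback output above
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()]
});

logger.info('order created', { service: 'orders', orderId: 42 });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;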

</description>
      <category>microservices</category>
      <category>java</category>
    </item>
    <item>
      <title>Copying over data from MongoDB to S3</title>
      <dc:creator>anandsunderraman</dc:creator>
      <pubDate>Sat, 25 Apr 2020 03:35:21 +0000</pubDate>
      <link>https://dev.to/anandsunderraman/copying-over-data-from-mongodb-to-s3-3j4g</link>
      <guid>https://dev.to/anandsunderraman/copying-over-data-from-mongodb-to-s3-3j4g</guid>
      <description>&lt;h1&gt;
  
  
  Copying over data from MongoDB to S3
&lt;/h1&gt;

&lt;p&gt;Very recently we were tasked with copying over data from our MongoDB database to an S3 bucket.&lt;br&gt;
Since the timelines were tight, our immediate solution was to deploy a lambda that would run once a day, query the data from MongoDB and copy it to S3.&lt;/p&gt;

&lt;p&gt;We sized up the data to be around 600k records. It did not seem like a lot, and we were confident of achieving this.&lt;/p&gt;

&lt;p&gt;Long story short this turned out to be a bigger task than we thought and we ran into multiple problems.&lt;/p&gt;

&lt;p&gt;I would like to talk about the problems we faced at each stage and how we improvised and finally arrived at a working solution.&lt;/p&gt;

&lt;p&gt;At the end of the process I learnt a lot but I learnt that I have lots more to learn.&lt;/p&gt;

&lt;p&gt;Ok getting down to details.&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;AWS Lambda on Node.js 12.x&lt;/p&gt;
&lt;h2&gt;
  
  
  First Attempt
&lt;/h2&gt;

&lt;p&gt;In hindsight, our first attempt was a brute force approach.&lt;/p&gt;
&lt;h3&gt;
  
  
  The approach was:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Query the collection asynchronously in batches of 100k&lt;/li&gt;
&lt;li&gt;Do a Promise.all on all the batches of queries&lt;/li&gt;
&lt;li&gt;Concatenate the results array&lt;/li&gt;
&lt;li&gt;Write the data to an S3 file&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Outcome:
&lt;/h3&gt;

&lt;p&gt;Since we tried to load all 600k records into a string in order to put a single object into S3, we ran out of memory, even after allocating the maximum permissible memory of 3008MB.&lt;/p&gt;
&lt;h3&gt;
  
  
  Code:
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h2&gt;
  
  
  Second Attempt
&lt;/h2&gt;

&lt;p&gt;Based on our first attempt it was clear we had to handle our arrays carefully.&lt;br&gt;
In the first attempt we flattened the results array into a single array.&lt;br&gt;
We then iterated over the flattened array, transformed each DB record into a string and pushed it into another array, and hence the memory was insufficient.&lt;/p&gt;
&lt;h3&gt;
  
  
  The approach was:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Flatten the results and transform each record to a string in one pass, building a single array&lt;/li&gt;
&lt;li&gt;Write the data to an S3 file&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Outcome:
&lt;/h3&gt;

&lt;p&gt;Success!! We finally were able to write all the records to an S3 file.&lt;br&gt;
The issue was that we used up all of the 3008MB. So although it works for the current scenario, it is not future proof and we might run into memory issues again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DwRaC-aI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f3wdc1hr72hnbgf94l54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DwRaC-aI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/f3wdc1hr72hnbgf94l54.png" alt="Using up all the memory" width="880" height="35"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Code:
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h2&gt;
  
  
  Third Attempt
&lt;/h2&gt;

&lt;p&gt;So although we tasted success with the previous attempt, we needed a more efficient way to handle these huge arrays of data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streams
&lt;/h3&gt;

&lt;p&gt;A little googling and a few Stack Overflow questions led me to streams in Node.js.&lt;br&gt;
I will not delve deep into streams, but rather point to the resources that I referred to.&lt;br&gt;
The main concept of streams is that when you have a large amount of data to work with, rather than loading it all into memory, you load smaller chunks of it and work with those.&lt;br&gt;
On digging deeper we found that MongoDB find and aggregate operations return streams by default.&lt;br&gt;
We also found that the S3 upload API accepted a readable stream and had the ability to do a multipart upload. This seemed like a perfect way to work.&lt;br&gt;
The MongoDB query results would be the data source and the S3 file would be the sink.&lt;/p&gt;
&lt;h3&gt;
  
  
  The approach was:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Stream the MongoDB results&lt;/li&gt;
&lt;li&gt;The MongoDB aggregate cursor streams data in batches of about 16MB by default&lt;/li&gt;
&lt;li&gt;Use the S3 multipart upload API (see the sketch after this list)&lt;/li&gt;
&lt;/ol&gt;
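
&lt;p&gt;A minimal sketch of this approach is shown below. The database, collection, bucket and environment variable names are illustrative, and error handling is omitted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { Transform } = require('stream');
const { MongoClient } = require('mongodb');
const AWS = require('aws-sdk');

exports.handler = async function () {
  const client = await MongoClient.connect(process.env.MONGO_URI);
  const cursor = client.db('mydb').collection('orders').find({});

  // each document arrives as an object; turn it into a line of JSON text
  const toJsonLines = new Transform({
    writableObjectMode: true,
    transform: function (doc, encoding, done) {
      done(null, JSON.stringify(doc) + '\n');
    }
  });

  const s3 = new AWS.S3();
  // s3.upload() accepts a readable stream as Body and performs a managed
  // multipart upload, so the full data set is never held in memory
  await s3.upload({
    Bucket: process.env.BUCKET_NAME,
    Key: 'mongo-export/orders.json',
    Body: cursor.stream().pipe(toJsonLines)
  }).promise();

  await client.close();
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;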
&lt;h3&gt;
  
  
  Outcome:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Even more success!! We managed to reduce the memory consumption from 3008MB to 200 - 300MB. That was a huge win for us.&lt;/li&gt;
&lt;li&gt;The issue was that some code issue kept the node script from exiting, so the lambda would time out after the maximum of 900 seconds even though the actual execution had completed way before.
Due to the timeout the lambda retries 3 times, and so the file is written 3 times: wasted executions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CSiumskb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gzr2iixsqkl9nv7rgdz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CSiumskb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gzr2iixsqkl9nv7rgdz2.png" alt="Reduced memory consumption" width="880" height="438"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Code:
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h2&gt;
  
  
  Fourth Attempt
&lt;/h2&gt;

&lt;p&gt;We had nailed down most of the approach; the remaining question was how to make the Node.js function exit. We realized we were not calling the lambda handler's callback once the upload was done. Once we did that, the execution completed in under 490 seconds and the function exited.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LsZfDJZ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2fyvrieq9wtgauiph75z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LsZfDJZ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/2fyvrieq9wtgauiph75z.png" alt="Lambda exits before timeout" width="880" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Code:
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


</description>
      <category>node</category>
      <category>streams</category>
      <category>s3</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
