DEV Community: ramsjha

Data as a Service: Domain-Driven Design

ramsjha — Mon, 21 Feb 2022 10:39:53 +0000

In an organisation, when working with data, we tend to treat every new request as a new data point and expose it to the visualisation layer in isolation. In most cases this is not a problem, but what happens when the organisation grows? Problems start to surface. There are no longer enough resources to keep pulling ever larger chunks of data, and figures stop aligning across different data pulls and dashboards. The performance impact and the rising computational cost of each query are best avoided, so let’s look at how.

This kind of problem calls for stepping back and asking how to gain leverage from a domain-driven design that is well coupled with a data model. To understand how this is done, let’s first see what domain-driven design actually is. The “Domain” in Domain-Driven Design refers to a “sphere of knowledge and activity around which the application logic revolves”. In other words, the domain is what the software world commonly calls “business logic”. In Domain-Driven Design, business logic is considered the heart of the software.

In the investment banking world, for example, trade settlement is a domain whose key metrics, such as settlement ratio and settlement exposure, are sliced by asset type, account type and so on. Treating settlement as a domain is the first step towards adoption and towards anticipating needs based on demand or past queries.
Once we understand the domain in depth, we can tie functional needs to an efficient, scalable data model, so that all data requests are served consistently and at the right level of abstraction. This lets users think in terms of functional needs while technologists work towards a scalable model as the repository to build on.

Once the domain and the model are in shape, then whatever the need, we should be able to serve it with one query, varying only the filters or dimensions, for both real-time and batch requests. At this stage, the abstract data models should be exposed as a service or API (REST or SOAP), so that consumption is not limited to visualisation or direct database access but is open to widespread use. Exposing data this way makes it go places, while at its core sits an efficient domain-driven design and data model.
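To make the "one query, different filters or dimensions" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `trades`, its columns and the sample rows are illustrative assumptions, not taken from a real settlement system; the point is only that a single function serves several slices of the same metric.

```python
import sqlite3

# Hypothetical dimensions of the settlement domain (assumed column names).
ALLOWED_DIMENSIONS = {"asset_type", "account_type"}


def build_demo_db():
    """Create an in-memory table with a few made-up trades."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (asset_type TEXT, account_type TEXT, settled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?, ?)",
        [
            ("equity", "retail", 1),
            ("equity", "retail", 0),
            ("equity", "institutional", 1),
            ("bond", "institutional", 1),
        ],
    )
    return conn


def settlement_ratio(conn, dimension, filters=None):
    """Return {dimension value: settled / total} for trades matching the filters."""
    filters = filters or {}
    # Column names cannot be bound as SQL parameters, so whitelist them.
    for column in [dimension, *filters]:
        if column not in ALLOWED_DIMENSIONS:
            raise ValueError(f"unsupported column: {column}")
    sql = f"SELECT {dimension}, AVG(settled) FROM trades"
    if filters:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    sql += f" GROUP BY {dimension}"
    return dict(conn.execute(sql, list(filters.values())).fetchall())
```

Calling `settlement_ratio(conn, "asset_type")` and `settlement_ratio(conn, "account_type", {"asset_type": "equity"})` reuses the same query shape for two different requests, which is the consistency the model is meant to provide.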

Some pointers on how to build the API/service: start with the databases; keep the queries as a collection in a NoSQL catalogue that acts as configuration to map each query to a service; use a DAO to map the service to the API; use ElastiCache to serve responses faster; and use Flask, Angular and Python for the API and UI.
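The catalogue, DAO and cache pieces can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a plain dictionary stands in for the NoSQL query catalogue, another dictionary stands in for an external cache such as ElastiCache, and the names `QUERY_CATALOGUE`, `TradeDao` and `make_demo_connection` are hypothetical. A Flask route would then only need to call `dao.fetch(service_name)`.

```python
import sqlite3

# Stand-in for a NoSQL catalogue: service name -> query (assumed entries).
QUERY_CATALOGUE = {
    "settled_count": "SELECT COUNT(*) FROM trades WHERE settled = 1",
    "total_count": "SELECT COUNT(*) FROM trades",
}


def make_demo_connection():
    """Create an in-memory database with made-up trades."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (trade_id INTEGER, settled INTEGER)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)", [(1, 1), (2, 1), (3, 0), (4, 1)]
    )
    return conn


class TradeDao:
    """Maps a service name to its catalogued query and caches the result."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # in-process stand-in for an external cache
        self.db_hits = 0  # counts how often the database is actually queried

    def fetch(self, service_name):
        if service_name in self.cache:
            return self.cache[service_name]
        sql = QUERY_CATALOGUE[service_name]  # raises KeyError for unknown services
        self.db_hits += 1
        value = self.conn.execute(sql).fetchone()[0]
        self.cache[service_name] = value
        return value
```

A real cache would also need an expiry or invalidation policy so that served values do not go stale; that is left out here to keep the sketch short.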

In conclusion, always look through a business lens to consolidate requests into a domain, build that domain into a model, and serve it either as a batch or as an API, in a service-oriented way.

Transitioning your career into Data Engineering

ramsjha — Mon, 21 Feb 2022 10:38:35 +0000

As a professional with substantial experience, a college graduate, or anybody else, how do I start a career in Data Engineering?

I get this question all the time.

Data Engineering is a complex field with a diverse technology landscape, and it can be difficult to decide on the right starting point. Will it stay relevant over time?

In this post, I will try to help those who want to get into Data Engineering. I once had to answer those questions myself, and now I have some ideas to share.

A few things prevent new specialists from entering the field. Let’s start with the common belief that, after transitioning, all the skills you have acquired so far will be thrown away. Not really! Building skills is an incremental exercise, and your existing skills will remain useful. A newcomer, meanwhile, starts with a clean slate and some software engineering awareness.

Let’s take a mainframe professional, for example. The key skills these specialists possess are data storage and retrieval, data processing, optimisation and visualisation. What will happen to these? The philosophy of data storage and retrieval remains the same, but the mechanism changes. Flat text files, often EBCDIC-encoded, give way to file formats such as Avro and Parquet, typically compressed with a codec such as Snappy; file-pointer access gives way to block reads, and so on. Data processing is another key element in both professions: processing a file or database page by page or row by row can be morphed into block processing with MapReduce. Database processing and system design skills will likewise build on what you already have.
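To illustrate how row-by-row processing turns into block processing, here is a small single-machine sketch of the map-reduce pattern in plain Python. It is not Hadoop MapReduce itself; the record layout and the function names are assumptions made for the example. Each block is counted independently (the map step) and the partial counts are then merged (the reduce step), which is what lets a real framework spread the blocks across machines.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Cut the record list into consecutive blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count records per account within a single block."""
    return Counter(record["account"] for record in block)


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def count_by_account(records, block_size=2):
    """Count records per account by mapping over blocks and reducing the results."""
    blocks = split_into_blocks(records, block_size)
    return reduce(merge_counts, map(map_block, blocks), Counter())
```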

As you can see, the key elements of working with data remain the same; it is only the mechanism that changes with every new tool or methodology that comes onto the market. The same goes for people with a PeopleSoft, QA or ETL background: embrace the change and be resilient enough to adapt to it.

With that in mind, let’s look at the skills that are crucial for a professional Data Engineer. First of all, you need to know a programming language such as Python, Scala or Java. If your current level is low, this will require some work. Database skills are important: you should be familiar with SQL and with relational, MPP, NoSQL and cloud data warehouse (DWH) systems. You should also know your way around processing frameworks (MapReduce, Spark, ETL frameworks, etc.). The last core skill is data-intensive design. With these at the core, we have to complement our skillset with cloud, DevOps, scheduling, documentation, infrastructure as code, security and so on. Above all, problem-solving lies at the heart of it: the tech landscape changes, but the approach to solutions and the intuition behind it remain the same.

Where to start is a tricky question. I recommend identifying the gaps in your knowledge and bridging them by learning the relevant concepts. A good idea is to practise them in a Git repository that you can showcase. Download open datasets, ingest them in real time or in batches, and apply transformations using Structured Streaming, Spark SQL or CDC. Then push the results to a BI tool, automate the workflow and, finally, make it more complex by adding assumptions. Once you are comfortable with individual projects, go back to the concepts and take them to the next level, focusing on design patterns to go deeper where your experience can be leveraged.
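As a starting point for such a practice project, the ingest, transform and aggregate stages can first be tried out with nothing but the Python standard library, before moving to Spark or a streaming engine. The sketch below is only an illustration: the inline CSV stands in for a downloaded open dataset, and the column names and function names are assumptions.

```python
import csv
import io

# Stand-in for a downloaded open dataset (made-up readings).
RAW_CSV = """city,temperature_c
Pune,31.5
Pune,29.5
Oslo,4.0
Oslo,
"""


def ingest(text):
    """Batch ingest: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing reading and cast the rest to float."""
    return [
        {"city": row["city"], "temperature_c": float(row["temperature_c"])}
        for row in rows
        if row["temperature_c"]
    ]


def aggregate(rows):
    """Compute the average temperature per city, ready to push to a BI tool."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["city"], (0.0, 0))
        totals[row["city"]] = (total + row["temperature_c"], count + 1)
    return {city: total / count for city, (total, count) in totals.items()}
```

Chaining the stages as `aggregate(transform(ingest(RAW_CSV)))` gives a complete, if tiny, batch pipeline that can later be rewritten stage by stage with heavier tooling.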

The journey might be daunting, and you will probably end up feeling lost or getting stuck at some point, but this is what transition always looks like. Once in the role of a Data Engineer, you will learn how to adapt to the constantly changing reality, and shaping your skillset accordingly won’t be as much of a challenge for you anymore.

Solving Analytical Problems: Working-backward Approach

ramsjha — Mon, 21 Feb 2022 10:36:40 +0000

When working on any kind of analytical problem, the first solution that comes to mind is a technical one. We tend to over-index on the solution, whether it is Data Science, Machine Learning, Data Engineering, Descriptive Analytics, etc. Traditionally, we start by defining the problem and then work out a strategy for the result, which may lead to a sub-par solution.
Traditional methods aren’t always the most efficient strategy. Sometimes, reverse-engineering the problem yields far better results.
In this post, we’ll talk about the working-backwards approach through a business lens. Here we flip the process upside down: we first study the intended effect and then devise the process for reaching it. The method consists of four steps:

Step 1: Understanding the problem
First of all, we need to understand the business problem in detail and find out what exactly is going on, what is impacted, why the end user needs a certain solution and what kind of impact solving it will create. This step requires a business understanding of the domain and takes the specific problem point into account. We need to define the scope of the problem and find the logical block or bottleneck. It is also very important to clarify how the user sees it.
This step is crucial because it determines how well the end result aligns with business expectations, and it shapes the process of solving the problem from the bottom up. It may also be worth deciding at inception whether to buy, build or reuse.

Step 2: Data-driven exploration
This step is the most underrated, mainly because we tend to explore with a solution already in mind, which leaves us with a blind spot. I am not saying that subject-matter expertise is unhelpful but, as a rule of thumb, we have to sense-check from all perspectives and correlate the findings. One way to approach data-driven exploration is to use a Python library to explore the problem with statistical methods and describe it as stated. Another is to define the data in the language of the business context and run experiments rather than rely on statistics alone, which can lead to an A/B-testing approach, operational research, etc. The end result indicates the potential entitlement value of the business problem and the direction for the strategy.
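As a small example of the first route, Python's built-in `statistics` module is enough for a first descriptive pass before reaching for heavier libraries. The `describe` helper below is a hypothetical name, and the kind of input it expects (say, a list of settlement delays in days) is an assumption made for illustration.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of at least two numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
    }
```

A mean that sits well above the median, for instance, hints at a few extreme cases dragging the average up, which is worth checking with the business before any solution is chosen.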

Step 3: Scoping the problem
In this step, based on what we learned during the previous step, we scope the problem and try to remove any possible ambiguity. We need to carefully consider all the objectives and requirements necessary to complete the project, and estimate the time and costs. We can now set the key milestones and summarise the process, describing it in detail.
This is a fundamental step towards success: the stakeholders and the tech team have to agree on how to move forward from here, because any misalignment after this point will create churn and rework.
It is important that every step of the process is clearly explainable and the performance metrics are shared and discussed.

Step 4: Defining and dissecting the problem into parts
A business problem is not purely an ML problem or a descriptive one; it is an amalgamation of describing the problem and applying ML to particular data points, with Data Engineering at its core. Once we break the problem into parts, we can assign a team to each part or set up a dependency matrix to decide the order in which to proceed. This may include setting up a data pipeline, doing descriptive analysis, and predicting or prescribing insights and the correlated actions on top of them.
Once the outcome is defined, we move on to validation and further consumption of the data to run programmes, so that action can be taken to capitalise on the entitlement. We can create a mechanism for an automated, hands-off-the-wheel insight-to-action loop and keep it running for as long as we need, saving the teams time and effort.
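One simple way to work with the dependency matrix mentioned in this step is to write down which part depends on which and let a topological sort propose an order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the work-package names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical work packages, each mapped to the packages it depends on.
DEPENDENCIES = {
    "data_pipeline": set(),
    "descriptive_analysis": {"data_pipeline"},
    "prediction": {"descriptive_analysis"},
    "prescribed_actions": {"prediction"},
}


def execution_order(dependencies):
    """Return the work packages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

If two parts accidentally depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful early signal that the problem has not been dissected cleanly yet.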

Conclusion
In conclusion, the working-backwards approach is well suited to cases where we want the results of our work to meet the user’s expectations, and it lets us structure problem-solving so that the decisions it leads to are more precise and beneficial.