Learnings from my last assignment

#learnings #avro #kafka #presto

The year 2019, along with my previous work assignment, has been very interesting, challenging and educational. I happened to work on a range of technologies and domains.
Here I am jotting down the learnings from my last assignment. While I learnt many things, one of the most important lessons for me echoes a great man’s words: “All I know is, Nothing”.

Standards

Importance of coding and logging standards for an organization, or at least a team. The effort saved by defining those standards upfront became quite evident to me as the application grew bigger.
Importance of defining metrics and having a clear distinction between application and business metrics.
Identifying standards for publishing events and consuming them.
Identifying data storage formats. I worked mainly with Apache Avro and saw the advantages it brings when a schema evolves.
Importance of encryption and tokenization standards for the data we deal with. Privacy is no longer a wishlist item but a law. It also made me understand the difference between the two terms.

Apache Avro

Importance of defining a schema/model for the data we deal with and how Avro enforces certain data quality checks in the pipeline.
I also understood what schema evolution is and how it needs to be a planned move, with schema compatibility checks enforced before a schema is changed.
Avro IDL makes it super easy to define/design schemas for the data we deal with.
Challenges involved with Union and complex union types.
Impacts of breaking schema changes on production systems and probable solutions to handle that.
Defining our own custom logical types in Avro, for example for encryption and tokenization.
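
To make the compatibility point concrete, here is a minimal Python sketch of one backward-compatibility rule from the Avro specification: a field added to the reader's schema needs a default, otherwise data written with the old schema cannot be read. This is not the Avro library or the Schema Registry API. The `Customer` schemas and the helper name `added_fields_without_default` are made up for illustration, and the check deliberately ignores aliases, type promotion and nested records.

```python
def added_fields_without_default(old_schema, new_schema):
    """Names of fields that exist only in new_schema and carry no default.

    A non-empty result means the new schema cannot read data written
    with the old one (simplified check, top-level fields only).
    """
    old_names = {f["name"] for f in old_schema["fields"]}
    return [
        f["name"]
        for f in new_schema["fields"]
        if f["name"] not in old_names and "default" not in f
    ]


# Hypothetical schemas, written as Python dicts instead of .avsc files.
customer_v1 = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}

# Adds an optional field with a default: old records stay readable.
customer_v2_ok = {
    "type": "record",
    "name": "Customer",
    "fields": customer_v1["fields"]
    + [{"name": "loyalty_tier", "type": ["null", "string"], "default": None}],
}

# Adds a required field with no default: a breaking change for old records.
customer_v2_breaking = {
    "type": "record",
    "name": "Customer",
    "fields": customer_v1["fields"]
    + [{"name": "loyalty_tier", "type": "string"}],
}

print(added_fields_without_default(customer_v1, customer_v2_ok))
print(added_fields_without_default(customer_v1, customer_v2_breaking))
```

Running a check like this in the build, before a schema change is merged, is one way to make schema evolution a planned move rather than a production surprise.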

Kafka

I learnt a lot about how Kafka can fit into some applications, especially when they are event based.
Difference between System and Business events.
Leveraging Schema Registry to enforce checks while producing messages to and consuming messages from a topic.
Kafka headers and how that can be leveraged in the pipeline.
Producing data with multiple schemas versus a single schema to a single topic.
Importance of metadata (Data about data).
Kafka Connect and use cases around it.
A little bit about Kafka Streams.
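
As a small illustration of how headers can help when several event types share one topic, here is a sketch in plain Python with no Kafka client involved. A Kafka header is a string key with a byte-array value, and the same key may appear more than once. The `Message` class, the `event-type` header name and the handlers are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Stand-in for a consumed record; not a real Kafka client class."""
    value: bytes
    headers: list = field(default_factory=list)  # list of (str, bytes) pairs


def last_header(message, key, default=None):
    """Return the decoded value of the last header with this key, if any."""
    for name, raw in reversed(message.headers):
        if name == key:
            return raw.decode("utf-8")
    return default


def route(message, handlers):
    """Choose a handler from the hypothetical 'event-type' header,
    without deserializing the payload first."""
    event_type = last_header(message, "event-type", default="unknown")
    handler = handlers.get(event_type)
    if handler is None:
        return ("skipped", event_type)
    return (event_type, handler(message.value))


# Toy handlers; real ones would deserialize with the matching schema.
handlers = {
    "sale": lambda payload: len(payload),
    "preference": lambda payload: payload.decode("utf-8").upper(),
}

sale = Message(value=b"\x00\x01\x02", headers=[("event-type", b"sale")])
pref = Message(value=b"email-opt-in", headers=[("event-type", b"preference")])
other = Message(value=b"?", headers=[("trace-id", b"abc")])

for msg in (sale, pref, other):
    print(route(msg, handlers))
```

The point of the design is that a consumer can decide what to do with a message, or whether to skip it, from metadata alone.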

Spark

Learnt Scala (just enough for Spark), and how to build and execute jobs by leveraging Livy.
Understood partitioning, re-partitioning and data shuffles.
Consuming messages from Kafka in batch mode.
Learnt a few things about executors, executor cores and memory management in Spark.
Spark History, Zeppelin notebooks for Spark.
Unit testing in Spark and its importance.
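
A rough mental model of why re-partitioning causes a shuffle, in plain Python rather than Spark: with hash partitioning, a record's partition is a hash of its key modulo the number of partitions, so changing the partition count changes where many keys live, and those records have to move. The sketch uses `zlib.crc32` as a stand-in hash and made-up customer keys; Spark's own hash functions differ, so the actual partition numbers here mean nothing.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Simplified hash partitioner: stable hash of the key, modulo partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


def keys_that_move(keys, old_n, new_n):
    """Keys whose partition changes when going from old_n to new_n partitions.

    Records for these keys would have to be moved, which is the shuffle.
    """
    return sorted(
        k for k in set(keys)
        if partition_for(k, old_n) != partition_for(k, new_n)
    )


# Made-up sample data: (customer key, amount).
records = [("cust-1", 10), ("cust-2", 5), ("cust-1", 7), ("cust-3", 1)]
keys = [key for key, _ in records]

print(partition_records(records, 4))
print(keys_that_move(keys, 4, 8))
```

All records with the same key always land in the same partition, which is what makes per-key aggregations possible without further data movement.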

Presto/Hive

Presto is nothing but SQL, with all the processing done in memory.
Writing UDFs for Presto.
HiveQL.

Airflow

Writing workflows as code and understanding how Airflow works.
Creating custom Operators.

Docker and Kubernetes

How Docker can help us keep infrastructure cost at $0 during development and unit testing.
Kubernetes is still a partially known area for me. I learnt about accessing pods, managing secrets and working with YAML files.

Design Patterns – Java

Going from reading about design patterns to actually seeing them being used.
Importance of Unit Testing and Code Reviews.

Domains

I have listed some of the domains that I have worked on. If we cannot traverse across domains for a specific customer, then we are hardly making use of the richness of the data.

Customer – I was able to understand the different challenges a company will have in dealing with Customer data.
Sales – All the different attributes related to a transaction and why it's critical to have them available in the system in near real time.
Preference – For people who work in marketing, a customer's preferences play a great role, and with more laws around them, it's important to keep them updated and available to the marketing teams.
Loyalty – The success of this program can only be measured when the company can leverage this data to its benefit.

This is a brain dump of the previous assignment I worked on, but it will also serve as a reminder of all the learnings as well as the unknowns. I also plan to write other posts related to some of the topics listed here, as I truly believe that “To teach is to learn twice”.

The post Learnings from my last assignment appeared first on Anil Kulkarni | Blog.