DEV Community: Soumitra Banerjee

G for Git

Soumitra Banerjee — Tue, 13 Dec 2022 10:09:04 +0000

Maintaining code has been quite smooth for us developers because we have a tool that keeps track of every version of the code as it is modified and stores it in a distributed manner.

Git is more than knowing pull, commit and push; there is a lot more to it. If we understand Git better, we might handle our code versions a little more sensibly.

Play around with git

Inside a Git repository, if you look into the .git directory (which is usually hidden), you'll see that the commits, logs and other information are stored there. For example, if you want to search your previous commits for a certain message, you can simply run the command

git log --grep="example search word"

where grep stands for "global regular expression print", so the search word is treated as a pattern that is matched against commit messages.

Git has three components: the working directory, the staging area and the repository. We make every change in the working directory, usually add it to the staging area, and then commit it to the repository. Each commit produces a hash, also known as its SHA value, which is 40 hexadecimal characters long. This SHA-1 is computed from the content being committed (e.g. a text file, code or a binary file) together with the commit's metadata, such as the message and the previous commit. Hence, even adding a space will generate a different SHA-1 value.

An interesting thing to note here is that once you commit a change, HEAD points to that commit, and when you commit another change, HEAD moves to the latest commit. Each commit also refers back to the previous one, so if you visualize the history, it looks like a sort of linked list.

[ Commit 1 ]------[ Commit 2 ]------[ Commit 3 ]

[ 48ae79b ]<------[ 56f9ca ]<------[ 9c78de ] <== HEAD

Snapshot-1 ------ Snapshot-2 ------ Snapshot-3
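
To see why even a single space changes the hash, here is a small Python sketch of how Git hashes the content of one file (a "blob" object). The function name git_blob_sha1 is my own and is not part of Git. Commit objects are hashed in the same spirit, but over their own content, which includes the hash of the previous commit, and that is what chains them together.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte)
    # followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_sha1(b"hello world\n")
with_space = git_blob_sha1(b"hello world \n")  # one extra space

print(original)              # 40 hexadecimal characters
print(with_space)            # a completely different value
print(original == with_space)
```

If you have Git installed, you can compare the first value with what git hash-object prints for a file containing the same text.
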
Now let's say you want to compare two commits. One handy tool for finding the difference between two commits is

git diff b3c98a..c6ae4b

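If you want to commit every change to the files Git already tracks while skipping the separate staging step, you can use the command below. Note that new, untracked files are not picked up this way and still need git add first.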

git commit -a -m "some msg"

Resources

If you want to study Git in detail, you can go through this guide: https://www.toptal.com/git/the-advanced-git-guide

Shhh! Hadoop is working

Soumitra Banerjee — Mon, 12 Jul 2021 18:01:50 +0000

Imagine someone with a brain that can store and process zettabytes of data. It is not possible for a "normal" human to have such a powerful brain. But even though you don't possess a brain with such massive processing and storage capacity, you could use a tool that does the processing and storing on your behalf.
Yes, you guessed it right: HADOOP is what I am talking about!

What is Hadoop?

It's a tool that can store and process huge amounts of data. It processes the data in parallel and stores multiple copies of it on different systems so that, if one system fails, you don't lose your data.

Hadoop Architecture

Hadoop has four main components:

MapReduce
Hadoop Distributed File System/HDFS
Yet Another Resource Negotiator/YARN
Utilities/Java Library

MapReduce

This component works as two units, i.e. Map and Reduce. First, the data set goes as input into the map function, which turns it into tuples of key-value pairs. These key-value pairs are then shuffled and sorted so that all values for the same key end up together, and passed to the reduce function, where they are aggregated and written to a file with the help of a record writer.
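
The flow above can be sketched in plain Python with the classic word-count job. This is only an in-memory illustration of the idea and does not use the Hadoop API; the function names map_phase, shuffle_and_sort, reduce_phase and run_job are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group all values that share a key and sort the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values of one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


counts = run_job(["big data is big", "data is everywhere"])
print(counts)
```

In real Hadoop, the map and reduce calls run on many machines at once and the shuffle moves data across the network, but the three stages are the same.
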

HDFS

HDFS is used for storage. It works on a master-slave architecture where the NameNode is the master and the DataNodes are the slaves. The NameNode instructs the DataNodes where and how to store data. It stores the metadata, whereas the DataNodes actually store the data.
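
Here is a toy Python model of that split between metadata and data. It is a sketch under simplifying assumptions, not how HDFS is implemented: the classes, the tiny block size and the round-robin placement are all invented for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
from typing import Dict, List, Tuple


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.alive = True


class NameNode:
    """Keeps only metadata: which DataNodes hold each block of a file."""

    def __init__(self, datanodes: List[DataNode], replication: int = 2) -> None:
        self.datanodes = datanodes
        self.replication = replication
        self.metadata: Dict[str, List[Tuple[str, List[DataNode]]]] = {}

    def place_blocks(self, filename: str, block_count: int) -> List[Tuple[str, List[DataNode]]]:
        placements = []
        for i in range(block_count):
            # Simplified round-robin choice of `replication` DataNodes per block.
            targets = [self.datanodes[(i + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            placements.append((f"{filename}#blk{i}", targets))
        self.metadata[filename] = placements
        return placements


def write_file(namenode: NameNode, filename: str, data: bytes, block_size: int = 4) -> None:
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placements = namenode.place_blocks(filename, len(chunks))
    for (block_id, targets), chunk in zip(placements, chunks):
        for node in targets:  # every replica gets its own copy of the block
            node.blocks[block_id] = chunk


def read_file(namenode: NameNode, filename: str) -> bytes:
    data = b""
    for block_id, targets in namenode.metadata[filename]:
        replica = next(node for node in targets if node.alive)
        data += replica.blocks[block_id]
    return data


nodes = [DataNode(f"dn{i}") for i in range(3)]
nn = NameNode(nodes, replication=2)
write_file(nn, "notes.txt", b"hello hadoop")
nodes[0].alive = False  # simulate one machine failing
print(read_file(nn, "notes.txt"))
```

Because every block lives on two DataNodes, the file can still be read after one of them goes down, which is the point of keeping multiple copies.
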

YARN

Its main task is to schedule jobs and manage resources. YARN decides which task or job is to be performed and when, and how many resources are allocated to perform it.

Utilities/Java libraries

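To run the other components of Hadoop (MapReduce, HDFS and YARN) smoothly, we need these Java libraries, which ship as the Hadoop Common module. They provide the shared utilities the other components rely on, and Hadoop as a whole is built on the assumption that hardware failures are common and should be handled by the framework instead of crashing it.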

References

Want to know more about Hadoop? Click on the links below.
HDFS Architecture
YARN Architecture

PIVOT-UNPIVOT

Soumitra Banerjee — Sat, 01 May 2021 12:21:02 +0000

What is PIVOT?

You can rotate your SQL table using the PIVOT operator, which transforms row values into columns. Here you can turn the unique values from one column into multiple columns.

What is UNPIVOT?

While PIVOT turns values from one column into multiple columns, UNPIVOT turns multiple columns into row values of a single column.

Use Cases

These two SQL operators, available in databases such as Oracle and SQL Server, are generally used to reorganize data so that it can be viewed more efficiently and is easier to understand than before.

Syntax:

PIVOT

SELECT (Column1, Column2, ...) 
FROM (TableName) 
PIVOT
 ( 
   AggregateFunction(ColumnToBeAggregated)
   FOR PivotColumn 
   IN (PivotColumnValues1, PivotColumnValues2, ...)
 ) AS (Alias)

UNPIVOT

SELECT (Column1, Column2, ...) 
FROM (TableName) 
UNPIVOT
 ( 
   ValueColumn
   FOR NameColumn 
   IN (ColumnToUnpivot1, ColumnToUnpivot2, ...)
 ) AS (Alias)
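
Not every database has an UNPIVOT keyword. As a rough sketch of what it does, the same result can be built with UNION ALL, shown here with Python's built-in sqlite3 module. The BRAND_PRICES table and its rows are made-up sample data for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BRAND_PRICES (BRAND TEXT, TABLET INTEGER, MOBILE INTEGER, OS INTEGER)"
)
conn.executemany(
    "INSERT INTO BRAND_PRICES VALUES (?, ?, ?, ?)",
    [("BrandA", 300, 450, None), ("BrandB", None, 150, 100)],  # hypothetical rows
)

# Each SELECT turns one column into (name, value) rows; UNION ALL stacks them.
unpivot_sql = """
SELECT BRAND, PRODUCT, PRICE FROM (
    SELECT BRAND, 'TABLET' AS PRODUCT, TABLET AS PRICE FROM BRAND_PRICES
    UNION ALL
    SELECT BRAND, 'MOBILE', MOBILE FROM BRAND_PRICES
    UNION ALL
    SELECT BRAND, 'OS', OS FROM BRAND_PRICES
)
WHERE PRICE IS NOT NULL   -- skip empty cells
ORDER BY BRAND, PRODUCT
"""

unpivoted = conn.execute(unpivot_sql).fetchall()
for row in unpivoted:
    print(row)
```

The three column names TABLET, MOBILE and OS end up as values in the PRODUCT column, and their contents end up in the PRICE column.
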

Example:

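So, a TECHPRODUCTS table that has one row per BRAND and PRODUCT, each with a PRICE, can be converted into a table that has one row per BRAND and separate TABLET, MOBILE and OS columns holding the summed prices, with this piece of code: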

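SELECT BRAND, TABLET, MOBILE, OS 
FROM (
    -- keep only the columns the pivot needs
    SELECT BRAND, PRODUCT, PRICE FROM TECHPRODUCTS
)
PIVOT (
    SUM(PRICE)                                      -- value aggregated into each new column
    FOR (PRODUCT)                                   -- column whose values become column names
    IN ('TABLET' TABLET, 'MOBILE' MOBILE, 'OS' OS)  -- product value followed by its column alias
) ORDER BY BRAND;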
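
If your database has no PIVOT keyword, the same reshaping can be approximated with conditional aggregation. Below is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the query above, but the rows are invented sample data, not the data from the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TECHPRODUCTS (BRAND TEXT, PRODUCT TEXT, PRICE INTEGER)")
conn.executemany(
    "INSERT INTO TECHPRODUCTS VALUES (?, ?, ?)",
    [
        # hypothetical rows, for illustration only
        ("BrandA", "TABLET", 300),
        ("BrandA", "MOBILE", 200),
        ("BrandA", "MOBILE", 250),
        ("BrandB", "MOBILE", 150),
        ("BrandB", "OS", 100),
    ],
)

# One SUM(CASE ...) per product plays the role of one pivoted column.
pivot_sql = """
SELECT BRAND,
       SUM(CASE WHEN PRODUCT = 'TABLET' THEN PRICE END) AS TABLET,
       SUM(CASE WHEN PRODUCT = 'MOBILE' THEN PRICE END) AS MOBILE,
       SUM(CASE WHEN PRODUCT = 'OS'     THEN PRICE END) AS OS
FROM TECHPRODUCTS
GROUP BY BRAND
ORDER BY BRAND
"""

pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

Each brand becomes a single row, and a brand that has no rows for a product gets an empty (NULL) value in that column.
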

If you want to learn more about PIVOT and UNPIVOT, please refer to:

https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot

Big Data, What's the Big Deal

Soumitra Banerjee — Sat, 13 Mar 2021 12:50:54 +0000

What is Big Data?

A huge amount of data is generated daily by different devices connected through the internet. Businesses store most of this data in the cloud and use it to enhance their business activity.

How big is Big Data?

As of 2021, our entire digital universe is estimated to hold more than 44 zettabytes of data, and almost 2.5 quintillion bytes (a quintillion is a million times a trillion) of data are produced by humans every day. It is expected that by 2025, 463 exabytes of data will be generated each day.
It's huge, right? Even I am astonished as I write this.
It is even more interesting to know that almost 90% of the total data has been generated in the last two years alone.
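
To get a feel for these units, here is a bit of back-of-the-envelope arithmetic in Python. It takes the figures quoted above at face value and assumes decimal (SI) units, where an exabyte is 10^18 bytes and a zettabyte is 10^21 bytes.

```python
EXABYTE = 10**18
ZETTABYTE = 10**21
QUINTILLION = 10**6 * 10**12  # a million times a trillion

daily_2021_bytes = 25 * QUINTILLION // 10  # 2.5 quintillion bytes per day
daily_2025_bytes = 463 * EXABYTE           # projected bytes per day by 2025
universe_2021_bytes = 44 * ZETTABYTE       # estimated total as of 2021

# 2.5 quintillion bytes is the same thing as 2.5 exabytes.
daily_2021_exabytes = daily_2021_bytes / EXABYTE

# How many times larger the projected daily volume is.
growth_factor = daily_2025_bytes / daily_2021_bytes

# Days needed at the 2025 rate to produce as much data as the whole 2021 estimate.
days_to_match = universe_2021_bytes / daily_2025_bytes

print(daily_2021_exabytes)     # 2.5
print(growth_factor)           # 185.2
print(f"{days_to_match:.1f}")  # 95.0
```

In other words, if the projection holds, about three months of data in 2025 would equal the entire estimate for 2021.
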

What do we do with this data?

Well, well, well! Hold your breath.
What if I told you that the data generated by your device is actually used to create an exact copy of you? You won't believe it, right? But this is how it is. Businesses track each and every move you make on the internet, and each of your actions is stored in the form of raw data. Data scientists use this data to shape a model of human behavior that acts exactly like you, and this is known as data modeling. Businesses then use these models to manipulate you through different means, be it notifications, ads, suggestions, etc.

But not everything businesses do is bad. In fact, the majority of cases are for the greater good of humankind. One such example is the use of big data in the medical and life sciences industries.

In 2020, the coronavirus pandemic disrupted every possible thing a human can think of, except the internet. The medical industry worked closely with governments and hospitals across the world to keep track of the rising COVID cases. The data generated by the hospitals was analyzed, and the industry started predicting which places needed special attention. Data from all the COVID patients who were admitted to hospitals was collected, and efforts were made to detect patterns among the various COVID cases: how to minimize the spread of the virus, which age groups the virus affects more, in which climate the virus grows stronger, and so on. While medical teams were working day and night treating people and taking care of their immunity, the tech industry was busy getting the data together to help people in every way possible. Even while the COVID vaccine was being made, the test results and side effects of each volunteer were recorded in the form of data and analyzed by healthcare professionals across the world to improve the quality of the vaccine.