DEV Community

Nitin-bhatt46

INDEX of DATA WORLD JOURNEY.

BE A PART OF MY LEARNING JOURNEY.
From a Hardcore Biology Student to Excellence in the DATA FIELD.

JOIN, LEARN AND GROW.
Resources USED FOR LEARNING :-
Courses :-

Vleran (FREE)
Pw-Skills (PAID)
Intellipaat (PAID)
YouTube (FREE)
ChatGPT (FREE)
OTHER SECRET RESOURCES…
BOOKS :-
Think Stats: Probability and Statistics for Programmers - Allen B. Downey
An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jonathan Taylor
Practical Statistics for Data Scientists - Peter Bruce, Andrew Bruce & Peter Gedeck
Introduction to Machine Learning with Python - Andreas C. Müller & Sarah Guido
Deep Learning with Python - François Chollet
Storytelling with Data - Cole Nussbaumer Knaflic
MANY MORE.
*** ALERT ***

This is an overview of what we will be learning in the coming days. As technology advances, I will update everything mentioned on this page, such as the projects and technologies to learn.
This page is future-proof and serves as the tracker of my journey.

JOIN ME TO EXPLORE THE DATA FIELD AND THE POSSIBILITIES OF THE FUTURE.

INTRODUCTION TO DATA WORLD :-

Content :-

Data Analyst :-

Entry-level position focused on collecting and analysing data.

Excel OR Google Sheets
Mathematics (optional; follow this only if you want to be in the top 1%)
Permutations & combinations
Probability
Statistics
Graphs
SQL (MSSQL)
Visualisation tool (POWER BI)
Storytelling
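One quick way to practise the counting and probability topics listed above is with Python's standard library. This is a minimal sketch, assuming Python 3.8+ (for `math.perm` and `math.comb`). The helper names and the product and dice scenarios are hypothetical examples, not taken from a project in this roadmap. In Excel or Google Sheets, the `PERMUT` and `COMBIN` functions give the same counts.

```python
import math
from fractions import Fraction


def permutations(n: int, r: int) -> int:
    # Ordered arrangements of r items chosen from n: n! / (n - r)!
    return math.perm(n, r)


def combinations(n: int, r: int) -> int:
    # Unordered selections of r items from n: n! / (r! * (n - r)!)
    return math.comb(n, r)


def probability(favourable: int, total: int) -> Fraction:
    # Classical probability: favourable outcomes / equally likely total outcomes
    return Fraction(favourable, total)


if __name__ == "__main__":
    # Hypothetical example: choose 3 of 10 products to feature on a dashboard
    print(permutations(10, 3))  # 720 ordered line-ups
    print(combinations(10, 3))  # 120 unordered selections
    # Probability that two fair dice sum to 7: count favourable pairs out of 36
    sum_seven = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7)
    print(probability(sum_seven, 36))  # 1/6
```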

PROJECT :-

  1. Retail Sales Analysis
  2. Website Traffic Analysis
  3. Supply Chain Analysis

Data Science :-

Analyses complex datasets, builds statistical models and extracts insights, and also designs and implements machine learning models.

It has two career paths :-

Data Scientist & Machine Learning Engineer.

Data Scientist :-
Analyses complex datasets, builds statistical models, and extracts insights.

Machine learning Engineer :-
Focuses on designing and implementing machine learning models.

Python
Basic Python
OOPs (Object-Oriented Programming)
DSA (Data Structures and Algorithms)

LIBRARIES FOR DATA ANALYSIS IN PYTHON :-
NumPy
Pandas
Matplotlib
Seaborn
SciPy
Scikit-learn
TensorFlow

Mathematics :-
Statistics
Linear Algebra
Calculus
Graphs

Projects :-
Data cleaning and exploration.
Machine learning :-
Introduction
Supervised
Regression
Model :-
Linear, polynomial, logistic
Evaluation metrics :-
Mean squared error (MSE)
R-squared
Classification
Decision Trees
Linear classifiers
Decision Boundaries
K-Nearest Neighbours
Random Forest

Unsupervised
Clustering
K-means clustering
Hierarchical clustering
Mean-Shift Algorithm
Association
Association Rule learning

Reinforcement Learning.
Introduction to RL.
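The two regression evaluation metrics listed above can each be written in a few lines. The sketch below is minimal and written in plain Python. It assumes equal-length lists of numbers. MSE is the average squared residual, (1/n) * sum((y - y_hat)^2). R-squared is 1 - SS_res / SS_tot, the share of the variance in y that the model explains. The function names here are illustrative. In practice, scikit-learn provides `mean_squared_error` and `r2_score` in `sklearn.metrics`.

```python
from statistics import fmean
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Average of squared residuals: (1/n) * sum((y - y_hat)^2)
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return fmean((y - p) ** 2 for y, p in zip(y_true, y_pred))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # 1 - SS_res / SS_tot; undefined (ZeroDivisionError) if all y_true values are equal
    y_bar = fmean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)  # total variation around the mean
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    y_true = [3.0, -0.5, 2.0, 7.0]
    y_pred = [2.5, 0.0, 2.0, 8.0]
    print(mean_squared_error(y_true, y_pred))  # 0.375
    print(round(r_squared(y_true, y_pred), 3))  # 0.949
```

A lower MSE is better, and its units are the square of the target's units. R-squared is unitless, and it reaches 1.0 only when every prediction is exact.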

PROJECTS :-
Survey Data Analysis
Expense tracking and Analysis.

DEEP LEARNING
NEURAL NETWORKS
Introduction to DEEP LEARNING
Introduction to Neural Networks
CNNs ( Convolutional neural networks)
RNNs ( Recurrent Neural Networks)

Computer vision

Natural language processing
Web Scraping
BeautifulSoup

Projects :-

NLP

MLOPS
Ensures seamless deployment and maintenance of machine learning models
Optimising the code.
Docker - Understanding of containerization
Kubernetes - orchestration tool.
Automating infrastructure - Terraform
Security
Cloud Computing
AWS

PROJECT :-
The SAME projects, rebuilt with production-level code and optimization.

Data Engineer :-

Big data and Hadoop
Intro to Big data
Hadoop and its evolution
HDFS Architecture
Hadoop ecosystem intro
Linux commands
HDFS commands

Map Reduce
Intro to Map Reduce
Different phases of Map Reduce
Combiners and Partitioners
Hash Function in Map Reduce
Shuffling and sorting in Map Reduce
Map Reduce Use Case
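The MapReduce phases above can be simulated on one machine using the classic word-count example. Those phases are map, combine, partition by hash, shuffle and sort, and reduce. This is a teaching sketch in plain Python, not Hadoop's Java API, and all function names are hypothetical. `zlib.crc32` stands in for Hadoop's default `HashPartitioner`, which uses the key's `hashCode()`.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(line: str) -> List[Pair]:
    # Mapper: emit (word, 1) for every word in one input record
    return [(word, 1) for word in line.lower().split()]


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    # Combiner: mapper-side pre-aggregation that shrinks the data sent over the shuffle
    counts: Counter = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def partition(key: str, num_reducers: int) -> int:
    # Partitioner: hash(key) % reducers, so all pairs with the same key reach one reducer.
    # crc32 is used because Python's built-in hash() of str is randomised per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapped: Iterable[List[Pair]], num_reducers: int):
    # Shuffle: route every pair to its reducer; sort: group values by key, in key order
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_phase(key: str, values: List[int]) -> Pair:
    # Reducer: sum the partial counts for one key
    return key, sum(values)


def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    mapped = [combine(map_phase(line)) for line in lines]  # one "map task" per line
    result: Dict[str, int] = {}
    for bucket in shuffle_and_sort(mapped, num_reducers):
        for key, values in bucket:
            word, total = reduce_phase(key, values)
            result[word] = total
    return result


if __name__ == "__main__":
    docs = ["big data big ideas", "data beats opinions", "big data"]
    print(word_count(docs))
```

The combiner is optional. Because word counts are simply summed, running it before the shuffle does not change the final result.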

Hive
What is Hive
Hive Query Language
Comparison Hive vs RDBMS
Hive Architecture
Hive Views
Hive Subqueries
Built-in Functions
Partitioning
Bucketing
Ranking
Sorting
Hive File Formats

Sqoop
Introduction
Sqoop Import
Sqoop Eval
Sqoop Export
Connecting to MySQL
Sqoop Incremental
Sqoop job creation

HBASE

Introduction
Properties of HBase
RDBMS vs HBASE
HBASE Architecture
HFile
Zookeeper
Update HBASE Data
Delete HBASE Data
Cassandra Overview
HBASE vs Cassandra
Filters in HBase.

SCALA
Scala Introduction
Why Scala
Data Types
Strings
If/else
For Loop
While Loop
Functions
Arrays
Lists
Tuples
Sets and Maps
Functional Programming
Anonymous Function
Recursion
Scala Operators
Scala Type System

SPARK
Spark comparison with Map Reduce
RDD/DAG
Immutability
RDD Lineage
Accumulators
Spark Stages
Spark on Yarn
Spark Storage
Intro to SparkSQL
Handling columns in DataFrame/Dataset
Aggregations
Window Aggregations
Joins using Data Frame
Broadcast Join
Shuffle sort-merge join
Spark optimization
Spark Streaming
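The difference between a broadcast join and a shuffle sort-merge join is easiest to see in miniature. The sketch below runs both strategies on plain Python lists of dicts on a single machine. It therefore illustrates only the join algorithms, not Spark's distributed execution. The table contents and function names are hypothetical. In PySpark, you can request a broadcast join with `pyspark.sql.functions.broadcast(small_df)`. Spark also broadcasts automatically when a table is smaller than `spark.sql.autoBroadcastJoinThreshold`.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def broadcast_hash_join(large: List[Row], small: List[Row], key: str) -> List[Row]:
    # Build phase: hash the small table (the side Spark would copy to every executor)
    lookup: Dict[object, List[Row]] = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    # Probe phase: stream the large table once; it is never shuffled or sorted
    return [{**big, **match} for big in large for match in lookup.get(big[key], [])]


def sort_merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Sort phase: in Spark, both sides are first shuffled by key, then sorted per partition
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])
    out: List[Row] = []
    i = j = 0
    # Merge phase: walk both sorted lists in step, as in merge sort
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][key], right_sorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows with this key, then pair it with each matching left row
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                out.extend({**left_sorted[i], **r} for r in right_sorted[j:j_end])
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    orders = [{"order_id": 1, "cust": "a"}, {"order_id": 2, "cust": "b"},
              {"order_id": 3, "cust": "a"}, {"order_id": 4, "cust": "z"}]
    customers = [{"cust": "a", "city": "Pune"}, {"cust": "b", "city": "Delhi"}]
    print(broadcast_hash_join(orders, customers, "cust"))
    print(sort_merge_join(orders, customers, "cust"))
```

Both functions implement an inner join and return the same rows, possibly in a different order. A broadcast join avoids shuffling the large table, but the small side must fit in each executor's memory. A sort-merge join can handle two large tables, at the cost of a shuffle and a sort.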
KAFKA
Introduction
Kafka Architecture
Index
Cluster
Integrating Kafka with Spark

AWS
AWS EMR
OnPrem vs Cloud
HDFS vs S3
What is S3
EC2
Elastic IP
AWS storage, networking
S3 and EBS
Athena
AWS Glue
AWS Redshift
AIRFLOW
Intro to Apache Airflow
Airflow Architecture
Airflow Installation
Creating and viewing DAG
Cron job creation
Logs Viewing
Sensors

A Data Engineer specialises in designing, building and maintaining data architecture. Key tools :-
Apache Spark
Apache Kafka

PROJECTS :-

Stock and Twitter Data Extraction Using Python, Kafka, and Spark
Use Python to Scrape Real Estate Listings and Make a Dashboard
Realtime Data Analytics
Image Caption Generator

WAIT, WAIT…

In this advanced world :-
We all know ChatGPT.
This segment covers how to use advanced AI skills in our DATA WORLD.
Once we reach it, you will be able to do all of the above work without memorising code, simply by using these tools to write it.

Then you might ask: why do we need to read and study at all, when we can just ask AI to create it?

Stop, stop…
Because before you build something, you must know what to build and which tool to build it with. You must master the basics before moving to advanced methods, so that you know what the tools make possible. To know that, we must understand how each tool and skill works.

Be with me…
FOLLOW FOR MORE…
Learn with me every day, Monday to Friday, at 6:00 PM.
SATURDAY AND SUNDAY ARE FOR COLLEGE STUDY.

DISCLAIMER :-
All the datasets and Google Sheets we use will be shared, so you can visit and learn from them. Whenever you get a sheet, make a copy and then edit it. Don't ask for permission to edit.
Thank you.
