<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lorna Munanie</title>
    <description>The latest articles on DEV Community by Lorna Munanie (@lornam12).</description>
    <link>https://dev.to/lornam12</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1174038%2Fa43b6ea2-a366-4cea-bff3-09394bbec0ec.jpeg</url>
      <title>DEV Community: Lorna Munanie</title>
      <link>https://dev.to/lornam12</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lornam12"/>
    <language>en</language>
    <item>
      <title>Data Engineering for beginners. A step by step guide</title>
      <dc:creator>Lorna Munanie</dc:creator>
      <pubDate>Fri, 10 Nov 2023 05:39:48 +0000</pubDate>
      <link>https://dev.to/lornam12/data-engineering-for-beginners-a-step-by-step-guide-20ka</link>
      <guid>https://dev.to/lornam12/data-engineering-for-beginners-a-step-by-step-guide-20ka</guid>
      <description>&lt;p&gt;The rapid growth of big data has led to an increase in demand for real-time data processing and analytics. Data engineers play a huge role in designing and implementing data pipelines, the paths data travels from input to storage. A data engineer is a professional responsible for building storage solutions for huge amounts of data.&lt;br&gt;
Data engineering, on the other hand, is the process of designing and implementing systems that collect and analyze data so as to get insights and understand trends and patterns in the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Roles of a data engineer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Extracting data from data sources - Data comes from different sources, e.g. databases and external APIs, among others. A data engineer therefore integrates data from these sources into a centralized data store.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Preparing data for analysis - Data engineers are responsible for processing the data by applying transformations, cleaning, and validation to make it ready for analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Designing data pipelines - A data pipeline is the path data travels from input to storage. Data engineers are responsible for designing and implementing data pipelines to extract, transform, and load (ETL) data from various sources into a centralized data repository.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
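
&lt;p&gt;The ETL flow described above can be sketched in plain Python; the source data and field names here are made up for illustration:&lt;/p&gt;

```python
import csv
import io

# Hypothetical raw export from one data source (illustrative only).
RAW = """name,amount
 alice ,10.5
 bob ,7.25
"""

def extract(text):
    """Extract: parse rows out of a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: strip whitespace and cast numeric fields."""
    return [{"name": r["name"].strip(), "amount": float(r["amount"])} for r in rows]

def load(rows, store):
    """Load: append cleaned rows into a central store (a list stands in for a warehouse)."""
    store.extend(rows)
    return store

warehouse = load(transform(extract(RAW)), [])
print(warehouse[0]["name"])  # alice
```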

&lt;p&gt;&lt;strong&gt;Step-by-step guide&lt;/strong&gt;&lt;br&gt;
Step 1: Master the basics&lt;/p&gt;

&lt;p&gt;Mastering the fundamentals of data engineering is the first step. As a data engineer it is advisable to have a strong foundation in a programming language such as Python and in databases such as MySQL/PostgreSQL, and to understand data modelling, which helps in structuring data in a logical manner.&lt;/p&gt;
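
&lt;p&gt;To get a feel for the database side, Python's built-in sqlite3 module lets you practice SQL without installing a MySQL/PostgreSQL server; the table and data below are illustrative:&lt;/p&gt;

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL/PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Lorna",))
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])  # Lorna
```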

&lt;p&gt;Step 2: Data manipulation and transformation&lt;/p&gt;

&lt;p&gt;Data originates from different sources, so a data engineer is responsible for extracting, transforming, and loading (ETL) it, as well as cleaning it to make it ready for analysis.&lt;/p&gt;

&lt;p&gt;Step 3: Getting insights and patterns from data&lt;/p&gt;

&lt;p&gt;Data engineers should be familiar with various tools for visualizing data, such as Tableau and Power BI, so as to draw out patterns and insights from the given data.&lt;/p&gt;

&lt;p&gt;Step 4: Building data pipelines&lt;/p&gt;

&lt;p&gt;Next, you design and implement the data pipeline through which the data will travel from input to storage; a data pipeline acts as a highway for the data. Orchestration tools such as Apache Airflow help ensure a smooth flow of the data.&lt;/p&gt;
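
&lt;p&gt;A scheduler such as Apache Airflow runs pipeline tasks as a DAG, i.e. in dependency order. The sketch below is plain Python, not real Airflow code; it only shows the idea of tasks declaring what must run before them:&lt;/p&gt;

```python
# Each task names the tasks that must complete before it (a tiny DAG).
TASKS = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run(tasks):
    """Run tasks in topological (dependency) order, as an orchestrator would."""
    done, order = set(), []
    while len(done) != len(tasks):
        for name, deps in tasks.items():
            if name not in done and deps.issubset(done):
                order.append(name)  # a real scheduler would execute the task here
                done.add(name)
    return order

order = run(TASKS)
print(order)  # ['extract', 'transform', 'load']
```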

&lt;p&gt;Step 5: Data warehousing and data modeling&lt;/p&gt;

&lt;p&gt;A data warehouse is a storage system for huge amounts of data, while data modeling involves organizing data in a logical manner, which helps ensure efficiency and consistency throughout the data lifecycle. This is commonly achieved with star and snowflake schemas.&lt;/p&gt;
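
&lt;p&gt;As a sketch of a star schema, here is a hypothetical sales model, one central fact table referencing two dimension tables, built with Python's built-in sqlite3:&lt;/p&gt;

```python
import sqlite3

# Illustrative star schema: fact_sales sits at the center and points at
# the surrounding dimension tables (all names here are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (
    sale_id    INTEGER PRIMARY KEY,
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['dim_date', 'dim_product', 'fact_sales']
```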

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Data engineering is a critical field that empowers organizations to harness the full potential of their data. As a data engineer you need to be familiar with the basics such as programming and data manipulation (ETL), know how to use visualization tools such as Tableau or Power BI, build pipelines, and understand how to structure data in a logical manner.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Complete Guide to Time Series Forecasting</title>
      <dc:creator>Lorna Munanie</dc:creator>
      <pubDate>Fri, 03 Nov 2023 09:07:20 +0000</pubDate>
      <link>https://dev.to/lornam12/complete-guide-to-time-series-forecasting-1lcl</link>
      <guid>https://dev.to/lornam12/complete-guide-to-time-series-forecasting-1lcl</guid>
      <description>&lt;p&gt;Time series forecasting involves analyzing data that evolves over some period of time and then utilizing statistical models to make predictions about future patterns and trends in the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Characteristics of time series data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Temporal Ordering - Time series data is ordered chronologically, with each observation occurring after the previous one. This ordering is essential for analyzing trends and patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time Dependency - In a time series, each observation is influenced by the preceding observations, creating a sequential relationship where the value at a given time depends on the values that occurred before it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Irregular Sampling - Analyzing and forecasting time series data can be challenging when there are irregular or uneven time intervals between observations. Dealing with missing or irregularly spaced data points necessitates the use of suitable techniques.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Components of time series&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Trend - This represents the long-term direction or tendency of the data. It captures the overall upward or downward movement over time. Trends can be linear (constant increase or decrease) or nonlinear (curved or oscillating).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seasonality - Refers to patterns that repeat at fixed intervals within a time series. These patterns can be daily, weekly, monthly, or yearly. External factors such as weather conditions, holidays, or economic cycles often have an impact on seasonality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Noise(random fluctuations/ irregularities) - Represents the unpredictable and random variations in the data and includes factors that cannot be explained by trend or seasonality. Measurement errors, random events, or unidentified factors can contribute to the presence of noise in the data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Commonly used time series models&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Moving Average (MA) Model - This model calculates the average of past observations with the aim of predicting future values. It is useful for capturing short-term fluctuations and random variations in the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autoregressive (AR) Model - This model predicts future values based on a linear combination of past observations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autoregressive Moving Average (ARMA) Model - The ARMA model combines the AR and MA models to capture both short-term and long-term patterns in the data. It is effective for analyzing stationary time series data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Autoregressive Integrated Moving Average (ARIMA) Model - This model extends the ARMA model by incorporating differencing to handle non-stationary data. It is suitable for data with trends or seasonality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Seasonal ARIMA (SARIMA) Model - This model is an extension of the ARIMA model and includes seasonal components. It is useful for analyzing and forecasting data with recurring seasonal patterns.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
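
&lt;p&gt;In the smoothing sense described above, a moving-average forecast is simply the mean of the last k observations. The toy sketch below shows that idea; libraries such as statsmodels fit the full AR/MA/ARIMA family properly:&lt;/p&gt;

```python
# Forecast the next value as the mean of the last k observations
# (the "moving average" in the smoothing sense; sales figures are made up).
def moving_average_forecast(series, k):
    window = series[-k:]
    return sum(window) / len(window)

sales = [10, 12, 11, 13, 12, 14]
forecast = moving_average_forecast(sales, 3)
print(forecast)  # 13.0
```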

&lt;p&gt;&lt;strong&gt;Evaluating the performance of time series models.&lt;/strong&gt;&lt;br&gt;
Some commonly used metrics include:&lt;/p&gt;

&lt;p&gt;Mean Absolute Error (MAE) - This metric measures the average absolute difference between the predicted and actual values. It provides a straightforward measure of the model’s accuracy.&lt;/p&gt;

&lt;p&gt;Root Mean Squared Error (RMSE) - RMSE calculates the square root of the average squared difference between the predicted and actual values. It penalizes larger errors more heavily than MAE.&lt;/p&gt;

&lt;p&gt;Mean Absolute Percentage Error (MAPE) - MAPE calculates the average percentage difference between the predicted and actual values. It provides a relative measure of the model’s accuracy.&lt;/p&gt;

&lt;p&gt;Forecast Bias - Forecast bias measures the tendency of the model to consistently overestimate or underestimate the actual values. A bias close to zero indicates a well-calibrated model.&lt;/p&gt;
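
&lt;p&gt;These metrics are short formulas and can be implemented directly; forecast bias is taken here as the mean signed error, so values near zero indicate a well-calibrated model:&lt;/p&gt;

```python
import math

def mae(actual, pred):
    # Mean Absolute Error: average absolute difference.
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    # Root Mean Squared Error: penalizes large errors more heavily.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    # Mean Absolute Percentage Error: relative accuracy (actuals must be nonzero).
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def bias(actual, pred):
    # Mean signed error: positive means the model overestimates on average.
    return sum(p - a for a, p in zip(actual, pred)) / len(actual)

actual = [100, 110, 120]
pred = [98, 112, 121]
print(round(mae(actual, pred), 2))  # 1.67
```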

</description>
    </item>
    <item>
      <title>Exploratory Data Analysis (EDA) and Visualization Techniques</title>
      <dc:creator>Lorna Munanie</dc:creator>
      <pubDate>Sun, 08 Oct 2023 20:34:26 +0000</pubDate>
      <link>https://dev.to/lornam12/exploratory-data-analysis-edaand-visualization-techniques-5gdp</link>
      <guid>https://dev.to/lornam12/exploratory-data-analysis-edaand-visualization-techniques-5gdp</guid>
      <description>&lt;p&gt;EDA is a data analysis technique that mainly focuses on understanding the characteristics of a dataset. It involves using various statistical and visualization tools to explore data, identify patterns, and uncover insights and relationships.&lt;/p&gt;

&lt;p&gt;Exploratory data analysis is an important step in the data analysis process. It ensures that the data is really what it is claimed to be and that there are no obvious errors, e.g. missing values or outliers. EDA enhances the accuracy, efficiency, and reliability of the analysis.&lt;/p&gt;

&lt;p&gt;Data visualization on the other hand represents the various techniques used to represent data visually through charts, tables, maps, graphs and other visual elements. These techniques usually help to represent complex data in a more simplified and understandable format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common graphs used while performing EDA&lt;/strong&gt;&lt;br&gt;
Scatter Plot&lt;br&gt;
Pair plots&lt;br&gt;
Histogram&lt;br&gt;
Box plots&lt;br&gt;
Violin Plot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performing EDA&lt;/strong&gt;&lt;br&gt;
We are going to use a sample dataset, the &lt;a href="https://github.com/LornaM12/EDA-and-Visualization/blob/main/haberman.csv"&gt;Haberman&lt;/a&gt; dataset, to perform EDA.&lt;/p&gt;

&lt;p&gt;We start by importing several Python libraries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fBAwG7Wr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xv8ndbgiwvn6c54a2r6n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fBAwG7Wr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xv8ndbgiwvn6c54a2r6n.png" alt="Image description" width="487" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Headers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nu5zAV4F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ttjp0vdr31ey6zscowau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nu5zAV4F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ttjp0vdr31ey6zscowau.png" alt="Image description" width="388" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Age - Represents the age of the patients who underwent the surgery. It ranges from 30 to 83.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5c7g85Od--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/94etdh60a13s7ln1fxo9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5c7g85Od--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/94etdh60a13s7ln1fxo9.png" alt="Image description" width="351" height="254"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Year - The year in which the patient had the operation. It ranges from 1958 to 1969.&lt;/p&gt;

&lt;p&gt;Nodes - The number of positive axillary lymph nodes detected. (A lymph node, or lymph gland, is a kidney-shaped organ of the lymphatic system and the adaptive immune system.)&lt;/p&gt;

&lt;p&gt;Status - Denoted by 1 and 2: 1 means the patient survived 5 years or longer, and 2 means the patient died within 5 years.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XG_SbHtW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rmn0cw0uyh9k4iywfyyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XG_SbHtW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rmn0cw0uyh9k4iywfyyp.png" alt="Image description" width="373" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the output above, 225 patients survived 5 years or longer and 81 patients died within 5 years.&lt;/p&gt;
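
&lt;p&gt;The post loads haberman.csv with pandas; the same value-count step can be mimicked with the standard library on a tiny inline sample (column names as in the table above, values made up):&lt;/p&gt;

```python
import csv
import io

# A tiny inline stand-in for haberman.csv (age, year, nodes, status).
SAMPLE = """age,year,nodes,status
30,64,1,1
34,60,0,2
38,69,21,1
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Count how many patients fall into each survival status.
counts = {}
for r in rows:
    counts[r["status"]] = counts.get(r["status"], 0) + 1
print(counts)  # {'1': 2, '2': 1}
```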

&lt;p&gt;&lt;strong&gt;Data Visualization plots&lt;/strong&gt;&lt;br&gt;
Helps us understand the dataset much better in a visual way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Histograms&lt;/strong&gt;&lt;br&gt;
These are 2-D plots where the X axis is divided into time intervals or numerical bin ranges. Histograms help in identifying patterns such as skewness, central tendency, and outliers.&lt;/p&gt;

&lt;p&gt;From our example above:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1UgpsGQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mixk2l2ttt09ly5cf610.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1UgpsGQL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mixk2l2ttt09ly5cf610.png" alt="Image description" width="470" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yDNz1_Xs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/crqg0k8cpog70xbawdy7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yDNz1_Xs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/crqg0k8cpog70xbawdy7.png" alt="Image description" width="697" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bar Charts&lt;/strong&gt;&lt;br&gt;
Bar charts are suitable for visualizing categorical or discrete data. They help in understanding trends across categories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GdHgRMzc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4l1jmvvlzrlbmm49lhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GdHgRMzc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4l1jmvvlzrlbmm49lhy.png" alt="Image description" width="466" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scatter Plots&lt;/strong&gt;&lt;br&gt;
It is a plot in which individual observations appear as scattered points, usually for two features. Here we will plot nodes vs. age and see if there is any linearity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CugqbZby--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1zlak8y2djatqqy8k3b5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CugqbZby--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1zlak8y2djatqqy8k3b5.png" alt="Image description" width="512" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here the blue and orange dots represent the survival status of the patients: blue means the patient survived 5 years or longer, and orange means the patient died within 5 years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pair Plots&lt;/strong&gt;&lt;br&gt;
They display scatter plots for all possible pairs of continuous variables in a dataset. They provide a comprehensive view of the relationships between variables and are especially useful when exploring multiple variables simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dZ0zZ-cl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sktpm3zyxuudelssgnt5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dZ0zZ-cl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sktpm3zyxuudelssgnt5.png" alt="Image description" width="503" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the above plot we can get some interesting facts. Plot 6 (Year vs Nodes) is more readable than the other two, but we certainly cannot make any concrete observations based on this graph alone. Plots 4, 7 and 8 are the inverted versions of plots 2, 3 and 6 respectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Box Plots&lt;/strong&gt;&lt;br&gt;
Box plots show us percentile information that other plots cannot convey as easily. They also help in the detection of outliers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JyJl22A2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hwgbhj4818sdodvejabl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JyJl22A2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hwgbhj4818sdodvejabl.png" alt="Image description" width="510" height="447"&gt;&lt;/a&gt;&lt;/p&gt;
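
&lt;p&gt;The quantities a box plot draws, the quartiles plus the usual 1.5 * IQR outlier fences, can be computed directly with the standard library (the node counts below are made up):&lt;/p&gt;

```python
from statistics import quantiles

nodes = [0, 0, 1, 1, 2, 3, 4, 7, 8, 23]
q1, q2, q3 = quantiles(nodes, n=4)  # the three quartile cut points
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the whisker "fences"
# Clamping a value into [lo, hi] leaves it unchanged only when it lies inside
# the fences, so anything the clamp changes is an outlier.
outliers = [x for x in nodes if x != max(min(x, hi), lo)]
print(outliers)  # [23]
```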

&lt;p&gt;In conclusion, these are some basic plots used in EDA. It is always important to read and understand what each plot is saying. It is never good to skip EDA in a machine learning project.&lt;/p&gt;

</description>
      <category>x</category>
    </item>
    <item>
      <title>Data Science for Beginners :2023 - 2024 Complete Roadmap</title>
      <dc:creator>Lorna Munanie</dc:creator>
      <pubDate>Sun, 01 Oct 2023 10:52:54 +0000</pubDate>
      <link>https://dev.to/lornam12/data-science-for-beginners-2023-2024-complete-roadmap-57bb</link>
      <guid>https://dev.to/lornam12/data-science-for-beginners-2023-2024-complete-roadmap-57bb</guid>
      <description>&lt;p&gt;Data science is the study of data in order to extract meaningful insights from it. It extracts insights by combining various subjects such as  math and statistics, specialized programming, advanced analytics, artificial intelligence (AI) and machine learning. These insights are then used by organizations in decision making and strategic planning.&lt;/p&gt;

&lt;p&gt;A data science roadmap is a visual representation of a strategic plan designed to help one learn about and succeed in the field of data science.&lt;/p&gt;

&lt;p&gt;As a wide field in technology, data science has several career paths one can follow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Data Analyst - Collects, cleans and analyzes data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Scientist - Builds predictive models and creates data driven solutions. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Engineer - Builds infrastructure for generation, storage and retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;BI Analyst - Creates reports, dashboards and visualizations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Machine Learning Engineer - Implements ML algorithms and models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;NLP Engineer - Focuses on understanding and interpreting natural language.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Data Science skills for beginners&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mathematical and Statistical Skills&lt;/li&gt;
&lt;li&gt;Programming Skills&lt;/li&gt;
&lt;li&gt;Communication Skills&lt;/li&gt;
&lt;li&gt;Curiosity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Mathematical and Statistical skills&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Statistics&lt;/strong&gt; - This is a branch of mathematics that teaches us how to collect and analyze  data so that we can find answers to questions. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Descriptive statistics - Summarizes and describes the dataset you actually have.&lt;/li&gt;
&lt;li&gt;Inferential statistics - Uses a smaller sample to draw conclusions that apply to the entire dataset or population.&lt;/li&gt;
&lt;/ul&gt;
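
&lt;p&gt;The distinction can be seen in code: a descriptive statistic uses every value you hold, while an inferential estimate works from a sample. The data below is simulated for illustration:&lt;/p&gt;

```python
import random
import statistics

random.seed(42)

# Simulated "population" of 10,000 measurements.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Descriptive: summarize the entire dataset we hold.
population_mean = statistics.mean(population)

# Inferential: estimate that same quantity from a sample of 100 values.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(round(population_mean, 1), round(sample_mean, 1))
```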

&lt;p&gt;&lt;strong&gt;Probability&lt;/strong&gt; - A numerical representation of the likelihood of an event.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Calculus&lt;/strong&gt;- Calculus is a branch of mathematics that deals with the study of rates of change and the accumulation of quantities. It has two main branches: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Differential Calculus - Differential calculus helps us understand how things change; it describes how a function behaves at a single point.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integral Calculus - Integral calculus helps us find areas and accumulate quantities. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Linear Algebra&lt;/strong&gt;&lt;br&gt;
This is a branch of mathematics that deals with vectors and matrices. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;PROGRAMMING SKILLS&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL (Structured Query Language)&lt;/strong&gt; - This is the standard language for storing, querying, and manipulating data in relational databases, and it handles large datasets well.&lt;br&gt;
&lt;strong&gt;Python programming&lt;/strong&gt; - Python offers built-in data structures and libraries that store and manipulate data efficiently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists&lt;/li&gt;
&lt;li&gt;Tuples&lt;/li&gt;
&lt;li&gt;Dictionaries&lt;/li&gt;
&lt;li&gt;Sets&lt;/li&gt;
&lt;li&gt;Strings&lt;/li&gt;
&lt;/ul&gt;
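
&lt;p&gt;A one-line example of each built-in structure from the list above:&lt;/p&gt;

```python
scores = [88, 92, 75]              # list: ordered and mutable
point = (3, 4)                     # tuple: ordered and immutable
ages = {"alice": 30, "bob": 27}    # dictionary: key/value lookup
tags = {"data", "python", "data"}  # set: duplicates collapse away
name = "Lorna"                     # string: immutable text
print(len(tags))  # 2, because "data" is stored only once
```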

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Data Analysis and Visualization&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Being a data scientist requires you to work on data visualization, presenting data in pictorial forms such as charts and graphs that are easy to understand. There are plenty of tools in use, and some of the popular ones are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tableau&lt;/li&gt;
&lt;li&gt;Power BI&lt;/li&gt;
&lt;li&gt;Looker Studio&lt;/li&gt;
&lt;li&gt;Python libraries, e.g. Matplotlib and Plotly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Communication Skills&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ability to communicate and champion ideas in a way that is easy to understand and can be used in decision making.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
    </item>
  </channel>
</rss>
