<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mutanu-Vivian</title>
    <description>The latest articles on DEV Community by Mutanu-Vivian (@mutanuvivian).</description>
    <link>https://dev.to/mutanuvivian</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1852888%2F72c0d49d-15b6-4eb9-a7c6-7951ce281066.png</url>
      <title>DEV Community: Mutanu-Vivian</title>
      <link>https://dev.to/mutanuvivian</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mutanuvivian"/>
    <language>en</language>
    <item>
      <title>The Ultimate Guide to Data Analytics</title>
      <dc:creator>Mutanu-Vivian</dc:creator>
      <pubDate>Mon, 26 Aug 2024 21:20:22 +0000</pubDate>
      <link>https://dev.to/mutanuvivian/the-ultimate-guide-to-data-analytics-47jg</link>
      <guid>https://dev.to/mutanuvivian/the-ultimate-guide-to-data-analytics-47jg</guid>
      <description>&lt;p&gt;Data Analytics is a multifaceted field, encompassing many fascinating branches such as data science, machine learning, data analysis and analytics engineering. However, at the heart of all these areas lies data engineering, that ensures the smooth operation of the entire data ecosystem. &lt;/p&gt;

&lt;p&gt;This article is a guide designed to shed light on the importance of data engineering and give a clear pathway for those interested in the field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Engineering Pathway&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Data Engineering and Its Importance&lt;/strong&gt;&lt;br&gt;
Data engineering is the practice of building and maintaining systems that collect, store and analyze vast amounts of data. Data engineers create systems that ensure data is available, clean and ready for analysis. Their work includes constructing data pipelines to automate data flow, managing data warehouses to organize and store data effectively, and developing data architectures that support robust data processing systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Responsibilities of a Data Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Designing and Constructing Data Architectures&lt;/em&gt;&lt;/strong&gt;: Developing systems that can handle the volume, velocity and variety of the organization's data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Developing Data Pipelines&lt;/em&gt;&lt;/strong&gt;: Pipelines move data from its source to a storage location where it can be analyzed, doing so efficiently, reliably and securely while minimizing latency and errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Storage Management&lt;/em&gt;&lt;/strong&gt;: Choosing the most appropriate storage solutions that balance cost, speed, and scalability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Monitoring System Performance and Reliability&lt;/em&gt;&lt;/strong&gt;: Continuously monitoring the data systems to ensure they perform optimally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Quality Control and Data Integrity&lt;/em&gt;&lt;/strong&gt;: Implementing validation processes to detect and correct errors in the data, enforcing data governance policies, and securing data against unauthorized access.
&lt;/li&gt;
&lt;/ol&gt;
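&lt;p&gt;As a rough sketch of responsibility 2, the extract-transform-load flow of a pipeline can be illustrated in plain Python. The record fields and the dict-based "warehouse" below are hypothetical stand-ins for a real source and store, not a production design:&lt;/p&gt;

```python
# Minimal, hypothetical ETL sketch: extract raw records, transform (clean)
# them, and load them into a destination store. The fields "id" and "amount"
# and the dict-based warehouse are illustrative stand-ins only.

def extract():
    # In practice this would read from an API, a file, or a database.
    return [{"id": 1, "amount": "42.5"}, {"id": 2, "amount": None}]

def transform(records):
    # Drop incomplete rows and cast types so downstream analysis is clean.
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in records
        if r["amount"] is not None
    ]

def load(records, store):
    # The "warehouse" here is just a dict keyed by record id.
    for r in records:
        store[r["id"]] = r

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)
```

&lt;p&gt;A production pipeline would swap each stage for real connectors and an orchestration tool such as Apache Airflow, but the stage boundaries stay the same.&lt;/p&gt;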

&lt;p&gt;&lt;strong&gt;Skills Required to Become a Data Engineer&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Technical Skills&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Programming language&lt;/em&gt;&lt;/strong&gt;: Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Database Management&lt;/em&gt;&lt;/strong&gt;: SQL, NoSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Big Data Technologies&lt;/em&gt;&lt;/strong&gt;: Apache Spark, Hadoop, MapReduce&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Warehousing&lt;/em&gt;&lt;/strong&gt;: Amazon Redshift, Google BigQuery, Snowflake&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;ETL Tools&lt;/em&gt;&lt;/strong&gt;: Talend, Informatica PowerCenter, Apache NiFi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Pipeline Tools&lt;/em&gt;&lt;/strong&gt;: Apache Kafka, Apache Airflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Containerization&lt;/em&gt;&lt;/strong&gt;: Docker&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Analytical Skills&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Modelling&lt;/em&gt;&lt;/strong&gt;: Creating logical and physical data models for optimizing data storage, retrieval, and processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Problem-Solving&lt;/em&gt;&lt;/strong&gt;: Swiftly diagnose the root causes of data pipeline failures, and develop effective solutions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Communication&lt;/em&gt;&lt;/strong&gt;: Effectively communicate complex data-related concepts to non-technical stakeholders across different teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Attention to Detail&lt;/em&gt;&lt;/strong&gt;: Precision is crucial to ensure data pipelines are accurately designed and the data handled maintains high quality. This prevents error and data loss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Project Management&lt;/em&gt;&lt;/strong&gt;: Organizational skills allow for efficient task prioritization, resource allocation, and progress tracking, ensuring timely project completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Continuous Learning&lt;/em&gt;&lt;/strong&gt;: Adaptability to new technologies.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Data Engineering provides the systems that make data-driven insights possible.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>learning</category>
      <category>career</category>
    </item>
    <item>
      <title>Understanding Your Data: The Essentials of Exploratory Data Analysis</title>
      <dc:creator>Mutanu-Vivian</dc:creator>
      <pubDate>Sun, 11 Aug 2024 20:18:10 +0000</pubDate>
      <link>https://dev.to/mutanuvivian/understanding-your-data-the-essentials-of-exploratory-data-analysis-5gmi</link>
      <guid>https://dev.to/mutanuvivian/understanding-your-data-the-essentials-of-exploratory-data-analysis-5gmi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Exploratory Data Analysis refers to the critical process of performing initial investigations on data in order to discover patterns, identify outliers, test hypotheses and check assumptions with the help of summary statistics and visualisations.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt; is the process of understanding your data before performing complex analysis or building models.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Main checks done on the imported data frame are:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;1. Understanding Your Data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This is the process of checking the dataset to establish the datatypes, as well as checking the shape of the data to see its size. These checks allow you to better plan your analysis.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;2. Handling Duplicates&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This is the process of checking for duplicates within the dataset. Deleting or modifying duplicates improves the quality of the analysis.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;3. Handling Missing Values&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This is the process of checking the dataset for omissions or incomplete data entries. The &lt;em&gt;'isnull'&lt;/em&gt; function checks for null values. Missing values are then replaced with the mean, median or whichever measure is best suited for the datatype being replaced. This process ensures the dataset is complete before analysis.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;4. Describing the Data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This is the process of carrying out &lt;em&gt;Descriptive Statistical Analysis&lt;/em&gt; to inform the kind of data models that can be used for analysis. The &lt;em&gt;'describe'&lt;/em&gt; function gives an overview of the variables, covering basic statistics such as mean, mode, variance, min, max and percentiles.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;5. Understand Distribution of Data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
At this point, visualization can be done to establish distribution and relationships between the variables.&lt;/p&gt;
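&lt;p&gt;The first four checks above can be sketched with pandas on a small toy DataFrame (the columns and values below are illustrative, not from the article):&lt;/p&gt;

```python
import pandas as pd

# Toy DataFrame with one duplicate row and one missing value (illustrative).
df = pd.DataFrame({
    "age": [25, 32, 32, None, 41],
    "city": ["Nairobi", "Mombasa", "Mombasa", "Kisumu", "Nairobi"],
})

print(df.dtypes)              # 1. datatype of each column
print(df.shape)               # 1. size of the data as (rows, columns)
print(df.duplicated().sum())  # 2. number of duplicate rows
print(df.isnull().sum())      # 3. null values per column
df["age"] = df["age"].fillna(df["age"].median())  # 3. impute with the median
print(df.describe())          # 4. summary statistics for numeric columns
```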

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Univariate Analysis&lt;/em&gt; (One Variable at a Time): Visualising individual variables. Use histograms for numerical data to see the distribution and bar charts for categorical data to understand frequency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Bivariate Analysis&lt;/em&gt; (Two Variables Together): Explore relationships between two variables using scatter plots for numerical variables and bar charts for categorical and numerical pairings.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Scatter Plots&lt;/em&gt;&lt;/strong&gt; and other visualisations can be used to further explore data to identify patterns or trends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;6. Handling Outliers&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Outliers&lt;/em&gt;&lt;/strong&gt; are individual data points that lie far outside the overall pattern of the rest of the data.&lt;br&gt;
Outliers can be identified using visualisations such as box plots.&lt;/p&gt;
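&lt;p&gt;One common way a box plot flags outliers is the 1.5 x IQR rule; a minimal sketch with pandas, using toy values chosen so the outlier is obvious:&lt;/p&gt;

```python
import pandas as pd

# Flag points outside 1.5 * IQR of the quartiles -- the same "fences"
# a box plot draws. The values are toy data with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Series.between keeps in-range points; "~" inverts the mask to keep outliers.
outliers = values[~values.between(lower, upper)]
print(outliers.tolist())
```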

&lt;p&gt;&lt;strong&gt;&lt;em&gt;7. Exploring Correlation between the Variables&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Correlation&lt;/em&gt;&lt;/strong&gt; analysis is carried out to understand how one variable affects another. It helps establish the extent to which the variables are independent.&lt;br&gt;
The strength of correlation can be measured in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Pearson Correlation Coefficient (PCC)&lt;/em&gt;&lt;/strong&gt;- a measure of linear correlation between two variables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Correlation Matrix&lt;/em&gt;&lt;/strong&gt;-displays the correlation values between all variables. A Heatmap can be used to visualize a correlation matrix.  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
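&lt;p&gt;Both measures are available directly in pandas; a minimal sketch on toy data, where the relationship is deliberately made exactly linear so the coefficient is easy to verify:&lt;/p&gt;

```python
import pandas as pd

# Toy data: "score" is an exactly linear function of "hours", so the
# Pearson correlation coefficient between them should be 1.0.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 68, 76, 84],
})

corr_matrix = df.corr(method="pearson")  # pairwise PCC for all columns
print(corr_matrix)
```

&lt;p&gt;A library such as seaborn can then render this matrix as a heatmap for easier reading.&lt;/p&gt;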

&lt;p&gt;This careful examination and cleaning of your data lays the foundation for more accurate and meaningful analysis, helping you to draw better insights from your data.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>programming</category>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to Build a Successful Career in Data Science</title>
      <dc:creator>Mutanu-Vivian</dc:creator>
      <pubDate>Sun, 04 Aug 2024 15:59:00 +0000</pubDate>
      <link>https://dev.to/mutanuvivian/how-to-build-a-successful-career-in-data-science-2bgd</link>
      <guid>https://dev.to/mutanuvivian/how-to-build-a-successful-career-in-data-science-2bgd</guid>
      <description>&lt;p&gt;&lt;em&gt;Are you passionate about problem-solving, and turning data into actionable insights? Welcome to the world of Data Science!&lt;/em&gt; &lt;/p&gt;

&lt;h2&gt;So...What is Data Science?&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data Science&lt;/strong&gt; is the study of data to extract meaningful insights. It is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The lifecycle of a Data Science project is broken down as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Collection&lt;/em&gt;&lt;/strong&gt;- obtaining data from various sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Preparation&lt;/em&gt;&lt;/strong&gt;- preparing data for analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Exploration &amp;amp; Visualization&lt;/em&gt;&lt;/strong&gt;- identifying patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Experimentation &amp;amp; Prediction&lt;/em&gt;&lt;/strong&gt;- creating models and experiments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Data Storytelling &amp;amp; Communication&lt;/em&gt;&lt;/strong&gt;- providing insights&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Based on the project lifecycle shared above, let's explore the skills required to pivot into a Data Science career.&lt;/p&gt;

&lt;h2&gt;Essential Skills for Advancing your Data Science Career&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Mathematics&lt;/em&gt;&lt;/strong&gt;- Calculus, Algebra, Probability, Statistics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Programming&lt;/em&gt;&lt;/strong&gt;- SQL, Python and/or R languages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;Machine Learning&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;Data Visualization Tools&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Domain Knowledge&lt;/em&gt;&lt;/strong&gt;- Understanding data in a particular field&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Communication&lt;/em&gt;&lt;/strong&gt;- Written and Verbal communication&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these key skills put into practice, let's get on to Job searching.&lt;/p&gt;

&lt;h2&gt;Top Tips for Job Hunting&lt;/h2&gt;

&lt;p&gt;As a beginner, landing a job in Data Science may seem daunting. Let's delve into strategies that will make you stand out from the rest.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Practice&lt;/em&gt;&lt;/strong&gt;- Hone your skills through consistent practice. As they say, &lt;em&gt;"practice makes perfect!"&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;Find a mentor&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Take up Projects&lt;/em&gt;&lt;/strong&gt;- build a portfolio to showcase your work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Shout about your work&lt;/em&gt;&lt;/strong&gt;- Share your projects on different social platforms. This gets you noticed and builds you a strong online presence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Networking&lt;/em&gt;&lt;/strong&gt;- Join Data Science communities and network with like-minded individuals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Curate your CV&lt;/em&gt;&lt;/strong&gt;- take time to research and create a CV that stands out.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With the right skills, a polished portfolio and CV, a positive &lt;em&gt;'can-do'&lt;/em&gt; attitude and strategic job hunting, you're on the right path to building a successful career in Data Science!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
      <category>careerdevelopment</category>
    </item>
  </channel>
</rss>
