<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lilian Gicheru</title>
    <description>The latest articles on DEV Community by Lilian Gicheru (@liliang).</description>
    <link>https://dev.to/liliang</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1174030%2F00879214-45cf-434a-95da-d51915336e2c.jpg</url>
      <title>DEV Community: Lilian Gicheru</title>
      <link>https://dev.to/liliang</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/liliang"/>
    <language>en</language>
    <item>
      <title>Data Engineering for Beginners: A Step-by-Step Guide</title>
      <dc:creator>Lilian Gicheru</dc:creator>
      <pubDate>Wed, 01 Nov 2023 07:28:50 +0000</pubDate>
      <link>https://dev.to/liliang/data-engineering-for-beginners-a-step-by-step-guide-3ond</link>
      <guid>https://dev.to/liliang/data-engineering-for-beginners-a-step-by-step-guide-3ond</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data engineers are the architects behind the scenes, building the foundations on which modern businesses thrive. From shaping data pipelines to enabling analytics, they are the unsung heroes who transform raw data into actionable insights.&lt;/p&gt;

&lt;p&gt;Data engineers usually come from engineering backgrounds. Compared with data science, the role requires less academic or research-oriented training. Developers or engineers who are interested in building large-scale systems and architectures are well suited to thrive in this role.&lt;/p&gt;

&lt;p&gt;If you’ve ever wondered how to become a Data Engineer or are seeking guidance on how to scale your career in this dynamic field, this article presents a comprehensive data engineering roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understand the Basics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Begin by grasping the fundamental concepts of data engineering. Understand terms like ETL (Extract, Transform, Load), data warehouses, data lakes, and data pipelines. Research different data storage technologies such as databases, cloud storage, and distributed file systems.&lt;/p&gt;
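
&lt;p&gt;To make the ETL idea concrete, here is a minimal sketch that uses only Python's standard library. The CSV content, the column names, and the &lt;code&gt;sales&lt;/code&gt; table are hypothetical. A real pipeline would read from files, APIs, or databases and load into a data warehouse rather than an in-memory SQLite database.&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """order_id,region,amount
1, north ,120.50
2,South,80
3,north,
4,EAST,45.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise region names, drop rows without an amount, cast types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(amount))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(region, total)
```

&lt;p&gt;Keeping extract, transform, and load as separate functions makes each step easy to test and replace independently.&lt;/p&gt;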

&lt;p&gt;&lt;strong&gt;Learn Programming Languages and SQL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Coding is a mandatory skill for data engineers. Start by learning a programming language commonly used in data engineering, such as Python or Java. Python is one of the best options because of its simplicity and its rich ecosystem of data processing libraries.&lt;/p&gt;

&lt;p&gt;These libraries provide a wide range of tools to manipulate, transform, and store data effectively. Some popular Python libraries for data work are listed below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Pandas:&lt;/strong&gt; One of the most versatile Python libraries, frequently used for data manipulation and analysis, including cleaning, preprocessing, and transforming raw data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;TensorFlow:&lt;/strong&gt; A popular Python library for machine learning and deep learning. Its &lt;code&gt;tf.data&lt;/code&gt; API can also be used to build input pipelines that load and transform data for model training.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Scikit-learn:&lt;/strong&gt; A widely used machine learning library for tasks such as regression, classification, and clustering. Its preprocessing utilities, such as scaling and encoding, are also useful when preparing data for models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Familiarize yourself with databases and learn SQL (Structured Query Language). SQL is essential for working with relational databases, which are commonly used in data engineering. Understand concepts such as tables, joins, and indexes.&lt;/p&gt;

&lt;p&gt;SQL is one of the most widely used languages for data operations. Learning SQL helps data engineers collaborate with data scientists and analysts, because it gives them a common language for querying data. SQL skills also transfer to a wide range of data management tools, which makes SQL valuable across many industries.&lt;/p&gt;
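
&lt;p&gt;To illustrate tables, joins, and indexes, the following sketch runs SQL through Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The customers/orders schema and its rows are hypothetical. SQL dialects differ slightly between database systems, but &lt;code&gt;JOIN&lt;/code&gt; and &lt;code&gt;CREATE INDEX&lt;/code&gt; statements like these are widely supported.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    -- An index on the join column speeds up lookups on large tables.
    CREATE INDEX idx_orders_customer ON orders(customer_id);

    INSERT INTO customers VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """
)

# Join the two tables and aggregate order totals per customer.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY spent DESC
"""
for row in conn.execute(query):
    print(row)
```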

&lt;p&gt;&lt;strong&gt;Explore Big Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Big Data refers to datasets that are too large, or grow too quickly, to be processed efficiently with traditional single-machine tools. Learning Big Data technologies is important for several reasons:&lt;/p&gt;

&lt;p&gt;• Big Data allows organizations to detect trends and hidden patterns in large volumes of data, which they can use to inform future decisions.&lt;br&gt;
• From a career perspective, Big Data professionals are in high demand because of the rapid growth of data.&lt;br&gt;
• Demand for these skills currently exceeds supply, which has driven up salaries for professionals who can solve these problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean and Transform Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data is rarely clean and structured. You will need to clean, preprocess, and transform the data to ensure it's consistent and ready for analysis. Tools like Python and libraries like pandas are commonly used for data cleaning and transformation.&lt;/p&gt;
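
&lt;p&gt;The snippet below is a dependency-free sketch of typical cleaning steps: trimming whitespace, standardising case, removing duplicates, and handling missing values. The records, and the choice to fill missing ages with the median, are illustrative assumptions. With pandas, the same steps would typically use methods such as &lt;code&gt;drop_duplicates()&lt;/code&gt;, &lt;code&gt;dropna()&lt;/code&gt;, and &lt;code&gt;fillna()&lt;/code&gt;.&lt;/p&gt;

```python
from statistics import median

# Hypothetical raw records with inconsistent formatting, a duplicate, and gaps.
raw = [
    {"name": " Alice ", "city": "nairobi", "age": "34"},
    {"name": "Bob", "city": "Mombasa ", "age": ""},
    {"name": "alice", "city": "Nairobi", "age": "34"},
    {"name": "Carol", "city": "", "age": "29"},
]


def clean(records):
    seen = set()
    rows = []
    for rec in records:
        # Standardise text fields: trim whitespace and use title case.
        name = rec["name"].strip().title()
        city = rec["city"].strip().title() or None  # empty string becomes missing
        age = int(rec["age"]) if rec["age"].strip() else None
        key = (name, city, age)
        if key in seen:  # drop duplicates that only differed in formatting
            continue
        seen.add(key)
        rows.append({"name": name, "city": city, "age": age})

    # Impute missing ages with the median of the known ages.
    fill = median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
    return rows


for row in clean(raw):
    print(row)
```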

&lt;p&gt;&lt;strong&gt;Data Modeling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data modeling involves designing the structure of databases and data systems. Learn the main techniques, such as relational modeling, dimensional modeling, and schema design, and understand concepts such as entities, attributes, relationships, and normalization. This step is crucial for designing databases that are efficient and optimized for querying.&lt;/p&gt;
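
&lt;p&gt;Dimensional modeling is easiest to see in a small star schema. A central fact table of measurable events references descriptive dimension tables. The sketch below uses SQLite via Python's &lt;code&gt;sqlite3&lt;/code&gt; module. The table names, columns, and rows are hypothetical examples, not a prescribed design.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);

    -- The fact table holds measures plus foreign keys to each dimension.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (20231001, '2023-10-01', '2023-10'),
                                (20231102, '2023-11-02', '2023-11');
    INSERT INTO fact_sales VALUES
        (1, 1, 20231001, 2, 1800.0),
        (2, 2, 20231001, 1, 250.0),
        (3, 1, 20231102, 1, 900.0);
    """
)

# A typical analytical query: join facts to dimensions, then aggregate.
query = """
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```

&lt;p&gt;Keeping descriptive attributes in the dimensions means that facts stay narrow, and new ways of slicing the data only require new dimension columns.&lt;/p&gt;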

&lt;p&gt;&lt;strong&gt;Building Projects&lt;/strong&gt;&lt;br&gt;
By now you should have covered the core concepts a data engineer needs. The final step is to test yourself by applying this knowledge in projects. Ideally, your projects should cover data warehousing, data analytics, big data tools, and data pipelines. Common projects that provide practical experience and strengthen your resume include smart IoT infrastructure, event data analysis, data visualization, and data aggregation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay Updated&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data engineering is a rapidly evolving field, so it's important to stay updated with the latest trends, technologies, and best practices. Follow industry blogs, attend webinars or conferences, and join online communities or forums to stay connected with other data engineers.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Roles and Responsibilities of a Data Engineer&lt;/strong&gt;&lt;br&gt;
• Design, manage, and oversee efficient data pipeline architectures.&lt;/p&gt;

&lt;p&gt;• Build and deploy ETL/ELT data pipelines that begin with data ingestion and carry out the subsequent data-related tasks.&lt;/p&gt;

&lt;p&gt;• Source and manage data from different systems according to business requirements.&lt;/p&gt;

&lt;p&gt;• Work in teams to create algorithms for data storage, data collection, data accessibility, data quality checks, and, where needed, data analytics.&lt;/p&gt;

&lt;p&gt;• Collaborate with data scientists to build the infrastructure required to identify, design, and deploy internal process improvements.&lt;/p&gt;

&lt;p&gt;• Access various data resources with the help of tools like SQL and Big Data technologies for building efficient ETL data pipelines.&lt;/p&gt;

&lt;p&gt;• Experience with tools like Snowflake is considered a bonus.&lt;/p&gt;

&lt;p&gt;• Build solutions that track data quality, operational efficiency, and other characteristics of the data.&lt;/p&gt;

&lt;p&gt;• Create scripts and solutions to transfer data between different systems.&lt;/p&gt;
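
&lt;p&gt;Data quality checks, one of the responsibilities listed above, can start as simple rules that run against each batch before it is loaded. Below is a minimal sketch in plain Python. The rules (required fields, non-negative amounts, unique IDs) and the field names are hypothetical examples. In practice, teams often adopt dedicated validation tooling for this.&lt;/p&gt;

```python
def check_quality(rows, required=("id", "amount")):
    """Return a list of human-readable problems found in a batch of records."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")

        # Validity: numeric amounts must not be negative.
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and 0 > amount:
            problems.append(f"row {i}: negative amount {amount}")

        # Uniqueness: each id may appear only once per batch.
        record_id = row.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                problems.append(f"row {i}: duplicate id {record_id}")
            seen_ids.add(record_id)
    return problems


batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": 2, "amount": None},
]
for problem in check_quality(batch):
    print(problem)
```

&lt;p&gt;A pipeline could, for example, refuse to load a batch whenever this list is non-empty and alert the team instead.&lt;/p&gt;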

</description>
    </item>
    <item>
      <title>The Complete Guide to Time Series Models</title>
      <dc:creator>Lilian Gicheru</dc:creator>
      <pubDate>Fri, 27 Oct 2023 15:21:28 +0000</pubDate>
      <link>https://dev.to/liliang/the-complete-guide-to-time-series-models-1k9p</link>
      <guid>https://dev.to/liliang/the-complete-guide-to-time-series-models-1k9p</guid>
      <description>&lt;p&gt;Time series models are a category of statistical and machine learning models that are used to analyze and make predictions based on data that is collected or recorded over time. As the name suggests, it involves working on time (years, days, hours, minutes) based data, to derive hidden insights to make informed decision making. It has a natural temporal ordering, making it unique compared to cross-sectional data. Characteristics include trend, seasonality, and autocorrelation. The models are widely employed in various domains, including finance, economics, climate science, and more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Time Series?&lt;/strong&gt;&lt;br&gt;
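A time series is a sequence of data points collected at regular time intervals. Time series provide insights and predictions that inform decision-making. Common kinds include stationary series, whose statistical properties such as the mean and variance stay constant over time, and random walks, where each value equals the previous value plus a random step.&lt;/p&gt;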
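
&lt;p&gt;The difference between a stationary series and a random walk shows up clearly in a small simulation. In this sketch (plain Python, with an arbitrary seed and series length), white noise serves as the stationary example and the random walk is its running sum. The lag-1 autocorrelation is typically close to 0 for the noise and close to 1 for the walk.&lt;/p&gt;

```python
import random
from itertools import accumulate


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


random.seed(42)
noise = [random.gauss(0, 1) for _ in range(500)]  # stationary: constant mean and variance
walk = list(accumulate(noise))  # random walk: y_t = y_(t-1) + noise_t

print(f"white noise lag-1 autocorrelation: {lag1_autocorr(noise):.2f}")
print(f"random walk lag-1 autocorrelation: {lag1_autocorr(walk):.2f}")
```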

&lt;p&gt;&lt;strong&gt;Components of Time Series&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Time series data consists of the following components:&lt;br&gt;
&lt;strong&gt;Trend:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the general tendency of the data to increase or decrease over a long period of time, that is, the long-term upward or downward movement in the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seasonality:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Seasonality is characterized by repeating patterns or cycles at fixed intervals, such as a weekly or yearly pattern. It arises from forces that recur in a regular, periodic manner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cyclical Variations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These are rises and falls in a time series that do not follow a fixed interval. Unlike seasonality, both the timing and the pattern of these movements are uncertain. Business cycles in economic data are a typical example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Irregular Variations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These are unpredictable fluctuations, such as spikes caused by unexpected events or situations, that occur over a short time span.&lt;/p&gt;
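
&lt;p&gt;The trend and seasonal components can be estimated with a classical additive decomposition. A centred moving average over one season estimates the trend, and averaging the detrended values at each position in the cycle estimates the seasonality. The sketch below assumes an additive model (observed = trend + seasonal + residual) and an odd period. The synthetic series and its parameters are made up for illustration. Cyclical variations have no fixed period, so a decomposition like this usually leaves them mixed into the trend estimate. Libraries such as statsmodels offer a fuller implementation in &lt;code&gt;seasonal_decompose&lt;/code&gt;.&lt;/p&gt;

```python
import math
import random


def centered_moving_average(series, window):
    """Centred moving average for an odd window; None where it does not fit."""
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half : i + half + 1]) / window
    return out


def decompose(series, period):
    """Classical additive decomposition into trend, seasonal and residual parts."""
    trend = centered_moving_average(series, period)
    # Group detrended values by their position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    pattern = [m - offset for m in means]  # seasonal effects sum to zero
    seasonal = [pattern[i % period] for i in range(len(series))]
    residual = [
        None if t is None else y - t - s for y, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic daily data: upward trend + weekly (period 7) pattern + irregular noise.
random.seed(0)
series = [
    0.5 * t + 3 * math.sin(2 * math.pi * t / 7) + random.gauss(0, 0.3)
    for t in range(70)
]
trend, seasonal, residual = decompose(series, period=7)
print([round(s, 2) for s in seasonal[:7]])
```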

&lt;p&gt;&lt;strong&gt;Types of Time Series Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Autoregressive (AR) model.&lt;/strong&gt; AR models use the previous values of the time series to predict the current value. For example, an AR model for daily stock prices might use the closing prices from the previous day, the previous week, and the previous month to predict the closing price for today.&lt;/p&gt;
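
&lt;p&gt;An AR model is essentially a regression of the series on its own past values. As a minimal sketch, the code below simulates an AR(1) process, y_t = c + phi * y_(t-1) + e_t, with made-up parameters, and recovers c and phi by ordinary least squares on the lagged pairs. Higher-order AR(p) models regress on p lags. Real projects would more likely rely on a library such as statsmodels.&lt;/p&gt;

```python
import random


def fit_ar1(series):
    """Estimate c and phi in y_t = c + phi * y_(t-1) + e_t by least squares."""
    x = series[:-1]  # previous values y_(t-1)
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted equation forward to produce multi-step forecasts."""
    out = []
    for _ in range(steps):
        last_value = c + phi * last_value
        out.append(last_value)
    return out


# Simulate an AR(1) process with hypothetical parameters c=2.0, phi=0.7.
random.seed(1)
sim = [0.0]
for _ in range(2000):
    sim.append(2.0 + 0.7 * sim[-1] + random.gauss(0, 1))

c_hat, phi_hat = fit_ar1(sim)
print(f"c ~ {c_hat:.2f}, phi ~ {phi_hat:.2f}")
print(forecast_ar1(c_hat, phi_hat, sim[-1], steps=3))
```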

&lt;p&gt;&lt;strong&gt;Moving average (MA) model.&lt;/strong&gt; MA models use previous forecast errors (the differences between past predictions and the actual values) to predict the current value. For example, an MA model for daily stock prices might use the forecast errors from the previous day, the previous week, and the previous month to predict today's closing price.&lt;/p&gt;
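
&lt;p&gt;In standard notation, the AR(p) and MA(q) models described above can be written as follows. Here y_t is the series, epsilon_t are white-noise errors, c is a constant, mu is the series mean, and phi_i and theta_j are the model coefficients.&lt;/p&gt;

```latex
% AR(p): the current value is a linear combination of the previous p values
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t

% MA(q): the current value depends on the current and previous q error terms
y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```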

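&lt;p&gt;&lt;strong&gt;Autoregressive integrated moving average (ARIMA) model.&lt;/strong&gt; ARIMA models combine AR and MA terms with differencing (the "integrated" part). Differencing makes them suitable for non-stationary time series, that is, series whose statistical properties, such as the mean, change over time, for example because of a trend.&lt;br&gt;
 &lt;strong&gt;Seasonal autoregressive integrated moving average (SARIMA) model.&lt;/strong&gt; SARIMA models are like ARIMA models, but they add seasonal terms so that they can also account for seasonality in the data.&lt;/p&gt;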
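
&lt;p&gt;The "integrated" part of ARIMA refers to differencing: the series is replaced by the changes between consecutive values (repeated d times) before the AR and MA terms are fitted. A minimal sketch with made-up series shows how first differencing removes a linear trend. Seasonal differencing at lag m is the analogous step in SARIMA. In practice, a library implementation such as statsmodels' &lt;code&gt;ARIMA(endog, order=(p, d, q))&lt;/code&gt; performs the differencing for you.&lt;/p&gt;

```python
def difference(series, lag=1):
    """Return y_t - y_(t-lag); lag=1 removes a linear trend, lag=m a period-m season."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_value, diffs):
    """Invert a first difference by cumulative summation from the first observation."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


trend_series = [10 + 3 * t for t in range(8)]  # non-stationary: mean grows with t
print(difference(trend_series))  # [3, 3, 3, 3, 3, 3, 3]
print(undifference(trend_series[0], difference(trend_series)) == trend_series)  # True

seasonal_series = [5, 9, 2, 5, 9, 2, 5, 9, 2]  # repeats every 3 steps
print(difference(seasonal_series, lag=3))  # [0, 0, 0, 0, 0, 0]
```

&lt;p&gt;Forecasts made on the differenced scale have to be converted back, which is why the inverse step is shown alongside.&lt;/p&gt;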

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Time series models are powerful tools for analyzing and forecasting time-ordered data. Selecting the right model and understanding the components of the data are critical for accurate predictions. With the appropriate model and evaluation techniques, you can make informed decisions based on historical data trends and patterns.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Exploratory-Data-Analysis-using-Data-Visualization-Techniques.</title>
      <dc:creator>Lilian Gicheru</dc:creator>
      <pubDate>Tue, 17 Oct 2023 04:51:01 +0000</pubDate>
      <link>https://dev.to/liliang/exploratory-data-analysis-using-data-visualization-techniques-41bn</link>
      <guid>https://dev.to/liliang/exploratory-data-analysis-using-data-visualization-techniques-41bn</guid>
      <description>&lt;p&gt;&lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt; is a key step in the data analysis process. Data visualization techniques bring Data to life, by making it accessible and understandable-and especially to people not adept to the tech world and help to understand the characteristics of your dataset.&lt;/p&gt;

&lt;p&gt;Whether through shapes, graphs, patterns, or colors, data visualizations provide a clearer way to perceive information than raw numbers alone.&lt;/p&gt;

&lt;h3&gt;
  &lt;strong&gt;Visualization Techniques Using Python&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Histograms:&lt;/strong&gt; Visualize the distribution of a numerical variable, making it easy to spot its shape, spread, and outliers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;See the example below.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="c1"&gt;### use your dataset
&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'green'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edgecolor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'black'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Histogram Example'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Value'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Frequency'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Box Plots:&lt;/strong&gt; Show the summary statistics of a numerical variable, including the median, quartiles, and outliers, allowing you to grasp the data's variability at a glance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Below is an example.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 200 samples from a normal distribution with mean 100 and standard deviation 25
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;normal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boxplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Box Plot Example'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
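
&lt;p&gt;The outlier points in a box plot follow a simple rule. By default, Matplotlib's &lt;code&gt;boxplot&lt;/code&gt; extends the whiskers to the most extreme data points within 1.5 × IQR (the interquartile range) of the quartiles and draws anything beyond them as an outlier. The sketch below reproduces that calculation with Python's &lt;code&gt;statistics&lt;/code&gt; module. The data is made up, and quartile conventions can differ slightly between tools.&lt;/p&gt;

```python
from statistics import quantiles


def box_plot_stats(data, whis=1.5):
    """Quartiles, IQR fences and outliers as used by a standard box plot."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = [x for x in data if low > x or x > high]
    return {"q1": q1, "median": q2, "q3": q3, "low": low, "high": high, "outliers": outliers}


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
stats = box_plot_stats(data)
print(stats["q1"], stats["median"], stats["q3"])  # 3.25 5.5 7.75
print(stats["outliers"])                           # [100]
```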



&lt;p&gt;&lt;strong&gt;Bar Charts:&lt;/strong&gt; Bar charts are an excellent way to compare categorical data, enabling you to easily identify which categories are most prevalent and to spot trends or anomalies.&lt;br&gt;
&lt;strong&gt;Below is an example.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'B'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'D'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'skyblue'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Bar Chart Example'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Category'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Value'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
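
&lt;p&gt;Bar charts usually start from raw category labels that must be counted first. A minimal sketch with &lt;code&gt;collections.Counter&lt;/code&gt; produces the category and value sequences that &lt;code&gt;plt.bar&lt;/code&gt; expects. The survey responses are hypothetical.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical raw categorical data, e.g. one label per survey response.
responses = ["B", "A", "C", "B", "D", "B", "C", "A", "B", "C"]

counts = Counter(responses)
# Sort by frequency (most common first) so the tallest bar appears on the left.
categories, values = zip(*counts.most_common())
print(categories)  # ('B', 'C', 'A', 'D')
print(values)      # (4, 3, 2, 1)

# These sequences can be passed straight to Matplotlib: plt.bar(categories, values)
```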



&lt;p&gt;&lt;strong&gt;Scatter Plots:&lt;/strong&gt;&lt;br&gt;
Visualize the relationship between two numerical variables.&lt;br&gt;
&lt;strong&gt;Below is an example.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

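&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Two related numerical variables: y grows roughly linearly with x, plus random noise
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"purple"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"o"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Scatter Plot"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# replace with your own title
&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X variable"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# x-axis label
&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Y variable"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# y-axis label
&lt;/span&gt;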
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
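
&lt;p&gt;A scatter plot shows a relationship visually. The Pearson correlation coefficient summarises the strength of a linear relationship as a single number between -1 and 1. The helper below is a plain-Python sketch with made-up data. Python 3.10 and later also offer &lt;code&gt;statistics.correlation&lt;/code&gt;.&lt;/p&gt;

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical paired observations for two numerical variables.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 74]
print(round(pearson(hours_studied, exam_score), 3))
```

&lt;p&gt;A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates no linear relationship. Correlation can miss non-linear patterns, so it is best read alongside the plot itself.&lt;/p&gt;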



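&lt;p&gt;&lt;strong&gt;Line Plots:&lt;/strong&gt; Draw a line through the data points and mark each point with a circle. Line plots are well suited to showing how a value changes across an ordered sequence, such as time.&lt;/p&gt;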

&lt;p&gt;&lt;strong&gt;See the example below.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;


&lt;span class="n"&gt;ypoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ypoints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'o'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Exploratory data analysis is an iterative process, and the choice of visualization techniques depends on your data and objectives. You can combine multiple techniques to gain a deeper understanding of your data and uncover important patterns and insights.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Data Science for Beginners: 2023 - 2024 Complete Roadmap</title>
      <dc:creator>Lilian Gicheru</dc:creator>
      <pubDate>Sun, 01 Oct 2023 12:24:39 +0000</pubDate>
      <link>https://dev.to/liliang/data-science-for-beginners-2023-2024-complete-roadmap-163c</link>
      <guid>https://dev.to/liliang/data-science-for-beginners-2023-2024-complete-roadmap-163c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
I will do thorough research on what learning data science involves and join online data science communities. I have already joined LUV Tech Academy to help me navigate this.&lt;br&gt;
I will begin with basic mathematics, including probability and statistics, since they are widely applied in data science. I will then learn the programming languages used in data science, such as Python and SQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning Mathematics&lt;/strong&gt;&lt;br&gt;
I will learn mathematical concepts, especially statistics, probability, and algebra, using online materials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn Programming Language – SQL &amp;amp; Python&lt;/strong&gt;&lt;br&gt;
I will use online learning sites such as W3Schools, Udacity, Udemy, and Coursera to learn SQL and Python. I will also use YouTube to guide my learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Manipulation and Analysis&lt;/strong&gt;&lt;br&gt;
I plan to learn data manipulation and analysis using MySQL for SQL and Anaconda (Jupyter Notebook) for Python. I will supplement this with YouTube content and follow up with tutors at LUV Tech Academy whenever I get stuck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Cleaning&lt;/strong&gt;&lt;br&gt;
I understand the importance of data cleaning as preparation for data analysis. I want to learn how to clean data using different tools, which I will research online.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Visualization&lt;/strong&gt;&lt;br&gt;
I intend to learn data visualization using tools such as Power BI and Matplotlib, along with other libraries, from online sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Projects&lt;/strong&gt;&lt;br&gt;
I will work on real-world projects using DataCamp materials to apply what I have learned. I believe this will be good preparation for the job market.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upskilling&lt;/strong&gt;&lt;br&gt;
I understand that becoming a data scientist is a continuous journey that requires consistent learning. I will enrol in online courses, for example on Udemy and Coursera, to practice more and earn certifications. I also intend to join online data science communities for networking and support throughout my career.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
