<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Samuel Musyoki</title>
    <description>The latest articles on DEV Community by Samuel Musyoki (@sammie_musyoki).</description>
    <link>https://dev.to/sammie_musyoki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1169981%2Fcb13d47d-e2d0-46c0-88a2-05e048686cb8.jpg</url>
      <title>DEV Community: Samuel Musyoki</title>
      <link>https://dev.to/sammie_musyoki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sammie_musyoki"/>
    <language>en</language>
    <item>
      <title>Exploratory Data Analysis Using Data Visualization Techniques 📊.</title>
      <dc:creator>Samuel Musyoki</dc:creator>
      <pubDate>Sun, 15 Oct 2023 18:10:30 +0000</pubDate>
      <link>https://dev.to/sammie_musyoki/exploratory-data-analysis-using-data-visualization-techniques--3ocm</link>
      <guid>https://dev.to/sammie_musyoki/exploratory-data-analysis-using-data-visualization-techniques--3ocm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmoneycompass.com.my%2Fwp-content%2Fuploads%2Fbusiness_advantages_of_data_analysis.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmoneycompass.com.my%2Fwp-content%2Fuploads%2Fbusiness_advantages_of_data_analysis.jpg" alt="Data Visualization" width="800" height="400"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Are you intrigued by the fascinating world of Data Science and eager to embark on a journey to unravel the hidden insights within data? If so, you've landed on the right path. &lt;strong&gt;Exploratory Data Analysis (EDA)&lt;/strong&gt; is a critical phase in the data analysis process that involves the initial investigation of a dataset to summarize its main characteristics, often with the help of data visualization techniques.&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;EDA&lt;/strong&gt; is like peeling the layers of an onion to reveal the hidden insights within the data. This article will guide you through the exciting realm of Exploratory Data Analysis (EDA) using Data Visualization Techniques, an essential step in the data science process.&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Exploratory Data Analysis&lt;/strong&gt; allows data analysts and scientists to get a feel for the data, understand its characteristics, and generate hypotheses. Data visualization techniques are the tools that make EDA effective, surfacing insights that might otherwise remain hidden. In a data-driven world, mastering EDA is essential for making informed decisions and extracting value from your data.&lt;/p&gt;

&lt;p&gt;Data scientists serve as the bridge between raw, unprocessed data and valuable business insights. They have the unique skill set required to manipulate vast and seemingly meaningless datasets, extracting meaningful patterns and trends. This analysis, in turn, plays a crucial role in driving modern economies and assisting governments and organizations in addressing contemporary issues.&lt;br&gt;&lt;br&gt;
 Data visualization techniques lie at the heart of this endeavor, helping data scientists and analysts make sense of the data and extract meaningful insights.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding Exploratory Data Analysis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.ibm.com/topics/exploratory-data-analysis#:~:text=Exploratory%20data%20analysis%20(EDA)%20is,often%20employing%20data%20visualization%20methods." rel="noopener noreferrer"&gt;Exploratory Data Analysis&lt;/a&gt;&lt;/strong&gt;, introduced by statistician John Tukey in the 1970s, is all about making sense of data without jumping to conclusions. It involves systematically examining data sets, summarizing their main characteristics, and creating visualizations to help understand the data's structure, patterns, and anomalies.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Data visualization&lt;/strong&gt; is at the heart of EDA. It's the process of representing data graphically to uncover patterns, trends, and anomalies. The essential techniques are covered after the overview of the EDA process.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The EDA Process&lt;/strong&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data Collection:&lt;/strong&gt; The EDA process begins with data collection. It's essential to gather high-quality, clean data for meaningful analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Cleaning:&lt;/strong&gt; This step involves handling missing values, outliers, and inconsistencies in the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Univariate Analysis:&lt;/strong&gt; In this stage, each variable is analyzed individually. This includes creating histograms, box plots, and summary statistics to understand their distribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bivariate Analysis:&lt;/strong&gt; Bivariate analysis explores relationships between pairs of variables. Scatter plots and correlation matrices are commonly used in this phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multivariate Analysis:&lt;/strong&gt; Multivariate analysis extends the exploration to multiple variables simultaneously. Techniques like heatmaps can be helpful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anomaly Detection:&lt;/strong&gt; EDA often involves identifying and addressing outliers and anomalies in the data.&lt;/p&gt;
&lt;/blockquote&gt;
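
&lt;p&gt;The cleaning, univariate analysis and anomaly detection steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the income figures are made up, and dropping missing values and using the 1.5 x IQR rule are just one common choice each, not the only way to do it.&lt;/p&gt;

```python
import statistics

# Hypothetical sample: monthly incomes with one missing value and one extreme value.
raw_incomes = [3200, 2900, None, 3100, 3300, 3000, 2800, 15000, 3150]

# Data cleaning: drop missing values (one simple strategy among several).
incomes = [value for value in raw_incomes if value is not None]

# Univariate analysis: summary statistics for a single variable.
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
q1, q2, q3 = statistics.quantiles(incomes, n=4)

# Anomaly detection: flag values more than 1.5 * IQR outside the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in incomes if v > upper_fence or lower_fence > v]

print("mean:", mean_income)
print("median:", median_income)
print("quartiles:", q1, q2, q3)
print("outliers:", outliers)
```

&lt;p&gt;Notice how far the mean sits above the median here: a single extreme value is enough to pull it up, which is exactly the kind of thing a box plot or histogram makes visible at a glance.&lt;/p&gt;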

&lt;h2&gt;
  
  
  &lt;strong&gt;Data Visualization Techniques📉&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data visualization involves creating graphical representations of data, making it easier for humans to understand and interpret. Here are some essential data visualization techniques and their roles in EDA:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scatter Plots&lt;/strong&gt;&lt;br&gt;
Scatter plots are effective for visualizing the relationship between two continuous variables. They help identify patterns such as clusters, outliers, and trends. For instance, scatter plots can reveal whether there's a correlation between a person's age and income.&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37voa11yctw9fucwxwgz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F37voa11yctw9fucwxwgz.png" alt="Scatter Plots" width="336" height="232"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Histograms and Density Plots&lt;/strong&gt;&lt;br&gt;
Histograms provide a visual representation of the distribution of a single variable. They can indicate whether the data follows a normal distribution or if it's skewed. Density plots offer a smoothed version of histograms, making it easier to see underlying patterns.&lt;br&gt; &lt;br&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivcawlsli4kd9vmu43hz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivcawlsli4kd9vmu43hz.jpg" alt="Histogram" width="706" height="420"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Box Plots&lt;/strong&gt;&lt;br&gt;
Box plots display the distribution of a dataset, showing the median, quartiles, and potential outliers. They are excellent for comparing distributions between different groups or categories. For instance, box plots can help you compare the salaries of employees in different departments of a company.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5se7c3b6ztv5f02zg4ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5se7c3b6ztv5f02zg4ki.png" alt="Box Plots" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Heatmaps&lt;/strong&gt;&lt;br&gt;
Heatmaps are valuable for exploring relationships between multiple variables. They visualize the correlation between variables in a matrix form, making it evident which variables are strongly related and which are not.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69vpk9b1fasx643kspip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69vpk9b1fasx643kspip.png" alt="Heatmaps" width="800" height="1012"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time Series Plots&lt;/strong&gt;&lt;br&gt;
Time series plots are ideal for visualizing data collected over time, such as stock prices, temperature, or website traffic. They help in identifying trends, seasonality, and anomalies.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jc7zbe3dg8z3gocwc8o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jc7zbe3dg8z3gocwc8o.png" alt="Time Series" width="757" height="373"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bar Charts&lt;/strong&gt;&lt;br&gt;
Bar charts are useful for displaying categorical data. They're often used for comparing the frequencies or proportions of different categories. For instance, a bar chart can illustrate the market share of different smartphone brands.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyi9qry58rjgtrjbwqjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyi9qry58rjgtrjbwqjs.png" alt="Bar Chart" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
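
&lt;p&gt;To make the heatmap idea concrete, here is a rough sketch of the correlation matrix that a heatmap would colour in. It uses plain Python rather than a plotting library, and the three columns (age, income, screen_time) are invented purely for illustration.&lt;/p&gt;

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical dataset with three numeric columns.
data = {
    "age": [23, 31, 38, 45, 52, 60],
    "income": [2100, 2900, 3600, 4200, 5100, 5800],
    "screen_time": [6.5, 5.8, 5.1, 4.0, 3.2, 2.5],
}

# Every pair of columns gets one cell, just like the cells of a heatmap.
columns = list(data)
matrix = {a: {b: pearson(data[a], data[b]) for b in columns} for a in columns}

for a in columns:
    print(a.ljust(12), "  ".join(f"{matrix[a][b]:+.2f}" for b in columns))
```

&lt;p&gt;Values close to +1 or -1 indicate a strong linear relationship, and values near 0 a weak one. A library such as Seaborn can turn a matrix like this into a coloured grid.&lt;/p&gt;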

&lt;h2&gt;
  
  
  &lt;strong&gt;Data Visualization: Bringing Data to Life📈&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data visualization is a pivotal aspect of data science. After performing various data operations, the ability to convey insights through visualizations is essential for effective communication. Here are some valuable resources to help you master this skill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.tutorialspoint.com/tableau/index.htm" rel="noopener noreferrer"&gt;Tableau&lt;/a&gt;&lt;/strong&gt;: Tableau is a powerful data visualization tool that allows you to create visually appealing and easy-to-understand charts, graphs, and dashboards. Its user-friendly interface makes it an ideal choice for data analysts and scientists.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://d3js.org/" rel="noopener noreferrer"&gt;D3.js&lt;/a&gt;&lt;/strong&gt;: D3.js is a JavaScript library that empowers you to create interactive and dynamic data visualizations. It's a popular choice for crafting custom and captivating visualizations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://seaborn.pydata.org/" rel="noopener noreferrer"&gt;Seaborn&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://plotly.com/" rel="noopener noreferrer"&gt;Plotly&lt;/a&gt;&lt;/strong&gt;: These Python libraries are handy for creating engaging and informative visualizations. Seaborn is known for its beautiful statistical plots, while Plotly enables you to build interactive charts.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://powerbi.microsoft.com/en-us/" rel="noopener noreferrer"&gt;Power BI&lt;/a&gt;&lt;/strong&gt;: Power BI, short for Power Business Intelligence, is a robust business analytics service and data visualization tool developed by Microsoft. It empowers organizations and individuals to analyze data, share insights, and make data-driven decisions.&lt;br&gt;&lt;br&gt;
Power BI is a suite of software services, applications, and connectors that work together to transform raw data into visually appealing and interactive reports and dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building Your Data Science Portfolio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As you progress in your data science journey, start building your portfolio. Creating projects and writing articles about your data analysis experiences will set you apart. Consider using platforms like &lt;strong&gt;&lt;a href="https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; to showcase your work. &lt;strong&gt;&lt;a href="https://www.kaggle.com/" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;&lt;/strong&gt; is another valuable resource, providing access to extensive datasets and a community of fellow data scientists.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>luxdev</category>
      <category>beginners</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Data Science for Beginners📊: 2023-2024 Complete Roadmap✅</title>
      <dc:creator>Samuel Musyoki</dc:creator>
      <pubDate>Mon, 02 Oct 2023 21:50:47 +0000</pubDate>
      <link>https://dev.to/sammie_musyoki/data-science-for-beginners-2023-2024-complete-roadmap-1bgc</link>
      <guid>https://dev.to/sammie_musyoki/data-science-for-beginners-2023-2024-complete-roadmap-1bgc</guid>
      <description>&lt;p&gt;Are you new to &lt;strong&gt;Data Science?&lt;/strong&gt; Do you want to build a career along the field of data? This article is certainly for you.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf0cha7tr2zj3zb4ti5a.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwf0cha7tr2zj3zb4ti5a.jpg" alt="Data Science" width="800" height="250"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;" &lt;strong&gt;Data Science&lt;/strong&gt; can be described as a field of study that involves manipulating and analyzing raw data or complex data sets using statistical computing methods and machine learning techniques to draw insights in order to make data-driven decisions."&lt;br&gt;&lt;br&gt;
&lt;/p&gt;
  
&lt;/blockquote&gt;

&lt;p&gt;Data Science is one of the fastest-growing fields of the 21st century. It has evolved rapidly in recent years because companies now hold growing volumes of data that must be analyzed to support data-driven business decisions.&lt;br&gt;&lt;br&gt;
The field is lucrative, with good pay even for entry-level roles. Individuals who work in this field must be able to work with Big Data in order to provide solutions for ever-changing technology and business sectors.&lt;br&gt;
&lt;br&gt;&lt;br&gt;
The increasing value of data scientists lies in the endless need of businesses to harness large amounts of data in order to come up with viable present and future solutions. Data science therefore provides the conduit between raw data and business insights. It allows large volumes of raw, seemingly meaningless data stored in databases to be manipulated to extract meaningful value.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;p&gt;Data science is therefore one of the key drivers of today's economies, as most of the steps taken by governments and organizations to combat modern problems rely on keen analysis of data. Data science can also be used to predict future occurrences and trends. &lt;/p&gt;

&lt;p&gt;However, to have a thriving career in this field, one must be driven by a passion for problem-solving rather than by money. One must also commit to learning. There are numerous online courses, tutorials and YouTube videos where one can gain enough knowledge of data science.&lt;br&gt;&lt;br&gt;
There are also various bootcamps where one can kickstart a data science career. For instance, bootcamps organized by &lt;strong&gt;&lt;a href="https://www.youtube.com/results?search_query=lux+academy" rel="noopener noreferrer"&gt;Lux_Academy&lt;/a&gt;&lt;/strong&gt; would be a good starting point.   &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Data Science Essentials📚&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Any beginner wishing to venture into the field of data science must first equip themselves with the foundational concepts that are indispensable when working with data.  &lt;/p&gt;

&lt;p&gt;Data Science is an increasingly dynamic field, so a progressive learning approach is highly recommended for a beginner. It is a multidisciplinary field built on three domains:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Mathematics&lt;/li&gt;
&lt;li&gt;Statistics&lt;/li&gt;
&lt;li&gt;Programming
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is important to note that these three domains are closely intertwined, so a solid understanding of all three is vital. &lt;br&gt;
&lt;br&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Mathematics&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Mathematics in data science helps one to choose appropriate procedures and diagnose problems correctly. It is one of the core building blocks of data science because data comes in many unusual formats. Some key areas of maths one should pay close attention to include:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;a) &lt;a href="https://www.w3schools.com/ai/ai_probability.asp" rel="noopener noreferrer"&gt;Probability&lt;/a&gt;&lt;br&gt;&lt;br&gt;
b) &lt;a href="https://www.w3schools.com/ai/ai_statistics.asp" rel="noopener noreferrer"&gt;Calculus&lt;/a&gt;&lt;br&gt;&lt;br&gt;
c) &lt;a href="https://www.w3schools.com/ai/ai_algebra.asp" rel="noopener noreferrer"&gt;Linear Algebra&lt;/a&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  a) Probability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.javatpoint.com/probability" rel="noopener noreferrer"&gt;Probability&lt;/a&gt;&lt;/strong&gt; is the likelihood that an event occurs. Data science uses various types of distributions such as the &lt;strong&gt;normal distribution, Bernoulli distribution and uniform distribution&lt;/strong&gt; to predict the likelihood of an occurrence.&lt;/p&gt;
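
&lt;p&gt;As a rough illustration of the three distributions named above, the sketch below uses Python's standard library. The heights, click probability and waiting times are hypothetical values chosen only to show the idea.&lt;/p&gt;

```python
import random
from statistics import NormalDist

# Normal distribution: probability that a value lies within one standard
# deviation of the mean (hypothetical heights in cm).
heights = NormalDist(mu=170, sigma=10)
p_within_one_sd = heights.cdf(180) - heights.cdf(160)


def bernoulli_trial(p, rng):
    """Bernoulli distribution: one trial that succeeds with probability p."""
    return 1 if p > rng.random() else 0


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Simulate 10,000 visitors who each click with (assumed) probability 0.3.
clicks = [bernoulli_trial(0.3, rng) for _ in range(10_000)]
click_rate = sum(clicks) / len(clicks)

# Uniform distribution: every waiting time between 0 and 10 minutes is
# equally likely, so the average should land near 5.
waits = [rng.uniform(0, 10) for _ in range(10_000)]
mean_wait = sum(waits) / len(waits)

print("P(within one sd):", round(p_within_one_sd, 4))
print("simulated click rate:", click_rate)
print("simulated mean wait:", round(mean_wait, 2))
```

&lt;p&gt;The simulated click rate and mean wait will not match the theoretical values exactly, but with this many samples they should land close to 0.3 and 5.&lt;/p&gt;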

&lt;h3&gt;
  
  
  b) Calculus
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.khanacademy.org/math/calculus-1" rel="noopener noreferrer"&gt;Calculus&lt;/a&gt;&lt;/strong&gt; can be simply defined as the study of continuous change, including instantaneous rates of change. &lt;strong&gt;Optimization&lt;/strong&gt; and &lt;strong&gt;integration&lt;/strong&gt; are areas of calculus that are key in data science.&lt;/p&gt;
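
&lt;p&gt;Here is a small sketch of how optimization and integration show up in code. Gradient descent is used as one example of optimization (many machine learning models are trained with some variant of it), and the trapezoid rule as one simple way to approximate an integral. The function names and the toy loss function are assumptions made for this example.&lt;/p&gt;

```python
def loss(w):
    """A toy function whose minimum is at w = 3."""
    return (w - 3) ** 2


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of the rate of change of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient_descent(f, start, learning_rate=0.1, steps=200):
    """Optimization: repeatedly step against the slope to reduce f."""
    w = start
    for _ in range(steps):
        w -= learning_rate * numerical_derivative(f, w)
    return w


def trapezoid_integral(f, a, b, n=1000):
    """Integration: approximate the area under f from a to b with n trapezoids."""
    width = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * width)
    return total * width


best_w = gradient_descent(loss, start=0.0)
area = trapezoid_integral(lambda x: x ** 2, 0, 3)  # exact answer is 9

print("w that minimises the loss:", round(best_w, 4))
print("area under x^2 from 0 to 3:", round(area, 4))
```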

&lt;h3&gt;
  
  
  c) Linear Algebra
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.khanacademy.org/math/linear-algebra" rel="noopener noreferrer"&gt;Linear Algebra&lt;/a&gt;&lt;/strong&gt; is a mathematical field concerned with linear equations. In data science, &lt;strong&gt;vectors&lt;/strong&gt; represent data points while &lt;strong&gt;scalars&lt;/strong&gt; represent single numerical values. Linear algebra covers much more, including matrices and linear mappings, all of which appear in data science.&lt;/p&gt;
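
&lt;p&gt;A minimal sketch of these ideas in plain Python follows. In practice you would likely use NumPy for this; the helper names and the numbers below are invented for illustration.&lt;/p&gt;

```python
def scale(c, v):
    """Multiply a vector by a scalar."""
    return [c * a for a in v]


def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, v):
    """Apply a matrix (a list of rows) to a vector: a linear mapping."""
    return [dot(row, v) for row in matrix]


# A data point as a vector (hypothetical features) and a vector of weights.
point = [2, 5]
weights = [3, 4]

doubled = scale(2, point)     # scalar times vector
score = dot(weights, point)   # 3*2 + 4*5

# A matrix that swaps the two coordinates and doubles the second result.
mapping = [[0, 1], [2, 0]]
mapped = mat_vec(mapping, [3, 4])

print(doubled, score, mapped)
```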

&lt;h2&gt;
  
  
  &lt;strong&gt;Statistics&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Statistical concepts such as hypothesis testing, p-values and the mean (average) are key to data science. &lt;strong&gt;Descriptive statistics&lt;/strong&gt; helps one to describe the characteristics of a dataset, making it easier to understand and interpret data, while &lt;strong&gt;inferential statistics&lt;/strong&gt; is used for making estimates about a population and for hypothesis testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Programming&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I would argue that programming is the most important cornerstone of data science.&lt;br&gt;&lt;br&gt;
Whichever programming language you choose, you must have a solid knowledge of its variables, loops, functions and data types.&lt;br&gt;&lt;br&gt;
The ability to work with databases relies on a strong background in programming. There has always been a debate about which language one should learn, or which should come first. Is it &lt;strong&gt;Python, R or SQL?&lt;/strong&gt;&lt;br&gt;
There are various languages for working with various types of datasets and databases, but I will only outline the major languages that cut across the field. Different companies also prefer different languages, so it pays to be conversant with the key ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  - &lt;a href="https://www.python.org/about/gettingstarted/" rel="noopener noreferrer"&gt;Python&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Python is essential in data science due to its flexibility and its large collection of libraries, which enable data scientists to easily work with data. Python's syntax is also easy to grasp. The key Python libraries for Data Science include &lt;strong&gt;&lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;, &lt;a href="https://matplotlib.org/" rel="noopener noreferrer"&gt;Matplotlib&lt;/a&gt; and &lt;a href="https://numpy.org/" rel="noopener noreferrer"&gt;NumPy&lt;/a&gt;&lt;/strong&gt;. Python offers many other libraries in addition to these three.&lt;br&gt;&lt;br&gt;
You should be able to understand basic Python, including its syntax and data structures such as lists and tuples.  &lt;/p&gt;

&lt;h3&gt;
  
  
  - &lt;a href="https://www.geeksforgeeks.org/r-programming-for-data-science/" rel="noopener noreferrer"&gt;R&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;R is an open-source language which provides a wide range of statistical and graphical techniques for exploring, analyzing, and visualizing data. R offers powerful data manipulation and transformation capabilities, allowing users to clean, reshape, and prepare data for analysis.&lt;br&gt;&lt;br&gt;
The &lt;strong&gt;"dplyr" and "tidyr"&lt;/strong&gt; packages, for example, are popular tools for data wrangling. R also provides a variety of machine learning packages, such as &lt;strong&gt;caret, randomForest and xgboost&lt;/strong&gt;, that allow data scientists to build and evaluate predictive models.&lt;/p&gt;

&lt;h3&gt;
  
  
  - &lt;a href="https://www.w3schools.com/sql/" rel="noopener noreferrer"&gt;SQL&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQL&lt;/strong&gt; or &lt;strong&gt;&lt;a href="https://www.w3schools.com/sql/#gsc.tab=0" rel="noopener noreferrer"&gt;Structured Query Language&lt;/a&gt;&lt;/strong&gt; is a query language that is used for interacting with relational databases.&lt;br&gt;&lt;br&gt;
You may now be asking yourself, &lt;strong&gt;what is a database, and what does it mean to query?&lt;/strong&gt;&lt;br&gt;
A &lt;strong&gt;database&lt;/strong&gt; is just a collection of related data organized into tables of rows and columns. A phonebook, for instance, is a database because it holds a collection of all your contacts, which share similar characteristics such as a name and a phone number.&lt;br&gt;&lt;br&gt;
To &lt;strong&gt;query&lt;/strong&gt; is simply to issue a request or command to the database for specific information. &lt;strong&gt;SQL&lt;/strong&gt; is therefore used to interact with databases through a &lt;strong&gt;&lt;a href="https://www.tutorialspoint.com/dbms/index.htm" rel="noopener noreferrer"&gt;Relational Database Management System (RDBMS)&lt;/a&gt;&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
The main purpose of an RDBMS is to perform the &lt;strong&gt;CRUD&lt;/strong&gt; operations &lt;strong&gt;(Create, Read/Retrieve, Update and Delete)&lt;/strong&gt; on the data in a database.&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
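
&lt;p&gt;To see the four CRUD operations in action, here is a small sketch built on the phonebook example. It uses SQLite through Python's built-in sqlite3 module, which is just one convenient choice for experimenting; the table name, column names and contacts are all made up.&lt;/p&gt;

```python
import sqlite3

# An in-memory database disappears when the connection is closed,
# which makes it handy for experiments.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL)"
)

# Create: add two hypothetical contacts.
conn.executemany(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)",
    [("Amina", "0700-000-001"), ("Brian", "0700-000-002")],
)

# Read: query the phonebook, sorted by name.
rows = conn.execute("SELECT name, phone FROM contacts ORDER BY name").fetchall()

# Update: change one contact's number.
conn.execute(
    "UPDATE contacts SET phone = ? WHERE name = ?", ("0700-000-999", "Amina")
)

# Delete: remove a contact.
conn.execute("DELETE FROM contacts WHERE name = ?", ("Brian",))

remaining = conn.execute("SELECT name, phone FROM contacts").fetchall()
conn.close()

print("before:", rows)
print("after:", remaining)
```

&lt;p&gt;The question marks are placeholders that let the database driver fill in the values safely, instead of building the SQL text by hand.&lt;/p&gt;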

&lt;h2&gt;
  
  
  Machine Learning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.simplilearn.com/tutorials/machine-learning-tutorial" rel="noopener noreferrer"&gt;Machine learning&lt;/a&gt;&lt;/strong&gt; is one of the fields of Artificial Intelligence. It primarily involves the development of models and algorithms that enable machines to learn from data and improve at a task without being explicitly programmed for it.&lt;br&gt;&lt;br&gt;
In the field of data science, machine learning is a valuable tool that enables the extraction of patterns and insights from datasets. The main types of machine learning include &lt;strong&gt;supervised and unsupervised learning&lt;/strong&gt;, while &lt;strong&gt;deep learning&lt;/strong&gt; is a family of neural-network-based methods that can be applied to either.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Visualization
&lt;/h3&gt;

&lt;p&gt;After performing the various operations on data, effective visualization is essential in communicating the insights. Data visualization tools such as &lt;strong&gt;&lt;a href="https://www.tutorialspoint.com/tableau/index.htm" rel="noopener noreferrer"&gt;Tableau&lt;/a&gt;&lt;/strong&gt; enable one to produce visualizations that are easily understood by everyone.&lt;br&gt;
 One must also be able to present the insights in a fluent and appealing way to the respective audience. Data science therefore does not rely entirely on the technical skills of working with data.&lt;br&gt;&lt;br&gt;
Communication skills are key for a data scientist, as insights must be relayed clearly.&lt;br&gt;&lt;br&gt;
&lt;br&gt; &lt;br&gt;
Finally, it is crucial to build up your &lt;strong&gt;portfolio&lt;/strong&gt; while learning. Commit yourself to building projects and writing articles about data science. These articles and projects go a long way toward increasing your chances of securing a career in a data-related field.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; is a good starting point where you can create an account and keep all your articles and projects in repositories. You can also use &lt;strong&gt;&lt;a href="https://www.kaggle.com/" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;&lt;/strong&gt;, where you can interact with fellow data scientists and get access to a large number of datasets.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>luxdatanerds</category>
      <category>luxdev</category>
    </item>
  </channel>
</rss>
