<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Esther Njihia</title>
    <description>The latest articles on DEV Community by Esther Njihia (@esther_njihia).</description>
    <link>https://dev.to/esther_njihia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F815150%2F8788c6d0-cb4d-4e01-a9ee-826c4c7a532a.png</url>
      <title>DEV Community: Esther Njihia</title>
      <link>https://dev.to/esther_njihia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/esther_njihia"/>
    <language>en</language>
    <item>
      <title>Analysis of 120 Years of Olympics Data</title>
      <dc:creator>Esther Njihia</dc:creator>
      <pubDate>Sun, 22 Jan 2023 06:12:18 +0000</pubDate>
      <link>https://dev.to/esther_njihia/analysis-of-120-years-of-olympics-data-13jl</link>
      <guid>https://dev.to/esther_njihia/analysis-of-120-years-of-olympics-data-13jl</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;The Olympic Games are one of the most prestigious and widely-followed sporting events in the world. With a history dating back over a century, the Olympics have seen countless athletes from all corners of the globe compete for glory and honor. In this project, we will use Python, SQL and Tableau to analyze 120 years of Olympics data sourced from Kaggle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Collection
&lt;/h2&gt;

&lt;p&gt;The data for this analysis was sourced from Kaggle and includes information on athletes, teams, events, and medals. The data was collected for all Summer and Winter Olympics from 1896 to 2016.&lt;br&gt;
Data source: &lt;a href="https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Cleaning and Preparation
&lt;/h2&gt;

&lt;p&gt;Before we could begin our analysis, the data needed to be cleaned and prepared for use. This involved removing any duplicate or irrelevant data, correcting any errors or inconsistencies, and formatting the data in a way that would be compatible with our analysis tools.&lt;/p&gt;
&lt;h2&gt;
  
  
  Data Cleaning with Python
&lt;/h2&gt;

&lt;p&gt;Load data&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/athlete_events.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calculate the number of missing values in the data&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We got some missing values with the age, height and weight variables. How about we fix that.&lt;br&gt;
Fill in the age missing values with its mean.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Age&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat the same for the height and weight variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Analysis
&lt;/h2&gt;

&lt;p&gt;Once the data was cleaned and prepared, we used SQL to extract and analyze the data. We were able to answer questions such as:&lt;br&gt;
1.What is the Gender distribution of athletes across different sports and event?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Sport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Sex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;athlete_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Athlete&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Sport&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Sex&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;athlete_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2.Which city and season have the most successful athletes?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Medal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;Medal_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Athlete&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;Medal&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Medal_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3.Which athletes are the most successful in terms of number of medals?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Medal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;medal_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Athlete&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;medal_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4.Trend analysis to identify any patterns in the number of medals over time&lt;br&gt;
By year&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nb"&gt;Year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Medal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;medal_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Athlete&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Year&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;Year&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5.Use statistical analysis to find factors associated with winning more medals.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first create a sub query to find the total count of medals per city and season
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;city_season_medals&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Medal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;medal_count&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Athlete&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;use statistical methods to find the relationship between city &amp;amp; season and the total number of medals. Fo this instance I used z-score. The higher the z-score the higher the correlation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Season&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;medal_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;medal_count&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;medal_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;STDEV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;medal_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;z_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;city_season_medals&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;z_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are some of the questions that were answered during the SQL analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualization with Tableau
&lt;/h2&gt;

&lt;p&gt;To make the data more accessible and understandable, Tableau was used to create visualizations of our findings. Some of the visualizations we created include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pie chart showing the gender distribution across all sports&lt;/li&gt;
&lt;li&gt;A bar chart showing the most successful athletes over time.&lt;/li&gt;
&lt;li&gt;line charts to show trends and patterns in the number of medals won over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;View the whole Olympics dashboard:&lt;a href="https://public.tableau.com/app/profile/esther7982/viz/OlympicAnalysis_16743097934740/Olympicsdashboard" rel="noopener noreferrer"&gt;Olympics Dashboard&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through this analysis, I was able to gain a deeper understanding of the history of the Olympics and the performances of countries and athletes over the past 120 years. By using SQL and Tableau, I was able to extract valuable insights from the data and present them in a clear and accessible way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code and Data
&lt;/h2&gt;

&lt;p&gt;The code and data used for this project is available at &lt;a href="https://github.com/EstherNjihia/120-Years-of-Olympics" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By using the above techniques, we can also use the data to predict future results, and also to identify patterns and trends that can be used to improve performance and achieve success in future Olympic games.&lt;/p&gt;

</description>
      <category>discuss</category>
    </item>
    <item>
      <title>Data engineering 102: Introduction to python for data engineering.</title>
      <dc:creator>Esther Njihia</dc:creator>
      <pubDate>Sun, 04 Sep 2022 20:39:09 +0000</pubDate>
      <link>https://dev.to/esther_njihia/data-engineering-102-introduction-to-python-for-data-engineering-52p1</link>
      <guid>https://dev.to/esther_njihia/data-engineering-102-introduction-to-python-for-data-engineering-52p1</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about python language is sort of all rounded. Due to its readability which is easy and the fact that it has very many applications including data engineering. In this post, we will discuss the most important concepts in the python language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python Syntax
&lt;/h3&gt;

&lt;p&gt;Python, builds the code structure using whitespace and indentation. &lt;/p&gt;

&lt;h4&gt;
  
  
  Comments
&lt;/h4&gt;

&lt;p&gt;The comments are just as crucial as the code itself, because they explain why a piece of code was produced&lt;br&gt;
The comments are ignored by the Python interpreter when it runs the code.&lt;br&gt;
A single line comment in Python starts with the hash symbol (#), which is followed by the comment.&lt;/p&gt;

&lt;h4&gt;
  
  
  Identifiers
&lt;/h4&gt;

&lt;p&gt;Python uses identifiers, or names, to designate variables, functions, modules, classes, and other types of objects.&lt;br&gt;
An identifier's name must begin with a letter or an underscore. After that, only underscores or alphanumeric characters are allowed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Keywords
&lt;/h4&gt;

&lt;p&gt;Keywords are words that have special meaning in python. Python has a special module keyword that outlines all the keywords that are used in the Python Language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Control statements
&lt;/h3&gt;

&lt;h4&gt;
  
  
  If statements.
&lt;/h4&gt;

&lt;p&gt;If statements are conditional statements. Below is the summary of the different if statements:&lt;br&gt;
When you want to run a code block based on a condition, use the if statement.&lt;br&gt;
If you want to execute a different code block if the condition is not True, use the if...else statement.&lt;br&gt;
When you wish to check multiple conditions and execute the associated code block after the condition that evaluates to True, use the if...elif...else statement.&lt;/p&gt;

&lt;h4&gt;
  
  
  for loop
&lt;/h4&gt;

&lt;p&gt;A code block can be performed a predetermined number of times by using the for loop statement.&lt;br&gt;
To alter the loop, use the range(start, stop, step).&lt;/p&gt;

&lt;h4&gt;
  
  
  while loop.
&lt;/h4&gt;

&lt;p&gt;If a condition is True, use the Python while loop statement to run a code block.&lt;/p&gt;

&lt;h3&gt;
  
  
  Functions
&lt;/h3&gt;

&lt;p&gt;A named, reusable chunk of code called a function in Python executes a command or returns a value.&lt;br&gt;
Create a new function by using the def keyword. The function definition and body make up a function.&lt;br&gt;
A function may have 0 parameters or more. You must pass a function the same amount of arguments if it has one or more parameters.&lt;br&gt;
A function can do a task or give back a value. If you want to return a value from a function, use the return statement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lists
&lt;/h3&gt;

&lt;p&gt;An orderly group of elements is referred to as a list.&lt;br&gt;
If you want to retrieve a list element by its index, use the square bracket notation []. The index of the first element is zero.&lt;br&gt;
To reach a list element from the list's end, use a negative index. The final component has an index of -1.&lt;br&gt;
To change an entry from a list, use list[index] = new value.&lt;br&gt;
To add a new element to the end of a list, use the append() method.&lt;br&gt;
To insert a new element at a certain location in a list, use insert().&lt;br&gt;
To remove an entry from a list and return it, use the pop() function.&lt;br&gt;
To eliminate an element from a list, use remove().&lt;br&gt;
Tuples are immutable lists.&lt;br&gt;
Use tuples when you want to define a list that cannot change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dictionary
&lt;/h3&gt;

&lt;p&gt;A Python dictionary is a group of key-value pairs, each of which has a corresponding value.&lt;br&gt;
In order to access a value by its key, use square brackets or the get() method.&lt;br&gt;
To remove a key-value pair by the key from a dictionary, use the del statement.&lt;br&gt;
To iterate through the keys, values, and key-value pairs in a dictionary, use a for loop.&lt;/p&gt;

&lt;p&gt;This post includes a summarization of some most important concepts as a beginner in Python.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>python</category>
    </item>
  </channel>
</rss>
