<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elvis Mburu</title>
    <description>The latest articles on DEV Community by Elvis Mburu (@mburu_elvis).</description>
    <link>https://dev.to/mburu_elvis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F949783%2F74cb4a6e-1ab2-4c30-be1d-a0dcec57f34a.jpg</url>
      <title>DEV Community: Elvis Mburu</title>
      <link>https://dev.to/mburu_elvis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mburu_elvis"/>
    <language>en</language>
    <item>
      <title>Machine Learning</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Wed, 10 May 2023 11:21:24 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/machine-learning-g3a</link>
      <guid>https://dev.to/mburu_elvis/machine-learning-g3a</guid>
      <description>&lt;p&gt;A few buzzwords dominate the tech sphere today, among them artificial intelligence, machine learning and deep learning.&lt;/p&gt;

&lt;p&gt;What do these words mean? We'll briefly cover some of them, but our main focus is &lt;strong&gt;machine learning&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Artificial Intelligence
&lt;/h2&gt;

&lt;p&gt;According to a paper by John McCarthy, artificial intelligence is "the science and engineering of making intelligent machines, especially intelligent computer programs." It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable.&lt;/p&gt;

&lt;p&gt;Alan Turing, on the other hand, proposed a test (the imitation game, now known as the Turing test) that a machine would have to pass to be considered intelligent.&lt;/p&gt;

&lt;p&gt;In general, AI focuses on simulating human intelligence in computers and machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Machine Learning
&lt;/h2&gt;

&lt;p&gt;Machine learning is a subset of artificial intelligence (AI) in which algorithms are trained on a dataset and then make predictions or decisions about new data based on what they have learned.&lt;br&gt;
This means that computers learn patterns from data rather than being explicitly programmed with rules.&lt;/p&gt;

&lt;p&gt;There are three main types of machine learning algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supervised Learning&lt;/li&gt;
&lt;li&gt;Unsupervised Learning&lt;/li&gt;
&lt;li&gt;Reinforcement Learning&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;a. &lt;strong&gt;Supervised Learning&lt;/strong&gt;&lt;br&gt;
The algorithms are trained on labeled datasets. This means that the training data contains both the input features and the expected output.&lt;br&gt;
It is similar to teaching a class by giving students the answers to practice questions so that they can gauge their understanding.&lt;br&gt;
The algorithm thus learns to recognize patterns that let it make predictions or decisions about new, unlabeled data.&lt;br&gt;
Some common supervised algorithms are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;linear regression&lt;/li&gt;
&lt;li&gt;logistic regression&lt;/li&gt;
&lt;li&gt;random forest&lt;/li&gt;
&lt;li&gt;support vector machines (SVM)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
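&lt;p&gt;Here is a minimal sketch of supervised learning. The Python snippet below fits a straight line (simple linear regression) to a few labeled examples using ordinary least squares. It then predicts the label of an unseen input. The training data and the names &lt;code&gt;fit_line&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt; are hypothetical and exist only for illustration. In practice you would normally use a library such as scikit-learn.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def fit_line(xs, ys):
    """Fit y = slope * x + intercept to labeled examples by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned parameters to predict the output for a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled training data: inputs xs with their known outputs ys.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
model = fit_line(xs, ys)
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The model only ever sees the labeled pairs. It recovers the pattern behind them (y = 2x + 1) and uses it to predict the label for x = 5.&lt;/p&gt;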

&lt;p&gt;b. &lt;strong&gt;Unsupervised Learning&lt;/strong&gt;&lt;br&gt;
The algorithm is trained on unlabeled data. This means that it has to discover patterns in the data on its own, for example by grouping similar data points together.&lt;/p&gt;


&lt;/blockquote&gt;

&lt;p&gt;Some common algorithms used in unsupervised machine learning are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;neural networks (for example, autoencoders)&lt;/li&gt;
&lt;li&gt;k-means clustering&lt;/li&gt;
&lt;li&gt;probabilistic clustering methods&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
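&lt;p&gt;k-means clustering, from the list above, can be sketched in a few lines of Python. This minimal version makes two simplifying assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It works on one-dimensional numbers only.&lt;/li&gt;
&lt;li&gt;It uses the first k points as the initial centroids. Libraries use smarter initialisation, such as k-means++.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name &lt;code&gt;kmeans_1d&lt;/code&gt; and the sample data are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def kmeans_1d(points, k, max_iter=100):
    """Cluster 1-D numbers into k groups with Lloyd's k-means algorithm."""
    # Simple deterministic initialisation: the first k points become the centroids.
    centroids = [float(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # the centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(centroids)  # [2.0, 11.0]
print(labels)     # [0, 0, 0, 1, 1, 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;No labels are provided. The algorithm finds the two groups purely from how close the numbers are to each other.&lt;/p&gt;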

&lt;p&gt;c. &lt;strong&gt;Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
The algorithm learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. It strives to maximize rewards while minimizing penalties.&lt;/p&gt;


&lt;/blockquote&gt;

&lt;p&gt;An example of reinforcement learning is a robot learning to pick up items. It grips an item and attempts to lift it. If the item falls, the robot adjusts through trial and error, for example by holding it more tightly or looking for a more suitable place to grip. Over time it learns the best way to pick up the item and can then do so reliably.&lt;/p&gt;
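&lt;p&gt;The trial-and-error loop in the robot example can be sketched as an epsilon-greedy agent. Most of the time it picks the grip with the best average reward so far. Occasionally (with probability &lt;code&gt;epsilon&lt;/code&gt;) it explores a random grip instead.&lt;/p&gt;

&lt;p&gt;This is a simplified illustration, not a real robotics setup. The grips, their success probabilities in &lt;code&gt;GRIP_SUCCESS&lt;/code&gt;, the reward of +1 or -1, and the function &lt;code&gt;learn_grip&lt;/code&gt; are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random

# Hypothetical environment: how often each grip lifts the item without dropping it.
# The robot never sees these numbers; it only observes rewards and penalties.
GRIP_SUCCESS = {"loose": 0.2, "firm": 0.5, "firm_centered": 0.9}


def learn_grip(trials=2000, epsilon=0.1, seed=0):
    """Learn which grip works best by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    grips = list(GRIP_SUCCESS)
    value = {g: 0.0 for g in grips}  # running average reward for each grip
    tries = {g: 0 for g in grips}
    for _ in range(trials):
        # Explore a random grip with probability epsilon, else exploit the best so far.
        explore = rng.choices([True, False], weights=[epsilon, 1 - epsilon])[0]
        grip = rng.choice(grips) if explore else max(grips, key=value.get)
        # Feedback from the environment: +1 (reward) if lifted, -1 (penalty) if it falls.
        p = GRIP_SUCCESS[grip]
        reward = rng.choices([1, -1], weights=[p, 1 - p])[0]
        # Update the running average reward of the grip that was tried.
        tries[grip] += 1
        value[grip] += (reward - value[grip]) / tries[grip]
    return max(grips, key=value.get), value


best, values = learn_grip()
print(best)  # the grip with the highest estimated reward
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The agent is never told which grip is best. Its estimates come only from the rewards and penalties it receives, which is the defining feature of reinforcement learning.&lt;/p&gt;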

&lt;h2&gt;
  
  
  Applications of Machine Learning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;self driving cars&lt;/li&gt;
&lt;li&gt;speech recognition&lt;/li&gt;
&lt;li&gt;computer vision&lt;/li&gt;
&lt;li&gt;recommendation engines&lt;/li&gt;
&lt;li&gt;fraud detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machine learning is an essential tool in data science. We'll explore these algorithms as we progress.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Comprehensive Guide to GitHub for Data Scientists</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Sun, 02 Apr 2023 20:40:31 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/comprehensive-guide-to-github-for-data-scientists-1lnn</link>
      <guid>https://dev.to/mburu_elvis/comprehensive-guide-to-github-for-data-scientists-1lnn</guid>
      <description>&lt;h2&gt;
  
  
  What is GitHub
&lt;/h2&gt;

&lt;p&gt;GitHub is a code hosting platform for collaboration and version control.&lt;br&gt;
It facilitates social coding by providing hosting and a web interface for Git repositories.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is version control
&lt;/h2&gt;

&lt;p&gt;Version control is the practice of tracking and managing changes to software code.&lt;br&gt;
Version control software keeps track of every modification to the code in a special kind of database.&lt;br&gt;
Git is a version control system.&lt;/p&gt;

&lt;p&gt;The version control system assigns a unique hash to every modification. In Git, each commit (and each stored version of a file) is identified by a hash computed from its contents.&lt;/p&gt;
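&lt;p&gt;This sketch assumes a repository that uses SHA-1, which is the default; newer Git can also create SHA-256 repositories. The Python function below reproduces the ID Git computes for a file's contents (a "blob"). &lt;code&gt;git_blob_hash&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

&lt;p&gt;Commits are hashed the same way, over a text record of their tree, parents, author and message. That is why changing any of these produces a new ID.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import hashlib


def git_blob_hash(content):
    """Return the SHA-1 ID that Git assigns to a file's content (a blob object)."""
    # Git hashes a small header ("blob", the size in bytes, a NUL byte) plus the raw bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same ID printed by
# echo 'test content' | git hash-object --stdin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the ID depends only on the content, changing a single byte produces a completely different hash. Git relies on this to detect every modification.&lt;/p&gt;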

&lt;p&gt;&lt;strong&gt;Version Control Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;History Tracking&lt;/li&gt;
&lt;li&gt;Collaborative history tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GitHub Terms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Installing Git on Linux (Debian/Ubuntu)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install git-all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you are on another system, follow the official installation instructions &lt;a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure user&lt;/strong&gt;&lt;br&gt;
Set the name that will be attached to your commits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global user.name "[username]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Set the email address that will be attached to your commits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global user.email "[email]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

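&lt;p&gt;Git has no password setting: &lt;code&gt;user.password&lt;/code&gt; is not a key that Git uses. GitHub authenticates Git operations with a personal access token over HTTPS, or with an SSH key. To avoid typing your token every time, you can let Git cache your credentials in memory for a while:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git config --global credential.helper cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;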
&lt;h3&gt;
  
  
  Repository
&lt;/h3&gt;

&lt;p&gt;A repository is a location in Git where files and their version history are stored.&lt;br&gt;
In other words, it is a directory that contains all the files and sub-directories associated with a project, along with the entire revision history of each file.&lt;/p&gt;
&lt;h3&gt;
  
  
  Branch
&lt;/h3&gt;

&lt;p&gt;A branch is a parallel version of a repository.&lt;br&gt;
In Git the default branch is called &lt;code&gt;master&lt;/code&gt;; GitHub names it &lt;code&gt;main&lt;/code&gt; for new repositories.&lt;br&gt;
A new branch starts as a copy of the branch it was created from, at that point in time.&lt;br&gt;
It then collects changes that differ from the main codebase, i.e. the master branch.&lt;br&gt;
&lt;strong&gt;Benefits of using branches&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel development without disrupting the main codebase&lt;/li&gt;
&lt;li&gt;It facilitates collaboration across teams&lt;/li&gt;
&lt;/ul&gt;
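&lt;p&gt;Under the hood, a Git branch is a lightweight, movable pointer to a commit, which is why creating one is cheap. The toy Python model below illustrates the idea. It is a simplified sketch, not how Git actually stores data. The names &lt;code&gt;commit&lt;/code&gt;, &lt;code&gt;create_branch&lt;/code&gt; and &lt;code&gt;history&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy model, not Git's real storage format.
commits = {}   # maps a commit id to (message, parent id)
branches = {}  # maps a branch name to the id of the commit it points at


def commit(branch, message):
    """Record a commit on top of the branch tip and move the branch to it."""
    commit_id = f"c{len(commits) + 1}"
    commits[commit_id] = (message, branches.get(branch))
    branches[branch] = commit_id
    return commit_id


def create_branch(new_branch, from_branch):
    """A new branch starts at the same commit as the branch it is created from."""
    branches[new_branch] = branches[from_branch]


def history(branch):
    """Follow parent links from the branch tip back to the first commit."""
    messages = []
    commit_id = branches.get(branch)
    while commit_id is not None:
        message, commit_id = commits[commit_id]
        messages.append(message)
    return messages


commit("main", "initial commit")
create_branch("feature", "main")
commit("feature", "add plot")  # only the feature branch moves forward
print(history("feature"))  # ['add plot', 'initial commit']
print(history("main"))     # ['initial commit']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Work committed on &lt;code&gt;feature&lt;/code&gt; does not touch &lt;code&gt;main&lt;/code&gt;. This is the "parallel development without disrupting the main codebase" benefit listed above.&lt;/p&gt;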
&lt;h3&gt;
  
  
  Commits
&lt;/h3&gt;

&lt;p&gt;A commit is a recorded set of changes in a repository.&lt;br&gt;
Each commit has a message describing what changed and why.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pull requests
&lt;/h3&gt;

&lt;p&gt;Pull requests are instrumental to seamless collaboration.&lt;br&gt;
With a pull request you propose that the changes on your branch be merged into another branch, usually the main (master) branch.&lt;br&gt;
They show the differences between the branches, with additions highlighted in green and deletions in red.&lt;/p&gt;

&lt;p&gt;Pull requests are merged into the main branch by the repository owner or a code reviewer.&lt;/p&gt;
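&lt;p&gt;The red/green view of a pull request is a diff. As a rough sketch of what GitHub displays, Python's standard &lt;code&gt;difflib&lt;/code&gt; module can produce the same unified-diff format: lines starting with - were removed and lines starting with + were added. The file name and its two versions below are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import difflib

# Two versions of a hypothetical file, as lists of lines.
old_version = ["def greet(name):\n", "    print('Hello ' + name)\n"]
new_version = ["def greet(name):\n", "    print(f'Hello, {name}!')\n"]

# Lines starting with "-" were removed (red on GitHub),
# lines starting with "+" were added (green on GitHub).
diff = list(difflib.unified_diff(old_version, new_version,
                                 fromfile="a/greet.py", tofile="b/greet.py"))
print("".join(diff), end="")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Unchanged lines are printed with a leading space as context. The reviewer can focus on the lines marked - and +.&lt;/p&gt;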
&lt;h2&gt;
  
  
  GitHub Events
&lt;/h2&gt;

&lt;p&gt;Now that we have a brief overview of what GitHub is all about, let's look at some of the events it supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;creating and deleting a repository&lt;/li&gt;
&lt;li&gt;pushing code to a repository&lt;/li&gt;
&lt;li&gt;creating a branch&lt;/li&gt;
&lt;li&gt;opening and closing a pull request&lt;/li&gt;
&lt;li&gt;code reviewing&lt;/li&gt;
&lt;li&gt;merging&lt;/li&gt;
&lt;li&gt;opening and closing issues&lt;/li&gt;
&lt;li&gt;assigning issues&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Creating a repository
&lt;/h2&gt;

&lt;p&gt;A repository (repo for short) can be created in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;from the GitHub user interface&lt;/li&gt;
&lt;li&gt;from an existing local folder&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;a. Repo from the GitHub user interface&lt;/strong&gt;&lt;br&gt;
On GitHub, click the green button at the top left:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JzV84Pl3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ri3byli0qq0clm5a2i36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JzV84Pl3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ri3byli0qq0clm5a2i36.png" alt="click repo" width="316" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b. Repo from a folder&lt;/strong&gt;&lt;br&gt;
You may want to turn an existing folder on your local machine into a repo.&lt;br&gt;
In the terminal, go to the existing project you want to start tracking.&lt;br&gt;
Then run &lt;code&gt;git init&lt;/code&gt; to initialize the folder as a repository.&lt;br&gt;
This creates a new repository (a hidden &lt;code&gt;.git&lt;/code&gt; directory) in the current directory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xg7dWZKC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nrikwm8z6gjlr7hvavdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xg7dWZKC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nrikwm8z6gjlr7hvavdc.png" alt="git init" width="684" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, stage your files with the command:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This command adds all changes in the current directory and its sub-directories to the staging area, the temporary area in Git where you prepare changes before committing them to the repository.&lt;br&gt;
If you only want to stage selected files, use the following instead of &lt;code&gt;git add .&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add filename
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This adds the file named &lt;code&gt;filename&lt;/code&gt; to the staging area.&lt;br&gt;
To stage multiple files, list them all:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add file1 file2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To commit the changes in the staging area to the repository, use the &lt;code&gt;git commit&lt;/code&gt; command. For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git commit -m "first commit in the repository"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;-m&lt;/code&gt; option lets you specify a message for the commit.&lt;br&gt;
The message is usually a brief summary of the changes that were made.&lt;br&gt;
&lt;strong&gt;Benefits of a good commit message&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes clear what changes were made&lt;/li&gt;
&lt;li&gt;Acts as a historical record of the changes&lt;/li&gt;
&lt;li&gt;Facilitates collaboration among team members&lt;/li&gt;
&lt;li&gt;Helps with debugging by identifying which changes introduced errors/bugs&lt;/li&gt;
&lt;li&gt;Serves as documentation for the code changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's rename the current &lt;strong&gt;&lt;em&gt;branch&lt;/em&gt;&lt;/strong&gt; to main:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch -M main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

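&lt;p&gt;This command renames the current branch to &lt;code&gt;main&lt;/code&gt;. The &lt;code&gt;-M&lt;/code&gt; option forces the rename even if a branch called &lt;code&gt;main&lt;/code&gt; already exists.&lt;br&gt;
Git's default branch name is &lt;code&gt;master&lt;/code&gt; unless &lt;code&gt;init.defaultBranch&lt;/code&gt; is configured otherwise.&lt;/p&gt;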

&lt;p&gt;Now let's add a new remote repo named &lt;code&gt;origin&lt;/code&gt; to the local git repo.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin git@github.com:usename/new_repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

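&lt;p&gt;We now push our changes to the remote repository named &lt;code&gt;origin&lt;/code&gt;. The &lt;code&gt;-u&lt;/code&gt; option sets &lt;code&gt;origin/main&lt;/code&gt; as the upstream branch, so later you can simply run &lt;code&gt;git push&lt;/code&gt; or &lt;code&gt;git pull&lt;/code&gt;:&lt;/p&gt;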

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you want to clone a repo from GitHub to your local machine,&lt;br&gt;
you can use the command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone url/to/the/repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This creates a directory with the same name as the repo, containing the project files and their full history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Following GitHub Flow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Create a branch&lt;/strong&gt;&lt;br&gt;
Start by creating a branch in your repository.&lt;br&gt;
There are two ways to create a branch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;from the GitHub interface&lt;/li&gt;
&lt;li&gt;from the terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Create a branch from the GitHub interface&lt;/strong&gt;&lt;br&gt;
Click the branch dropdown on the left of the screen:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4Zlwy84c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l0tiff6veao0cnon9vs9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4Zlwy84c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l0tiff6veao0cnon9vs9.png" alt="create branch1" width="310" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Type the name of your branch:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s5NLyvG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sspdly11p6oz3n0vfsz9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s5NLyvG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sspdly11p6oz3n0vfsz9.png" alt="branch name" width="371" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click &lt;strong&gt;&lt;em&gt;Create branch&lt;/em&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6La5W5P0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x9rp789v70j5mparyeqx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6La5W5P0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x9rp789v70j5mparyeqx.png" alt="click create branch" width="241" height="30"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a branch from the terminal&lt;/strong&gt;&lt;br&gt;
Check the current branch with the command below. It lists your local branches and marks the current one with an asterisk:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Create a new branch using the command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch &amp;lt;branch_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now switch to the new branch using the command&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git checkout &amp;lt;branch_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Alternatively, create the branch and switch to it in one step:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git checkout -b &amp;lt;branch_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The changes you make now go to the new branch instead of the main/master branch.&lt;br&gt;
To list the branches present in the repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch --list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can now commit and push your changes to this branch,&lt;br&gt;
and revert them if a mistake is made, without affecting the main branch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deleting a branch&lt;/strong&gt;&lt;br&gt;
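With &lt;code&gt;-d&lt;/code&gt;, Git only deletes a branch whose changes have been merged; use &lt;code&gt;-D&lt;/code&gt; to force the deletion. To delete a branch, use:&lt;/p&gt;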

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git branch -d [branch-name]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Create a pull request
&lt;/h3&gt;

&lt;p&gt;Creating pull requests is vital, especially in a collaborative environment.&lt;br&gt;
Some pull requests require approval before they can be merged.&lt;br&gt;
When you create a pull request, include a summary of the changes and the problem they solve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On the GitHub web interface&lt;/strong&gt;&lt;br&gt;
Navigate to the main page of the repository.&lt;br&gt;
In the branch menu, choose the branch that contains your commits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eCBf-cgE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2tcspsovb2pk3as72pjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eCBf-cgE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2tcspsovb2pk3as72pjt.png" alt="pull request " width="800" height="215"&gt;&lt;/a&gt;&lt;br&gt;
Click &lt;code&gt;New pull request&lt;/code&gt;.&lt;br&gt;
You can then choose the branch you want to create a pull request for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IXcGgjla--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x4y3x1lfb5b9cp1q8z1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IXcGgjla--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x4y3x1lfb5b9cp1q8z1n.png" alt="pull request2" width="800" height="454"&gt;&lt;/a&gt;&lt;br&gt;
If everything looks right, click the green &lt;code&gt;Create pull request&lt;/code&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vhIX8gQ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vvolyiitcyvcr03nlevd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vhIX8gQ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vvolyiitcyvcr03nlevd.png" alt="pull request" width="209" height="47"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo owner or code reviewer will then review the pull request and merge it to the main branch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the pull request using the CLI&lt;/strong&gt;&lt;br&gt;
With the GitHub CLI (&lt;code&gt;gh&lt;/code&gt;) installed and authenticated, you can create a pull request with:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gh pr create --assignee "@username"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;or you can use "@me" to self assign the pull request&lt;/p&gt;

&lt;h2&gt;
  
  
  Synchronize changes
&lt;/h2&gt;

&lt;p&gt;To synchronize your local repository with the remote repository on GitHub, use the following commands:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;git fetch&lt;/strong&gt;&lt;br&gt;
Downloads all history from the remote repository and updates the remote-tracking branches, without changing your local branches&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;git merge&lt;/strong&gt;&lt;br&gt;
Combines a remote-tracking branch (for example &lt;code&gt;origin/main&lt;/code&gt;) into the current local branch&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;git push&lt;/strong&gt;&lt;br&gt;
Uploads all local branch commits to GitHub&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;git pull&lt;/strong&gt;&lt;br&gt;
Updates your current local working branch with new commits from the corresponding remote branch&lt;br&gt;
It is a combination of &lt;code&gt;git fetch&lt;/code&gt; and &lt;code&gt;git merge&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Commit Changes
&lt;/h2&gt;

&lt;p&gt;To list the version history for the current branch use the command&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To list the version history for a file, including renames&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git log --follow [file]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To show the content differences between two branches, that is, the changes on the second branch since it diverged from the first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git diff [first_branch] ... [second_branch]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To snapshot a file in preparation for versioning (that is, stage it):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git add [file]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Redo Commits
&lt;/h2&gt;

&lt;p&gt;To undo all commits after [commit], preserving changes locally&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git reset [commit]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To discard all history and changes back to the specified commit&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git reset --hard [commit]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>datascience</category>
      <category>github</category>
      <category>git</category>
    </item>
    <item>
      <title>Sentiment Analysis</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Sat, 25 Mar 2023 12:23:26 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/sentiment-analysis-548</link>
      <guid>https://dev.to/mburu_elvis/sentiment-analysis-548</guid>
      <description>&lt;h2&gt;
  
  
  Getting Started With Sentiment Analysis
&lt;/h2&gt;

&lt;p&gt;Sentiment analysis is the process of detecting positive or negative sentiment in text.&lt;br&gt;
It is also referred to as opinion mining.&lt;br&gt;
It is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text.&lt;br&gt;
Organizations use it widely to determine and categorize opinions about a product, service or idea.&lt;/p&gt;

&lt;p&gt;Sentiment analysis uses &lt;em&gt;data mining, machine learning (ML), artificial intelligence and computational linguistics&lt;/em&gt; to mine text for sentiment and subjective information.&lt;br&gt;
Such information may be classified as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;positive&lt;/li&gt;
&lt;li&gt;neutral&lt;/li&gt;
&lt;li&gt;negative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This classification is also known as the &lt;strong&gt;&lt;em&gt;polarity&lt;/em&gt;&lt;/strong&gt; of a text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Graded Sentiment Analysis&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;very positive&lt;/li&gt;
&lt;li&gt;positive&lt;/li&gt;
&lt;li&gt;neutral&lt;/li&gt;
&lt;li&gt;negative&lt;/li&gt;
&lt;li&gt;very negative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also referred to as graded or fine-grained sentiment analysis.&lt;/p&gt;
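&lt;p&gt;Many tools produce a numeric polarity score first and then bucket it into these classes. Below is a minimal sketch of the bucketing step. It assumes a score between -1 (most negative) and 1 (most positive), and the cut-off values in &lt;code&gt;THRESHOLDS&lt;/code&gt; are hypothetical. Python's &lt;code&gt;bisect&lt;/code&gt; module finds the bucket a score falls into.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from bisect import bisect_right

# Hypothetical cut-offs on a polarity score in [-1, 1]; real tools choose their own.
THRESHOLDS = [-0.6, -0.2, 0.2, 0.6]
GRADES = ["very negative", "negative", "neutral", "positive", "very positive"]


def grade(score):
    """Map a polarity score onto the five fine-grained sentiment classes."""
    # bisect_right counts how many thresholds the score has reached.
    return GRADES[bisect_right(THRESHOLDS, score)]


print([grade(s) for s in (-0.9, -0.4, 0.0, 0.4, 0.9)])
# ['very negative', 'negative', 'neutral', 'positive', 'very positive']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Merging the two outer classes into their neighbours turns the same scale back into plain three-way polarity.&lt;/p&gt;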

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Types of Sentiment Analysis&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intent-based - recognizes motivation behind a text&lt;/li&gt;
&lt;li&gt;Fine-grained - graded sentiment analysis&lt;/li&gt;
&lt;li&gt;Emotion-detection - allows detection of various emotions&lt;/li&gt;
&lt;li&gt;Aspect-based - identifies the particular aspects/features mentioned in a text and the polarity expressed about each one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will not dive into these types for now.&lt;/p&gt;

&lt;p&gt;Sentiment analysis helps organizations gather insights into real-time customer sentiment, customer experience and brand reputation.&lt;br&gt;
These tools generally use text analytics to analyze online sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Benefits of sentiment analysis&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sorting data at scale&lt;/li&gt;
&lt;li&gt;real-time analysis&lt;/li&gt;
&lt;li&gt;consistent criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps involved in Sentiment Analysis
&lt;/h2&gt;

&lt;p&gt;Sentiment analysis generally follows the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Collect data&lt;/em&gt;&lt;/strong&gt; - The text to be analyzed is identified and collected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Clean the data&lt;/em&gt;&lt;/strong&gt; - The data is processed and cleaned to remove noise and parts of speech 
that don't have meaning relevant to the sentiment of the text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Extract features&lt;/em&gt;&lt;/strong&gt; - A machine learning algorithm automatically extracts text features 
to identify negative or positive sentiment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Pick an ML model&lt;/em&gt;&lt;/strong&gt; - The text is scored with a rule-based, automatic (machine learning) or hybrid model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Sentiment classification&lt;/em&gt;&lt;/strong&gt; - Once a model is picked and used to analyze a piece of text,
it assigns the text a sentiment score or label, such as positive, negative or neutral.&lt;/li&gt;
&lt;/ul&gt;
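&lt;p&gt;The steps above can be sketched end to end with a tiny rule-based scorer. This is a minimal illustration only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The stopword list and the word scores in &lt;code&gt;LEXICON&lt;/code&gt; are hypothetical.&lt;/li&gt;
&lt;li&gt;"Collect data" is reduced to passing in a string.&lt;/li&gt;
&lt;li&gt;Real systems use a curated lexicon (NLTK ships one called VADER) or a trained model.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from bisect import bisect_right

# Hypothetical, tiny resources for illustration only.
STOPWORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "of"}
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "boring": -1}
POLARITY = ["negative", "neutral", "positive"]


def clean(text):
    """Clean the data: lower-case it and replace anything but letters and spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def extract_features(text):
    """Extract features: split into tokens and drop stopwords with no sentiment."""
    return [tok for tok in clean(text).split() if tok not in STOPWORDS]


def classify(text):
    """Score the tokens with the rule-based lexicon, then assign a polarity label."""
    score = sum(LEXICON.get(tok, 0) for tok in extract_features(text))
    # bisect_right([0, 1], score) is 0 for negative, 1 for zero, 2 for positive scores
    return score, POLARITY[bisect_right([0, 1], score)]


print(classify("This movie was great!"))  # (2, 'positive')
print(classify("Awful and boring."))      # (-3, 'negative')
print(classify("The plot was fine."))     # (0, 'neutral')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A pure lexicon like this misses negation ("not good") and sarcasm. That is one reason automatic (machine learning) and hybrid models are often preferred.&lt;/p&gt;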

&lt;p&gt;Let's take a deeper dive into sentiment analysis using an example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1. Collect Data
&lt;/h2&gt;

&lt;p&gt;We are going to use a dataset of IMDb movie reviews labeled with their sentiment, from the UCI Machine Learning Repository.&lt;/p&gt;

&lt;p&gt;Let's start by importing the libraries that we will be using.&lt;br&gt;
&lt;code&gt;punkt&lt;/code&gt; is a data package that contains pre-trained models for tokenization.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import the required packages and libraries
import numpy as np
import pandas as pd
import nltk
nltk.download('punkt')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Loading the dataset
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pd.set_option('display.max_colwith', None)
df = pd.read_csv('https://gist.githubusercontent.com/fmnobar/88703ec6a1f37b3eabf126ad38c392b8/raw/76b84540ccd4b0b207a6978eb7e9d938275886ff/imdb_labelled.csv')
df.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ln-oTklm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vfe0j0zsyebssaqkofev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ln-oTklm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vfe0j0zsyebssaqkofev.png" alt="header for data" width="572" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that there are only two columns, &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;label&lt;/code&gt;.&lt;br&gt;
The &lt;code&gt;label&lt;/code&gt; column indicates the sentiment of the review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 indicates a positive sentiment&lt;/li&gt;
&lt;li&gt;0 indicates a negative sentiment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The label therefore gives the polarity of the sentiment.&lt;/p&gt;

&lt;p&gt;We now create a sample string, which is the first entry in the &lt;code&gt;text&lt;/code&gt; column of the dataframe &lt;code&gt;df&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample = df.text[0]
sample
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oH_WKhQc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t9aj11abhp4fxi1jqhvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oH_WKhQc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t9aj11abhp4fxi1jqhvm.png" alt="sample" width="579" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tokens and Bigrams
&lt;/h2&gt;

&lt;h3&gt;
  
  
  a. Tokens
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;token&lt;/code&gt; is a single unit of meaning that can be identified in a text.&lt;br&gt;
It is also known as a &lt;code&gt;unigram&lt;/code&gt;.&lt;br&gt;
&lt;code&gt;Tokenization&lt;/code&gt; is the process of breaking down a text into individual tokens.&lt;br&gt;
The functions that perform &lt;code&gt;tokenization&lt;/code&gt; are called &lt;code&gt;tokenizers&lt;/code&gt;.&lt;br&gt;
This concept is implemented with the &lt;code&gt;nltk.word_tokenize&lt;/code&gt; function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The function takes a string of text as input and returns a list of tokens.&lt;/li&gt;
&lt;li&gt;It splits the text into individual words and punctuation marks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's see an example of the function's usage by tokenizing the &lt;code&gt;sample&lt;/code&gt; text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_tokens = nltk.word_tokenize(sample)
sample_tokens[:10]  # view the first 10 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ka8O78aG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6gbggsopnnwbgg7rfjv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ka8O78aG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6gbggsopnnwbgg7rfjv.png" alt="Sample Tokens" width="673" height="60"&gt;&lt;/a&gt;&lt;/p&gt;
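&lt;p&gt;&lt;code&gt;nltk.word_tokenize&lt;/code&gt; relies on a pretrained sentence splitter plus Treebank-style rules. To get a feel for what a tokenizer does, here is a much simpler regex-based sketch. The function name &lt;code&gt;simple_tokenize&lt;/code&gt; is hypothetical, and its output will differ from NLTK's in cases such as contractions.&lt;/p&gt;

```python
import re

def simple_tokenize(text):
    # Match either a word (optionally with an apostrophe part, e.g. "don't")
    # or any single character that is neither a word character nor whitespace (punctuation).
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(simple_tokenize("I loved this movie, truly!"))
# ['I', 'loved', 'this', 'movie', ',', 'truly', '!']
```

&lt;p&gt;For example, NLTK splits "don't" into "do" and "n't", while this sketch keeps "don't" as a single token.&lt;/p&gt;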

&lt;h3&gt;
  
  
  b. Bigrams
&lt;/h3&gt;

&lt;p&gt;If we combine two adjacent unigrams/tokens we form a &lt;code&gt;bigram&lt;/code&gt;.&lt;br&gt;
A bigram is a pair of adjacent tokens in a text.&lt;br&gt;
Bigrams capture some of the context in which a particular word or phrase appears.&lt;br&gt;
More generally, a sequence of n adjacent tokens is called an n-gram, and n-grams are used to build statistical models of language.&lt;br&gt;
By analyzing the frequency of different n-grams in a large corpus of text, NLP systems can learn to predict the probability of different words occurring in a particular context.&lt;/p&gt;

&lt;p&gt;Bigrams can be generated with the &lt;code&gt;nltk.bigrams&lt;/code&gt; function, which returns a generator of token pairs (hence the &lt;code&gt;list()&lt;/code&gt; call below).&lt;/p&gt;

&lt;p&gt;Let's see this in action:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_bitokens = list(nltk.bigrams(sample_tokens))

# Return the first 10 bigrams
sample_bitokens[:10]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LpglFTUJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/np3cytthignf61dxaa6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LpglFTUJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/np3cytthignf61dxaa6d.png" alt="bitokens" width="254" height="229"&gt;&lt;/a&gt;&lt;/p&gt;
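&lt;p&gt;&lt;code&gt;nltk.bigrams&lt;/code&gt; pairs each token with the one that follows it. The same sliding-window idea can be sketched in plain Python with &lt;code&gt;zip&lt;/code&gt;, and it generalizes to any n. The helper name &lt;code&gt;make_ngrams&lt;/code&gt; and the token list are hypothetical.&lt;/p&gt;

```python
def make_ngrams(tokens, n=2):
    # Build n shifted copies of the list (tokens[0:], tokens[1:], ...) and zip them,
    # so each tuple is a window of n adjacent tokens.
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = ["the", "movie", "was", "great"]
print(make_ngrams(tokens))     # [('the', 'movie'), ('movie', 'was'), ('was', 'great')]
print(make_ngrams(tokens, 3))  # [('the', 'movie', 'was'), ('movie', 'was', 'great')]
```

&lt;p&gt;A text with k tokens yields k - 1 bigrams, because &lt;code&gt;zip&lt;/code&gt; stops at the shortest shifted copy.&lt;/p&gt;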

&lt;h2&gt;
  
  
  Frequency Distribution
&lt;/h2&gt;

&lt;p&gt;A frequency distribution records how many times (or in what proportion) each token occurs in a text.&lt;br&gt;
In sentiment analysis it helps reveal which words or phrases occur most often in positive or negative texts.&lt;/p&gt;

&lt;p&gt;It is implemented with the &lt;code&gt;nltk.FreqDist&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What are the top 10 most frequently used tokens in our sample?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample_freqdist = nltk.FreqDist(sample_tokens)

# Return the top 10 most frequent tokens
sample_freqdist.most_common(10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qqG0JmW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezwyje8yo44qnsvi0u5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qqG0JmW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezwyje8yo44qnsvi0u5v.png" alt="freqDist" width="201" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These results make sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commas, periods and words such as &lt;code&gt;the&lt;/code&gt; and &lt;code&gt;a&lt;/code&gt; are very common in almost any text.&lt;/li&gt;
&lt;/ul&gt;
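&lt;p&gt;At its core, a frequency distribution is a token counter. Here is a sketch of the same counting with Python's standard &lt;code&gt;collections.Counter&lt;/code&gt;. It uses Counter instead of NLTK so it runs without extra downloads, and the token list is made up for illustration.&lt;/p&gt;

```python
from collections import Counter

# A made-up list of tokens, similar to what word_tokenize might produce.
tokens = ["the", "film", "was", "good", ",", "the", "cast", "was", "good", "."]

# Count how many times each token occurs.
freq = Counter(tokens)

# most_common(n) returns (token, count) pairs sorted by count; ties keep first-seen order.
print(freq.most_common(3))  # [('the', 2), ('was', 2), ('good', 2)]
```

&lt;p&gt;Individual counts can be looked up like a dictionary, e.g. &lt;code&gt;freq['good']&lt;/code&gt; is 2.&lt;/p&gt;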

&lt;p&gt;Let's create a function named &lt;code&gt;tokens_top&lt;/code&gt; that takes a text and a number &lt;code&gt;n&lt;/code&gt; as input and returns the &lt;code&gt;n&lt;/code&gt; most common tokens in that text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def tokens_top(text, n):
    # create tokens
    tokens = nltk.word_tokenize(text)

    # create the frequency distribution
    freqdist = nltk.FreqDist(tokens)

    # return the top n most common tokens
    return freqdist.most_common(n)

# Call the function 
tokens_top(df.text[1], 10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s-OkauLW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/myokkh3r9yvekmk8uuq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s-OkauLW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/myokkh3r9yvekmk8uuq3.png" alt="def freqdist" width="201" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Document-Term Matrix
&lt;/h3&gt;

&lt;p&gt;A document-term matrix (DTM) represents the frequency of terms that occur in a collection of documents.&lt;br&gt;
The rows represent the documents in the corpus and the columns represent the terms.&lt;br&gt;
Each cell holds the frequency (or weight) of a term in a document.&lt;/p&gt;

&lt;p&gt;We can implement this with scikit-learn's &lt;code&gt;CountVectorizer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import the packages
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

def create_dtm(series):
    # Create an instance/object of the class
    cv = CountVectorizer()

    # create a dtm (a sparse matrix) from the series parameter
    dtm = cv.fit_transform(series)

    # convert the sparse matrix to a dense NumPy array
    dtm = dtm.toarray()

    # get column names (one per term in the vocabulary)
    features = cv.get_feature_names_out()

    # create a dataframe
    dtm_df = pd.DataFrame(dtm, columns=features)

    # return the dataframe
    return dtm_df

# Call the function on the first 5 rows of df['text']
create_dtm(df['text'].head())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AK53rnuV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ws4xtm0vlbuxjrx1z5s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AK53rnuV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ws4xtm0vlbuxjrx1z5s.png" alt="dtm" width="575" height="156"&gt;&lt;/a&gt;&lt;/p&gt;
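&lt;p&gt;To see what &lt;code&gt;CountVectorizer&lt;/code&gt; produces, here is a stdlib-only sketch that builds the same kind of matrix. It is a simplification under stated assumptions: documents are lowercased and split on whitespace, whereas CountVectorizer's default token pattern also drops punctuation and single-character tokens. The function name &lt;code&gt;build_dtm&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def build_dtm(documents):
    # Tokenize each document by lowercasing and splitting on whitespace (a simplification).
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary (the columns) is the alphabetically sorted set of all terms.
    vocabulary = sorted({term for doc in tokenized for term in doc})
    # Each row counts how often every vocabulary term appears in one document.
    matrix = [[doc.count(term) for term in vocabulary] for doc in tokenized]
    return vocabulary, matrix

vocab, dtm = build_dtm(["good movie", "bad movie", "good good plot"])
print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(dtm)    # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
```

&lt;p&gt;Most cells in a real DTM are zero, which is why CountVectorizer returns a sparse matrix and the example above only converts a few rows to a dense DataFrame.&lt;/p&gt;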

&lt;h2&gt;
  
  
  Data Cleaning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Feature Importance
&lt;/h3&gt;

&lt;p&gt;Feature importance refers to the extent to which a specific feature/variable contributes to the prediction or classification in sentiment analysis.&lt;/p&gt;

&lt;p&gt;There are different methods that can be used to determine feature importance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;machine learning algorithms, e.g. decision trees and random forests&lt;/li&gt;
&lt;li&gt;statistical methods, e.g. correlation or regression analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feature importance is a useful tool in sentiment analysis because it can identify the features that matter most for accurately predicting the sentiment of a text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
We'll define a function &lt;code&gt;top_n_tokens&lt;/code&gt; that has three parameters:&lt;br&gt;
&lt;code&gt;text&lt;/code&gt;, &lt;code&gt;sentiment&lt;/code&gt; and &lt;code&gt;n&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
The function fits a classifier on the texts and their sentiment labels and returns the &lt;code&gt;n&lt;/code&gt; tokens with the largest coefficients,&lt;br&gt;
i.e. the tokens that push the prediction most strongly toward positive sentiment.&lt;br&gt;&lt;br&gt;
We'll use &lt;code&gt;LogisticRegression&lt;/code&gt; from &lt;code&gt;sklearn.linear_model&lt;/code&gt;&lt;br&gt;
with the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;solver = 'lbfgs'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_iter = 2500&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;random_state = 1234&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def top_n_tokens(text, sentiment, n):
    # create instances of the classifier and the vectorizer
    lgr = LogisticRegression(solver='lbfgs', max_iter=2500, random_state=1234)
    cv = CountVectorizer()

    # create the DTM
    dtm = cv.fit_transform(text)

    # fit the logistic regression model
    lgr.fit(dtm, sentiment)

    # get the coefficients (one per token)
    coefs = lgr.coef_[0]

    # get the features/column names
    features = cv.get_feature_names_out()

    # create the dataframe
    coef_df = pd.DataFrame({'Tokens': features, 'Coefficients': coefs})

    # return the n tokens with the largest coefficients
    return coef_df.nlargest(n, 'Coefficients')

# Test it on df['text']
top_n_tokens(df.text, df.label, 10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3EHQtnR5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4kml7yu934vr5uf6d26s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3EHQtnR5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4kml7yu934vr5uf6d26s.png" alt="feat importance" width="369" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

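&lt;p&gt;Since label 1 marks positive sentiment, tokens with large positive coefficients push predictions toward the positive class.&lt;br&gt; To check the flip side, that tokens with the most negative coefficients indicate negative sentiment, let's look at the 10 smallest coefficients.&lt;/p&gt;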

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def bottom_n_tokens(text, sentiment, n):
    # create instances of the classifier and the vectorizer
    lgr = LogisticRegression(solver='lbfgs', max_iter=2500, random_state=1234)
    cv = CountVectorizer()

    # create the DTM
    dtm = cv.fit_transform(text)

    # fit the logistic regression model
    lgr.fit(dtm, sentiment)

    # get the coefficients (one per token)
    coefs = lgr.coef_[0]

    # get the features/column names
    features = cv.get_feature_names_out()

    # create the dataframe
    coef_df = pd.DataFrame({'Tokens': features, 'Coefficients': coefs})

    # return the n tokens with the smallest coefficients
    return coef_df.nsmallest(n, 'Coefficients')

# Test it on df['text']
bottom_n_tokens(df.text, df.label, 10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_YzAPqdX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ctsfoui3h7cqdik1w60g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_YzAPqdX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ctsfoui3h7cqdik1w60g.png" alt="feat importance" width="325" height="297"&gt;&lt;/a&gt;&lt;/p&gt;
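&lt;p&gt;To make the link between coefficients and sentiment concrete, here is a stdlib-only sketch of the same idea. It trains a tiny logistic regression on word counts with plain stochastic gradient descent, then ranks words by their learned weight. This is not what scikit-learn does internally: LogisticRegression uses the lbfgs solver and applies L2 regularization by default, while this sketch uses neither. The names &lt;code&gt;train_word_weights&lt;/code&gt; and &lt;code&gt;rank_tokens&lt;/code&gt;, the learning rate and the four toy reviews are all hypothetical.&lt;/p&gt;

```python
import math

def train_word_weights(docs, labels, epochs=200, lr=0.1):
    # Vocabulary of whitespace-separated words across all documents.
    vocab = sorted({w for doc in docs for w in doc.split()})
    weights = {w: 0.0 for w in vocab}
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            words = doc.split()
            # Linear score: bias plus one weight per word occurrence (count features).
            z = bias + sum(weights[w] for w in words)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            err = y - p
            # Stochastic gradient step on the log-loss for this document.
            bias += lr * err
            for w in words:
                weights[w] += lr * err
    return weights

def rank_tokens(weights, n):
    # Sort words from largest to smallest weight.
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # First n lean positive; last n lean negative (the very last is the most negative).
    return ordered[:n], ordered[-n:]

docs = ["great fun", "awful boring", "great plot", "boring plot"]
labels = [1, 0, 1, 0]
weights = train_word_weights(docs, labels)
top, bottom = rank_tokens(weights, 2)
print(top[0][0], bottom[-1][0])  # great boring
```

&lt;p&gt;"great" only ever appears in positive reviews, so its weight grows the most, while "boring" only appears in negative ones. "plot" appears in both and stays near zero, which mirrors why neutral words rarely show up among the largest or smallest coefficients.&lt;/p&gt;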

&lt;p&gt;In the examples we've covered so far, we've used labelled data.&lt;br&gt; What if we do not have labelled data?&lt;br&gt;
Then we can use pre-trained models and tools such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TextBlob&lt;/li&gt;
&lt;li&gt;VADER&lt;/li&gt;
&lt;li&gt;Stanford CoreNLP&lt;/li&gt;
&lt;li&gt;Google Cloud Natural Language API&lt;/li&gt;
&lt;li&gt;Hugging Face Transformers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's explore &lt;code&gt;TextBlob&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TextBlob
&lt;/h2&gt;

&lt;p&gt;TextBlob is a Python library that provides a simple API for performing common NLP tasks such as sentiment analysis.&lt;br&gt;
By default it uses a lexicon-based analyzer to assign a sentiment (polarity) score to a piece of text, ranging from -1 to 1.&lt;/p&gt;

&lt;p&gt;It is built on top of NLTK (the Natural Language Toolkit).&lt;br&gt;
It also provides additional information such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;subjectivity score&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its &lt;code&gt;sentiment&lt;/code&gt; property returns a named tuple of the form:&lt;br&gt;
    &lt;code&gt;(polarity, subjectivity)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The polarity score is a float within the range [-1.0, 1.0].&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it indicates whether the text is negative (values below 0) or positive (values above 0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The subjectivity score is a float within the range [0.0, 1.0]:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0.0 is very objective&lt;/li&gt;
&lt;li&gt;1.0 is very subjective&lt;/li&gt;
&lt;/ul&gt;
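&lt;p&gt;To illustrate how a lexicon-based analyzer can produce these two scores, here is a toy sketch that averages per-word polarity and subjectivity values. The four-word lexicon and its numbers are hypothetical, not TextBlob's actual lexicon. TextBlob's real analyzer is considerably more sophisticated.&lt;/p&gt;

```python
# Hypothetical toy lexicon: word -> (polarity, subjectivity). Not TextBlob's real values.
LEXICON = {
    "great": (0.8, 0.75),
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.67),
    "boring": (-1.0, 1.0),
}

def toy_sentiment(text):
    # Look up every word that appears in the lexicon.
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not scores:
        # No opinion words found: treat the text as neutral and objective.
        return (0.0, 0.0)
    # Average polarity and subjectivity over the matched words.
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

polarity, subjectivity = toy_sentiment("a good film with a boring plot")
print(round(polarity, 2), round(subjectivity, 2))  # -0.15 0.8
```

&lt;p&gt;Averaging keeps both scores inside their ranges: polarity stays within [-1.0, 1.0] and subjectivity within [0.0, 1.0], as long as every lexicon entry does.&lt;/p&gt;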

&lt;p&gt;TextBlob also provides other features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;part-of-speech tagging&lt;/li&gt;
&lt;li&gt;noun phrase extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;Let's define a function named &lt;code&gt;polarity_subjectivity&lt;/code&gt; that accepts two arguments, &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;print_results&lt;/code&gt;.&lt;br&gt;
The function applies &lt;code&gt;TextBlob&lt;/code&gt; to the provided &lt;code&gt;text&lt;/code&gt;.&lt;br&gt;
If &lt;code&gt;print_results&lt;/code&gt; is True, it prints the polarity and subjectivity of the text; otherwise it returns a tuple of floats, the first being polarity and the second subjectivity.&lt;/p&gt;

&lt;p&gt;You can install &lt;code&gt;TextBlob&lt;/code&gt; with pip (the leading &lt;code&gt;!&lt;/code&gt; runs the command from a Jupyter notebook cell):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install textblob

# import TextBlob
from textblob import TextBlob

def polarity_subjectivity(text=sample, print_results=False):
    # create an instance of TextBlob
    tb = TextBlob(text)

    # tb.sentiment is a named tuple: (polarity, subjectivity)
    if print_results:
        print(f"Polarity is {round(tb.sentiment.polarity, 2)} : Subjectivity {round(tb.sentiment.subjectivity, 2)}")
    else:
        return (tb.sentiment.polarity, tb.sentiment.subjectivity)

# Test the function
polarity_subjectivity(sample, print_results=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AT6aQcyX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jzgzu56twiv8c11v0dk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AT6aQcyX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jzgzu56twiv8c11v0dk0.png" alt="pol_sub" width="400" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results indicate that our sample has a slightly positive polarity and is somewhat subjective, though not to a high degree.&lt;/p&gt;

&lt;p&gt;Let's define a function &lt;code&gt;token_count&lt;/code&gt; that accepts a string and, using NLTK's &lt;code&gt;word_tokenize&lt;/code&gt;, returns the number of tokens in that string as an integer.&lt;/p&gt;

&lt;p&gt;Then define another function &lt;code&gt;series_tokens&lt;/code&gt; that accepts a Pandas Series as its argument and applies &lt;code&gt;token_count&lt;/code&gt; to every element of the series.&lt;br&gt;
Finally, use the second function on the first 10 rows of our dataframe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import libraries
from nltk import word_tokenize

# Define the first function that counts the number of tokens in a given string
def token_count(string):
    return len(word_tokenize(string))

# Define the second function that applies the token_count function to a given Pandas Series
def series_tokens(series):
    return series.apply(token_count)

# Apply the function to the first 10 rows of the data frame
series_tokens(df.text.head(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9gT03var--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/26ij4mkzwpe99mr0wjjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9gT03var--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/26ij4mkzwpe99mr0wjjj.png" alt="pol series" width="386" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's define a function named &lt;code&gt;series_polarity_subjectivity&lt;/code&gt;&lt;br&gt;
that applies the &lt;code&gt;polarity_subjectivity&lt;/code&gt; function we defined earlier to every element of a Pandas Series.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# define the function
def series_polarity_subjectivity(series):
    return series.apply(polarity_subjectivity)

# apply to the top 10 rows of df['text']
series_polarity_subjectivity(df['text'].head(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Measure of Complexity - Lexical Diversity
&lt;/h2&gt;

&lt;p&gt;Lexical diversity refers to the variety of words used in a piece of writing or speech.&lt;br&gt;
It is often used as an indicator of the richness and complexity of a text's vocabulary.&lt;br&gt;
A common way to measure it is the type-token ratio: the number of unique tokens divided by the total number of tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;Let's define a &lt;code&gt;complexity&lt;/code&gt; function that accepts a string as an argument and returns the lexical complexity score defined as the number of unique tokens over the total number of tokens.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def complexity(string):
    # create a list of all tokens
    total_tokens = nltk.word_tokenize(string)

    # create a set of words(It keeps only unique values)
    unique_tokens = set(total_tokens)

    # Return the complexity measure
    if len(total_tokens) &amp;gt; 0:
        return len(unique_tokens) / len(total_tokens)

# apply the function to top 10 rows
df.text.head(10).apply(complexity)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6CAKPBme--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9gzaa9ymryix9572ry9j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6CAKPBme--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9gzaa9ymryix9572ry9j.png" alt="lexical Diversity" width="343" height="225"&gt;&lt;/a&gt;&lt;/p&gt;
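&lt;p&gt;The same type-token ratio can be computed without NLTK. This sketch makes two assumptions that differ from &lt;code&gt;complexity&lt;/code&gt; above. First, it splits on whitespace, so punctuation stays attached to words. Second, it lowercases the text, so "The" and "the" count as one unique token; this is a design choice, not something the original function does. The name &lt;code&gt;type_token_ratio&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def type_token_ratio(text):
    # Lowercase and split on whitespace (simpler than nltk.word_tokenize).
    tokens = text.lower().split()
    if not tokens:
        # Avoid dividing by zero for empty strings.
        return None
    # Unique tokens (types) divided by all tokens.
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("The cat saw the other cat"))  # 0.6666666666666666
print(type_token_ratio("every word here differs"))    # 1.0
```

&lt;p&gt;A score of 1.0 means every token is unique. Lower scores mean more repetition. Note that short texts tend to score higher simply because they have fewer chances to repeat words.&lt;/p&gt;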

&lt;p&gt;An interesting insight: the rows at index 3 and 4 have the highest lexical diversity. Every token in them is unique, so their score is 1.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Text Cleanup - Stopwords and Non-alphabeticals
&lt;/h2&gt;

&lt;p&gt;Text cleanup ensures that the text data is in a consistent format and removes noise, irrelevant information and other inconsistencies.&lt;br&gt;
Some common text cleanup techniques are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lowercasing&lt;/li&gt;
&lt;li&gt;Tokenization&lt;/li&gt;
&lt;li&gt;Stopword Removal&lt;/li&gt;
&lt;li&gt;Removing Punctuation&lt;/li&gt;
&lt;li&gt;Stemming and Lemmatization&lt;/li&gt;
&lt;li&gt;Removing URLs and mentions&lt;/li&gt;
&lt;li&gt;Removing emojis and emoticons&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# import the library
import nltk
from nltk.corpus import stopwords

# download the stopwords corpus (only needed once)
nltk.download('stopwords')

# Select only English stopwords
english_stop_words = stopwords.words('english')

# print the first 20
print(english_stop_words[:20])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
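&lt;p&gt;The snippet above only lists the stopwords. Removing them from a tokenized text is a simple membership filter. In this sketch, a small hand-picked set stands in for &lt;code&gt;stopwords.words('english')&lt;/code&gt; so it runs without downloading the corpus. The set and the name &lt;code&gt;remove_stopwords&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
# A small, hypothetical stand-in for NLTK's English stopword list.
STOP_WORDS = {"i", "the", "a", "was", "this", "is", "and"}

def remove_stopwords(tokens, stop_words=STOP_WORDS):
    # Compare in lowercase so "The" and "the" are both removed.
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stopwords(["I", "thought", "the", "movie", "was", "brilliant"]))
# ['thought', 'movie', 'brilliant']
```

&lt;p&gt;With the real list, converting it to a set first (e.g. &lt;code&gt;set(english_stop_words)&lt;/code&gt;) makes each membership check constant-time instead of scanning the whole list.&lt;/p&gt;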

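&lt;p&gt;Next, let's look at how to detect non-alphabetical strings.&lt;br&gt;
We'll use Python's &lt;code&gt;str.isalpha&lt;/code&gt; method, which returns &lt;code&gt;True&lt;/code&gt; only if the string is non-empty and every character in it is a letter.&lt;/p&gt;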

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;string_1 = "Crite_Jes.cd"
string_2 = "a quick dog"
string_3 = "We are good!"

# isalpha() is True only if the string is non-empty and every character is a letter
print(f"String_1: {string_1.isalpha()}\n")
print(f"String_2: {string_2.isalpha()}\n")
print(f"String_3: {string_3.isalpha()}\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ogN2rcNE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gsppy2pfej8lly71rt2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ogN2rcNE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gsppy2pfej8lly71rt2g.png" alt="Clean aplhabets" width="177" height="119"&gt;&lt;/a&gt;&lt;/p&gt;
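&lt;p&gt;All three strings return &lt;code&gt;False&lt;/code&gt;: the first contains &lt;code&gt;_&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt;, and the other two contain spaces (and &lt;code&gt;!&lt;/code&gt;). In practice, &lt;code&gt;isalpha&lt;/code&gt; is applied token by token to drop non-alphabetical tokens. A minimal sketch, with a made-up token list and the hypothetical helper name &lt;code&gt;keep_alphabetic&lt;/code&gt;:&lt;/p&gt;

```python
def keep_alphabetic(tokens):
    # str.isalpha() is False for punctuation, digits and mixed tokens such as "2nd".
    return [t for t in tokens if t.isalpha()]

tokens = ["The", "plot", ",", "frankly", ",", "was", "2nd", "-", "rate", "!"]
print(keep_alphabetic(tokens))  # ['The', 'plot', 'frankly', 'was', 'rate']
```

&lt;p&gt;Be aware that this filter also drops tokens containing apostrophes, such as the "n't" that &lt;code&gt;word_tokenize&lt;/code&gt; splits off from "don't". In sentiment analysis, that token carries negation.&lt;/p&gt;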

</description>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Essential SQL Commands for Data Science</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Mon, 13 Mar 2023 19:24:51 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/essential-sql-commands-for-data-science-1ml3</link>
      <guid>https://dev.to/mburu_elvis/essential-sql-commands-for-data-science-1ml3</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is a programming language designed for managing and manipulating relational databases.&lt;br&gt;
A database, on the other hand, is a collection of data organized in a manner that facilitates ease of access as well as efficient management and updating.&lt;br&gt;
A database is made up of tables that store relevant information.&lt;br&gt;
SQL is used by data analysts and data scientists to extract insights from large datasets.&lt;br&gt;
SQL is a powerful tool that can be used to perform a wide variety of data manipulation tasks, including filtering, sorting, grouping and aggregating data.&lt;br&gt;
A table stores and displays data in a structured format consisting of columns and rows that are similar to those seen in Excel spreadsheets.&lt;/p&gt;

&lt;p&gt;SQL can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Insert, update or delete records in a database.&lt;/li&gt;
&lt;li&gt;Create new databases, tables, triggers and views.&lt;/li&gt;
&lt;li&gt;Retrieve data from a database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Basic SQL Commands
&lt;/h2&gt;

&lt;p&gt;I'll be demonstrating these commands using the MySQL command-line client.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. SHOW
&lt;/h2&gt;

&lt;p&gt;The SHOW statement displays information about the databases on the server, the tables in a database and the columns in a table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW DATABASES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--B1WaHjF2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ckgunceondh6cwrdnsgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--B1WaHjF2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ckgunceondh6cwrdnsgu.png" alt="showDatabase" width="254" height="264"&gt;&lt;/a&gt;&lt;br&gt;
This command (SHOW DATABASES) lists the databases managed by the server.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW TABLES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DFVa6UJN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jw6ixkaja7g56to77pj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DFVa6UJN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jw6ixkaja7g56to77pj0.png" alt="showTables" width="248" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW COLUMNS FROM table_names;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zma4JRFe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kea21guemf3642acely3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zma4JRFe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kea21guemf3642acely3.png" alt="showColumn" width="471" height="228"&gt;&lt;/a&gt;&lt;br&gt;
This command shows the columns in the table_names table.&lt;br&gt;
For each column it displays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field: the column name&lt;/li&gt;
&lt;li&gt;Type: the data type of the values stored in the column&lt;/li&gt;
&lt;li&gt;Null: whether the column can store NULL values (YES or NO)&lt;/li&gt;
&lt;li&gt;Key: whether the column is indexed, e.g. PRI if it is part of the &lt;code&gt;Primary Key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Default: the default value assigned to the column when no value is specified&lt;/li&gt;
&lt;li&gt;Extra: any additional information available about the column, e.g. auto_increment&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  USE command
&lt;/h2&gt;

&lt;p&gt;It is used (no pun intended) to specify which database should be &lt;strong&gt;&lt;em&gt;used&lt;/em&gt;&lt;/strong&gt; by subsequent statements when the server manages several databases.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;USE demo;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--28VxPDD9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fj4zgg3pv319n14zi7mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--28VxPDD9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fj4zgg3pv319n14zi7mx.png" alt="useDatabase" width="387" height="291"&gt;&lt;/a&gt;&lt;br&gt;
There are six databases managed by my server. With the help of &lt;strong&gt;&lt;em&gt;USE&lt;/em&gt;&lt;/strong&gt; we specify that we want to use the demo database.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. SELECT Statement
&lt;/h2&gt;

&lt;p&gt;It is used to retrieve data from one or more tables in a database.&lt;br&gt;
The SELECT statement can be used to filter, sort and group data using different clauses and functions, which we'll cover as we progress.&lt;br&gt;
Here's the syntax of the SQL SELECT statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_list
FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;column_list: one or more columns from which data is retrieved&lt;/li&gt;
&lt;li&gt;table_name: the name of the table from which the data is retrieved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;&lt;em&gt;query&lt;/em&gt;&lt;/strong&gt; may retrieve information from selected columns or from all columns in the table.&lt;br&gt;
To create a simple &lt;strong&gt;&lt;em&gt;SELECT&lt;/em&gt;&lt;/strong&gt; statement, specify the name(s) of the column(s) you need from the table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT adm_no FROM table_names;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RXgqgDVB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sxjomstur5nhwafijo7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RXgqgDVB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sxjomstur5nhwafijo7m.png" alt="selectCol" width="422" height="177"&gt;&lt;/a&gt;&lt;br&gt;
The statement above SELECTs the values in the adm_no column of the table_names table. We specify which column we want values from; here we select from just one column.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT adm_no, Maths FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TVxFq6gp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fpcpohq0xwxukn7ye8c8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TVxFq6gp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fpcpohq0xwxukn7ye8c8.png" alt="selCols" width="422" height="177"&gt;&lt;/a&gt;&lt;br&gt;
We can specify more columns to be queried. In the statement above we have queried two columns (adm_no and Maths).&lt;br&gt;
To &lt;strong&gt;&lt;em&gt;SELECT&lt;/em&gt;&lt;/strong&gt; from several specific columns, separate the column names with a comma (,).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_pAVm1Rb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hqlgt2jjjm5z3fwaeaab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_pAVm1Rb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hqlgt2jjjm5z3fwaeaab.png" alt="selAll" width="422" height="177"&gt;&lt;/a&gt;&lt;br&gt;
We use an asterisk (*) if we want to query/fetch from all the columns in a table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Multiple Queries&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
SQL allows you to run multiple queries or commands in one go: each statement ends with a semicolon, and the statements are executed one after another.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM pets;
SELECT * FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The statements above retrieve all the columns and rows in the pets and table_name tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The DISTINCT Keyword&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
In situations where a table contains duplicate records, you may want to retrieve only unique records instead of fetching the duplicates.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT DISTINCT col_name1, col_name2
FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Example&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT DISTINCT * FROM pets;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vPJe5gMT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/igzqgjzu2jttqexoh21z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vPJe5gMT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/igzqgjzu2jttqexoh21z.png" alt="selectDistinct" width="422" height="186"&gt;&lt;/a&gt;&lt;br&gt;
This fetches all the columns from the pets table, returning each distinct row only once.&lt;/p&gt;
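
&lt;p&gt;To experiment without a database server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database. SQLite stands in for the database shown in the screenshots, and the pets table's columns and rows are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

# In-memory SQLite database; the pets columns and rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Rex", "dog"), ("Tom", "cat"), ("Rex", "dog")],  # ("Rex", "dog") appears twice
)

all_rows = conn.execute("SELECT * FROM pets").fetchall()
unique_rows = conn.execute("SELECT DISTINCT * FROM pets").fetchall()
species = conn.execute("SELECT DISTINCT species FROM pets").fetchall()

print(len(all_rows))     # 3
print(len(unique_rows))  # 2
print(sorted(species))   # [('cat',), ('dog',)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;DISTINCT applies to the whole selected row: two rows count as duplicates only if every selected column matches. Without ORDER BY the order of the result is not guaranteed, which is why the sketch sorts the species before printing them.&lt;/p&gt;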

&lt;h2&gt;
  
  
  3. CREATE DATABASE
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;em&gt;CREATE DATABASE&lt;/em&gt;&lt;/strong&gt; statement is used to create a new SQL database.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE DATABASE employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IAAn5laa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zwl3zyb3jvc9cemzrho4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IAAn5laa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zwl3zyb3jvc9cemzrho4.png" alt="create_db" width="416" height="301"&gt;&lt;/a&gt;&lt;br&gt;
The command above creates a database called employees.&lt;/p&gt;

&lt;p&gt;We can now &lt;strong&gt;&lt;em&gt;USE&lt;/em&gt;&lt;/strong&gt; the database (for example, &lt;code&gt;USE employees;&lt;/code&gt; in MySQL) and create its tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. CREATE TABLE
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;em&gt;CREATE TABLE&lt;/em&gt;&lt;/strong&gt; statement is used to create a new table in a database.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That is the syntax for creating a table.&lt;br&gt;
Let's create a new table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE employees (
emp_id VARCHAR(50),
firstName VARCHAR(50),
lastName VARCHAR(50),
department VARCHAR(50));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a5xRrtT_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/048q94cwkp5pimn8dre4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a5xRrtT_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/048q94cwkp5pimn8dre4.png" alt="createTable" width="416" height="281"&gt;&lt;/a&gt;&lt;br&gt;
We've created a table called &lt;strong&gt;&lt;em&gt;employees&lt;/em&gt;&lt;/strong&gt;.&lt;br&gt;
It has the following columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;emp_id: of type VARCHAR(50)&lt;/li&gt;
&lt;li&gt;firstName: of type VARCHAR(50)&lt;/li&gt;
&lt;li&gt;lastName: of type VARCHAR(50)&lt;/li&gt;
&lt;li&gt;department: of type VARCHAR(50)&lt;/li&gt;
&lt;/ul&gt;
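
&lt;p&gt;The same CREATE TABLE statement can be tried in SQLite through Python's sqlite3 module. This is a sketch that uses SQLite as a stand-in for the database in the screenshots. SQLite accepts VARCHAR(50) but does not enforce the 50-character limit. &lt;em&gt;PRAGMA table_info&lt;/em&gt; is SQLite's own way to list a table's columns; in MySQL you would run &lt;em&gt;DESCRIBE employees;&lt;/em&gt; instead.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")

# Same statement as above; SQLite accepts VARCHAR(50) but does not enforce the length
conn.execute("""
CREATE TABLE employees (
    emp_id VARCHAR(50),
    firstName VARCHAR(50),
    lastName VARCHAR(50),
    department VARCHAR(50))
""")

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employees)")]
for name, declared_type in columns:
    print(name, declared_type)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;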

&lt;p&gt;Now that we have created a new table we want to insert values to the table:&lt;/p&gt;

&lt;h2&gt;
  
  
  INSERT INTO
&lt;/h2&gt;

&lt;p&gt;The &lt;em&gt;INSERT INTO&lt;/em&gt; statement is used to insert new records into a table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO table_name
VALUES (value1, value2, ...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here's an example that inserts values into the &lt;strong&gt;&lt;em&gt;employees&lt;/em&gt;&lt;/strong&gt; table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO employees
VALUES ('IT_210', 'John', 'Doe', 'IT');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tv-CNIE---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v56qzf3l2bb23squsmyb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tv-CNIE---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v56qzf3l2bb23squsmyb.png" alt="insertInto" width="594" height="211"&gt;&lt;/a&gt;&lt;br&gt;
We have inserted a record into the employees table.&lt;br&gt;
Note that we supplied a value for every column in the employees table, in the same order as the columns were defined.&lt;/p&gt;

&lt;p&gt;What if we do not want to supply a value for every column/field of the table?&lt;br&gt;
Then, after the table name, we list the fields we want to supply values for:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO employees (emp_id, firstName, lastName)
VALUES ('mn_210', 'Lucky', 'Lard');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--msDv69il--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oscg0yttmvna34malats.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--msDv69il--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oscg0yttmvna34malats.png" alt="insertInto_list" width="612" height="74"&gt;&lt;/a&gt;&lt;br&gt;
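Here we've listed the fields we are supplying values for: &lt;strong&gt;&lt;em&gt;emp_id, firstName, lastName&lt;/em&gt;&lt;/strong&gt;. The department column is left out, so it gets its default value. That value is NULL here because the table defines no default.&lt;/p&gt;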
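
&lt;p&gt;Here is a minimal sqlite3 sketch of both forms of INSERT INTO, using the rows from the examples above. SQLite is an assumption here, standing in for the database in the screenshots. The second insert leaves out department, so reading that row back shows it as NULL, which Python's sqlite3 module returns as None.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

# In-memory SQLite database standing in for the employees database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id VARCHAR(50), firstName VARCHAR(50), "
    "lastName VARCHAR(50), department VARCHAR(50))"
)

# Form 1: a value for every column, in the order the columns were defined
conn.execute("INSERT INTO employees VALUES ('IT_210', 'John', 'Doe', 'IT')")

# Form 2: values only for the listed columns; department is left out
conn.execute(
    "INSERT INTO employees (emp_id, firstName, lastName) "
    "VALUES ('mn_210', 'Lucky', 'Lard')"
)

row = conn.execute("SELECT * FROM employees WHERE emp_id = 'mn_210'").fetchone()
print(row)  # ('mn_210', 'Lucky', 'Lard', None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;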

&lt;p&gt;What if we want to change values that are already stored in the database?&lt;br&gt;
Maybe we entered the wrong department or name for an employee. We'll use the UPDATE statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. UPDATE statement
&lt;/h2&gt;

&lt;p&gt;The &lt;em&gt;UPDATE&lt;/em&gt; statement is used to modify the existing records in a table.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's update the lastName of the employee with id mn_210:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE employees
SET lastName = 'Angie'
WHERE emp_id = 'mn_210';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L5yuN2KX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnvtwb4xq6yrzh1aeg7c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L5yuN2KX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnvtwb4xq6yrzh1aeg7c.png" alt="updateValues" width="587" height="279"&gt;&lt;/a&gt;&lt;br&gt;
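We've updated the last name from Lard to Angie.&lt;br&gt;
You can update several columns at once by separating the assignments with commas, as shown in the syntax definition. The WHERE clause decides which records are updated. If you leave it out, every record in the table is updated.&lt;/p&gt;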
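
&lt;p&gt;Below is a minimal sqlite3 sketch of an UPDATE that changes two columns at once. The department value 'Marketing' is hypothetical. Values are passed through &lt;em&gt;?&lt;/em&gt; placeholders, which sqlite3 supports, so the SQL text never has to be built by hand. &lt;em&gt;cursor.rowcount&lt;/em&gt; then reports how many records were updated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id VARCHAR(50), firstName VARCHAR(50), "
    "lastName VARCHAR(50), department VARCHAR(50))"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [("IT_210", "John", "Doe", "IT"), ("mn_210", "Lucky", "Lard", None)],
)

# Two columns updated in one statement; the WHERE clause limits it to one record
cur = conn.execute(
    "UPDATE employees SET lastName = ?, department = ? WHERE emp_id = ?",
    ("Angie", "Marketing", "mn_210"),  # 'Marketing' is a hypothetical department
)
print(cur.rowcount)  # 1

updated = conn.execute(
    "SELECT lastName, department FROM employees WHERE emp_id = 'mn_210'"
).fetchone()
print(updated)  # ('Angie', 'Marketing')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;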

&lt;h2&gt;
  
  
  6. WHERE clause
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;em&gt;WHERE&lt;/em&gt;&lt;/strong&gt; clause filters data based on a specified condition.&lt;br&gt;
Only the records that meet the condition are returned (or, in other statements, affected).&lt;br&gt;
The &lt;strong&gt;&lt;em&gt;WHERE&lt;/em&gt;&lt;/strong&gt; clause is used not only in &lt;em&gt;SELECT&lt;/em&gt; statements, but also in &lt;em&gt;UPDATE&lt;/em&gt;, &lt;em&gt;DELETE&lt;/em&gt; and other statements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM employees
WHERE emp_id = 'mk_23';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9YShSlhJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cc43jw7gvr6z0rrf0gzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9YShSlhJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cc43jw7gvr6z0rrf0gzg.png" alt="where1" width="497" height="163"&gt;&lt;/a&gt;&lt;br&gt;
Here we retrieve the record for the employee with id &lt;strong&gt;&lt;em&gt;mk_23&lt;/em&gt;&lt;/strong&gt;.&lt;br&gt;
In this case the query fetches one row, because each employee has a unique id. The table definition does not enforce this; declaring emp_id as a PRIMARY KEY would.&lt;br&gt;
&lt;em&gt;Example 2&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM employees
WHERE department = 'IT';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OEfuwibP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9tiut0hiq5rp2srdn5cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OEfuwibP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9tiut0hiq5rp2srdn5cb.png" alt="where2" width="428" height="183"&gt;&lt;/a&gt;&lt;br&gt;
Here we fetch all the records whose &lt;strong&gt;&lt;em&gt;department&lt;/em&gt;&lt;/strong&gt; is &lt;em&gt;IT&lt;/em&gt;.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--B0Rnn5lj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se8v11xnk8fvh9i6elxn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--B0Rnn5lj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/se8v11xnk8fvh9i6elxn.png" alt="Image description" width="432" height="286"&gt;&lt;/a&gt;&lt;br&gt;
In this case it returns two records.&lt;/p&gt;

&lt;p&gt;We will see more of &lt;em&gt;WHERE&lt;/em&gt; as we progress with other statements, commands etc.&lt;/p&gt;
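
&lt;p&gt;The sketch below runs the department filter in SQLite through Python's sqlite3 module. The three rows are hypothetical sample data, chosen so that two records belong to the IT department. The filter value is passed as a &lt;em&gt;?&lt;/em&gt; placeholder rather than pasted into the SQL text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id VARCHAR(50), firstName VARCHAR(50), "
    "lastName VARCHAR(50), department VARCHAR(50))"
)
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("IT_210", "John", "Doe", "IT"),
        ("mk_23", "Mary", "Kay", "Marketing"),
        ("IT_211", "Jane", "Roe", "IT"),
    ],
)

# Only rows whose department equals the parameter are returned
it_staff = conn.execute(
    "SELECT emp_id FROM employees WHERE department = ? ORDER BY emp_id", ("IT",)
).fetchall()
print(it_staff)  # [('IT_210',), ('IT_211',)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;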

&lt;h2&gt;
  
  
  7. DELETE
&lt;/h2&gt;

&lt;p&gt;The &lt;em&gt;DELETE&lt;/em&gt; statement is used to delete existing records in a table.&lt;br&gt;
&lt;em&gt;Syntax&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM table_name WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's delete the record for Lucky:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM employees
WHERE emp_id = 'mn_210';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--38cP1Dd5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l3fey2oioyo8g5juj357.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--38cP1Dd5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l3fey2oioyo8g5juj357.png" alt="del1" width="432" height="286"&gt;&lt;/a&gt;&lt;br&gt;
Here we have deleted the record whose &lt;strong&gt;&lt;em&gt;emp_id&lt;/em&gt;&lt;/strong&gt; is &lt;strong&gt;&lt;em&gt;mn_210&lt;/em&gt;&lt;/strong&gt;, which refers to Lucky Angie.&lt;br&gt;
When we retrieve all the records, we can see that the record has been deleted.&lt;/p&gt;

&lt;p&gt;We can also delete all records at once by leaving out the WHERE clause.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM table_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Example&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM pets;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WbSeUWPJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zkttlj3d6cdvrma81tpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WbSeUWPJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zkttlj3d6cdvrma81tpf.png" alt="delete all records" width="497" height="463"&gt;&lt;/a&gt;&lt;br&gt;
This form deletes every record in the table.&lt;br&gt;
However, it does not delete the table itself: the table and its columns remain. To remove the table entirely, use &lt;em&gt;DROP TABLE&lt;/em&gt;.&lt;/p&gt;
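
&lt;p&gt;Here is a minimal sqlite3 sketch that contrasts DELETE with and without a WHERE clause. The pets columns and rows are hypothetical. After the second DELETE the table is empty, but it is still listed in SQLite's sqlite_master catalog, so the table itself was not dropped.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")  # hypothetical columns
conn.executemany("INSERT INTO pets VALUES (?, ?)", [("Rex", "dog"), ("Tom", "cat")])

# With WHERE: only the matching record is removed
conn.execute("DELETE FROM pets WHERE name = 'Rex'")
remaining = conn.execute("SELECT COUNT(*) FROM pets").fetchone()[0]
print(remaining)  # 1

# Without WHERE: every record is removed
conn.execute("DELETE FROM pets")
after_delete_all = conn.execute("SELECT COUNT(*) FROM pets").fetchone()[0]
print(after_delete_all)  # 0

# The empty table still exists in SQLite's catalog
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # [('pets',)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;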

&lt;h2&gt;
  
  
  8. ORDER BY
&lt;/h2&gt;

&lt;p&gt;The &lt;em&gt;ORDER BY&lt;/em&gt; keyword is used to sort the result set in ascending or descending order.&lt;br&gt;
It sorts the records in ascending order by default. To sort them in descending order, use the &lt;strong&gt;&lt;em&gt;DESC&lt;/em&gt;&lt;/strong&gt; keyword.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT col_1, col_2, ...
FROM table_name
ORDER BY col_1, col_2, ... ASC|DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's explore the keyword using the employees database and employees table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT *
FROM employees
ORDER BY emp_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G9pTKk3n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2q2jveomcys4pxlypjrg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G9pTKk3n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2q2jveomcys4pxlypjrg.png" alt="orderBy1" width="473" height="240"&gt;&lt;/a&gt;&lt;br&gt;
Here we've ordered the records of the employees table by the &lt;em&gt;emp_id&lt;/em&gt; column in ascending order (the default).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM employees
ORDER BY firstName, lastName;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--swg1ycP0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qz6si95ansi3lbavt7g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--swg1ycP0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qz6si95ansi3lbavt7g2.png" alt="order 2" width="551" height="247"&gt;&lt;/a&gt;&lt;br&gt;
Here we've ordered the records by &lt;strong&gt;&lt;em&gt;firstName&lt;/em&gt;&lt;/strong&gt; and then by &lt;strong&gt;&lt;em&gt;lastName&lt;/em&gt;&lt;/strong&gt;. When two or more records have the same firstName, those records are ordered by their &lt;strong&gt;&lt;em&gt;lastName&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;
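
&lt;p&gt;To see the tie-breaking rule and DESC in action, here is a small sqlite3 sketch with hypothetical rows in which two employees share the first name John.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id TEXT, firstName TEXT, lastName TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("e1", "John", "Doe"), ("e2", "Anna", "Smith"), ("e3", "John", "Adams")],
)

# firstName ascending; ties (the two Johns) are broken by lastName ascending
rows = conn.execute(
    "SELECT firstName, lastName FROM employees ORDER BY firstName, lastName"
).fetchall()
print(rows)  # [('Anna', 'Smith'), ('John', 'Adams'), ('John', 'Doe')]

# DESC reverses the order for the column it follows
rows_desc = conn.execute("SELECT emp_id FROM employees ORDER BY emp_id DESC").fetchall()
print(rows_desc)  # [('e3',), ('e2',), ('e1',)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;ASC and DESC apply per column. For example, ORDER BY firstName ASC, lastName DESC sorts the first names in ascending order and breaks ties by last name in descending order.&lt;/p&gt;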

&lt;h2&gt;
  
  
  9. GROUP BY
&lt;/h2&gt;

&lt;p&gt;The &lt;em&gt;GROUP BY&lt;/em&gt; statement groups rows that have the same values into summary rows, such as "the number of employees in each department".&lt;br&gt;
It is often used with aggregate functions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COUNT()&lt;/li&gt;
&lt;li&gt;MAX()&lt;/li&gt;
&lt;li&gt;SUM()&lt;/li&gt;
&lt;li&gt;AVG()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Syntax&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT col_name(s)
FROM table_name
WHERE condition
GROUP BY col_name(s)
ORDER BY col_name(s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's look at an example of the &lt;strong&gt;&lt;em&gt;GROUP BY&lt;/em&gt;&lt;/strong&gt; statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(department), department FROM employees
GROUP BY department;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZGkcyb7J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e2uf8ept1vqrjnljkxz5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZGkcyb7J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e2uf8ept1vqrjnljkxz5.png" alt="Image description" width="416" height="235"&gt;&lt;/a&gt;&lt;br&gt;
Here we count the number of employees in each department.&lt;/p&gt;
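
&lt;p&gt;Here is a minimal sqlite3 sketch of the same kind of query, using hypothetical rows where one employee has no department. It shows a subtlety worth knowing. COUNT(department) counts only non-NULL values, while COUNT(*) counts every row in the group. As a result, the NULL group reports one row but a COUNT(department) of 0. SQLite sorts NULL first in ascending order; other databases may place it differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id TEXT, department TEXT)")
# Hypothetical rows; mn_210 has no department (NULL)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("IT_210", "IT"), ("IT_211", "IT"), ("mk_23", "Marketing"), ("mn_210", None)],
)

# One summary row per department: COUNT(*) counts rows, COUNT(department) skips NULLs
rows = conn.execute(
    "SELECT department, COUNT(*), COUNT(department) "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()
print(rows)  # [(None, 1, 0), ('IT', 2, 2), ('Marketing', 1, 1)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;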

</description>
      <category>database</category>
      <category>sql</category>
      <category>datascience</category>
      <category>python</category>
    </item>
    <item>
      <title>Exploratory Data Analysis Ultimate Guide</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Tue, 28 Feb 2023 20:15:28 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/exploratory-data-analysis-ultimate-guide-31fd</link>
      <guid>https://dev.to/mburu_elvis/exploratory-data-analysis-ultimate-guide-31fd</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data Science is an inter-disciplinary field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data.&lt;/p&gt;

&lt;p&gt;Data Science encompasses many steps and activities.&lt;br&gt;
The main data science steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business understanding&lt;/li&gt;
&lt;li&gt;Data collection&lt;/li&gt;
&lt;li&gt;Data Exploration&lt;/li&gt;
&lt;li&gt;Data Modelling&lt;/li&gt;
&lt;li&gt;Model Evaluation&lt;/li&gt;
&lt;li&gt;Model Deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will dive deep into exploratory data analysis, commonly referred to as EDA.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploratory Data Analysis (EDA)
&lt;/h2&gt;

&lt;p&gt;What exactly is EDA?&lt;/p&gt;

&lt;p&gt;EDA generally means the process of exploring the data to gain insights and to identify trends, patterns and relationships between the features in the data.&lt;/p&gt;

&lt;p&gt;To demonstrate the various activities in the EDA phase of Data Science, we'll use the Python programming language.&lt;/p&gt;

&lt;p&gt;If you are new to Python, here is a link to an earlier article about &lt;a href="https://dev.to/mburu_elvis/python-101-python-for-data-science-18k6"&gt;python for data science&lt;/a&gt;. It targets programming beginners and gradually introduces the concepts relevant to Data Science.&lt;/p&gt;

&lt;p&gt;Exploratory data analysis is often used to see what the data can reveal beyond formal modelling or hypothesis testing and provides a better understanding of data set variables and the relationships between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Aim&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
The main aim of EDA is to look at the data before making any assumptions about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploratory Data Analysis Tools
&lt;/h3&gt;

&lt;p&gt;Specific statistical functions and techniques you can perform with EDA tools include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clustering and dimension reduction techniques, which help create graphical displays of high-dimensional data containing many variables.&lt;/li&gt;
&lt;li&gt;Univariate visualization of each field in the data.&lt;/li&gt;
&lt;li&gt;Bivariate visualization and summary statistics that allow you to assess the relationship between each variable in the dataset and the target variable.&lt;/li&gt;
&lt;li&gt;K-means Clustering is a clustering method in &lt;a href="https://developer.ibm.com/articles/cc-unsupervised-learning-data-classification"&gt;unsupervised learning&lt;/a&gt; where the data points are assigned into K groups.&lt;/li&gt;
&lt;li&gt;Predictive models, such as linear models, use statistics and data to predict outcomes.&lt;/li&gt;
&lt;/ul&gt;
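
&lt;p&gt;The following is a minimal NumPy sketch of Lloyd's algorithm, the usual way K-means is computed. It picks K starting centroids, assigns each point to its nearest centroid, moves each centroid to the mean of its assigned points, and repeats until the centroids stop moving. The function name &lt;em&gt;kmeans&lt;/em&gt; and the sample points are hypothetical. For real work a library implementation such as scikit-learn's KMeans is the usual choice.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm (hypothetical helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k)
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)  # index of the nearest centroid
        # Move each centroid to the mean of its points (keep it if its cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two well-separated groups
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the starting centroids are random, different seeds can produce different (sometimes worse) clusterings. Library implementations usually run several initializations and keep the best result.&lt;/p&gt;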

&lt;h2&gt;
  
  
  Types of exploratory data analysis
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;Univariate non-graphical&lt;/em&gt;&lt;/strong&gt;: This is the simplest form of analysis, since it considers just one variable/feature. The primary goal is to understand the underlying sample distribution and make observations about the population. Outlier detection is also part of the analysis. Typical measures are:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Central tendency: The most commonly used measures of central tendency are the mean, the median and sometimes the mode.&lt;/li&gt;
&lt;li&gt;Spread: An indicator of how far from the center the data values tend to lie. Common measures are the standard deviation, the variance and the interquartile range.&lt;/li&gt;
&lt;li&gt;Skewness and kurtosis: Skewness is a measure of symmetry. A dataset is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.&lt;/li&gt;
&lt;/ol&gt;
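
&lt;p&gt;The univariate non-graphical measures above map directly onto pandas methods. The sketch below uses a hypothetical Series of prices, not taken from the housing dataset. It also applies the common 1.5 * IQR rule of thumb for flagging outliers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Hypothetical sample of house prices in millions (not taken from the housing dataset)
prices = pd.Series([2.5, 3.0, 3.2, 3.5, 3.5, 4.0, 9.0])

# Central tendency
print(prices.mean())           # arithmetic mean
print(prices.median())         # middle value: 3.5
print(prices.mode().tolist())  # most frequent value(s): [3.5]

# Spread
print(prices.std())            # sample standard deviation
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1                  # interquartile range
print(iqr)

# Outliers by the common 1.5 * IQR rule of thumb
outliers = prices[~prices.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(outliers.tolist())       # [9.0]

# Shape: skewness is positive for a long right tail;
# pandas reports excess kurtosis (0 for a normal distribution)
print(prices.skew())
print(prices.kurt())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;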

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Multivariate Non-graphical&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
These techniques show the relationship between two or more variables in the form of either cross-tabulation or statistics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Univariate Graphical&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Graphical methods involve a degree of subjective analysis. Common univariate graphics are:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Histogram: Describes a feature/variable in terms of its frequency distribution (central tendency, spread, outliers).&lt;/li&gt;
&lt;li&gt;Boxplot: Commonly used to show robust measures of location and spread (the median and quartiles), as well as symmetry and outliers.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Multivariate Graphical&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
These graphics display relationships between two or more features/variables (columns).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of multivariate graphics are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Scatterplot: It plots one variable against another, with each record shown as a point.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heatmap: It's a graphical representation where values are depicted by color.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
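
&lt;p&gt;Here is a minimal pandas sketch of the multivariate techniques above. It runs on a hypothetical five-row DataFrame that borrows a few column names from the housing dataset. &lt;em&gt;pd.crosstab&lt;/em&gt; produces a cross-tabulation of two categorical columns. &lt;em&gt;DataFrame.corr&lt;/em&gt; produces the correlation matrix whose values a heatmap typically depicts by color.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Hypothetical rows using a few of the housing column names (invented values)
df = pd.DataFrame({
    "area": [3000, 4000, 5000, 6000, 7000],
    "price": [3.0, 3.6, 4.1, 5.0, 5.5],
    "mainroad": ["no", "yes", "yes", "yes", "no"],
    "airconditioning": ["no", "no", "yes", "yes", "no"],
})

# Cross-tabulation: counts of each combination of two categorical columns
ct = pd.crosstab(df["mainroad"], df["airconditioning"])
print(ct)

# Correlation matrix of the numeric columns; a heatmap colors these values
corr = df[["area", "price"]].corr()
print(corr)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With matplotlib, one simple way to draw that matrix as a heatmap is plt.imshow(corr) followed by plt.colorbar().&lt;/p&gt;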

&lt;p&gt;Exploratory Data Analysis is a continuous loop: what you find often raises new questions that send you back to explore the data further.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Approach in EDA
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Libraries&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
We will use the following libraries in EDA&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;numpy: A Python library for working with arrays. It also has functions for working in the domain of linear algebra, Fourier transforms and matrices.&lt;br&gt;&lt;br&gt;
Installing numpy (the leading ! runs the command from a Jupyter notebook cell; drop it in a terminal):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;!pip install numpy&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Official documentation: &lt;a href="https://numpy.org/"&gt;numpy&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;pandas: A Python library for working with datasets. It has various functions for manipulating data.&lt;br&gt;&lt;br&gt;
Installing pandas:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;!pip install pandas&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Official documentation: &lt;a href="https://pandas.pydata.org/docs/"&gt;pandas&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;matplotlib: A comprehensive library for creating static, animated and interactive visualizations in Python.&lt;br&gt;&lt;br&gt;
Installing matplotlib:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;!pip install matplotlib&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Official documentation: &lt;a href="https://matplotlib.org/"&gt;matplotlib&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  EDA Example
&lt;/h3&gt;

&lt;p&gt;We are going to explore EDA using a housing dataset.&lt;br&gt;
Here's a &lt;a href="https://www.kaggle.com/datasets/yasserh/housing-prices-dataset"&gt;link&lt;/a&gt; to the dataset on Kaggle.&lt;/p&gt;

&lt;p&gt;We want to predict house prices based on factors such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;area - the area of the house in square feet&lt;/li&gt;
&lt;li&gt;bedrooms - the number of bedrooms in the house&lt;/li&gt;
&lt;li&gt;bathrooms - the number of bathrooms&lt;/li&gt;
&lt;li&gt;stories - the number of stories (floors) in the building&lt;/li&gt;
&lt;li&gt;mainroad - whether the house is near the main road; yes if near, no if not&lt;/li&gt;
&lt;li&gt;guestroom - yes if present, no if absent&lt;/li&gt;
&lt;li&gt;basement - yes if present, no if absent&lt;/li&gt;
&lt;li&gt;hotwaterheating - yes if present, no if absent&lt;/li&gt;
&lt;li&gt;airconditioning - yes if present, no if absent&lt;/li&gt;
&lt;li&gt;parking - the number of vehicles the parking can accommodate&lt;/li&gt;
&lt;li&gt;prefarea - yes if the house is in a preferred (sought-after) area, no if it isn't&lt;/li&gt;
&lt;li&gt;furnishingstatus - furnished, semi-furnished or unfurnished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1.1 &lt;strong&gt;&lt;em&gt;Importing packages&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Import the necessary packages and modules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;1.2 &lt;strong&gt;&lt;em&gt;Loading Dataset&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Load the dataset so we can conduct EDA. My dataset is a local file, but you can also pass a URL to &lt;em&gt;read_csv&lt;/em&gt; if the data is hosted online.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing_data = pd.read_csv("Housing.csv")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  2. Exploratory Data Analysis
&lt;/h2&gt;
&lt;h3&gt;
  
  
  2.1. Preprocessing
&lt;/h3&gt;

&lt;p&gt;View the first n rows of the dataset (five by default) to get a general idea of what the data looks like.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing_data.head()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OSRx3YE_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cqoy0j2gmmmn85gn9teg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OSRx3YE_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cqoy0j2gmmmn85gn9teg.png" alt="Image description" width="635" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2. Checking the shape of the dataset
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing_data.shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--81gyXdCR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l659vro2pf0bndy3o4z2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--81gyXdCR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l659vro2pf0bndy3o4z2.png" alt="Image description" width="616" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3. Checking the statistical metrics of the dataset
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing_data.describe()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YB6Fno3r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/grjpubty4651bvzty00t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YB6Fno3r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/grjpubty4651bvzty00t.png" alt="Image description" width="624" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.4. Checking the info about the data
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing_data.info()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EWYOsga3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5invfngd7fo44de5vyco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EWYOsga3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5invfngd7fo44de5vyco.png" alt="Image description" width="624" height="360"&gt;&lt;/a&gt;&lt;br&gt;
The dataset at hand is already clean, so there is no need for a cleaning phase.&lt;/p&gt;
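
&lt;p&gt;Before skipping the cleaning phase, it is worth confirming that claim in code. To keep the sketch self-contained, it reads a tiny hypothetical CSV from a string with &lt;em&gt;io.StringIO&lt;/em&gt; instead of Housing.csv. The column subset and values are invented. With the real file you would run the same checks on &lt;em&gt;housing_data&lt;/em&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io

import pandas as pd

# Tiny hypothetical stand-in for Housing.csv (a few of its columns, invented values)
csv_text = """price,area,bedrooms,furnishingstatus
5000000,6000,3,furnished
4200000,3000,3,semi-furnished
3500000,2500,2,unfurnished
"""
housing_data = pd.read_csv(io.StringIO(csv_text))

print(housing_data.shape)               # (rows, columns): (3, 4)
print(housing_data.isna().sum())        # missing values per column
print(housing_data.isna().sum().sum())  # total missing values: 0
print(housing_data.duplicated().sum())  # fully duplicated rows: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;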

&lt;h2&gt;
  
  
  Univariate Analysis
&lt;/h2&gt;

&lt;p&gt;As we said earlier, in univariate analysis you analyze the data of just one variable.&lt;br&gt;
A variable refers to a single feature/column.&lt;br&gt;
Some visual methods include:&lt;/p&gt;

&lt;ul&gt;
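&lt;li&gt;Histograms: Bar plots in which the frequency of values falling in each range (bin) is represented by rectangular bars&lt;/li&gt;
&lt;li&gt;Box-plots: The variable's distribution is summarized as a box spanning the middle 50% of the values (between the quartiles), with whiskers and outlier points&lt;/li&gt;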
&lt;/ul&gt;

&lt;p&gt;Let's make a histogram of the price column:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("House Prices")
plt.xlabel("Prices")
plt.ylabel("Frequency")
plt.hist(housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EPuTyf8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ec1vu6x4jwknnxtzgc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EPuTyf8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ec1vu6x4jwknnxtzgc7.png" alt="Image description" width="617" height="300"&gt;&lt;/a&gt;&lt;br&gt;
From the image we see that house prices are positively (right) skewed: most values lie on the left side of the distribution, with a long tail to the right.&lt;br&gt;
Most houses are priced between 3 million and 4 million.&lt;/p&gt;
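
&lt;p&gt;The visual impression of skew can be checked numerically. The sketch below uses hypothetical prices in millions as a stand-in for housing_data.price. &lt;em&gt;np.histogram&lt;/em&gt; computes the bar heights of a 10-bin histogram (10 bins is also plt.hist's default). A positive &lt;em&gt;skew()&lt;/em&gt;, together with a mean above the median, confirms the right tail.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import pandas as pd

# Hypothetical right-skewed prices (in millions), standing in for housing_data.price
prices = pd.Series([2.0, 2.5, 3.0, 3.2, 3.4, 3.5, 3.8, 4.0, 4.5, 6.0, 9.5, 12.0])

# Bar heights behind a histogram: 10 equal-width bins between the min and the max
counts, edges = np.histogram(prices, bins=10)
print(counts)  # [2 5 2 0 1 0 0 1 0 1]
print(edges)

# Positive skewness, and a mean pulled above the median by the long right tail
print(prices.skew())
print(prices.mean(), prices.median())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;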

&lt;h3&gt;
  
  
  Area
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("House Area")
plt.xlabel("area")
plt.ylabel("Frequency")
plt.hist(housing_data.area)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QuRGF8OJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zx5y5n82dk39ydabzz3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QuRGF8OJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zx5y5n82dk39ydabzz3v.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
House area is also positively skewed: the majority of houses have an area of 4,000 square feet or less.&lt;br&gt;
The most common area is around 3,000 square feet, with about 200 houses in that bin.&lt;/p&gt;
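&lt;p&gt;The height of each bar depends on where the bin edges fall. By default, &lt;code&gt;plt.hist&lt;/code&gt; splits the data into 10 equal-width bins, so the edges rarely land on round numbers such as 3,000. The sketch below uses a hypothetical helper, &lt;code&gt;modal_bin&lt;/code&gt;, to count values in fixed-width bins of your choosing and return the most populated one. The areas are made up for illustration. Passing a &lt;code&gt;bins&lt;/code&gt; argument to &lt;code&gt;plt.hist&lt;/code&gt; changes the plotted bins in a similar way.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter


def modal_bin(values, width):
    """Return (bin_start, count) for the most populated bin of the given width."""
    # Map every value to the lower edge of its bin, e.g. 3250 falls in the 3000 bin
    bins = Counter((v // width) * width for v in values)
    return bins.most_common(1)[0]


# Hypothetical areas in square feet, made up for illustration (not from the dataset)
areas = [2400, 3000, 3100, 3250, 3500, 3600, 3900, 4200, 5500, 7400]
print(modal_bin(areas, 1000))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;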

&lt;h3&gt;
  
  
  bedrooms
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Number of Bedrooms")
plt.xlabel("Bedrooms")
plt.ylabel("Frequency")
plt.hist(housing_data.bedrooms)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--w3ZtCpln--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vkdrs0hbxvd5taulbe84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--w3ZtCpln--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vkdrs0hbxvd5taulbe84.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
From the histogram we can infer that the number of bedrooms is roughly normally distributed.&lt;br&gt;
Most houses have 3 bedrooms.&lt;/p&gt;

&lt;h3&gt;
  
  
  bathrooms
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Number of Bathrooms")
plt.xlabel("Bathrooms")
plt.ylabel("Frequency")
plt.hist(housing_data.bathrooms)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--K1dQo4Zx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4doi43hgo0bsxzw4stc4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--K1dQo4Zx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4doi43hgo0bsxzw4stc4.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
The histogram shows that most houses have only one bathroom.&lt;/p&gt;

&lt;h3&gt;
  
  
  stories
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Number of Stories")
plt.xlabel("Stories")
plt.ylabel("Frequency")
plt.hist(housing_data.stories)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--e973TMyZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w772dz34l8nv8ad2yz9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--e973TMyZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w772dz34l8nv8ad2yz9q.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
The histogram shows that the majority of houses have one or two stories.&lt;/p&gt;

&lt;h3&gt;
  
  
  mainroad
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Nearness to Main Road")
plt.xlabel("Near main road")
plt.ylabel("Frequency")
plt.hist(housing_data.mainroad)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--trE565XD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/euqoakwjyftf77k3fj1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--trE565XD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/euqoakwjyftf77k3fj1h.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
The histogram shows that many of the houses are near the main road.&lt;/p&gt;

&lt;h3&gt;
  
  
  guestroom
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("guest room presence")
plt.xlabel("Guest room")
plt.ylabel("Frequency")
plt.hist(housing_data.guestroom)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y2m_mIm9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/88v0pumr73gwh21cf088.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y2m_mIm9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/88v0pumr73gwh21cf088.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
This histogram reveals that the majority of houses have no guest room.&lt;/p&gt;

&lt;h3&gt;
  
  
  basement
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Basement presence")
plt.xlabel("basement")
plt.ylabel("Frequency")
plt.hist(housing_data.basement)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cm0ZZ8zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wmjbce4bk3xgoe3l5rfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cm0ZZ8zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wmjbce4bk3xgoe3l5rfl.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
The histogram shows that more houses have no basement than have one.&lt;/p&gt;

&lt;h3&gt;
  
  
  hotwaterheating
&lt;/h3&gt;

&lt;p&gt;plt.title("Hotwater heating presence")&lt;br&gt;
    plt.xlabel("hotwater")&lt;br&gt;
    plt.ylabel("Frequency")&lt;br&gt;
    plt.hist(housing_data.hotwaterheating)&lt;br&gt;
    plt.show()&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mye2rV6y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z0sf2xcolm4wk228upie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mye2rV6y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z0sf2xcolm4wk228upie.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
This histogram reveals that more houses do not have hot water heating than have it.&lt;/p&gt;

&lt;h3&gt;
  
  
  airconditioning
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("airconditioning presence")
plt.xlabel("airconditioning")
plt.ylabel("Frequency")
plt.hist(housing_data.airconditioning)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mYAI7xKs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xv7fg23lfe23rmf7y1n3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mYAI7xKs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xv7fg23lfe23rmf7y1n3.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
The histogram shows that more houses do not have air conditioning than have it.&lt;/p&gt;

&lt;h3&gt;
  
  
  parking
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("parking size (no. cars)")
plt.xlabel("no. of cars")
plt.ylabel("Frequency")
plt.hist(housing_data.parking)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kt94m6-e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6j3p2i1kevybtce6m6ae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kt94m6-e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6j3p2i1kevybtce6m6ae.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
Many houses have no parking space (a parking value of 0).&lt;/p&gt;

&lt;h3&gt;
  
  
  prefarea
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("prefarea")
plt.xlabel("prefarea")
plt.ylabel("Frequency")
plt.hist(housing_data.prefarea)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3vwQmps2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t4sbtke0zmv6p1k8uwbx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3vwQmps2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t4sbtke0zmv6p1k8uwbx.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;br&gt;
This histogram reveals that the majority of houses are not in a preferred area.&lt;/p&gt;

&lt;h3&gt;
  
  
  furnishing status
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Furnishing Status")
plt.xlabel("furnishing")
plt.ylabel("Frequency")
plt.hist(housing_data.furnishingstatus)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wNSV2LSa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pq44vgtr9eba6stnn15m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wNSV2LSa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pq44vgtr9eba6stnn15m.png" alt="Image description" width="472" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three furnishing states (furnished, semi-furnished and unfurnished).&lt;br&gt;
Semi-furnished is the most common status.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bivariate Analysis
&lt;/h2&gt;




&lt;p&gt;As we discussed earlier, bivariate analysis is a kind of statistical analysis in which two variables are observed against each other. Typically, one variable is treated as dependent (here, the house price) and the other as independent (the feature being examined).&lt;/p&gt;

&lt;p&gt;Using bivariate analysis, we will see how the various features relate to house price.&lt;/p&gt;

&lt;p&gt;We'll use various visualizations to uncover the relationships, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scatter plots&lt;/li&gt;
&lt;li&gt;bar charts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1.1 &lt;strong&gt;&lt;em&gt;area vs price&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Let's plot a scatter plot of area and price.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Area vs price")
plt.scatter(housing_data.price, housing_data.area)  # DataFrames have no .scatter() method, so use plt.scatter
plt.xlabel("Price")
plt.ylabel("Area")
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xq_9C7h3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vf3u55ge64ca9dj91487.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xq_9C7h3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vf3u55ge64ca9dj91487.png" alt="Image description" width="454" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the plot we can infer that most of the houses priced at 6 million or below have an area of 8,000 square feet or less, which suggests that cheaper houses tend to be smaller.&lt;br&gt;
Some houses are expensive even though their area is small, and vice versa. These houses could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;outliers&lt;/li&gt;
&lt;li&gt;affected by other features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will investigate this later when we look at the correlation between the variables.&lt;/p&gt;
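&lt;p&gt;A common way to measure how strongly two numeric variables move together is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand with the standard library. The (area, price) pairs are made up for illustration, and &lt;code&gt;pearson&lt;/code&gt; is a hypothetical helper. On the real data, pandas offers &lt;code&gt;housing_data.area.corr(housing_data.price)&lt;/code&gt;, which uses the Pearson method by default.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical area (sq ft) and price (millions) pairs, made up for illustration
area = [2400, 3000, 3600, 4200, 5500, 7400]
price = [3.0, 3.5, 3.4, 4.6, 5.2, 7.1]
print(round(pearson(area, price), 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A value close to 1 indicates a strong positive linear relationship. Correlation alone does not explain the expensive small houses noted above, since other features may be involved.&lt;/p&gt;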

&lt;p&gt;1.2 &lt;strong&gt;&lt;em&gt;bedrooms vs price&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Let's try to understand how the number of bedrooms affects the price of a house.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("bedrooms vs price")
plt.xlabel("Bedrooms")
plt.ylabel("Price")
plt.bar(housing_data.bedrooms, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tM0if8YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dp95tizqigxscvdi2jwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tM0if8YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dp95tizqigxscvdi2jwi.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
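From the graph we can observe that houses with fewer bedrooms tend to reach lower prices. However, the relationship is not linear as in regression: the most expensive houses have 4 bedrooms, whereas we would expect houses with 5 and 6 bedrooms to be more expensive. Other factors could be affecting this. Note that &lt;code&gt;plt.bar&lt;/code&gt; draws one bar per house, so bars for the same bedroom count overlap and each visible bar shows the highest price in that group.&lt;/p&gt;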
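&lt;p&gt;Overlapping bars only show each group's maximum, so a table of summary statistics per bedroom count can be easier to read. Below is a minimal sketch that assumes the bedroom counts and prices are available as two parallel lists. The values and the helper &lt;code&gt;summarize_by&lt;/code&gt; are hypothetical. In pandas, the equivalent would be &lt;code&gt;housing_data.groupby("bedrooms").price.agg(["median", "max"])&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import defaultdict
from statistics import median


def summarize_by(keys, values):
    """Group values by key and return {key: (median, max)}, ordered by key."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: (median(vals), max(vals)) for key, vals in sorted(groups.items())}


# Hypothetical bedroom counts and prices in millions, made up for illustration
bedrooms = [2, 2, 3, 3, 3, 4, 4, 5]
prices = [2.8, 3.2, 3.5, 4.1, 4.4, 5.0, 9.1, 6.3]
print(summarize_by(bedrooms, prices))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing the median with the maximum shows whether a group's high bar comes from one unusually expensive house or from the group as a whole.&lt;/p&gt;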

&lt;p&gt;Let's see the same graph using a scatter plot:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("bedrooms vs price")
plt.xlabel("Bedrooms")
plt.ylabel("price")
plt.scatter(housing_data.bedrooms, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8yXPuyjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p978y7xhh2z627ermec1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8yXPuyjp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p978y7xhh2z627ermec1.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
We can now see how the prices are distributed for each number of bedrooms.&lt;br&gt;
If we look closely, we can see that there are more 3-bedroom houses than 4-bedroom houses.&lt;/p&gt;

&lt;p&gt;1.3 &lt;strong&gt;&lt;em&gt;bathrooms vs price&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Let's see how house prices compare with the number of bathrooms.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("bedrooms vs price")
plt.xlabel("Bathrooms")
plt.ylabel("price")
plt.scatter(housing_data.bathrooms, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uDgfoEwJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bijhe1muobwivial2kei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uDgfoEwJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bijhe1muobwivial2kei.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
From the graph above we can infer the following observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most houses have 1 or 2 bathrooms (534 houses in total), and the majority have 1 bathroom (401 houses).&lt;/li&gt;
&lt;/ul&gt;
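&lt;p&gt;A scatter plot draws houses with the same number of bathrooms on top of each other, so exact counts like the ones above have to come from a frequency table rather than from the plot itself. Below is a small sketch using &lt;code&gt;collections.Counter&lt;/code&gt; with made-up bathroom counts. On the real data, &lt;code&gt;housing_data.bathrooms.value_counts()&lt;/code&gt; produces the same kind of table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from collections import Counter

# Hypothetical bathroom counts per house, made up for illustration
bathrooms = [1, 1, 2, 1, 3, 1, 2, 1, 1, 2]

counts = Counter(bathrooms)          # maps each bathroom count to how many houses have it
one_or_two = counts[1] + counts[2]   # houses with either 1 or 2 bathrooms

print(sorted(counts.items()))
print(one_or_two, "of", len(bathrooms), "houses have 1 or 2 bathrooms")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;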

&lt;p&gt;1.4 &lt;strong&gt;&lt;em&gt;stories&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Let's explore the relationship between the number of stories and the price of the house.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Stories vs Price")
plt.xlabel("Stories")
plt.ylabel("Price")
plt.scatter(housing_data.stories, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jnb0DYhA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uu5285v2j0qywfud6q2e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jnb0DYhA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uu5285v2j0qywfud6q2e.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
We can observe that most houses have 1 or 2 stories.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some outliers can be seen, for example among houses with 3 stories.&lt;/li&gt;
&lt;li&gt;The number of stories appears to affect the price of the house.&lt;/li&gt;
&lt;li&gt;Something seems to drive buyers towards 1- or 2-story houses. If we view a bar graph of the same relationship, we see that the fewer the stories, the wider the range of prices. This indicates that other factors, such as furnishing and perhaps nearness to the main road, still affect the price of the house.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1.5. &lt;strong&gt;&lt;em&gt;main road&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Nearness to the main road is likely to affect the price of a house, as well as how many houses are available.&lt;br&gt;
Let's visualize the relationship between house prices and nearness to the main road.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Nearness to mainroad vs price")
plt.xlabel("Near main road")
plt.ylabel("Price")
plt.bar(housing_data.mainroad, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RhygDlIa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4mjigb4tc6rfsw803a79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RhygDlIa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4mjigb4tc6rfsw803a79.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
We can see that houses near the main road reach higher prices than those that are not. Because the bars overlap, each bar shows the highest price in its group.&lt;br&gt;
Such insights could lead to many conclusions, but we won't dive into them here since the project scope is still minimal.&lt;/p&gt;

&lt;p&gt;1.6. &lt;strong&gt;&lt;em&gt;guestroom&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This feature shows whether a house has a guest room.&lt;br&gt;
We want to see how having a guest room or not relates to the price.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Guest room present vs Price")
plt.xlabel("Guest room")
plt.ylabel("Price")
plt.scatter(housing_data.guestroom, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S8Sd0giX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1mjre8qr4m7kdigp943e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S8Sd0giX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1mjre8qr4m7kdigp943e.png" alt="Image description" width="458" height="299"&gt;&lt;/a&gt;&lt;br&gt;
From the graph we can infer that most of the houses have no guest room.&lt;br&gt;
This does not mean that they are cheap, because other features could be making these houses expensive.&lt;br&gt;
We will check this in the correlation graph.&lt;br&gt;
Houses without a guest room have a wider range of prices than those with a guest room.&lt;/p&gt;
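&lt;p&gt;Yes/no features such as &lt;code&gt;guestroom&lt;/code&gt; need a numeric form before they can be used in a correlation analysis. A common choice is to map "yes" to 1 and "no" to 0. The Pearson correlation between such a 0/1 variable and price is known as the point-biserial correlation. The sketch below assumes the column holds the lowercase strings "yes" and "no", and the data and helper names are hypothetical. With pandas, &lt;code&gt;housing_data.guestroom.map({"yes": 1, "no": 0})&lt;/code&gt; does the encoding step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import sqrt
from statistics import fmean


def encode_yes_no(column):
    """Map "yes"/"no" strings to 1/0; any other value raises KeyError."""
    mapping = {"yes": 1, "no": 0}
    return [mapping[value] for value in column]


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical guest-room flags and prices in millions, made up for illustration
guestroom = ["no", "no", "yes", "no", "yes", "no"]
price = [3.2, 4.0, 5.1, 2.9, 6.0, 3.6]

flags = encode_yes_no(guestroom)
print(flags)
print(round(pearson(flags, price), 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;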

&lt;p&gt;1.7. &lt;strong&gt;&lt;em&gt;basement&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Having a basement may push the house price up. We'll explore how having a basement affects the price of a house.&lt;br&gt;
Let's have a visual look at the relationship:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Basement present vs Price")
plt.xlabel("Basement Present")
plt.ylabel("Price")
plt.scatter(housing_data.basement, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7wfp9q95--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/blq0p2h6liethls7llpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7wfp9q95--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/blq0p2h6liethls7llpy.png" alt="Image description" width="510" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graph shows the distribution of house prices according to whether or not a house has a basement.&lt;br&gt;
We also see that more houses do not have a basement than do.&lt;br&gt;
The price range is wider for houses without a basement than for those with one.&lt;/p&gt;
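&lt;p&gt;"Range" can be measured in more than one way. The gap between the cheapest and the most expensive house is sensitive to a single extreme value. The interquartile range (IQR), which is the spread of the middle 50% of prices, is more robust to outliers. The sketch below compares both measures for two groups using &lt;code&gt;statistics.quantiles&lt;/code&gt; (Python 3.8+) and made-up prices. The helper &lt;code&gt;spread&lt;/code&gt; is hypothetical. On the real data, &lt;code&gt;housing_data.groupby("basement").price.describe()&lt;/code&gt; reports the quartiles for each group.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from statistics import quantiles


def spread(values):
    """Return (min, max, interquartile range) for a list of prices."""
    q1, _, q3 = quantiles(values, n=4)  # the three cut points that form quartiles
    return min(values), max(values), q3 - q1


# Hypothetical prices in millions split by basement, made up for illustration
no_basement = [2.5, 3.0, 3.3, 3.8, 4.2, 4.9, 6.5, 9.0]
with_basement = [3.6, 4.0, 4.3, 4.7, 5.1]

for label, group in [("no", no_basement), ("yes", with_basement)]:
    low, high, iqr = spread(group)
    print(label, "range:", round(high - low, 2), "IQR:", round(iqr, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;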

&lt;p&gt;1.8. &lt;strong&gt;&lt;em&gt;hot water heating&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Hot water heating may influence the price of a house. Let's check how the hot water heating feature relates to price.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Hot Waterheating present vs Price")
plt.xlabel("hotWaterheating Present")
plt.ylabel("Price")
plt.scatter(housing_data.hotwaterheating, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tH8uDf1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9ufcotr7ha2od2zj6m1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tH8uDf1I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9ufcotr7ha2od2zj6m1r.png" alt="Image description" width="428" height="290"&gt;&lt;/a&gt;&lt;br&gt;
From the graph above we can see that most houses do not have hot water heating.&lt;br&gt;
The price range for houses without hot water heating is much wider than for houses that have it.&lt;br&gt;
This wide range is probably driven by other features, since price does not appear to depend strongly on hot water heating alone.&lt;/p&gt;

&lt;p&gt;1.9. &lt;strong&gt;&lt;em&gt;airconditioning&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Air conditioning is either present in a house or absent, and this can affect the price.&lt;br&gt;
Let's explore the relationship between air conditioning and house prices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Airconditioning present vs Price")
plt.xlabel("Airconditioning present")
plt.ylabel("Price")
plt.scatter(housing_data.airconditioning, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pOn6mkMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yqsnej85r6hylhlzqpdu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pOn6mkMF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yqsnej85r6hylhlzqpdu.png" alt="Image description" width="454" height="308"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By looking at the graph, we can see that houses with and without air conditioning overlap across much of the price range.&lt;br&gt;
However, houses with air conditioning have a wider range of prices.&lt;/p&gt;

&lt;p&gt;2.0. &lt;strong&gt;&lt;em&gt;Parking&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Each house can accommodate a whole number of vehicles, and a parking value of 0 means the house has no parking space.&lt;br&gt;
Let's explore the relationship between parking and the price of the houses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Parking vs Price")
plt.xlabel("Parking")
plt.ylabel("Price")
plt.scatter(housing_data.parking, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1HNvOw6x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wt2tw6gp84hnfreo89t9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1HNvOw6x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wt2tw6gp84hnfreo89t9.png" alt="Image description" width="454" height="308"&gt;&lt;/a&gt;&lt;br&gt;
We can infer from the graph that there are 4 categories of parking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0 parking - no parking space&lt;/li&gt;
&lt;li&gt;1 vehicle parking space&lt;/li&gt;
&lt;li&gt;2 vehicle parking space&lt;/li&gt;
&lt;li&gt;3 vehicle parking space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The price ranges are wide in all 4 categories.&lt;br&gt;
Some other factor may be driving this, and we will look into it when checking for correlation.&lt;br&gt;
The houses with 1 parking space appear closely clustered.&lt;/p&gt;

&lt;p&gt;2.1. &lt;strong&gt;&lt;em&gt;prefarea&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Houses can either be in a preferred area or not. We want to uncover the relationship between being in a preferred area and the price of the house.&lt;br&gt;
Let's compare prefarea with the price of the house.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Prefarea vs Price")
plt.xlabel("prefarea")
plt.ylabel("Price")
plt.scatter(housing_data.prefarea, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3N_YbHaJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/weviol83l5ln26zk7lzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3N_YbHaJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/weviol83l5ln26zk7lzj.png" alt="Image description" width="454" height="308"&gt;&lt;/a&gt;&lt;br&gt;
From the graph we can infer that the majority of houses are not in a preferred area. These houses have a wider price range than those in a preferred area.&lt;br&gt;
Houses in a preferred area seem to be priced much higher than those that are not.&lt;/p&gt;

&lt;p&gt;2.2. &lt;strong&gt;&lt;em&gt;furnishing status&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
A house can be furnished, semi-furnished or unfurnished.&lt;br&gt;
This may affect the price of the house.&lt;br&gt;
We will use a scatter plot to uncover insights about the relationship between furnishing status and price.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.title("Furnishing Status vs Price")
plt.xlabel("Furnishing status")
plt.ylabel("Price")
plt.scatter(housing_data.furnishingstatus, housing_data.price)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kojI_xB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n7j304xitl3anrcvcq41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kojI_xB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n7j304xitl3anrcvcq41.png" alt="Image description" width="454" height="308"&gt;&lt;/a&gt;&lt;br&gt;
From the graph we can see that the houses are almost evenly distributed across the furnishing categories.&lt;br&gt;
We can also see that furnished houses have a wide range of prices and reach the highest prices.&lt;/p&gt;

&lt;p&gt;I will add an update to cater for correlation between every feature.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Python 101 - Python for Data Science</title>
      <dc:creator>Elvis Mburu</dc:creator>
      <pubDate>Sun, 19 Feb 2023 14:42:53 +0000</pubDate>
      <link>https://dev.to/mburu_elvis/python-101-python-for-data-science-18k6</link>
      <guid>https://dev.to/mburu_elvis/python-101-python-for-data-science-18k6</guid>
      <description>&lt;p&gt;Python is a high-level, general-purpose programming language.&lt;br&gt;
Python is dynamically typed.&lt;br&gt;
The language is object-oriented and supports functional programming too.&lt;br&gt;
Python was created by Guido van Rossum in the late 1980s.&lt;/p&gt;

&lt;p&gt;Since there are many programming languages, let's look at why Python may be the best fit for you:&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Advantages of Python&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simplicity: Python is simple to use and easy to understand.&lt;/li&gt;
&lt;li&gt;Free and open source: it is developed and improved by a wide, diverse and vibrant community.&lt;/li&gt;
&lt;li&gt;Interpreted language: Python code is run by an interpreter rather than compiled ahead of time into a native executable. If an error occurs while the code is running, execution stops and the interpreter reports the error.&lt;/li&gt;
&lt;li&gt;Extensive library: Python has an extensive standard library and many third-party packages, so you rarely have to write common functionality from scratch.&lt;/li&gt;
&lt;li&gt;Dynamically typed: types belong to values and are checked at runtime, so you don't declare variable types in advance.&lt;/li&gt;
&lt;li&gt;Portability: code developed on one machine generally runs on another, including machines with different architectures.&lt;/li&gt;
&lt;li&gt;Supportive and vibrant large community.&lt;/li&gt;
&lt;/ul&gt;
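&lt;p&gt;To make dynamic typing concrete, here is a small example. The same name is bound to values of different types, and a type mismatch is only detected when the offending line actually runs. The variable names are purely illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# In Python the type belongs to the value, not to the variable name,
# so one name can refer to values of different types over time.
x = 42
print(type(x).__name__)  # int

x = "forty-two"
print(type(x).__name__)  # str

x = [4, 2]
print(type(x).__name__)  # list

# Type errors are detected at runtime, when the line actually executes.
try:
    result = "1" + 1
except TypeError as err:
    print("TypeError:", err)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;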

&lt;p&gt;&lt;strong&gt;Applications of Python&lt;/strong&gt;&lt;br&gt;
Python has spread across many use cases and is now used in many fields and domains.&lt;/p&gt;

&lt;p&gt;Here are a few of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web applications&lt;/li&gt;
&lt;li&gt;Automation&lt;/li&gt;
&lt;li&gt;Artificial Intelligence&lt;/li&gt;
&lt;li&gt;Statistics&lt;/li&gt;
&lt;li&gt;Data Analysis&lt;/li&gt;
&lt;li&gt;Machine Learning&lt;/li&gt;
&lt;li&gt;Desktop Applications&lt;/li&gt;
&lt;li&gt;Back-end Development&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deep Dive Into Python&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We'll focus on Python for Data Science.&lt;br&gt;
But first, we'll build our Python muscles by understanding the basics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Outline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduction to variables&lt;/li&gt;
&lt;li&gt;Data types in Python&lt;/li&gt;
&lt;li&gt;Operators in Python&lt;/li&gt;
&lt;li&gt;Data Structures &lt;/li&gt;
&lt;li&gt;Control Flows&lt;/li&gt;
&lt;li&gt;Functions&lt;/li&gt;
&lt;li&gt;Packages &lt;/li&gt;
&lt;li&gt;Data Science&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setting up Coding Environment&lt;/strong&gt;&lt;br&gt;
You can use the following tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Jupyter notebooks&lt;/strong&gt;, which come bundled with Anaconda:
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Windows:&lt;/strong&gt; install Python (&lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;Python downloads&lt;/a&gt;), then download and install &lt;strong&gt;Anaconda&lt;/strong&gt; (&lt;a href="https://docs.anaconda.com/anaconda/install/windows/" rel="noopener noreferrer"&gt;Anaconda for Windows&lt;/a&gt;)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mac OS:&lt;/strong&gt; &lt;a href="https://docs.anaconda.com/anaconda/install/mac-os/" rel="noopener noreferrer"&gt;Anaconda for Mac OS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux OS:&lt;/strong&gt; &lt;a href="https://docs.anaconda.com/anaconda/install/linux/" rel="noopener noreferrer"&gt;Anaconda for Linux&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Colab:&lt;/strong&gt; an online environment for running Python code in the browser; you can access it &lt;a href="https://colab.research.google.com" rel="noopener noreferrer"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
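&lt;p&gt;Whichever tool you pick, a quick way to confirm that your environment works is to ask the interpreter for its version. This sketch uses only the standard library's &lt;code&gt;sys&lt;/code&gt; and &lt;code&gt;platform&lt;/code&gt; modules, so it should run unchanged in a Jupyter notebook, in Colab or in a terminal.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys
import platform

print(sys.version)                 # full version string of the running interpreter
print(platform.python_version())   # just the version number, as major.minor.patch
print(sys.executable)              # path of the interpreter, handy for spotting which install is in use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;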

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Introduction to variables&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;What is a variable, you might ask?&lt;br&gt;
A variable is a name that refers to a value. The value it refers to can change while the program runs.&lt;/p&gt;

&lt;p&gt;Remember these from your O and A levels:&lt;br&gt;
&lt;code&gt;let x be 12&lt;/code&gt; or even &lt;code&gt;y=mx+c&lt;/code&gt; &lt;br&gt;
In these expressions, x and y are variables that represent other values.&lt;/p&gt;

&lt;p&gt;What makes variables useful is that the same variable can be used many times and refer to a different value each time.&lt;br&gt;
Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 2 
x = 4
x = 8
or in the case of `y=mx+c` where c is a constant
y = 43 + 3
y = 45 + 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

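&lt;p&gt;In Python, variables work much like the concept used in mathematics: a variable is a name that refers to a value.&lt;br&gt;
Note that &lt;code&gt;=&lt;/code&gt; in Python means assignment (make this name refer to this value), not mathematical equality.&lt;/p&gt;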

&lt;p&gt;Example&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x=56
y=45.34
hello= "Hello, world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Several rules govern variable names.&lt;br&gt;
Here are a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variable names cannot be the same as Python keywords (such as &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;for&lt;/code&gt; or &lt;code&gt;True&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Variable names can only contain letters, digits and underscores&lt;/li&gt;
&lt;li&gt;Variable names must start with a letter or an underscore, never a digit&lt;/li&gt;
&lt;li&gt;Variable names cannot contain spaces&lt;/li&gt;
&lt;li&gt;Variable names are case-sensitive, so &lt;code&gt;myName&lt;/code&gt; and &lt;code&gt;MyName&lt;/code&gt; are different variable names&lt;/li&gt;
&lt;/ul&gt;
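&lt;p&gt;You can check these rules from Python itself. The sketch below combines the built-in string method &lt;code&gt;str.isidentifier()&lt;/code&gt; with &lt;code&gt;keyword.iskeyword()&lt;/code&gt; from the standard library. The helper name &lt;code&gt;is_valid_name&lt;/code&gt; is our own illustration, not part of Python.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import keyword


def is_valid_name(name):
    """Return True if name can be used as a variable name (hypothetical helper)."""
    # isidentifier() covers the character rules (letters, digits, underscores,
    # no leading digit, no spaces); keywords like 'for' must be ruled out separately
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["myName", "MyName", "_total", "2nd_place", "my name", "for"]
for candidate in candidates:
    print(repr(candidate), is_valid_name(candidate))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;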

&lt;p&gt;For naming conventions (style recommendations rather than hard rules), see the official style guide, &lt;a href="https://peps.python.org/pep-0008/"&gt;PEP 8&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Data Types&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A data type is a classification that specifies what kind of value a variable holds.&lt;br&gt;
Python supports many data types.&lt;br&gt;
Here are a few of the most common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;String: a sequence of characters (letters, digits or symbols) that is always treated as text&lt;/li&gt;
&lt;li&gt;Boolean: the values True or False&lt;/li&gt;
&lt;li&gt;Integer: whole numbers, without a fractional/decimal part&lt;/li&gt;
&lt;li&gt;Float: numbers with a fractional/decimal part&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example in code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;num1 = 1  # Integer
num2 = 2.0  # Float
bool1 = True  # Boolean True
bool2 = False  # Boolean False
myStr = "Hello, world"  # String
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the code example above you may have noticed something we have not talked about yet: the &lt;code&gt;#&lt;/code&gt; character.&lt;br&gt;
This character starts a comment.&lt;br&gt;
What is a comment? &lt;br&gt;
A comment is an explanation or annotation in the source code of a program.&lt;br&gt;
Comments make code easier to understand and are ignored by the interpreter, so they are never executed.&lt;br&gt;
In Python, a comment runs from the &lt;code&gt;#&lt;/code&gt; to the end of that line; a longer comment is written as several lines, each starting with its own &lt;code&gt;#&lt;/code&gt;.&lt;/p&gt;
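&lt;p&gt;Values often need to move between these data types, for example when a number is read in as text. Python's built-in &lt;code&gt;int()&lt;/code&gt;, &lt;code&gt;float()&lt;/code&gt;, &lt;code&gt;str()&lt;/code&gt; and &lt;code&gt;bool()&lt;/code&gt; perform the conversion. Here is a short sketch with made-up values, annotated with comments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;age_text = "42"             # a string that happens to contain digits
age = int(age_text)         # string converted to integer: 42
price = float(age)          # integer converted to float: 42.0
label = str(price)          # float converted back to text: "42.0"
has_value = bool(age)       # any non-zero number converts to True
is_empty = bool("")         # an empty string converts to False
print(age, price, label, has_value, is_empty)  # prints: 42 42.0 42.0 True False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;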

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Operators in Python&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Python has several kinds of operators. Here we'll cover two groups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arithmetic Operators&lt;/li&gt;
&lt;li&gt;Conditional Operators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;a. Arithmetic Operators&lt;/strong&gt;&lt;br&gt;
They perform basic mathematical operations.&lt;br&gt;
Here's a simple list:&lt;/p&gt;

&lt;ul&gt;
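&lt;li&gt;
&lt;code&gt;+&lt;/code&gt; Addition &lt;code&gt;x + y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-&lt;/code&gt; Subtraction &lt;code&gt;x - y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;*&lt;/code&gt; Multiplication &lt;code&gt;x * y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/&lt;/code&gt; Division, which always returns a float &lt;code&gt;x / y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;%&lt;/code&gt; Modulus, the remainder after dividing x by y &lt;code&gt;x % y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;**&lt;/code&gt; Exponentiation, x raised to the power y &lt;code&gt;x ** y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;//&lt;/code&gt; Floor division, division rounded down to a whole number &lt;code&gt;x // y&lt;/code&gt;
&lt;/li&gt;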
&lt;/ul&gt;
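&lt;p&gt;Here is a quick sketch of each arithmetic operator on two sample integers. Note that &lt;code&gt;/&lt;/code&gt; always returns a float, while &lt;code&gt;//&lt;/code&gt; rounds the result down (towards negative infinity, which matters for negative numbers).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 17
y = 5
print(x + y)    # 22
print(x - y)    # 12
print(x * y)    # 85
print(x / y)    # 3.4, true division always gives a float
print(x % y)    # 2, the remainder of 17 divided by 5
print(x ** y)   # 1419857, 17 to the power of 5
print(x // y)   # 3, floor division rounds 3.4 down
print(-x // y)  # -4, rounding down goes from -3.4 to -4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;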

&lt;p&gt;&lt;strong&gt;b. Conditional Operators&lt;/strong&gt;&lt;br&gt;
These are the logical and comparison operators used in conditions. Each expression built with them evaluates to True or False.&lt;br&gt;
Examples:&lt;/p&gt;

&lt;ul&gt;
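&lt;li&gt;
&lt;code&gt;and&lt;/code&gt; Logical AND: True if both operands are true &lt;code&gt;x and y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;or&lt;/code&gt; Logical OR: True if at least one of the operands is true &lt;code&gt;x or y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;not&lt;/code&gt; Logical NOT: True if the operand is false, and vice versa &lt;code&gt;not x&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;==&lt;/code&gt; Equal to: True if both operands are equal &lt;code&gt;x == y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;!=&lt;/code&gt; Not equal to: True if the operands differ &lt;code&gt;x != y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt;&lt;/code&gt; Greater than: True if the left operand is greater than the right one &lt;code&gt;x &amp;gt; y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;&lt;/code&gt; Less than: True if the left operand is less than the right one &lt;code&gt;x &amp;lt; y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt;=&lt;/code&gt; Greater than or equal to &lt;code&gt;x &amp;gt;= y&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;=&lt;/code&gt; Less than or equal to &lt;code&gt;x &amp;lt;= y&lt;/code&gt;
&lt;/li&gt;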
&lt;/ul&gt;
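&lt;p&gt;Comparison and logical operators are usually combined. This sketch sticks to &lt;code&gt;==&lt;/code&gt;, &lt;code&gt;!=&lt;/code&gt; and the logical operators; ordering comparisons such as &lt;code&gt;x &amp;gt; y&lt;/code&gt; behave the same way, each producing True or False. The values are arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = 10
y = 3
is_even = x % 2 == 0            # True: 10 divided by 2 leaves no remainder
same = x == y                   # False
different = x != y              # True
both = is_even and different    # True, because both operands are True
either = same or different      # True, because at least one operand is True
flipped = not same              # True, because not turns False into True
print(is_even, same, different, both, either, flipped)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;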

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Data Structures&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A data structure is a way of organizing data so that it can be accessed and modified efficiently, depending on the situation.&lt;/p&gt;

&lt;p&gt;Here are the main built-in data structures in Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists&lt;/li&gt;
&lt;li&gt;Dictionaries&lt;/li&gt;
&lt;li&gt;Sets&lt;/li&gt;
&lt;li&gt;Tuples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;a. Lists&lt;/strong&gt;&lt;br&gt;
A list is a data structure that holds multiple items in a single variable. Lists are created with square brackets, &lt;code&gt;[]&lt;/code&gt;.&lt;br&gt;
Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = []  # Here we create an empty list
names = ['John', 'Doe']  # Here we create a list containing two items
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lists are ordered, and their items can be accessed by their position, which is called indexing.&lt;br&gt;
In Python, the first index is always 0.&lt;br&gt;
So to access an item in a list, we use &lt;code&gt;list_name[index]&lt;/code&gt;.&lt;br&gt;
For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = ['apple', 'mango', 'melon', 'orange'] # a list containing 4 items
fruits[0] # accessing the first item 'apple' from the list
fruits[1] # accessing the second item 'mango' from the list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Some list methods and manipulation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Slicing&lt;/strong&gt;&lt;br&gt;
Slicing retrieves the items in a specified portion of a list.&lt;br&gt;
Examples:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = ['apple', 'mango', 'melon', 'orange']
fruits[:] # retrieving every item in the list
fruits[0:2] # retrieving items from index 0 up to, but not including, index 2
fruits[-1] # negative indexing (not a slice), retrieving the last item
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
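&lt;p&gt;The general slicing form is &lt;code&gt;list_name[start:stop:step]&lt;/code&gt;, where &lt;code&gt;start&lt;/code&gt; is included, &lt;code&gt;stop&lt;/code&gt; is excluded, and any part can be left out. A few more slices of the same list, with the results in the comments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = ['apple', 'mango', 'melon', 'orange']
first_two = fruits[0:2]      # ['apple', 'mango'], index 2 is excluded
last_two = fruits[-2:]       # ['melon', 'orange'], from the second-to-last item to the end
every_other = fruits[::2]    # ['apple', 'melon'], a step of 2 skips every other item
backwards = fruits[::-1]     # ['orange', 'melon', 'mango', 'apple'], a step of -1 reverses
print(first_two, last_two, every_other, backwards)
print(fruits)                # slicing returns new lists; fruits itself is unchanged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;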

&lt;p&gt;&lt;strong&gt;len()&lt;/strong&gt;&lt;br&gt;
The built-in &lt;code&gt;len()&lt;/code&gt; function returns the length of the list, that is, the number of items in it.&lt;br&gt;
Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = ['apple', 'mango', 'melon', 'orange']
print(len(fruits)) # prints 4 which is the number of elements in the list fruits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;type()&lt;/strong&gt;&lt;br&gt;
Returns the data type of the value passed to it.&lt;br&gt;
Example:&lt;br&gt;
&lt;code&gt;print(type(fruits)) # prints &amp;lt;class 'list'&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Lists are mutable, which means they can be modified after they are created.&lt;br&gt;
Thus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you can add items to a list&lt;/li&gt;
&lt;li&gt;you can remove an item from a list&lt;/li&gt;
&lt;li&gt;you can change the list items&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fruits = ['apple', 'mango', 'melon', 'orange']
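fruits.append('guava')  # add 'guava' at the end of the list
fruits.insert(1, 'passion')  # insert 'passion' at index 1 of the list
fruits.pop()  # remove (and return) the last item, 'guava'
fruits.remove('apple')  # remove the first occurrence of 'apple'
fruits[0] = 'pineapple'  # change the item at index 0 (currently 'passion')
print(fruits)  # ['pineapple', 'mango', 'melon', 'orange']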
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;b. Tuples&lt;/strong&gt;&lt;br&gt;
Tuples are used to store multiple items in a single variable.&lt;br&gt;
Tuples are immutable: once a tuple is created, you cannot add, remove or change its items.&lt;br&gt;
Tuples are written with round brackets, &lt;code&gt;()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example&lt;br&gt;
&lt;code&gt;thisTuple = ('apple', 'banana', 'berry') # creating a tuple named 'thisTuple' with three items&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tuples are ordered&lt;/li&gt;
&lt;li&gt;Tuples are immutable&lt;/li&gt;
&lt;li&gt;Tuples allow duplicates&lt;/li&gt;
&lt;li&gt;Tuples can contain different data types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;type()&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;type()&lt;/code&gt; also works on tuples and returns their data type:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mytuple = (1, 2, 3, 4)
print(type(mytuple)) # prints &amp;lt;class 'tuple'&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
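&lt;p&gt;A short sketch of the tuple properties listed above: tuples support indexing and allow duplicates and mixed types, but trying to change an item raises a &lt;code&gt;TypeError&lt;/code&gt;. The variable names are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;point = (3, 4, 3, 'label')     # ordered, allows duplicates and mixed types
first = point[0]               # indexing works like with lists: 3
threes = point.count(3)        # the value 3 appears twice: 2

changed = True
try:
    point[0] = 10              # tuples are immutable, so this fails
except TypeError:
    changed = False
print(first, threes, changed)  # prints: 3 2 False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;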
&lt;h2&gt;
  
  
  &lt;strong&gt;c. Set&lt;/strong&gt;
&lt;/h2&gt;

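&lt;p&gt;A set is an unordered, unindexed collection with no duplicate members.&lt;br&gt;
The set itself is mutable (items can be added and removed), but each item must be hashable, which in practice usually means an immutable value such as a number or a string.&lt;/p&gt;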
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;names = {'one', 'two', 'three'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
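&lt;p&gt;A short sketch of working with sets: duplicates disappear, items can be added and removed, and membership checks use &lt;code&gt;in&lt;/code&gt;. It also uses the &lt;code&gt;intersection()&lt;/code&gt; and &lt;code&gt;union()&lt;/code&gt; methods, and wraps results in &lt;code&gt;sorted()&lt;/code&gt; because sets have no guaranteed order. The marks are made-up sample values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;names = {'one', 'two', 'three'}
names.add('four')            # sets are mutable: add an item
names.add('two')             # adding a duplicate has no effect
names.discard('one')         # remove an item (no error if it is missing)
print(len(names))            # 3
print('two' in names)        # True

marks = [78, 47, 78, 98, 47]
unique_marks = sorted(set(marks))            # [47, 78, 98], duplicates removed
evens = {2, 4, 6}
small = {1, 2, 3, 4}
common = sorted(evens.intersection(small))   # [2, 4]
combined = sorted(evens.union(small))        # [1, 2, 3, 4, 6]
print(unique_marks, common, combined)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;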
&lt;h2&gt;
  
  
  &lt;strong&gt;d. Dictionary&lt;/strong&gt;
&lt;/h2&gt;

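&lt;p&gt;A dictionary is a data structure that consists of key-value pairs.&lt;br&gt;
It is mutable, its keys must be unique (assigning to an existing key replaces its value), and since Python 3.7 it preserves the order in which items were inserted.&lt;/p&gt;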

&lt;p&gt;Dictionaries are written with curly brackets and have keys and values.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;myDict = {
    'brand': 'Ford',
    'model': 'Mustang',
    'year': 1964
} # creating a dictionary with 3 key-value pairs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
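&lt;p&gt;A sketch of everyday dictionary operations on the same example: reading a value by key, updating and adding pairs, reading a missing key safely with &lt;code&gt;.get()&lt;/code&gt;, and looping over the pairs with &lt;code&gt;.items()&lt;/code&gt;. The &lt;code&gt;'color'&lt;/code&gt; and &lt;code&gt;'owner'&lt;/code&gt; keys are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;myDict = {'brand': 'Ford', 'model': 'Mustang', 'year': 1964}
model = myDict['model']                  # read a value by its key: 'Mustang'
myDict['year'] = 2020                    # keys are unique, so this replaces 1964
myDict['color'] = 'red'                  # assigning to a new key adds a pair
owner = myDict.get('owner', 'unknown')   # .get() returns a default instead of raising KeyError
for key, value in myDict.items():        # pairs come back in insertion order
    print(key, value)
print(model, owner, len(myDict))         # prints: Mustang unknown 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;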

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Control Flows&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;a. if Statements&lt;/strong&gt;&lt;br&gt;
It is a conditional statement that is used to determine whether a block of code will be executed or not.&lt;/p&gt;

&lt;p&gt;If the condition evaluates to True, the indented code block under the if statement is executed; otherwise it is skipped. Python uses this indentation to decide which lines belong to the block.&lt;/p&gt;

&lt;p&gt;Example of if-statement&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;age = 20
if age &amp;gt;= 18:
    print("You are an adult")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What if you want to execute another block of code when age is less than 18?&lt;br&gt;
We make use of the &lt;code&gt;else&lt;/code&gt; statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;age = 20
if age &amp;gt;= 18:
    print("You are an adult")
else:
    print("You are still a minor")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What if you want to test several conditions?&lt;br&gt;
We make use of the &lt;code&gt;elif&lt;/code&gt; (short for "else if") statement. The conditions are checked from top to bottom, and only the first block whose condition is True runs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;age = 20
if age &amp;lt; 18:
    print("You are a minor")
elif age &amp;lt;= 35:  # only reached when age is 18 or more
    print("You are an adult")
else:
    print("You are a senior adult")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We can even place if statements inside other if statements.&lt;br&gt;
These are called nested if statements.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;age = 20
if age &amp;gt;= 18:
    if age &amp;lt;= 35:
        print("You are a youth")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;b. for Statements&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A for statement iterates over the items of any sequence (such as a list or a string), in the order that they appear in the sequence.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = ['cat', 'window', 'defenestrate']
for word in words:
    print(word, len(word))  # each word and its number of characters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
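&lt;p&gt;A for loop works on any sequence, including a string, and the built-in &lt;code&gt;enumerate()&lt;/code&gt; gives you each item's position along with the item. A small sketch that also collects the results into lists:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;words = ['cat', 'window', 'defenestrate']
lengths = []
for index, word in enumerate(words):   # enumerate() pairs each item with its index
    print(index, word, len(word))
    lengths.append(len(word))

letters = []
for letter in "cat":                   # a string is a sequence of characters
    letters.append(letter)

print(lengths, letters)                # prints: [3, 6, 12] ['c', 'a', 't']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;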
&lt;h2&gt;
  
  
  &lt;strong&gt;c. while Statement&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A while statement repeats a block of code as long as its condition is true.&lt;br&gt;
Example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;number = 5
x = 0
while x &amp;lt; number:
    print(x)
    x += 1  # Python has no ++ operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
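&lt;p&gt;A while loop is most useful when you do not know in advance how many iterations are needed. In this sketch (the starting number is arbitrary), the loop keeps halving a number until it reaches 1 and counts how many halvings that took:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;number = 4096
halvings = 0
while number != 1:          # the condition is checked before every iteration
    number = number // 2    # floor division keeps number an integer
    halvings += 1
print(halvings)             # prints 12, because 2 ** 12 == 4096
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;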
&lt;h3&gt;
  
  
  The range() Function
&lt;/h3&gt;

&lt;p&gt;The built-in &lt;code&gt;range()&lt;/code&gt; function generates a sequence of numbers in arithmetic progression.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in range(5):
    print(i)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;This generates 5 numbers, 0 through 4: by default &lt;code&gt;range()&lt;/code&gt; starts counting at 0 and stops before the value you pass in.&lt;/li&gt;
&lt;/ul&gt;
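&lt;p&gt;&lt;code&gt;range()&lt;/code&gt; also accepts a start value and a step, as &lt;code&gt;range(start, stop, step)&lt;/code&gt;. Wrapping it in &lt;code&gt;list()&lt;/code&gt; makes the generated numbers visible:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;up_to_five = list(range(5))             # [0, 1, 2, 3, 4]
two_to_nine = list(range(2, 10))        # [2, 3, 4, 5, 6, 7, 8, 9], 10 is excluded
every_third = list(range(2, 10, 3))     # [2, 5, 8]
counting_down = list(range(10, 0, -2))  # [10, 8, 6, 4, 2], a negative step counts down
print(up_to_five, two_to_nine, every_third, counting_down)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;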

&lt;h3&gt;
  
  
  The break and continue Statements
&lt;/h3&gt;

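&lt;p&gt;The &lt;code&gt;break&lt;/code&gt; statement breaks out of the innermost enclosing &lt;code&gt;for&lt;/code&gt; or &lt;code&gt;while&lt;/code&gt; loop. In the example below, it stops the inner loop as soon as a factor of n is found. A loop can also have an &lt;code&gt;else&lt;/code&gt; clause, which runs only when the loop finishes without hitting &lt;code&gt;break&lt;/code&gt;.&lt;/p&gt;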

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in range(2, 10):
    for x in range(2, n):
        print(n, 'equals', x, '*', n//x)
        break

else:
    print(n, 'is a prime number')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;continue&lt;/code&gt; statement skips the rest of the current iteration and continues with the next iteration of the loop.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue  # skip the odd-number message below
    print("Found an odd number", num)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  pass Statements
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pass&lt;/code&gt; statement does nothing.&lt;br&gt;
It is used when a statement is required syntactically but the program requires no action, for example as a placeholder body for a function you haven't written yet.&lt;/p&gt;

&lt;p&gt;Example&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;while True:
    pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;6. Functions&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A function is a block of code which only runs when it is called.&lt;br&gt;
A function can take inputs (called parameters) and can return data as its result.&lt;/p&gt;

&lt;p&gt;Python functions are commonly grouped into four types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Built-in functions - functions that are built into the Python interpreter and are always available for use.&lt;br&gt;
You have certainly come across some by now, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;len() - find the length of a list, tuple, string etc.&lt;/li&gt;
&lt;li&gt;print() - display values as text&lt;/li&gt;
&lt;li&gt;type() - return the data type of a value&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recursive functions - functions that call themselves&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda functions - anonymous functions, defined without a name using the &lt;code&gt;lambda&lt;/code&gt; keyword&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;User-defined functions - functions defined by the user to do a specific task&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example of a user-defined function:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def greetings(): #defining the function
    print("Hello All")

greetings() #calling the function
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
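&lt;p&gt;The other kinds of functions listed above can be sketched the same way. Below, &lt;code&gt;add&lt;/code&gt; is a user-defined function that takes parameters and returns a value, &lt;code&gt;factorial&lt;/code&gt; is a recursive function (it assumes a non-negative integer input), and a lambda is passed to the built-in &lt;code&gt;sorted()&lt;/code&gt; as its sort key. All names are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def add(a, b):
    """User-defined function: takes two parameters and returns their sum."""
    return a + b


def factorial(n):
    """Recursive function: n! for a non-negative integer n."""
    if n in (0, 1):                # base case stops the recursion
        return 1
    return n * factorial(n - 1)    # the function calls itself on a smaller input


words = ['banana', 'fig', 'apple']
by_length = sorted(words, key=lambda word: len(word))  # anonymous function used as the sort key

print(add(2, 3))        # 5
print(factorial(5))     # 120
print(by_length)        # ['fig', 'apple', 'banana']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;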

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Packages&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A Python file containing code is known as a module. A package is a directory that groups related modules together, where each module performs a specific function. This approach helps achieve modularization.&lt;/p&gt;

&lt;p&gt;For Data Science, commonly used packages include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NumPy: used for working with arrays&lt;/li&gt;
&lt;li&gt;Pandas: used for data analysis and manipulation&lt;/li&gt;
&lt;li&gt;Matplotlib: used for data visualization&lt;/li&gt;
&lt;li&gt;Scikit-learn: used for machine learning algorithms&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Importing packages
&lt;/h3&gt;

&lt;p&gt;Packages can contain sub-packages, which also have modules.&lt;br&gt;
To load any package or module, we use the keyword &lt;code&gt;import&lt;/code&gt; followed by the module name or package name.&lt;/p&gt;
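&lt;p&gt;The common forms of &lt;code&gt;import&lt;/code&gt; are sketched below with standard-library modules, so nothing needs to be installed. Here &lt;code&gt;xml&lt;/code&gt; is a package, &lt;code&gt;xml.etree&lt;/code&gt; a sub-package, and &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; a module inside it. The marks are sample values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math                          # import a whole module
import statistics as stats           # import a module under a shorter alias
from math import sqrt                # import a single name from a module
import xml.etree.ElementTree as ET   # import a module from a sub-package

marks = [78, 47, 98, 43, 58]
print(math.pi)                       # names are accessed as module.name
print(stats.mean(marks))             # 64.8
print(sqrt(16))                      # 4.0
print(ET.__name__)                   # xml.etree.ElementTree
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;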

&lt;p&gt;&lt;strong&gt;i. NumPy&lt;/strong&gt;&lt;br&gt;
NumPy stands for Numerical Python.&lt;br&gt;
It is the core library for scientific computing in Python.&lt;br&gt;
It provides a high-performance multi-dimensional array object and tools for working with these arrays.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;NumPy vs. Python lists&lt;br&gt;
NumPy arrays are typically much faster than pure Python lists for numerical work, because their operations run in optimized compiled code over data of a single type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Creating a NumPy array from a Python list:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np  # importing the numpy package and giving it the alias np
marks = [78, 47, 98, 43, 58]  # creating a list
marks_np = np.array(marks)  # converting the list into a NumPy array
print(type(marks_np))  # prints &amp;lt;class 'numpy.ndarray'&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;ndarray attributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ndim: number of dimensions of the array&lt;/li&gt;
&lt;li&gt;shape: shape of the array, e.g. (n_rows, n_cols) for a 2-D array&lt;/li&gt;
&lt;li&gt;dtype: data type of the elements stored in the array&lt;/li&gt;
&lt;li&gt;size: total number of elements in the array&lt;/li&gt;
&lt;li&gt;strides: number of bytes to step in memory to reach the next element along each dimension, e.g. (bytes_per_row, bytes_per_column) for a 2-D array&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print('dimension ',mark_s.ndim)
print('shape ', mark_s.shape)
print('size ', mark_s.size)
print('dtype ', mark_s.dtype)
print('strides ', mark_s.strides)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
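&lt;p&gt;These attributes are easier to read on a 2-D array. In this sketch the scores are made up, and &lt;code&gt;dtype=np.int64&lt;/code&gt; is set explicitly so that the byte counts in &lt;code&gt;strides&lt;/code&gt; (8 bytes per element) are the same on every platform:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# 2 rows (students) by 3 columns (subjects); the values are illustrative
scores = np.array([[78, 47, 98], [43, 58, 66]], dtype=np.int64)

print('dimension', scores.ndim)   # 2
print('shape', scores.shape)      # (2, 3)
print('size', scores.size)        # 6
print('dtype', scores.dtype)      # int64
print('strides', scores.strides)  # (24, 8): 3 elements x 8 bytes to the next row, 8 bytes to the next column
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;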

&lt;p&gt;Some key functions for creating NumPy arrays:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;zeros(shape=(n,m)): creates an array filled with zeros, with the shape (n rows, m columns)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = np.zeros(shape=(3, 5), dtype="int32")
print(x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;arange(start=i, stop=j, step=s): creates a 1-D array whose first value is i (inclusive) and whose values stop before j (exclusive), with consecutive values differing by s&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = np.arange(start=100, stop=1000, step=100, dtype="int32")
print(x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;linspace(start=i, stop=j, num=n): creates a 1-D array of n evenly spaced values, whose first value is i and last value is j (both inclusive)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x_lin = np.linspace(start=10, stop=50, num=30)
print(x_lin)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;full(shape=(n,m), fill_value=f): creates an array with the shape (n rows, m columns) where every position holds the value f&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x_ful = np.full(shape=(5, 6), fill_value=3)
print(x_ful)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
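&lt;p&gt;One difference between &lt;code&gt;arange()&lt;/code&gt; and &lt;code&gt;linspace()&lt;/code&gt; is easy to miss: &lt;code&gt;arange()&lt;/code&gt; excludes its stop value, while &lt;code&gt;linspace()&lt;/code&gt; includes it by default. This sketch uses small inputs so the results can be checked by eye:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

steps = np.arange(start=100, stop=1000, step=100)   # 100, 200, ..., 900; 1000 is excluded
points = np.linspace(start=0, stop=1, num=5)        # 0.0, 0.25, 0.5, 0.75, 1.0; 1 is included
zeros = np.zeros(shape=(2, 3), dtype="int32")       # 2 rows x 3 columns of 0
threes = np.full(shape=(2, 3), fill_value=3)        # 2 rows x 3 columns of 3

print(steps.tolist())
print(points.tolist())
print(zeros.shape, int(threes.sum()))               # (2, 3) 18
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;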

&lt;p&gt;&lt;strong&gt;ii. Pandas&lt;/strong&gt;&lt;br&gt;
Its name comes from "panel data", and it is often described as the Python Data Analysis Library.&lt;br&gt;
It is an open-source Python library.&lt;br&gt;
It is used by data scientists and analysts to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read&lt;/li&gt;
&lt;li&gt;write&lt;/li&gt;
&lt;li&gt;manipulate&lt;/li&gt;
&lt;li&gt;analyze the data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Pandas?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It helps you explore and manipulate data in an efficient manner&lt;/li&gt;
&lt;li&gt;It helps you analyze large volumes of data with ease&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why is Pandas popular?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to read and learn&lt;/li&gt;
&lt;li&gt;Fast and powerful&lt;/li&gt;
&lt;li&gt;Integrates well with other visualization libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;importing pandas&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas
import pandas as pd # creating an alias for pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pandas Series&lt;/strong&gt;&lt;br&gt;
A series is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a 1-D labelled array&lt;/li&gt;
&lt;li&gt;can hold data of any type&lt;/li&gt;
&lt;li&gt;similar to a table's column&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Series can have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integers&lt;/li&gt;
&lt;li&gt;Strings&lt;/li&gt;
&lt;li&gt;Both numbers and strings&lt;/li&gt;
&lt;/ul&gt;

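&lt;p&gt;Each Series has a single data type (dtype): a Series of integers gets an integer dtype such as &lt;code&gt;int64&lt;/code&gt;, while a Series that mixes numbers and strings gets the &lt;code&gt;object&lt;/code&gt; dtype.&lt;br&gt;
By default, a Series is indexed with integers starting from 0, but you can supply your own labels as the index.&lt;/p&gt;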

&lt;p&gt;Creating a Series&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
numbers = [1, 3, 5, 5, 7, 9, 13, 56]
pd.Series(numbers) # A series from a list

country = {'Kenya': 'Nairobi', 'Tanzania': 'Dodoma', 'Uganda': 'Kampala'}
pd.Series(country) # Creating a series from a dictionary, the dict keys will be the index for the Series
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
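&lt;p&gt;A sketch of inspecting Series like the ones created above: &lt;code&gt;.dtype&lt;/code&gt; shows the data type pandas inferred, and values can be read by their default integer index or by label. Exact dtype names can vary between pandas versions, so treat the comments as typical output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

numbers = pd.Series([1, 3, 5, 5, 7, 9, 13, 56])
mixed = pd.Series([1, 'two', 3.0])
capitals = pd.Series({'Kenya': 'Nairobi', 'Tanzania': 'Dodoma', 'Uganda': 'Kampala'})

print(numbers.dtype)          # an integer dtype, typically int64
print(mixed.dtype)            # object, because the values have mixed types
print(numbers[0])             # 1: the default index starts at 0
print(capitals['Kenya'])      # Nairobi: the dictionary keys became the index labels
print(numbers.mean())         # 12.375
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;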

&lt;p&gt;&lt;strong&gt;Pandas DataFrame&lt;/strong&gt;&lt;br&gt;
A DataFrame is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a 2-D table&lt;/li&gt;
&lt;li&gt;made up of a collection of Series&lt;/li&gt;
&lt;li&gt;Structured with labeled axes (rows and columns)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You create a DataFrame with the &lt;code&gt;pd.DataFrame()&lt;/code&gt; constructor, for example from a dictionary whose keys become the column names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
items = {'item_id': [1, 2, 3, 4, 5], 'item_name': ['chocolate', 'flour', 'sugar', 'ice cream', 'soap'], 'item_price': [356.00, 200.00, 150.00, 55.00, 187.00]}
data = pd.DataFrame(items)  # store the DataFrame in data so we can use it below
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The DataFrame has 3 columns each containing 5 entries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some pandas functions and methods
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.head()&lt;/code&gt; shows the first rows of a DataFrame (5 by default); you can pass the number of rows to show.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.tail()&lt;/code&gt; shows the last rows of a DataFrame (5 by default); you can pass the number of rows to show.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.describe()&lt;/code&gt; gives summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for each numeric column in the DataFrame&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.shape&lt;/code&gt; is an attribute (so it takes no brackets) holding the number of rows and columns as a tuple&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.info()&lt;/code&gt; prints a summary of the DataFrame, including each column's data type and its number of non-null values&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data.shape  # (5, 3) for the DataFrame above
data.head(5)
data.tail(9)  # asks for 9 rows, but returns all 5 since that's all there is
data.info()
data.describe()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can access a column by putting its name in square brackets after the DataFrame, as if it were a dictionary key:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(data['item_name']) # This outputs the entries in the 'item_name' column&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
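&lt;p&gt;A few more everyday DataFrame operations, sketched on the same shopping data: selecting several columns, sorting, and filtering rows. The filter uses the Series method &lt;code&gt;.ge()&lt;/code&gt;, which is equivalent to writing &lt;code&gt;data['item_price'] &amp;gt;= 187.0&lt;/code&gt;; the 187.0 threshold is arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

items = {'item_id': [1, 2, 3, 4, 5],
         'item_name': ['chocolate', 'flour', 'sugar', 'ice cream', 'soap'],
         'item_price': [356.00, 200.00, 150.00, 55.00, 187.00]}
data = pd.DataFrame(items)

names_and_prices = data[['item_name', 'item_price']]   # a list of names selects several columns
cheapest_first = data.sort_values('item_price')        # rows sorted by price, lowest first
pricey = data[data['item_price'].ge(187.0)]            # keep rows where the price is at least 187.0

print(names_and_prices.shape)                           # (5, 2)
print(cheapest_first['item_name'].tolist())
print(pricey['item_name'].tolist())
print(data['item_price'].mean())                        # 189.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;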
&lt;h2&gt;
  
  
  &lt;strong&gt;8. Data Science&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Data Science is a field that combines math and statistics, specialized programming, advanced analytics, artificial intelligence and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization's data.&lt;/p&gt;

&lt;p&gt;Steps involved in data science process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business understanding/analysis&lt;/li&gt;
&lt;li&gt;Data Exploration and Preparation&lt;/li&gt;
&lt;li&gt;Data Transformation and Representation&lt;/li&gt;
&lt;li&gt;Data visualization&lt;/li&gt;
&lt;li&gt;Data Modelling, Training, Validation and Deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of the Python Libraries used for Data Science:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NumPy&lt;/li&gt;
&lt;li&gt;Pandas&lt;/li&gt;
&lt;li&gt;SciPy&lt;/li&gt;
&lt;li&gt;Matplotlib&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since data science is a team sport, it benefits from environments that allow collaboration, such as sharing code.&lt;br&gt;
Such environments include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jupyter notebooks&lt;/li&gt;
&lt;li&gt;GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll deep dive into Data Science in the next article&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>playlist</category>
      <category>gratitude</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
