<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marvin Ewarn Okwaro</title>
    <description>The latest articles on DEV Community by Marvin Ewarn Okwaro (@wekessah).</description>
    <link>https://dev.to/wekessah</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1027596%2F3db25ab4-de30-4955-8198-ba52dd20cd90.jpeg</url>
      <title>DEV Community: Marvin Ewarn Okwaro</title>
      <link>https://dev.to/wekessah</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wekessah"/>
    <language>en</language>
    <item>
      <title>Exploratory Data Analysis, The Ultimate Guide (With Python)</title>
      <dc:creator>Marvin Ewarn Okwaro</dc:creator>
      <pubDate>Wed, 01 Mar 2023 10:20:12 +0000</pubDate>
      <link>https://dev.to/wekessah/exploratory-data-analysis-with-python-17pg</link>
      <guid>https://dev.to/wekessah/exploratory-data-analysis-with-python-17pg</guid>
      <description>&lt;p&gt;&lt;strong&gt;What is EDA?&lt;/strong&gt;&lt;br&gt;
Exploratory Data Analysis (EDA) is the part of the data analysis process used to understand a data set in depth, learn its different characteristics (often through data visualization) and find useful patterns in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why perform EDA?&lt;/strong&gt;&lt;br&gt;
There are several reasons for performing EDA on a data set. These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing irregularities and unnecessary values from the data set, and identifying faulty points and noise in the data early&lt;/li&gt;
&lt;li&gt;Preparing the data set for analysis&lt;/li&gt;
&lt;li&gt;Allowing machine learning models to make better predictions from the data set&lt;/li&gt;
&lt;li&gt;Getting more accurate results from the data set&lt;/li&gt;
&lt;li&gt;Helping to choose a better machine learning model&lt;/li&gt;
&lt;li&gt;Filtering out redundancies&lt;/li&gt;
&lt;li&gt;Helping stakeholders know whether they are asking the right questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are three major steps involved in EDA:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Understanding the data&lt;br&gt;
Here we get to understand the variables in the data, and also learn basic properties such as the number of rows and columns.&lt;/li&gt;
&lt;li&gt;Cleaning the data&lt;br&gt;
In this step, we remove any outliers, irregularities and non-useful fields that may affect the end model.
Outliers are data points that differ significantly from the main observations.&lt;/li&gt;
&lt;li&gt;Analyzing the relationship between variables&lt;br&gt;
We can use tools such as a correlation matrix, which is a table of correlation coefficients between variables, with each cell showing the relationship between two variables.&lt;/li&gt;
&lt;/ol&gt;
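
&lt;p&gt;As a rough illustration of the three steps above, here is a minimal sketch that uses only the Python standard library. The height/weight rows are made-up sample data, and the 1.5 * IQR rule used to flag outliers is just one common convention assumed for this example.&lt;/p&gt;

```python
import statistics

# Hypothetical sample data: each row is (height_cm, weight_kg).
# The last row contains a data-entry error that acts as an outlier.
rows = [
    (150, 52), (160, 58), (165, 63), (170, 68),
    (175, 74), (180, 79), (185, 85), (172, 700),
]

# Step 1: understand the data (number of rows and columns).
n_rows, n_cols = len(rows), len(rows[0])

# Step 2: clean the data by dropping rows whose weight falls outside
# the 1.5 * IQR fences (an assumed, commonly used rule of thumb).
weights = [w for _, w in rows]
q1, _, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = [(h, w) for h, w in rows if w >= low and high >= w]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Step 3: analyze relationships with a correlation matrix.
# Each cell holds the coefficient for one pair of variables.
columns = list(zip(*clean))  # one tuple of values per variable
matrix = [[pearson(a, b) for b in columns] for a in columns]

print(f"{n_rows} rows, {n_cols} columns, {len(clean)} rows after cleaning")
for line in matrix:
    print([round(value, 3) for value in line])
```

&lt;p&gt;The diagonal of the matrix is always 1, since every variable correlates perfectly with itself. The off-diagonal cells are the ones worth reading.&lt;/p&gt;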

&lt;p&gt;&lt;strong&gt;Using Python To Perform EDA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Assuming you have Python installed, here are some of the steps and code samples used in performing EDA with the Python programming language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Libraries&lt;/strong&gt;&lt;br&gt;
There are several libraries that may be used to perform EDA:&lt;/p&gt;

</description>
    </item>
    <item>
      <title>A Brief Introduction to SQL for Data Science</title>
      <dc:creator>Marvin Ewarn Okwaro</dc:creator>
      <pubDate>Sun, 19 Feb 2023 15:46:03 +0000</pubDate>
      <link>https://dev.to/wekessah/a-brief-introduction-to-sql-for-data-science-40i6</link>
      <guid>https://dev.to/wekessah/a-brief-introduction-to-sql-for-data-science-40i6</guid>
      <description>&lt;p&gt;If you are a beginner in the field of Data Science (DS), or you just want to learn how to handle large sets of data using SQL, you are in the right place. I am about to take you through some background info on DS and the basics of SQL, then teach you how to handle large sets of data (from a CSV file) using SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  What exactly is Data Science?
&lt;/h2&gt;

&lt;p&gt;Data Science involves deriving useful insights (using statistics, scientific methods, algorithms and systems) from data to solve real world problems (by analysis, preparation of data for analysis, exploration and visualization of the data).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why SQL?
&lt;/h2&gt;

&lt;p&gt;First, a brief introduction to what exactly SQL is. &lt;/p&gt;

&lt;p&gt;SQL stands for Structured Query Language. It is the standard language for storing, manipulating and retrieving data in relational databases. Basically, SQL is a language used for communicating with a database.&lt;br&gt;
The data is normally stored in a Relational Database Management System (RDBMS). Examples of RDBMSs include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MySQL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Oracle&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PostgreSQL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQLite&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, to the question &lt;em&gt;Why SQL?&lt;/em&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;SQL is quite easy to understand.&lt;/li&gt;
&lt;li&gt;SQL is an open standard, and several popular database systems that implement it (such as MySQL, PostgreSQL and SQLite) are open source, meaning you will find a lot of devs working to maintain them.&lt;/li&gt;
&lt;li&gt;SQL databases are highly scalable.&lt;/li&gt;
&lt;li&gt;It is platform independent, and works on almost all, if not all, operating systems (like Windows, Linux, macOS etc.).&lt;/li&gt;
&lt;li&gt;It is considered to be fast, at least according to most benchmarks.&lt;/li&gt;
&lt;li&gt;Many database systems offer features that can help to increase the developer's productivity, like stored procedures and views.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  What are the various datatypes in SQL?
&lt;/h2&gt;

&lt;p&gt;Here are the various datatypes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Numeric - [int, smallint, float, real, decimal, double precision]&lt;/li&gt;
&lt;li&gt;Character Strings - [char, varchar]&lt;/li&gt;
&lt;li&gt;Boolean - [true, false, unknown]&lt;/li&gt;
&lt;li&gt;Bit Strings - [bit, bit varying]&lt;/li&gt;
&lt;li&gt;Date/Time - [date, time]&lt;/li&gt;
&lt;li&gt;Timestamps and intervals&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Basic Commands
&lt;/h2&gt;

&lt;p&gt;Let us dive into the main topic by first looking at the basic commands. Before we do that though, I would like to point out a few best practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always write commands (keywords) in capital letters to distinguish them from the names of columns and tables, and from values.&lt;/li&gt;
&lt;li&gt;Always terminate each command with a semicolon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CREATE DATABASE COMMAND&lt;/strong&gt;&lt;br&gt;
This is the command used to create a new database. Here is the format:&lt;br&gt;
&lt;code&gt;CREATE DATABASE databasename;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;USE DATABASE COMMAND&lt;/strong&gt;&lt;br&gt;
This command selects an existing database as the default for the statements that follow. The command:&lt;br&gt;
&lt;code&gt;USE databasename;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CREATE TABLE COMMAND&lt;/strong&gt;&lt;br&gt;
This command creates a table in a selected database.&lt;br&gt;
The command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE IF NOT EXISTS `table_name`
(
  id INT(10) PRIMARY KEY,
  column_name datatype(length_in_characters) NOT NULL
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, this command will create a table called &lt;em&gt;users&lt;/em&gt;, with &lt;em&gt;id, firstname, lastname&lt;/em&gt; and &lt;em&gt;number&lt;/em&gt; (phone number) fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE `users`
(
  id INT(10) PRIMARY KEY,
  firstname VARCHAR(100) NOT NULL,
  lastname VARCHAR(100) NOT NULL,
  -- stored as text: a 15-digit phone number does not fit in an INT
  number VARCHAR(15) NOT NULL
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
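
&lt;p&gt;If you do not have a MySQL server at hand, a statement like this can be tried out with the &lt;code&gt;sqlite3&lt;/code&gt; module that ships with Python. This is only a sketch under that assumption: SQLite is a different database system from MySQL, the database here lives in memory, and the phone number column is declared as text.&lt;/p&gt;

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")

conn.execute(
    """
    CREATE TABLE IF NOT EXISTS users
    (
      id INTEGER PRIMARY KEY,
      firstname VARCHAR(100) NOT NULL,
      lastname VARCHAR(100) NOT NULL,
      number VARCHAR(15) NOT NULL
    );
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
info = conn.execute("PRAGMA table_info(users);").fetchall()
columns = [row[1] for row in info]
not_null = {row[1]: bool(row[3]) for row in info}

print(columns)
conn.close()
```

&lt;p&gt;Reading the column list back is a quick way to confirm that the table was created with the fields you intended.&lt;/p&gt;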



&lt;p&gt;&lt;strong&gt;CREATE USER COMMAND&lt;/strong&gt;&lt;br&gt;
This creates a new user for the database.&lt;br&gt;
&lt;code&gt;CREATE USER 'newuser'@'%' IDENTIFIED BY 'user_password';&lt;/code&gt;&lt;/p&gt;

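&lt;p&gt;To grant privileges to the user, we use:&lt;br&gt;
&lt;code&gt;GRANT ALL PRIVILEGES ON *.* TO 'newuser'@'%';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You can show the privileges assigned by using:&lt;br&gt;
&lt;code&gt;SHOW GRANTS FOR 'newuser'@'%';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;or remove the privileges using:&lt;br&gt;
&lt;code&gt;REVOKE ALL PRIVILEGES ON *.* FROM 'newuser'@'%';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Note that &lt;code&gt;FLUSH PRIVILEGES;&lt;/code&gt; does not remove privileges. It tells the server to reload the grant tables, which is only needed after those tables have been modified directly.&lt;/p&gt;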

&lt;p&gt;&lt;strong&gt;INSERT COMMAND&lt;/strong&gt;&lt;br&gt;
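This will insert new records into an existing table. The general form lists the columns and then the values to store in them:&lt;br&gt;
&lt;code&gt;INSERT INTO tbl_name (col1,col2) VALUES(val1,val2);&lt;/code&gt;&lt;br&gt;
In MySQL, a value may also be an expression that refers to a column set earlier in the list. Here &lt;em&gt;col2&lt;/em&gt; is set to 30:&lt;br&gt;
&lt;code&gt;INSERT INTO tbl_name (col1,col2) VALUES(15,col1*2);&lt;/code&gt;&lt;/p&gt;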
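
&lt;p&gt;When the values come from a program rather than being typed by hand, they are usually passed separately from the SQL text. Here is a minimal sketch of that idea, assuming Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and an in-memory SQLite database instead of MySQL. The table and values are for illustration only.&lt;/p&gt;

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (col1 INTEGER, col2 INTEGER);")

# The "?" placeholders let the driver pass the values safely
# instead of pasting them into the SQL string.
conn.execute("INSERT INTO tbl_name (col1, col2) VALUES (?, ?);", (15, 30))

# executemany runs the same INSERT once per tuple in the list.
conn.executemany(
    "INSERT INTO tbl_name (col1, col2) VALUES (?, ?);",
    [(1, 2), (3, 6)],
)
conn.commit()

# Read the records back to confirm they were stored.
rows = conn.execute(
    "SELECT col1, col2 FROM tbl_name ORDER BY col1;"
).fetchall()
print(rows)
conn.close()
```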

&lt;p&gt;&lt;strong&gt;INSTALLING MYSQL WORKBENCH&lt;/strong&gt;&lt;br&gt;
According to mysql.com, &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MySQL Workbench is a visual database design tool that integrates SQL development, administration, database design, creation and maintenance into a single integrated development environment for the MySQL database system.&lt;br&gt;
 It can be found &lt;a href="https://dev.mysql.com/downloads/workbench/"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After downloading, follow the usual steps of installation. On a Windows system, you may be required to add the MySQL installation path to the environment variables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IMPORTING CSV DATA TO MYSQL WORKBENCH&lt;/strong&gt;&lt;br&gt;
This is a fairly easy process. First, launch the workbench. It may prompt you for a password. If you previously set one, use it. If not, just hit enter to proceed to the dashboard.&lt;/p&gt;

&lt;p&gt;On the dashboard, you may create a database using the previous commands, then activate the database by using the USE command.&lt;/p&gt;

&lt;p&gt;On the navigator panel on the left, select the database, then right-click on Tables and select &lt;em&gt;Table Data Import Wizard&lt;/em&gt;,&lt;br&gt;
as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3WBUe2Mu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xbz64t0p9addfaqoirmm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3WBUe2Mu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xbz64t0p9addfaqoirmm.jpg" alt="Image description" width="880" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, select the CSV file, then click next. Choose the database (by default it may choose the database in use), then give the table a name and click next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GN9FKwEc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/spbvg2xw2fxau6rrf3bw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GN9FKwEc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/spbvg2xw2fxau6rrf3bw.jpg" alt="Image description" width="880" height="730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, all the columns will be automatically added. If a column name is too long, you may run into issues while importing, so you may need to shorten such names.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XUYsjzvE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/io66c2aniz21uhgpcq8u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XUYsjzvE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/io66c2aniz21uhgpcq8u.jpg" alt="Image description" width="880" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, click next to import. The data will be imported into the selected table in your newly created database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--z6tp90Ci--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ddjbubazdjz1vgfeus6y.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--z6tp90Ci--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ddjbubazdjz1vgfeus6y.jpg" alt="Image description" width="880" height="709"&gt;&lt;/a&gt;&lt;/p&gt;
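
&lt;p&gt;The same kind of import can also be scripted. The sketch below is not what the wizard does internally. It is a small stand-in that assumes Python's built-in &lt;code&gt;csv&lt;/code&gt; and &lt;code&gt;sqlite3&lt;/code&gt; modules, an in-memory SQLite database rather than MySQL, and a tiny made-up CSV. The table name &lt;code&gt;imported&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Hypothetical CSV content. With a real file you would use
# open("your_file.csv", newline="") instead of io.StringIO.
csv_text = (
    "firstname,lastname,number\n"
    "Jane,Doe,0700000001\n"
    "John,Doe,0700000002\n"
)

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)    # the first line holds the column names
records = list(reader)   # the remaining lines are the data rows

conn = sqlite3.connect(":memory:")

# Create one column per CSV header field. Every column is stored as
# TEXT to keep the sketch simple; a real import would pick proper types.
column_defs = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f"CREATE TABLE imported ({column_defs});")

# One "?" placeholder per column, then insert all rows at once.
placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO imported VALUES ({placeholders});", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM imported;").fetchone()[0]
print(count)
conn.close()
```

&lt;p&gt;Keeping the phone numbers as text preserves their leading zeros, which would be lost if the column were numeric.&lt;/p&gt;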

</description>
      <category>sql</category>
      <category>datascience</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
