<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mary kariuki</title>
    <description>The latest articles on DEV Community by mary kariuki (@marykariuki90).</description>
    <link>https://dev.to/marykariuki90</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F940351%2F73bfcd13-e798-4710-92fd-a8ae55a306d4.png</url>
      <title>DEV Community: mary kariuki</title>
      <link>https://dev.to/marykariuki90</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/marykariuki90"/>
    <language>en</language>
    <item>
      <title>Numpy (numerical python)</title>
      <dc:creator>mary kariuki</dc:creator>
      <pubDate>Wed, 16 Nov 2022 08:08:14 +0000</pubDate>
      <link>https://dev.to/marykariuki90/numpy-numerical-python-4ap2</link>
      <guid>https://dev.to/marykariuki90/numpy-numerical-python-4ap2</guid>
      <description>&lt;h2&gt;
  
  
  Numpy in python
&lt;/h2&gt;

&lt;p&gt;NumPy is a Python library for working with arrays.&lt;br&gt;
NumPy stands for Numerical Python. It is used to perform a wide variety of mathematical operations on arrays, and it also has functions for linear algebra, Fourier transforms, and matrices. Python lists can play a similar role, but they are slow to process. NumPy arrays are stored in one contiguous block of memory, unlike lists, so they can be accessed and manipulated very efficiently.&lt;br&gt;
To install NumPy, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install numpy 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once NumPy is installed, import the library with the following command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here np is an alias used to refer to numpy.&lt;/p&gt;
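
&lt;p&gt;As a small sketch of the earlier point about arrays versus lists (the variable names here are illustrative and not from the original post): arithmetic on an array applies to every element at once, while a list needs an explicit loop.&lt;/p&gt;

```python
import numpy as np

values = [2, 4, 6, 8, 10]      # plain Python list
arr = np.array(values)         # the same values as a NumPy array

doubled_list = [v * 2 for v in values]   # a list needs an explicit loop
doubled_arr = arr * 2                    # the array is multiplied element-wise

print(doubled_list)   # [4, 8, 12, 16, 20]
print(doubled_arr)    # [ 4  8 12 16 20]
```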

&lt;h2&gt;
  
  
  Creating an array
&lt;/h2&gt;

&lt;p&gt;The array object in NumPy is called ndarray. We can create an ndarray object with the array() function, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

# creating an array

x = np.array([2,4,6,8,10])

print(x)

print(type(x))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: an array can have one, two, three, or more dimensions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A one-dimensional (1-D) array has 0-D arrays (single values) as its elements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A two-dimensional (2-D) array has 1-D arrays as its elements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A three-dimensional (3-D) array has 2-D arrays as its elements.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
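
&lt;p&gt;A minimal sketch of the dimensions listed above, using made-up values: the ndim attribute reports how many dimensions an array has.&lt;/p&gt;

```python
import numpy as np

a0 = np.array(7)                                      # 0-D: a single value
a1 = np.array([1, 2, 3])                              # 1-D: a row of values
a2 = np.array([[1, 2, 3], [4, 5, 6]])                 # 2-D: an array of 1-D arrays
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # 3-D: an array of 2-D arrays

dims = [a.ndim for a in (a0, a1, a2, a3)]
print(dims)   # [0, 1, 2, 3]
```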

&lt;p&gt;NumPy supports many operations on arrays, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; numpy array indexing&lt;/li&gt;
&lt;li&gt; numpy array slicing&lt;/li&gt;
&lt;li&gt; numpy array shape&lt;/li&gt;
&lt;li&gt; numpy array reshape&lt;/li&gt;
&lt;li&gt; numpy array split&lt;/li&gt;
&lt;li&gt; numpy array join&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Numpy array indexing
&lt;/h2&gt;

&lt;p&gt;We can access an array element by its index number. Indexing starts at 0.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indexing in 1-D array
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#indexing in 1-D array

x = np.array([1, 3, 4, 6])

print(x[0]) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Indexing in 2-D array&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#indexing in 2-D array

y = np.array([[1,4,6,9,0], [2,7,3,9,1]])

print('2nd element on 1st row: ', y[0, 1]) 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2nd element on 1st row:  4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
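
&lt;p&gt;Indexes can also be negative, counting from the end of a dimension. The short sketch below reuses the 2-D array from the example above; the variable names are only illustrative.&lt;/p&gt;

```python
import numpy as np

y = np.array([[1, 4, 6, 9, 0], [2, 7, 3, 9, 1]])

last_of_first_row = y[0, -1]    # -1 means the last element of the row
last_of_second_row = y[1, -1]

print(last_of_first_row, last_of_second_row)   # 0 1
```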



&lt;h2&gt;
  
  
  Numpy array slicing
&lt;/h2&gt;

&lt;p&gt;Slicing means taking elements from one given index up to another given index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Slice elements from index 1 up to (but not including) index 4

y = np.array([10,20,30,40,50,60])

print(y[1:4]) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[20 30 40]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: the result includes the start index but excludes the end index.&lt;/p&gt;
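
&lt;p&gt;The start, the end, or both can be left out of a slice, and a third number sets the step. A brief sketch on the same array as above:&lt;/p&gt;

```python
import numpy as np

y = np.array([10, 20, 30, 40, 50, 60])

first_three = y[:3]     # no start: begin at index 0
from_third = y[3:]      # no end: run to the last element
every_second = y[::2]   # step of 2 over the whole array

print(first_three)    # [10 20 30]
print(from_third)     # [40 50 60]
print(every_second)   # [10 30 50]
```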

&lt;h2&gt;
  
  
  Numpy array shape
&lt;/h2&gt;

&lt;p&gt;The shape of an array is the number of elements in each dimension.&lt;br&gt;
NumPy arrays have an attribute called shape that returns a tuple, where each entry gives the number of elements along the corresponding dimension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arr = np.array([[2,4,6,8], [8,8,3,4]])

print(arr.shape) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(2, 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The example above returns (2, 4), which means the array has 2 rows and 4 columns. The first number is the number of rows and the second is the number of columns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numpy array reshape
&lt;/h2&gt;

&lt;p&gt;Reshaping means changing the shape of an array, that is, the number of elements in each dimension. By reshaping we can add or remove dimensions, or change the number of elements in each dimension, as long as the total number of elements stays the same.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reshaping from 1-D to 2-D
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;z = np.array([2,4,6,8,10,12,14,16,18,20,22,24])

newarray = z.reshape(4, 3)

print(newarray) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]
 [20 22 24]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: the array has been reshaped from a 1-D array into a 2-D array with 4 rows and 3 columns.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reshaping 1-D to 3-D
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;z = np.array([2,4,6,8,10,12,14,16,18,20,22,24])

newarray = z.reshape(2, 3, 2)

print(newarray) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[[ 2  4]
  [ 6  8]
  [10 12]]

 [[14 16]
  [18 20]
  [22 24]]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: the outermost dimension has 2 arrays, each containing 3 arrays of 2 elements.&lt;/p&gt;
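
&lt;p&gt;Two details are worth sketching here, using the same 12-element array: one dimension can be given as -1 so that NumPy works it out, and a shape that does not fit the number of elements raises a ValueError.&lt;/p&gt;

```python
import numpy as np

z = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24])

# -1 asks NumPy to infer the dimension: 12 elements in 3 rows gives 4 columns
auto = z.reshape(3, -1)
print(auto.shape)   # (3, 4)

# 5 * 3 = 15 does not match the 12 elements, so this reshape fails
try:
    z.reshape(5, 3)
    failed = False
except ValueError as err:
    failed = True
    print("cannot reshape:", err)
```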

&lt;h2&gt;
  
  
  Numpy joining array
&lt;/h2&gt;

&lt;p&gt;Joining means putting the contents of two or more arrays into a single array, for example with the concatenate() function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = np.array([10, 20, 30])

y = np.array([40, 50, 60])

arr1 = np.concatenate((x, y))

print(arr1) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[10 20 30 40 50 60]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also join arrays with the stack functions, such as vstack(), which stacks the arrays vertically (row-wise), so each input array becomes a row of the result. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
x = np.array([10, 20, 30])

y = np.array([40, 50, 60])

arr2 = np.vstack((x,y))

print(arr2) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[10 20 30]
 [40 50 60]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
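
&lt;p&gt;For comparison, here is a short sketch of joining side by side instead: hstack() joins arrays horizontally, and concatenate() accepts an axis argument for 2-D arrays. The arrays a and b are made up for illustration.&lt;/p&gt;

```python
import numpy as np

x = np.array([10, 20, 30])
y = np.array([40, 50, 60])

flat = np.hstack((x, y))             # 1-D inputs are joined end to end
print(flat)                          # [10 20 30 40 50 60]

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

wide = np.concatenate((a, b), axis=1)   # axis=1 joins along the columns
print(wide.tolist())                    # [[1, 2, 5, 6], [3, 4, 7, 8]]
```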



&lt;h2&gt;
  
  
  Numpy splitting array
&lt;/h2&gt;

&lt;p&gt;Splitting is the reverse operation of joining.&lt;br&gt;
Joining merges multiple arrays into one, and splitting breaks one array into several. To split an array we use the array_split() function, passing the array to split and the number of splits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = np.array([20,40, 60,70,80,100])

arr3 = np.array_split(x, 3)

print(arr3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[array([20, 40]), array([60, 70]), array([ 80, 100])]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: the return value in the example above is a list containing three arrays.&lt;/p&gt;
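
&lt;p&gt;When the array cannot be divided evenly, array_split() still works and makes the first pieces one element longer, whereas the stricter split() function raises an error. A small sketch with a made-up 7-element array:&lt;/p&gt;

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7])

parts = np.array_split(x, 3)        # 7 elements cannot be split into 3 equal parts
sizes = [len(p) for p in parts]
print(sizes)                        # [3, 2, 2]

# np.split insists on equal parts and raises ValueError otherwise
try:
    np.split(x, 3)
    split_failed = False
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)
```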

</description>
    </item>
    <item>
      <title>visualization of data using matplotlib and seaborn</title>
      <dc:creator>mary kariuki</dc:creator>
      <pubDate>Sat, 22 Oct 2022 22:54:34 +0000</pubDate>
      <link>https://dev.to/marykariuki90/visualization-of-data-using-matplotlib-and-seaborn-3j0f</link>
      <guid>https://dev.to/marykariuki90/visualization-of-data-using-matplotlib-and-seaborn-3j0f</guid>
      <description>&lt;h2&gt;
  
  
  Visualization of data.
&lt;/h2&gt;

&lt;p&gt;Data visualization is the graphical representation of data.&lt;br&gt;
Matplotlib is a Python library for plotting graphs, often used together with other libraries such as pandas and NumPy. Seaborn is also a Python plotting library; it is built on top of Matplotlib and works closely with NumPy and pandas.&lt;br&gt;
The difference between the two is that Seaborn offers a higher-level interface for statistical graphics and can summarize an entire dataset in a single plot, while Matplotlib is a lower-level library for plotting 2-D graphs of arrays.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Matplotlib&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first step is to install Matplotlib with a simple command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install matplotlib
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After Matplotlib has been installed, import its pyplot module as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: plt is an alias.&lt;br&gt;
Matplotlib can plot various kinds of graphs, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;bar graphs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;histograms&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;pie charts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;scatter plots&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scatter plot&lt;/p&gt;

&lt;p&gt;To draw a scatter plot we use the scatter() method, which draws one dot for each observation. It needs two arrays of the same length: the x-axis values and the y-axis values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([3,4,5,7,1,0,5,8,6,4])
ypoints = np.array([70,20,70,30,50,90,55,49,34,28])

plt.scatter(xpoints, ypoints)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bar graphs&lt;/p&gt;

&lt;p&gt;To draw a bar graph we use the bar() method and provide the x-axis and y-axis values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np

xvalues = np.array(["mary", "anne", "simon", "james"])
yvalues = np.array([90,10,50,70])

plt.bar(xvalues,yvalues)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Histogram&lt;/p&gt;

&lt;p&gt;A histogram is a graph that shows a frequency distribution.&lt;br&gt;
We use the hist() method to create histograms. It takes an array of numbers, groups the values into bins, and draws one bar per bin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np

# 500 random values from a normal distribution with mean 20 and standard deviation 40
y = np.random.normal(20, 40, 500)

plt.hist(y)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
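
&lt;p&gt;To see what hist() does with the array, the sketch below uses a small set of made-up scores and asks for 4 bins. The call also returns the count in each bin and the bin edges, which makes the grouping easy to inspect. The data and labels are assumptions chosen for illustration.&lt;/p&gt;

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores, small enough to check the bins by hand
scores = np.array([12, 15, 21, 24, 28, 33, 35, 38, 41, 49])

# bins=4 splits the range from the smallest to the largest score into 4 equal-width bins
counts, edges, patches = plt.hist(scores, bins=4)
plt.xlabel("score")
plt.ylabel("frequency")
plt.show()

print(counts)   # number of scores that fall in each bin
print(edges)    # the 5 edges that bound the 4 bins
```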



&lt;p&gt;Pie chart&lt;/p&gt;

&lt;p&gt;We use the pie() method to create pie charts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import numpy as np

z = np.array([10,30,5,60,59,70,2])

plt.pie(z)
plt.show() 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pie chart is divided into 7 slices because we passed 7 elements in the array.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Seaborn&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To use the seaborn module, first install it as shown.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install seaborn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installing, import both matplotlib and seaborn, since they go hand in hand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import seaborn as sns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seaborn is used for statistical graphics in Python. Now let's load our data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
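&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# "data" is a placeholder: replace it with a name listed by sns.get_dataset_names()
df = sns.load_dataset("data")
df  # in a notebook, this displays the DataFrame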
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From our data we can draw a single plot that describes the entire dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(df)
plt.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: pairplot() allows us to plot pairwise relationships between the variables in a dataset.&lt;/p&gt;

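&lt;p&gt;Distplot in seaborn&lt;br&gt;
Distplot stands for distribution plot. It takes an array as input and plots a curve corresponding to the distribution of the points in the array. Note that distplot() has been deprecated since seaborn 0.11 in favor of histplot() and displot().&lt;br&gt;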
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt
import seaborn as sns

sns.distplot([2,4,6,8,10])

plt.show() 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Analyzing and cleaning data using pandas</title>
      <dc:creator>mary kariuki</dc:creator>
      <pubDate>Tue, 18 Oct 2022 16:03:39 +0000</pubDate>
      <link>https://dev.to/marykariuki90/analyzing-and-cleaning-data-using-pandas-1jfc</link>
      <guid>https://dev.to/marykariuki90/analyzing-and-cleaning-data-using-pandas-1jfc</guid>
      <description>&lt;h2&gt;
  
  
  Analyzing data using pandas
&lt;/h2&gt;

&lt;p&gt;It is advisable to get a quick overview of your dataset once you have loaded it into a DataFrame. A dataset is often stored in CSV format, and we load it into a DataFrame as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
dx=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
print(dx)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can view our data in various ways, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;head() function &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the headers and the specified number of rows from the top (5 if no number is given).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
dt=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\datas.csv")
print(dt.head(10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;tail() function &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It returns the headers and the specified number of rows from the bottom (5 if no number is given).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
dt=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
print(dt.tail())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;info() function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gives more information about your dataset: the datatypes, the number of non-null cells in each column, and memory usage, among others.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
dt=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
dt.info()  # info() prints its report itself and returns None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;describe() function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gives a statistical description of the numeric columns: the count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dt=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
print(dt.describe())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
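
&lt;p&gt;Since the CSV files above live on the author's machine, here is a self-contained sketch of the same overview functions on a small DataFrame built in memory. The column names and values are made up for illustration.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical data standing in for a CSV file
df = pd.DataFrame({
    "fname": ["ann", "ben", "cate", "dan", "eve", "finn"],
    "score": [90, 10, 50, 70, 30, 50],
})

top = df.head(2)          # first 2 rows
bottom = df.tail(2)       # last 2 rows
summary = df.describe()   # statistics for the numeric column only

print(top)
print(bottom)
print(summary.loc["mean", "score"])   # 50.0
```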



&lt;h2&gt;
  
  
  Cleaning data using pandas
&lt;/h2&gt;

&lt;p&gt;Cleaning data simply means fixing or removing bad data in a dataset. This may involve removing empty cells, removing duplicates, and correcting data in the wrong format.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;removing of duplicates&lt;/li&gt;
&lt;/ul&gt;

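&lt;p&gt;Duplicates are rows that have been registered more than once.&lt;br&gt;
To find duplicates we use the duplicated() function, which returns a boolean value for every row: True if the row repeats an earlier row, otherwise False. Summing these values counts the duplicates, and drop_duplicates() removes them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(dt.duplicated().sum())  # number of duplicate rows
dt = dt.drop_duplicates()     # keep only the first copy of each row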
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
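
&lt;p&gt;A small self-contained sketch of how the two functions fit together, on made-up data: duplicated() only flags the repeated rows, and drop_duplicates() is what actually removes them.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical records; the third row repeats the first
df = pd.DataFrame({
    "fname": ["ann", "ben", "ann", "dan"],
    "score": [90, 10, 90, 70],
})

flags = df.duplicated()          # one True/False value per row
n_duplicates = int(flags.sum())  # True counts as 1
deduped = df.drop_duplicates()   # returns a new DataFrame without the repeats

print(flags.tolist())    # [False, False, True, False]
print(n_duplicates)      # 1
print(len(deduped))      # 3
```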



&lt;ul&gt;
&lt;li&gt;removing empty cells &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We remove rows that contain empty cells with the dropna() function. By default it returns a new DataFrame and does not change the original.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dt = pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\datas.csv")
data1=dt.dropna()
print(data1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;cleaning wrong data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We fix wrong data in two ways: by replacing the wrong values or by removing them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;removing wrong data&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dx=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
dx.dropna(subset=['fname'],inplace=True)
print(dx)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2.&lt;em&gt;replacing wrong data&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dx=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
for line in dx.index:
    if dx.loc[line,'ASSIGN 2']&amp;gt;90:
        dx.loc[line,'ASSIGN 2']=10
print(dx)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
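
&lt;p&gt;The same replacement can be written without a loop by selecting the rows with a boolean mask. This is only a sketch of an alternative, on made-up values; the column name 'ASSIGN 2' and the limit of 90 are taken from the example above.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical marks; two of them are above the limit of 90
df = pd.DataFrame({"ASSIGN 2": [45, 95, 80, 120]})

too_high = df["ASSIGN 2"].gt(90)   # True where the value is greater than 90
df.loc[too_high, "ASSIGN 2"] = 10  # replace only the selected cells

print(df["ASSIGN 2"].tolist())     # [45, 10, 80, 10]
```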



&lt;ul&gt;
&lt;li&gt;cleaning wrong format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It may be difficult or even impossible to analyze data when some cells are in the wrong format, for example a column whose cells hold more than one datatype. To fix this, one can convert all the cells in the column to the same datatype, or remove the offending rows from the dataset.&lt;br&gt;
Removing the rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dx=pd.read_csv(r"C:\Users\ADMIN\Desktop\EXCEL\RECORD2.csv")
dx=dx.dropna(subset=['row 3'])  # dropna() returns a new DataFrame, so assign the result
print(dx)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
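
&lt;p&gt;The other option, converting to a single datatype, can be sketched with pd.to_numeric(). With errors="coerce", any cell that cannot be read as a number becomes NaN, and dropna() can then remove those rows. The column name and values below are made up for illustration.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical column where one cell is text instead of a number
df = pd.DataFrame({"score": ["90", "ten", "50", "70"]})

# Convert to numbers; "ten" cannot be converted and becomes NaN
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# Drop the rows where the conversion failed
df = df.dropna(subset=["score"])

print(df["score"].tolist())   # [90.0, 50.0, 70.0]
```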



</description>
    </item>
  </channel>
</rss>
