<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alisha Rana</title>
    <description>The latest articles on DEV Community by Alisha Rana (@alisharana).</description>
    <link>https://dev.to/alisharana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F904516%2F0de7b94d-3ced-4cab-a407-a4bf0b0d58df.jpeg</url>
      <title>DEV Community: Alisha Rana</title>
      <link>https://dev.to/alisharana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alisharana"/>
    <language>en</language>
    <item>
      <title>Finding out the Missing Values Using Missingno and Pandas</title>
      <dc:creator>Alisha Rana</dc:creator>
      <pubDate>Mon, 29 Aug 2022 11:50:00 +0000</pubDate>
      <link>https://dev.to/alisharana/finding-out-the-missing-values-using-missingno-and-pandas-368a</link>
      <guid>https://dev.to/alisharana/finding-out-the-missing-values-using-missingno-and-pandas-368a</guid>
      <description>&lt;p&gt;The first step in data cleaning for me is typically looking for missing data, missing data can have different sources, maybe it isn't available, maybe it gets lost, maybe it gets damaged and normally its not an issue, we can fill it but I think often time missing data is very informative in itself, while we can fill the data with the average or something like that and I will show you how to do that frequently, &lt;br&gt;
For instance, if you have an online clothing store, if a customer never clicked on the baby category, it is likely that they do not have children. You can learn a lot by simply taking the information that is not there.&lt;/p&gt;
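&lt;p&gt;Filling with a column average, as mentioned above, can be sketched with pandas like this (a minimal, self-contained example with made-up numbers):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Made-up column with one gap, standing in for real data
df = pd.DataFrame({"total_bedrooms": [2.0, np.nan, 4.0]})

# Replace the missing value with the column mean (here (2 + 4) / 2 = 3)
df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].mean())
print(df["total_bedrooms"].tolist())  # [2.0, 3.0, 4.0]
```

&lt;p&gt;Whether mean-filling is appropriate depends on why the data is missing, which is exactly why inspecting the missingness first is worthwhile.&lt;/p&gt;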

&lt;p&gt;&lt;strong&gt;The missingno Library&lt;/strong&gt;&lt;br&gt;
Missingno is a great Python module that provides a set of visualisations to help you understand the presence and distribution of missing data within a pandas dataframe. These can take the shape of a dendrogram, heatmap, barplot, or matrix plot.&lt;br&gt;
Using these graphs, we can determine where missing values occur, the magnitude of the missingness, and whether any of the missing values are associated with each other.&lt;br&gt;
You can install the missingno library with the pip command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install missingno


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Importing Libraries and Loading the Data&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import pandas as pd
import missingno as msno
df = pd.read_csv('housing.csv')
df.head()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcurncj464nznkjiktb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcurncj464nznkjiktb3.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Analysis with Pandas&lt;/strong&gt;&lt;br&gt;
Before we utilise the missingno library, a few features of the pandas library can already give us an idea of how much data is missing.&lt;/p&gt;

&lt;p&gt;The first method is to use the .describe() method. It returns a table of summary statistics about the dataframe, such as the mean, maximum, and minimum values; its count row also tells you how many non-null entries each column has, which is a first clue about missingness.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

df.describe()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dvy6j9bl3x2b2ud47zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dvy6j9bl3x2b2ud47zf.png" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
Using the .info() method, we can go one step further. This provides a count of the non-null values in addition to a summary of the dataframe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

df.info()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4xa9xa0hc1j8fupotvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo4xa9xa0hc1j8fupotvu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yet another quick technique is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

df.isna().sum()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This produces a summary of the number of missing values in each column of the dataframe. The isna() function flags missing values, returning a Boolean for each element, and sum() adds up the True values, each counting as 1.&lt;/p&gt;
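&lt;p&gt;To make the mechanics concrete, here is a tiny self-contained sketch with made-up data (the column names only mimic the housing dataset):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data: total_bedrooms has two gaps
df = pd.DataFrame({
    "total_bedrooms": [3.0, np.nan, 2.0, np.nan],
    "median_income": [1.5, 2.5, 3.5, 4.5],
})

# isna() returns a Boolean frame; sum() counts the True values per column
missing_counts = df.isna().sum()
print(missing_counts)
```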

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrx7ltvbsuf6d7w9impm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrx7ltvbsuf6d7w9impm.png" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Using missingno to Identify Missing Data&lt;/strong&gt;&lt;br&gt;
There are four types of plots in the missingno library for visualising data completeness: barplots, matrix plots, heatmaps, and dendrogram plots.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

msno.matrix(df)



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uj0kp6r7jgk61v6jfz5.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2uj0kp6r7jgk61v6jfz5.JPG" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
In the resulting graphic, the total_bedrooms column shows a noticeable amount of missing data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

msno.bar(df)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab9z986194xiad4otp33.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fab9z986194xiad4otp33.JPG" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The barplot is a simple plot in which each bar represents a column of the dataframe. The height of the bar indicates how complete that column is, i.e., how many non-null values are present.&lt;/p&gt;

&lt;p&gt;You can see that the bar for total_bedrooms is shorter than the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
Identifying missing data before applying machine learning is a critical step in the data-quality pipeline, and the missingno library makes it possible with a handful of visualisations.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thank you for your time!&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Dealing with Huge Data</title>
      <dc:creator>Alisha Rana</dc:creator>
      <pubDate>Wed, 24 Aug 2022 12:31:00 +0000</pubDate>
      <link>https://dev.to/alisharana/dealing-with-huge-data-55e6</link>
      <guid>https://dev.to/alisharana/dealing-with-huge-data-55e6</guid>
<description>&lt;p&gt;It's quite common, especially in large companies, to have datasets that no longer fit in your computer's memory, or calculations on them take so long that they bore you. This means we must find ways to make the data smaller in memory, or to sample it so we have a subset. Often it is valid to take a sample that is representative of the full dataset and do the calculations, the data science, on that.&lt;br&gt;
&lt;strong&gt;We'll import Pandas and add our data to the dataframe.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
df = pd.read_csv("data/housing.csv")
df.head(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To examine the memory footprint of our loaded data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.memory_usage(deep=True)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I9_n2AE4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gnwim8zx4zey8zu3jx73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I9_n2AE4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gnwim8zx4zey8zu3jx73.png" alt="Image description" width="275" height="284"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;We declare deep=True because:&lt;br&gt;
the real memory footprint of object-dtype columns is ignored by default, and we don't want our string columns to be undercounted here.&lt;/strong&gt;&lt;br&gt;
Checking the dtype of columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.dtypes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AHpBl8C7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v7gvb0ou6fengm9wsr1v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AHpBl8C7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v7gvb0ou6fengm9wsr1v.png" alt="Image description" width="244" height="263"&gt;&lt;/a&gt;&lt;br&gt;
Notice the dtype of ocean_proximity.&lt;br&gt;
Keep in mind that strings can take up a lot of memory, whereas numbers are stored particularly efficiently.&lt;br&gt;
We will override the ocean_proximity dtype with the pandas-specific categorical datatype.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df["ocean_proximity"] = df["ocean_proximity"].astype("category")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
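&lt;p&gt;If you don't have housing.csv at hand, the same effect can be seen on synthetic data; this sketch compares the column's footprint before and after the conversion:&lt;/p&gt;

```python
import pandas as pd

# A low-cardinality string column standing in for ocean_proximity
labels = ["NEAR BAY", "INLAND", "NEAR OCEAN"] * 2000
df = pd.DataFrame({"ocean_proximity": labels})

before = df.memory_usage(deep=True)["ocean_proximity"]
df["ocean_proximity"] = df["ocean_proximity"].astype("category")
after = df.memory_usage(deep=True)["ocean_proximity"]

print(before, after)  # the categorical column is far smaller
```

&lt;p&gt;A categorical stores each distinct string once plus a small integer code per row, which is where the saving comes from.&lt;/p&gt;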



&lt;p&gt;This improves our memory usage.&lt;br&gt;
Let's check the memory again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df.memory_usage(deep=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g7Ad-z6B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/evk7cwjgorsqk9ffvb0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g7Ad-z6B--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/evk7cwjgorsqk9ffvb0l.png" alt="Image description" width="251" height="277"&gt;&lt;/a&gt;&lt;br&gt;
waoooooh!!! You can see the footprint shrinks by more than half.&lt;br&gt;
In this simple way you can make your dataframe much leaner.&lt;br&gt;
However, the issue with this technique is that the full data is first loaded into memory, so the initial footprint is still substantial.&lt;br&gt;
&lt;strong&gt;During the loading process, we can also modify the datatype&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_columns = pd.read_csv("data/housing.csv", usecols=["longitude", "latitude", "ocean_proximity"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we use a dictionary with the column name as the key and the datatype as the value, so you can list as many columns as you like.&lt;br&gt;
pandas applies the dtypes while loading, so the dataframe arrives with the smaller memory footprint from the start.&lt;/p&gt;

&lt;p&gt;We also might not require every column, so instead of importing the whole dataset we will construct a new dataframe and load the data as usual, but this time we define which columns to keep&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_columns = pd.read_csv("data/housing.csv", usecols=["longitude", "latitude", "ocean_proximity"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is another excellent method for saving space when loading the data.&lt;/p&gt;

&lt;p&gt;Sometimes the issue isn't the data loading but the computation itself, because we have a costly function. In these cases we can sample our data, which pandas makes easy: every dataframe has a sample method.&lt;/p&gt;

&lt;p&gt;The random state is really crucial if you want to replicate your analysis or hand it to a coworker or fellow data scientist. It is a really good habit to get into&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df_columns.sample(100, random_state=42)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to repeat an analysis, you must ensure your random process is reproducible, for example by keeping the seed in one place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;random_state = 42
df_columns.sample(100, random_state=random_state)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
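&lt;p&gt;A quick self-contained check of that reproducibility (synthetic data; any dataframe behaves the same):&lt;/p&gt;

```python
import pandas as pd

# Synthetic stand-in for the columns loaded above
df_columns = pd.DataFrame({"longitude": range(1000), "latitude": range(1000)})

random_state = 42
first = df_columns.sample(100, random_state=random_state)
second = df_columns.sample(100, random_state=random_state)

# The same seed yields the exact same rows, so a coworker can reproduce the analysis
print(first.equals(second))  # True
```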



&lt;p&gt;I hope you now know how to load data more efficiently and with a smaller memory footprint.&lt;br&gt;
See you next time!!!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Data Science Dependencies</title>
      <dc:creator>Alisha Rana</dc:creator>
      <pubDate>Mon, 22 Aug 2022 22:46:51 +0000</pubDate>
      <link>https://dev.to/alisharana/data-science-dependencies-3ade</link>
      <guid>https://dev.to/alisharana/data-science-dependencies-3ade</guid>
<description>&lt;p&gt;These are the prerequisites for getting started with data science. Install all of these packages on your computer; you may also use the most recent version of each.&lt;br&gt;
&lt;strong&gt;The dependencies are below.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;python=3.8.3&lt;/li&gt;
&lt;li&gt;pip=20.1.1&lt;/li&gt;
&lt;li&gt;eli5=0.10.1&lt;/li&gt;
&lt;li&gt;folium=0.11.0&lt;/li&gt;
&lt;li&gt;jupyter=1.0.0&lt;/li&gt;
&lt;li&gt;matplotlib=3.3.0&lt;/li&gt;
&lt;li&gt;missingno=0.4.2&lt;/li&gt;
&lt;li&gt;numpy=1.19.1&lt;/li&gt;
&lt;li&gt;pandas=1.0.5&lt;/li&gt;
&lt;li&gt;pandas-profiling=2.8.0&lt;/li&gt;
&lt;li&gt;pandera=0.4.4&lt;/li&gt;
&lt;li&gt;scikit-learn=0.23.1&lt;/li&gt;
&lt;li&gt;scipy=1.5.0&lt;/li&gt;
&lt;li&gt;seaborn=0.10.1&lt;/li&gt;
&lt;li&gt;shap=0.35.0&lt;/li&gt;
&lt;li&gt;sqlalchemy=1.3.18&lt;/li&gt;
&lt;li&gt;voila=0.1.21&lt;/li&gt;
&lt;li&gt;pip:

&lt;ul&gt;
&lt;li&gt;discover-feature-relationships==1.0.3&lt;/li&gt;
&lt;li&gt;quilt==2.9.15&lt;/li&gt;
&lt;li&gt;yellowbrick==1.1&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
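&lt;p&gt;The list above reads like a conda environment specification. One way to capture it, sketched here as a hypothetical environment.yml (the name and channel are assumptions; conda pins use a single =, pip pins use ==), is:&lt;/p&gt;

```yaml
name: datascience
channels:
  - conda-forge
dependencies:
  - python=3.8.3
  - pip=20.1.1
  - eli5=0.10.1
  - folium=0.11.0
  - jupyter=1.0.0
  - matplotlib=3.3.0
  - missingno=0.4.2
  - numpy=1.19.1
  - pandas=1.0.5
  - pandas-profiling=2.8.0
  - pandera=0.4.4
  - scikit-learn=0.23.1
  - scipy=1.5.0
  - seaborn=0.10.1
  - shap=0.35.0
  - sqlalchemy=1.3.18
  - voila=0.1.21
  - pip:
    - discover-feature-relationships==1.0.3
    - quilt==2.9.15
    - yellowbrick==1.1
```

&lt;p&gt;You could then create the environment with conda env create -f environment.yml.&lt;/p&gt;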

</description>
      <category>datascience</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Data loading with Pandas: Loading Excel , CSV , SQL, and any data file</title>
      <dc:creator>Alisha Rana</dc:creator>
      <pubDate>Mon, 22 Aug 2022 22:26:35 +0000</pubDate>
      <link>https://dev.to/alisharana/data-loading-with-pandas-loading-excel-csv-sql-and-any-data-file-kli</link>
      <guid>https://dev.to/alisharana/data-loading-with-pandas-loading-excel-csv-sql-and-any-data-file-kli</guid>
<description>&lt;p&gt;Whether you want to begin with data analysis, extract useful information, or predict something from data, the first step is always loading the data; for that we will be using the pandas library.&lt;br&gt;
&lt;strong&gt;We will use pandas to import data from an Excel table, a CSV file, or a SQL database.&lt;/strong&gt;&lt;br&gt;
Before getting into loading data, you must have pandas installed on the platform where you are working.&lt;br&gt;
I will be using Jupyter Notebook; you can easily get it with Anaconda.&lt;br&gt;
To install pandas, run the following command in a Jupyter Notebook cell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also install it in a plain Python environment, but that's not the focus today.&lt;br&gt;
&lt;strong&gt;This is the first lesson where we touch code, so open up Jupyter Notebook if you want to code along.&lt;/strong&gt; &lt;br&gt;
I have a CSV and an Excel file that I will work with.&lt;br&gt;
&lt;strong&gt;First, you must import the installed pandas library.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writing this would be enough, but because we will be using pandas a lot, we usually give it a shorthand alias&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;pd is the alias most people use. We execute the cell and now we have pandas available in Python.&lt;br&gt;
&lt;strong&gt;To import or read data&lt;/strong&gt;&lt;br&gt;
Type pd.read in your notebook and hit Tab: you will see the various ways you can load data. Here we'll look at the most common ones.&lt;br&gt;
&lt;strong&gt;Import Excel Files&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pd.read_excel("data/crypto.xlsx")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the parentheses you give the location where your file is stored.&lt;br&gt;
&lt;strong&gt;Once loading has completed, you can see the data in a pandas dataframe.&lt;/strong&gt;&lt;br&gt;
Here we didn't save it in a variable,&lt;br&gt;
but you can save the data in a variable as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data=pd.read_excel("data/crypto.xlsx")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Import CSV Files&lt;/strong&gt;&lt;br&gt;
CSV files are slightly different because they contain raw, plain-text data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pd.read_csv("data/crypto.csv")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Loading Data From SQL&lt;/strong&gt;&lt;br&gt;
A great way to store data and make it available to data scientists is through SQL databases.&lt;br&gt;
Many businesses avoid passing Excel files around, since they are easily duplicated.&lt;br&gt;
&lt;strong&gt;In addition to pandas we have to import SQLAlchemy&lt;/strong&gt;&lt;br&gt;
SQLAlchemy is a package that helps Python programmes communicate with databases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlalchemy as sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code below creates the connection; it's called an Engine. If you have a PostgreSQL database, this string should point to the location of your database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connect=sql.create_engine("postgresql://scott:tiger@localhost/test")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Here we go: read the SQL table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = pd.read_sql_table("sales", connect)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
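&lt;p&gt;If you don't have a PostgreSQL server handy, the same pattern can be tried against SQLite, which ships with Python. This is a sketch with a made-up sales table; for plain SQL queries, pd.read_sql also accepts a DB-API connection directly:&lt;/p&gt;

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database standing in for the PostgreSQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("widget", 9.99), ("gadget", 24.50)])
conn.commit()

# Read the table into a dataframe with a query
data = pd.read_sql("SELECT * FROM sales", conn)
print(data)
```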



&lt;p&gt;&lt;strong&gt;Loading any Data Files&lt;/strong&gt;&lt;br&gt;
Pandas works great on structured data, but sometimes data comes in weird formats. This is the general way to work with data files in Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("data/crypto.csv", mode='r') as cryptocurr:
    data = cryptocurr.read()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only want to read the data and not alter it, you indicate that with &lt;strong&gt;mode='r'&lt;/strong&gt;.&lt;br&gt;
Then we give the opened file a name; here I call the file handle &lt;strong&gt;cryptocurr&lt;/strong&gt;.&lt;br&gt;
Inside the with block, where the file is open, we create a variable and use the read function on the handle. Run the cell, then call the variable to see the contents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
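&lt;p&gt;As a self-contained illustration (a temporary file stands in for data/crypto.csv), the raw text read this way can then be split into lines and fields by hand:&lt;/p&gt;

```python
import os
import tempfile

# Write a tiny stand-in for data/crypto.csv so the example runs anywhere
path = os.path.join(tempfile.mkdtemp(), "crypto.csv")
with open(path, mode='w') as f:
    f.write("name,price\nBTC,20000\nETH,1500\n")

# Read it back exactly as in the block above
with open(path, mode='r') as cryptocurr:
    data = cryptocurr.read()

# The raw string can be parsed manually when the format is unusual
rows = [line.split(",") for line in data.strip().split("\n")]
print(rows)  # [['name', 'price'], ['BTC', '20000'], ['ETH', '1500']]
```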



&lt;p&gt;&lt;strong&gt;Hurraaah, we did it!!!!!&lt;/strong&gt;&lt;br&gt;
Loading data into pandas is extremely easy.&lt;br&gt;
Try it out with your own data: if you have an Excel file lying around on your computer, nothing leaves your machine, so you can just pd.read it and play around.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>python</category>
      <category>data</category>
      <category>programming</category>
    </item>
    <item>
      <title>Data Science for Beginners: How to Get Started</title>
      <dc:creator>Alisha Rana</dc:creator>
      <pubDate>Mon, 22 Aug 2022 12:43:00 +0000</pubDate>
      <link>https://dev.to/alisharana/data-science-for-beginners-how-to-get-started-p86</link>
      <guid>https://dev.to/alisharana/data-science-for-beginners-how-to-get-started-p86</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data Science&lt;/strong&gt;&lt;br&gt;
Data science is a trendy topic these days and the field is expanding quickly, but many people are unsure of what the term actually means. In this post, we'll try to clarify what data science is and how to utilise it in business analytics.&lt;br&gt;
&lt;strong&gt;Data&lt;/strong&gt;&lt;br&gt;
First of all, what exactly is data? Data is omnipresent, and people are terrified of it being stolen. Yet data can teach us a significant amount about a person, a company, or an international business.&lt;br&gt;
Using data effectively in data science means building analytical models from the data and making decisions based on them.&lt;br&gt;
&lt;strong&gt;Data Science&lt;/strong&gt;&lt;br&gt;
Three words combine to form the term data science: analysis, statistics, and machine learning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analysis is performed to extract practical insights from the data.&lt;/li&gt;
&lt;li&gt;Statistics is used to identify and interpret patterns in the data.&lt;/li&gt;
&lt;li&gt;Machine learning is used to make forecasts from the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Approaching the literal definition: data science is the application of data to improve decision-making, built on three pillars,&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Analysis&lt;/li&gt;
&lt;li&gt;Statistics&lt;/li&gt;
&lt;li&gt;Machine Learning&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now that you understand what data science is and what it is used for, let's look at the prerequisites you should satisfy before you begin.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools for Data Science&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Python&lt;/strong&gt;&lt;br&gt;
Other programming languages, such as R, are also used in data science, but here we'll talk about the one that is easiest to put into practice.&lt;br&gt;
Python is currently gaining popularity because of its simple syntax, and it runs on a variety of platforms, such as Windows and Mac.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Anaconda&lt;/strong&gt;&lt;br&gt;
Anaconda is convenient because most of the data science packages we need are already included, free of charge, so we don't have to install additional programmes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Jupyter Notebook&lt;/strong&gt;&lt;br&gt;
It is a web-based Python interface that makes learning Python very simple. You can use it to create and share documents containing text, mathematics, and live code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Numpy&lt;/strong&gt;&lt;br&gt;
A scientific computing toolkit for Python that we use whenever we need to perform calculations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Pandas&lt;/strong&gt;&lt;br&gt;
For me, it combines Excel and SQL: a tool for data manipulation and analysis.&lt;/p&gt;

&lt;p&gt;For the machine learning portion and model validation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Scikit-learn&lt;/strong&gt;&lt;br&gt;
It is Python's most practical and reliable machine learning library. It offers a variety of effective methods for statistical modelling and machine learning, including  dimensionality reduction, clustering, and regression, all through a Python interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Matplotlib&lt;/strong&gt;&lt;br&gt;
A cross-platform data-visualisation and graphical-charting library built on NumPy, Python's numerical extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Seaborn&lt;/strong&gt;&lt;br&gt;
Built upon Matplotlib, Seaborn creates attractive visualisations of statistical data in single lines of code.&lt;/p&gt;
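&lt;p&gt;As a small taste of how two of these tools fit together (a sketch with made-up data):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# NumPy generates the numbers; pandas wraps them in a labelled dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"height_cm": rng.normal(170, 10, size=100)})

# .describe() gives count, mean, std, min, quartiles, and max in one call
summary = df.describe()
print(summary)
```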

&lt;p&gt;All of these tools are free and open source, and they are a cornerstone of data science.&lt;br&gt;
I hope you found this blog fascinating; I hope to see you again soon.&lt;/p&gt;

</description>
<category>business</category>
      <category>data</category>
      <category>science</category>
      <category>python</category>
    </item>
  </channel>
</rss>
