DEV Community

Lina Dias


Going into a Data Science Python course!

Hi guys!

As you may know from my previous posts (on my Portuguese blog), I've been taking a mini-course on Udemy about data visualization in Python. It was recommended to me as a way to learn Python, so I can join the AI lab at my university ASAP.

First, I was formally introduced to matplotlib.pyplot through the import command. You can give it any alias you like (such as mpl), but plt is the usual convention:
import matplotlib.pyplot as plt

Next, I was introduced to Google Colab, a Python notebook platform that runs in the browser. It helped me a lot, since I can't install a Python IDE on the computer I'm currently using.

I made a line graph with the following code (please try it, it was a cute experience):

import matplotlib.pyplot as plt 

x = [1, 2] #giving x and y some values so I can plot something
y = [2, 3]

plt.plot(x, y) #plotting the graph
plt.show() #show the graph when I hit "Run"

After creating my first graph, I added a title and axis labels, so I could identify what each part of the graph represents.

import matplotlib.pyplot as plt 

x = [1, 2, 5] #I added one more value in each variable 
y = [2, 3, 7]

plt.title("My first graph") #this is a title for my graph
plt.xlabel("Axis X") #creating labels for each axis
plt.ylabel("Axis Y")

plt.plot(x, y) 
plt.show()

Now that we've learned how to make line graphs, shall we make bar graphs?

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5] #now, x represents each one of the bars
y = [2, 3, 7, 1, 0] #and y, their sizes

titulo = "Bar graph" #creating variables for the title and the axis labels
eixox = "Axis X"
eixoy = "Axis Y"

plt.title(titulo)
plt.xlabel(eixox)
plt.ylabel(eixoy)

plt.bar(x, y) #plotting the bar graph
plt.show()

With what we now know about these two types of graphs, we can do at least two things: compare graphs and/or combine them!

import matplotlib.pyplot as plt

x1 = [1, 3, 5, 7, 9] #odd numbers for the bars!
y1 = [2, 3, 7, 1, 0] #the same arbitrary sizes as in the previous example

x2 = [2, 4, 6, 8, 10] #even numbers for the other bars
y2 = [5, 1, 3, 7, 4] #new arbitrary sizes for these bars

titulo = "Bar graphs" #"titulo" is Portuguese for "title"
eixox = "Axis X" #"eixo" is Portuguese for "axis"
eixoy = "Axis Y"

#this part you already know
plt.title(titulo)
plt.xlabel(eixox)
plt.ylabel(eixoy)

plt.bar(x1, y1)
plt.bar(x2, y2) #drawn one after the other, but shown together on the same axes!

plt.show()
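The code above draws both sets of bars on the same axes, which is the "combine" option. If you would rather compare them side by side, one option (not covered in the course, so treat this as a sketch) is plt.subplots, which creates one figure containing a grid of separate axes. The data is the same as above; passing sharey=True is my own choice, so both panels use the same vertical scale and the bar heights can be compared directly.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs as a plain script; remove this line in Colab
import matplotlib.pyplot as plt

x1 = [1, 3, 5, 7, 9]
y1 = [2, 3, 7, 1, 0]
x2 = [2, 4, 6, 8, 10]
y2 = [5, 1, 3, 7, 4]

# one figure with two axes placed side by side (1 row, 2 columns);
# sharey=True gives both panels the same y-axis range
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

bars1 = ax1.bar(x1, y1)
ax1.set_title("Odd bars")
ax1.set_xlabel("Axis X")
ax1.set_ylabel("Axis Y")

bars2 = ax2.bar(x2, y2, color="orange")
ax2.set_title("Even bars")
ax2.set_xlabel("Axis X")

fig.suptitle("Bar graphs")  # one title for the whole figure
plt.show()
```

With the object-oriented style, the plt.title/plt.xlabel calls become ax.set_title/ax.set_xlabel on each axes, because there is more than one set of axes to label.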

We can also combine any number of graph types! For example, you can add a plt.plot(x, y) call to the bar graph code. There is also one more type of graph: the scatter plot (or dispersion graph), which you create with plt.scatter(x, y).
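Here is a small sketch of that idea, putting all three kinds of graph on the same axes. The values are made up for illustration; each call draws on top of what is already there, just like the two plt.bar calls in the previous example.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs as a plain script; remove this line in Colab
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 7, 1, 0]  # illustrative values

plt.title("Bars, line and scatter together")
plt.xlabel("Axis X")
plt.ylabel("Axis Y")

bars = plt.bar(x, y)      # bar graph: one bar per x value
lines = plt.plot(x, y)    # line graph through the same points (plt.plot returns a list of lines)
dots = plt.scatter(x, y)  # scatter plot: one dot per (x, y) pair

plt.show()
```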

Quick note on this English version: in the Portuguese version, I included images of the graphs I made, so you could see the results even if you can't access Google Colab right now. I wrote the text with those images in mind, and now I'm adapting it for DEV.

You may have noticed that the colors change in the comparison graphs. Those are the default colors, but you can change them to any color you want (mainly using matplotlib's color codes, which you can find here) with the color parameter, as in plt.scatter(x, y, color="r"), which makes the dots red. You can also use the label parameter to add captions to the graph, as in plt.plot(x, y, label="My line"), followed by a call to plt.legend() so the captions actually show up in the image.
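A minimal sketch combining both parameters: a green line and red dots, each with a label, and plt.legend() to display those labels. The data and the label texts are just examples; "g" and "r" are matplotlib's one-letter codes for green and red.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs as a plain script; remove this line in Colab
import matplotlib.pyplot as plt

x = [1, 2, 5]
y = [2, 3, 7]

line, = plt.plot(x, y, color="g", label="My line")   # "g" = green
dots = plt.scatter(x, y, color="r", label="My dots")  # "r" = red dots
legend = plt.legend()  # without this call, the labels are stored but not drawn

plt.show()
```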

You can save your figures with plt.savefig("figurename.png"). Changing the extension from "png" to "pdf" produces a vector image, which has (really) good print quality. The dpi parameter (dots per inch) sets the resolution of raster images such as PNG: apparently 300 is a good value, while 72 is only so-so. For example: plt.savefig("figurename.png", dpi=300).
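A small sketch saving the same graph in both formats. One detail worth knowing: call plt.savefig() before plt.show(), because the figure may already be closed once show() has run, and saving afterwards can produce a blank image. The file names are the placeholders from the text; the os module is only used here to check that the files were created.

```python
import os

import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs as a plain script; remove this line in Colab
import matplotlib.pyplot as plt

x = [1, 2, 5]
y = [2, 3, 7]

plt.title("My first graph")
plt.plot(x, y)

# raster image: dpi sets the resolution in dots per inch
plt.savefig("figurename.png", dpi=300)
# vector image: scales without losing quality, good for printing
plt.savefig("figurename.pdf")

plt.show()  # show only after saving

print(os.path.exists("figurename.png"), os.path.exists("figurename.pdf"))
```

With the default figure size of 6.4 × 4.8 inches, dpi=300 gives a PNG of roughly 1920 × 1440 pixels; with dpi=72 it would be only about 461 × 346.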

The course has a small case study with data on Brazilian population growth from 1980 to 2016, and then introduces the boxplot. Boxplots are box-shaped diagrams that show how data is spread across its quartiles. This is the topic I'm currently studying in Probability and Statistics, so I was very interested.

Basically, if you enter code like this:

import matplotlib.pyplot as plt
import random #Python's built-in module for generating random numbers!

vetor = [] #an empty list to store the values ("vetor" is Portuguese for "vector")
for i in range(100): #repeat 100 times (i goes from 0 to 99)...
    numAleatorio = random.randint(0, 50) #random number ("número aleatório") between 0 and 50, both included
    vetor.append(numAleatorio) #add this number to the list so we can create the boxplot

plt.boxplot(vetor) #plot the boxplot of all the values in the list
plt.show()

...and then press Run, it generates an image that looks different from the other graphs, so let me explain what I know about it.

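If there were any points drawn above the top line (or below the bottom one), they would be outliers: values very different from the rest of the data. By default, matplotlib marks a value as an outlier when it lies more than 1.5 times the height of the box (the interquartile range) beyond the box. The top line (the upper whisker) marks the largest value that is not an outlier; here it is the largest number that was drawn, which can be at most 50. Speaking of 50, the main rectangle in the figure contains the middle 50% of the data. The bottom line is the smallest value drawn, which can be as low as 0.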
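To see an outlier for yourself, here is a sketch that repeats the boxplot code above but adds one value far outside the 0 to 50 range (the 200 is an arbitrary choice of mine, and so is the fixed random seed). plt.boxplot returns a dictionary with the parts it drew; its "fliers" entry holds the points drawn individually as outliers, and "whiskers" holds the lower and upper whisker lines.

```python
import random

import matplotlib
matplotlib.use("Agg")  # draw off-screen so this also runs as a plain script; remove this line in Colab
import matplotlib.pyplot as plt

random.seed(1)  # fixed seed so every run draws the same numbers

vetor = [random.randint(0, 50) for _ in range(100)]  # same idea as the loop above
vetor.append(200)  # one value far away from all the others

result = plt.boxplot(vetor)
plt.show()

# points drawn as separate markers beyond the whiskers
outliers = list(result["fliers"][0].get_ydata())
# whiskers[1] is the upper whisker; its second y value is where it ends
upper_whisker = result["whiskers"][1].get_ydata()[1]
print("outliers:", outliers)
print("top of the upper whisker:", upper_whisker)
```

The 200 shows up as a lone marker above the plot, while the upper whisker still stops at the largest of the original numbers.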

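But where does Statistics come in? Quartiles are the values that divide the sorted data into four equal parts, each holding 25% of it. The rectangle covers two of those parts, since 2/4 = 50%. Reading from the bottom up, and assuming there are no outliers (as in this example): the bottom line is the minimum (0%). The bottom edge of the rectangle is the 1st quartile (25%). The median (the colored line inside the rectangle, orange in recent matplotlib versions) is the 2nd quartile (50%). The top edge of the rectangle is the 3rd quartile (75%). The top line is the maximum (100%), sometimes called the 4th quartile.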
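The same numbers can be computed directly. This sketch uses Python's statistics module (my choice, not the course's) with method="inclusive", which uses the same linear interpolation as NumPy's default percentile calculation that matplotlib's boxplot relies on. It then applies the 1.5 × IQR rule to find where the whiskers end. The fixed seed is an assumption added so the run is repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so every run draws the same numbers
vetor = [random.randint(0, 50) for _ in range(100)]

# three cut points that split the sorted data into four equal parts
q1, median, q3 = statistics.quantiles(vetor, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# matplotlib's default rule: anything beyond 1.5 * IQR from the box is an outlier
low_limit = q1 - 1.5 * iqr
high_limit = q3 + 1.5 * iqr

# the whiskers end at the most extreme values still inside the limits
lower_whisker = min(v for v in vetor if v >= low_limit)
upper_whisker = max(v for v in vetor if v <= high_limit)
outliers = [v for v in vetor if v < low_limit or v > high_limit]

print("minimum (lower whisker):", lower_whisker)
print("1st quartile (25%):", q1)
print("median (50%):", median)
print("3rd quartile (75%):", q3)
print("maximum (upper whisker):", upper_whisker)
print("outliers:", outliers)
```

If you pass the same vetor to plt.boxplot, the box edges, the median line, and the whisker ends should sit exactly at these values.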

To finish the course, there was a case study about Bioinformatics, but for some reason I couldn't fully figure out, my code kept producing errors. I posted the code on StackOverflow hoping someone, someday, would help me find the error, and if you could take a look at it too, I would be very thankful. Edit: I've figured it out somehow. The link is now deactivated.

I recommend this course for people who, like me, are starting to learn Python and Data Science.
If you tried it, please leave your feedback in the comments :)
