INTRODUCTION
What is Python?
Python is a popular high-level, general-purpose, interpreted programming language that supports object-oriented programming and interactive use. In recent years it has gained popularity in areas such as data science, analytics, machine learning and web development. In this article we will discuss the basics of Python for data science.
What is data science?
Data science is the process of deriving knowledge and insights from large and diverse sets of data by carefully organizing, processing and analysing them.
Put simply, it is the study of data, using modern tools, to extract meaningful insights for a business.
It involves mathematical and statistical modelling, data extraction, and data visualization tools and techniques.
Where is data science applied?
Data science helps various industries solve problems by linking related data and keeping it for future use.
These industries and fields include:
1) Health care
2) Financial risk management
3) Energy sector
4) Computer vision
5) Transport industry
Why should you use Python for data science?
a) Python is a very versatile language.
b) It is easy to learn.
c) It has many libraries that make it possible to perform complex tasks with just a few lines of code.
d) It is open source, so it is free to use and modify.
e) It is well supported by a large community.
Which tools and libraries should I use?
In the real world, data comes in all shapes and forms and is often raw, so it has to be wrangled before it can be used. Preparing and analyzing that data is a primary part of a data scientist's job.
Processing, cleaning and transforming data so that it can be analyzed and modelled to produce insights is usually challenging.
Python has many open-source libraries that help with all of these tasks. Among them are pandas, NumPy and Matplotlib.
You don't need to learn these tools in depth to get started, as long as you can organize and clean your data, apply some mathematical formulae and run statistical calculations. You will also need to learn how to import Python modules.
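As a minimal sketch of importing modules, the snippet below uses only Python's standard library, so nothing needs to be installed. The list of values is made up for illustration.

```python
# Three common ways to import, shown with standard-library modules.
import math                    # import a whole module
import statistics as stats     # import a module under a shorter alias
from statistics import mean    # import a single function from a module

values = [12, 15, 11, 18, 14]  # made-up sample data

average = mean(values)          # arithmetic mean
middle = stats.median(values)   # median, accessed through the alias
spread = stats.stdev(values)    # sample standard deviation
root = math.sqrt(16)            # square root from the math module

print(average, middle, spread, root)
```

Third-party libraries such as pandas are imported the same way, usually under a conventional alias, for example import pandas as pd.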
JUPYTER NOTEBOOK
Jupyter Notebook is a web-based interactive computing platform that combines live code, equations, narrative text and visualizations.
It allows you to write code and collaborate with other data scientists using a web browser.
It is an excellent tool for developing and presenting data science projects, because it lets you integrate code and its output into a single document alongside visualizations, mathematical formulae and explanations.
PANDAS
Pandas is an essential tool for every data scientist. It allows you to clean and massage your data as well as analyze it.
You can load data from various sources, such as CSV files, Excel files and databases.
It contains a variety of functions for importing, exporting, indexing and manipulating data.
Pandas also provides handy data structures, such as the DataFrame (a two-dimensional table) and the Series (a one-dimensional labelled array), along with efficient methods for handling them.
It can be used to reshape, merge, split and aggregate data.
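The operations described above can be sketched in a few lines. This is only an illustration: the CSV text, column names and the regions table are made up, and the data is read from a string so the example runs without any files. In practice you would pass a file path to pd.read_csv.

```python
import io

import pandas as pd

# Made-up CSV contents; one sales value is missing on purpose.
csv_text = """city,month,sales
Nairobi,Jan,120
Nairobi,Feb,
Mombasa,Jan,80
Mombasa,Feb,95
"""

# Load: read the CSV data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: replace the missing sales value with 0.
df["sales"] = df["sales"].fillna(0)

# A single column of a DataFrame is a Series.
sales = df["sales"]

# Aggregate: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Merge: attach a region to every row using a second (made-up) table.
regions = pd.DataFrame(
    {"city": ["Nairobi", "Mombasa"], "region": ["Central", "Coast"]}
)
merged = df.merge(regions, on="city", how="left")

print(totals)
print(merged)
```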
There are multiple courses on Udemy, DataCamp and YouTube on data science with Python and pandas.
NUMPY
NumPy is a Python library that provides a simple yet powerful data structure known as the n-dimensional array (ndarray). It aims to provide an array object that is up to 50x faster than traditional Python lists, along with many supporting functions that make working with ndarrays easy. These tools are used in data science, where speed and resources are very important.
It can perform mathematical and logical operations on arrays and has a variety of useful capabilities for matrices as well.
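A short sketch of these array operations, using two small made-up 2x2 arrays:

```python
import numpy as np

# Two small made-up 2x2 arrays.
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

elementwise_sum = a + b   # mathematical operation applied element by element
doubled = a * 2           # the scalar is applied to every element
mask = a > 2              # logical operation, gives an array of booleans
product = a @ b           # matrix multiplication
transposed = a.T          # matrix transpose

print(elementwise_sum)
print(mask)
print(product)
```

Note that a * b would multiply element by element, while a @ b performs true matrix multiplication.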
MATPLOTLIB
Matplotlib is a low-level Python library for data visualization. It is used to communicate the findings of a data analysis project through graphs and other visualizations.
It works well with arrays and data frames, and treats axes and figures as objects.
It is highly customizable and pairs well with pandas and NumPy for data analysis.
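Here is a minimal sketch of the figure and axes objects working with a pandas DataFrame. The sales figures and the output file name monthly_sales.png are made up for illustration. The Agg backend is selected only so the script runs without a display; inside Jupyter Notebook you can drop that line and the plot appears below the cell.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly sales data held in a DataFrame backed by a NumPy array.
df = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": np.array([120, 95, 140, 160]),
    }
)

# The figure is the whole image; the axes object is one plot inside it.
fig, ax = plt.subplots(figsize=(6, 4))

# Draw a line chart and customize it through the axes object.
ax.plot(df["month"], df["sales"], marker="o")
ax.set_title("Monthly sales (made-up data)")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# Save the figure to an image file (hypothetical file name).
fig.savefig("monthly_sales.png")
```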
That's all for now. See you in the next article, where I will be covering data analysis.