DEV Community

amigos-maker


What is scikit-learn?

Scikit-learn (pronounced "sai-kit learn") is a free, open source machine learning library for Python. In programming, a library is simply a collection of routines, scripts, files, programs and functions that can be used or referenced in your own code.

Machine learning is the study of algorithms that let computer systems perform a specific task, or a group of tasks, without explicit instructions from a human. Instead, the system relies on patterns in past data and experience to draw inferences, which it then uses to perform the required task.

Scikit-learn is therefore a collection of functions and routines for Python that provides many supervised and unsupervised learning algorithms.
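To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch. The toy data is made up for illustration, and the choice of K-Nearest Neighbors and K-Means is only one possible pairing: the supervised model is given labels, while the unsupervised model sees only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the supervised model

# Supervised: the classifier learns from the features AND the labels.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
supervised_pred = clf.predict(np.array([[0.5, 0.5], [10.5, 10.5]]))

# Unsupervised: the clustering model is given only the features
# and has to discover the two groups by itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print(supervised_pred)
print(cluster_ids)
```

Note that the cluster ids returned by K-Means are arbitrary: the two groups are found, but which one is called 0 and which one is called 1 is not tied to the labels in `y`.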

Scikit-learn is built on existing and familiar scientific Python technology, chiefly NumPy, SciPy and Matplotlib, and it works well alongside tools such as pandas, IPython and SymPy.

Machine learning models are commonly grouped into two broad kinds:

  • The traditional machine learning model
  • Artificial neural network

Scikit-learn is a Python library that is mainly used for building traditional machine learning models. Other Python libraries, such as Keras, focus on building artificial neural networks.

Brief history of scikit-learn

Scikit-learn was initially developed by David Cournapeau, a data scientist who has worked for Silveregg, a SaaS company, and, for years, for Enthought, a scientific consulting company. The scikit-learn project started out as scikits.learn, a Google Summer of Code project, in 2007.

Google Summer of Code, which had started two years before the scikit-learn project, is an annual international program sponsored by Google. It awards stipends to university students aged 18 or over who successfully complete a free and open source software project during the summer.

The name comes from the idea that it is a "SciKit" (SciPy Toolkit), that is, a separately developed and distributed third-party extension to SciPy.

Since then, the original codebase has been rewritten by other developers, such as Fabian Pedregosa, Vincent Michel, Alexandre Gramfort and Gael Varoquaux.

All of these computer scientists come from the French Institute for Research in Computer Science and Automation (INRIA), located in Rocquencourt, France.

These scientists took charge of the project in 2010 and made its first public release on 1st February, 2010. A stable release of the software was made on 17th May, 2019.

Why scikit-learn?

So what can you do with scikit-learn? You can run traditional machine learning algorithms, many of which are already implemented in the library.

Some of the functionality provided by scikit-learn includes:

  • Regression: this includes Linear Regression
  • Clustering: this includes K-Means and K-Means++
  • Model Selection
  • Preprocessing: this includes Min-Max Normalization
  • Classification: this includes K-Nearest Neighbors and Logistic Regression (which, despite its name, is a classification method)
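As a rough illustration of two items from the list above, the following sketch applies Min-Max normalization and fits a linear regression. The data is made up (it follows y = 2x + 1 exactly), so treat this as a toy example rather than a realistic workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Made-up data that follows y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Preprocessing: rescale the feature so that it lies in the range [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Regression: fit a straight line to the original (unscaled) data
# and predict the target for a new input.
reg = LinearRegression()
reg.fit(X, y)
prediction = reg.predict(np.array([[5.0]]))

print(X_scaled.ravel())
print(prediction)
```

Both objects are used in the same way: you create them, call a `fit` method on your data, and then call `transform` or `predict` to get a result.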

Benefits of using scikit-learn include:

  • It integrates well with NumPy
  • It has good documentation and a very clean, consistent API
  • It provides lots of useful utilities, for example for working with sparse matrices and for splitting data.
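The consistent API and the NumPy integration can be seen in a small sketch like the one below. The data is randomly generated for illustration, and the two classifiers are an arbitrary choice; the point is that different models are trained and scored through the same `fit` and `score` calls on plain NumPy arrays.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: two clearly separated clouds of points, stored as NumPy arrays.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Two different algorithms, one shared interface.
models = [LogisticRegression(), KNeighborsClassifier(n_neighbors=3)]

scores = {}
for model in models:
    model.fit(X, y)                                   # same call for every model
    scores[type(model).__name__] = model.score(X, y)  # accuracy on the training data

print(scores)
```

Swapping in another classifier usually only means changing the entry in the `models` list, which is a large part of why the API is often described as clean.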

Scikit-learn is also useful for its supporting utilities, such as data splitting, even if it is not the main library you use to build your models.
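For instance, data splitting can be used on its own, without training any scikit-learn model. The sketch below uses `train_test_split` on made-up NumPy arrays; the array sizes, `test_size` and `random_state` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 12 samples with 2 features each, plus one target value per sample.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Hold out 25% of the samples for testing. The rows are shuffled,
# and random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

The resulting arrays can then be passed to any other library, since they are ordinary NumPy arrays with the features and targets still lined up row by row.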
