John Barku

Data Science

What is Data Science?
Data science is an interdisciplinary field focused on extracting knowledge from data: collecting, manipulating, and analyzing it in order to answer questions or make recommendations.
A data scientist is a professional who writes code and combines it with statistical knowledge to draw insights from data.

Categories of Data Science:
Data Management: Collecting, persisting, and retrieving data securely, efficiently, and cost-effectively.

Data Integration and Transformation: Extracting, transforming, and loading data (ETL). Data is often spread across multiple repositories, such as separate databases, so it has to be pulled out, converted to a consistent format, and loaded into one place before it can be analyzed.

Data Visualization: Graphical representation of data and information in charts, plots, maps, and animations. A good visual conveys patterns more effectively than a raw table of numbers.

Model Building: Training a model on the data with a suitable machine-learning algorithm, so that the model learns the patterns in the data and can make predictions.

Model Deployment: Integrate a model into a production environment. Here the machine learning model is made available to third-party apps via APIs, helping them make data-based decisions.

Model Monitoring and Assessment: Monitoring tracks how deployed models perform over time. Assessment checks them for accuracy, fairness, and robustness.
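
To make the model building and assessment steps concrete, here is a minimal sketch using only the Python standard library. The data (hours studied vs. exam score) and the helper names `fit_line` and `predict` are made up for illustration; a real project would normally use a library such as scikit-learn and far more data.

```python
from statistics import mean

# Hypothetical toy data: hours studied (x) and exam score (y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]

# Hold out the last two points so the assessment uses data the model never saw.
train_x, test_x = hours[:4], hours[4:]
train_y, test_y = scores[:4], scores[4:]


def fit_line(xs, ys):
    """Model building: fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Use the fitted line to predict a score for x hours."""
    return slope * x + intercept


# Model assessment: mean absolute error on the held-out points.
mae = mean(abs(predict(x) - y) for x, y in zip(test_x, test_y))
print(f"slope={slope:.2f} intercept={intercept:.2f} MAE={mae:.2f}")
```

Splitting the data before fitting is the important design choice here: an error measured on the training points would look better than the model deserves.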

Open Source Tools for Data Science
Data management:

  1. Relational Databases:
    a. MySQL
    b. PostgreSQL

  2. NoSQL Databases:
    a. MongoDB
    b. Apache CouchDB
    c. Apache Cassandra

  3. File-Based Tools:
    a. Hadoop Distributed File System (HDFS)
    b. Cloud file systems (for example, Ceph)

Data Integration and Transformation:

  1. Apache Airflow
  2. Kubeflow
  3. Apache Spark SQL
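
The tools above run ETL at scale, but the pattern itself is small enough to sketch with the standard library. The example below is only an illustration of extract, transform, and load: the two data sources, the `orders` table, and the function names are all hypothetical, and SQLite stands in for whatever database a real pipeline would target. It does not use the API of any tool in the list.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export (an in-memory string stands in for a file).
CSV_EXPORT = """customer,amount
alice, 10.50
BOB,20
alice,4.5
"""

# Hypothetical source 2: records from another system, e.g. an API response.
API_RECORDS = [{"customer": "Bob ", "amount": "5.25"}]


def extract():
    """Extract: read raw rows from both sources."""
    yield from csv.DictReader(io.StringIO(CSV_EXPORT))
    yield from API_RECORDS


def transform(rows):
    """Transform: normalize customer names and convert amounts to floats."""
    for row in rows:
        yield row["customer"].strip().lower(), float(row["amount"])


def load(records, conn):
    """Load: write the cleaned records into a single relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)

# After loading, data from both sources can be queried together.
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
print(totals)
```

Keeping the three stages as separate functions mirrors how orchestration tools model a pipeline: each stage can be tested, retried, or replaced on its own.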

Data Visualization:

  1. PixieDust
  2. Hue
  3. Kibana
  4. Apache Superset

Model Deployment:

  1. Apache PredictionIO
  2. Seldon
  3. MLeap
  4. TensorFlow Serving
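
Whatever serving tool is used, deployment usually means putting the model behind a request/response interface. The sketch below shows that idea with a plain function that takes a JSON request body and returns a status code and a JSON response. The `MODEL` coefficients, the `handle_predict` name, and the `{"x": ...}` request shape are assumptions made for this example, not the interface of any tool listed above.

```python
import json

# Hypothetical "trained" model: fixed coefficients standing in for a real one.
MODEL = {"slope": 4.2, "intercept": 47.5}


def handle_predict(request_body):
    """Return (HTTP status code, JSON response body) for a prediction request."""
    try:
        x = float(json.loads(request_body)["x"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing "x" field, or a non-numeric value.
        return 400, json.dumps({"error": "expected a JSON object with a numeric 'x'"})
    y = MODEL["slope"] * x + MODEL["intercept"]
    return 200, json.dumps({"prediction": y})


status, body = handle_predict('{"x": 2}')
print(status, body)
```

In production, a function like this would sit behind a web framework or a model server, which would add concerns such as authentication, batching, and model versioning.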

Model Monitoring and Assessment:

  1. ModelDB
  2. Prometheus
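
As a rough idea of what monitoring a deployed model involves, the sketch below keeps a sliding window of recent predictions and flags the model when its accuracy drops below a threshold. The `AccuracyMonitor` class, the window size, and the threshold are hypothetical choices for illustration. A monitoring system such as Prometheus would typically collect a value like this as a metric rather than compute it this way.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions of a deployed model."""

    def __init__(self, window=100, threshold=0.8):
        # deque(maxlen=...) silently drops the oldest outcome when full.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True when recent accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("spam", "spam"), ("ham", "ham"),
                          ("spam", "ham"), ("ham", "spam"), ("ham", "spam")]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_attention())
```

A sliding window is used instead of an all-time average so that a recent drop in quality is not hidden by a long history of good predictions.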

Libraries for Data Science
Scientific Computing Libraries:
• pandas
• NumPy

Visualization Libraries:
• Matplotlib
• Seaborn

Machine Learning and Deep Learning Libraries:
• scikit-learn
• Keras
• TensorFlow
• PyTorch
