Data Science
What is Data Science?
Data Science is an interdisciplinary field focused on extracting knowledge from data by manipulating and analyzing it, and on using the results to answer questions or make recommendations.
A data scientist is a professional who writes code and combines it with statistical knowledge to draw insights from data.
Categories of Data Science:
Data Management: Collecting, persisting, and retrieving data securely, efficiently, and cost-effectively
Data Integration and Transformation: Extract, transform, and load (ETL) data. The data is often spread across multiple repositories, such as databases, so it has to be pulled together and converted into a consistent form.
Data Visualization: Graphical representation of data and information in charts, plots, maps, and animations. A good visualization conveys data more effectively than raw numbers.
Model Building: Train a model on the data with a suitable machine-learning algorithm so that it learns the patterns in the data.
Model Deployment: Integrate a model into a production environment. Here the machine learning model is made available to third-party apps via APIs, helping them make data-based decisions.
Model Monitoring and Assessment: Track deployed models continuously and assess them for accuracy, fairness, and robustness.
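To make the categories above more concrete, here is a minimal sketch of three of them (integration and transformation, model building, and assessment) using only the Python standard library. The data, the column names (hours_studied, exam_score), and the helper functions are all made up for illustration; a real project would normally rely on dedicated tools and libraries rather than hand-written code like this.

```python
import csv
import io

# Hypothetical raw data standing in for an extract from a database or file.
RAW_CSV = """hours_studied,exam_score
1,52
2,55
3,61
,70
4,64
5,70
6,73
"""


def extract_and_transform(raw_text):
    """Integration and transformation: parse the CSV and drop incomplete rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        if record["hours_studied"] and record["exam_score"]:
            rows.append((float(record["hours_studied"]), float(record["exam_score"])))
    return rows


def fit_line(rows):
    """Model building: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(rows, slope, intercept):
    """Assessment: average absolute gap between predictions and actual values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


rows = extract_and_transform(RAW_CSV)
train, holdout = rows[:4], rows[4:]      # keep the last rows aside for assessment
slope, intercept = fit_line(train)
error = mean_absolute_error(holdout, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, holdout MAE={error:.2f}")
```

Holding back a few rows for assessment mirrors the idea behind model monitoring: a model is judged on data it did not see during training.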
Open Source Tools for Data Science
Data Management:
Relational Databases:
a. MySQL
b. PostgreSQL
NoSQL Databases:
a. MongoDB
b. Apache CouchDB
c. Apache Cassandra
File-Based Tools:
a. Hadoop Distributed File System (HDFS)
b. Cloud File System
Data Integration and Transformation:
- Apache Airflow
- Kubeflow
- Apache Spark SQL
Data Visualization:
- PixieDust
- Hue
- Kibana
- Apache Superset
Model Deployment:
- Apache PredictionIO
- Seldon
- MLeap
- TensorFlow
Model Monitoring and Assessment:
- ModelDB
- Prometheus
Libraries for Data Science
Scientific Computing Libraries:
• Pandas
• NumPy
Visualization Libraries:
• Matplotlib
• Seaborn
Machine Learning and Deep Learning Libraries:
• scikit-learn
• Keras
• TensorFlow
• PyTorch
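As a rough illustration of how some of these libraries fit together, the sketch below uses Pandas to hold a small table, NumPy to compute a correlation, and scikit-learn to fit a simple model. The dataset and column names are invented for the example, and it assumes the three packages are installed. Plotting with Matplotlib or Seaborn is left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data; the values are made up for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 64, 70, 73],
})

# Pandas: summary statistics for every numeric column.
summary = df.describe()

# NumPy: Pearson correlation between the two columns.
correlation = np.corrcoef(df["hours_studied"], df["exam_score"])[0, 1]

# scikit-learn: fit a linear model and predict the score for an unseen input.
model = LinearRegression()
model.fit(df[["hours_studied"]], df["exam_score"])
predicted = model.predict(pd.DataFrame({"hours_studied": [7]}))

print(summary)
print(f"correlation={correlation:.3f}, predicted score for 7 hours={predicted[0]:.1f}")
```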