DEV Community

Kimeu

Introduction to Python for Data Engineering

Python is one of the most popular programming languages for data analysis, thanks to packages such as pandas and NumPy that make working with data efficient.
To become an expert in data engineering, one needs knowledge of both software development and data analysis.
Python works well for data analysis, partly because Python code can be run interactively, cell by cell, in Jupyter Notebook.
For example, to convert a column to the integer data type in pandas (astype returns a new Series, so the result has to be assigned back to the column):
df['colName'] = df['colName'].astype(int)
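A minimal, self-contained sketch of that conversion. The sample data is made up, and the column name "colName" simply mirrors the placeholder above. It also shows one possible approach, pd.to_numeric with errors="coerce" plus the nullable "Int64" dtype, for columns where a plain astype(int) would raise an error because of missing or non-numeric values.

```python
import pandas as pd

# Made-up sample data: numbers stored as strings, as often happens after loading a CSV.
df = pd.DataFrame({"colName": ["1", "2", "3"]})
print(df["colName"].dtype)  # a string/object dtype, not an integer dtype yet

# astype() returns a new Series, so assign it back to keep the conversion.
df["colName"] = df["colName"].astype(int)
print(df["colName"].dtype)  # an integer dtype such as int64

# astype(int) raises on values like "five" or None. One alternative: coerce
# unparseable values to NaN, then use pandas' nullable integer dtype "Int64".
messy = pd.Series(["4", "five", None])
clean = pd.to_numeric(messy, errors="coerce").astype("Int64")
print(clean.tolist())  # 4 followed by two missing values (<NA>)
```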
Jupyter Notebook makes data analysis easier: it is an application in which you can import packages, run operations on a collected dataset step by step, and inspect the results as you go to draw meaning from the data.
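It also helps to understand how pandas data types differ from Python's built-in types.
For example, pandas (at least in version 2.x and earlier) stores a column of strings with the object dtype by default, while plain Python represents each value as a str.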
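A small sketch comparing the two, assuming pandas is installed. The names are arbitrary sample data. The exact dtype pandas reports for the column depends on the installed version, so the code prints it rather than asserting a particular value.

```python
import pandas as pd

# Plain Python: each value is an ordinary str.
names = ["Ada", "Grace", "Linus"]
print(type(names[0]))  # <class 'str'>

# pandas: the same strings stored as a DataFrame column. In pandas 2.x and
# earlier this dtype is reported as "object"; newer releases may report a
# dedicated string dtype instead.
df = pd.DataFrame({"name": names})
print(df["name"].dtype)

# A single element taken out of the column is still a Python str.
print(type(df["name"].iloc[0]))
```

Checking df.dtypes after loading a dataset is a quick way to spot text columns that should really be numbers or dates.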
During data collection, it is advisable to get data from an API, where one is available, rather than by web scraping. With web scraping, the site's underlying HTML structure can change at any time, so the same scraping code may break or return different data, and results on the dataset cannot be reliably reproduced.
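A minimal sketch of loading API data into pandas, assuming the API returns a JSON array of records. The URL and the helpers fetch_records and records_to_frame are hypothetical placeholders, not a real service. The standard library's urllib is used to keep the example self-contained; third-party clients such as requests are a common alternative. Because there is no real endpoint, the example parses an offline sample payload instead of making a network call.

```python
import json
from urllib.request import urlopen

import pandas as pd

# Hypothetical endpoint for illustration only; replace with a real API URL.
API_URL = "https://api.example.com/records"


def fetch_records(url: str) -> list:
    """Download a JSON array of records from an API endpoint (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def records_to_frame(records: list) -> pd.DataFrame:
    """Flatten JSON records, including nested objects, into a DataFrame."""
    return pd.json_normalize(records)


# Offline sample shaped like a typical JSON API response (made-up data).
sample = json.loads(
    '[{"id": 1, "city": {"name": "Nairobi"}},'
    ' {"id": 2, "city": {"name": "Mombasa"}}]'
)
df = records_to_frame(sample)
print(df)  # nested "city" objects become a "city.name" column
```

With a real API, records_to_frame(fetch_records(API_URL)) would replace the sample. Since the response has a defined structure, the same request tends to yield the same shape of data, which is the reproducibility advantage over scraping HTML.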
To install Python packages in most environments, use "pip install package-name". In a conda environment, you can instead use "conda install package-name".
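After installing with either tool, one way to confirm that a package is visible to Python, for example from a Jupyter cell, is the standard library's importlib.metadata module (Python 3.8+). The helper name installed_version is hypothetical, chosen for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Check the packages mentioned in this post.
for name in ["pandas", "numpy"]:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If a package shows as missing in a notebook even though it was installed, the notebook's kernel may be running from a different environment than the one pip or conda installed into.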
