koshirok096

Easy CSV Handling with Python: A Beginner's Guide (Bite-size Article)

Introduction

Today, I want to write about Python. I normally focus on front-end web development (React, JavaScript, HTML, and CSS), but this time I've ventured into something new. In a personal project I'm working on, I recently found that I need to extract and manage data from large CSV files, each with a huge number of columns and rows. I'm still figuring out how best to handle this task, but my research suggests that Python is well suited to it.

I haven't decided yet whether to use Python for this project, but I did a simple setup for testing, and it worked well. As a Python beginner, I want to share what I did and what I've learned with others who are also interested in starting with Python.

Until writing this article, I had never touched Python, but the tasks I did were very simple, and I highly recommend it to anyone interested in playing around with Python. If you're curious, read this article and try it out for yourself.

Python Initial Setup

First things first, let's start with the setup of Python. Note that these instructions are based on my environment (Mac), so if you are using Windows or another OS, please look up the appropriate steps for your system (sorry about that).

Let's install Python on your computer by downloading it from the official Python website.

Next, create your project folder. The project's structure will eventually look like the following (at this point, create only the root project folder), but feel free to modify it to suit your own situation.

csv_project/
│
├── venv/                  # Python virtual environment folder
│
├── src/                   # Directory to store source code
│   └── main.py            # Main script
│
├── data/                  # Directory for data files
│   └── sample.csv         # CSV data file
│
├── requirements.txt       # Project dependencies file
│
└── README.md              # Project documentation (optional)

Tip: What is venv?

"venv" stands for "virtual environment." It is a module in Python's standard library that creates an isolated Python environment inside your project folder. We will run our .py files in this environment. Python projects commonly use virtual environments to keep each project's dependencies separate from those of other projects.

Setting Up a Virtual Environment (venv)

Next, from the root of your project, run the command below in your CLI to create a virtual environment.

python -m venv venv # or python3 -m venv venv

That's all. If you want to confirm that the virtual environment is set up correctly, you can use either of the following checks. Both apply once the environment has been activated, which is covered in the next section:

1. Check if the Virtual Environment is Active

After activating the virtual environment, check whether the terminal (CLI) prompt starts with the name of the virtual environment in parentheses. For example, if the environment is named venv, the prompt will typically look like this:

(venv) user@hostname:~/path/to/project$

If you see this, the virtual environment is correctly activated.

2. Use the which python Command

To check if the virtual environment is active, execute the which python (or which python3) command to verify the path of the Python being used. If the Python from the virtual environment is being used, the output path will be inside the virtual environment directory.

which python

The command prints the path of the Python interpreter currently in use. If that path is inside your project's venv directory (for example, it ends in venv/bin/python), the environment is set up correctly.
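The same check can also be made from inside Python. This is a minimal sketch, relying on the fact that inside a virtual environment sys.prefix points at the environment while sys.base_prefix points at the Python installation it was created from. The function name in_virtualenv is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the venv folder and sys.base_prefix is
    # the original installation; outside a venv the two are identical.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Running this with the environment activated should report True and show an interpreter path inside the venv folder.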

Virtual Environment Activation & Dependency Installation

Next, we'll activate the virtual environment and install the dependencies.

First, to activate the virtual environment you created, run the following command:

# To activate the virtual environment on MacOS or Linux:
source venv/bin/activate

# For Windows users:
venv\Scripts\activate

This will activate the virtual environment for the current shell session.

The next task is installing the dependencies.

To install the libraries needed for this project, we will use pip. Since we are working with CSV files this time, let's use a library called pandas.

pip install pandas

Recording this dependency in requirements.txt, a file that lists the Python packages the project uses, lets other developers install the same dependencies easily:

pip freeze > requirements.txt

Run this after installing the libraries you need. It saves every package installed in the environment, together with its version, into requirements.txt, so other developers can reproduce the same environment.

Tip: What is pip?

Pip is a package management system for Python. It is the standard tool for installing and managing Python libraries, and it lets you easily install packages from PyPI (the Python Package Index), Python's official package repository.

The pip freeze command used here lists every package installed in the current Python environment, together with its version, and prints the list to standard output. The output is usually redirected to a requirements.txt file, which can later be used to reinstall the same packages in another environment.
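To get a feel for what pip freeze reports, the installed packages can also be listed from Python with the standard library's importlib.metadata module. This is only a rough sketch and not a replacement for pip freeze, whose output can differ (for example in which packages it omits and how it formats special installs). The function name freeze_like is hypothetical.

```python
from importlib.metadata import distributions


def freeze_like() -> list[str]:
    """Return 'name==version' lines for packages found in the current environment."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the list is easy to scan.
    return sorted(lines, key=str.lower)


for line in freeze_like():
    print(line)
```

Inside a fresh virtual environment with only pandas installed, the list should be short: pandas itself plus the packages it depends on.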

Tip: What is requirements.txt?

For readers coming from JavaScript, as I am: requirements.txt in a Python project plays a role similar to the dependency list in a package.json file. In React projects, package.json is used to manage both the project metadata and the dependencies. In Python projects, requirements.txt covers the dependency part of that role.

🔥 Comparison of package.json and requirements.txt

📦 package.json (JavaScript):

  • Lists the project dependencies (libraries and frameworks).
  • Can also include additional information related to the project (name, version, scripts, etc.).
  • Dependencies are installed using npm install or yarn install.

📕 requirements.txt (Python):

  • Provides a list of Python packages required for the project.
  • Mainly contains a list of package names and their versions.
  • Executing pip install -r requirements.txt installs the listed dependencies.

Both files are used to manage dependencies within the ecosystems of different programming languages, but their basic functions are the same. Namely, they define the dependencies of the development environment and make it easily reproducible.
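To make the file format concrete, here is a small sketch that reads the simple name==version lines that pip freeze typically writes. Real requirements files allow more syntax (version ranges, extras, options), which this sketch deliberately skips. The function name parse_requirements and the sample contents are illustrative only.

```python
def parse_requirements(text: str) -> dict[str, str]:
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blank lines and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


# Illustrative file contents, not taken from a real project.
sample = """\
# example requirements.txt
numpy==1.26.4
pandas==2.2.2
"""
print(parse_requirements(sample))
```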

Reading and Writing CSV Data from Python

Finally, let's verify whether we can successfully read and write CSV files. For this test, we will simply print the outputs to the terminal to see if it works!

First, prepare the CSV file you wish to read. Create a folder named data in the root of your project and place the CSV file directly under it.

Next, create the Python file that will perform the operation. Create a folder named src in the project's root, and under it, create a file named main.py. The code will look like this:

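import pandas as pd

# Specify the path to the CSV file (relative to the src directory)
file_path = '../data/sample.csv'

# Read the contents of the CSV file using pandas.
# Note: read_csv takes encoding_errors (pandas 1.3+), not errors;
# 'ignore' skips bytes that cannot be decoded with the given encoding.
data = pd.read_csv(file_path, encoding='utf-8', encoding_errors='ignore')

# Print the data
print(data)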

If you're new to Python, there may be some parts of the code above that look unfamiliar. However, this article is a casual introduction to reading CSV files with Python, so we'll skip the details. If you just want to follow along, you can copy and paste the above code, replacing the filename and path with your own, or you can research the code yourself.
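For anyone who wants to go one step further, here is a hedged sketch of a few other pandas calls that are useful with wide CSV files: reading only selected columns, inspecting the result, and writing a filtered copy back out. The column names and values are made up, and the CSV is held in memory so the sketch runs without a file.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so no file is needed.
csv_text = "id,name,score\n1,Alice,90\n2,Bob,75\n3,Carol,82\n"

# read_csv accepts a path or a file-like object; usecols keeps only the
# listed columns, which helps when a file has many columns.
data = pd.read_csv(io.StringIO(csv_text), usecols=["name", "score"])

print(data.shape)          # (rows, columns)
print(list(data.columns))  # column names
print(data.head(2))        # first two rows

# Keep rows with score >= 80 and turn them back into CSV text.
# Passing a file path to to_csv would write a file instead.
high_scores = data[data["score"] >= 80]
csv_out = high_scores.to_csv(index=False)
print(csv_out)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the path to the CSV.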


OK, it's time for the final step. Move to the src directory with cd, and then execute the following:

python main.py 

If the python command is not found or an error occurs, try python3 main.py instead.

How did it go? If the contents of the CSV file appear in the terminal, it worked!
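One thing to keep in mind: a relative path such as '../data/sample.csv' is resolved against the directory you run the command from, which is why the script has to be started from src. A possible alternative, sketched here under the assumption that the project uses the folder layout shown earlier, is to build the path from the location of the script itself. The helper name csv_path_for is hypothetical.

```python
from pathlib import Path


def csv_path_for(script_file: str) -> Path:
    """Return <project root>/data/sample.csv for a script stored in <project root>/src."""
    src_dir = Path(script_file).resolve().parent  # .../csv_project/src
    project_root = src_dir.parent                 # .../csv_project
    return project_root / "data" / "sample.csv"


# In main.py this would be called as csv_path_for(__file__).
print(csv_path_for("csv_project/src/main.py"))
```

With this approach, the script can be started from any directory, for example with python src/main.py from the project root.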

The main exercise ends here, but let's wrap up with a few additional notes.

Tip: Encoding and encoding_errors='ignore'

When I first ran this code, I got an error caused by encoding issues.

The code above passes encoding='utf-8', and you may need to change this value to match the encoding your CSV file was actually saved in.

There are many encodings, and I tried several of them, but none resolved the problem and the errors continued (possibly my CSV contains something that causes them). If you run into similar issues, telling pandas to ignore decoding errors may let the script run anyway. In read_csv this option is spelled encoding_errors='ignore' and is available in pandas 1.3 and later; the shorter errors='ignore' is the spelling used by Python's built-in open() and is not accepted by read_csv. (Note: if no errors occur, the option can be removed.)

Be aware, however, that ignoring errors means any bytes that cannot be decoded are silently dropped, so some data may be missing from the output. One of the main purposes of this test was simply to get familiar with Python, so I didn't delve deeply into the encoding issue and ran the test with decoding errors ignored. This approach is likely impractical for real applications. I admit I'm not yet very knowledgeable about Python and encoding, so I will look into this issue further later.
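To see what ignoring errors actually does, here is a small sketch using only built-in Python. It decodes a few hypothetical bytes, first with errors ignored and then by trying a short list of candidate encodings in order. The candidate list and the function name decode_with_fallback are assumptions for illustration. A decode that succeeds is not proof that the encoding is the right one, so the result still needs a visual check.

```python
# Hypothetical bytes: the text "café,1" saved as Latin-1, not UTF-8.
raw = "café,1\n".encode("latin-1")

# Ignoring errors silently drops the byte that is not valid UTF-8.
lossy = raw.decode("utf-8", errors="ignore")
print(repr(lossy))


def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(raw)
print(repr(text), used)
```

Once a plausible encoding is found this way (for a real file, the bytes could come from open(path, 'rb').read()), its name can be passed to pd.read_csv as the encoding argument instead of ignoring errors.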

Bonus: Deactivating the Virtual Environment

Once you've finished working on your project, you can deactivate the virtual environment with the following command:

deactivate

This command will exit the virtual environment and return you to your system's original Python environment. Use it as needed.

Conclusion

In this article, I introduced a method to read the contents of a CSV file using Python with minimal knowledge and effort.

Since I don't yet fully understand the more complex topics myself, the information in this article should, paradoxically, be quite easy for everyone to follow (but I apologize if any part of the explanation was unclear!).

I hope you enjoy your Python journey! Thank you for reading!
