1. "Python Program to Filter CSV Rows and Write Output to New File"

#python #dataengineering

import csv

input_file = 'input.csv'
output_file = 'output.csv'
column_index = 1

with open(input_file, 'r', newline='') as infile:
    csv_reader = csv.reader(infile)
    header = next(csv_reader)
    filtered_rows = [header]

    for row in csv_reader:
        if float(row[column_index]) > 100:
            filtered_rows.append(row)

with open(output_file, 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile)
    csv_writer.writerows(filtered_rows)

print("Filtered rows have been written to output.csv")

The code logic is as follows:

Imports the CSV module:
The code starts by importing the csv module, which helps us read and write CSV files.
File paths and column index:
- input_file = 'input.csv' tells the program where to find the file we want to read.
- output_file = 'output.csv' is where the program will save the filtered data.
- column_index = 1 indicates the column where we will check the values (in this case, the second column because column counting starts from 0).
Open the input file:
The program opens the input.csv file to read the data inside.
Read the header:
It reads the first row of the file, which contains the column names, and stores it in header. This will be used later when writing to the new file.
Filter the rows:
The program goes through each row of data:
- It checks if the number in the specified column (the second column) is greater than 100.
- If the number is greater than 100, the program keeps that row.
- If not, the row is skipped.
Write to the output file:
After filtering, the program writes the header and the remaining rows (that meet the condition) to a new file called output.csv.
Print a message:
Finally, the program prints a message to let you know that the filtered data has been saved to the new file.
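One thing the walkthrough above does not cover is what happens when a cell is empty or holds text such as "n/a": float() raises a ValueError and the script stops. Below is a minimal sketch of the same filter wrapped in a function that counts such rows instead of crashing. The function name filter_csv, the threshold parameter, and the sample data are assumptions made for illustration, not part of the original program.

```python
import csv
import os
import tempfile

def filter_csv(input_path, output_path, column_index=1, threshold=100.0):
    """Copy the header plus every row whose value in column_index exceeds threshold.

    Returns (kept, malformed). Rows that are too short or hold non-numeric
    text are counted as malformed and left out of the output.
    """
    kept = malformed = 0
    with open(input_path, 'r', newline='') as infile, \
         open(output_path, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        header = next(reader, None)
        if header is None:
            return kept, malformed  # empty input: nothing to write
        writer.writerow(header)
        for row in reader:
            try:
                value = float(row[column_index])
            except (ValueError, IndexError):
                malformed += 1
                continue
            if value > threshold:
                writer.writerow(row)
                kept += 1
    return kept, malformed

# Small demonstration in a throwaway directory (sample data is made up).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')
    dst = os.path.join(tmp, 'output.csv')
    with open(src, 'w', newline='') as f:
        csv.writer(f).writerows([
            ['name', 'amount'],
            ['a', '150'],
            ['b', '99.5'],
            ['c', 'n/a'],    # non-numeric: counted as malformed
            ['d', '100.5'],
        ])
    kept, malformed = filter_csv(src, dst)
    with open(dst, newline='') as f:
        output_rows = list(csv.reader(f))

print(kept, malformed)
print(output_rows)
```

Writing each row as soon as it passes the check also avoids holding the whole filtered list in memory, which may matter for large files.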

2a. A Python multithreading solution to download multiple files simultaneously.

import threading
import requests

urls = [
    'https://example.com/file1.jpg',
    'https://example.com/file2.jpg',
    'https://example.com/file3.jpg'
]

def download_file(url):
    try:
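        response = requests.get(url, timeout=30)  # avoid hanging forever on a stalled server
        response.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures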
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"Downloaded: {filename}")
    except Exception as e:
        print(f"Failed to download {url}: {e}")

threads = []
for url in urls:
    thread = threading.Thread(target=download_file, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All downloads are complete.")

Explanation of the code:

URLs List: urls contains the list of file URLs you want to download.
Download Function: download_file(url) is a function that downloads a single file from a URL and saves it.
Thread Creation: For each URL, a new thread is created using threading.Thread, so the downloads can run concurrently instead of one after another.
Starting Threads: The start() method is called on each thread to begin downloading the files.
Waiting for Completion: join() ensures the main program waits for all threads to finish before it prints "All downloads are complete."
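Creating one thread per URL works for a handful of files, but with hundreds of URLs you would probably want to cap the number of threads. Here is a rough sketch of the same start-and-wait pattern using concurrent.futures.ThreadPoolExecutor from the standard library. The names download_all and fake_fetch are hypothetical, and fake_fetch only pretends to download so the example runs without a network connection. In practice you could pass a function like download_file instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a bounded pool of worker threads.

    Returns two dicts keyed by URL: successful results and raised exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed yields each future as soon as its task finishes.
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors

# Hypothetical stand-in for a real download so the sketch runs offline.
def fake_fetch(url):
    if url.endswith('missing.jpg'):
        raise OSError(f"cannot fetch {url}")
    return url.split('/')[-1]

results, errors = download_all(
    ['https://example.com/file1.jpg', 'https://example.com/missing.jpg'],
    fake_fetch,
)
print(results)
print(errors)
```

Leaving the with block plays the role of the join() loop: it does not exit until every submitted task has finished.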

2b. A multiprocessing script to compute the factorial of numbers from 1 to 10.

import multiprocessing

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    print(f"Factorial of {n} is {result}")

if __name__ == '__main__':
    for i in range(1, 11):
        process = multiprocessing.Process(target=factorial, args=(i,))
        process.start()
        process.join()

    print("All factorials have been computed.")

Explanation:

factorial(n) function: Calculates the factorial of a number n and prints the result.
Main Block: In the if __name__ == '__main__' block:
- Loops through numbers from 1 to 10.
- For each number, creates a new process to compute its factorial.
- Starts each process and waits for it to finish using process.join() before moving to the next, so the ten processes run one after another rather than in parallel.
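If the goal is to have the processes actually overlap, one possible variation is to start all of them first and only then join them. The sketch below assumes that is what you want. The helper name report is made up for this example, and factorial now returns its result so it can be reused.

```python
import multiprocessing

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

def report(n):
    # Runs in the child process and prints that process's result.
    print(f"Factorial of {n} is {factorial(n)}")

if __name__ == '__main__':
    processes = [
        multiprocessing.Process(target=report, args=(n,))
        for n in range(1, 11)
    ]
    # Start every process before waiting on any of them.
    for process in processes:
        process.start()
    # Now wait for all of them to finish.
    for process in processes:
        process.join()

    print("All factorials have been computed.")
```

Because the processes run at the same time, the "Factorial of ..." lines may appear in a different order from run to run. If you need the values back in the parent process, multiprocessing.Pool and its map() method are another option.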

2c. A simple Python script that demonstrates how to modify a Pandas DataFrame in parallel using concurrent.futures:

import pandas as pd
import concurrent.futures

def modify_row(row):
    row['modified'] = row['value'] * 2
    return row

def main():
    data = {'value': [1, 2, 3, 4, 5]}
    df = pd.DataFrame(data)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(modify_row, [row for _, row in df.iterrows()]))

    df = pd.DataFrame(results)
    print(df)

if __name__ == '__main__':
    main()

Explanation:

DataFrame: A simple DataFrame df is created with a column 'value'.
modify_row function: This function modifies the row by adding a new column 'modified', where the value is the original 'value' multiplied by 2.
ThreadPoolExecutor:
- executor.map(modify_row, [...]) submits modify_row for each row to a pool of worker threads and returns the results in the original row order.
Result: A new DataFrame is built from the returned rows and printed at the end.
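Handing rows to the executor one at a time adds overhead for every row. A variation you might try is to split the DataFrame into chunks and let each worker apply a vectorized operation to its chunk. This is only a sketch: the names double_chunk and modify_in_chunks and the chunk_size value are assumptions for illustration.

```python
import concurrent.futures
import pandas as pd

def double_chunk(chunk):
    # Work on a copy so the caller's DataFrame is left untouched.
    chunk = chunk.copy()
    chunk['modified'] = chunk['value'] * 2  # vectorized within the chunk
    return chunk

def modify_in_chunks(df, chunk_size=2):
    if df.empty:
        return double_chunk(df)  # nothing to split
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() returns the chunks in their original order.
        parts = list(executor.map(double_chunk, chunks))
    return pd.concat(parts)

df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
result = modify_in_chunks(df)
print(result)
```

For a calculation this small, plain df['value'] * 2 without any executor is likely the fastest choice. Chunking starts to pay off when each chunk involves heavier work.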

DEV Community