import csv

input_file = 'input.csv'
output_file = 'output.csv'
column_index = 1

with open(input_file, 'r') as infile:
    csv_reader = csv.reader(infile)
    header = next(csv_reader)
    filtered_rows = [header]
    for row in csv_reader:
        if float(row[column_index]) > 100:
            filtered_rows.append(row)

with open(output_file, 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile)
    csv_writer.writerows(filtered_rows)

print("Filtered rows have been written to output.csv")
The code logic is as follows:
- Import the CSV module: The code starts by importing the csv module, which helps us read and write CSV files.
- File paths and column index:
  - input_file = 'input.csv' tells the program where to find the file we want to read.
  - output_file = 'output.csv' is where the program will save the filtered data.
  - column_index = 1 indicates the column whose values we will check (in this case, the second column, because column counting starts from 0).
- Open the input file: The program opens the input.csv file to read the data inside.
- Read the header: It reads the first row of the file, which contains the column names, and stores it in header. This is used later when writing to the new file.
- Filter the rows: The program goes through each row of data:
  - It converts the value in the specified column (the second column) to a number and checks whether it is greater than 100.
  - If the number is greater than 100, the program keeps that row.
  - If not, the row is skipped.
- Write to the output file: After filtering, the program writes the header and the rows that meet the condition to a new file called output.csv.
- Print a message: Finally, the program prints a message to let you know that the filtered data has been saved to the new file.
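The filtering steps above can also be wrapped in a reusable function. The following is a minimal sketch, not the original script: the name filter_rows, its parameters, and the decision to skip rows whose value is missing or not numeric are assumptions made for illustration (the original code would stop with a ValueError on such a row).

```python
import csv
import io


def filter_rows(infile, outfile, column_index=1, threshold=100.0):
    """Copy the header and every row whose value exceeds the threshold.

    Hypothetical helper: rows with a missing or non-numeric value in the
    chosen column are skipped instead of raising an error.
    Returns the number of data rows written.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile)

    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)

    kept = 0
    for row in reader:
        try:
            value = float(row[column_index])
        except (ValueError, IndexError):
            continue  # assumption: silently skip unparsable rows
        if value > threshold:
            writer.writerow(row)
            kept += 1
    return kept


# Small in-memory demonstration; real use would pass open file objects.
sample = io.StringIO("name,amount\nalpha,150\nbeta,99.5\ngamma,n/a\ndelta,100.1\n")
result = io.StringIO()
kept_count = filter_rows(sample, result)
print(kept_count)
print(result.getvalue())
```

With real files, the same function can be called with the two file objects from the script's open() calls; opening both with newline='' is what the csv module documentation recommends.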
2a. A Python multithreading solution to download multiple files simultaneously.
import threading
import requests

urls = [
    'https://example.com/file1.jpg',
    'https://example.com/file2.jpg',
    'https://example.com/file3.jpg'
]

def download_file(url):
    try:
        # A timeout keeps a stalled server from blocking the thread forever
        response = requests.get(url, timeout=30)
        # Raise on 4xx/5xx responses so an error page is not saved as the file
        response.raise_for_status()
        filename = url.split('/')[-1]
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"Downloaded: {filename}")
    except Exception as e:
        print(f"Failed to download {url}: {e}")

threads = []
for url in urls:
    thread = threading.Thread(target=download_file, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All downloads are complete.")
Explanation of the code:
- URLs List: urls contains the list of file URLs you want to download.
- Download Function: download_file(url) is a function that downloads a single file from a URL and saves it.
- Thread Creation: For each URL, a new thread is created using threading.Thread, so the files download at the same time.
- Starting Threads: The start() method is called on each thread to begin downloading the files.
- Waiting for Completion: join() ensures the main program waits for all threads to finish before it prints "All downloads are complete."
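Starting one thread per URL is fine for three files, but it does not scale well to a long list. As a hedged sketch of one alternative, a thread pool from concurrent.futures can cap how many downloads run at once. The names download_all, fetch, and fake_fetch are hypothetical: fetch stands for any callable that takes a URL and returns bytes (for example, a small wrapper around requests.get), and fake_fetch replaces it here so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Fetch every URL using at most max_workers threads.

    Returns (results, errors): results maps url -> bytes,
    errors maps url -> the exception that fetch raised.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors


def fake_fetch(url):
    """Stand-in for a real download (assumption for this demo)."""
    if url.endswith("missing.jpg"):
        raise IOError(f"cannot fetch {url}")
    return url.encode("utf-8")


demo_urls = [
    "https://example.com/file1.jpg",
    "https://example.com/file2.jpg",
    "https://example.com/missing.jpg",
]
results, errors = download_all(demo_urls, fake_fetch)
print(sorted(results))
print(sorted(errors))
```

Leaving the with block plays the same role as the join() loop: the executor waits for all submitted work to finish before the program continues.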
2b. A multiprocessing script to compute the factorial of numbers from 1 to 10.
import multiprocessing

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    print(f"Factorial of {n} is {result}")

if __name__ == '__main__':
    for i in range(1, 11):
        process = multiprocessing.Process(target=factorial, args=(i,))
        process.start()
        process.join()
    print("All factorials have been computed.")
Explanation:
- factorial(n) function: Calculates the factorial of a number n and prints the result.
- Main Block: In the if __name__ == '__main__' block, the script:
  - Loops through numbers from 1 to 10.
  - For each number, creates a new process to compute its factorial.
  - Starts each process and waits for it to finish using process.join() before moving to the next. Because of this, the processes run one after another rather than at the same time.
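If the goal is for the factorials to be computed concurrently, one possible variation is to hand the numbers to a pool of worker processes. This is a sketch under assumptions, not the original approach: compute_factorials and its processes parameter are hypothetical names, and factorial here returns its result instead of printing it so the parent process can collect the values.

```python
from multiprocessing import Pool


def factorial(n):
    """Return n! computed iteratively."""
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def compute_factorials(numbers, processes=4):
    """Compute factorials in a pool of worker processes.

    Hypothetical helper; returns a dict mapping each n to n!.
    """
    numbers = list(numbers)
    with Pool(processes=processes) as pool:
        # map() splits the inputs across workers and keeps input order
        values = pool.map(factorial, numbers)
    return dict(zip(numbers, values))


if __name__ == "__main__":
    for n, value in compute_factorials(range(1, 11)).items():
        print(f"Factorial of {n} is {value}")
```

For work this small, the cost of starting processes likely outweighs any gain; the pattern tends to pay off when each task is computationally heavy.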
2c. A simple Python script that demonstrates how to modify a Pandas DataFrame in parallel using concurrent.futures:
import pandas as pd
import concurrent.futures

def modify_row(row):
    row['modified'] = row['value'] * 2
    return row

def main():
    data = {'value': [1, 2, 3, 4, 5]}
    df = pd.DataFrame(data)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(modify_row, [row for _, row in df.iterrows()]))
    df = pd.DataFrame(results)
    print(df)

if __name__ == '__main__':
    main()
Explanation:
- DataFrame: A simple DataFrame df is created with a column 'value'.
- modify_row function: This function modifies the row by adding a new column 'modified', where the value is the original 'value' multiplied by 2.
- ThreadPoolExecutor: executor.map(modify_row, [...]) runs the modify_row function for each row in the DataFrame across a pool of worker threads and returns the results in row order.
- Result: The modified rows are assembled into a new DataFrame, which is printed at the end.
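To look at the executor.map pattern on its own, here is a minimal sketch that uses plain dictionaries in place of DataFrame rows, so it runs without pandas. The names double_value, rows, and modified_rows are assumptions for illustration and are not part of the script above.

```python
from concurrent.futures import ThreadPoolExecutor


def double_value(row):
    """Return a copy of the row with a 'modified' key added."""
    new_row = dict(row)  # copy so the input is left untouched
    new_row["modified"] = new_row["value"] * 2
    return new_row


rows = [{"value": v} for v in [1, 2, 3, 4, 5]]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the same order as the inputs,
    # regardless of which thread finishes first
    modified_rows = list(executor.map(double_value, rows))

for row in modified_rows:
    print(row)
```

For simple arithmetic like this, a plain column operation in pandas (df['modified'] = df['value'] * 2) is usually simpler and faster; a thread pool is more likely to help when each row involves waiting on I/O.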