Transform: Clean and process the data (e.g., filter rows, modify columns).
Load: Insert the transformed data into a SQLite database.
Code Example
```python
# Step 1: Import necessary libraries
import pandas as pd  # For data manipulation
import sqlite3  # For interacting with SQLite databases

# Step 2: Extract - Load data from a CSV file
def extract_data(file_path):
    """
    Reads data from a CSV file into a Pandas DataFrame.
    """
    df = pd.read_csv(file_path)  # Read the CSV file
    return df

# Step 3: Transform - Clean and process the data
def transform_data(df):
    """
    Cleans and transforms the data.
    """
    # Drop rows with missing values
    df = df.dropna()
    # Convert a column to uppercase (example transformation)
    df['name'] = df['name'].str.upper()
    # Filter rows where 'age' is greater than 18
    df = df[df['age'] > 18]
    return df

# Step 4: Load - Insert data into a SQLite database
def load_data(df, db_path, table_name):
    """
    Loads the transformed data into a SQLite database.
    """
    # Connect to the SQLite database (creates it if it doesn't exist)
    conn = sqlite3.connect(db_path)
    # Insert the DataFrame into the database as a table
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    # Close the database connection
    conn.close()

# Step 5: Main function to orchestrate the ETL process
def main():
    # Define file paths
    input_file = 'data.csv'  # Path to the input CSV file
    db_file = 'example.db'  # Path to the SQLite database
    table_name = 'users'  # Name of the table to create

    # Step 1: Extract
    print("Extracting data...")
    data = extract_data(input_file)

    # Step 2: Transform
    print("Transforming data...")
    transformed_data = transform_data(data)

    # Step 3: Load
    print("Loading data into the database...")
    load_data(transformed_data, db_file, table_name)

    print("ETL process completed successfully!")

# Run the main function
if __name__ == "__main__":
    main()
```
Explanation of Each Line
Step 1: Import Libraries
```python
import pandas as pd
import sqlite3
```
pandas: A powerful library for data manipulation and analysis. It provides the DataFrame object, which is ideal for handling tabular data.
sqlite3: A built-in Python library for interacting with SQLite databases.
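As a quick, self-contained illustration of the sqlite3 API described above (the in-memory database and the sample row here are illustrative, not part of the pipeline itself):

```python
import sqlite3

# Use an in-memory database so nothing is written to disk (illustrative only)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, age INTEGER)")
cur.execute("INSERT INTO users VALUES (?, ?)", ("ALICE", 30))
conn.commit()

rows = cur.execute("SELECT name, age FROM users").fetchall()
print(rows)  # [('ALICE', 30)]
conn.close()
```

Note the `?` placeholders: parameterized queries are the idiomatic way to pass values into SQL with sqlite3.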
sqlite3.connect(db_path): Connects to the SQLite database. If the database doesn't exist, it creates one.
df.to_sql(table_name, conn, if_exists='replace', index=False): Inserts the DataFrame into the database as a table.
table_name: Name of the table to create.
if_exists='replace': Replaces the table if it already exists.
index=False: Prevents Pandas from writing row indices to the database.
conn.close(): Closes the database connection.
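To sanity-check a load like this, you can read the table back with pandas. A minimal sketch, assuming an in-memory database and made-up sample rows (the real pipeline uses `example.db` and data from `data.csv`):

```python
import sqlite3
import pandas as pd

# Build a small DataFrame and load it the same way load_data does
df = pd.DataFrame({"name": ["ALICE", "BOB"], "age": [30, 25]})
conn = sqlite3.connect(":memory:")
df.to_sql("users", conn, if_exists="replace", index=False)

# Read the table back to confirm the load succeeded
result = pd.read_sql_query("SELECT * FROM users", conn)
print(len(result))  # 2
conn.close()
```

Because `index=False` was used, the read-back frame has exactly the original columns, `name` and `age`, with no extra index column.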
Step 5: Main Function
```python
def main():
    input_file = 'data.csv'
    db_file = 'example.db'
    table_name = 'users'

    print("Extracting data...")
    data = extract_data(input_file)

    print("Transforming data...")
    transformed_data = transform_data(data)

    print("Loading data into the database...")
    load_data(transformed_data, db_file, table_name)

    print("ETL process completed successfully!")
```
Orchestrates the ETL process by calling the extract_data, transform_data, and load_data functions.
Prints progress messages to the console.
Run the Script
```python
if __name__ == "__main__":
    main()
```
Ensures the main() function runs only when the script is executed directly (not when imported as a module).