Introduction
Managing data is at the heart of many software applications. Often, we encounter scenarios where we need to import data from a CSV file into a database, such as Oracle. While tools like SQL*Loader exist, Python offers a flexible alternative for scripting and automation.
In this tutorial, we’ll walk through a Python script that:
• Dynamically imports data from a CSV file.
• Handles CLOB (Character Large Object) data for the last column.
• Logs its execution details to a file for easier debugging and auditing.
This script is designed for execution on Unix-based systems but can be adapted for other platforms.
Prerequisites
Before diving into the script, ensure the following:
1. Python Environment: Python 3.6 or later installed.
2. cx_Oracle Library: Install this package for Oracle database connectivity:
pip install cx_Oracle
3. Oracle Instant Client: Required for database interaction. Ensure it’s installed and configured on your system. Set environment variables like LD_LIBRARY_PATH and TNS_ADMIN.
4. Database Table Setup: Ensure your Oracle table matches the CSV structure:
CREATE TABLE my_table (
column1 VARCHAR2(100),
column2 VARCHAR2(100),
column3 VARCHAR2(100),
column4 VARCHAR2(100),
column5 VARCHAR2(100),
column6 VARCHAR2(100),
column7 CLOB
);
The Python Script
Here’s the full Python script:
import os
import sys
import csv
import cx_Oracle
import logging
from datetime import datetime
Define the log file location
LOG_FILE = "/path/to/your/log/file/load_csv_to_oracle.log"
Set up logging
logging.basicConfig(
filename=LOG_FILE,
filemode="a",
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s"
)
def load_csv_to_oracle(file_name, table_name, oracle_connection_string):
"""
Load data from a CSV file into an Oracle table.
Args:
file_name (str): Path to the CSV file.
table_name (str): Name of the Oracle table.
oracle_connection_string (str): Oracle connection string in the format "user/password@host:port/service".
"""
# Check if the file exists
if not os.path.exists(file_name):
logging.error(f"File '{file_name}' does not exist.")
sys.exit(1)
try:
# Connect to the Oracle database
connection = cx_Oracle.connect(oracle_connection_string)
cursor = connection.cursor()
logging.info(f"Connected to Oracle database: {oracle_connection_string}")
# Open the CSV file
with open(file_name, 'r') as csv_file:
reader = csv.reader(csv_file)
headers = next(reader) # Skip the header row
# Validate column count
if len(headers) != 7:
logging.error(f"The CSV file must contain exactly 7 columns, but found {len(headers)}.")
sys.exit(1)
# Prepare the SQL insert query
placeholders = ', '.join([':1', ':2', ':3', ':4', ':5', ':6', ':7'])
insert_query = f"INSERT INTO {table_name} VALUES ({placeholders})"
# Insert rows into the table
row_count = 0
for row in reader:
if len(row) != 7:
logging.warning(f"Skipping row due to incorrect column count: {row}")
continue
# Handle CLOB data for the last column
clob_data = row[-1]
cursor.execute(insert_query, row[:-1] + [clob_data])
row_count += 1
# Commit the transaction
connection.commit()
logging.info(f"Data loaded successfully into table '{table_name}'. Total rows inserted: {row_count}")
except cx_Oracle.DatabaseError as e:
logging.error(f"Database error: {e}")
sys.exit(1)
except Exception as e:
logging.error(f"Error: {e}")
sys.exit(1)
finally:
if cursor:
cursor.close()
if connection:
connection.close()
logging.info("Database connection closed.")
if name == "main":
# Validate the number of arguments
if len(sys.argv) != 3:
logging.error("Invalid number of arguments. Usage: python load_csv_to_oracle.py ")
print("Usage: python load_csv_to_oracle.py ")
sys.exit(1)
# Get the arguments
file_name = sys.argv[1]
table_name = sys.argv[2]
# Oracle connection string
oracle_connection_string = input("Enter the Oracle connection string (user/password@host:port/service): ").strip()
# Log start
logging.info(f"Starting CSV load. File: {file_name}, Table: {table_name}")
# Load the CSV data into the Oracle table
load_csv_to_oracle(file_name, table_name, oracle_connection_string)
# Log end
logging.info("CSV load completed.")
Script Features
1. Dynamic Input: Accepts the CSV file name and Oracle table name as command-line arguments.
2. Error Handling: Gracefully handles file errors, database errors, and incorrect row formats.
3. Logging: Logs all key events, warnings, and errors to a specified log file for auditing and debugging.
4. CLOB Support: Handles large text data in the last column, ideal for long descriptions or notes.
How to Use
1. Save the script as load_csv_to_oracle.py.
2. Execute it from the command line:
python load_csv_to_oracle.py data.csv my_table
3. Provide the Oracle connection string when prompted.
Example Log Output
2024-12-10 10:30:45 - INFO - Starting CSV load. File: data.csv, Table: my_table
2024-12-10 10:30:46 - INFO - Connected to Oracle database: user/password@host:port/service
2024-12-10 10:30:50 - INFO - Data loaded successfully into table 'my_table'. Total rows inserted: 100
2024-12-10 10:30:50 - INFO - Database connection closed.
2024-12-10 10:30:50 - INFO - CSV load completed.
Conclusion
This script provides a robust solution for importing CSV data into Oracle while maintaining a clear audit trail through logging. Whether you’re an administrator or a developer, this approach is customizable and extensible for various data import tasks.
Top comments (0)