DEV Community

Aum
Aum

Posted on

Data Load through Python

Introduction

Managing data is at the heart of many software applications. Often, we encounter scenarios where we need to import data from a CSV file into a database, such as Oracle. While tools like SQL*Loader exist, Python offers a flexible alternative for scripting and automation.

In this tutorial, we’ll walk through a Python script that:
• Dynamically imports data from a CSV file.
• Handles CLOB (Character Large Object) data for the last column.
• Logs its execution details to a file for easier debugging and auditing.

This script is designed for execution on Unix-based systems but can be adapted for other platforms.

Prerequisites

Before diving into the script, ensure the following:
1. Python Environment: Python 3.6 or later installed.
2. cx_Oracle Library: Install this package for Oracle database connectivity:

pip install cx_Oracle

3.  Oracle Instant Client: Required for database interaction. Ensure it’s installed and configured on your system. Set environment variables like LD_LIBRARY_PATH and TNS_ADMIN.
4.  Database Table Setup: Ensure your Oracle table matches the CSV structure:
Enter fullscreen mode Exit fullscreen mode

CREATE TABLE my_table (
column1 VARCHAR2(100),
column2 VARCHAR2(100),
column3 VARCHAR2(100),
column4 VARCHAR2(100),
column5 VARCHAR2(100),
column6 VARCHAR2(100),
column7 CLOB
);

The Python Script

Here’s the full Python script:

import os
import sys
import csv
import cx_Oracle
import logging
from datetime import datetime

Define the log file location

LOG_FILE = "/path/to/your/log/file/load_csv_to_oracle.log"

Set up logging

logging.basicConfig(
filename=LOG_FILE,
filemode="a",
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s"
)

def load_csv_to_oracle(file_name, table_name, oracle_connection_string):
"""
Load data from a CSV file into an Oracle table.

Args:
    file_name (str): Path to the CSV file.
    table_name (str): Name of the Oracle table.
    oracle_connection_string (str): Oracle connection string in the format "user/password@host:port/service".
"""
# Check if the file exists
if not os.path.exists(file_name):
    logging.error(f"File '{file_name}' does not exist.")
    sys.exit(1)

try:
    # Connect to the Oracle database
    connection = cx_Oracle.connect(oracle_connection_string)
    cursor = connection.cursor()
    logging.info(f"Connected to Oracle database: {oracle_connection_string}")

    # Open the CSV file
    with open(file_name, 'r') as csv_file:
        reader = csv.reader(csv_file)
        headers = next(reader)  # Skip the header row

        # Validate column count
        if len(headers) != 7:
            logging.error(f"The CSV file must contain exactly 7 columns, but found {len(headers)}.")
            sys.exit(1)

        # Prepare the SQL insert query
        placeholders = ', '.join([':1', ':2', ':3', ':4', ':5', ':6', ':7'])
        insert_query = f"INSERT INTO {table_name} VALUES ({placeholders})"

        # Insert rows into the table
        row_count = 0
        for row in reader:
            if len(row) != 7:
                logging.warning(f"Skipping row due to incorrect column count: {row}")
                continue
            # Handle CLOB data for the last column
            clob_data = row[-1]
            cursor.execute(insert_query, row[:-1] + [clob_data])
            row_count += 1

    # Commit the transaction
    connection.commit()
    logging.info(f"Data loaded successfully into table '{table_name}'. Total rows inserted: {row_count}")

except cx_Oracle.DatabaseError as e:
    logging.error(f"Database error: {e}")
    sys.exit(1)
except Exception as e:
    logging.error(f"Error: {e}")
    sys.exit(1)
finally:
    if cursor:
        cursor.close()
    if connection:
        connection.close()
    logging.info("Database connection closed.")
Enter fullscreen mode Exit fullscreen mode

if name == "main":
# Validate the number of arguments
if len(sys.argv) != 3:
logging.error("Invalid number of arguments. Usage: python load_csv_to_oracle.py ")
print("Usage: python load_csv_to_oracle.py ")
sys.exit(1)

# Get the arguments
file_name = sys.argv[1]
table_name = sys.argv[2]

# Oracle connection string
oracle_connection_string = input("Enter the Oracle connection string (user/password@host:port/service): ").strip()

# Log start
logging.info(f"Starting CSV load. File: {file_name}, Table: {table_name}")

# Load the CSV data into the Oracle table
load_csv_to_oracle(file_name, table_name, oracle_connection_string)

# Log end
logging.info("CSV load completed.")
Enter fullscreen mode Exit fullscreen mode

Script Features
1. Dynamic Input: Accepts the CSV file name and Oracle table name as command-line arguments.
2. Error Handling: Gracefully handles file errors, database errors, and incorrect row formats.
3. Logging: Logs all key events, warnings, and errors to a specified log file for auditing and debugging.
4. CLOB Support: Handles large text data in the last column, ideal for long descriptions or notes.

How to Use
1. Save the script as load_csv_to_oracle.py.
2. Execute it from the command line:

python load_csv_to_oracle.py data.csv my_table

3.  Provide the Oracle connection string when prompted.
Enter fullscreen mode Exit fullscreen mode

Example Log Output

2024-12-10 10:30:45 - INFO - Starting CSV load. File: data.csv, Table: my_table
2024-12-10 10:30:46 - INFO - Connected to Oracle database: user/password@host:port/service
2024-12-10 10:30:50 - INFO - Data loaded successfully into table 'my_table'. Total rows inserted: 100
2024-12-10 10:30:50 - INFO - Database connection closed.
2024-12-10 10:30:50 - INFO - CSV load completed.

Conclusion

This script provides a robust solution for importing CSV data into Oracle while maintaining a clear audit trail through logging. Whether you’re an administrator or a developer, this approach is customizable and extensible for various data import tasks.

Top comments (0)