Rahul-262001
A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL

In this guide, we'll walk through the process of extracting data from MySQL using tap-mysql and loading it into target-jsonl. Singer's tap/target model lets you pipe records from the extractor straight into the loader over stdout, keeping the two steps decoupled while preserving each record's schema.

Step 1: Enable Python Virtual Environment (venv)

Even without administrative access, you can set up a Python virtual environment. Here's how:

  • Create a new virtual environment:

    python3 -m venv <env_name>

  • Navigate to the Scripts directory inside the newly created folder and activate the environment (on Windows):

    .\activate.bat

    On Linux or macOS, run source <env_name>/bin/activate from the project directory instead.

Now, your virtual environment is active and ready for use.
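Optionally, you can confirm the environment is active from Python itself: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base interpreter.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter comes from a venv."""
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```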

Step 2: Install tap-mysql and target-jsonl

Use pip to install the necessary packages:

pip install tap-mysql target-jsonl

Note: Singer's best-practice guidance is to install each tap and target in its own virtual environment to avoid dependency conflicts; a single environment is fine for a quick local test like this one.

Step 3: Prepare Configuration Files

tap-mysql requires two input files: config.json (the connection settings) and properties.json (the catalog describing which streams to extract).

config.json:

{
    "host": "127.0.0.1",
    "port": "3306",
    "user": "root",
    "password": "root"
}

properties.json:

{
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "table_name": "actor_info",
            "schema": {
                "properties": {
                    "actor_id": {
                        "inclusion": "available",
                        "minimum": 0,
                        "maximum": 65535,
                        "type": [
                            "null",
                            "integer"
                        ]
                    },
                    "first_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "last_name": {
                        "inclusion": "available",
                        "maxLength": 45,
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "film_info": {
                        "inclusion": "available",
                        "maxLength": 65535,
                        "type": [
                            "null",
                            "string"
                        ]
                    }
                },
                "type": "object"
            },
            "stream": "actor_info",
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "selected": true,
                        "replication-method": "FULL_TABLE",
                        "selected-by-default": false,
                        "database-name": "sakila",
                        "is-view": true
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "actor_id"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "smallint unsigned"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "first_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "last_name"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "varchar(45)"
                    }
                },
                {
                    "breadcrumb": [
                        "properties",
                        "film_info"
                    ],
                    "metadata": {
                        "selected-by-default": true,
                        "sql-datatype": "text"
                    }
                }
            ]
        }
    ]
}

Step 4: Generate the Catalog and Select a Table

Execute the following command in discover mode to generate catalog.json:

tap-mysql --config config.json --discover > catalog.json

Locate the JSON content of the desired table in catalog.json and copy it into another file. Let's name this file selected_table.json.

In selected_table.json, wrap the copied stream object in a streams array (JSON does not allow comments, so replace the placeholder below entirely):

{
    "streams": [ <paste the copied stream object here> ]
}

Also make sure the stream-level metadata entry (the one with an empty "breadcrumb") contains "selected": true, as shown in the properties.json example above; tap-mysql extracts only streams marked as selected.

This step ensures that only the selected table is included for extraction.
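Editing the catalog by hand works, but the selection step can also be scripted. Below is a minimal sketch assuming the catalog structure shown above; select_stream is a hypothetical helper written for this post, not part of tap-mysql.

```python
import json

def select_stream(catalog: dict, tap_stream_id: str) -> dict:
    """Return a catalog containing only the requested stream,
    with its stream-level metadata entry marked as selected."""
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != tap_stream_id:
            continue
        for entry in stream["metadata"]:
            if entry["breadcrumb"] == []:  # stream-level metadata
                entry["metadata"]["selected"] = True
        return {"streams": [stream]}
    raise KeyError(f"stream {tap_stream_id!r} not found in catalog")

# Minimal demonstration with a cut-down catalog entry:
catalog = {
    "streams": [
        {
            "tap_stream_id": "sakila-actor_info",
            "metadata": [
                {"breadcrumb": [], "metadata": {"selected-by-default": False}}
            ],
        }
    ]
}
selected = select_stream(catalog, "sakila-actor_info")
print(json.dumps(selected, indent=4))
```

With the real files, you would json.load catalog.json and json.dump the result to selected_table.json.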

Step 5: Run tap-mysql

Execute the following command:

tap-mysql --config config.json --catalog selected_table.json

The tap writes its output to stdout as Singer messages: SCHEMA lines describing each stream, RECORD lines carrying the rows, and STATE lines tracking replication progress. Congratulations! You've successfully extracted data from MySQL using tap-mysql.
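If you ever want to consume that stream programmatically instead of piping it to a target, filtering the RECORD messages is straightforward. A small sketch (the sample lines are illustrative, modeled on tap-mysql's output format):

```python
import json

def records_from_singer(lines):
    """Yield the 'record' payload of each Singer RECORD message,
    skipping SCHEMA and STATE messages."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            yield message["record"]

# Illustrative messages shaped like tap-mysql's stdout:
sample = [
    '{"type": "SCHEMA", "stream": "actor_info", "schema": {}, "key_properties": []}',
    '{"type": "RECORD", "stream": "actor_info", "record": {"actor_id": 1, "first_name": "PENELOPE"}}',
    '{"type": "STATE", "value": {}}',
]
for record in records_from_singer(sample):
    print(record)
```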

Step 6: Send Data to jsonl Target

Run the following command to send the data to jsonl target:

tap-mysql --config config.json --catalog selected_table.json | target-jsonl

target-jsonl writes one file per stream, named after the table (here, actor_info.jsonl, depending on target-jsonl's configuration), with one JSON record per line.

Step 7: Convert Output to a DataFrame

Here's an example of how to convert the output to a DataFrame using Python:

import pandas as pd
import json

# Each line of the file written by target-jsonl is one record
# serialized as a JSON object, so parse the file line by line.
data = []
with open("<file_name>.jsonl", "r") as f:
    for line in f:
        data.append(json.loads(line))

df = pd.DataFrame(data)
print(df.columns)

This step allows for further analysis and manipulation of the extracted data.
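pandas can also parse JSON Lines directly via read_json(lines=True), which replaces the manual loop. A minimal sketch using an in-memory sample (the two rows are illustrative; with a real file you would pass its path instead of a StringIO):

```python
import io

import pandas as pd

# Simulated contents of the file target-jsonl writes:
# one JSON object per line.
jsonl = io.StringIO(
    '{"actor_id": 1, "first_name": "PENELOPE"}\n'
    '{"actor_id": 2, "first_name": "NICK"}\n'
)

# lines=True tells pandas to parse JSON Lines rather than a single document.
df = pd.read_json(jsonl, lines=True)
print(df.shape)
```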

By following these steps, you've successfully extracted and transformed data from MySQL into a structured format, ready for analysis and insights.

