Background
Let me start by saying that I am writing this both to create awareness of a tool I have built and to share experiences that I hope help the community make software better, faster, and safer. :)
In the past 8 months, I have been building project yuniql (yuu-nee-kel), an open source schema versioning and database migration tool made with .NET Core. From the beginning, one of the primary use cases we tried to address is seeding our databases, both for environment-specific deployment and for supporting integration tests. Every fresh database carries an initial set of data such as lookup tables and baseline data. Running integration tests and regression tests requires snapshots of data or samples to support the scenario being tested.
While we can always script out the initial data, an efficient way is to just drop CSV files into a folder and have them imported into destination tables during deployment, test execution, or application startup. Using CSV files also allows our Test Engineers and Business SMEs to prepare data themselves and deliver it to development teams so it can be fed into regression tests.
Such was the birth of yuniql's bulk import support for CSV files as seed data.
Seeding a database with CSV files and yuniql
In this article, I am using a sample database from sqltutorial.org. Let's prepare a baseline schema version of the database and seed the regions, countries, locations, departments, and jobs tables. Then we will create a new version for the employees and dependents tables.
Step 1: Install yuniql CLI 👉
We begin by getting the yuniql CLI installed. We can install the yuniql CLI in several ways: via Chocolatey, as a .NET Core global tool, or directly from source. Choose what works for you.
Option 1: Install via .NET Core Global Tool. Requires .NET Core 3.0 SDK installed.
dotnet tool install -g yuniql.cli
Option 2: Install via Chocolatey package manager. Requires choco installed.
choco install yuniql
Option 3: Use PowerShell and append to PATH
Invoke-WebRequest -Uri https://github.com/rdagumampan/yuniql/releases/download/latest/yuniql-cli-win-x64-latest.zip -OutFile "c:\temp\yuniql-win-x64-latest.zip"
Expand-Archive "c:\temp\yuniql-win-x64-latest.zip" -DestinationPath "c:\temp\yuniql-cli"
$Env:Path += ";c:\temp\yuniql-cli"
Option 4: Download the zipped package containing the yuniql.exe file and extract it to our workspace directory.
https://github.com/rdagumampan/yuniql/releases/download/latest/yuniql-cli-win-x64-latest.zip
Step 2: Initialize schema repository, call yuniql init 🍨
yuniql init
When yuniql init is issued, a baseline directory structure is created. Each version directory represents an atomic version of our database, executed in an all-or-nothing fashion. Utility directories are also created to cover pre- and post-migration activities. A typical starter project workspace looks like this.
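Here is a rough sketch of the scaffolded workspace; the helloyuniql folder name is just an example, and the exact set of directories and files may vary between yuniql versions:
helloyuniql
├── _init
├── _pre
├── v0.00
├── _draft
├── _post
├── _erase
├── Dockerfile
├── README.md
└── .gitignore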
Step 3: Create baseline schema structure 🍭
Our next step is to prepare our baseline version. A baseline version is the v0.00 of our database schema and initial data. A baseline helps create full visibility of our database schema's evolution. Here, we prepare a set of CREATE TABLE scripts in a setup-tables.sql file.
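As an illustration, here is a fragment of what v0.00/setup-tables.sql could contain, loosely following the sqltutorial.org sample schema. The column names and types below are assumptions; adjust them to your own baseline.
-- v0.00/setup-tables.sql (excerpt)
-- Baseline lookup tables; explicit keys keep the CSV seeding below straightforward
CREATE TABLE dbo.regions (
    region_id INT NOT NULL PRIMARY KEY,
    region_name VARCHAR(25) NULL
);
CREATE TABLE dbo.countries (
    country_id CHAR(2) NOT NULL PRIMARY KEY,
    country_name VARCHAR(40) NULL,
    region_id INT NOT NULL REFERENCES dbo.regions (region_id)
);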
Step 4: Prepare and add CSV files in the baseline 🍹
Let's now prepare the CSV files. Each CSV file is named after the fully qualified name of its destination table. The file dbo.regions.csv will be bulk loaded into the dbo.regions table. We may use another schema, or keep it simple as regions.csv and it will use dbo as the default schema.
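For example, dbo.regions.csv could look like this, with a header row matching the column names and the region values taken from the sqltutorial.org sample data (illustrative seed data only):
region_id,region_name
1,Europe
2,Americas
3,Asia
4,Middle East and Africa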
Step 5: Let's do this, call yuniql run 🏃
Let's verify now by deploying our database locally. Here, we can use a SQL Server container to speed up the process. In principle, we just need a database server we can connect to.
docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=P@ssw0rd!" -p 1400:1433 -d mcr.microsoft.com/mssql/server:2017-latest
SETX YUNIQL_CONNECTION_STRING "Server=localhost,1400;Database=helloyuniql;User Id=SA;Password=P@ssw0rd!"
yuniql run -a
When yuniql run is issued the first time, it inspects the target database and creates the required table to track the versions applied. The -a switch tells yuniql to also create the database. All script files in the _init directory and its child directories will be executed only this first time. The order of execution is: _init, _pre, vx.xx through vx.xx+N, _draft, _post.
All CSV files are bulk loaded after the script files have been executed.
Let's verify how the data looks now in SQL Server. Notice that dbo.__yuniqldbversions was created to keep a record of the versions already applied. This ensures the CSV files will not be imported again the next time we call yuniql run.
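A quick way to check from any SQL client is to query a seeded table and the version-tracking table mentioned above (verify the table name against what you actually see in your database):
SELECT TOP 10 * FROM dbo.regions;
SELECT * FROM dbo.__yuniqldbversions;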
Where do we go from here? 🚧
This is one of the first steps to enable continuous delivery of changes to our databases. We can take this further by committing this workspace into a git repository and establishing CI/CD pipelines to run the migrations against different environments. Yuniql is also released as a free Azure DevOps task and a Docker image.
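For instance, a minimal sketch of putting the workspace under version control before wiring up a pipeline (the commit message is just an example):
git init
git add .
git commit -m "Baseline schema v0.00 with CSV seed data"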
What about other database platforms? ❄️
PostgreSql- and MySql-compatible scripts can be executed in the same way with the --platform parameter.
yuniql run -a --platform postgresql
yuniql run -a --platform mysql
The latest release of yuniql supports CSV import for SQL Server, PostgreSql, and MySql. I am now maturing the implementation for Snowflake DW, Azure Synapse Analytics, and CockroachDB, which are equally exciting data platforms. Contributors are welcome! :)
Parting words ㊗️
Using CSV files to bulk import baseline data and test samples is an effective approach to seeding databases. CSV files also allow non-SQL users to help developers prepare datasets to support the software development process. As yuniql leverages the native bulk import APIs of the target platform, such as SqlBulkCopy for SQL Server, we can import large CSV files pretty fast.
Thanks if you have reached this far!!! 🍻 This is my first DEV post and I guess I had fun with emojis :D The source for this article can be cloned here.
P.S. Please support yuniql by giving it a GitHub star! For a young project like this, a star can help capture more user experiences and improve the tool in its early stage. https://github.com/rdagumampan/yuniql
Thanks thanks!