DEV Community

Cover image for Quick Tip - How to merge CSV files in 30 lines of code
Coding Bugs
Coding Bugs

Posted on

Quick Tip - How to merge CSV files in 30 lines of code

My nickname in the networks is Coding Bugs, and I have been working as a software architect and developer from 2004. In my dashboard, you will find articles about programming languages, programming techniques, and other concepts that will help you improve.

I am back with another PowerShell article in which I show you how easy and fast it is to generate a single CSV file from many. You will find it useful when developing processes that result in a report in CSV format and you have to merge the contents of several files into one.

CSV is an acronym for Comma Separated Values (CSV) files. It is a de facto standard format that is often used for information exchange. The goal is to have a plain text file in which each row is a record, each value in the row is separated by the comma character. All rows have the same number of values, i.e. the same number of commas. See RFC 4180 for detailed information.

For the following code to work without errors, the requirement is that all the CSV files that we want to join have the same structure, i.e. the same headers and in the same order.

If you want to work with CSV files and you use Node.js, I have developed a library that simplifies getting your data using objects, arrays and dynamic filters. Download with NPM and get more information clicking this link and let me know your thoughts about it.

Once the introduction is done, let's go to the code to comment it.

param(
    [Parameter(Mandatory=$true)]
    [String] $Location,

    [Parameter(Mandatory=$true)]
    [String] $NameFilter,

    [Parameter(Mandatory=$true)]
    [String] $OutputFile
)

If(Test-Path $OutputFile) {
    Remove-Item $OutputFile -Force
}

# Writing headers obtained from the first file found
Get-ChildItem -Path $Location -Filter "$NameFilter.csv"
 | Select -First 1
 | Get-Content
 | Select -First 1
 | Out-File $OutputFile -Append

# Writing content for each file
Get-ChildItem -Path $Location -Filter "$NameFilter.csv"
    | Foreach {
        $_
            | Get-Content
            | Select -Skip 1
            | Out-File $OutputFile -Append
    }
Enter fullscreen mode Exit fullscreen mode

The script needs 3 input parameters:

  1. The first one indicates the location where the CSV files we want to join are located,
  2. The second one allows to indicate a filter to apply to collect the targeted CSV files,
  3. Finally, the third parameter indicates where and what will be the file resulting from the join.

So far so simple. The parameters are all of type String simplifying even more the input values.

The next block deletes the resulting file if it exists. I have added the -Force parameter to prevent it from constantly asking. This is a personal choice, if you want to change it, go ahead.

Now comes the interesting part, the union of the contents. To maintain the same existing structure in each of the CSV files I take the headers of the first file used with the Get-ChildItem command and selecting the first of them all with Select -First 1.

Get-ChildItem -Path $Location -Filter "$NameFilter.csv"
 | Select -First 1
 | Get-Content
 | Select -First 1
 | Out-File $OutputFile -Append
Enter fullscreen mode Exit fullscreen mode

Getting the headers follows the same pattern with different elements. I read the contents of the file, Get-Content, select the first row, Select -First 1 and write it to the output file passed in the third parameter, Out-File.

As you can see, I am not doing any kind of treatment of the data as CSV or otherwise, i.e. using the Import-Csv command. As I mentioned at the beginning of the article, CSV files are text files and can be treated as such when reading and writing them.

The last block of code iterates over the contents of each file to dump all the data except for the first row. Remember that the first row will be the headers and we do not need to rewrite it.

Get-ChildItem -Path $Location -Filter "$NameFilter.csv"
    | Foreach {
        $_
            | Get-Content
            | Select -Skip 1
            | Out-File $OutputFile -Append
    }
Enter fullscreen mode Exit fullscreen mode

As you have seen, with 30 lines of code, including whitespace, indentations, etc., we have the union of several files into one. Maybe you don't need to do this, but I hope it will inspire you.

The code for this article can be found in this gist.

Feedback

If you liked this article, do not forget to click on the heart or the unicorn and share it with your friends and colleagues. Leave me your comments at the bottom about what you liked or how to improve and if you want me to write about other topics send me a direct message to my Twitter account.

Credits

The header image comes from Unsplash and is courtesy of Mika Baumeister.

Top comments (0)