DEV Community

Anton

Integrate any command into your file-processing pipeline

In this post I want to show you how to integrate an arbitrary command that manipulates your files into a file-processing pipeline. The pipeline we are going to work with is called Capyfile. It’s free, open-source, and written in Go. If you want to familiarize yourself with Capyfile, you can check the article called Do whatever you want with your files, and do it quickly, or just check its repository on GitHub.

Capyfile comes with built-in file-processing operations that allow you to retrieve, validate, modify, and store your files. In addition to these, I decided to add one more operation, command_exec, that allows you to run external commands. This can be pretty much any command that works with files, such as ffmpeg, exiftool, zip, wget, or the AWS CLI. Let me show you how with a few examples.

Transcode video with ffmpeg

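Let’s say you have a directory full of AVI video files and you want to convert them to MP4. Because the arguments below use -c:v copy and -c:a copy, ffmpeg copies the video and audio streams into the MP4 container without re-encoding them. Here’s the file-processing pipeline configuration: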

---
version: '1.1'
name: videos
processors:
  - name: transcode_avi_to_mp4
    operations:
      - name: filesystem_input_read
        cleanupPolicy: keep_files
        params:
          target:
            sourceType: env_var
            source: INPUT_READ_TARGET
      - name: file_type_validate
        params:
          allowedMimeTypes:
            sourceType: value
            source:
              - video/x-msvideo
      - name: command_exec
        cleanupPolicy: remove_files
        params:
          commandName:
            sourceType: value
            source: ffmpeg
          commandArgs:
            sourceType: value
            source: [
              "-i", "{{.AbsolutePath}}",
              "-c:v", "copy",
              "-c:a", "copy",
              "/tmp/{{.Basename}}.mp4",
            ]
          outputFileDestination:
            sourceType: value
            source: /tmp/{{.Basename}}.mp4
      - name: filesystem_input_write
        params:
          destination:
            sourceType: env_var
            source: INPUT_WRITE_DESTINATION
          useOriginalFilename:
            sourceType: value
            source: true

You can run it with the capycmd command-line app:

$ INPUT_READ_TARGET=/home/user/Videos/* \
  INPUT_WRITE_DESTINATION=/home/user/Videos/transcoded \
  capycmd -f service-definition.yml videos:transcode_avi_to_mp4
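The {{.AbsolutePath}} and {{.Basename}} placeholders in commandArgs use Go template syntax. Here is a minimal sketch of how such arguments could be expanded for one file with the standard text/template package. It is not Capyfile's actual implementation: the fileVars type and the renderArgs helper are hypothetical, and the reading of Filename as the name with its extension and Basename as the name without it is an assumption based on the examples in this post.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"text/template"
)

// fileVars is a hypothetical stand-in for the values exposed to templates.
// The field names mirror the placeholders used in the configuration.
type fileVars struct {
	AbsolutePath string
	Filename     string // assumed: file name with extension
	Basename     string // assumed: file name without extension
}

// newFileVars derives the template values from an absolute path.
func newFileVars(absPath string) fileVars {
	name := filepath.Base(absPath)
	return fileVars{
		AbsolutePath: absPath,
		Filename:     name,
		Basename:     strings.TrimSuffix(name, filepath.Ext(name)),
	}
}

// renderArgs expands each raw argument as a Go template against one file.
func renderArgs(rawArgs []string, vars fileVars) ([]string, error) {
	out := make([]string, 0, len(rawArgs))
	for _, raw := range rawArgs {
		tmpl, err := template.New("arg").Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", raw, err)
		}
		var sb strings.Builder
		if err := tmpl.Execute(&sb, vars); err != nil {
			return nil, fmt.Errorf("render %q: %w", raw, err)
		}
		out = append(out, sb.String())
	}
	return out, nil
}

func main() {
	args, err := renderArgs(
		[]string{"-i", "{{.AbsolutePath}}", "-c:v", "copy", "-c:a", "copy", "/tmp/{{.Basename}}.mp4"},
		newFileVars("/home/user/Videos/holiday.avi"),
	)
	if err != nil {
		panic(err)
	}
	// Print the argument list that would be handed to ffmpeg for this file.
	fmt.Println(args)
}
```

Each argument is rendered separately, so a path that contains spaces still reaches the command as a single argument.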

Compress and archive old logs

If you have seen the repository, you may remember the log archiver example. What was missing there? Probably compression. Now we can add it this way:

---
version: '1.1'
name: logs
processors:
  - name: archive
    operations:
      - name: filesystem_input_read
        cleanupPolicy: keep_files
        params:
          target:
            sourceType: env_var
            source: INPUT_READ_TARGET
      - name: file_time_validate
        params:
          maxMtime:
            sourceType: env_var
            source: MAX_LOG_FILE_TIME_RFC3339
      - name: command_exec
        cleanupPolicy: remove_files
        params:
          commandName:
            sourceType: value
            source: gzip
          commandArgs:
            sourceType: value
            source: ["{{.AbsolutePath}}"]
          outputFileDestination:
            sourceType: value
            source: "{{.AbsolutePath}}.gz"
      - name: command_exec
        params:
          commandName:
            sourceType: value
            source: aws
          commandArgs:
            sourceType: value
            source: [
              "s3",
              "cp", "{{.AbsolutePath}}",
              "s3://my-logs-bucket/{{.Filename}}",
            ]

And run:

$ INPUT_READ_TARGET=/var/log/rotated-logs* \
  MAX_LOG_FILE_TIME_RFC3339=$(date -d "30 days ago" -u +"%Y-%m-%dT%H:%M:%SZ") \
  capycmd -f service-definition.yml logs:archive

Download the archive and process individual files in it

Say you have an archive and you want to unpack it and process each file in it. For example, let’s download an archive of images and convert every image in it to JPEG:

---
version: '1.1'
name: web_images
processors:
  - name: unpack
    operations:
      - name: command_exec
        params:
          commandName:
            sourceType: value
            source: bash
          commandArgs:
            sourceType: value
            source:
              - -c
              - >
                wget -O /tmp/images.zip https://example.com/images.zip;
                mkdir -p /tmp/web_images;
                unzip /tmp/images.zip -d /tmp/web_images
      - name: filesystem_input_read
        cleanupPolicy: remove_files
        params:
          target:
            sourceType: value
            source: "/tmp/web_images/*"
      - name: file_type_validate
        params:
          allowedMimeTypes:
            sourceType: value
            source:
              - image/jpeg
              - image/png
              - image/heif
      - name: image_convert
        cleanupPolicy: remove_files
        params:
          toMimeType:
            sourceType: value
            source: image/jpeg
          quality:
            sourceType: value
            source: high
      - name: filesystem_input_write
        params:
          destination:
            sourceType: env_var
            source: INPUT_WRITE_DESTINATION
          useOriginalFilename:
            sourceType: value
            source: true

And as usual, run it:

$ INPUT_WRITE_DESTINATION=/home/user/Pictures/web_images \
  capycmd -f service-definition.yml web_images:unpack
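The first operation in this pipeline wraps three commands in bash -c. The sketch below illustrates why that wrapper is useful when a program is started directly with a name and an argument list: no shell parses the arguments, so a ";" is passed through as plain text. It uses Go's os/exec package and assumes command_exec starts processes in a similar, shell-free way, which the source does not state. The runDirect and runInShell helpers are hypothetical, and sh stands in for bash to keep the example portable.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runDirect starts a program with its arguments and no shell in between,
// so characters such as ";" reach the program unchanged.
func runDirect(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).Output()
	return strings.TrimSpace(string(out)), err
}

// runInShell hands the whole script to "sh -c" as a single argument,
// which lets the shell split it into separate commands.
func runInShell(script string) (string, error) {
	return runDirect("sh", "-c", script)
}

func main() {
	// Without a shell, echo simply prints every argument it receives.
	direct, err := runDirect("echo", "one;", "echo", "two")
	if err != nil {
		panic(err)
	}
	fmt.Printf("direct: %q\n", direct) // direct: "one; echo two"

	// With a shell, the ";" separates two commands.
	shell, err := runInShell("echo one; echo two")
	if err != nil {
		panic(err)
	}
	fmt.Printf("shell:  %q\n", shell) // shell:  "one\ntwo"
}
```

The same reasoning applies to pipes, redirects, and && chains: they only work when a shell interprets the command line.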

In the end

The ability to integrate and execute arbitrary commands greatly expands the number of use cases that Capyfile can cover. So feel free to try it out and share your feedback or any ideas you have.
