Here I want to show you how to integrate an arbitrary command that manipulates your files into a file-processing pipeline. The pipeline we are going to work with is called Capyfile. It’s free, open-source, and written in Go. If you want to familiarize yourself with Capyfile, you can read the article “Do whatever you want with your files, and do it quickly”, or just check its repository on GitHub.
Capyfile comes with built-in file-processing operations that let you retrieve, validate, modify, and store your files. In addition to these, I decided to add one more operation that lets you run external commands. This can be pretty much any command that works with files, such as ffmpeg, exiftool, zip, wget, or the AWS CLI. Let me show you how with a few examples.
Transcode video with ffmpeg
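Let’s say you have a directory with a bunch of AVI video files and you want to convert them to MP4. In the configuration below, the -c:v copy and -c:a copy arguments tell ffmpeg to copy the existing video and audio streams into the MP4 container without re-encoding them. Here’s your file-processing pipeline configuration: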
---
version: '1.1'
name: videos
processors:
  - name: transcode_avi_to_mp4
    operations:
      - name: filesystem_input_read
        cleanupPolicy: keep_files
        params:
          target:
            sourceType: env_var
            source: INPUT_READ_TARGET
      - name: file_type_validate
        params:
          allowedMimeTypes:
            sourceType: value
            source:
              - video/x-msvideo
      - name: command_exec
        cleanupPolicy: remove_files
        params:
          commandName:
            sourceType: value
            source: ffmpeg
          commandArgs:
            sourceType: value
            source: [
              "-i", "{{.AbsolutePath}}",
              "-c:v", "copy",
              "-c:a", "copy",
              "/tmp/{{.Basename}}.mp4",
            ]
          outputFileDestination:
            sourceType: value
            source: /tmp/{{.Basename}}.mp4
      - name: filesystem_input_write
        params:
          destination:
            sourceType: env_var
            source: INPUT_WRITE_DESTINATION
          useOriginalFilename:
            sourceType: value
            source: true
You can run it with the capycmd command-line app:
$ INPUT_READ_TARGET=/home/user/Videos/* \
    INPUT_WRITE_DESTINATION=/home/user/Videos/transcoded \
    capycmd -f service-definition.yml videos:transcode_avi_to_mp4
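For comparison, here is roughly what this pipeline does for a single file, written as a plain shell sketch. This is not how Capyfile is implemented. The function names are hypothetical, and I’m assuming that {{.Basename}} expands to the file name without its extension and that the file and ffmpeg commands are installed.

```shell
# Hypothetical helper: mirrors the assumed meaning of /tmp/{{.Basename}}.mp4,
# i.e. the file name without its directory and without its last extension.
output_path_for() {
  local name
  name=$(basename -- "$1")
  printf '/tmp/%s.mp4\n' "${name%.*}"
}

# Hypothetical helper: check the MIME type like file_type_validate does,
# then run the same ffmpeg arguments as the command_exec operation.
remux_avi_to_mp4() {
  local src=$1 mime
  mime=$(file -b --mime-type "$src") || return 1
  [ "$mime" = "video/x-msvideo" ] || return 1
  ffmpeg -i "$src" -c:v copy -c:a copy "$(output_path_for "$src")"
}

# Example call (not executed here):
#   remux_avi_to_mp4 /home/user/Videos/clip.avi
```

The point of the pipeline is that you don’t have to write and maintain this glue yourself: reading the input, validating it, cleaning up temporary files, and writing the result are separate operations in the configuration.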
Compress and archive old logs
If you have seen the repository, you may remember the log archiver example. What was missing there? Compression. Now we can add it like this:
---
version: '1.1'
name: logs
processors:
  - name: archive
    operations:
      - name: filesystem_input_read
        cleanupPolicy: keep_files
        params:
          target:
            sourceType: env_var
            source: INPUT_READ_TARGET
      - name: file_time_validate
        params:
          maxMtime:
            sourceType: env_var
            source: MAX_LOG_FILE_TIME_RFC3339
      - name: command_exec
        cleanupPolicy: remove_files
        params:
          commandName:
            sourceType: value
            source: gzip
          commandArgs:
            sourceType: value
            source: ["{{.AbsolutePath}}"]
          outputFileDestination:
            sourceType: value
            source: "{{.AbsolutePath}}.gz"
      - name: command_exec
        params:
          commandName:
            sourceType: value
            source: aws
          commandArgs:
            sourceType: value
            source: [
              "s3",
              "cp", "{{.AbsolutePath}}",
              "s3://my_logs_bucket/{{.Filename}}",
            ]
And run it:
$ INPUT_READ_TARGET=/var/log/rotated-logs* \
    MAX_LOG_FILE_TIME_RFC3339=$(date -d "30 days ago" -u +"%Y-%m-%dT%H:%M:%SZ") \
    capycmd -f service-definition.yml logs:archive
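One thing to keep in mind: date -d "30 days ago" is GNU date syntax, so the command above may not work as-is on macOS or the BSDs. Here is a small sketch of a helper that tries the GNU form first and falls back to the BSD -v flag. The function name rfc3339_days_ago is hypothetical, and I’m assuming one of those two date implementations is available.

```shell
# Hypothetical helper: print the UTC time N days ago as an RFC 3339 timestamp.
rfc3339_days_ago() {
  local days=$1 fmt='+%Y-%m-%dT%H:%M:%SZ'
  # GNU date understands -d "N days ago"; BSD/macOS date uses -v-Nd instead.
  date -u -d "$days days ago" "$fmt" 2>/dev/null ||
    date -u -v"-${days}d" "$fmt"
}

# Example call (not executed here):
#   INPUT_READ_TARGET=/var/log/rotated-logs* \
#     MAX_LOG_FILE_TIME_RFC3339=$(rfc3339_days_ago 30) \
#     capycmd -f service-definition.yml logs:archive
```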
Download the archive and process individual files in it
Say you have an archive and you want to unpack it and process each file in it. For example, let’s download an archive of images and convert every image in it to JPEG:
---
version: '1.1'
name: web_images
processors:
  - name: unpack
    operations:
      - name: command_exec
        params:
          commandName:
            sourceType: value
            source: bash
          commandArgs:
            sourceType: value
            source:
              - -c
              - >
                wget -O /tmp/images.zip https://example.com/images.zip;
                mkdir -p /tmp/web_images;
                unzip /tmp/images.zip -d /tmp/web_images
      - name: filesystem_input_read
        cleanupPolicy: remove_files
        params:
          target:
            sourceType: value
            source: "/tmp/web_images/*"
      - name: file_type_validate
        params:
          allowedMimeTypes:
            sourceType: value
            source:
              - image/jpeg
              - image/png
              - image/heif
      - name: image_convert
        cleanupPolicy: remove_files
        params:
          toMimeType:
            sourceType: value
            source: image/jpeg
          quality:
            sourceType: value
            source: high
      - name: filesystem_input_write
        params:
          destination:
            sourceType: env_var
            source: INPUT_WRITE_DESTINATION
          useOriginalFilename:
            sourceType: value
            source: true
And as usual, run it:
$ INPUT_WRITE_DESTINATION=/home/user/Pictures/web_images \
    capycmd -f service-definition.yml web_images:unpack
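The inline bash script in this configuration chains its steps with semicolons, so unzip still runs even if the download fails. If you want it to stop at the first failure, one option is to chain the steps with && instead. Here is a sketch of that idea as a standalone function. The name fetch_and_unpack and its arguments are hypothetical, and it assumes wget and unzip are installed.

```shell
# Hypothetical helper: download a zip archive and unpack it,
# stopping as soon as one of the steps fails.
fetch_and_unpack() {
  local url=${1:-} archive=${2:-} dest=${3:-}
  # Refuse to run with missing arguments.
  if [ -z "$url" ] || [ -z "$archive" ] || [ -z "$dest" ]; then
    return 2
  fi
  # -o makes unzip overwrite existing files instead of prompting.
  wget -O "$archive" "$url" &&
    mkdir -p "$dest" &&
    unzip -o "$archive" -d "$dest"
}

# Example call (not executed here):
#   fetch_and_unpack https://example.com/images.zip /tmp/images.zip /tmp/web_images
```

The same && chaining works directly inside the folded bash -c string in the YAML if you prefer to keep everything in one file.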
In the end
The ability to integrate and execute arbitrary commands greatly increases the number of use cases that Capyfile can cover. So feel free to try it out and share your feedback or any ideas you have.