IderaDevTools

Posted on • Originally published at blog.filestack.com

The Art of Cleaning Files Before They Reach Your Server

Building an application that accepts user content is a standard requirement today. Whether you are running a classroom management tool or a print-on-demand shop, you need to accept files. However, accepting a file in your file uploader is only half the battle. The real challenge lies in making sure that file is actually usable and safe before it enters your system. This is where we move beyond simple uploads and start looking at intelligent automation.

Key Takeaways

  1. Automate Quality Control: Workflows serve as an intelligent filter that standardizes, scans, and fixes files before they ever touch your main database.

  2. Keep Your App Fast: Offloading heavy tasks like virus scanning or video transcoding to a background process keeps your user interface snappy and responsive.

  3. Webhooks Are Essential: Webhooks bridge the gap between long-running background tasks and your application by notifying you exactly when a job is complete.

  4. Solve Industry Headaches: From correcting student document formats in EdTech to ensuring high DPI for Printing, workflows handle specific constraints automatically.

  5. Simple Implementation: You can trigger complex logic chains with a simple addition to your existing file picker code or a single API call.

The Reality of User Uploads

If you are a developer at a startup or managing an internal tool, you know the specific anxiety that comes with file uploads. You build a beautiful, clean application. You design a perfect database schema. Then you open the doors to users, and they immediately start uploading chaos.

You ask for a profile photo, and someone uploads a 50MB uncompressed TIFF file. You ask for a PDF homework assignment, and a student submits a corrupted Word document from a decade ago. In the worst scenarios, you might even receive a file containing malware that puts your entire infrastructure at risk.

The traditional way to handle this is expensive. You have to set up your own servers to receive the file, check the MIME type, run it through antivirus software, convert it, and finally save it to your storage bucket. That is a massive amount of infrastructure to manage just to accept a simple file.

We need a smarter approach. We need a way to clean and verify data before it ever becomes our problem. This is where Filestack Workflows come in.

Why You Need an Automated Filter

Think of a Workflow as an automated assembly line that sits between your user and your storage. Instead of letting a file land directly in your app, you pass it through a series of logic steps first.

This is critical for growing companies because it removes the need for a dedicated media engineering team.

For Startups, this means you can protect your MVP from malicious content without building a complex security backend. You can automatically flag or reject files that don’t meet your safety standards.

For EdTech, it solves the interoperability nightmare. Students upload assignments from iPads, Chromebooks, and old Windows machines. A workflow can standardize everything into a clean PDF before it reaches the teacher’s dashboard.

For the Printing Industry, this is about protecting your margins. If a customer uploads a low-resolution image for a large banner, printing it is a waste of money. A workflow can check the dimensions and DPI as soon as the file arrives, rejecting the file or upscaling it before any ink or toner is spent.
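
The DPI check in the printing case comes down to simple arithmetic: effective DPI is the pixel count divided by the printed size in inches. A minimal sketch of that check follows. The function names and the 150 DPI threshold are illustrative assumptions, not Filestack settings.

```javascript
// Hypothetical helper: effective print resolution for an image at a given print size.
// DPI on each axis = pixels / inches; the lower of the two axes is the limiting one.
function effectiveDpi(widthPx, heightPx, printWidthIn, printHeightIn) {
  if (printWidthIn <= 0 || printHeightIn <= 0) {
    throw new RangeError('Print dimensions must be positive');
  }
  return Math.min(widthPx / printWidthIn, heightPx / printHeightIn);
}

// Assumed threshold for illustration; real print shops choose their own minimum.
const MIN_DPI = 150;

function isPrintable(widthPx, heightPx, printWidthIn, printHeightIn, minDpi = MIN_DPI) {
  return effectiveDpi(widthPx, heightPx, printWidthIn, printHeightIn) >= minDpi;
}

// A 1200x800 px photo stretched over a 48x32 inch banner gives only 25 DPI.
console.log(effectiveDpi(1200, 800, 48, 32)); // 25
console.log(isPrintable(1200, 800, 48, 32));  // false
```

A check like this is the kind of rule a workflow would apply on your behalf, so the low-resolution file is caught before it reaches the print queue.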

A Refresher on Webhooks

To understand how Workflows function efficiently, we have to revisit webhooks.

When you run a complex workflow — like scanning for viruses, detecting faces, and converting video formats — it takes time. It might take three seconds, or it might take thirty. In the world of web development, you cannot have your user staring at a frozen screen while this happens.

This is where the webhook comes in.

If you haven’t used them in a while, think of a webhook like a text notification from a delivery driver.

  • The Old Way (Polling): You call the restaurant every ten seconds asking if your food is ready. This wastes your time and annoys the restaurant.

  • The Modern Way (Webhooks): You place the order and go watch TV. When the food is ready, the driver rings your doorbell.

In our context, the “doorbell” is a POST request sent by Filestack to your server. It contains all the data you need about the file, confirming that the workflow is done and the file is safe to use. You can read more about configuring these notifications in the Webhooks documentation.
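
The receiving side of that doorbell can be sketched as a small function that any HTTP framework could call from its route handler. The function name, return shape, and status codes below are assumptions made for illustration. The real payload fields and any signature verification are covered in the Webhooks documentation.

```javascript
// Hypothetical framework-agnostic handler: takes the HTTP method and raw body,
// returns the status code to send back plus the parsed event (if any).
function receiveWebhook(method, rawBody) {
  // Webhooks arrive as POST requests; anything else is rejected.
  if (method !== 'POST') {
    return { status: 405, event: null };
  }
  let event;
  try {
    event = JSON.parse(rawBody);
  } catch (err) {
    // Malformed JSON: tell the sender the request was bad.
    return { status: 400, event: null };
  }
  if (event === null || typeof event !== 'object') {
    return { status: 400, event: null };
  }
  // Acknowledge quickly; slow work (database writes, emails) can run afterwards.
  return { status: 200, event };
}

const accepted = receiveWebhook('POST', '{"action":"fs.workflow","id":"job_id_12345"}');
console.log(accepted.status, accepted.event.action); // 200 fs.workflow
console.log(receiveWebhook('GET', '').status);       // 405
```

Keeping the parsing step separate from the HTTP layer makes it easy to test without starting a server.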

Implementing the Logic

You can trigger workflows in two main ways. The most common is to attach a workflow directly to the File Picker on the frontend, but you can also use the Workflows API on the backend for more control.

Method 1: The File Picker Configuration

This is the easiest integration. You simply tell the Filestack File Picker that when a user selects a file, it should not just store it. It should run a specific workflow first.

const client = filestack.init('YOUR_API_KEY');

const options = {
  onUploadDone: (res) => console.log(res),
  storeTo: {
    // We attach the workflow ID here
    // This ensures the file runs through our logic chain
    workflows: ["YOUR_WORKFLOW_ID_FROM_DASHBOARD"] 
  }
};

// Open the picker
client.picker(options).open();
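
One consequence is worth making explicit: when onUploadDone fires, the upload has finished but the workflow may still be running. A minimal sketch of recording uploads as pending until the webhook arrives is shown here. It assumes the picker response exposes a filesUploaded array whose entries carry handle and filename. toPendingRecords and the 'pending' status are hypothetical names for this example.

```javascript
// Hypothetical helper: turn a picker response into "pending" database records.
// Assumes res.filesUploaded is an array of objects with handle and filename.
function toPendingRecords(res) {
  const files = (res && Array.isArray(res.filesUploaded)) ? res.filesUploaded : [];
  return files.map((file) => ({
    handle: file.handle,
    filename: file.filename,
    // Not "live" yet: the workflow result arrives later via webhook.
    status: 'pending',
  }));
}

// Example with a hand-written object standing in for the picker's response.
const sample = { filesUploaded: [{ handle: 'abc123', filename: 'essay.docx' }] };
console.log(toPendingRecords(sample));
```

In the picker options this could be wired up as onUploadDone: (res) => saveRecords(toPendingRecords(res)), where saveRecords is your own persistence function.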

Method 2: The Workflows API

Sometimes you might have a file that is already in your storage, or you want to run a workflow programmatically from your backend. You can use a simple HTTP request to trigger the job using the Workflows API.

Here is how you might structure that request.

curl -X POST "https://cdn.filestackcontent.com/API_KEY/runWorkflow=id:WORKFLOW_ID/FILE_HANDLE"

This flexibility means you can use Workflows to clean up old data just as easily as new uploads.
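
For a backend written in JavaScript, the same request can be sketched by building the URL from its three parts. This follows the path pattern shown in the curl command above and nothing more. The function name is hypothetical, and the exact task name and any required security policy or signature should be confirmed against the Workflows API documentation.

```javascript
// Hypothetical helper mirroring the URL pattern from the curl example.
function buildRunWorkflowUrl(apiKey, workflowId, fileHandle) {
  for (const [name, value] of Object.entries({ apiKey, workflowId, fileHandle })) {
    if (typeof value !== 'string' || value.length === 0) {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  const base = 'https://cdn.filestackcontent.com';
  // encodeURIComponent guards against stray slashes or spaces in any segment.
  return [
    base,
    encodeURIComponent(apiKey),
    `runWorkflow=id:${encodeURIComponent(workflowId)}`,
    encodeURIComponent(fileHandle),
  ].join('/');
}

console.log(buildRunWorkflowUrl('API_KEY', 'WORKFLOW_ID', 'FILE_HANDLE'));
// https://cdn.filestackcontent.com/API_KEY/runWorkflow=id:WORKFLOW_ID/FILE_HANDLE
```

The resulting string can be passed to fetch(url, { method: 'POST' }) in Node 18 or later, which makes it straightforward to loop over a list of existing file handles.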

Handling the Webhook Response

Once the workflow finishes its job, your server receives the data. You need to know what that data looks like so you can parse it.

If you are running a virus scan workflow, the JSON payload sent to your webhook URL will look something like this.

{
  "action": "fs.workflow",
  "timestamp": 1620000000,
  "id": "job_id_12345",
  "results": {
    "virus_detection": {
      "data": {
        "infected": false,
        "virus": null
      }
    }
  }
}

Your code only needs to check the infected field. If it is false, mark the file as “Live” in your database. If it is true, delete the file and alert the user.
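
That check can be sketched as a small function over the payload shown above. The field path results.virus_detection.data.infected is taken from the sample payload. The function name and the 'live', 'rejected', and 'unknown' labels are assumptions for this example.

```javascript
// Hypothetical decision helper based on the sample payload's shape.
function decideFileStatus(payload) {
  const infected = payload?.results?.virus_detection?.data?.infected;
  if (infected === false) return 'live';     // clean: safe to publish
  if (infected === true) return 'rejected';  // delete the file and alert the user
  // Missing or unexpected value: fail closed rather than assume the file is clean.
  return 'unknown';
}

const payload = {
  action: 'fs.workflow',
  timestamp: 1620000000,
  id: 'job_id_12345',
  results: { virus_detection: { data: { infected: false, virus: null } } },
};
console.log(decideFileStatus(payload)); // live
```

Treating a missing result as 'unknown' instead of 'live' is a deliberate choice here: a payload from a different workflow, or a failed scan step, should not silently publish a file.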

Making It Work for You

The beauty of this system is that it decouples your file ingestion from your core application logic. You don’t need to maintain libraries for image resizing or virus definitions. You define the rules once in the dashboard, and the API enforces them for every single upload.

Whether you are building the next big EdTech platform or simply trying to keep your startup’s database clean, treating file uploads as a workflow rather than a simple storage event is the standard we should all be aiming for.

References

This article was published on the Filestack blog.
