[Side Project] Post automated Youtube videos from Reddit

Long introduction

I've always been curious about certain videos I come across on Facebook and YouTube that look a little strange, like they were generated by a robot. Think of a video playing with a little box of a guy reacting to it, while his reaction is incredibly strange and appears utterly unrelated. These videos always feature stuff that either doesn't fit the caption or is just plain unusual.

I finally understand why. These videos are all automated, produced by scripts. It also appears that "Youtube/Facebook automation" is popular right now.

I first became curious about "Youtube automation" after running across the term. I didn't know that people were building scripts to scrape content from websites (like Facebook and Reddit, but largely Reddit, to be honest), process it, and then produce a video from it to put online.

Imagine this thing running with a cron job that posts a new video every day. Consider your potential earnings and the consistency of it. You simply need to worry about funding your server and maybe improving how the material is processed while being scraped.

There is a lot to say about this, but I'm here to explain how I used Golang to grab several Tiktok videos from a subreddit, combine them, and post a "Top 10 funny/cringiest Tiktok videos" compilation to YouTube without ever knowing what the video's content was.

TLDR

Social media is really weird. Authentic content is becoming harder to find, and BTW! I managed to write a Golang script that scrapes Tiktok videos from a subreddit, combines them into a single video, and uploads it to Youtube, all automatically.

So how though?

So for starters, the initial idea is to split the big thing into smaller things and blah blah, you know, the boring engineering stuff. But I seriously wanted to build it to scale. So it made sense to have 3 Golang packages/modules:

  1. Reddit Aggregator:
    • This is responsible for scraping a subreddit and extracting Tiktok videos out of it.
    • It saves the results into a MongoDB collection (because schema is overrated).
  2. Video Processor:
    • This grabs unpublished results from the MongoDB collection, then:
      • Downloads all videos to the server
      • Processes each single video
      • Combines all videos into a single video
    • Now obviously this seems a bit too much for a single service, but ... meh
  3. Uploader:
    • This uploads the final video to Youtube

TLDR

This is a side project, and I wanted it to be complex enough that Youtube itself could actually use it. Spoiler alert: they won't. That is why I divided the code into three Golang packages, meant to run as microservices/serverless functions. Spoiler alert again: they are not.

Reddit Aggregator

The aggregator is straightforward; all it does is use the Reddit API to obtain posts from a subreddit and store them in a MongoDB collection.

https://www.reddit.com/r/<SUBREDDIT>/<POSTS_TYPE>.json?limit=<LIMIT>
https://www.reddit.com/r/linux/hot.json?limit=10

Now, not to keep it too simple: everything between < > is a variable that can be provided before the run, to make it more flexible.
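
As an illustration, fetching that endpoint from Go could look roughly like this sketch; the function name and user agent string are my own assumptions, not necessarily what the actual aggregator does:

import (
    "fmt"
    "io"
    "net/http"
)

// fetchPosts grabs the raw JSON listing of a subreddit.
func fetchPosts(subreddit, postsType string, limit int) ([]byte, error) {
    url := fmt.Sprintf("https://www.reddit.com/r/%s/%s.json?limit=%d", subreddit, postsType, limit)

    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    // Reddit tends to reject Go's default user agent, so set a custom one.
    req.Header.Set("User-Agent", "reddit-aggregator/0.1")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    return io.ReadAll(resp.Body)
}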

The aggregator will save each post on a DB collection that will look like this:

{
    "_id" : ObjectId("633c37a1740ff26f2433f99c"),
    "hash" : "78a0428f87a9adf7d692e5435466bc5b",
    "title" : "<POST_TITLE>",
    "video" : "<VIDEO_LINK>",
    "created_at" : ISODate("2022-10-04T15:39:45.808+02:00"),
    "updated_at" : ISODate("2022-10-04T15:39:45.808+02:00"),
    "published" : false
}

What are the hash and published fields for, you wonder?

Well, hash is a simple MD5 hash of the video link, and in the DB this field has a unique index, so we're sure the aggregator never inserts duplicates. published marks a post/video that was already published to our Youtube channel, so it never slips into the process again.
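
Here is a minimal sketch of both pieces, assuming the official Go Mongo driver; the helper names are mine, not the repo's:

import (
    "context"
    "crypto/md5"
    "fmt"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

// hashLink computes the MD5 hex digest of a video link,
// mirroring the "hash" field above.
func hashLink(videoLink string) string {
    return fmt.Sprintf("%x", md5.Sum([]byte(videoLink)))
}

// ensureHashIndex creates the unique index on "hash", so inserting
// a duplicate fails instead of silently piling up.
func ensureHashIndex(ctx context.Context, posts *mongo.Collection) error {
    _, err := posts.Indexes().CreateOne(ctx, mongo.IndexModel{
        Keys:    bson.D{{Key: "hash", Value: 1}},
        Options: options.Index().SetUnique(true),
    })
    return err
}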

Video processor

So, this package begins by getting the list of unpublished posts from the database, downloads all their videos to a specific directory, and then starts the processor to process them all.

Downloader

The downloader uses youtube-dl behind the scenes to download every single video. Now, why not just a simple HTTP GET request that writes the file?

  • youtube-dl provides great support for downloading videos from various sources.
  • The goal is to be scalable and to be able to even have multiple aggregators (Facebook aggregator, Youtube aggregator, Vimeo... etc.)
  • I already had a wrapper I wrote to run a CLI command safely (sniff.go#L118)

Goroutines were required since downloading 10 videos one at a time seemed incredibly slow. I had to run 10 goroutines and wait for them to complete before going on to the next step.

With Goroutines in place comes a challenge; more on that in the concurrency challenge section below.

So, downloading a single video is just this:

const ydl_command = "youtube-dl %s -q -o %s.%%(ext)s"

// download runs youtube-dl for a single video, writing it to
// <base>/downloads/<filename>.<ext>.
func download(url, base, filename string) (string, string, error) {
    dest := path.Join(base, "downloads", filename)
    cmdString := fmt.Sprintf(ydl_command, url, dest)
    args := strings.Split(cmdString, " ")

    cmd := exec.Command(args[0], args[1:]...)
    out, err := helpers.RunCmd(cmd)

    // The last argument is the output template ("<dest>.%(ext)s");
    // the caller later swaps %(ext)s for the real extension.
    return args[len(args)-1], out, err
}

Processor

ffmpeg is pure evil

The processor is structured just like the downloader; it runs through each video and, given that all Tiktok videos are vertical and end up with black bars on the left and right sides, its sole job is to create a blurry background for the video.

To achieve that goal, you have to use ffmpeg, which seems even more evil than regex.

Anyway, this is the function that processes a single video:


const (
    ffmpeg_quality = "1080k"
    ffmpeg_command = `ffmpeg -i %s -lavfi %s -vb %s -c:v libx264 -crf 20 %s.mp4 -n`
    ffmpeg_filters = `[0:v]scale=ih*16/9:-1,boxblur=luma_radius=min(h\,w)/20:luma_power=1:chroma_radius=min(cw\,ch)/20:chroma_power=1[bg];[bg][0:v]overlay=(W-w)/2:(H-h)/2,crop=h=iw*9/16`
)

func process(video downloader.Video, base string) (string, string, error) {
    dest := path.Join(base, "blurry", video.Post.Hash)
    cmdString := fmt.Sprintf(ffmpeg_command, video.Path, ffmpeg_filters, ffmpeg_quality, dest)
    args := strings.Split(cmdString, " ")

    cmd := exec.Command(args[0], args[1:]...)
    out, err := helpers.RunCmd(cmd)
    // Because of the -n flag, ffmpeg exits with an error when the output
    // file already exists; treat that case as success.
    if strings.Contains(out, "already exists. Exiting") {
        err = nil
    }

    // The second-to-last argument is the output path ("<dest>.mp4").
    return args[len(args)-2], out, err
}

For the record, it took me ~1 hour of debugging the ffmpeg command to learn that I had to change this:

`ffmpeg -i %s -lavfi '%s' -vb %s -c:v libx264 -crf 20 %s.mp4 -n`

to this:

`ffmpeg -i %s -lavfi %s -vb %s -c:v libx264 -crf 20 %s.mp4 -n`

The reason: exec.Command doesn't run the command through a shell, so the single quotes are never stripped; they get passed to ffmpeg as literal characters inside the filter argument, which breaks it.

Now of course, since the script will be processing 10 videos, this also needs to be done using Goroutines to benefit from concurrency. As with the downloader, this will be explained later in the concurrency challenge section.

Merger

The merger is the processor's last step: after the blurry background has been added to all of the videos, it merges them into a single video. It does that in steps:

  1. Create a text file that lists all of the video paths in this format:
file /some/path/blurry-file1.mp4
file /some/path/blurry-file2.mp4
  2. Then pass that file to ffmpeg so that it concatenates all the videos:
ffmpeg -f concat -safe 0 -i all_videos.txt final.mp4
  3. Then remove the temporary all_videos.txt file.

Which is exactly what MergeAll does:
func MergeAll(videos []ProcessedVideo, base, output string) error {
    temp, err := createVideosFile(videos, base)
    if err != nil {
        return err
    }

    out, err := runMerge(output, temp)
    if err != nil {
        log.Printf("error running command, output: %q", out)
        return err
    }

    return os.Remove(temp)
}
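For completeness, here is a minimal sketch of what the two helpers could look like; the ProcessedVideo.Path field and the exact layout are my assumptions, not necessarily the repo's code:

// createVideosFile writes the "file <path>" list that ffmpeg's concat
// demuxer expects and returns the path of the temporary file.
func createVideosFile(videos []ProcessedVideo, base string) (string, error) {
    f, err := os.Create(path.Join(base, "all_videos.txt"))
    if err != nil {
        return "", err
    }
    defer f.Close()

    for _, v := range videos {
        if _, err := fmt.Fprintf(f, "file %s\n", v.Path); err != nil {
            return "", err
        }
    }
    return f.Name(), nil
}

// runMerge shells out to ffmpeg to concatenate everything listed
// in videosFile into the output file.
func runMerge(output, videosFile string) (string, error) {
    cmd := exec.Command("ffmpeg", "-f", "concat", "-safe", "0", "-i", videosFile, output)
    return helpers.RunCmd(cmd)
}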

Uploader

This is the last piece of the puzzle, where the final video gets uploaded to Youtube. But to avoid always having the same title, there is an option to derive a random title from the posts we already have in the database.

Since the goal is to publish a video every single day, you can just fetch today's posts, pick a single random one, and make it part of the video's title.

type Config struct {
    VideoInfoType    string
    VideoTitle       string
    VideoDescription string
    DatabaseURI      string
}

func (c *Config) GetTitle() (string, error) {
    switch c.VideoInfoType {
    case VideoTypeManual:
        return c.VideoTitle, nil
    case VideoTypeRandom:
        db, cancel, err := getDbClient(c)
        if err != nil {
            return "", err
        }
        defer cancel()

        var posts []types.Post
        err = db.FindRandomOfToday(1, &posts)
        if err != nil {
            return "", err
        }
        // Guard against an empty result so posts[0] can't panic.
        if len(posts) == 0 {
            return "", fmt.Errorf("no posts found for today")
        }
        p := posts[0]
        return fmt.Sprintf("Top 10 best/cringiest Tiktoks today: %s", p.Title), nil
    default:
        return "", fmt.Errorf("unknown video type %v", c.VideoInfoType)
    }
}
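The FindRandomOfToday helper isn't shown here. A minimal sketch of it, assuming the official Mongo driver and a $sample aggregation stage; the DB receiver type and its posts *mongo.Collection field are my guesses:

// FindRandomOfToday samples n random posts created since midnight.
func (db *DB) FindRandomOfToday(n int, out *[]types.Post) error {
    start := time.Now().Truncate(24 * time.Hour)
    cursor, err := db.posts.Aggregate(context.TODO(), mongo.Pipeline{
        {{Key: "$match", Value: bson.M{"created_at": bson.M{"$gte": start}}}},
        {{Key: "$sample", Value: bson.M{"size": n}}},
    })
    if err != nil {
        return err
    }
    return cursor.All(context.TODO(), out)
}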

Besides that, the script of course needs client credentials and an API token to be able to use the Youtube API to upload the video.
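
Building the authenticated service could look roughly like this, using the google.golang.org/api/youtube/v3 and google.golang.org/api/option packages; the OAuth2 flow that produces oauthConfig and token is omitted, and those variable names are my own:

ctx := context.Background()
// client is an *http.Client that injects the OAuth2 token into every request.
client := oauthConfig.Client(ctx, token)
service, err := youtube.NewService(ctx, option.WithHTTPClient(client))
check(err)

With the service in hand, the upload itself is: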

upload := &youtube.Video{
  Snippet: &youtube.VideoSnippet{
    Title:       title,
    Description: description,
  },
  Status: &youtube.VideoStatus{
    PrivacyStatus: c.PrivacyStatus,
  },
}

// "snippet" and "status" tell the API which parts of the video resource we set.
call := service.Videos.Insert([]string{"snippet", "status"}, upload)

file, err := os.Open(c.OutputFile)
check(err)
defer file.Close()

r, err := call.Media(file).Do()
check(err)

Concurrency challenge

The main challenge for me was to be able to concurrently run just the tasks I want to run, of which I have two:

  • Downloading videos
  • Processing videos

These are the most expensive tasks I have in terms of time, because running them one by one obviously takes way too long.

So I wanted a way of running a task pool, where I pass the number of threads needed and the pool just splits the tasks across those threads. I ended up implementing this:

type ThreadElement struct {
    Element interface{}
}

// Threadify splits elements across numOfThreads goroutines and calls f
// on each element, blocking until all goroutines finish.
func Threadify(numOfThreads int, elements []ThreadElement, f func(args ...interface{})) {
    length := len(elements)
    // Each goroutine handles `each` elements; the remainder (acc) is
    // spread one extra element at a time over the first goroutines.
    each := length / numOfThreads
    acc := length - (numOfThreads * each)

    var wg sync.WaitGroup

    wg.Add(numOfThreads)

    start := 0

    for i := 0; i < numOfThreads; i++ {
        running := each

        if acc > 0 {
            running++
            acc--
        }

        go func(start, i int) {
            for j := 0; j < running; j++ {
                e := elements[start]
                f(e.Element)
                start++
            }

            wg.Done()
        }(start, i)

        start += running
    }

    wg.Wait()
}

Pretty basic, right? And for sure there is a better way to do it, but hey, it actually worked, and now I can download videos by calling it like this:

var (
  videos   []Video
  videosMu sync.Mutex // guards videos: the download callback runs concurrently
  elements []helpers.ThreadElement
)

for _, p := range posts {
  elements = append(elements, helpers.ThreadElement{
    Element: p,
  })
}

helpers.Threadify(3, elements, func(args ...interface{}) {
  p := args[0].(types.Post)
  log.Printf("calling download %q", p.Hash)
  downloadedPath, out, err := download(p.Video, base, p.Hash)
  if err != nil {
    log.Fatalf("error downloading video: %q of this post: %q::: %q\ncommand output: %s", p.Video, p.Hash, err.Error(), out)
  }

  log.Printf("Downloaded video: %q with hash %q on %q\n", p.Title, p.Hash, downloadedPath)
  // Protect the shared slice: this callback runs on several goroutines at once.
  videosMu.Lock()
  videos = append(videos, Video{
    Post: p,
    Path: strings.ReplaceAll(downloadedPath, "%(ext)s", "mp4"),
  })
  videosMu.Unlock()
})

fmt.Println("Videos downloaded", videos)

I loved how generic this function is: I can use it for anything I want to run concurrently. You pass elements to it by simply mapping them into ThreadElement, so the pool knows what to hand to the callback function later.

But still, that looked hard to maintain, and I asked myself: there has to be a better way to do this. Just out of curiosity I googled and came across ants, which seemed exactly right for the kind of functionality I wanted, so I converted the same function to use ants, and it became this:

// Threadify keeps the same signature but delegates the pooling to ants.
func Threadify(numOfThreads int, elements []ThreadElement, f func(args ...interface{})) error {
    var wg sync.WaitGroup

    // Every Invoke'd element gets handed to this wrapper by the pool.
    p, err := ants.NewPoolWithFunc(numOfThreads, func(e interface{}) {
        f(e)
        wg.Done()
    })
    if err != nil {
        return err
    }

    defer p.Release()

    for _, e := range elements {
        wg.Add(1)
        err = p.Invoke(e.Element)
        if err != nil {
            return err
        }
    }

    wg.Wait()
    return nil
}

Now it looks even cleaner, and I just don't care anymore about how it works as long as it works. The most beautiful thing is that the function signature is still the same, which means I can use it exactly the way I did before and keep it as generic as I wanted initially.

Lessons learned

Finally, now that the project is complete, here are the top lessons I learned:

  • I shouldn't be surprised by "Youtube automation"; literally everything is possible nowadays.
  • Golang is just super easy. Goroutines are easy when you think about them logically, but it's also easy to produce bugs with them.
  • Don't use any dotenv package in Golang. IDK why I did!! It just makes no sense, at least in this kind of project.

Finally

I got the idea from this Reddit post, huge kudos to the OP. He did it simply in bash, but I just had to be a nerd and do it in Golang :/

There is huge room for improvement indeed, but I'm happy with the results. Mission achieved, now on to the next side project.
