
Evan Lin

Originally published at evanlin.com

[TIL][Golang] How to Get YouTube Video Information and Download YouTube Videos with Go (Updated June 3, 2020)

Preface:

This project grew out of Project 52, when I came across some code that parsed YouTube video information. With Golang's channels and goroutines, a small tool like this can be built quickly. I did not really maintain the project after finishing it, yet to my surprise people liked it and it passed two hundred stars.

Because YouTube changed its data format (as of 2019/12/10), this article walks through how to fetch the relevant information with Golang: the video title, the author's name, and even the download link.

(Update: 2020/06/02) YouTube has changed the way video URLs are obtained again, and I think the details are well worth sharing, so the related updates are summarized in this same article.

Open Source Project: github.com/kkdai/youtube

Slides:

Github: https://github.com/kkdai/youtube

Direct download and use:
- go install github.com/kkdai/youtube/youtubedr

Usage 1: (Save the video as Campaign Diary.mp4)
youtubedr -o "Campaign Diary".mp4 https://www.youtube.com/watch\?v\=XbNghLqsVwU

Usage 2: (Do not specify a file name; the video title is used instead)
youtubedr https://www.youtube.com/watch\?v\=XbNghLqsVwU


Capture YouTube Video Information

Get the YouTube Video ID and Fetch Its Information:

Take Rob Pike's great dotGo 2015 talk, Simplicity is Complicated, as an example. The video is located at:

https://www.youtube.com/watch?v=rFejpH_tAHM

Here is a brief introduction. Each video on YouTube has an ID, and the ID of this video is rFejpH_tAHM.

To get information about a video, call:

https://youtube.com/get_video_info?video_id={YOUR_VIDEO_ID}


For this video, that means requesting https://youtube.com/get_video_info?video_id=rFejpH_tAHM
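As a minimal sketch of the two steps above, the video ID can be read from the `v` query parameter of a watch URL and then placed into the get_video_info address. The function names `extractVideoID` and `infoURL` are hypothetical and are not taken from the project.

```go
package main

import (
	"fmt"
	"net/url"
)

// extractVideoID returns the "v" query parameter of a YouTube watch URL.
// (Hypothetical helper, for illustration only.)
func extractVideoID(watchURL string) (string, error) {
	u, err := url.Parse(watchURL)
	if err != nil {
		return "", err
	}
	id := u.Query().Get("v")
	if id == "" {
		return "", fmt.Errorf("no video ID found in %q", watchURL)
	}
	return id, nil
}

// infoURL builds the get_video_info address for a video ID.
func infoURL(videoID string) string {
	return "https://youtube.com/get_video_info?video_id=" + url.QueryEscape(videoID)
}

func main() {
	id, err := extractVideoID("https://www.youtube.com/watch?v=rFejpH_tAHM")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
	fmt.Println(infoURL(id))
}
```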

Get Information and Related Processing Code:

Next, let's look at how to find the video title and other data. First, take a closer look at the data that was just retrieved.

The response is a URL-encoded query string, so it has to be parsed before any field can be read.
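A URL-encoded query string can be decoded with `url.ParseQuery` from the standard library. The sketch below is not the project's actual code, and the response body in it is a made-up, heavily shortened sample.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseVideoInfo decodes a URL-encoded response body into url.Values.
func parseVideoInfo(body string) (url.Values, error) {
	return url.ParseQuery(body)
}

func main() {
	// Hypothetical sample body; a real response is far longer.
	body := "status=ok&player_response=%7B%22videoDetails%22%3A%7B%22title%22%3A%22Demo%22%7D%7D"

	answer, err := parseVideoInfo(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(answer.Get("status"))
	fmt.Println(answer.Get("player_response"))
}
```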

The first thing to handle is the error message. Many videos are not allowed to be downloaded or shared, and requesting their information returns an error status that must be handled here.

Get Video Title and Video Author Information:

I have to say that most of the data has changed and no longer matches what you can find on the Internet, so I spent a lot of time searching for the information again and then organizing and converting it. After the parseVideoInfo conversion you have a url.Values, stored in a variable named answer.

Because `answer["player_response"]` turns out to hold map-structured (JSON) data, the information can be read out of it once it has been decoded.

A few less common usages appear here, so let me explain briefly.

```go
if err := json.Unmarshal([]byte(playResponse[0]), &personMap); err != nil {
	panic(err)
}
```

This uses json.Unmarshal to convert the JSON string into a map. Once it is a map, you can look up values by key; the key to use here is videoDetails.

Readers may wonder how to learn these data formats and where each piece of data lives. That, too, was worked out through repeated trial and error.

Values read from the map have the default type interface{}, so to output one as a string you can use a type assertion or a direct conversion.
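The lookup and type assertion can be sketched as follows. The `title` and `author` keys inside videoDetails are assumptions based on the fields this article says it retrieves, the sample JSON is made up, and `videoMeta` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoMeta reads the title and author from a player_response JSON string.
// The "title" and "author" keys are assumptions.
func videoMeta(playerResponse string) (string, string, error) {
	var resp map[string]interface{}
	if err := json.Unmarshal([]byte(playerResponse), &resp); err != nil {
		return "", "", err
	}
	// Nested JSON objects decode to map[string]interface{}.
	details, ok := resp["videoDetails"].(map[string]interface{})
	if !ok {
		return "", "", fmt.Errorf("videoDetails is missing or not an object")
	}
	// The two-value form of a type assertion avoids a panic
	// when a key is absent or holds a different type.
	title, _ := details["title"].(string)
	author, _ := details["author"].(string)
	return title, author, nil
}

func main() {
	// Hypothetical sample data.
	sample := `{"videoDetails":{"title":"Example Title","author":"Example Author"}}`
	title, author, err := videoMeta(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(title)
	fmt.Println(author)
}
```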

Download Video:

Next, let's look at how to enumerate all the video formats and how to find the highest-resolution one.

The stream map can be read from player_response; it is JSON data in the form of an array. (As described in the slides.)

The JSON then has to be processed. The recommended approach is:

  • First, paste the raw JSON into the JSON Lint website to turn it into a readable format.
  • Then copy it into an editor and remove any leftover web/CSS tags.
  • Finally, paste the data into JSON-TO-GO to get the Go struct.
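The result of those steps is a Go struct that `json.Unmarshal` can fill directly. The sketch below shows roughly what such a struct might look like and how the highest-resolution entry could be picked. The field set (`itag`, `url`, `cipher`, `qualityLabel`, `height`) is an assumption and much smaller than the real payload, and the sample JSON is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Format is one entry of streamingData.formats (assumed, reduced field set).
type Format struct {
	Itag         int    `json:"itag"`
	URL          string `json:"url"`
	Cipher       string `json:"cipher"`
	QualityLabel string `json:"qualityLabel"`
	Height       int    `json:"height"`
}

// PlayerResponse models only the part of player_response used here.
type PlayerResponse struct {
	StreamingData struct {
		Formats []Format `json:"formats"`
	} `json:"streamingData"`
}

// bestFormat decodes the JSON and returns the format with the greatest height.
func bestFormat(raw string) (Format, error) {
	var resp PlayerResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return Format{}, err
	}
	formats := resp.StreamingData.Formats
	if len(formats) == 0 {
		return Format{}, fmt.Errorf("no formats in streamingData")
	}
	best := formats[0]
	for _, f := range formats[1:] {
		if f.Height > best.Height {
			best = f
		}
	}
	return best, nil
}

func main() {
	// Hypothetical sample data.
	raw := `{"streamingData":{"formats":[
		{"itag":18,"url":"https://example.com/360","qualityLabel":"360p","height":360},
		{"itag":22,"url":"https://example.com/720","qualityLabel":"720p","height":720}]}}`
	best, err := bestFormat(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(best.Itag, best.QualityLabel, best.URL)
}
```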

It seems that the download link can be obtained (but..)

The download URL appears to be available at ["streamingData"]["formats"][0]["url"], but not every video provides this field.

For some videos there is no url field at all; instead there is an extra field named cipher.
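Before deciphering, the cipher value has to be split into its parts. The sketch below assumes the value is itself a URL-encoded query string with an `s` (scrambled signature), `sp` (name of the signature parameter), and `url` field. That layout is an assumption that is not stated in this article, and the sample value and the helper `parseCipher` are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// cipherParts holds the pieces of a cipher value (assumed layout).
type cipherParts struct {
	Signature string // scrambled signature, still to be deciphered
	Param     string // query parameter name the signature is sent under
	URL       string // base stream URL without the signature
}

// parseCipher splits a cipher value into its parts.
func parseCipher(cipher string) (cipherParts, error) {
	values, err := url.ParseQuery(cipher)
	if err != nil {
		return cipherParts{}, err
	}
	parts := cipherParts{
		Signature: values.Get("s"),
		Param:     values.Get("sp"),
		URL:       values.Get("url"),
	}
	if parts.Signature == "" || parts.URL == "" {
		return cipherParts{}, fmt.Errorf("cipher is missing the s or url field")
	}
	return parts, nil
}

func main() {
	// Hypothetical sample value.
	sample := "s=ABCDEF&sp=sig&url=https%3A%2F%2Fexample.com%2Fvideoplayback%3Fid%3D1"
	parts, err := parseCipher(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(parts.Signature, parts.Param, parts.URL)
}
```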

Cipher and Decipher:

At this point I had to go looking for more information. Fortunately, I found an article that explains it. The main points are:

  • cipher is encrypted information; you need to decipher it to get the url.
  • Deciphering is done mainly with three functions:
    • EQ() swaps the first character with the character at a given position.
    • Splice() removes the first n characters and keeps the rest.
    • Reverse() reverses the entire string.
  • Each decipher routine is a combination of these three functions, applied in a different order.
  • The actual combination can only be found by looking in base.js.
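The three operations are small enough to sketch directly in Go. The version below is an illustration, not the project's code: `swap` plays the role of EQ(), and `splice` is modeled on JavaScript's `a.splice(0, n)`, which removes the first n elements in place, so the Go version returns what remains. The argument n is assumed to be non-negative, and the order of calls in `main` is arbitrary.

```go
package main

import "fmt"

// swap exchanges the first character with the one at index n (modulo length).
func swap(s []byte, n int) []byte {
	if len(s) == 0 {
		return s
	}
	pos := n % len(s)
	s[0], s[pos] = s[pos], s[0]
	return s
}

// splice drops the first n characters and returns the rest.
func splice(s []byte, n int) []byte {
	if n > len(s) {
		n = len(s)
	}
	return s[n:]
}

// reverse reverses the slice in place.
func reverse(s []byte) []byte {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
	return s
}

func main() {
	sig := []byte("abcdefgh")
	sig = reverse(sig)   // hgfedcba
	sig = splice(sig, 2) // fedcba
	sig = swap(sig, 3)   // cedfba
	fmt.Println(string(sig))
}
```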

Retrieve base.js and port it to Go code:

So how do you correctly obtain base.js, and how do you obtain the complete decipher data flow?

  • First, request https://www.youtube.com/embed/{VIDEO_ID}?hl=en to get the base.js address.
  • The address looks something like https://www.youtube.com/s/player/e3cd195e/player_ias.vflset/en_US/base.js.
  • Then open base.js and search it for the full body of the decipher routine.
  • Run the cipher through the corresponding functions to decipher it.
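Finding the base.js address in the embed page can be sketched with a regular expression. The pattern below is an assumption derived from the example address above, and it is run against a made-up HTML fragment instead of a live page; `findBaseJS` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// baseJSPattern matches a player script path such as
// /s/player/e3cd195e/player_ias.vflset/en_US/base.js (assumed shape).
var baseJSPattern = regexp.MustCompile(`/s/player/[\w./-]+/base\.js`)

// findBaseJS returns the absolute base.js URL found in the embed page HTML.
func findBaseJS(embedHTML string) (string, error) {
	path := baseJSPattern.FindString(embedHTML)
	if path == "" {
		return "", fmt.Errorf("base.js path not found")
	}
	return "https://www.youtube.com" + path, nil
}

func main() {
	// Hypothetical fragment of an embed page.
	html := `<script src="/s/player/e3cd195e/player_ias.vflset/en_US/base.js"></script>`
	addr, err := findBaseJS(html)
	if err != nil {
		panic(err)
	}
	fmt.Println(addr)
}
```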

The project's code finds the body of the decipher function in base.js and records the sequence of EQ(), splice(), and reverse() calls, together with their arguments (Args).

(This is done mainly with regular expressions; for details, see the official documentation for the regexp package.)
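As a rough sketch of that regular-expression step, the code below pulls the ordered list of calls and their arguments out of a decipher function body. The JavaScript fragment is a made-up sample, the object name `Mt` is hypothetical, and real base.js files may name these functions differently, so the pattern is an assumption and not the project's own.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// step is one decipher call: the function name and its numeric argument.
type step struct {
	Name string
	Arg  int
}

// callPattern matches calls shaped like Obj.name(a,123) (assumed shape).
var callPattern = regexp.MustCompile(`\w+\.(\w+)\(a,(\d+)\)`)

// parseSteps extracts the ordered decipher steps from a function body.
func parseSteps(body string) ([]step, error) {
	var steps []step
	for _, m := range callPattern.FindAllStringSubmatch(body, -1) {
		arg, err := strconv.Atoi(m[2])
		if err != nil {
			return nil, err
		}
		steps = append(steps, step{Name: m[1], Arg: arg})
	}
	if len(steps) == 0 {
		return nil, fmt.Errorf("no decipher steps found")
	}
	return steps, nil
}

func main() {
	// Hypothetical decipher function body.
	body := `a=a.split("");Mt.EQ(a,3);Mt.reverse(a,41);Mt.splice(a,2);return a.join("")`
	steps, err := parseSteps(body)
	if err != nil {
		panic(err)
	}
	for _, s := range steps {
		fmt.Println(s.Name, s.Arg)
	}
}
```

Each recorded step can then be mapped to the matching Go function and applied to the signature in order.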

Then the cipher string is run through the Go functions that these calls map to. And finally, at long last, you get the url. Hooray!

Conclusion:

Obtaining YouTube information this way combines web crawling with some behind-the-scenes conversion. It is a long but very interesting process that teaches not only web crawling but also string searching and processing.

The complete project is open source, and I hope more people will join in and help. For details, see

Github: https://github.com/kkdai/youtube
