Preface:
I developed this project while working on Project 52, after I came across some material on parsing YouTube video information. With Golang's channels and goroutines, a small tool like this can be built quickly. Once it was finished I didn't really maintain it, yet unexpectedly people liked it, and it passed two hundred stars.
YouTube has since changed its data format (as of 2019/12/10), so this article also discusses how to fetch the relevant information with Golang: the video title, the author's name, and even how to get the download link.
(Update: 2020/06/02) YouTube has changed how video URLs are obtained once again. I think this is well worth sharing, so the relevant updates are summarized in this same article.
Open Source Project: github.com/kkdai/youtube
Slides:
Github: https://github.com/kkdai/youtube
Install and use directly:
- go install github.com/kkdai/youtube/youtubedr
Usage 1: (save the file as Campaign Diary.mp4)
youtubedr -o "Campaign Diary".mp4 https://www.youtube.com/watch\?v\=XbNghLqsVwU
Usage 2: (no file name given; the video title is used as the file name)
youtubedr https://www.youtube.com/watch\?v\=XbNghLqsVwU
Capture YouTube Video Information
Get the YouTube Video ID and Its Information:
Take, for example, Rob Pike's great talk at dotGo 2015, Simplicity is Complicated. The video is at:
https://www.youtube.com/watch?v=rFejpH_tAHM
A brief introduction: every video on YouTube has an ID, and the ID of this video is rFejpH_tAHM (the value of the v parameter).
To get information about a video, you call
https://youtube.com/get_video_info?video_id={YOUR_VIDEO_ID}
So to get the information for this video, you request https://youtube.com/get_video_info?video_id=rFejpH_tAHM
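As a small illustration of the two steps above (read the ID from the watch URL, then build the info request), here is a minimal sketch using only the standard library. The helper name videoInfoURL is hypothetical, and the get_video_info endpoint is the one described in this article as of 2019/2020; it only builds the URL and does not perform the request.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// videoInfoURL (hypothetical helper) reads the v= parameter from a watch URL
// and builds the get_video_info request URL for that video ID.
func videoInfoURL(watchURL string) (string, error) {
	u, err := url.Parse(watchURL)
	if err != nil {
		return "", err
	}
	id := u.Query().Get("v")
	if id == "" {
		return "", errors.New("no video ID (v=) in URL")
	}
	q := url.Values{}
	q.Set("video_id", id)
	return "https://youtube.com/get_video_info?" + q.Encode(), nil
}

func main() {
	infoURL, err := videoInfoURL("https://www.youtube.com/watch?v=rFejpH_tAHM")
	if err != nil {
		panic(err)
	}
	// Prints: https://youtube.com/get_video_info?video_id=rFejpH_tAHM
	fmt.Println(infoURL)
}
```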
Get Information and Related Processing Code:
Next, let's look at how to find the video title and other data, starting with a closer look at the data we just obtained.
Because the response is URL-encoded query-string data, it has to be decoded first.
The first thing to handle is errors. Many videos are not allowed to be downloaded or shared, so requesting their information returns an error status, which has to be handled here.
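A minimal sketch of that decoding and error check, assuming (as in the 2019/2020 format) that a failed lookup carries status=fail together with a reason field. The article's project has a parseVideoInfo step; this version is only an illustration with the same name, not the project's actual code, and the response bodies in main are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseVideoInfo decodes the URL-encoded body returned by get_video_info.
// ASSUMPTION: errors are reported as status=fail plus a reason field.
func parseVideoInfo(body string) (url.Values, error) {
	answer, err := url.ParseQuery(body)
	if err != nil {
		return nil, err
	}
	if answer.Get("status") == "fail" {
		reason := answer.Get("reason")
		if reason == "" {
			reason = "unknown reason"
		}
		return nil, fmt.Errorf("cannot get video info: %s", reason)
	}
	return answer, nil
}

func main() {
	// Hypothetical response bodies, for illustration only.
	if _, err := parseVideoInfo("status=fail&reason=Video+unavailable"); err != nil {
		fmt.Println(err) // cannot get video info: Video unavailable
	}
	answer, err := parseVideoInfo("status=ok&player_response=%7B%7D")
	if err != nil {
		panic(err)
	}
	fmt.Println(answer.Get("player_response")) // {}
}
```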
Get Video Title and Video Author Information:
Admittedly, most of the data format has changed and no longer matches what you can find on the Internet, so I spent a lot of time searching for information again, then organizing and converting it. After the parseVideoInfo conversion, you get a url.Values, stored in the variable answer.
The processing can follow the method below. Because answer["player_response"] turns out to hold JSON data with a map structure, the relevant information can be extracted as follows.
A few of the usages here are less common, so let me explain briefly.
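```go
// answer comes from parseVideoInfo; player_response holds one JSON string.
playResponse := answer["player_response"]
var personMap map[string]interface{}
if err := json.Unmarshal([]byte(playResponse[0]), &personMap); err != nil {
	panic(err)
}
```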
This uses json.Unmarshal to convert the JSON string into a map. Once it is a map, you can look up values by key; the key to use here is videoDetails.
Readers may wonder how I knew this data format and where each piece of data sits. That, too, was found through repeated trial and error.
After you get the map data, every value has the type interface{} by default, so to output it as a string you need a type assertion such as value.(string), or you can format it with fmt.Sprint.
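Putting the map lookup and the type assertions together, a minimal sketch might look like this. It assumes videoDetails contains string fields named title and author; the function name and the sample JSON are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// videoTitleAndAuthor decodes player_response into a generic map and walks
// to videoDetails. ASSUMPTION: videoDetails has "title" and "author" strings.
func videoTitleAndAuthor(playerResponse string) (string, string, error) {
	var personMap map[string]interface{}
	if err := json.Unmarshal([]byte(playerResponse), &personMap); err != nil {
		return "", "", err
	}
	// Nested JSON objects decode to map[string]interface{}, so each level
	// needs its own type assertion.
	details, ok := personMap["videoDetails"].(map[string]interface{})
	if !ok {
		return "", "", errors.New("videoDetails not found in player_response")
	}
	// The comma-ok form yields "" instead of panicking if a field is missing.
	title, _ := details["title"].(string)
	author, _ := details["author"].(string)
	return title, author, nil
}

func main() {
	// Trimmed-down, hypothetical player_response for illustration.
	pr := `{"videoDetails":{"title":"Simplicity is Complicated","author":"Example Author"}}`
	title, author, err := videoTitleAndAuthor(pr)
	if err != nil {
		panic(err)
	}
	fmt.Println(title, "/", author)
}
```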
Download Video:
Next, let's explain how to list all video formats and find the highest-resolution one.
The stream map can be taken from player_response; it is JSON data in the form of an array (as described in the slides).
At this point you need to process the JSON data you obtained. The recommended method is:
- First go to the JSON Lint website to convert the raw JSON into a readable format.
- Then copy and paste it into an editor and remove the leftover web CSS tags.
- Finally, paste the data into JSON-to-Go to generate the Go structs.
It looks like the download link can be obtained (but...)
The download URL seems to be available at ["streamingData"]["formats"][0]["url"], but not every video provides this data.
When I looked up the data for one particular video, there was no url field at all; instead there was an extra field called cipher?!
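To see what that extra field contains, one can decode it as a query string. This is a sketch under the assumption that cipher is itself URL-encoded and carries s (the scrambled signature), sp (the name of the signature query parameter) and url (the download URL still missing its signature); the helper name and the sample value are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// cipherParts (hypothetical helper) splits a format's cipher field.
// ASSUMPTION: cipher is URL-encoded with s, sp and url keys.
func cipherParts(cipher string) (s, sp, rawURL string, err error) {
	q, err := url.ParseQuery(cipher)
	if err != nil {
		return "", "", "", err
	}
	return q.Get("s"), q.Get("sp"), q.Get("url"), nil
}

func main() {
	// Hypothetical cipher value, for illustration only.
	c := "s=ABCDEF&sp=sig&url=https%3A%2F%2Fexample.com%2Fvideoplayback%3Fitag%3D22"
	s, sp, u, err := cipherParts(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(s, sp, u) // ABCDEF sig https://example.com/videoplayback?itag=22
}
```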
Cipher and Decipher:
At this point you need to start digging for information. Fortunately, I found this article, which explains the relevant details. The main points are:
- cipher is encrypted information; you need to decipher it to get the url.
- decipher is mainly carried out through three functions:
  - EQ() swaps the character at a given position with the first character.
  - Splice() removes the first n characters and keeps the rest.
  - Reverse() reverses the entire string.
- Each decipher is composed of these three functions in a different order.
- The exact combination can only be found by reading base.js.
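The three operations are small enough to sketch directly. Below, the swap reduces the position modulo the string length and splice drops the first n characters (mirroring JavaScript's a.splice(0, n)); the Op type and the operation sequence in main are hypothetical, since the real order and arguments have to be read from base.js.

```go
package main

import "fmt"

// Op is one step of a decipher sequence (hypothetical representation).
type Op struct {
	Name string // "EQ", "splice" or "reverse"
	Arg  int
}

// eq swaps the first character with the one at arg % len(b).
func eq(b []byte, arg int) []byte {
	if len(b) == 0 {
		return b
	}
	i := arg % len(b)
	b[0], b[i] = b[i], b[0]
	return b
}

// splice drops the first arg characters, like a.splice(0, arg) in JavaScript.
func splice(b []byte, arg int) []byte {
	if arg > len(b) {
		arg = len(b)
	}
	return b[arg:]
}

// reverse reverses the byte slice in place.
func reverse(b []byte) []byte {
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return b
}

// decipher applies the ops in order to a copy of the scrambled signature.
func decipher(s string, ops []Op) string {
	b := []byte(s)
	for _, op := range ops {
		switch op.Name {
		case "EQ":
			b = eq(b, op.Arg)
		case "splice":
			b = splice(b, op.Arg)
		case "reverse":
			b = reverse(b)
		}
	}
	return string(b)
}

func main() {
	// Hypothetical sequence; the real one comes from base.js.
	ops := []Op{{"reverse", 0}, {"splice", 2}, {"EQ", 3}}
	fmt.Println(decipher("abcdefgh", ops)) // cedfba
}
```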
Retrieve base.js and port it to Go code
So how do you correctly obtain base.js, and how do you get the complete decipher data flow?
- First open https://www.youtube.com/embed/{VIDEO_ID}?hl=en to get the base.js address.
- The address will be something like https://www.youtube.com/s/player/e3cd195e/player_ias.vflset/en_US/base.js.
- Then open the base.js content and search for the full body of the decipher function.
- Finally, process the cipher with the corresponding functions.
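The first two steps can be sketched as fetching the embed page and matching a /s/player/.../base.js path with a regular expression. The pattern below is an assumption based on the example address above (it will not match if the page escapes the slashes), and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// ASSUMPTION: the embed page contains a player path shaped like
// /s/player/<id>/.../base.js, as in the example address above.
var basejsPattern = regexp.MustCompile(`/s/player/[A-Za-z0-9_-]+/[^"]*?base\.js`)

// fetchEmbedPage (hypothetical helper) downloads the embed page HTML.
func fetchEmbedPage(videoID string) (string, error) {
	resp, err := http.Get("https://www.youtube.com/embed/" + videoID + "?hl=en")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// baseJSURL (hypothetical helper) finds the base.js path in the page HTML.
func baseJSURL(embedHTML string) (string, bool) {
	path := basejsPattern.FindString(embedHTML)
	if path == "" {
		return "", false
	}
	return "https://www.youtube.com" + path, true
}

func main() {
	// Hypothetical page snippet; in practice pass fetchEmbedPage's result.
	page := `"jsUrl":"/s/player/e3cd195e/player_ias.vflset/en_US/base.js"`
	if u, ok := baseJSURL(page); ok {
		fmt.Println(u)
	}
}
```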
The project code finds the decipher function body in base.js, records the EQ(), splice() and reverse() calls, and also extracts their Args (arguments).
(This is done mainly with regular expressions; for details, see the official regexp package documentation.)
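As a rough idea of the regular-expression approach, the sketch below classifies each method of a helper object by its body (splice, reverse, or otherwise a swap) and then reads the call order and arguments from the decipher body. The JavaScript snippets in main are hypothetical stand-ins shaped like typical base.js code; real names change from release to release, and the project's own patterns may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// ASSUMPTION: helper methods look like name:function(a,b){...} and calls
// look like Obj.name(a,N), as in the hypothetical snippets in main.
var (
	methodRe = regexp.MustCompile(`(\w+):function\(a(?:,b)?\)\{([^}]*)\}`)
	callRe   = regexp.MustCompile(`\w+\.(\w+)\(a,(\d+)\)`)
)

type Op struct {
	Name string // "EQ", "splice" or "reverse"
	Arg  int
}

// parseOps maps helper methods to operations, then reads the call sequence.
func parseOps(helperObj, decipherBody string) ([]Op, error) {
	kinds := map[string]string{}
	for _, m := range methodRe.FindAllStringSubmatch(helperObj, -1) {
		switch body := m[2]; {
		case strings.Contains(body, "splice"):
			kinds[m[1]] = "splice"
		case strings.Contains(body, "reverse"):
			kinds[m[1]] = "reverse"
		default:
			kinds[m[1]] = "EQ" // the swap with the first character
		}
	}
	var ops []Op
	for _, c := range callRe.FindAllStringSubmatch(decipherBody, -1) {
		kind, ok := kinds[c[1]]
		if !ok {
			return nil, fmt.Errorf("unknown helper method %q", c[1])
		}
		arg, err := strconv.Atoi(c[2])
		if err != nil {
			return nil, err
		}
		ops = append(ops, Op{kind, arg})
	}
	return ops, nil
}

func main() {
	helper := `var Xy={ab:function(a,b){a.splice(0,b)},cd:function(a){a.reverse()},ef:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b%a.length]=c}};`
	body := `a=a.split("");Xy.ef(a,3);Xy.cd(a,45);Xy.ab(a,2);return a.join("")`
	ops, err := parseOps(helper, body)
	if err != nil {
		panic(err)
	}
	fmt.Println(ops) // [{EQ 3} {reverse 45} {splice 2}]
}
```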
Then process the cipher string with the Go functions that correspond to this mapping. And finally, finally, you get the url. (Cue the confetti!)
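The last step, attaching the deciphered signature to the url, might look roughly like this. It assumes sp names the signature query parameter and falls back to "signature" when sp is empty; the helper name is hypothetical, and the plain string reversal in main merely stands in for the real decipher sequence from base.js.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDownloadURL (hypothetical helper) adds the deciphered signature to
// the url from the cipher. ASSUMPTION: sp defaults to "signature".
func buildDownloadURL(rawURL, sp, s string, decipher func(string) string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if sp == "" {
		sp = "signature"
	}
	q := u.Query()
	q.Set(sp, decipher(s))
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.String(), nil
}

func main() {
	// Stand-in decipher: a plain reversal, not the real base.js sequence.
	reverse := func(s string) string {
		r := []rune(s)
		for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
			r[i], r[j] = r[j], r[i]
		}
		return string(r)
	}
	link, err := buildDownloadURL("https://example.com/videoplayback?itag=22", "sig", "FEDCBA", reverse)
	if err != nil {
		panic(err)
	}
	fmt.Println(link) // https://example.com/videoplayback?itag=22&sig=ABCDEF
}
```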
Conclusion:
By crawling web pages and converting the data behind the scenes, we can obtain YouTube video information. It is a long but very interesting process, and along the way you learn not only web-crawling skills but also string searching and processing techniques.
The complete project is open source, and I hope more people will join in and help. For details, see https://github.com/kkdai/youtube
