LangChain on Cloud Run: Getting YouTube Info

#serverless #llm #python #tutorial

title: [Google Cloud/Firebase] How to Get YouTube Information via LangChain on GCP Cloud Run
published: false
date: 2024-10-04 00:00:00 UTC
tags: 
canonical_url: https://www.evanlin.com/langchain-youtube-gcp/
---

![image-20241006103355905](https://www.evanlin.com/images/2022/image-20241006103355905.png)

# Preface

[I mentioned before](https://dev.to/evanlin/geminifirebase-ge-ren-zi-xun-liu-tou-guo-ifttt-yu-langchain-da-zao-ke-ji-shi-shi-line-bot-4246-temp-slug-7050583), I created a "Technology News LINE Bot via IFTTT and LangChain" that allows me to obtain a lot of necessary information through that LINE Bot. However, recently, there are too many YouTube videos with very in-depth technical information. This also made me think about how to get YouTube subtitle information to organize the information.

Although the [LangChain YouTube Transcripts Loader](https://python.langchain.com/docs/integrations/document_loaders/youtube_transcript/) can quickly obtain subtitle information, it will encounter some problems when deployed to Google CloudRun. This article will share what problems were encountered and how to solve these problems. I hope it can help everyone.

# Calling LangChain YouTube Transcripts Loader to Get Video Subtitles

Referring to the LangChain documentation ([YouTube transcripts](https://python.langchain.com/docs/integrations/document_loaders/youtube_transcript/)), you can get YouTube videos with subtitles through this package. Through related analysis, you can quickly understand the video content. Here are some example codes:

%pip install --upgrade --quiet pytube

loader = YoutubeLoader.from_youtube_url(
"https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True
)
loader.load()


# Unable to Get Data on GCP CloudRun

After trying to deploy to Google CloudRun, this is the code used:

def summarized_from_youtube(youtube_url: str) -> str:
"""
Summarize a YouTube video using the YoutubeLoader and Google Generative AI model.
"""
try:
print("Youtube URL: ", youtube_url)
# Load the video content using YoutubeLoader
loader = YoutubeLoader.from_youtube_url(
youtube_url, add_video_info=True, language=["zh-Hant", "zh-Hans", "ja", "en"])
docs = loader.load()

    print("Pages of Docs: ", len(docs))
    # Extract the text content from the loaded documents
    text_content = docs_to_str(docs)
    print("Words: ", len(text_content.split()),
          "First 1000 chars: ", text_content[:1000])

    # Summarize the extracted text
    summary = summarize_text(text_content)

    return summary
except Exception as e:
    # Log the exception if needed
    print(f"An error occurred: {e}")
    return ""


But the results are completely different, because it is impossible to get any data at all.

![image-20241005230731585](https://www.evanlin.com/images/2022/image-20241005230731585.png)

No error messages were obtained, but it is completely **empty, it's empty.**

# Debug 1: Try putting the same code in Colab

You can refer to the following [gist code](https://gist.github.com/kkdai/4d613dcdc86bad995477be4d22a7f907).

![image-20241005230200026](https://www.evanlin.com/images/2022/image-20241005230200026.png)

I originally thought it was a problem with the code, but it seems to be running correctly in Colab. It looks like I have to use [GoogleApiYoutubeLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.youtube.GoogleApiYoutubeLoader.html).

# Using LangChain Youtube Loader on Google CloudRun

Trying to use [GoogleApiYoutubeLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.youtube.GoogleApiYoutubeLoader.html) on CloudRun requires some steps:

- Get the System Account's json file. (Refer to [this article](https://www.evanlin.com/til-heroku-gcp-key/))
  - Put the json content in [Secret Manager](https://cloud.google.com/security/products/secret-manager?hl=zh-TW) to store the data.
  - Get the data through [Secret Manager](https://cloud.google.com/security/products/secret-manager?hl=zh-TW) in the code.
  - Call the YouTube code.
- You can [enable the YouTube API in Google Cloud](https://console.developers.google.com/apis/api/youtube.googleapis.com/overview?project=660825558664) through the following methods.

![image-20241005231500039](https://www.evanlin.com/images/2022/image-20241005231500039.png)

- [Enable Secret Manager API](https://console.cloud.google.com/apis/library/secretmanager.googleapis.com) to manage System Account Key Json file data.

![image-20241006001807126](https://www.evanlin.com/images/2022/image-20241006001807126.png)

## Get data through Secret Manager

- First, you need to set the Project ID in the environment variable `PROJECT_ID`.
- Then, put the data in `youtube_api_credentials` in advance.

def get_secret(secret_id):
logging.debug(f"Fetching secret for: {secret_id}")
client = secretmanager.SecretManagerServiceClient()
name = f"projects/{os.environ['PROJECT_ID']}/secrets/{secret_id}/versions/latest"
response = client.access_secret_version(request={"name": name})
secret_data = response.payload.data.decode("UTF-8")
logging.debug(f"Secret fetched successfully for: {secret_id}, {secret_data[:50]}")
return secret_data


This way, you can securely obtain data through [Secret Manager](https://cloud.google.com/security/products/secret-manager?hl=zh-TW).

## Try writing an example code and deploy it to GCP CloudRun

Code: [https://github.com/kkdai/gcp-test-youtuber](https://github.com/kkdai/gcp-test-youtuber)

def load_youtube_data():
try:
logging.debug("Loading YouTube data")
google_api_client = init_google_api_client()

    # Use a Channel
    youtube_loader_channel = GoogleApiYoutubeLoader(
        google_api_client=google_api_client,
        channel_name="Reducible",
        captions_language="en",
    )

    # Use Youtube Ids
    youtube_loader_ids = GoogleApiYoutubeLoader(
        google_api_client=google_api_client,
        video_ids=["TrdevFK_am4"],
        add_video_info=True,
    )

    # Load data
    logging.debug("Loading data from channel")
    channel_data = youtube_loader_channel.load()
    logging.debug("Loading data from video IDs")
    ids_data = youtube_loader_ids.load()

    logging.debug("Data loaded successfully")
    return jsonify({"channel_data": str(channel_data), "ids_data": str(ids_data)})
except Exception as e:
    logging.error(f"An error occurred: {str(e)}", exc_info=True)
    return jsonify({"error": str(e)}), 500


### Results

Video source: [https://www.youtube.com/watch?v=TrdevFK\_am4](https://www.youtube.com/watch?v=TrdevFK_am4)

![image-20241006004150943](https://www.evanlin.com/images/2022/image-20241006004150943.png)

DEV Community

LangChain on Cloud Run: Getting YouTube Info

Top comments (0)