DEV Community

Dmitriy Zub ☀️


Scrape Google Inline Videos with Python

Contents: intro, imports, what will be scraped, process, code, links, outro.

Intro

This blog post is a continuation of the Google web scraping series. Here you'll see how to scrape Inline Videos from Google Search with Python using the beautifulsoup, requests, and lxml libraries. An alternative API solution is shown as well.

Imports

import requests, lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch

What will be scraped

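The inline videos block of a Google search results page: for each video, the title, link, channel name, platform, date, and duration.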

Process

Select the container (.WpKAof). The link lies directly on the container, in its href attribute.

Select the title, channel name, platform, date, and duration with their CSS selectors (.p5AXld, .YnLDzf, .hDeAhf, .rjmdhd span, and .MyDQSe span).
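To see how these selectors fit together without sending a request to Google, here is a minimal sketch that runs them against a small, hand-written HTML snippet. The snippet is hypothetical: it only reuses the class names from the scraper, while the real markup is much larger and its class names change over time. The hypothetical text_or_none() helper returns None instead of raising AttributeError when a selector matches nothing (for example, a result without a duration). It uses Python's built-in "html.parser", so only beautifulsoup4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: it only mirrors the class names used by the scraper.
# Real Google HTML is much larger and its class names change over time.
SAMPLE_HTML = """
<a class="WpKAof" href="https://www.youtube.com/watch?v=QwreMeXlFoY">
  <div class="p5AXld">The Last of Us 2 Review</div>
  <span class="YnLDzf">IGN · </span>
  <span class="hDeAhf">YouTube</span>
  <div class="rjmdhd"><span>Jun 12, 2020</span></div>
  <div class="MyDQSe"><span>8:01</span></div>
</a>
"""


def text_or_none(node, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = node.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


def parse_inline_videos(html):
    """Extract one dict per inline video container (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    videos = []
    for container in soup.select(".WpKAof"):
        channel = text_or_none(container, ".YnLDzf")
        videos.append({
            "title": text_or_none(container, ".p5AXld"),
            "link": container.get("href"),  # the link sits on the container itself
            # remove the "·" separator around the channel name
            "channel": channel.replace("·", "").strip() if channel else None,
            "platform": text_or_none(container, ".hDeAhf"),
            "date": text_or_none(container, ".rjmdhd span"),
            "duration": text_or_none(container, ".MyDQSe span"),
        })
    return videos


for video in parse_inline_videos(SAMPLE_HTML):
    print(video)
```

Returning None for missing fields keeps one incomplete result from stopping the whole loop, which the direct .text calls in the scraper would do.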

Code

import requests, lxml
from bs4 import BeautifulSoup

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Let requests URL-encode the query instead of putting raw spaces in the URL
params = {
    "q": "the last of us 2 reviews",
    "hl": "en",  # interface language
    "gl": "us"   # country to search from
}

response = requests.get("https://www.google.com/search", params=params, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')

# Each inline video is a container whose href attribute holds the video link
for result in soup.select('.WpKAof'):
    title = result.select_one('.p5AXld').text
    link = result['href']
    channel = result.select_one('.YnLDzf').text.replace(' · ', '')
    video_platform = result.select_one('.hDeAhf').text
    date = result.select_one('.rjmdhd span').text
    duration = result.select_one('.MyDQSe span').text
    print(f'{title}\n{link}\n{video_platform}\n{channel}\n{date}\n{duration}\n')

---------------
'''
The Last of Us 2 Review
https://www.youtube.com/watch?v=QwreMeXlFoY
YouTube
IGN
Jun 12, 2020
8:01
'''
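Pasting the raw query, spaces included, straight into the URL leaves the encoding to requests' URL clean-up. Passing it as a dict through the params argument makes the encoding explicit. The standard library's urlencode() produces the same style of query string that requests builds from a params dict; the hl and gl values here are the same language and country hints used in the API example.

```python
from urllib.parse import urlencode

# Same query as the scraper; "hl" and "gl" are optional language/country hints
params = {"q": "the last of us 2 reviews", "hl": "en", "gl": "us"}

# urlencode() uses quote_plus() by default, so spaces become "+"
query_string = urlencode(params)
url = f"https://www.google.com/search?{query_string}"
print(url)
# → https://www.google.com/search?q=the+last+of+us+2+reviews&hl=en&gl=us
```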

Using Google Inline Videos API

SerpApi is a paid API that provides a free trial of 5,000 searches.

The main difference is that you don't have to maintain the parser. If Google changes its layout or selectors, there's no need to debug the scraper yourself, because that's already handled for the end user, and debugging it yourself can be annoying at times.


import json # used for pretty print output
from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "the last of us 2 review",
  "gl": "us",
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

# Use a separate loop variable so the results dict isn't overwritten
for video in results['inline_videos']:
    print(json.dumps(video, indent=2, ensure_ascii=False))

--------------------
'''
{
  "position": 1,
  "title": "The Last of Us 2 Review",
  "link": "https://www.youtube.com/watch?v=QwreMeXlFoY",
  "thumbnail": "https://serpapi.com/searches/60e144a7d737d7a357e568fc/images/b8492386da38ba88cc43d7cb6b9076998ce8d724281cad47c9ee2d1516f61052.jpeg",
  "channel": "IGN",
  "duration": "8:01",
  "platform": "YouTube",
  "date": "Jun 12, 2020"
}
...
'''
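Indexing results['inline_videos'] directly raises a KeyError if a search returns no inline videos block. Treat it as an assumption to check against your own responses that SerpApi omits the key in that case. Below is a minimal, defensive sketch using dict.get(); the summarize_inline_videos() name is hypothetical, and the sample dict copies a subset of the fields from the output above.

```python
def summarize_inline_videos(results):
    """Return one 'title | channel | platform | duration' line per inline video.

    `results` is assumed to be the dict returned by GoogleSearch(params).get_dict().
    If the 'inline_videos' key is missing, an empty list is returned instead of
    raising KeyError.
    """
    lines = []
    for video in results.get("inline_videos", []):
        lines.append(
            f'{video.get("title")} | {video.get("channel")} | '
            f'{video.get("platform")} | {video.get("duration")}'
        )
    return lines


# Sample shaped like the API output above (a subset of its fields)
sample_results = {
    "inline_videos": [
        {
            "position": 1,
            "title": "The Last of Us 2 Review",
            "link": "https://www.youtube.com/watch?v=QwreMeXlFoY",
            "channel": "IGN",
            "duration": "8:01",
            "platform": "YouTube",
            "date": "Jun 12, 2020",
        }
    ]
}

print(summarize_inline_videos(sample_results))
# → ['The Last of Us 2 Review | IGN | YouTube | 8:01']
print(summarize_inline_videos({}))
# → []
```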

Links

Code in the online IDE
Google Inline Videos API

Outro

If you have any questions, something isn't working correctly, or you want to suggest something else, feel free to leave a comment in the comment section or reach out on Twitter at @serp_api.

Yours,
Dimitry, and the rest of the SerpApi Team.
