DEV Community

Dimitry Zub


Scrape Google Inline Videos with Python

Contents: intro, imports, what will be scraped, process, code, links, outro.

Intro

This blog post continues the Google web scraping series. Here you'll see how to scrape Inline Videos from Google Search with Python, using the beautifulsoup, requests, and lxml libraries. An alternative API solution is shown as well.

Imports

import requests, lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch

What will be scraped

(Image: the Inline Videos block in Google Search results.)

Process

Selecting the container. The link sits directly on the container element, in its href attribute.

Selecting the CSS selectors for the title, channel name, platform, date, and duration.
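The two steps above can be sketched as a selector map plus one small function. This is only an illustration: the class names are copied from the Code section of this post and are generated by Google, so they may stop matching at any time. The names `CONTAINER_SELECTOR`, `FIELD_SELECTORS`, and `extract_video` are hypothetical helpers, not part of any library.

```python
# Selector map: class names are taken from this post's code and are
# auto-generated by Google, so treat them as a snapshot, not a stable API.
CONTAINER_SELECTOR = ".WpKAof"
FIELD_SELECTORS = {
    "title": ".p5AXld",
    "channel": ".YnLDzf",
    "platform": ".hDeAhf",
    "date": ".rjmdhd span",
    "duration": ".MyDQSe span",
}


def extract_video(container):
    """Return a dict of fields for one inline-video container.

    `container` is any object with `select_one(css)` and `get(attr)`
    methods, such as a BeautifulSoup Tag.
    """
    # The link is stored on the container itself, in its href attribute.
    video = {"link": container.get("href")}
    for field, selector in FIELD_SELECTORS.items():
        node = container.select_one(selector)
        # A missing element yields None instead of raising AttributeError.
        video[field] = node.get_text(strip=True) if node is not None else None
    if video["channel"]:
        # The channel text carries a separator dot ("·"); strip it.
        video["channel"] = video["channel"].replace("·", "").strip()
    return video
```

With a parsed page in `soup`, this would be called as `[extract_video(c) for c in soup.select(CONTAINER_SELECTOR)]`. Returning None for a missing field keeps one changed selector from crashing the whole loop.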

Code

import requests, lxml  # lxml is imported so a missing parser fails early
from bs4 import BeautifulSoup

# Browser-like User-Agent so the request looks like a regular browser visit
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Pass the query via params so requests URL-encodes the spaces
params = {"q": "the last of us 2 reviews"}

response = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, 'lxml')

# Each .WpKAof element is one inline video container
for result in soup.select('.WpKAof'):
    title = result.select_one('.p5AXld').text
    link = result['href']  # the link is an attribute of the container itself
    channel = result.select_one('.YnLDzf').text.replace(' · ', '')
    video_platform = result.select_one('.hDeAhf').text
    date = result.select_one('.rjmdhd span').text
    duration = result.select_one('.MyDQSe span').text
    print(f'{title}\n{link}\n{video_platform}\n{channel}\n{date}\n{duration}\n')

---------------
'''
The Last of Us 2 Review
https://www.youtube.com/watch?v=QwreMeXlFoY
YouTube
IGN
Jun 12, 2020
8:01
'''
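The date and duration come back as plain strings such as "Jun 12, 2020" and "8:01". If you want to sort or filter on them, a minimal sketch of normalizing those two fields could look like this. The function names are hypothetical, and the date format is assumed from the sample output above; any other format simply returns None.

```python
from datetime import datetime


def duration_to_seconds(text):
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        # Each further part is a smaller unit, so scale what we have by 60.
        seconds = seconds * 60 + int(part)
    return seconds


def parse_absolute_date(text):
    """Parse dates like 'Jun 12, 2020'; return None for any other format."""
    try:
        return datetime.strptime(text.strip(), "%b %d, %Y").date()
    except ValueError:
        return None
```

For the sample result above, `duration_to_seconds("8:01")` gives 481 and `parse_absolute_date("Jun 12, 2020")` gives `datetime.date(2020, 6, 12)`.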

Using Google Inline Videos API

SerpApi is a paid API that provides a free trial of 5,000 searches.

The main difference is that you don't have to maintain the parser. If Google changes its layout or selectors, there's nothing for you to debug because that is already handled for the end user, which saves you a task that can be annoying at times.

import json # used for pretty print output
from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "the last of us 2 review",
  "gl": "us",
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

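# Use a separate loop variable so `results` is not overwritten,
# and fall back to an empty list when no inline videos are returned
for video in results.get('inline_videos', []):
    print(json.dumps(video, indent=2, ensure_ascii=False))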

--------------------
'''
{
  "position": 1,
  "title": "The Last of Us 2 Review",
  "link": "https://www.youtube.com/watch?v=QwreMeXlFoY",
  "thumbnail": "https://serpapi.com/searches/60e144a7d737d7a357e568fc/images/b8492386da38ba88cc43d7cb6b9076998ce8d724281cad47c9ee2d1516f61052.jpeg",
  "channel": "IGN",
  "duration": "8:01",
  "platform": "YouTube",
  "date": "Jun 12, 2020"
}
...
'''
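Since each result is a plain dictionary, saving the results is straightforward. Here is a minimal sketch, using only the standard library, that turns the `inline_videos` list into CSV text. The field names are the ones visible in the output above; `FIELDS` and `inline_videos_to_csv` are hypothetical names chosen for this example.

```python
import csv
import io

# Columns to keep; "thumbnail" is left out on purpose.
FIELDS = ["position", "title", "link", "channel", "platform", "date", "duration"]


def inline_videos_to_csv(results):
    """Serialize results['inline_videos'] to CSV text.

    Returns only the header row when the key is absent.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        extrasaction="ignore",  # skip keys that are not listed in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    for video in results.get("inline_videos", []):
        # Keys missing from a result are written as empty cells.
        writer.writerow(video)
    return buffer.getvalue()
```

You could then write the returned text to a file, for example with `open("inline_videos.csv", "w", encoding="utf-8")`.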

Links

Code in the online IDE
Google Inline Videos API

Outro

If you have any questions, if something isn't working correctly, or if you'd like to see something else covered, feel free to drop a comment in the comment section or reach out via Twitter at @serp_api.

Yours,
Dimitry, and the rest of the SerpApi Team.
