Dmitriy Zub ☀️

Posted on • Originally published at serpapi.com

Scrape Naver Related Search Results with Python

What will be scraped

image

Prerequisites

Basic knowledge of scraping with CSS selectors

If you haven't scraped with CSS selectors before, there's a dedicated blog post of mine about how to use CSS selectors when web scraping. It covers what they are, their pros and cons, and why they matter from a web-scraping perspective.

CSS selectors declare which part of the markup a style applies to, which makes it possible to extract data from matching tags and attributes.

Separate virtual environment

If you haven't worked with a virtual environment before, have a look at my dedicated blog post, Python virtual environments tutorial using Virtualenv and Poetry, to get familiar.

In short, a virtual environment provides an independent set of installed libraries, optionally with a different Python version, that can coexist with other environments on the same system, which prevents library or Python version conflicts.

📌Note: this is not a strict requirement for this blog post.
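If you want to confirm that a script is running inside a virtual environment, a small check like the one below may help. It is a sketch that assumes an environment created with `venv` or a recent `virtualenv`; the function name is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(in_virtual_environment())
```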

Install libraries:

pip install requests parsel

Reduce the chance of being blocked

There's a chance that a request might be blocked. Have a look at how to reduce the chance of being blocked while web scraping; it covers eleven methods to bypass blocks from most websites.
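Two commonly used techniques are sending a realistic `User-Agent` header and pausing between requests. The sketch below shows both using only the standard library. Whether these match the methods in the linked post is an assumption, and the helper names and the User-Agent list are illustrative.

```python
import random
import time

# Example desktop User-Agent strings; the list is illustrative, not exhaustive.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15",
]


def build_headers() -> dict:
    """Pick a random User-Agent so consecutive requests look less uniform."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def polite_pause(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random interval between requests and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


headers = build_headers()
print(headers["User-Agent"])
```

The resulting `headers` dictionary can be passed to `requests.get(..., headers=headers)`, and `polite_pause()` can be called between consecutive requests.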


Full Code

import requests, json
from parsel import Selector  # https://parsel.readthedocs.io/

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "query": "minecraft",  # search query
    "where": "web"         # web results. works with nexearch as well
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36",
}

html = requests.get("https://search.naver.com/search.naver", params=params, headers=headers, timeout=30)
selector = Selector(html.text)

related_results = []

# https://www.programiz.com/python-programming/methods/built-in/enumerate
for index, related_result in enumerate(selector.css(".related_srch .keyword"), start=1):
    keyword = related_result.css(".tit::text").get().strip()
    link = f'https://search.naver.com/search.naver{related_result.css("a::attr(href)").get()}'

    related_results.append({
        "position": index,    # 1,2,3..
        "title": keyword,
        "link": link
    })


print(json.dumps(related_results, indent=2, ensure_ascii=False))

Output:

[
  {
    "position": 1,
    "title": "마인크래프트",
    "link": "https://search.naver.com/search.naver?where=nexearch&query=%EB%A7%88%EC%9D%B8%ED%81%AC%EB%9E%98%ED%94%84%ED%8A%B8&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 2,
    "title": "minecraft 뜻",
    "link": "https://search.naver.com/search.naver?where=nexearch&query=minecraft+%EB%9C%BB&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 3,
    "title": "craft",
    "link": "https://search.naver.com/search.naver?where=nexearch&query=craft&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 4,
    "title": "mine",
    "link": "https://search.naver.com/search.naver?where=nexearch&query=mine&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 5,
    "title": "mojang",
    "link": "https://search.naver.com/search.naver?where=nexearch&query=mojang&ie=utf8&sm=tab_she&qdt=0"
  }
]

Alternatively, you can achieve the same result using the Naver Related Results API from SerpApi. It is a paid API with a free plan.

The code is almost the same. The difference is that you don't need to build and maintain a parser from scratch, work out how to bypass blocks from Naver or other search engines, find reliable proxy or CAPTCHA-solving providers, or figure out how to scale.

from serpapi import NaverSearch
import os, json

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    "api_key": os.getenv("API_KEY"),  # your serpapi api key
    "engine": "naver",                # search engine to parse results from
    "query": "minecraft",             # search query
    "where": "web"                    # web results
}

search = NaverSearch(params)          # where data extraction happens
results = search.get_dict()           # JSON -> Python dictionary

related_results = []

# iterate over "related_results" and extract position, title and link
for related_result in results["related_results"]:
    related_results.append({
        "position": related_result["position"],
        "title": related_result["title"],
        "link": related_result["link"]
    })

print(json.dumps(related_results, indent=2, ensure_ascii=False))

Output:

[
  {
    "position": 1,
    "title": "마인크래프트",
    "link": "https://search.naver.com?where=nexearch&query=%EB%A7%88%EC%9D%B8%ED%81%AC%EB%9E%98%ED%94%84%ED%8A%B8&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 2,
    "title": "minecraft 뜻",
    "link": "https://search.naver.com?where=nexearch&query=minecraft+%EB%9C%BB&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 3,
    "title": "craft",
    "link": "https://search.naver.com?where=nexearch&query=craft&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 4,
    "title": "mine",
    "link": "https://search.naver.com?where=nexearch&query=mine&ie=utf8&sm=tab_she&qdt=0"
  },
  {
    "position": 5,
    "title": "mojang",
    "link": "https://search.naver.com?where=nexearch&query=mojang&ie=utf8&sm=tab_she&qdt=0"
  }
]


Outro

If you have anything to share, any questions, suggestions, or something that isn't working correctly, reach out via Twitter at @dimitryzub, or @serp_api.

Yours,
Dmitriy, and the rest of the SerpApi Team.


Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
