fooooo-png

Originally published at Medium

Customize News Aggregator with Web Scraping | 2020 Guide

The amount of news and information on the Internet is overwhelming. Just think of how many news feeds are updated every second. What's more, all that news is scattered across different websites and platforms. With limited time, finding and visiting every story you're interested in is an unrealistic task.

So how can people gather all that news in one place without repetitive, tedious browsing? One solution is to customize your own News Aggregator with a web scraping tool (like Octoparse).

If you simply want to browse the information, then using a News Aggregator application is the easiest and most convenient way. However, if you want to capture the business value of the news available on the Internet, then a customized News Aggregator is the better choice.

This article will dive deeply into News Aggregation, introducing its business value and how to build your own News Aggregator with Octoparse.

Part 1: What is News Aggregation?
Part 2: How does web scraping contribute to News Aggregation?
Part 3: How to create a web scraper to aggregate Financial news?

Part 1: What is News Aggregation?

News Aggregation is the process of assembling news from a variety of sources so that people can access it in one place. You may be more familiar with related terms such as news aggregator, news reader, feed reader, or RSS reader. They all work on the same principle: gathering news from multiple sources and storing it in a handy location, either on your own computer or in the cloud.
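The gather-and-store principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NewsItem` fields, the `aggregate` function, and the sample data are all hypothetical and do not describe how any particular aggregator works internally.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NewsItem:
    title: str
    url: str
    source: str
    published: datetime


def aggregate(*feeds):
    """Merge items from several sources, drop duplicate URLs, newest first."""
    seen = {}
    for feed in feeds:
        for item in feed:
            # Keep the first copy of each URL so reposts do not appear twice.
            seen.setdefault(item.url, item)
    return sorted(seen.values(), key=lambda item: item.published, reverse=True)


# Hypothetical sample data standing in for two scraped sources.
site_a = [
    NewsItem("Team wins final", "https://example.com/a/1", "Site A", datetime(2020, 9, 1, 9, 0)),
    NewsItem("Transfer rumours", "https://example.com/a/2", "Site A", datetime(2020, 9, 1, 12, 0)),
]
site_b = [
    NewsItem("Team wins final", "https://example.com/a/1", "Site B", datetime(2020, 9, 1, 9, 30)),
    NewsItem("Injury update", "https://example.com/b/7", "Site B", datetime(2020, 9, 1, 10, 0)),
]

combined = aggregate(site_a, site_b)
for item in combined:
    print(item.published.isoformat(), item.source, item.title)
```

Deduplicating by URL is just one possible choice; a real aggregator might also compare titles or normalize links first.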

Further, we can easily extend News Aggregation to all kinds of Content Aggregation. With a set of content aggregators, we can access the information and data we need anytime we want.

Here are 3 examples, listed in the table below:
[Table: use cases]

Part 2:How does web scraping contribute to News Aggregation?

Web scraping is a technique for extracting data from websites. We can either create a web scraper with a tool (like Octoparse) or build one from scratch in a programming language such as Python, R, or JavaScript. Either way, web scraping is the core of News Aggregation.
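For readers curious about the from-scratch route, here is a minimal sketch in Python that uses only the standard library. It assumes, purely for illustration, that headlines appear as links inside `<h3>` tags; the `HeadlineParser` class and `SAMPLE_HTML` are made up for this example, and a real news site will have its own markup.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (title, link) pairs from <a> tags inside <h3> headings."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_h3 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            # Start capturing the text of a headline link.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.headlines.append((title, self._href))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


# Hypothetical markup; the last link is outside an <h3> and is ignored.
SAMPLE_HTML = """
<ul>
  <li><h3><a href="/news/1">Team wins final</a></h3></li>
  <li><h3><a href="/news/2">Injury <b>update</b></a></h3></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

parser = HeadlineParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.headlines)
```

To run this against a live page, the HTML string could be fetched first (for example with `urllib.request.urlopen`), provided the site's terms allow scraping.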

With a web scraper, you can:

• Collect news information effectively
• Export the scraped data to Excel or directly via API
• Update to the latest news at a set frequency
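As a rough illustration of the export step, the sketch below writes scraped records to CSV (which Excel can open) and to JSON (a common shape for API responses). The `rows` data and the `to_csv` and `to_json` helpers are hypothetical; this is not how Octoparse exports data internally.

```python
import csv
import io
import json

# Hypothetical scraped records.
rows = [
    {"title": "Team wins final", "url": "https://example.com/news/1", "scraped_at": "2020-09-01T09:00:00"},
    {"title": "Injury update", "url": "https://example.com/news/2", "scraped_at": "2020-09-01T10:00:00"},
]


def to_csv(records, fieldnames):
    """Serialize records to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records to JSON text."""
    return json.dumps(records, indent=2)


csv_text = to_csv(rows, ["title", "url", "scraped_at"])
print(csv_text)
print(to_json(rows))
```

To save the CSV to disk instead, the same writer can be pointed at a file opened with `open("news.csv", "w", newline="", encoding="utf-8")`.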

Part 3:How to create a web scraper to aggregate Financial news?

With Octoparse, everyone can create a web scraper to scrape the news sites easily without coding. As long as you finish reading the short guide below, you can do it too!

I'll take Yahoo Sports as an example to show you how to create a sports news aggregator.

Download Octoparse on your computer.

Go through Octoparse Scraping 101 to get familiar with how it works.

1) Start a task
Open Octoparse on your computer. Enter the URL in the box and click “Start”.

Once you click “Start”, the built-in browser will pop up within a second. Just wait a moment for the web page to load. In the meantime, you can find the Tips Panel in the lower corner.

Click the “Auto-detect web page data” option and Octoparse will auto-detect the data available on the current page.

2) Go with auto-detection

After finishing the auto-detection process, Octoparse will show you what data it has detected (highlighted in red). If that's what you need, simply click “Create workflow” on the Tips Panel.
If that's not what you need, you can choose “switch auto-detect results” to scrape other sets of information.

3) Run the task

Now you can see that the workflow has been created automatically with only a few clicks. You can check the settings and make minor revisions (if necessary) on the workflow bar according to your needs.

However, in most cases, you can simply click “Run the task” to get the data directly.

4) Options of running

There are three options in Octoparse to run the task.

Because news changes constantly, you will most likely want to gather updates at regular intervals. When you run the task, you can choose “Schedule task” to set the start time and how often the data is updated.
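The idea behind scheduling, a start time plus a repeat frequency, can be sketched in plain Python. This is only an illustration of the concept, not Octoparse's scheduler: `upcoming_runs`, `run_on_schedule`, and `scrape_once` are hypothetical names, and the scrape itself is a placeholder.

```python
import time
from datetime import datetime, timedelta


def upcoming_runs(start, every, count):
    """Return the first `count` run times, beginning at `start`."""
    return [start + i * every for i in range(count)]


def run_on_schedule(task, every_seconds, repeats, sleep=time.sleep):
    """Call `task` `repeats` times, pausing `every_seconds` between runs."""
    results = []
    for i in range(repeats):
        results.append(task())
        if i < repeats - 1:
            sleep(every_seconds)
    return results


def scrape_once():
    # Placeholder for a real scrape; returns a fixed marker instead.
    return "scraped"


# Four runs, six hours apart, starting at 08:00.
print(upcoming_runs(datetime(2020, 9, 1, 8, 0), timedelta(hours=6), 4))

# A no-op sleep keeps the demo instant; omit the argument to really wait.
print(run_on_schedule(scrape_once, every_seconds=3600, repeats=3, sleep=lambda s: None))
```

Passing `sleep` as a parameter is a small design choice that makes the loop easy to try out without waiting for the real interval.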

With the steps above, you have just built your own sports news aggregator in Octoparse!

If you have any problems creating a news aggregator, please feel free to contact us at support@octoparse.com.

Nowadays, the ability to extract value from data matters more and more for career development. By building your own web scraper, you can get exactly the information you need. Furthermore, news aggregation with Octoparse gives you a head start because it keeps you abreast of the latest news.

Try Octoparse for FREE to start your News aggregation project!

Author: Erika
