DEV Community

Leonardo Gasparini Romão


How to web scraping using C#

This article is part of a series on web scraping using C#:

How to web scraping using C#
Speed up web scraping using C#

Web scraping is a useful skill for getting data. We can collect data from the web to use for data analysis or data science, or even to learn how some features of the web work. So what do we need to get data from the web?

HtmlAgilityPack


HtmlAgilityPack is a library that helps extract information from HTML pages. With it, we can download a page as a string, load that string into a document, and then pick out the parts we need with LINQ queries or XPath expressions that target specific elements, ids or CSS classes. Here is an example that uses it to get information from a page.


using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

Console.WriteLine("Getting page from Lord of the rings...");

//Download Html from a Url
//(Client is assumed to be a WebClient, since DownloadString is a WebClient method)
using var Client = new WebClient();
var HtmlRequestResult = Client.DownloadString("https://www.rottentomatoes.com/m/the_lord_of_the_rings_the_return_of_the_king");

//Load HtmlString to AgilityPack Document
var Document = new HtmlDocument();
Document.LoadHtml(HtmlRequestResult);
Console.WriteLine("Getting data from page...");

//Get movie title, critic score and user score
//(each value stays null if the element is not found on the page)
var MovieTitle = Document.DocumentNode.Descendants("h1").FirstOrDefault()
    ?.InnerText.Trim();
var CriticScore = Document.GetElementbyId("tomato_meter_link")
    ?.InnerText.Trim();
var UserScore = Document.DocumentNode.Descendants("a")
    .FirstOrDefault(x => x.GetAttributeValue("href", "") == "#audience_reviews")
    ?.InnerText.Trim();

//Show the results
Console.WriteLine(string.Format(" Title:{0} \r\n Critic Score:{1} \r\n User Score:{2}", MovieTitle, CriticScore, UserScore));

Console.WriteLine("Press any key to close the program...");
Console.ReadKey();


This is a simple example that gets data from Rotten Tomatoes. Most of the examples you see on the internet are written in Python and use Beautiful Soup to get the data. Here, the strategy is the same as in Python. The results are shown below, and you can see the original page in this link...

Results

So, after this you can improve your web scraping strategy, but this is a good starting point if you want to learn how to get data from the web.
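
The example above finds its nodes with LINQ (Descendants and FirstOrDefault), but XPath was mentioned earlier as another option. The sketch below shows how the same kind of lookup might look with HtmlAgilityPack's SelectSingleNode. The HTML fragment, its class name and its scores are made up for illustration and do not reflect the real Rotten Tomatoes markup, so the expressions would need adjusting for a real page.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical HTML fragment standing in for a downloaded page.
var html = @"<html><body>
  <h1 class='title'> Example Movie </h1>
  <a id='critic_link' href='#critic_reviews'>93%</a>
  <a href='#audience_reviews'>86%</a>
</body></html>";

var document = new HtmlDocument();
document.LoadHtml(html);

// SelectSingleNode returns null when nothing matches,
// so the ?. operator keeps a missing element from throwing.
var title = document.DocumentNode
    .SelectSingleNode("//h1[@class='title']")
    ?.InnerText.Trim();
var userScore = document.DocumentNode
    .SelectSingleNode("//a[@href='#audience_reviews']")
    ?.InnerText.Trim();

Console.WriteLine($"Title: {title}");
Console.WriteLine($"User Score: {userScore}");
```

Whether to use LINQ or XPath is mostly a matter of taste. XPath tends to be shorter when the match depends on several attributes at once, while LINQ keeps everything in ordinary C#.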


Top comments (2)

Florian Rappl

Have you tried AngleSharp?
github.com/AngleSharp/AngleSharp

Leonardo Gasparini Romão • Edited

I did not, because HtmlAgilityPack solved all my web scraping problems. But this library is an interesting open source option to consider in the future.
