DEV Community

How to Do Web Scraping Using C#

Leonardo Gasparini Romão
Working in programming since 2012, ASP.NET developer
Updated on ・2 min read

This article is part of a web scraping series using C#:

How to Do Web Scraping Using C#
Speed Up Web Scraping Using C#

Web scraping is a useful skill for collecting data. We can pull data from the web to use in data analysis, data science, or simply to learn how parts of the web work. So what do we need to get data from the web?

HtmlAgilityPack

HtmlAgilityPack is a library that helps extract information from HTML pages. With it, we can load a page's HTML (downloaded as a string) into a document and pull out just the parts we need, using string methods, LINQ queries or XPath to find specific elements, for example by id or CSS class. Here is an example that gets page information this way.


using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class Program
{
    static void Main()
    {
        Console.WriteLine("Getting page from Lord of the rings...");

        //Download Html from a Url (Client is assumed to be a WebClient, which provides DownloadString):
        var Client = new WebClient();
        var HtmlRequestResult = Client.DownloadString("https://www.rottentomatoes.com/m/the_lord_of_the_rings_the_return_of_the_king");

        //Load HtmlString to AgilityPack Document
        var Document = new HtmlDocument();
        Document.LoadHtml(HtmlRequestResult);
        Console.WriteLine("Getting data from page...");

        //Get movie title, critic score and user score (each is null if the element is not found)
        var MovieTitle = Document.DocumentNode.Descendants("h1").FirstOrDefault()?
            .InnerText.Trim();
        var CriticScore = Document.GetElementbyId("tomato_meter_link")?
            .InnerText.Trim();
        var UserScore = Document.DocumentNode.Descendants("a")
            .FirstOrDefault(x => x.GetAttributeValue("href", "") == "#audience_reviews")?
            .InnerText.Trim();

        //Show the results
        Console.WriteLine(string.Format(" Title:{0} \r\n Critic Score:{1} \r\n User Score:{2}", MovieTitle, CriticScore, UserScore));

        Console.WriteLine("Press any key to close the program...");
        Console.ReadKey();
    }
}

This is a simple example of getting data from Rotten Tomatoes. On the internet it's common to see similar examples written in Python, usually with Beautiful Soup. The strategy here is the same as in Python: download the page, parse it, and query the elements you need. The results are shown below and you can see the original page at this link...

Results

From here you can refine your web scraping strategy, but this is a good starting point if you want to learn how to get data from the web.
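
As one possible next step, the XPath route mentioned earlier can be sketched roughly like this. It is a minimal sketch, assuming the HtmlAgilityPack NuGet package is installed. The sample markup, its ids and classes, the XPathSketch class and the TextAt helper are all hypothetical names made up for illustration; they do not reflect the real structure of the Rotten Tomatoes page.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class XPathSketch
{
    // Hypothetical sample markup standing in for a downloaded page.
    // The title, scores, ids and classes are invented for this sketch.
    public const string SampleHtml = @"
<html><body>
  <h1 class='title'> Sample Movie </h1>
  <a id='critic_link' href='#critic_reviews'><span class='score'>90%</span></a>
  <a href='#audience_reviews'><span class='score'>80%</span></a>
</body></html>";

    // Returns the trimmed inner text of the first node matching the XPath,
    // or null when nothing matches.
    public static string TextAt(HtmlDocument document, string xpath)
    {
        // SelectSingleNode returns null when the expression has no match.
        var node = document.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml(SampleHtml);

        // Select by class, by id, and by attribute value.
        var title = TextAt(document, "//h1[@class='title']");
        var criticScore = TextAt(document, "//a[@id='critic_link']/span[@class='score']");
        var userScore = TextAt(document, "//a[@href='#audience_reviews']/span[@class='score']");

        Console.WriteLine($"Title: {title}");
        Console.WriteLine($"Critic Score: {criticScore}");
        Console.WriteLine($"User Score: {userScore}");
    }
}
```

Compared with chaining Descendants and FirstOrDefault, an XPath expression keeps each selection rule in a single string, which can be easier to adjust when the page layout changes. The null check inside TextAt plays the same role as the ?. operators in the example above.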

Discussion (2)

Florian Rappl

Have you tried AngleSharp?
github.com/AngleSharp/AngleSharp

Leonardo Gasparini Romão Author • Edited

I did not, because HtmlAgilityPack solved all my web scraping problems. But that library is an interesting open source option to consider in the future.