DEV Community

Deividas Strole

Super Simple Web Scraping in Java (Jsoup)

Let's create a super-duper simple web scraper with Java! For that we will need Java, Jsoup, 5 minutes, and a good mood!

Add Jsoup

<dependency> 
    <groupId>org.jsoup</groupId> 
    <artifactId>jsoup</artifactId> 
    <version>1.17.2</version> 
</dependency>

Create a super-duper minimal scraper

In this example we will print all links (text and URL) from a page:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleScraper {
    public static void main(String[] args) throws Exception {
        String url = "https://example.com"; // change this 

        Document doc = Jsoup.connect(url).get(); 

        for (Element link : doc.select("a[href]")) {            
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}
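
The scraper above calls absUrl("href") rather than attr("href"), which matters when a page uses relative links. Here is a small offline sketch of the difference. It parses a made-up HTML string instead of fetching a page, and the class name AbsUrlDemo and the helper firstLink are hypothetical names used only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlDemo {
    // Hypothetical helper: formats the first link in the HTML as "text -> absolute URL".
    static String firstLink(String html, String baseUri) {
        // The base URI tells Jsoup how to resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("a[href]");
        return link.text() + " -> " + link.absUrl("href");
    }

    public static void main(String[] args) {
        // Made-up snippet with a relative link (an assumption for this demo).
        String html = "<a href=\"/about\">About us</a>";
        System.out.println(firstLink(html, "https://example.com/"));
        // prints: About us -> https://example.com/about
    }
}
```

When you load a page with Jsoup.connect(url).get(), the base URI is taken from the page's URL, so absUrl works without any extra setup.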

That's it! You are done! No models, no JSON, and no extra libraries!

Extra credit: If you want something specific, change the selector.

Examples:

  • Article titles: h1, h2, h3
  • Product cards: .product
  • Price: .price
  • Any element by id: #price

Example: print all <h2> titles:

for (Element h : doc.select("h2")) {
  System.out.println(h.text());
}
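
Selectors can also be nested, which helps when each item on a page is a card with several fields. The following is a rough sketch that combines the .product and .price selectors from the list above. The HTML is invented for the demo, and the class name ProductScraper and the helper productLines are hypothetical; a real site will almost certainly use different markup:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ProductScraper {
    // Hypothetical helper: returns one "name: price" line per product card.
    static List<String> productLines(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element card : doc.select(".product")) {
            // Search inside the card only, not the whole document.
            Element name = card.selectFirst("h2");
            Element price = card.selectFirst(".price");
            // Skip cards that are missing either field.
            if (name != null && price != null) {
                lines.add(name.text() + ": " + price.text());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up markup (an assumption for this demo).
        String html = "<div class=\"product\"><h2>Mug</h2><span class=\"price\">$8</span></div>"
                + "<div class=\"product\"><h2>Pen</h2><span class=\"price\">$2</span></div>";
        productLines(html).forEach(System.out::println);
        // prints:
        // Mug: $8
        // Pen: $2
    }
}
```

To use it on a live page, swap Jsoup.parse(html) for Jsoup.connect(url).get() and adjust the selectors to match the site's HTML.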

More extra credit: In this code, the request comes from a Java client, which many websites will identify as a bot and may block. To make your scraper look like a regular browser, add this call to the chain before .get():

.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36")
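
For context, here is one way the call might fit into the full chain. This is only a sketch: the class name BrowserLikeScraper, the helper browserLike, and the 10-second timeout are assumptions added for illustration, not part of the original scraper:

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BrowserLikeScraper {
    // Same browser string as in the line above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36";

    // Hypothetical helper: builds the connection without sending the request yet.
    static Connection browserLike(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT) // sets the User-Agent request header
                .timeout(10_000);      // assumed value: give up after 10 seconds
    }

    public static void main(String[] args) throws Exception {
        // The request is only sent when get() is called.
        Document doc = browserLike("https://example.com").get();
        System.out.println(doc.title());
    }
}
```

A browser-like User-Agent does not guarantee access, since sites can detect bots in other ways, so check a site's terms and robots.txt before scraping it.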

Happy Coding!!!

About the Author

Deividas Strole is a Full-Stack Developer based in California, specializing in Java, Spring Boot, React, and AI-driven development. He writes about software engineering, modern full-stack development, and digital marketing strategies.
