Super Simple Web Scraping in Java (Jsoup)

#java #webscraping #intranet #webdev

Let's create a super-duper simple web scraper with Java! For that we will need Java, Jsoup, 5 minutes, and a good mood!

Add Jsoup

If you use Maven, add this dependency to your pom.xml:

<dependency> 
    <groupId>org.jsoup</groupId> 
    <artifactId>jsoup</artifactId> 
    <version>1.17.2</version> 
</dependency>

Create a super-duper minimal scraper

In this example we will print all links (text and URL) from a page:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleScraper {
    public static void main(String[] args) throws Exception {
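        String url = "https://example.com"; // change this

        // Fetch the page and parse it into a Document
        Document doc = Jsoup.connect(url).get();

        // Select every <a> element that has an href attribute
        for (Element link : doc.select("a[href]")) {
            // absUrl turns relative links into absolute URLs
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }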
    }
}

That's it! You are done! No models, no JSON, and no extra libraries!

Extra credit: If you want something specific, change the selector.

Examples:

Article titles: h1, h2, h3
Product cards: .product
Price: .price
Any element by id: #price

Example: print all <h2> titles:

for (Element h : doc.select("h2")) {
  System.out.println(h.text());
}
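If you want to try the selectors from the list without hitting a real site, here is a minimal sketch that parses a small HTML string instead. The markup, the class name SelectorDemo, and the helper texts are made up for this illustration; real pages will use their own tags, classes, and ids.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorDemo {
    // Hypothetical markup, invented for this sketch
    public static final String HTML =
          "<h1>Shop</h1>"
        + "<div class=\"product\"><h2>Lamp</h2> <span class=\"price\">$10</span></div>"
        + "<div class=\"product\"><h2>Desk</h2> <span class=\"price\">$99</span></div>"
        + "<p id=\"price\">Prices include tax</p>";

    // Collects the text of every element that matches the CSS selector
    public static List<String> texts(Document doc, String selector) {
        List<String> out = new ArrayList<>();
        for (Element e : doc.select(selector)) {
            out.add(e.text());
        }
        return out;
    }

    public static void main(String[] args) {
        // Parse the string directly, no network request involved
        Document doc = Jsoup.parse(HTML);

        System.out.println("Titles:   " + texts(doc, "h1, h2, h3"));
        System.out.println("Products: " + texts(doc, ".product"));
        System.out.println("Prices:   " + texts(doc, ".price"));
        System.out.println("By id:    " + texts(doc, "#price"));
    }
}
```

Once a selector returns what you expect here, you can use the same selector on a Document fetched with Jsoup.connect(url).get().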

More extra credit: In this code, the request comes from a Java client, which many websites will identify as a bot and may block. To make your scraper look like a regular browser, add this call to the chain before .get():

.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36")
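To show where that call fits, here is a sketch of the same link scraper with the user agent added to the chain. The class name BrowserLikeScraper and the helper prepare are assumptions made for this example, and the 10-second timeout is an optional extra that is not part of the original scraper. A browser-like user agent is no guarantee that a site will let you in.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BrowserLikeScraper {
    // The browser user-agent string from the article
    public static final String USER_AGENT =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36";

    // Builds the request but does not send it yet
    public static Connection prepare(String url) {
        return Jsoup.connect(url)
                .userAgent(USER_AGENT)
                .timeout(10_000); // optional, in milliseconds; value chosen for this sketch
    }

    public static void main(String[] args) throws Exception {
        String url = "https://example.com"; // change this

        // .get() sends the request and parses the response
        Document doc = prepare(url).get();

        for (Element link : doc.select("a[href]")) {
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}
```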

Happy Coding!!!

About the Author

Deividas Strole is a Full-Stack Developer based in California, specializing in Java, Spring Boot, React, and AI-driven development. He writes about software engineering, modern full-stack development, and digital marketing strategies.
