Let's create a super-duper simple web scraper with Java! For that we will need Java, Jsoup, 5 minutes, and a good mood!
Add Jsoup
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.17.2</version>
</dependency>
Create a super-duper minimal scraper
In this example, we will print all links (text and URL) from a page:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleScraper {
    public static void main(String[] args) throws Exception {
        String url = "https://example.com"; // change this
        // Fetch the page and parse it into a Document
        Document doc = Jsoup.connect(url).get();
        // Select every <a> element that has an href attribute
        for (Element link : doc.select("a[href]")) {
            // absUrl resolves relative links against the page URL
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}
That's it! You are done! No models, no JSON, and no extra libraries beyond Jsoup!
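Want to try the link loop without hitting a real site? Here is a minimal sketch that parses an HTML string instead of fetching a page. The class name OfflineLinkSketch, the links helper, and the sample markup are all made up for illustration. The base URL passed to Jsoup.parse is what lets absUrl turn relative links into absolute ones.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class OfflineLinkSketch {
    // Hypothetical sample markup, not taken from any real page
    static final String SAMPLE_HTML =
            "<p><a href=\"/about\">About</a> "
          + "<a href=\"https://other.example/x\">Other</a> "
          + "<a>No href</a></p>";

    // Returns "text -> absolute URL" for every <a> that has an href
    static List<String> links(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> out = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            out.add(link.text() + " -> " + link.absUrl("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // About -> https://example.com/about
        // Other -> https://other.example/x
        links(SAMPLE_HTML, "https://example.com/").forEach(System.out::println);
    }
}
```

The third anchor has no href, so the a[href] selector skips it.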
Extra credit: If you want something specific, change the selector.
Examples:
- Article titles: h1, h2, h3
- Product cards: .product
- Price: .price
- Any element by id: #price
Example: print all <h2> titles:
for (Element h : doc.select("h2")) {
    System.out.println(h.text());
}
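The class and id selectors from the list work the same way. Below is a small sketch on made-up markup: the class name SelectorSketch, the helper describeProducts, and the sample HTML (including the id total) are assumptions for illustration, so swap in whatever class names and ids your target page really uses.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {
    // Hypothetical sample markup, not taken from any real shop
    static final String SAMPLE_HTML =
            "<h2>Shop</h2>"
          + "<div class=\"product\"><h3>Mug</h3><span class=\"price\">$5</span></div>"
          + "<div class=\"product\"><h3>Cap</h3><span class=\"price\">$9</span></div>"
          + "<p id=\"total\">$14</p>";

    // For each .product card, read its title and price from inside the card
    static List<String> describeProducts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element card : doc.select(".product")) {
            String name = card.select("h3").text();
            String price = card.select(".price").text();
            out.add(name + ": " + price);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints "Mug: $5" and "Cap: $9"
        describeProducts(SAMPLE_HTML).forEach(System.out::println);
        // Select a single element by id; prints "Total: $14"
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println("Total: " + doc.select("#total").text());
    }
}
```

Calling select on a card rather than on the whole document keeps each price paired with its own product.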
More extra credit: In this code, the request originates from a Java client, which many websites will identify as a bot. As a result, they may block your access. To make your scraper appear as though it is coming from a real user, include this line before .get():
.userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36")
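If you are not sure where that line goes, here is a sketch of the full call chain. The class name UserAgentSketch and the helper browserLikeConnection are made up for illustration; the user agent string is the one from above. Nothing is sent over the network until .get() is called.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UserAgentSketch {
    // The browser-like user agent string from the article
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          + "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36";

    // Builds the connection with the User-Agent header set; no request is sent yet
    static Connection browserLikeConnection(String url) {
        return Jsoup.connect(url).userAgent(BROWSER_UA);
    }

    public static void main(String[] args) throws Exception {
        String url = "https://example.com"; // change this
        // .get() is the step that actually performs the HTTP request
        Document doc = browserLikeConnection(url).get();
        System.out.println(doc.title());
    }
}
```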
Happy coding!
About the Author
Deividas Strole is a Full-Stack Developer based in California, specializing in Java, Spring Boot, React, and AI-driven development. He writes about software engineering, modern full-stack development, and digital marketing strategies.