To scrape links from a webpage using PHP, you can use the file_get_contents function to fetch the HTML content and then parse it with the DOMDocument class. Here's a simple example:
<?php
// Function to scrape links from a given URL
function scrapeLinks($url) {
    // Get the HTML content of the webpage
    $html = file_get_contents($url);

    // Bail out early if the request failed
    if ($html === false) {
        return [];
    }

    // Create a new DOMDocument instance
    $dom = new DOMDocument();

    // Suppress errors due to malformed HTML
    libxml_use_internal_errors(true);

    // Load the HTML content
    $dom->loadHTML($html);

    // Clear the errors
    libxml_clear_errors();

    // Create an array to hold the links
    $links = [];

    // Get all <a> elements
    $anchors = $dom->getElementsByTagName('a');

    // Loop through the anchors and collect the href attributes
    foreach ($anchors as $anchor) {
        $href = $anchor->getAttribute('href');

        // Add the link to the array if it's not empty
        if (!empty($href)) {
            $links[] = $href;
        }
    }

    return $links;
}

// Example usage
$url = 'https://www.example.com'; // Change this to the URL you want to scrape
$links = scrapeLinks($url);

// Print the scraped links
foreach ($links as $link) {
    echo $link . PHP_EOL;
}
?>
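Note that file_get_contents can only fetch remote URLs when allow_url_fopen is enabled, and some sites reject requests that carry no User-Agent header. If you run into either problem, the fetch step can be swapped out for cURL. Here is a minimal sketch; the fetchHtml name and the User-Agent string are just illustrative choices, not part of the original example:

<?php
// Minimal cURL-based fetch as an alternative to file_get_contents.
// fetchHtml() and the User-Agent string are placeholder choices.
function fetchHtml($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; LinkScraper/1.0)');
    $html = curl_exec($ch);
    curl_close($ch);

    // Return null on failure so callers can check for it
    return $html === false ? null : $html;
}
?>

The string it returns can then be passed to DOMDocument exactly as in scrapeLinks() above.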
Top comments (1)
Make this useful by using recursion and letting it crawl every link it finds too.
Of course, from there it gets to be more fun, as you then have to watch memory limits, write the returned data to files, and so on, plus you can parse out useful information like SEO tags and meta/Open Graph data.
crawling is fun, until you get blocked :)
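Picking up on that suggestion, here is a rough sketch of a depth-limited recursive crawler built on the scrapeLinks() function from the article. The crawlLinks name, the depth limit, and the same-host check via parse_url are assumptions for illustration, not part of the original post:

<?php
// Rough sketch of a depth-limited crawler reusing scrapeLinks() from above.
// crawlLinks(), the $maxDepth default, and the same-host filter are illustrative choices.
function crawlLinks($url, $maxDepth = 2, &$visited = []) {
    // Stop when the depth budget is spent or the page was already crawled
    if ($maxDepth < 0 || isset($visited[$url])) {
        return $visited;
    }
    $visited[$url] = true;

    $host = parse_url($url, PHP_URL_HOST);

    foreach (scrapeLinks($url) as $href) {
        // Skip relative links, fragments, mailto:, javascript:, etc.
        // (a fuller crawler would resolve relative links to absolute URLs instead)
        if (!preg_match('#^https?://#i', $href)) {
            continue;
        }
        // Stay on the same host to keep the crawl bounded
        if (parse_url($href, PHP_URL_HOST) !== $host) {
            continue;
        }
        crawlLinks($href, $maxDepth - 1, $visited);
    }

    return $visited;
}

// Example usage: crawl two levels deep and print every URL that was visited
$visited = crawlLinks('https://www.example.com', 2);
foreach (array_keys($visited) as $page) {
    echo $page . PHP_EOL;
}
?>

As the comment notes, a real crawler also needs politeness delays, a cap on total pages, and on-disk storage for what it collects; this sketch keeps everything in memory for brevity.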