Marvelous Akpotu

Scraping a Website Using a Symfony Console Command (Clean & Production-Friendly)

Web scraping doesn’t belong in controllers.

  • It’s long-running.
  • It may fail.
  • It’s often scheduled.
  • It’s automation.

That’s exactly why Symfony Console Commands are perfect for it.

Symfony Console Commands allow you to create custom CLI tasks that run inside your Symfony application with full access to its services and dependency injection container. They are ideal for the following:

  • Background jobs
  • Automation
  • Data processing
  • Any long-running operations that shouldn’t live inside controllers.

In this article, we’ll:

  • Scrape country data
  • Parse HTML with DomCrawler
  • Sort results
  • Display a clean CLI table

This is the GitHub repo to follow along: https://github.com/Marvelxy/symfony-web-scraper

We’ll use a free scraping sandbox from www.scrapethissite.com:

https://www.scrapethissite.com/pages/simple/

Required Packages

If you don't already have them:

composer require symfony/http-client
composer require symfony/dom-crawler
composer require symfony/css-selector

Create the Command

php bin/console make:command app:create-countries
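The maker generates a command class in src/Command/. Trimmed to its essentials, the skeleton looks roughly like this (the class name CreateCountriesCommand is my assumption; yours is whatever the maker produced):

```php
<?php

namespace App\Command;

use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

#[AsCommand(
    name: 'app:create-countries',
    description: 'Scrapes country data and prints it as a table',
)]
class CreateCountriesCommand extends Command
{
    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        // Scraping logic goes here.

        return Command::SUCCESS;
    }
}
```

Everything below slots into that execute() method.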

Now, let’s focus only on the parts that matter.

Inject the HttpClient

Instead of manually creating a client, inject it:

use Symfony\Contracts\HttpClient\HttpClientInterface;

public function __construct(
    private HttpClientInterface $client
) {}

Clean, testable, and native Symfony DI.

Fetch the Page

$response = $this->client->request('GET', self::URL);
$html = $response->getContent();

This gives us the raw HTML. Note that getContent() throws an exception for 4xx/5xx responses, which is usually what you want in a command: a failed fetch should fail loudly rather than silently hand you an empty page.
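If you want the command to exit gracefully instead of crashing with a stack trace, you can catch the HttpClient exceptions around the fetch. A sketch (this fragment assumes it lives inside execute(), with $output available):

```php
use Symfony\Contracts\HttpClient\Exception\ExceptionInterface;

try {
    $response = $this->client->request('GET', self::URL);
    $html = $response->getContent(); // throws on 4xx/5xx responses
} catch (ExceptionInterface $e) {
    $output->writeln('<error>Fetch failed: ' . $e->getMessage() . '</error>');

    return Command::FAILURE;
}
```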

Parse with DomCrawler

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler($html);

Now, we can extract the block of each country:

$countryInfo = [];

$crawler->filter('.country')->each(function (Crawler $row) use (&$countryInfo) {
    $countryInfo[] = [
        $row->filter('.country-name')->text(),
        $row->filter('.country-capital')->text(),
        $row->filter('.country-population')->text(),
        $row->filter('.country-area')->text(),
    ];
});

If you’ve used JavaScript’s querySelectorAll(), this will feel familiar.

Sort the countries alphabetically

usort($countryInfo, function ($a, $b) {
    return strcasecmp($a[0], $b[0]);
});
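strcasecmp() gives a case-insensitive alphabetical sort. If you’d rather sort by a numeric column such as population, cast the scraped strings first, since they arrive as text. A minimal sketch with hypothetical sample rows:

```php
<?php

// Hypothetical sample rows in the same shape as the scraped data:
// [name, capital, population, area]
$countryInfo = [
    ['Andorra', 'Andorra la Vella', '84000', '468.0'],
    ['Nigeria', 'Abuja', '154000000', '923768.0'],
    ['Monaco', 'Monaco', '32965', '1.95'],
];

// Sort descending by population; cast because scraped values are strings.
usort($countryInfo, function (array $a, array $b): int {
    return (int) $b[2] <=> (int) $a[2];
});

echo $countryInfo[0][0], "\n"; // prints "Nigeria"
```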

Format Output Like a Pro

Instead of dumping raw arrays, format the output as an aligned table:

printf(
    "%-45s | %-20s | %15s | %15s\n",
    "Country name",
    "Capital",
    "Population",
    "Area (km2)"
);

Then loop through the results and print them.
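The loop mirrors the header’s format string — something like this (one hypothetical row included so the snippet runs on its own):

```php
<?php

// $countryInfo as built earlier; one hypothetical row for illustration.
$countryInfo = [
    ['Nigeria', 'Abuja', '154000000', '923768.0'],
];

// List-destructure each row, then print it with the same column widths
// used for the header so everything lines up.
foreach ($countryInfo as [$name, $capital, $population, $area]) {
    printf("%-45s | %-20s | %15s | %15s\n", $name, $capital, $population, $area);
}
```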

You can also add a small multibyte-safe padding helper to ensure alignment works even with Unicode characters.
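str_pad() and printf padding count bytes, not display columns, so names like "São Tomé" break alignment. A minimal sketch of such a helper (the name mbPad is my own), using mb_strwidth() from the mbstring extension:

```php
<?php

// Pad $value with spaces to $width display columns. str_pad() would
// miscount here, because multi-byte characters occupy more bytes than
// the single column they render in.
function mbPad(string $value, int $width): string
{
    $padding = max(0, $width - mb_strwidth($value, 'UTF-8'));

    return $value . str_repeat(' ', $padding);
}

echo mbPad('São Tomé', 20) . "|\n";
echo mbPad('Réunion', 20) . "|\n";
```

Both lines end with the pipe in the same column, which plain str_pad() would not guarantee.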

The result looks like a professional terminal table instead of debug output.

Run It

php bin/console app:create-countries

You should see output similar to this:

[Screenshot: command output rendered as an aligned table]

Why Use a Symfony Command for Scraping?

Because it gives you:

  • Separation of concerns
  • Cron scheduling capability
  • Clean architecture
  • Reusability
  • Easy refactoring into async jobs

This is how scraping should be structured in real applications, not inside controllers.

Production Tips

Before scraping real websites:

  • Check Terms of Service
  • Respect robots.txt
  • Avoid aggressive request rates
  • Add delays if scraping multiple pages. Example:
sleep(1);

Full Source Code

I’ve published the complete working project on GitHub, including:

  • The full command implementation
  • Proper formatting helpers
  • Setup instructions

GitHub repo: https://github.com/Marvelxy/symfony-web-scraper

Final Thoughts

Symfony + Console Commands + DomCrawler is an underrated combination.

If you’re building:

  • Data aggregation tools
  • Monitoring systems
  • Intelligence platforms
  • Background automation jobs

This pattern scales cleanly and keeps your application architecture solid.

In part 2, we’ll store the results in a database.
