Oleksandr Vasyliev

PHP Profiling: From Problem to Solution (with Metrics)

Initial Scenario

We have a script that processes 200,000 email addresses.
It was expected to run fast, but the measurement showed 12.8 seconds.
We need to identify what slows it down and speed it up.
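
A baseline like the 12.8 seconds above can be captured with a few lines of plain PHP. The following is a minimal sketch, not the script from the article: measure() is a hypothetical helper, and the closure is a stand-in for the real import.

```php
<?php
declare(strict_types=1);

/**
 * Runs $job once and reports wall-clock time and peak memory.
 * measure() is a hypothetical helper used only for illustration.
 *
 * @return array{seconds: float, peak_mb: float, result: mixed}
 */
function measure(callable $job): array
{
    $start = hrtime(true);                 // monotonic clock, in nanoseconds
    $result = $job();
    $elapsedNs = hrtime(true) - $start;

    return [
        'seconds' => $elapsedNs / 1e9,
        // Peak memory of the whole process so far, not only of $job.
        'peak_mb' => memory_get_peak_usage(true) / 1048576,
        'result'  => $result,
    ];
}

// Stand-in workload; replace the closure with the real import.
$stats = measure(
    fn (): int => count(array_unique(['a@example.com', 'a@example.com', 'b@example.com']))
);

printf("%.3f s, %.1f MB peak, %d unique\n", $stats['seconds'], $stats['peak_mb'], $stats['result']);
```

Timings are best taken with the profiler switched off, since Xdebug adds its own overhead to every function call.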

Step 1. Capture a Profile

Xdebug 3 config (in the php.ini used by the CLI):

zend_extension=xdebug
xdebug.mode=profile
xdebug.output_dir=/tmp/xdebug
xdebug.start_with_request=yes

Run:

php import_emails.php

A file cachegrind.out.* appears in /tmp/xdebug.
Open it with KCachegrind (Linux) or QCachegrind (macOS/Windows).

Step 2. Read the Profile

The profile shows:

  • array_unique: 42% Self time.
  • filter_var: 35% Inclusive time.
  • Lots of unnecessary string allocations due to repeated transformations.
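
In KCachegrind, Self time is the time spent in a function's own body, while Inclusive time also counts everything that function calls. A toy sketch of the difference, where countValid() is a hypothetical function and not part of the article's script:

```php
<?php
declare(strict_types=1);

/**
 * Counts the addresses that pass filter_var().
 * countValid() is a hypothetical function used only to illustrate the two columns.
 */
function countValid(array $emails): int
{
    $valid = 0;
    foreach ($emails as $email) {
        // The loop and the comparison count toward the Self time of countValid().
        // Time spent inside filter_var() is reported on filter_var's own row and
        // is added only to the Inclusive time of countValid().
        if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
            $valid++;
        }
    }
    return $valid;
}

$valid = countValid(['a@example.com', 'nope']);
printf("%d valid\n", $valid);
```

A function with high Inclusive but low Self time is mostly waiting on its callees, so the callees are where to look next.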

Step 3. Optimize Step‑by‑Step

PHP Email Import Optimization — Before vs After

1) Remove array_unique

Before (expensive: stores all emails, then deduplicates):

$emails = [];

foreach ($rows as $row) {
    $email = $row['email'];
    // ... other logic ...
    $emails[] = $email; // duplicates pile up
}

$emails = array_unique($emails); // heavy on time & memory

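After (deduplicate while reading, using array keys as a set):

$set = [];

foreach ($rows as $row) {
    $email = $row['email'];
    // ... other logic ...
    $set[$email] = true; // array keys are unique, so duplicates collapse
}

$emails = array_keys($set); // unique emails, in first-seen order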

Metric: 12.8 → 8.5 s, memory: 78 → 56 MB.
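
When each row carries more than the email, the same keyed-array idea can hold the whole row, and the code then decides whether the first or the last occurrence wins. A minimal sketch with made-up rows; dedupeKeepFirst() and dedupeKeepLast() are hypothetical helpers:

```php
<?php
declare(strict_types=1);

/** Keeps the first row seen for each email. Hypothetical helper. */
function dedupeKeepFirst(iterable $rows): array
{
    $byEmail = [];
    foreach ($rows as $row) {
        $byEmail[$row['email']] ??= $row; // assign only if the key is not set yet
    }
    return array_values($byEmail);
}

/** Keeps the last row seen for each email: later rows overwrite earlier ones. Hypothetical helper. */
function dedupeKeepLast(iterable $rows): array
{
    $byEmail = [];
    foreach ($rows as $row) {
        $byEmail[$row['email']] = $row;
    }
    return array_values($byEmail);
}

// Made-up sample rows.
$rows = [
    ['email' => 'a@example.com', 'name' => 'old'],
    ['email' => 'b@example.com', 'name' => 'only'],
    ['email' => 'a@example.com', 'name' => 'new'],
];

$first = dedupeKeepFirst($rows);
$last  = dedupeKeepLast($rows);
```

Both variants make a single pass over the rows; which one is correct depends on whether earlier or later records are more trustworthy in your data.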

2) Fewer string transformations

Before (repeated/extra transformations):

foreach ($rows as $row) {
    $email = $row['email'] ?? '';
    $email = trim($email);
    $email = strtolower($email);

    // later again (duplicated work):
    $email = trim($email);
    $email = strtolower($email);
    if ($email === '') continue;
}

After (do it once, in the right order):

foreach ($rows as $row) {
    $email = trim($row['email'] ?? '');
    if ($email === '') continue;
    $email = strtolower($email);
    // proceed with $email
}

Metric: 8.5 → 5.4 s, memory: 56 → 42 MB.

3) Lightweight email validation (optional)

Before (stricter built-in validation with filter_var):

foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;

    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) continue;
    // ...
}

After (simple format check):

foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;

    if (!preg_match('/^[^\s@]+@[^\s@]+\.[^\s@]+$/', $email)) continue;
    // ...
}

Metric: 5.4 → 2.1 s, memory: 42 → 40 MB.

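Keep filter_var if you need stricter syntax checking: the regex only checks the rough shape (something@something.tld) and accepts addresses that filter_var rejects. For internationalized domains, filter_var alone is not enough; the domain typically has to be converted to its ASCII (Punycode) form first, for example with idn_to_ascii from the intl extension.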
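
The two checks in step 3 do not accept the same set of strings, so it is worth seeing where they disagree before switching. A small sketch for that; compareValidators() and SIMPLE_EMAIL_PATTERN are hypothetical names, and the sample addresses are made up:

```php
<?php
declare(strict_types=1);

// The simple pattern from step 3: something@something.something, no whitespace.
const SIMPLE_EMAIL_PATTERN = '/^[^\s@]+@[^\s@]+\.[^\s@]+$/';

/**
 * Reports, per sample, what each check says.
 * compareValidators() is a hypothetical helper.
 *
 * @param list<string> $samples
 * @return array<string, array{regex: bool, filter_var: bool}>
 */
function compareValidators(array $samples): array
{
    $report = [];
    foreach ($samples as $sample) {
        $report[$sample] = [
            'regex'      => preg_match(SIMPLE_EMAIL_PATTERN, $sample) === 1,
            'filter_var' => filter_var($sample, FILTER_VALIDATE_EMAIL) !== false,
        ];
    }
    return $report;
}

$report = compareValidators([
    'user@example.com',  // ordinary address
    'a..b@example.com',  // consecutive dots in the local part
    'user@example',      // no dot in the domain
]);

foreach ($report as $sample => $checks) {
    printf(
        "%-20s regex=%s filter_var=%s\n",
        $sample,
        var_export($checks['regex'], true),
        var_export($checks['filter_var'], true)
    );
}
```

Running this over a sample of real input shows how many rows the cheaper check would let through that the stricter one would drop.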

Results

Step            Time      Memory    Gain
Before          12.8 s    78 MB
After step 1    8.5 s     56 MB     −4.3 s
After step 2    5.4 s     42 MB     −3.1 s
After step 3    2.1 s     40 MB     −3.3 s

Why This Matters

Without profiling we’d waste time optimizing blindly.
With the profile we see exactly where resources are spent and fix only that.
Result: about 6× faster (12.8 s → 2.1 s) and roughly half the memory (78 MB → 40 MB).
The exact numbers will vary with your machine and data.

Five Golden Rules of Profiling

  1. Measure, don’t speculate.
  2. Compare apples to apples (same data, same setup).
  3. Hunt down the biggest time-wasters first.
  4. Fix one thing at a time.
  5. Trust metrics, not intuition.
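
Rules 2 and 4 can be put into practice with a small harness that feeds identical data to two variants and changes only one thing between them. The sketch below compares array_unique with the set-like map on synthetic input; medianMs() is a hypothetical helper, and the input size is arbitrary.

```php
<?php
declare(strict_types=1);

/**
 * Wall-clock time in milliseconds, taking the middle value of $runs sorted
 * timings (the upper middle when $runs is even). Hypothetical helper.
 */
function medianMs(callable $job, int $runs = 5): float
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $job();
        $times[] = (hrtime(true) - $start) / 1e6;
    }
    sort($times);
    return $times[intdiv($runs, 2)];
}

// Synthetic input with many duplicates; built once so both variants see the same data.
$emails = [];
for ($i = 0; $i < 50000; $i++) {
    $emails[] = 'user' . ($i % 10000) . '@example.com';
}

$viaArrayUnique = fn (): array => array_values(array_unique($emails));

$viaSet = function () use ($emails): array {
    $set = [];
    foreach ($emails as $email) {
        $set[$email] = true;
    }
    return array_keys($set);
};

// Confirm both variants agree before comparing their speed.
$sameResult = $viaArrayUnique() === $viaSet();

printf(
    "array_unique: %.2f ms, set: %.2f ms, same result: %s\n",
    medianMs($viaArrayUnique),
    medianMs($viaSet),
    $sameResult ? 'yes' : 'no'
);
```

Taking the median of several runs reduces the effect of one-off noise such as a cold cache or a busy machine.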

Repository

All the code examples used in this article (before/after scripts) are available here: https://github.com/phpner/php-profiling-example

Conclusion

Profiling gives a clear picture of what slows your code.

Top comments (2)

david duymelinck

What made you think the script was fast?
The first thing that needs to happen is to remove data you don't need. Those are the first two improvements.

With the set-like code you assume the first entry is the most accurate. What if it is the last entry?

While I think using metrics is a good way to look for performance gains, common programming patterns help you even without profiling.

Oleksandr Vasyliev

You’re right: cleaning data and using common patterns is the first step.
In this article I kept the dataset raw to focus on profiling.
Patterns help in general, but profiling shows where they give the biggest impact.