Initial Scenario
We have a script that processes 200,000 email addresses.
It was expected to run quickly, but a measurement showed it taking 12.8 seconds.
We need to identify what slows it down and speed it up.
Step 1. Capture a Profile
Xdebug config (CLI):
zend_extension=xdebug
xdebug.mode=profile
xdebug.output_dir=/tmp/xdebug
xdebug.start_with_request=yes
Run:
php import_emails.php
A file cachegrind.out.* appears in /tmp/xdebug.
Open it with KCachegrind (Linux) or QCachegrind (macOS/Windows).
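To make the numbers below easier to reproduce, here is a minimal stand-in for import_emails.php. This is only a sketch I am assuming for illustration (the repository linked at the end contains the actual before/after scripts): it generates 200,000 messy rows with duplicates and runs the original, unoptimized pipeline.

```php
<?php
// Hypothetical stand-in for import_emails.php: builds 200,000 rows with
// duplicates and messy formatting, then runs the unoptimized pipeline.
$rows = [];
for ($i = 0; $i < 200000; $i++) {
    $rows[] = ['email' => '  User' . ($i % 50000) . '@Example.com  '];
}

$start = hrtime(true);

$emails = [];
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '' || !filter_var($email, FILTER_VALIDATE_EMAIL)) {
        continue;
    }
    $emails[] = $email;
}
$emails = array_unique($emails); // expected hot spot in the profile

printf("%d unique emails in %.2f s\n", count($emails), (hrtime(true) - $start) / 1e9);
```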
Step 2. Read the Profile
The profile shows:
- array_unique — 42% self time.
- filter_var — 35% inclusive time.
- Lots of unnecessary string allocations caused by repeated transformations.
Self time is time spent in the function's own body; inclusive time also counts everything it calls.
Step 3. Optimize Step‑by‑Step
PHP Email Import Optimization — Before vs After
1) Remove array_unique
Before (expensive: stores all emails, then deduplicates):
$emails = [];
foreach ($rows as $row) {
    $email = $row['email'];
    // ... other logic ...
    $emails[] = $email; // duplicates pile up
}
$emails = array_unique($emails); // heavy on time & memory
After (instant dedup with a set-like map):
$set = [];
foreach ($rows as $row) {
    $email = $row['email'];
    if (!isset($set[$email])) {
        $set[$email] = true;
    }
}
$emails = array_keys($set); // already unique: no separate dedup pass
Metric: 12.8 → 8.5 s, memory: 78 → 56 MB.
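If you want to sanity-check this change in isolation, a rough micro-benchmark (a sketch with made-up sample data; absolute numbers will differ on your machine) looks like this:

```php
<?php
// Compare array_unique() against building an associative "set" keyed by email.
$emails = [];
for ($i = 0; $i < 200000; $i++) {
    $emails[] = 'user' . ($i % 50000) . '@example.com';
}

$t = hrtime(true);
$viaUnique = array_unique($emails);
printf("array_unique: %.3f s (%d unique)\n", (hrtime(true) - $t) / 1e9, count($viaUnique));

$t = hrtime(true);
$set = [];
foreach ($emails as $email) {
    $set[$email] = true; // duplicate keys simply overwrite; no scanning involved
}
$viaSet = array_keys($set);
printf("set via keys: %.3f s (%d unique)\n", (hrtime(true) - $t) / 1e9, count($viaSet));
```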
2) Fewer string transformations
Before (repeated/extra transformations):
foreach ($rows as $row) {
    $email = $row['email'] ?? '';
    $email = trim($email);
    $email = strtolower($email);
    // later again (duplicated work):
    $email = trim($email);
    $email = strtolower($email);
    if ($email === '') continue;
}
After (do it once, in the right order):
foreach ($rows as $row) {
    $email = trim($row['email'] ?? '');
    if ($email === '') continue;
    $email = strtolower($email);
    // proceed with $email
}
Metric: 8.5 → 5.4 s, memory: 56 → 42 MB.
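One way to keep the "normalize once, in the right order" rule from regressing is to move it into a tiny helper. This is only an illustrative sketch; normalizeEmail() is my name for it, not something from the article's repository.

```php
<?php
// Illustrative helper (not part of the article's repo): trim first, reject
// empties early, lowercase last, and do all of it exactly once.
function normalizeEmail(?string $raw): ?string
{
    $email = trim($raw ?? '');
    if ($email === '') {
        return null; // caller should skip this row
    }
    return strtolower($email);
}

// Usage with a couple of sample rows:
$rows = [
    ['email' => '  Alice@Example.com '],
    ['email' => '   '],
];
foreach ($rows as $row) {
    $email = normalizeEmail($row['email'] ?? null);
    if ($email === null) {
        continue;
    }
    echo $email, PHP_EOL; // proceed with the normalized address
}
```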
3) Lightweight email validation (optional)
Before (heavier checks like full RFC/IDN):
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) continue;
    // ...
}
After (simple format check):
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;
    if (!preg_match('/^[^\s@]+@[^\s@]+\.[^\s@]+$/', $email)) continue;
    // ...
}
Metric: 5.4 → 2.1 s, memory: 42 → 40 MB.
Keep filter_var/IDN validation if you need strict RFC compliance or internationalized domains.
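To see where the simple regex and filter_var disagree before committing to the swap, you can run both over a handful of representative addresses. This is a throwaway sketch; the sample values are made up, and results may vary by PHP version.

```php
<?php
// Compare the lightweight regex against filter_var() on a few samples.
// Run it against addresses from your own data to see where the two disagree.
$samples = [
    'user@example.com',
    'no-at-sign.example.com',
    'üser@exämple.de',        // internationalized parts
    'spaces in@example.com',
];
$pattern = '/^[^\s@]+@[^\s@]+\.[^\s@]+$/';

foreach ($samples as $sample) {
    $regexOk  = (bool) preg_match($pattern, $sample);
    $filterOk = filter_var($sample, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-28s regex: %-5s filter_var: %s\n",
        $sample,
        $regexOk ? 'ok' : 'no',
        $filterOk ? 'ok' : 'no'
    );
}
```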
Results
| Step | Time | Memory | Gain |
|---|---|---|---|
| Before | 12.8 s | 78 MB | — |
| After step 1 | 8.5 s | 56 MB | −4.3 s |
| After step 2 | 5.4 s | 42 MB | −3.1 s |
| After step 3 | 2.1 s | 40 MB | −3.3 s |
Why This Matters
Without profiling we’d waste time optimizing blindly.
With the profile we see exactly where resources are spent and fix only that.
Result: 6× faster and half the memory.
(Exact numbers will vary depending on your machine.)
Five Golden Rules of Profiling
- Measure, don’t speculate.
- Compare apples to apples (same data, same setup).
- Hunt down the biggest time-wasters first.
- Fix one thing at a time.
- Trust metrics, not intuition.
Repository
All the code examples used in this article (before/after scripts) are available here: https://github.com/phpner/php-profiling-example
Conclusion
Profiling gives a clear picture of what slows your code down.
Comments
What made you think the script would be fast?
The first thing that needs to happen is removing the data you don't need; that's what the first two improvements do.
With the set-like code you assume the first entry is the most accurate. What if it is the last entry?
While I think using metrics is a good way to look for performance gains, common programming patterns help you even without profiling.
You’re right, cleaning the data and using common patterns is the first step.
In this article I kept the dataset raw to focus on profiling.
Patterns help in general, but profiling shows where they give the biggest impact.