Initial Scenario
We have a script that processes 200,000 email addresses.
It was expected to run quickly, but a measurement showed it taking 12.8 seconds.
We need to identify what slows it down and speed it up.
Step 1. Capture a Profile
Xdebug config (CLI):
zend_extension=xdebug
xdebug.mode=profile
xdebug.output_dir=/tmp/xdebug
xdebug.start_with_request=yes
Run:
php import_emails.php
A file cachegrind.out.* appears in /tmp/xdebug.
Open it with KCachegrind (Linux) or QCachegrind (macOS/Windows).
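To make the numbers below easier to reproduce, here is a minimal stand-in for import_emails.php. This is only a sketch I am assuming for illustration (the repository linked at the end contains the actual before/after scripts): it generates 200,000 messy rows with duplicates and runs the original, unoptimized pipeline.

```php
<?php
// Hypothetical stand-in for import_emails.php: builds 200,000 rows with
// duplicates and messy formatting, then runs the unoptimized pipeline.
$rows = [];
for ($i = 0; $i < 200000; $i++) {
    $rows[] = ['email' => '  User' . ($i % 50000) . '@Example.com  '];
}

$start = hrtime(true);

$emails = [];
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '' || !filter_var($email, FILTER_VALIDATE_EMAIL)) {
        continue;
    }
    $emails[] = $email;
}
$emails = array_unique($emails); // expected hot spot in the profile

printf("%d unique emails in %.2f s\n", count($emails), (hrtime(true) - $start) / 1e9);
```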
Step 2. Read the Profile
The profile shows:
- array_unique — 42% self time.
- filter_var — 35% inclusive time.
- Lots of unnecessary string allocations caused by repeated transformations.
Self time is time spent in the function's own body; inclusive time also counts everything it calls.
Step 3. Optimize Step‑by‑Step
PHP Email Import Optimization — Before vs After
1) Remove array_unique
Before (expensive: stores all emails, then deduplicates):
$emails = [];
foreach ($rows as $row) {
    $email = $row['email'];
    // ... other logic ...
    $emails[] = $email; // duplicates pile up
}
$emails = array_unique($emails); // heavy on time & memory
After (instant dedup with a set-like map):
$set = [];
foreach ($rows as $row) {
    $email = $row['email'];
    if (!isset($set[$email])) {
        $set[$email] = true;
    }
}
$emails = array_keys($set); // already unique: no separate dedup pass
Metric: 12.8 → 8.5 s, memory: 78 → 56 MB.
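If you want to sanity-check this change in isolation, a rough micro-benchmark (a sketch with made-up sample data; absolute numbers will differ on your machine) looks like this:

```php
<?php
// Compare array_unique() against building an associative "set" keyed by email.
$emails = [];
for ($i = 0; $i < 200000; $i++) {
    $emails[] = 'user' . ($i % 50000) . '@example.com';
}

$t = hrtime(true);
$viaUnique = array_unique($emails);
printf("array_unique: %.3f s (%d unique)\n", (hrtime(true) - $t) / 1e9, count($viaUnique));

$t = hrtime(true);
$set = [];
foreach ($emails as $email) {
    $set[$email] = true; // duplicate keys simply overwrite; no scanning involved
}
$viaSet = array_keys($set);
printf("set via keys: %.3f s (%d unique)\n", (hrtime(true) - $t) / 1e9, count($viaSet));
```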
2) Fewer string transformations
Before (repeated/extra transformations):
foreach ($rows as $row) {
    $email = $row['email'] ?? '';
    $email = trim($email);
    $email = strtolower($email);
    // later again (duplicated work):
    $email = trim($email);
    $email = strtolower($email);
    if ($email === '') continue;
}
After (do it once, in the right order):
foreach ($rows as $row) {
    $email = trim($row['email'] ?? '');
    if ($email === '') continue;
    $email = strtolower($email);
    // proceed with $email
}
Metric: 8.5 → 5.4 s, memory: 56 → 42 MB.
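One way to keep the "normalize once, in the right order" rule from regressing is to move it into a tiny helper. This is only an illustrative sketch; normalizeEmail() is my name for it, not something from the article's repository.

```php
<?php
// Illustrative helper (not part of the article's repo): trim first, reject
// empties early, lowercase last, and do all of it exactly once.
function normalizeEmail(?string $raw): ?string
{
    $email = trim($raw ?? '');
    if ($email === '') {
        return null; // caller should skip this row
    }
    return strtolower($email);
}

// Usage with a couple of sample rows:
$rows = [
    ['email' => '  Alice@Example.com '],
    ['email' => '   '],
];
foreach ($rows as $row) {
    $email = normalizeEmail($row['email'] ?? null);
    if ($email === null) {
        continue;
    }
    echo $email, PHP_EOL; // proceed with the normalized address
}
```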
3) Lightweight email validation (optional)
Before (heavier checks like full RFC/IDN):
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) continue;
    // ...
}
After (simple format check):
foreach ($rows as $row) {
    $email = strtolower(trim($row['email'] ?? ''));
    if ($email === '') continue;
    if (!preg_match('/^[^\s@]+@[^\s@]+\.[^\s@]+$/', $email)) continue;
    // ...
}
Metric: 5.4 → 2.1 s, memory: 42 → 40 MB.
Keep filter_var/IDN validation if you need strict RFC compliance or internationalized domains.
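To see where the simple regex and filter_var disagree before committing to the swap, you can run both over a handful of representative addresses. This is a throwaway sketch; the sample values are made up, and results may vary by PHP version.

```php
<?php
// Compare the lightweight regex against filter_var() on a few samples.
// Run it against addresses from your own data to see where the two disagree.
$samples = [
    'user@example.com',
    'no-at-sign.example.com',
    'üser@exämple.de',        // internationalized parts
    'spaces in@example.com',
];
$pattern = '/^[^\s@]+@[^\s@]+\.[^\s@]+$/';

foreach ($samples as $sample) {
    $regexOk  = (bool) preg_match($pattern, $sample);
    $filterOk = filter_var($sample, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-28s regex: %-5s filter_var: %s\n",
        $sample,
        $regexOk ? 'ok' : 'no',
        $filterOk ? 'ok' : 'no'
    );
}
```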
Results
| Step | Time | Memory | Gain |
|---|---|---|---|
| Before | 12.8 s | 78 MB | — |
| After step 1 | 8.5 s | 56 MB | −4.3 s |
| After step 2 | 5.4 s | 42 MB | −3.1 s |
| After step 3 | 2.1 s | 40 MB | −3.3 s |
Why This Matters
Without profiling we’d waste time optimizing blindly.
With the profile we see exactly where resources are spent and fix only that.
Result: 6× faster and half the memory.
(Exact numbers will vary depending on your machine.)
Five Golden Rules of Profiling
- Measure, don’t speculate.
- Compare apples to apples (same data, same setup).
- Hunt down the biggest time-wasters first.
- Fix one thing at a time.
- Trust metrics, not intuition.
Repository
All the code examples used in this article (before/after scripts) are available here: https://github.com/phpner/php-profiling-example
Conclusion
Profiling gives a clear picture of what slows your code down.
Comments
What made you think the script would be fast?
The first thing that needs to happen is removing the data you don't need; that's what the first two improvements do.
With the set-like code you assume the first entry is the most accurate. What if it is the last entry?
While I think using metrics is a good way to look for performance gains, common programming patterns help you even without profiling.
You’re right, cleaning the data and using common patterns is the first step.
In this article I kept the dataset raw to focus on profiling.
Patterns help in general, but profiling shows where they give the biggest impact.