Designing Explicit Data Contracts
#blogPostAsWebApp: https://voku.github.io/PHPArrayBox/
Rule of thumb:
If your array-shape PHPDoc needs line breaks, comments, or nested generics — congratulations, you’ve just designed an object. You just didn’t have the courage to admit it.
Introduction: The $data Anti-Pattern Nobody Wants to Own
Every PHP codebase has it.
function handle(array $data): void
{
    // 🤞 good luck
}
It starts harmlessly. A quick prototype. A deadline. “We’ll refactor it later.”
Six months later, $data is a structural landfill:
- undocumented keys
- optional flags with magic meaning
- half the validation duplicated across the codebase
- and a PHPStan array-shape that looks like a legal contract
If your system relies on arrays to model domain concepts, you don’t have flexibility — you have structural debt.
Background: Why PHP Developers Fell in Love with Arrays
Let’s be fair. PHP trained us this way.
Historically:
- No enums
- Weak typing
- No readonly properties
- Poor static analysis
- Frameworks pushing “just pass arrays”
Arrays were:
- easy
- fast to write
- impossible to reason about long-term
That excuse died with PHP 8.x at the latest.
Today we have:
- readonly value objects
- enums
- constructor property promotion
- strict typing
- PHPStan that actually enforces contracts
Continuing to model domain data as arrays in 2026 isn’t pragmatic — it’s lazy.
Core Argument: Arrays Are Not Data Contracts
An array is a container.
A data contract is a promise.
Arrays:
- allow invalid states
- encode meaning implicitly
- rely on comments and discipline
Objects:
- enforce invariants
- encode intent
- fail loudly and early
If a function expects specific data, then that data deserves a name, a type, and rules.
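As a minimal sketch of "enforce invariants" and "fail loudly and early", consider a small value object that validates itself in its constructor. The EmailAddress class is hypothetical and not part of the original post, and it assumes PHP 8.2 or newer for readonly classes.

```php
<?php

declare(strict_types=1);

// Hypothetical value object for illustration; the name is an assumption.
final readonly class EmailAddress
{
    public function __construct(public string $value)
    {
        // Reject invalid input at construction time instead of deep inside business logic.
        if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException(sprintf('Invalid email address: "%s"', $value));
        }
    }
}

// Valid input: the object exists, so every consumer can rely on the invariant.
$email = new EmailAddress('jane@example.com');

// Invalid input: no object is ever created.
try {
    new EmailAddress('not-an-email');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

An array key such as $data['email'] can hold anything. An EmailAddress instance can only exist if the check passed, so downstream code does not need to repeat it.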
The Red Flag: When Array Shapes Go Wild 🚨
This is where things usually go off the rails:
/**
 * @param array{
 *     id: int,
 *     email: non-empty-string,
 *     status: 'active'|'inactive',
 *     profile?: array{
 *         firstName: string,
 *         lastName: string,
 *         age?: int<0, 120>
 *     },
 *     meta?: array<string, scalar>
 * } $user
 */
function processUser(array $user): void
{
}
Let’s be honest:
- This is not flexible
- This is not readable
- This is not reusable
This is an object that is cosplaying as an array.
Hard Rule (Write This on a Sticky Note)
If your array-shape PHPDoc needs more than ~3 keys, you should stop and create an object.
Array-shapes are a transitional tool, not a destination.
The Correct Alternative: Explicit Data Contracts
Step 1: Name the Concept
If the data has meaning, give it a name.
final readonly class UserProfile
{
    public function __construct(
        public string $firstName,
        public string $lastName,
        public ?int $age,
    ) {}
}

enum UserStatus: string
{
    case Active = 'active';
    case Inactive = 'inactive';
}

final readonly class User
{
    public function __construct(
        public int $id,
        public string $email,
        public UserStatus $status,
        public ?UserProfile $profile,
        public array $meta = [],
    ) {}
}
Now compare usage:
function processUser(User $user): void
{
    if ($user->status === UserStatus::Inactive) {
        return;
    }

    // Autocomplete, refactor-safe, readable
}
No guessing. No comments. No defensive isset() soup.
Static Analysis: From Afterthought to Design Tool
PHPStan isn’t just a linter — it’s a design feedback loop.
With explicit objects:
- invalid states become unrepresentable
- missing fields fail at construction time
- refactors are mechanical, not archaeological
This is the difference between:
- hoping your code is correct
- and knowing it can’t be wrong
If PHPStan complains early, your production incidents complain less.
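To make "missing fields fail at construction time" concrete, here is a small sketch. It uses a trimmed-down User (the profile and meta fields are left out for brevity, which is a simplification of the class shown earlier) and assumes PHP 8.2 or newer.

```php
<?php

declare(strict_types=1);

enum UserStatus: string
{
    case Active = 'active';
    case Inactive = 'inactive';
}

// Trimmed version of the User class for illustration only.
final readonly class User
{
    public function __construct(
        public int $id,
        public string $email,
        public UserStatus $status,
    ) {}
}

// Valid construction: every required field is present and correctly typed.
$user = new User(id: 1, email: 'jane@example.com', status: UserStatus::Active);

// A missing field never yields a half-built object: PHP throws at the call site.
// A static analyser such as PHPStan would typically report this line before the code runs.
try {
    new User(id: 2, email: 'john@example.com');
} catch (ArgumentCountError $e) {
    echo 'Rejected at construction: ', $e->getMessage(), PHP_EOL;
}

// An unknown status string cannot sneak in either: tryFrom() returns null for it.
var_dump(UserStatus::tryFrom('banned'));
```

With an array, the same mistake (a missing status key) would only surface wherever the key is first read, possibly far away from the code that built the array.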
Real-World Refactoring Strategy (Without Burning the Team)
“No greenfield refactors” is a fair rule. Here’s how you do this safely:
- Keep Arrays at the Boundaries
  - HTTP requests
  - JSON decoding
  - database rows
- Convert Once
  $user = User::fromArray($requestData);
- Never Pass Raw Arrays Deeper
  After the boundary, objects only.
This creates an anti-corruption layer:
- legacy stays contained
- new code stays clean
- refactoring becomes incremental
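The post calls User::fromArray() without showing its body, so the following is only one possible sketch of such a boundary factory. The validation rules, the error messages, and the trimmed-down User (no profile) are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

enum UserStatus: string
{
    case Active = 'active';
    case Inactive = 'inactive';
}

final readonly class User
{
    public function __construct(
        public int $id,
        public string $email,
        public UserStatus $status,
        public array $meta = [],
    ) {}

    /**
     * Hypothetical factory: converts raw, untrusted boundary data exactly once.
     *
     * @param array<string, mixed> $data
     */
    public static function fromArray(array $data): self
    {
        $id = $data['id'] ?? null;
        $email = $data['email'] ?? null;
        $status = $data['status'] ?? null;
        $meta = $data['meta'] ?? [];

        if (!is_int($id)) {
            throw new InvalidArgumentException('Key "id" must be an integer.');
        }
        if (!is_string($email) || $email === '') {
            throw new InvalidArgumentException('Key "email" must be a non-empty string.');
        }
        if (!is_string($status)) {
            throw new InvalidArgumentException('Key "status" must be a string.');
        }
        if (!is_array($meta)) {
            throw new InvalidArgumentException('Key "meta" must be an array.');
        }

        // UserStatus::from() throws a ValueError for anything other than 'active' or 'inactive'.
        return new self($id, $email, UserStatus::from($status), $meta);
    }
}

// Boundary code: decode once, convert once, then pass only the object onward.
$requestData = json_decode(
    '{"id": 7, "email": "jane@example.com", "status": "active"}',
    true,
    512,
    JSON_THROW_ON_ERROR
);
$user = User::fromArray($requestData);
```

All isset()-style checks live in this one method. Code behind the boundary receives a User and never needs to ask whether a key exists.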
Common Pushback (And Why It’s Wrong)
“This is too verbose”
Correct.
So are seatbelts.
“Arrays are more flexible”
They are less explicit, not more flexible.
“This slows us down”
Only until the second bug you don’t have to debug.
“We trust our developers”
Then give them tools that enforce correctness instead of relying on memory.
When Arrays Are Actually Fine (Yes, Really)
Arrays still have a place:
- serialization formats
- infrastructure boundaries
- simple lists (list)
- performance-critical internal transformations (measured!)
Rule:
Arrays describe structure.
Objects describe meaning.
Mixing the two is how systems rot.
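A short sketch of the "simple lists" case: the array only carries structure (an ordered sequence), while each element carries the meaning. The Tag class and the tagNames() function are hypothetical names chosen for this example.

```php
<?php

declare(strict_types=1);

// Hypothetical value object; the meaning lives here, not in array keys.
final readonly class Tag
{
    public function __construct(public string $name) {}
}

/**
 * A plain list is fine: no keys with hidden meaning, just items of one type.
 *
 * @param list<Tag> $tags
 * @return list<string>
 */
function tagNames(array $tags): array
{
    return array_map(static fn (Tag $tag): string => $tag->name, $tags);
}

$names = tagNames([new Tag('php'), new Tag('phpstan')]);
```

Here the list<Tag> annotation stays short and readable, which is a reasonable sign that the array is being used for structure only.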
Best Practices Summary
❌ array $data is not an API
❌ Massive array-shapes are a code smell
✅ Name your data
✅ Use readonly value objects
✅ Enums over strings
✅ Convert arrays once, at the boundary
✅ Let PHPStan enforce contracts
Conclusion: Arrays Are a Smell, Not a Strategy
If your codebase relies on:
- comments to explain data
- array keys to encode rules
- discipline instead of constraints
You’re not designing software — you’re managing risk manually.
Modern PHP gives us the tools to do better.
Using them is not “overengineering”.
It’s professionalism.
What’s Next?
If this resonated, good — it means you’re ready for the next step:
- Designing immutable domain models
- Enforcing invariants with constructors
- Using PHPStan as an architectural guardrail
- Killing $data once and for all
Stop passing arrays. Start designing systems.
You’ll never want to go back.
Top comments (13)
I let PHP enforce contracts. The problem I have with using PHPStan to enforce contracts is that bugs can still occur when the code runs in production.
I don't want PHPStan to become the TypeScript of PHP.
To be clear, I'm not against PHPStan.
It is the array<string, scalar> and non-empty-string comments that give me the creeps.
In the end PHPStan is an additional tool. It shouldn't be considered part of the language.
Why not? Even the PHP language developers themselves have thought about this more than once, and Hack (Facebook) introduced generics this way many years ago: docs.hhvm.com/hack/reified-generic...
I think you misunderstood the last sentence of my comment.
It is not that I don't want generics in PHP; I don't want generics to be handled only by PHPStan.
I understand, that's why I used that example. What if the PHP Foundation included it in PHP, the same way Hack did? (Hack supports generics for type safety, but by default they are erased at runtime: type erasure.)
Generics would be nice, but there are other ways to get the same result. For example an object that only accepts objects that have string and scalar properties.
It is more verbose than generics, but it has the same reliability.
Yes, for many use cases adding more code is more readable than using generics, but often enough, at some important core points, a generic is very helpful for maintaining reliability and improving the developer experience. See: phpstan.org/blog/generics-by-examples
Sure there are cases where generics have added reliability over the current PHP functionality, like matching the input type with the output type.
At the moment the only thing we can do to be sure of that is by adding tests. But we are doing that anyway, so that is another way to mitigate the lack of generics.
I think the question is more what are the benefits of adding generics at this point?
Wouldn't it be easier to implement the generics functionality that is not covered by the current code practices as standalone features?
While I don't like PHPStan for generics, I seem to be in the minority. For example, the new Symfony configuration threw the builder pattern overboard in favor of an array, and one of the reasons they gave is that IDEs and PHPStan understand array shapes.
I don't think they would have implemented it that way if they sensed there was no community support for it.
I understand generics are a well-known concept, and they could make the transition from another programming language easier. But not every language with generics has the same functionality, so the PHP maintainers would have to deal with people who are not satisfied with the way they added generics.
As you already mentioned, PHPStan features (generics and co.) are only additions; PHP enforces reality. Static analysis enforces possibilities.
Two different ways to avoid "simple" errors, so why shouldn't we use both?
PS: I also like that yield will return Generator<int, User> and, as an extra, we get autocompletion in the IDE. So why not?
Sure if you want to use both that is fine. And your reasons are valid.
It is only using PHPStan when it can be in code that rubs me the wrong way.
And in the case of the Symfony configuration, I'm on board with their reasoning.
I can adjust my point of view on a case to case basis.
I think we have a similar perspective, it was the quote in my first comment that made me think we were further apart in opinions.
Solid post!
Indeed, these collections of mixed types are convenient but prone to various bugs.
PHPStan allows catching errors early, but it can be hard to use for legacy code. It is not uncommon for teams to set a lower level of constraint for old code that will be refactored later.
It becomes even more tedious with frameworks and helpers from third-party tools.
You can generate a PHPStan baseline file, where you ignore all old errors, and just start writing maintainable code.
For frameworks and other tools there are many PHPStan extensions that support you, e.g. by auto-detecting types directly from your SQL queries or other magic stuff, if you really need it.