Antonio Turdo

Originally published at Medium

Generating JSON Schema from PHP DTOs with Symfony Serializer awareness

A PHP project that serializes and deserializes DTOs often needs a JSON Schema for them — for an LLM's structured output, for API documentation, or to validate incoming payloads.

Writing it by hand works until the code changes. With Symfony Serializer, the JSON shape can shift without the PHP types changing at all — a new serialization group, a #[SerializedName], a discriminator — and the hand-written schema no longer matches what the serializer produces.

json-schema-extractor is a PHP library that generates JSON Schema from DTOs by reading the metadata they already carry — native types, PHPDoc, and serializer configuration — so the schema stays in sync with the code. It supports plain json_encode/JsonSerializable and Symfony Serializer; this article focuses on the latter.


The problem in concrete terms

Take a simple DTO:

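use Symfony\Component\Serializer\Attribute\Groups;
use Symfony\Component\Serializer\Attribute\SerializedName;

final class OrderSummary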
{
    public function __construct(
        #[Groups(['public'])]
        #[SerializedName('order_id')]
        public readonly string $id,

        #[Groups(['public'])]
        public readonly Money $total,

        #[Groups(['internal'])]
        public readonly string $internalNote,
    ) {}
}

When Symfony Serializer renders this with the public group, the JSON output is:

{ 
  "order_id": "ORD-1042", 
  "total": { 
    "amount": 4999, 
    "currency": "EUR" 
  } 
}

internalNote is absent. id is renamed. total is a nested object. A hand-written schema that matches this needs to know about groups, serialized names, and how Money is normalized. If any of those change, you have to update the schema as well as the DTO.
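
To make the duplication concrete, here is what such a hand-written schema might look like, sketched as a PHP array. It is an illustration written by hand, not output from any library; the shape of Money (integer amount, string currency) is inferred from the JSON above, and the required lists are an assumption.

```php
<?php

declare(strict_types=1);

// Hand-written schema for the "public" view of OrderSummary.
// Assumption: Money normalizes to {amount: int, currency: string}, as in the JSON above.
$handWrittenSchema = [
    'type' => 'object',
    'properties' => [
        // "id" is exposed as "order_id" because of #[SerializedName].
        'order_id' => ['type' => 'string'],
        'total' => [
            'type' => 'object',
            'properties' => [
                'amount' => ['type' => 'integer'],
                'currency' => ['type' => 'string'],
            ],
            'required' => ['amount', 'currency'],
        ],
        // "internalNote" is left out: it only belongs to the "internal" group.
    ],
    'required' => ['order_id', 'total'],
];

echo json_encode($handWrittenSchema, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```

Every comment in that array restates a decision already made on the DTO, which is exactly the part that drifts when the attributes change.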


How the library works

The extraction pipeline has four phases, each with a clear responsibility:

1. Discover. ReflectionDiscoverer reads the class with reflection: property names, native PHP types, visibility. No dependencies required.

2. Enrich. One or more enrichers augment the PHP model with metadata reflection alone can't see. PhpStanEnricher and PhpDocumentorEnricher both read PHPDoc (@var list<string>, @var array{name: string, age: int}, generics, descriptions, deprecation), via different parsers. SymfonyValidationEnricher maps Symfony Validator constraints (NotBlank, Length, Range…) to their JSON Schema equivalents; this is appropriate when the application actually validates the objects against those constraints, so that the schema's guarantees hold for the real data. Enrichers are optional and composable.

3. Project. A serialization strategy converts the enriched PHP model into the serialized shape: the JSON-facing view of the class. The Symfony Serializer support is implemented in this phase.

4. Map. StandardJsonSchemaMapper folds the projected shape into a JSON Schema document, handling $ref, reusable definitions, dialect (Draft-7 or 2020-12), and union semantics.

The four phases are wired together by SchemaExtractor, the entry point you call to produce a schema.

Each phase is defined by an interface — DiscovererInterface, EnricherInterface, SerializationStrategyInterface, JsonSchemaMapperInterface — so every component can be swapped or extended. You can plug in your own discoverer, enricher, strategy, or mapper without touching the rest.
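
To give a feel for what the discovery phase has to work with, here is a minimal sketch built only on PHP's reflection API. It is not the library's ReflectionDiscoverer: the Point class and the discoverProperties() function are hypothetical names, and union or intersection types are simply skipped.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO used only for this sketch.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly ?string $label = null,
    ) {}
}

/**
 * Toy discovery step: list public properties with their native type and nullability.
 *
 * @param class-string $class
 * @return array<string, array{type: string, nullable: bool}>
 */
function discoverProperties(string $class): array
{
    $found = [];
    $reflection = new ReflectionClass($class);

    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $type = $property->getType();
        if (!$type instanceof ReflectionNamedType) {
            continue; // untyped, union and intersection types are out of scope here
        }

        $found[$property->getName()] = [
            'type' => $type->getName(),
            'nullable' => $type->allowsNull(),
        ];
    }

    return $found;
}

$properties = discoverProperties(Point::class);
```

Reflection alone reports that x is an int and label a nullable string, but nothing about list element types or array shapes, which is the gap the enrichers fill from PHPDoc.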


Installation

Install the library, then add only the optional packages your chosen components need:

composer require zeusi/json-schema-extractor

# optional, depending on what you enable:
composer require phpstan/phpdoc-parser              # for PhpStanEnricher
composer require phpdocumentor/reflection-docblock  # for PhpDocumentorEnricher
composer require symfony/validator                  # for SymfonyValidationEnricher
composer require symfony/serializer                 # for SymfonySerializerStrategy

The core package itself has no mandatory dependencies.


A minimal extractor

use Zeusi\JsonSchemaExtractor\Discoverer\ReflectionDiscoverer;
use Zeusi\JsonSchemaExtractor\Enricher\PhpStanEnricher;
use Zeusi\JsonSchemaExtractor\Mapper\StandardJsonSchemaMapper;
use Zeusi\JsonSchemaExtractor\SchemaExtractor;
use Zeusi\JsonSchemaExtractor\Serialization\JsonEncodeSerializationStrategy;

$extractor = new SchemaExtractor(
    new ReflectionDiscoverer(),
    [new PhpStanEnricher()],
    new JsonEncodeSerializationStrategy(),
    new StandardJsonSchemaMapper(),
);

$schema = $extractor->extract(OrderSummary::class);

This gives you a schema based on native PHP types and PHPDoc. JsonEncodeSerializationStrategy is the right choice when your JSON is produced by json_encode() or JsonSerializable (in that case, the shape is read from jsonSerialize()'s return type and PHPDoc, since its body is opaque to static analysis).

For Symfony Serializer, you swap the strategy.


Adding Symfony Serializer awareness

To make the schema follow Symfony Serializer instead of json_encode(), swap in SymfonySerializerStrategy. Runtime serializer context — serialization groups, for example — is passed to extract() through an ExtractionContext, which is optional:

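use Symfony\Component\Serializer\Mapping\Factory\ClassMetadataFactory;
use Symfony\Component\Serializer\Mapping\Loader\AttributeLoader;
use Zeusi\JsonSchemaExtractor\Context\ExtractionContext;
use Zeusi\JsonSchemaExtractor\Context\SymfonySerializerContext;
use Zeusi\JsonSchemaExtractor\Discoverer\ReflectionDiscoverer;
use Zeusi\JsonSchemaExtractor\Enricher\PhpStanEnricher;
use Zeusi\JsonSchemaExtractor\Mapper\StandardJsonSchemaMapper;
use Zeusi\JsonSchemaExtractor\SchemaExtractor;
use Zeusi\JsonSchemaExtractor\Serialization\SymfonySerializerStrategy;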

$strategy = new SymfonySerializerStrategy(
    new ClassMetadataFactory(new AttributeLoader()),
);

$extractor = new SchemaExtractor(
    new ReflectionDiscoverer(),
    [new PhpStanEnricher()],
    $strategy,
    new StandardJsonSchemaMapper(),
);

// Optional: a context carrying the runtime serializer settings (here, the "public" group).
$context = (new ExtractionContext())->with(new SymfonySerializerContext([
    'groups' => ['public'],
]));

$schema = $extractor->extract(OrderSummary::class, $context);

For OrderSummary with the public group, the schema matches what Symfony Serializer produces: internalNote is absent, and id appears as order_id.

The strategy reads the same inputs Symfony Serializer reads at runtime — both the attributes declared on the DTO and the options passed in the serializer context (the array shown above, where groups lives):

  • #[SerializedName] / name converters → property keys are renamed in the schema.
  • Serialization groups (groups context option) → only properties in the selected groups appear.
  • ignored_attributes (context option) → the listed properties are excluded.
  • attributes (context option) → restricts the schema to the listed attributes, with per-property nested views for class-backed ones.
  • Discriminator maps (#[DiscriminatorMap]) → a base type expands to a oneOf over its mapped subtypes, each tagged with the discriminator field; a concrete subtype is a single object with that field fixed to its key.
  • Known normalizers (by type) → DateTimeInterface becomes { type: string, format: date-time }, Symfony UIDs become { type: string, format: uuid }, and so on.
  • skip_null_values (context option) → nullable properties become optional in the schema.
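
The discriminator case in the list above is the least obvious one, so here is a hand-written illustration of the oneOf shape it describes. The PaymentMethod hierarchy, its property names, and the "type" field are hypothetical, and pinning the discriminator with const is an assumption about how the tag might be expressed, not a confirmed detail of the library's output.

```php
<?php

declare(strict_types=1);

// Hypothetical base type PaymentMethod with a discriminator map on "type":
// "card" and "bank_transfer" each point to a concrete subtype.
$cardSchema = [
    'type' => 'object',
    'properties' => [
        'type' => ['const' => 'card'], // discriminator fixed to this subtype's key
        'last4' => ['type' => 'string'],
    ],
    'required' => ['type', 'last4'],
];

$bankTransferSchema = [
    'type' => 'object',
    'properties' => [
        'type' => ['const' => 'bank_transfer'],
        'iban' => ['type' => 'string'],
    ],
    'required' => ['type', 'iban'],
];

// The base type expands to a oneOf over its mapped subtypes.
$paymentMethodSchema = ['oneOf' => [$cardSchema, $bankTransferSchema]];

echo json_encode($paymentMethodSchema, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```

Because each branch fixes the discriminator to a different value, a payload can match at most one branch, which is what oneOf requires.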

What it does not model

The strategy reads static metadata — attributes and types — so it cannot mirror everything Symfony Serializer does at runtime. It is not a 1:1 mapping of the serializer's behaviour. The gaps worth knowing about:

  • preserve_empty_objects (context option) → serializes an empty collection as {} instead of []; whether it applies depends on the runtime value, which static analysis can't see.
  • skip_uninitialized_values (context option) → omits typed properties that were never assigned; that's a fact about the object instance at runtime, not about the class.
  • Custom normalizers outside the known set → their output shape is defined in application code, so it can't be inferred.
  • max_depth_handler / circular_reference_handler (context options) → callables that replace a node when the depth limit or a cycle is hit; their output is arbitrary application code.
  • #[MaxDepth] → not modeled as a tightened schema, and doesn't need to be: recursion is already broken with a $ref, and a depth-bounded payload is a valid instance of that (looser) recursive schema.
  • Interfaces / polymorphic base types → an interface has no single concrete shape, so it can't be resolved on its own. A Symfony #[DiscriminatorMap] turns it into a oneOf when the payload carries a type field; otherwise a custom strategy or enricher can supply the concrete shapes.

When a case isn't covered, the serialization strategy is the extension point: implement SerializationStrategyInterface, or decorate SymfonySerializerStrategy, and handle it there.


Reusable definitions and $ref

By default, class-backed nested types are emitted once under definitions (Draft-7) or $defs (2020-12) and referenced with $ref everywhere they are used. If you prefer the nested schemas expanded at the point of use instead, switch to ClassReferenceStrategy::Inline:

use Zeusi\JsonSchemaExtractor\Mapper\ClassReferenceStrategy;
use Zeusi\JsonSchemaExtractor\Mapper\StandardJsonSchemaMapper;
use Zeusi\JsonSchemaExtractor\Mapper\StandardJsonSchemaMapperOptions;

$mapper = new StandardJsonSchemaMapper(new StandardJsonSchemaMapperOptions(
    classReferenceStrategy: ClassReferenceStrategy::Inline,
));

Circular references are handled automatically either way: a self-referential class produces a $ref back to the root (#) or to the relevant definition, rather than an infinite expansion.
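
As an illustration of the root case, consider a hypothetical Category class with a name and a list of child categories. A hand-written schema for it, again sketched as a PHP array rather than taken from the library, can point back at itself with "#":

```php
<?php

declare(strict_types=1);

// Hypothetical self-referential type: Category { string $name; list<Category> $children }.
$categorySchema = [
    'type' => 'object',
    'properties' => [
        'name' => ['type' => 'string'],
        'children' => [
            'type' => 'array',
            // "#" refers to the root of this schema document, i.e. Category itself.
            'items' => ['$ref' => '#'],
        ],
    ],
    'required' => ['name', 'children'],
];

echo json_encode($categorySchema, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR), PHP_EOL;
```

The schema stays finite however deep the actual tree of categories goes.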


Symfony Bundle

If you use the Symfony framework, a bundle registers the built-in components as services and wires the extractor into the container. Its main convenience is reusing the Symfony Serializer and Validator services your application already has — so you don't assemble a ClassMetadataFactory, a validator, and the strategy/enricher wiring by hand.

You declare one or more extractor pipelines, choosing the strategy that matches how your app serializes:

# config/packages/json_schema_extractor.yaml
json_schema_extractor:
  default_extractor: api
  extractors:
    api:
      enrichers:
        - json_schema_extractor.enricher.phpstan
      serialization: json_schema_extractor.serialization.symfony_serializer

SchemaExtractor is aliased to the default extractor, so you inject it directly:

public function __construct(
    private readonly SchemaExtractor $extractor,
) {}

// ...
$schema = $this->extractor->extract(OrderSummary::class);

See the bundle documentation for multiple pipelines, custom services, and the debug command.


Use case: structured output for LLMs

A common use today is structured output from an LLM. Most providers accept a JSON Schema as the contract for the model's response: you generate the schema from a DTO, send it with the request, and deserialize the response back into that same DTO.

Since the schema comes from the DTO you deserialize into, the two stay in sync as the DTO changes.
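
The receiving half of that round trip can be sketched without any dependency. SentimentResult is a hypothetical DTO, the response string stands in for a provider's reply (the actual request is omitted because it depends on the provider), and the manual hydration is only there to keep the sketch self-contained; in a Symfony application the serializer would normally do this step.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO: the schema sent to the model would be generated from this class.
final class SentimentResult
{
    public function __construct(
        public readonly string $label,
        public readonly float $confidence,
    ) {}
}

// Stand-in for the model's structured output; a real call would return this from the provider.
$modelResponse = '{"label": "positive", "confidence": 0.93}';

// Throws JsonException on malformed JSON instead of silently returning null.
$data = json_decode($modelResponse, true, 512, JSON_THROW_ON_ERROR);

// Manual hydration into the same DTO the schema was derived from.
$result = new SentimentResult($data['label'], (float) $data['confidence']);
```

If a property is added to the DTO, the schema sent to the model and the class that receives the response change together.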


Resources

The library is open source under the MIT license: github.com/antonioturdo/json-schema-extractor. The documentation covers all enrichers, serialization strategies, mapper options, and the Symfony bundle in detail.

If you are building something with it — structured output pipelines, AsyncAPI documentation, API contract testing — feedback and contributions are welcome.
