<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rishabh Bisht</title>
    <description>The latest articles on DEV Community by Rishabh Bisht (@therish2050).</description>
    <link>https://dev.to/therish2050</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F900668%2Facf128b6-68be-4ee8-97a8-2e07f7b1e734.jpeg</url>
      <title>DEV Community: Rishabh Bisht</title>
      <link>https://dev.to/therish2050</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/therish2050"/>
    <language>en</language>
    <item>
      <title>How to Scale to a Billion Documents in Symfony - Part II</title>
      <dc:creator>Rishabh Bisht</dc:creator>
      <pubDate>Fri, 27 Feb 2026 13:08:44 +0000</pubDate>
      <link>https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-part-ii-4h2e</link>
      <guid>https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-part-ii-4h2e</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was written by &lt;strong&gt;&lt;a href="https://connect.symfony.com/profile/alcaeus" rel="noopener noreferrer"&gt;Andreas Braun&lt;/a&gt;&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Building an Application
&lt;/h1&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-16pf"&gt;first part&lt;/a&gt; of this series, we designed a schema to store fuel prices in MongoDB and imported a large amount of data into the database. We also looked at how to efficiently aggregate this data using MongoDB's aggregation framework. With the database prepared, it is now time to build a Symfony application that allows us to interact with this data. In this part of the tutorial, we'll use the Doctrine MongoDB ODM to map our documents to PHP classes, and Symfony UX to build a modern frontend.&lt;/p&gt;

&lt;p&gt;To recap, this is what we want our application to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show a dashboard with some general information
&lt;/li&gt;
&lt;li&gt;Show a list of stations
&lt;/li&gt;
&lt;li&gt;Allow searching for a station by name, brand, or location
&lt;/li&gt;
&lt;li&gt;Show the details of a station, including:

&lt;ul&gt;
&lt;li&gt;A map showing the station's location
&lt;/li&gt;
&lt;li&gt;The latest prices for each fuel type
&lt;/li&gt;
&lt;li&gt;A chart showing the price history for the last day&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting Up the Application
&lt;/h2&gt;

&lt;p&gt;First, we set up our Symfony application using the Symfony CLI. If you don't have it installed yet, you can find instructions on the &lt;a href="https://symfony.com/download" rel="noopener noreferrer"&gt;Symfony website&lt;/a&gt;. Once the CLI is installed, create a new Symfony project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;symfony new &lt;span class="nt"&gt;--webapp&lt;/span&gt; fuel-price-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a new web application with the latest version of Symfony. Since we don't need the Doctrine ORM, we can remove a number of packages from composer.json:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer remove doctrine/dbal doctrine/doctrine-bundle doctrine/doctrine-migrations-bundle doctrine/orm symfony/doctrine-messenger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, make sure that you have the MongoDB extension installed and enabled. If you don't have it installed yet, install it using &lt;a href="https://github.com/php/pie" rel="noopener noreferrer"&gt;pie&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pie &lt;span class="nb"&gt;install &lt;/span&gt;mongodb/mongodb-extension:^2.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we're ready to install the Doctrine MongoDB ODM bundle. This bundle allows easy configuration of the Doctrine MongoDB ODM and provides a set of tools to work with MongoDB in Symfony applications. To install the bundle, run the following command in your project directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require doctrine/mongodb-odm-bundle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Symfony Flex, this command sets up everything you need to connect to MongoDB. You can check the default configuration in &lt;code&gt;config/packages/doctrine_mongodb.yaml&lt;/code&gt;. To connect to your database, update the &lt;code&gt;MONGODB_URL&lt;/code&gt; environment variable in your &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mapping Documents
&lt;/h2&gt;

&lt;p&gt;The first document we'll map is the station data. Create a new &lt;code&gt;Document\Station&lt;/code&gt; class and add fields for everything we need to store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Document&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Document\Partial\AbstractStation&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Repository\StationRepository&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\Common\Collections\ArrayCollection&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\Common\Collections\Collection&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Document&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\EmbedOne&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Field&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;GeoJson\Geometry\Point&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Symfony\Component\Uid\Uuid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Symfony\Component\Uid\UuidV4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="na"&gt;#[Document(repositoryClass: StationRepository::class)]&lt;/span&gt;
&lt;span class="na"&gt;#[Index(keys: ['location' =&amp;gt; '2dsphere'])]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Station&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Id(type: 'uuid', strategy: 'none')]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Uuid&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$brand&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(type: 'point')]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;Point&lt;/span&gt; &lt;span class="nv"&gt;$location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: Address::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;Address&lt;/span&gt; &lt;span class="nv"&gt;$address&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nc"&gt;UuidV4&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nc"&gt;UuidV4&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UuidV4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we're not yet mapping the &lt;code&gt;latestPrice&lt;/code&gt; and &lt;code&gt;latestPrices&lt;/code&gt; fields. We'll do that later when we have the &lt;code&gt;DailyPrice&lt;/code&gt; document ready. While we're mostly relying on imported data, we still want fully functionable documents, so we make sure that we can create a new station both with and without an ID. We store the ID as a &lt;code&gt;uuid&lt;/code&gt; type, which was recently added to the Doctrine MongoDB ODM. This allows us to use the &lt;code&gt;Uuid&lt;/code&gt; class from Symfony, mapping to a MongoDB &lt;code&gt;Binary&lt;/code&gt; UUID. The location is stored as a &lt;code&gt;Point&lt;/code&gt; type, which I also created specifically for this project.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Address&lt;/code&gt; class is a simple value object that we map as an embedded document. Since we're only handling German addresses, we can keep it simple and implement the &lt;code&gt;Stringable&lt;/code&gt; interface for easy string conversion. Here's the complete &lt;code&gt;Address&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="k"&gt;declare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Document&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\EmbeddedDocument&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Field&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Stringable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="na"&gt;#[EmbeddedDocument]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Address&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Stringable&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="na"&gt;#[Field]&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$street&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;#[Field]&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$houseNumber&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;#[Field]&lt;/span&gt;
        &lt;span class="na"&gt;#[Index]&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$postCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;#[Field]&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__toString&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;EOT
$this-&amp;gt;street $this-&amp;gt;houseNumber
$this-&amp;gt;postCode $this-&amp;gt;city
EOT;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mapping also shows the indexes we'll want to create. For the stations themselves, we're creating a 2dsphere index on the &lt;code&gt;location&lt;/code&gt; field to allow for geospatial queries. The &lt;code&gt;address.postCode&lt;/code&gt; field is indexed to allow for quick lookups by postal code. We'll talk about more indexes later, but for now this will suffice.&lt;/p&gt;

&lt;p&gt;Before we can continue with the &lt;code&gt;DailyPrice&lt;/code&gt; document, we need to think about the station mapping. We want to store stations as a standalone document, but we also want to be able to embed station data into the &lt;code&gt;DailyPrice&lt;/code&gt; document. Embedding certain station data allows us to query prices without having to join two collections. Since documents and embedded documents require different mapping information, we'll have to introduce an abstraction in the form of a mapped superclass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\EmbedOne&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Field&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\Index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Doctrine\ODM\MongoDB\Mapping\Annotations\MappedSuperclass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;GeoJson\Geometry\Point&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="na"&gt;#[MappedSuperclass]&lt;/span&gt;
&lt;span class="na"&gt;#[Index(keys: ['location' =&amp;gt; '2dsphere'])]&lt;/span&gt;
&lt;span class="k"&gt;abstract&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AbstractStation&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$brand&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(type: 'point')]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;Point&lt;/span&gt; &lt;span class="nv"&gt;$location&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: Address::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;Address&lt;/span&gt; &lt;span class="nv"&gt;$address&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then create one station document that is used as a standalone document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Station&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractStation&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Id(type: 'binaryUuid', strategy: 'none')]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Uuid&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="nc"&gt;UuidV4&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nc"&gt;UuidV4&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UuidV4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;latestPrices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayCollection&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embedded station document is then mapped as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[EmbeddedDocument]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmbeddedStation&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractStation&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[ReferenceOne(name: '_id', storeAs: 'id', targetDocument: Station::class)]&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;Station&lt;/span&gt; &lt;span class="nv"&gt;$referencedStation&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the embedded document, we include a reference to the Station collection, but we only store its identifier in the &lt;code&gt;_id&lt;/code&gt; field. This is a common strategy to deal with denormalized data in MongoDB, as it allows us to resolve the reference using a &lt;code&gt;$lookup&lt;/code&gt; operator or the ODM’s reference mechanisms when we need to access data that we haven’t duplicated into the embedded document. We'll be using the exact same strategy for the &lt;code&gt;DailyPrice&lt;/code&gt; document, since it needs to be embedded into the &lt;code&gt;Station&lt;/code&gt; document as well. So, let's map that next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[MappedSuperclass]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AbstractDailyPrice&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Id]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(type: Type::DATE_IMMUTABLE)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;DateTimeImmutable&lt;/span&gt; &lt;span class="nv"&gt;$day&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(enumType: Fuel::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Fuel&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(nullable: true)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;?float&lt;/span&gt; &lt;span class="nv"&gt;$openingPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nv"&gt;$closingPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: Price::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Price&lt;/span&gt; &lt;span class="nv"&gt;$lowestPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: Price::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Price&lt;/span&gt; &lt;span class="nv"&gt;$highestPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedMany(targetDocument: Price::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;Collection&lt;/span&gt; &lt;span class="nv"&gt;$prices&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nv"&gt;$weightedAveragePrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the base class for all &lt;code&gt;DailyPrice&lt;/code&gt; documents, and it comes with a small embedded document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[EmbeddedDocument]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Price&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Field(name: '_id', type: Type::OBJECTID)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field(type: Type::DATE_IMMUTABLE)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;DateTimeImmutable&lt;/span&gt; &lt;span class="nv"&gt;$date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field()]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Field()]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;?float&lt;/span&gt; &lt;span class="nv"&gt;$previousPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DailyPrice&lt;/code&gt; document we embed in the &lt;code&gt;Station&lt;/code&gt; document does not need to embed a station, so it can be empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[EmbeddedDocument]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmbeddedDailyPrice&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractDailyPrice&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual &lt;code&gt;DailyPrice&lt;/code&gt; document that we store in its own collection adds an embedded relationship to a station:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[Index(keys: ['fuel' =&amp;gt; 1, 'day' =&amp;gt; -1, 'station._id' =&amp;gt; 1], unique: true)]&lt;/span&gt;
&lt;span class="na"&gt;#[Index(keys: ['station._id' =&amp;gt; 1, 'day' =&amp;gt; -1])]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DailyPrice&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractDailyPrice&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[EmbedOne(targetDocument: EmbeddedStation::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;EmbeddedStation&lt;/span&gt; &lt;span class="nv"&gt;$station&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the most important mappings completed. You can also see that we create two separate indexes: one for fuel, day, and station that we also require to be unique, and another for the station and day. These indexes will cover most of our use cases. When building indexes, I usually apply the &lt;a href="https://www.mongodb.com/docs/upcoming/tutorial/equality-sort-range-guideline/" rel="noopener noreferrer"&gt;ESR (Equality, Sort, Range) guideline&lt;/a&gt;. The first fields in an index should be those used to do equality checks (e.g. checking for a specific fuel type), followed by fields used for sorting, and lastly fields used for range queries. There may be reasons to slightly deviate from this guideline, but in most cases it works well. Compound indexes in MongoDB also come with prefix indexes, meaning that if you query for a subset of fields in an index, MongoDB can still use that index for the query. Our first index on &lt;code&gt;fuel&lt;/code&gt;, &lt;code&gt;day&lt;/code&gt;, and &lt;code&gt;station._id&lt;/code&gt; can also be used to query for just &lt;code&gt;fuel&lt;/code&gt; and &lt;code&gt;day&lt;/code&gt;, or just for &lt;code&gt;fuel&lt;/code&gt;. For queries that don't filter by fuel type, we can use the second index on &lt;code&gt;station._id&lt;/code&gt; and &lt;code&gt;day&lt;/code&gt;. Note that the order of the station ID and day are flipped here, as we may want to sort by date when querying for a station's prices, or might be looking for a range of days for a station. Speaking of sorting, you'll notice that the index order is reversed on the &lt;code&gt;day&lt;/code&gt; field, and that data is sorted in descending order. We are more likely to be interested in newer price data than in older price data, so a reverse sort on the index makes our queries more efficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Showing Station Details
&lt;/h2&gt;

&lt;p&gt;Now that we have our documents mapped, let’s show some data in our application. I’ll skip over the details as the controllers for the station list and details page are essentially CRUD controllers. If you are interested in those details, you can check out the &lt;a href="https://github.com/alcaeus/fupran-doctrine/blob/ecd78286cd840d876a8d58ae8238d4b9c63a43b2/src/Controller/StationsController.php" rel="noopener noreferrer"&gt;full source code of the controller&lt;/a&gt;. In the templates, I created several Symfony UX components to have reusable cards showing important station information, like the name. One example would be the one for station details that includes a map of where to find this station. By using Symfony UX components, we can encapsulate the logic for generating the map and adding all the information inside a dedicated PHP class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[AsTwigComponent('Station:Map')]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;AbstractStation&lt;/span&gt; &lt;span class="nv"&gt;$station&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getMap&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;UxMap&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$latitude&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getCoordinates&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nv"&gt;$location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$longitude&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UxMap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nv"&gt;$map&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;center&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addMarker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Marker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;infoWindow&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;InfoWindow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;headerContent&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;nl2br&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$map&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then include a map for a station wherever we want by including our handy little component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;twig:Station:Map station="{{ station }}" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’re now showing a list of stations and some details about that station, but we’re not displaying prices yet. So, let’s see how we can efficiently show prices for the station.&lt;/p&gt;

&lt;h2&gt;
  
  
  Showing Latest Prices
&lt;/h2&gt;

&lt;p&gt;When we imported data, we stored the latest price for each fuel type in the &lt;code&gt;Station&lt;/code&gt; document, as well as a list of the last 30 days. To remind ourselves, this is the &lt;code&gt;latestPrice&lt;/code&gt; field in the &lt;code&gt;Station&lt;/code&gt; document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diesel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f8d1f3de33b548fc21a9f"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"closingPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.599&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-19T00:00:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"diesel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"highestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f29d86526447058420b3d"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-19T11:30:38.000Z"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.649&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"lowestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f29df65264470584ab598"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-19T21:07:55.000Z"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.559&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"openingPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.599&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"weightedAveragePrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.6&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e10"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To map this data, we'll need to create a new &lt;code&gt;DailyPriceReport&lt;/code&gt; class that represents a daily price report for all fuel types at a station:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[EmbeddedDocument]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LatestPrice&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[EmbedOne(targetDocument: EmbeddedDailyPrice::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;EmbeddedDailyPrice&lt;/span&gt; &lt;span class="nv"&gt;$diesel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: EmbeddedDailyPrice::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;EmbeddedDailyPrice&lt;/span&gt; &lt;span class="nv"&gt;$e5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: EmbeddedDailyPrice::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;EmbeddedDailyPrice&lt;/span&gt; &lt;span class="nv"&gt;$e10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now embed this document in the &lt;code&gt;Station&lt;/code&gt; class. We can build a similar embedded document with an &lt;code&gt;EmbedMany&lt;/code&gt; relationship for the last 30 days of prices, which we'll call &lt;code&gt;LatestPrices&lt;/code&gt; and map it here as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Station&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AbstractStation&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[EmbedOne(targetDocument: LatestPrice::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?LatestPrice&lt;/span&gt; &lt;span class="nv"&gt;$latestPrice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[EmbedOne(targetDocument: LatestPrices::class)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?LatestPrices&lt;/span&gt; &lt;span class="nv"&gt;$latestPrices&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have our documents mapped, we can create more UX components to display them. For example, this is the component to show the latest price data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{# @var App\Document\EmbeddedDailyPrice #}
{% props dailyPrice = null %}

{% if dailyPrice is not null %}
&amp;lt;dt&amp;gt;{{ dailyPrice.fuel.displayValue }} &amp;lt;span class="text-muted"&amp;gt;({{ dailyPrice.latestPrice.date|date }})&amp;lt;/span&amp;gt;&amp;lt;/dt&amp;gt;
&amp;lt;dd&amp;gt;
    &amp;lt;twig:Price price="{{ dailyPrice.latestPrice.price }}" /&amp;gt;
    (Average: &amp;lt;twig:Price price="{{ dailyPrice.weightedAveragePrice }}" /&amp;gt;)
&amp;lt;/dd&amp;gt;
{% endif %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I want to point out one small detail: accessing the latest price in a &lt;code&gt;DailyPrice&lt;/code&gt; object. While the latest price is also stored as the &lt;code&gt;closingPrice&lt;/code&gt;, we don't record the time or any other detail for that. So, in our templates, we'd always have to call &lt;code&gt;dailyPrice.prices.last()&lt;/code&gt; to access this. With PHP 8.4 and property accessors, we can also create a virtual property that returns the latest price:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?Price&lt;/span&gt; &lt;span class="nv"&gt;$latestPrice&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;get&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, our template doesn't need to know where the latest price comes from, and if we later change our schema to store all price information in the &lt;code&gt;closingPrice&lt;/code&gt; field, we can change the implementation of this property without affecting the template code. It's these small details that sometimes make code easier to understand and maintain.&lt;/p&gt;

&lt;p&gt;Our application now serves a paginated list of stations, and we can click on a station to see it on a map and see its latest prices. If you remember, we also wanted to show a chart with the price history. In the next part of this tutorial, we'll do exactly that. We'll be using Symfony UX's Chart.js integration to show charts on the station page, and we'll also embed charts from &lt;a href="https://www.mongodb.com/products/platform/atlas-charts" rel="noopener noreferrer"&gt;Atlas Charts&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Data with Charts
&lt;/h2&gt;

&lt;p&gt;Welcome to the third part of this series. In the previous page, we ended up with a list of stations and a details page that shows the latest prices for each fuel type. In this part, we want to add some charts to better visualize the price data that we have. To do this, we can start by using Symfony UX's Chart.js integration. First, we install the package as well as an adapter for Chart.js to handle dates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require symfony/ux-chartjs
./bin/console importmap:require chartjs-adapter-date-fns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it's installed, we can create a small component that creates and renders our chart. The template only contains a call to &lt;code&gt;render_chart()&lt;/code&gt;, all the logic is contained in the component class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="na"&gt;#[AsTwigComponent('Chart:Station:DayPriceOverview')]&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DayPriceOverview&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;LatestPrice&lt;/span&gt; &lt;span class="nv"&gt;$latestPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;ChartBuilderInterface&lt;/span&gt; &lt;span class="nv"&gt;$chartBuilder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getChart&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;Chart&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$chart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;chartBuilder&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createChart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Chart&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;TYPE_LINE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$chart&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'datasets'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createDatasets&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nv"&gt;$chart&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;setOptions&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'parsing'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'datasets'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'line'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'stepped'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'before'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'scales'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'x'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'time'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$chart&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createDatasets&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$datasets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Fuel&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$fuelType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="k"&gt;isset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;latestPrice&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;$fuelType&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nv"&gt;$datasets&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;latestPrice&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;$fuelType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$datasets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createDataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;EmbeddedDailyPrice&lt;/span&gt; &lt;span class="nv"&gt;$dailyPrice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Fuel&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'label'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$fuel&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'data'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;array_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Price&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'x'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getTimestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'y'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$price&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="nv"&gt;$dailyPrice&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toArray&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The logic basically revolves around massaging the data so that Chart.js can display it without having to parse it. We use millisecond timestamps for the X axis, and prices for the Y axis, showing each fuel type as its own dataset. We can now render this chart in our template. We can do pretty much the same for the price history chart, where we want to show the lowest, average, and highest prices for each of the last 30 days. We already have all the necessary data in the &lt;code&gt;latestPrices&lt;/code&gt; station field, so we can create an equivalent component for that data. With that, we're already showing a good chunk of information about the station's prices. And, since we have pre-aggregated this data, the view page is lightning fast and requires a single query to the &lt;code&gt;station&lt;/code&gt; collection. If we had to join to a collection storing all prices, loading this data would take significantly longer.&lt;/p&gt;

&lt;p&gt;This approach works great for the data that we have denormalized into the &lt;code&gt;Station&lt;/code&gt; document, and we can also leverage it for other data that we can fetch from the database. Since we might have to aggregate the &lt;code&gt;DailyPrice&lt;/code&gt; collection directly for other charts, we should also consider caching data for charts. This is especially important for charts that show historical data, as we don't want to run expensive aggregation queries every time a user visits a page. There is another alternative for this so we don't have to do this ourselves: &lt;a href="https://www.mongodb.com/products/platform/atlas-charts" rel="noopener noreferrer"&gt;MongoDB Atlas Charts&lt;/a&gt;. Atlas Charts allows us to create charts based on our MongoDB data, all without having to write any code. We can then embed single charts or an entire dashboard into our Symfony application using the &lt;a href="https://www.mongodb.com/docs/charts/embedding-charts/" rel="noopener noreferrer"&gt;Atlas Charts Embed API&lt;/a&gt;. Atlas Charts automatically caches chart data, so we don't have to worry too much about the performance of our charts. A full deep dive into Charts would be a bit too much for this blog post. Feel free to check out Atlas Charts yourself if you want to go further!&lt;/p&gt;

&lt;p&gt;We're now in a good place: we have a working application that allows us to view stations, their latest prices, and some charts to visualize the data. The data we have was entirely imported from a CSV dump, and we haven't yet considered adding new data to the database. In the next part of this series, we'll look at how we can maintain consistency in this denormalized database and how MongoDB's query framework makes this easier.  &lt;/p&gt;

</description>
      <category>symfony</category>
      <category>webdev</category>
      <category>database</category>
      <category>php</category>
    </item>
    <item>
      <title>How to Scale to a Billion Documents in Symfony</title>
      <dc:creator>Rishabh Bisht</dc:creator>
      <pubDate>Wed, 14 Jan 2026 11:45:32 +0000</pubDate>
      <link>https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-16pf</link>
      <guid>https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-16pf</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was written by &lt;strong&gt;&lt;a href="https://connect.symfony.com/profile/alcaeus" rel="noopener noreferrer"&gt;Andreas Braun&lt;/a&gt;&lt;/strong&gt;. Check out &lt;strong&gt;&lt;a href="https://dev.to/mongodb/how-to-scale-to-a-billion-documents-in-symfony-part-ii-4h2e"&gt;part 2&lt;/a&gt;&lt;/strong&gt; of this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you regularly drive a car, you will no doubt be familiar with fuel prices. They vary wildly from country to country, change frequently depending on global oil prices, and are a constant discussion topic in politics. Have you sometimes felt that prices rise and fall somewhat arbitrarily? Speaking of price changes, how often do prices change where you live? How much could you save if you travelled to the next city, or even country to refuel your car?&lt;/p&gt;

&lt;p&gt;While these questions may seem silly to some, to people in Germany, they come up quite often. You'd be surprised to hear that fuel prices in Germany change multiple times per day, and they vary wildly between mornings and evenings, and even between different cities in Germany. In 2012, the German government passed a law that enabled the Federal Cartel Office to require petrol stations to &lt;a href="https://www.google.com/url?q=https://www.bundeskartellamt.de/EN/Tasks/markettransparencyunit_fuels/markettransparencyunit_fuels.html&amp;amp;sa=D&amp;amp;source=docs&amp;amp;ust=1753705960715998&amp;amp;usg=AOvVaw2a0wltiRA0Rp20mFImLpYq" rel="noopener noreferrer"&gt;submit their fuel prices to a central database&lt;/a&gt;, which can then be consumed by various websites and apps to help people find the cheapest petrol station in their area. While the data isn't considered "open data" and access is only granted to people or companies that maintain a service to inform consumers, there is one such service that provides full access to all data. Since I was always interested in working with big amounts of data to see various trends, I decided to use this dataset to build a small application that shows some trends. This also provides a good opportunity to show how to work with such large amounts of data in MongoDB.&lt;/p&gt;

&lt;p&gt;When we’re done, we’ll want to show a map of where a station is, its data, and the price history for the last day for which we have prices (which would be the current day once the application receives live data):&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfsbi238xejirisyba63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfsbi238xejirisyba63.png" alt="Screenshot of the application, showing a map of the petrol station, latest price data for Diesel, E5, and E10 petrol, and a chart showing the price evolution that day for all three fuel types" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this first part, we're going to inspect the data we have, how we can import it into MongoDB, and how we can design a schema that allows us to efficiently work with the data. In the second part, we will then build a Symfony application to display the data.&lt;/p&gt;
&lt;h2&gt;
  
  
  A first look at the data
&lt;/h2&gt;

&lt;p&gt;Unfortunately, there is no open data solution to get access to this, and the data is only provided to registered applications that provide an information service to users. However, CSV files were available through a git repository for a while, which is what I used here. Cloning the repository gives us a first indication of what a massive dataset we're dealing with here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-hcs&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;
 78G    prices
8.0K    README.md
9.8G    stations
 87G    total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;78 GB of price data, and almost 10 GB of station data. That's quite the dataset we've got here! Within the two datasets, the files are conveniently organized by year and month, with one file per day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; prices/2025/06
2025-06-01-prices.csv
2025-06-02-prices.csv
&lt;span class="o"&gt;[&lt;/span&gt;...]

&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; stations/2025/06
2025-06-01-stations.csv
2025-06-02-stations.csv
&lt;span class="o"&gt;[&lt;/span&gt;...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is good news for us, as it allows us to slowly start working with some data while we build our schema, then import more and more data as we try to stretch the performance limits. But what does 78 GB of price data even mean? First, let's see how many stations we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; stations/2025/06/2025-06-24-stations.csv 
   17592 stations/2025/06/2025-06-24-stations.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ok, this is not too bad—we have just under 18,000 stations in the last file. Let's see how many price records we have for that day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; prices/2025/06/2025-06-24-prices.csv  
  453714 prices/2025/06/2025-06-24-prices.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wow, more than 450,000 price records for a single day, which works out to an average of 25.8 records per station per day! This is going to be a challenge, but let's not get ahead ourselves. First, let's take a more detailed look at the data to see what information we have, and then talk about how to efficiently store such data in MongoDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing data
&lt;/h2&gt;

&lt;p&gt;Let's start by looking at the station data. The first line of the CSV file contains the column names, so we can easily see what data is available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uuid,name,brand,street,house_number,post_code,city,latitude,longitude,first_active,openingtimes_json
0e18d0d3-ed38-4e7f-a18e-507a78ad901d,OIL! Tankstelle München,OIL!,Eversbuschstraße 33,,80999,München,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":192,""periods"":[{""startp"":""07:00"",""endp"":""20:00""}]},{""applicable_days"":63,""periods"":[{""startp"":""06:00"",""endp"":""22:00""}]}]}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks quite promising, and just from this first look, we can already make some assumptions about the dataset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The identifier is a UUID.
&lt;/li&gt;
&lt;li&gt;There's a name and a brand for the station, the latter of which will be quite interesting when comparing stations.
&lt;/li&gt;
&lt;li&gt;The address data is split into street, house number, post code, and city. However, the first record already shows the house number in the street field, so the data might need some cleaning if we want to use this data.
&lt;/li&gt;
&lt;li&gt;There are coordinates to place the petrol station on a map. We can use these coordinates in a geospatial index to find stations near a given location and better compare prices.
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;first_active&lt;/code&gt; is supposed to indicate when the station was first active, but given that the first record lists the Unix epoch timestamp, this field might not provide data for all stations. We might just skip this as a result.
&lt;/li&gt;
&lt;li&gt;There's a JSON field containing the opening hours of the station. This data doesn't look like it will be useful for querying for open stations, so we will likely want to ignore this data as well.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let's look at the price data. After all, this is what we came here for. Here's an example record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;date,station_uuid,diesel,e5,e10,dieselchange,e5change,e10change
2025-06-24 00:00:32+02,cbc2ab85-28ad-47e7-a102-3a0990f5d615,1.559,1.689,1.629,1,1,1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From each record, we can obtain the following data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The date and time of the report
&lt;/li&gt;
&lt;li&gt;The station that reported the price
&lt;/li&gt;
&lt;li&gt;Prices for three types of fuel (Germany uses two different types of petrol with 5% and 10% ethanol)
&lt;/li&gt;
&lt;li&gt;Indicators whether any price changed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at more data in this file, we can spot a couple of odd rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-06-24 00:01:33+02,8336b8d7-01fc-4f79-bdd3-6870c8268035,1.669,0.000,0.000,1,0,0
2025-06-24 00:01:33+02,87666c5e-eead-4ead-9b8c-eb8ab410fddb,1.599,1.679,1.619,0,0,1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both rows indicate that some prices changed and others didn't, but there's a small difference: One row reports a price despite no change being recorded, and the other doesn’t. I'm not sure if this is significant—it could mean that the first station doesn't sell that fuel type. Either way, we'll want to exclude such data from our import. It is also interesting that the schema isn't properly normalized: Typically, the data would be stored in individual rows for each fuel type. This would prevent those odd rows from above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;date,station_uuid,fuel,price
2025-06-24 00:00:32+02,cbc2ab85-28ad-47e7-a102-3a0990f5d615,diesel,1.559
2025-06-24 00:00:32+02,cbc2ab85-28ad-47e7-a102-3a0990f5d615,e5,1.689
2025-06-24 00:00:32+02,cbc2ab85-28ad-47e7-a102-3a0990f5d615,e10,1.629
2025-06-24 00:01:33+02,8336b8d7-01fc-4f79-bdd3-6870c8268035,diesel,1.669
2025-06-24 00:01:33+02,87666c5e-eead-4ead-9b8c-eb8ab410fddb,e10,1.619
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was most likely a deliberate choice to make the schema more efficient: Storing three additional flags (even when used as a byte) and two additional float fields per row is definitely more efficient than storing the date and UUID fields three times. So, while my university professor would probably dock some points in an exam for this, it is a valid choice. To confirm my suspicion, let's just count the number of lines in the entire price dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; prices/&lt;span class="k"&gt;**&lt;/span&gt;/&lt;span class="k"&gt;*&lt;/span&gt;.csv
 993187497 total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes. Almost a billion price records in total! With around 50% of those records changing all three prices, we're probably looking at more than two billion individual prices in the dataset. No wonder they started optimizing the database. Now that we know what we're dealing with, let's try to design a schema for this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing the schema
&lt;/h2&gt;

&lt;p&gt;When we draw the schema up as an entity relationship (ER) diagram, it looks relatively straightforward:&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx1w79evms4n1u1xy7ul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx1w79evms4n1u1xy7ul.png" alt="ER diagram showing the relationship between the Station, PriceReport, Fuel, and Address entities" width="800" height="1084"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
In a relational database, we could start modeling tables based on this diagram, but as we saw with the prices, the sheer amount of data might force us to make some compromises. We're also using MongoDB, so a relational schema isn't going to be the way to go.&lt;/p&gt;

&lt;p&gt;Station data is going to be easy, with the station document containing all data found in the stations CSV file. We can store the address as an embedded document, as well as the location as a &lt;a href="https://en.wikipedia.org/wiki/GeoJSON" rel="noopener noreferrer"&gt;GeoJSON&lt;/a&gt; point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Oil! Tankstelle München"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"brand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OIL!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eversbuschstraße 33"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"houseNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80999"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"München"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"coordinates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="mf"&gt;11.4609&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="mf"&gt;48.1807&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ID might look a bit strange here, but this is actually the same UUID as in the CSV file. We can store it as a &lt;a href="https://github.com/mongodb/specifications/blob/master/source/bson-binary-uuid/uuid.md" rel="noopener noreferrer"&gt;BSON binary with subtype 4&lt;/a&gt;, which stores a UUID according to &lt;a href="https://datatracker.ietf.org/doc/html/rfc4122" rel="noopener noreferrer"&gt;RFC 4122&lt;/a&gt;. The &lt;code&gt;$binary&lt;/code&gt; notation is the &lt;a href="https://github.com/mongodb/specifications/blob/master/source/extended-json/extended-json.md" rel="noopener noreferrer"&gt;extended JSON representation&lt;/a&gt; of the BSON binary type, which is used to represent these BSON types in JSON.&lt;/p&gt;

&lt;p&gt;Storing prices requires a little more thinking. If we wanted to store prices in a normalized way, we would store a document for each price report and reference the station and fuel type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f8a8b3de33b548f9228d0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"station"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this would certainly work and make my university professor quite happy, it isn't a very efficient way of storing prices for multiple reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The metadata for each price report is quite large compared to the actual information we want to store.
&lt;/li&gt;
&lt;li&gt;With roughly 150 million price reports per year, this metadata would require a lot of storage space, which is not very efficient.
&lt;/li&gt;
&lt;li&gt;With the collection growing to billions of documents, indexes will grow quite large as well, further complicating our storage requirements.
&lt;/li&gt;
&lt;li&gt;Showing aggregate data requires lots of reads on this large collection, making the process quite slow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thanks to MongoDB's document model, we don't have to live with these downsides. Instead, we can leverage schema patterns to store data more efficiently.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.mongodb.com/docs/manual/data-modeling/design-patterns/group-data/bucket-pattern/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;bucket pattern&lt;/a&gt; allows us to group data into larger documents, which can be stored and queried more efficiently. MongoDB's &lt;a href="https://www.mongodb.com/docs/manual/core/timeseries-collections/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;time series collections&lt;/a&gt; apply this pattern automatically, but we can't use it because we'll be using the $merge aggregation pipeline stage, which doesn't work with time series collections. An important consideration when using the bucket pattern is the granularity of the data we want to store. This is where it's useful to think about how you will be using the data.&lt;/p&gt;

&lt;p&gt;Our first use case will be to look for a given fuel station, then seeing their latest prices and maybe the price history for that day. We might also be interested in showing aggregated prices over a longer period of time. With this use case, it makes sense to create buckets for each day, containing all prices for that particular day. Our bucket will thus be identified by a station, a fuel type, and a date. Prices will be stored in a list that we'll keep sorted in ascending order. This is what that document will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f8a8b3de33b548f9228d0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"station"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T00:00:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This document certainly covers the use case above, but there's a small problem. To find a station, we need to either know the station's ID (e.g., by looking it up in the station collection), or we need to join to the station collection to find the station we're looking for. As you can imagine, this will slow down queries quite a bit. Again, we can make use of MongoDB's document model to work around this problem: Instead of storing a reference to the station, we can embed all station data in the bucket document. This way, we can create all necessary indexes and efficiently query the data. This is quite common in MongoDB, especially when working with data that doesn't change very often.&lt;/p&gt;

&lt;p&gt;Keep in mind that this is a trade-off: Our documents will occupy more storage space, and if station data changes, we will need to update all its bucket documents. In our case, this is a fair trade-off as the data is not likely to change often, and the performance benefits of more efficient queries outweigh the increased storage requirements. Note that we don't necessarily have to duplicate all data when embedding. The amount of data to be duplicated will heavily depend on the use case, how often the data changes, and how often it will be needed by the application. This is our price document with embedded station data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f8a8b3de33b548f9228d0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"station"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Oil! Tankstelle München"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OIL!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eversbuschstraße 33"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"houseNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"postCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80999"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"München"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"coordinates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="mf"&gt;11.4609&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="mf"&gt;48.1807&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T00:00:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this schema, we can now efficiently query prices, but if we want to show prices over a larger period of time, there's still some aggregation to do. Let's say that on the detail page for a station, we want to show the lowest, average, and highest price for a given fuel type over the last 30 days. We could certainly do this by querying the right documents, then aggregating the prices using the aggregation framework. However, once a day has ended, this data is not going to change anymore. It would make sense to pre-aggregate this data and store it in the bucket. So, let's add these aggregated values to our price document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673f8a8b3de33b548f9228d0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"station"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Oil! Tankstelle München"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OIL!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eversbuschstraße 33"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"houseNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"postCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80999"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"München"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"coordinates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="mf"&gt;11.4609&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="mf"&gt;48.1807&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T00:00:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lowestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"highestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-06-24T05:08:10.000Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"averagePrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.839&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll note that we're not just storing the highest and lowest price values, but also when that price was reported. This will allow us to look for patterns in the data and figure out if there's a trend when a station has the lowest price of the day. This is sufficient for now, so let's move on to the next step: importing the data and aggregating it into the schema above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Importing data
&lt;/h2&gt;

&lt;p&gt;Despite price records storing station data, we will keep all station data in a separate collection. This will serve as a canonical record of all stations, and we can use it to look up station data when we need it. We now have a few options for importing data. If I only wanted to import data for a single day, I would leverage &lt;a href="https://www.mongodb.com/cloud/atlas/register/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;MongoDB Atlas&lt;/a&gt;' federated databases to read the CSV files directly from Azure DevOps and import them into MongoDB. However, since I want to import lots of historical data, there are more efficient ways to do this. Again, it's important to know the data that we have: We want to store buckets for each day, and the CSV files contain one file per day. This means that it will be efficient to load a single CSV file, pre-aggregate that data into the format that we need, then continue with the next file. Instead of using &lt;a href="https://www.mongodb.com/docs/database-tools/mongoimport/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;mongoimport&lt;/a&gt;, I chose to write a small &lt;a href="https://symfony.com/doc/current/console.html#creating-a-command" rel="noopener noreferrer"&gt;Symfony Console command&lt;/a&gt; that reads the CSV file into a collection, separating the fuel types and converting data into the right format. After that, the command aggregates the data into buckets and adds them to the &lt;code&gt;DailyPrice&lt;/code&gt; collection. MongoDB's aggregation framework is perfect for this task, so it's about time we look at some code.&lt;/p&gt;

&lt;p&gt;From a list of prices, we can group them into buckets and find the highest and lowest prices of the day, but to compute the average price, we would need to know the first price of the day. Since stations don't report prices at midnight, we'll need to know the last price of the previous day. So, we'll start out by storing the last price of the day and will shift that to the next document in a separate step. We also need to denormalize station data, so this is one of the few places where we'll use a join. Except for storing the opening price, we can achieve all of this in a single query and store the result in our &lt;code&gt;DailyPrice&lt;/code&gt; collection. Here's the aggregation pipeline I built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateTrunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;groupPriceReportsByStationDayFuel&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;reshapeGroupedPriceReports&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;addExtremeValues&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;lookupStation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aggregation pipeline builder
&lt;/h3&gt;

&lt;p&gt;Ah, yes, simple and elegant. Of course, I'm hiding all the interesting bits so far. This is the aggregation pipeline builder introduced in version 1.21 of the MongoDB driver. While the Doctrine ODM has had one for quite some time, the builder provided by the driver itself allows us to write our code in a functional style. This is also beneficial for testing. Just by looking at this, we can tell that the pipeline does the following in order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Groups price reports by station, day, and fuel type
&lt;/li&gt;
&lt;li&gt;Reshapes the buckets into the document we want
&lt;/li&gt;
&lt;li&gt;Adds the extreme values (lowest and highest prices)
&lt;/li&gt;
&lt;li&gt;Looks up station data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since the first &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/set/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$set&lt;/code&gt;&lt;/a&gt; stage of the pipeline is the simplest part, let’s take a more detailed look at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;// Truncate the date to the specified unit and store it in a new day field&lt;/span&gt;
    &lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateTrunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;// We truncate the date field, which contains the exact time of the price report&lt;/span&gt;
        &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;// Truncating to the day effectively sets the day to midnight&lt;/span&gt;
        &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind the scenes, the driver transforms this code to the following pipeline stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$set"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"$dateTrunc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"day"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The big advantage of using the builder like this is that we can test each part of the aggregation pipeline in isolation in unit tests to make sure it does what we expect, as we do with other pieces of code. It also makes it much more readable, as we don't have to parse through a huge aggregation pipeline to understand what it does. Now, we can start grouping prices. Of course, we could have done so when reading the CSV, but this would've required us to load all prices into memory. Since we're dealing with large amounts of data, this is not a good idea. Using the aggregation framework allows us to push this workload to the database, where we could even use a sharded cluster to distribute the workload across multiple nodes. So, what does the grouping stage look like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;_id&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;fuel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'fuel'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Accumulator&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;_id&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'_id'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;_id&lt;/code&gt; field will create a group for each station, day, and fuel. For the &lt;code&gt;prices&lt;/code&gt; field, we use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/push/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$push&lt;/code&gt;&lt;/a&gt; accumulator and push all prices into a list. Note that while the &lt;code&gt;_id&lt;/code&gt; field is an object, we'll change this further down the pipeline to use a simpler ID that can be indexed more easily. Now, let's build the document that we actually want to store. This can be done with the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/replaceWith/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$replaceWith&lt;/code&gt;&lt;/a&gt; stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;replaceWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'_id.day'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'_id.station'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;fuel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'_id.fuel'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;sortArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prices'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;pricesByPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;sortArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prices'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We extract the day, station, and fuel from the previous &lt;code&gt;_id&lt;/code&gt; field, and we create two price lists here: one sorted by date, the other sorted by price. We can use the first list to find the closing price of the day, and the second list to find the lowest and highest prices of the day in the next stage. We can then also remove the price list sorted by price:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;closingPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;getField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prices'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;lowestPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pricesByPrice'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;highestPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pricesByPrice'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;pricesByPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'REMOVE'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where things start getting a bit more complex. The &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/first/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$first&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/last/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$last&lt;/code&gt;&lt;/a&gt; operators are pretty self-explanatory, but to only get the closing price of the day, we use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/getField/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$getField&lt;/code&gt;&lt;/a&gt; operator to extract the value of the &lt;code&gt;price&lt;/code&gt; field from the price document returned by &lt;code&gt;$last&lt;/code&gt;. Last but not least, we use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/aggregation-variables/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant#mongodb-variable-variable.REMOVE" rel="noopener noreferrer"&gt;&lt;code&gt;$$REMOVE&lt;/code&gt;&lt;/a&gt; variable to unset the &lt;code&gt;pricesByPrice&lt;/code&gt; field, which saves us from using the &lt;code&gt;$unset&lt;/code&gt; pipeline stage later on. Next up, finding the station. This is where aggregation pipelines really shine, as we can use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/lookup/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$lookup&lt;/code&gt;&lt;/a&gt; stage after grouping and filtering a lot of data to reduce the numbers of documents that need to be processed. In SQL, this would require some nested queries, which can become very difficult to read and understand, but in MongoDB, it is just another stage in the pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;localField&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;foreignField&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;_id&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;arrayFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;ne&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But what's this? We're returning an entire pipeline from this factory to be used in a pipeline? Yes! When working with aggregation pipelines in the past, I noticed that there are some common tasks that require more than a single stage. Since the &lt;code&gt;$lookup&lt;/code&gt; stage returns an array of documents even if only a single document is matched, we use a &lt;code&gt;$set&lt;/code&gt; stage directly afterwards to extract the first (and, in our case, only) document from the array. The aggregation pipeline builder knows how to handle this and contcatenates nested pipelines so you never need to worry about this. For good measure, we also drop all documents where we couldn't find a station. We don't expect any such documents to be there, but it's better to be safe than sorry. Also, note that we're using a &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/project/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$project&lt;/code&gt;&lt;/a&gt; stage within &lt;code&gt;$lookup&lt;/code&gt;. This is to ensure we're only copying the relevant station data over. As you'll see below, we're storing some extra data in the station document, which we don't want to duplicate.&lt;/p&gt;

&lt;p&gt;I previously mentioned that writing aggregation pipelines like these not only makes them more readable, but also allows us to test them in isolation. Here's one unit test I wrote to test the &lt;code&gt;addExtremeValues&lt;/code&gt; stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;testAddExtremeValues&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$price1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UTCDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeImmutable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-06-27T00:00:00+00:00'&lt;/span&gt;&lt;span class="p"&gt;))];&lt;/span&gt;
    &lt;span class="nv"&gt;$price2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UTCDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeImmutable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-06-27T01:00:00+00:00'&lt;/span&gt;&lt;span class="p"&gt;))];&lt;/span&gt;
    &lt;span class="nv"&gt;$price3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UTCDateTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeImmutable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-06-27T02:00:00+00:00'&lt;/span&gt;&lt;span class="p"&gt;))];&lt;/span&gt;

    &lt;span class="nv"&gt;$document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'prices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nv"&gt;$price1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$price2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$price3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'pricesByPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nv"&gt;$price2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$price1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$price3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="nv"&gt;$pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="nc"&gt;PriceReport&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;addExtremeValues&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;iterator_to_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getTestDatabase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'typeMap'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;TYPEMAP&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;assertCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;assertEquals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nv"&gt;$price1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nv"&gt;$price2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nv"&gt;$price3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'closingPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'lowestPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$price2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'highestPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$price3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can leverage the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/documents/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$documents&lt;/code&gt;&lt;/a&gt; stage to create a pipeline that starts with a set of documents of our choosing, then insert the pipeline we want to test. This way, we don't have to worry about setting up a collection with test data and removing it again. The downside is that MongoDB won't be able to optimize the pipeline for index use, but since we're only testing small workloads, this is not a problem for us.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial import pipeline
&lt;/h3&gt;

&lt;p&gt;Now that we have our pipeline, we can run it against the temporary import collection and store the results in the &lt;code&gt;DailyPrice&lt;/code&gt; collection. This is easy enough, as we can use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/merge/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$merge&lt;/code&gt;&lt;/a&gt; stage to write the results to that collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateTrunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dateFieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;groupPriceReportsByStationDayFuel&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;reshapeGroupedPriceReports&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;addExtremeValues&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;lookupStation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'DailyPrice'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;$merge&lt;/code&gt; stage is extremely powerful, as it allows us to specify the behavior when a record with the given identifier already exists. In our case, we're always inserting new data, but we could also handle the case of partial data being there (e.g., when a bucket already exists) and make sure to merge the price lists and run the above aggregation steps on the merged list. With the full import pipeline completed, we can now show why the aggregation builder is so useful. Without it, this is what the entire pipeline looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$set'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'day'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$dateTrunc'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'unit'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$group'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'station'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'day'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'fuel'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$fuel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'prices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$push'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$date'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$price'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$replaceWith'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'day'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$_id.day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'station'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$_id.station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'fuel'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$_id.fuel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'prices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$sortArray'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'input'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$prices'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'sortBy'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="s1"&gt;'date'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'pricesByPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$sortArray'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'input'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$prices'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'sortBy'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$set'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'closingPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$getField'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'field'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'price'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'input'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="s1"&gt;'$last'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$prices'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'lowestPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$first'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$pricesByPrice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'highestPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$last'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$pricesByPrice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'pricesByPrice'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$$REMOVE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$lookup'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'as'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'from'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'localField'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'foreignField'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'pipeline'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'$project'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="s1"&gt;'_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'brand'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'location'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'address'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$set'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'station'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$first'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'$station'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$match'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'station'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'$ne'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'$merge'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'into'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'DailyPrice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sure, we could extract some parts into their own methods like we did before, but this still requires us to know about all stages, their syntax, and the types they accept. With the aggregation builder, we can leverage our IDE to provide auto-completion and type hints, making aggregation pipelines much easier to write and maintain.&lt;/p&gt;

&lt;h3&gt;
  
  
  Post-import aggregation
&lt;/h3&gt;

&lt;p&gt;Of course, getting the data into MongoDB was just the first step. We're still missing the opening price of the day, which we couldn't compute previously. With that, we can also compute the weighted average price of the day. I will spare you all the dirty details, but the aggregation framework allows us to use the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/setWindowFields/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$setWindowFields&lt;/code&gt;&lt;/a&gt; stage to access a field from a different document in the pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Stage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;setWindowFields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;partitionBy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;station&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'station._id'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;fuel&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'fuel'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;sortBy&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;openingPrice&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Accumulator&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fieldPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'closingPrice'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'REMOVE'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, I want to partition by the station and fuel, then sort by the day. I then want to add an &lt;code&gt;openingPrice&lt;/code&gt; field using the &lt;a href="https://www.mongodb.com/docs/upcoming/reference/operator/aggregation/shift/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=scaling-symfony-mongodb&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;&lt;code&gt;$shift&lt;/code&gt;&lt;/a&gt; accumulator with a shift of &lt;code&gt;-1&lt;/code&gt;, which means that I want to access the previous document in the partition. If there is no previous document, we are again using the special &lt;code&gt;$$REMOVE&lt;/code&gt; variable to completely remove the field from the document. It may only save us 15 bytes, but when we're talking about around 18 million buckets per year, these 15 bytes still add up to around 270 MB of storage space. Speaking of saving storage space, another pattern is shortening field names. For example, instead of using &lt;code&gt;openingPrice&lt;/code&gt;, we could use &lt;code&gt;o&lt;/code&gt; as a key. I chose not to do so in this tutorial, but this is a valid solution to reduce document size. Using the Doctrine MongoDB ODM, we could still map these fields to more readable properties in our document classes, so we wouldn't even have to worry about this in the application code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running the import
&lt;/h3&gt;

&lt;p&gt;With all the aggregation pipeline written and tested on a small scale, it's now time to make our database server do some work. For maximum performance, I decided to run the import on a local MongoDB instance. This allows me to bypass the network latency and leverage the fast SSD storage on my machine. Here's the output of the import command, where I imported data for the years 2022 and 2023:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Import took 178m 51.6s: 582,944,678 documents inserted, 0 updated.
Aggregating imported price reports for further processing...Done in 122m 26.0s.
Adding opening price for each day and merge into prices collection...Done in 169m 49.2s.
Recomputing daily aggregates...Done in 43m 27.9s.
Setting last price report for stations...Done in 185m 42.0s.

Data aggregation completed in 11h 40m 16.8s.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Almost 12 hours. That's quite a long time, but considering that we've imported more than half a billion documents, this is actually not too bad. The first step is impressive: We read just under 583 million price reports from the CSV files and inserted them into the import collection in under three hours. That works out to an average of more than 54,000 documents per second. From previous testing, I know that the bottleneck here is disk I/O when reading the CSV files due to "endpoint protection software" installed on the machine. The actual write command to MongoDB was able to handle around 300,000 documents per second. That's quite the impressive performance, if you ask me!&lt;/p&gt;

&lt;p&gt;The next steps then took up the remainder of the time, although I will admit that I didn't optimize the pipeline for performance. This isn't something I'll be doing on the regular, and I'm sure that with some better index use, I would've been able to improve this. Since I simplified the station schema in the beginning and didn't explain the full pipeline for importing the stations, here's what we store in the database for a station:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$binary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"base64"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DhjQ0+04Tn+hjlB6eK2QHQ=="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"04"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Oil! Tankstelle München"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"brand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OIL!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"street"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eversbuschstraße 33"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"houseNumber"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"80999"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"München"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Point"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"coordinates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="mf"&gt;11.4609&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="mf"&gt;48.1807&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diesel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e10"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latestPrices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diesel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e10"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;latestPrice&lt;/code&gt; field contains the latest &lt;code&gt;DailyPrice&lt;/code&gt; document for each fuel type, while the &lt;code&gt;latestPrices&lt;/code&gt; field contains the &lt;code&gt;DailyPrice&lt;/code&gt; documents for the last 30 days. Why, you might ask? Well, in order to show station details, we now only need to find the one station among 17,000 stations that we're interested in, and we have all the data we need to show the latest price, the price history for the day, and even a chart with the lowest, highest, and average prices for the last 30 days, all without having to query our huge &lt;code&gt;DailyPrice&lt;/code&gt; collection. This is a great example of how MongoDB's document model allows us to optimize our data for the queries we want to run. And yes, we can still update this data pretty easily thanks to the power of MongoDB's query language, but let's save that for now.&lt;/p&gt;

&lt;p&gt;Each &lt;code&gt;DailyPrice&lt;/code&gt; document embedded into a station is exactly the same document we store in the &lt;code&gt;DailyPrice&lt;/code&gt; collection. This allows us to reuse our Doctrine document classes when we build our application.&lt;/p&gt;

&lt;p&gt;I also decided to store a &lt;code&gt;DailyAggregate&lt;/code&gt; document for each day, which contains some aggregates for each fuel type and day. We can use this to compare the price at a particular station against averages across all stations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$oid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"673fd3903de33b548f44d4a9"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-06-29T00:00:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fuel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"numChanges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;22.3579504275982&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lowestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.678&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"highestPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"weightedAveragePrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.8388233077257572&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"percentiles"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"p50"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.8306483516483516&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"p90"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.869&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"p95"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.883&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"p99"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.312&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have our data in MongoDB, let's take a look at the size of our collections and how much data we actually stored:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Station collection contains around 17,000 stations, totalling just over 1 GB of data.
&lt;/li&gt;
&lt;li&gt;The DailyPrice collection contains around 28 million documents (two full years worth of data), with 24 GB of data and a whopping 2.5 GB of indexes.
&lt;/li&gt;
&lt;li&gt;These 28 million bucket documents contain around 600 million individual prices, so we've successfully reduced the number of documents by a factor of 20.
&lt;/li&gt;
&lt;li&gt;Our DailyAggregates are just over 2,000 documents, coming in at 180 KB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next part, we'll start building our Symfony application to work with this data. We'll use the Doctrine MongoDB ODM to map our documents to PHP classes, and we'll use Symfony UX to build a modern frontend that allows us to interact with the data. The goal is to show how we can efficiently handle this amount of data in a Symfony application, while still providing a good user experience.&lt;/p&gt;

</description>
      <category>symfony</category>
      <category>php</category>
      <category>webdev</category>
      <category>database</category>
    </item>
    <item>
      <title>Building an Archaeology Matcher: A (Literal) Deep Dive Into Multimodal Vector Search</title>
      <dc:creator>Rishabh Bisht</dc:creator>
      <pubDate>Fri, 12 Dec 2025 14:17:25 +0000</pubDate>
      <link>https://dev.to/mongodb/building-an-archaeology-matcher-a-literal-deep-dive-into-multimodal-vector-search-2d5o</link>
      <guid>https://dev.to/mongodb/building-an-archaeology-matcher-a-literal-deep-dive-into-multimodal-vector-search-2d5o</guid>
      <description>&lt;p&gt;&lt;em&gt;The MongoDB team built a visual similarity search app for SymfonyCon. This article lays out how to build the app using Voyage AI multimodal embedding, MongoDB Atlas, and SymfonyAI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was written by &lt;a href="https://pauline-vos.nl/" rel="noopener noreferrer"&gt;&lt;strong&gt;Pauline Vos&lt;/strong&gt;&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you’ll learn&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What vector search and embedding models are.
&lt;/li&gt;
&lt;li&gt;How to use MongoDB to store embeddings and create vector search indexes.
&lt;/li&gt;
&lt;li&gt;How Symfony AI can help enhance your applications with AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://live.symfony.com/2025-amsterdam-con/" rel="noopener noreferrer"&gt;SymfonyCon 2025&lt;/a&gt; is behind us, and MongoDB's PHP team is still basking in the afterglow of so many great talks and conversations at our booth. After six years, SymfonyCon returned to my home town: Amsterdam—a major European city with a rich history—and this year marks its 750-year jubilee. What better way to represent ourselves than to show off a bit of that history &lt;strong&gt;and&lt;/strong&gt; our powerful vector search capabilities?&lt;/p&gt;

&lt;h2&gt;
  
  
  First, some context
&lt;/h2&gt;

&lt;p&gt;(Or &lt;a href="https://vimeo.com/1140065884" rel="noopener noreferrer"&gt;watch the video&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;If you're ever in Amsterdam, I recommend making a stop at the &lt;a href="https://en.wikipedia.org/wiki/Rokin_metro_station" rel="noopener noreferrer"&gt;Rokin metro station&lt;/a&gt;. One of the  escalators serves as a sort of history museum: It displays a curation of archaeological finds collected during the construction of metro line 52.&lt;/p&gt;

&lt;p&gt;Not planning to come over anytime soon? Lucky for you, many of the ~140,000 recorded objects were cataloged and published on &lt;a href="https://belowthesurface.amsterdam/en/vondsten" rel="noopener noreferrer"&gt;Below the Surface&lt;/a&gt;. I've &lt;em&gt;adored&lt;/em&gt; this website since its launch in 2017, as it lets users explore a timeline of objects ranging from the early 2000s to over a hundred thousand years ago. Not only that, it offers all the data in a downloadable CSV file to use freely—and don't we love some good data at MongoDB?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Symfony app we built
&lt;/h2&gt;

&lt;p&gt;So, how to wrangle all this data? Using &lt;a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=building-archaeology-matcher&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;MongoDB Atlas&lt;/a&gt; Vector Search, Voyage AI's &lt;a href="https://docs.voyageai.com/docs/multimodal-embeddings" rel="noopener noreferrer"&gt;multimodal capabilities&lt;/a&gt;, and &lt;a href="https://github.com/symfony/ai/" rel="noopener noreferrer"&gt;Symfony AI&lt;/a&gt;, I built a little Symfony app that allows you to submit a description or image of an object, and finds the most similar object from the depths of Amsterdam's history! Pretty neat, right?&lt;/p&gt;

&lt;p&gt;Go on, &lt;a href="https://main-bvxea6i-yatli5hfzl4qo.eu-5.platformsh.site/search" rel="noopener noreferrer"&gt;try it out&lt;/a&gt; (we do not store your image).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg06xh5a4igketpv0vh2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg06xh5a4igketpv0vh2c.png" alt="QR code that takes you to the app" width="230" height="230"&gt;&lt;/a&gt;&lt;small&gt;&lt;center&gt;&lt;em&gt;Scan if you're not on your phone&lt;/em&gt;&lt;/center&gt;&lt;/small&gt; &lt;/p&gt;

&lt;p&gt;You can also explore the repo on &lt;a href="https://github.com/paulinevos/symfonycon-2025" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the app works
&lt;/h2&gt;

&lt;p&gt;A common use of AI is similarity search. This technique uses an embedding model to take data and turn it into a vector embedding—a multi-dimensional array that records many different dimensions of said data. For instance, given one image of a parrot and one of a bicycle, a vector may record features like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqu2h7q7fcif6yy5xmec.png" alt="Cartoon illustration of a red macaw parrot" width="168" height="132"&gt;&lt;/th&gt;
&lt;th&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2weilepzq3ded63s26ly.png" alt="Cartoon illustration of a bicycle" width="182" height="114"&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Animal.&lt;/td&gt;
&lt;td&gt;Vehicle.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Has 2 wings.&lt;/td&gt;
&lt;td&gt;Has 2 wheels.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Has beak.&lt;/td&gt;
&lt;td&gt;Has basket.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flies.&lt;/td&gt;
&lt;td&gt;Rolls.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jungle.&lt;/td&gt;
&lt;td&gt;Urban.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mostly red.&lt;/td&gt;
&lt;td&gt;Mostly brown.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The number of data points recorded correlates to the number of dimensions the vector has, which is often several thousand.&lt;/p&gt;

&lt;p&gt;Using a similarity algorithm, we can now compare vectors to see which are mathematically most similar. This is called a vector search. Given a vector store containing embeddings for the parrot and the bicycle, we can generate another embedding for, say, an image of a duck. When we perform a vector search using the duck's vector embedding, it's statistically most likely the top search result will be a parrot.&lt;/p&gt;

&lt;p&gt;That's vector search in a nutshell! Different types of data may be used for this type of search. For instance, a blog enhanced with semantic search may embed text. A reverse image search feature will use image embeddings. In our demo use case, we use multiple data types (text from the CSV file, and images from the website), which is called "multimodal embedding."&lt;/p&gt;

&lt;p&gt;Not all embedding models support every data type. Also, different models may have different "opinions" of what constitutes similarity. This can be further affected by the specific purpose a model is trained for. All are important considerations when choosing the right model for your data and use case!&lt;/p&gt;

&lt;h2&gt;
  
  
  How it was built using MongoDB, Symfony, and Voyage AI
&lt;/h2&gt;

&lt;p&gt;There are two parts to this app: &lt;strong&gt;indexing&lt;/strong&gt; the data, and letting users perform a &lt;strong&gt;vector search&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3o18ms0qq341pmxp7wi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa3o18ms0qq341pmxp7wi.png" alt="Diagram detailing the indexing architecture" width="800" height="463"&gt;&lt;/a&gt;&lt;small&gt;&lt;center&gt;&lt;em&gt;Indexing&lt;/em&gt;&lt;/center&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;SymfonyAI's &lt;code&gt;Store&lt;/code&gt; component offers an &lt;a href="https://symfony.com/doc/current/ai/components/store.html#indexing" rel="noopener noreferrer"&gt;Indexer&lt;/a&gt; that handles indexing. It will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load data from a given source (&lt;code&gt;Loader&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;Generate vector embeddings (&lt;code&gt;Vectorizer&lt;/code&gt;).
&lt;/li&gt;
&lt;li&gt;Store the embeddings (&lt;code&gt;Store&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  1. Load data
&lt;/h4&gt;

&lt;p&gt;First, we can create an implementation of &lt;code&gt;LoaderInterface&lt;/code&gt; to load all the artifacts' text and image data from the &lt;a href="https://belowthesurface.amsterdam/en/pagina/publicaties-en-datasets" rel="noopener noreferrer"&gt;CSV dataset&lt;/a&gt; and the &lt;a href="https://belowthesurface.amsterdam/en/vondsten" rel="noopener noreferrer"&gt;website&lt;/a&gt; respectively, to prepare for embedding. For the embedding input, I decided on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A concatenated string made up of all the text columns in the CSV (describing the object's function, material, date range, and more).
&lt;/li&gt;
&lt;li&gt;The object's image(s), which is one or two depending on the object.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Creating this sort of multimodal embedding will improve the similarity search by having both visual data and descriptions to go on for comparing the artifacts. Were we to only embed the images, for example, we wouldn't have data points for things like country of origin or date range.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Create vector embeddings
&lt;/h4&gt;

&lt;p&gt;Now we've gathered our inputs for our embedding model (&lt;code&gt;voyage-multimodal-3&lt;/code&gt;), we can pass them to &lt;code&gt;Vectorizer&lt;/code&gt;. To do this, we define a &lt;code&gt;Voyage&lt;/code&gt; &lt;a href="https://symfony.com/doc/current/ai/components/platform.html" rel="noopener noreferrer"&gt;Platform&lt;/a&gt; service, which we can inject into &lt;code&gt;Vectorizer&lt;/code&gt; along with the model we want to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/packages/ai.yaml&lt;/span&gt;
&lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;voyage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%env(VOYAGE_API_KEY)%'&lt;/span&gt;
    &lt;span class="na"&gt;vectorizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;voyage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ai.platform.voyage'&lt;/span&gt;
            &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;voyage-multimodal-3'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Vectorizer&lt;/code&gt; will handle sending all the inputs to Voyage AI (in batches), and return the resulting vector embeddings for storage.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Store the embeddings
&lt;/h4&gt;

&lt;p&gt;Now that we have our vectors, we can store them in MongoDB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Before doing this, we have to create vector search indexes in our database. Since we're using Doctrine ODM entities with the &lt;a href="https://www.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/attributes-reference.html#vectorsearchindex" rel="noopener noreferrer"&gt;&lt;code&gt;VectorSearchIndex&lt;/code&gt;&lt;/a&gt; attribute, that's a matter of running &lt;code&gt;bin/console doctrine:mongodb:schema:create&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;While SymfonyAI comes with a MongoDB &lt;code&gt;Store&lt;/code&gt;, for this app, I made our own Doctrine ODM implementation of &lt;code&gt;StoreInterface&lt;/code&gt;. I hope to contribute one to the &lt;code&gt;Store&lt;/code&gt; component soon to make it easier to use Doctrine entities with SymfonyAI.&lt;/p&gt;

&lt;p&gt;When we wire up the &lt;code&gt;Indexer&lt;/code&gt; with these three services, it'll pass the data through each of them to complete the indexing process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/packages/ai.yaml&lt;/span&gt;
&lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

    &lt;span class="c1"&gt;# ...&lt;/span&gt;

    &lt;span class="na"&gt;indexer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;loader&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;App\Service\ArtefactLoader'&lt;/span&gt;
            &lt;span class="na"&gt;vectorizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ai.vectorizer.voyage'&lt;/span&gt;
            &lt;span class="na"&gt;store&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;App\Store\DoctrineODMStore'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration now enables us to run &lt;code&gt;bin/console ai:store:index artifacts&lt;/code&gt;, by the end of which all our artifacts will have multimodal embeddings stored in MongoDB Atlas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24sawiu9b3kvwm499q28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24sawiu9b3kvwm499q28.png" alt="Diagram detailing the search app architecture" width="800" height="553"&gt;&lt;/a&gt;&lt;br&gt;
&lt;small&gt;&lt;center&gt;&lt;em&gt;Performing the vector search&lt;/em&gt;&lt;/center&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;With our embedding data and search indexes set up, we're ready to start finding matches.&lt;/p&gt;
&lt;h4&gt;
  
  
  1. Submit photo and/or prompt
&lt;/h4&gt;

&lt;p&gt;In the UI, the user enters a description of what they're trying to find, takes a photo of an object, or both. For instance, a user may take a photo of a toothbrush and enter "medieval" in the text field to see if people in the Middle Ages used similar objects.&lt;/p&gt;
&lt;h4&gt;
  
  
  2. Generate image description
&lt;/h4&gt;

&lt;p&gt;If there's an image, we use one of OpenAI's GPT models to generate a description of it to embed alongside the image data. We'll need to define another &lt;code&gt;Platform&lt;/code&gt; service for OpenAI, which we can inject into our query handler.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/packages/ai.yaml&lt;/span&gt;
&lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;voyage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%env(VOYAGE_API_KEY)%'&lt;/span&gt;
        &lt;span class="na"&gt;openai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%env(OPENAI_API_KEY)%'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/services.yaml&lt;/span&gt;
    &lt;span class="na"&gt;App\Query\MatchQueryHandler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;$openAi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@ai.platform.openai'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may wonder why I chose to generate a description of an image. After all, don't we already have an embedding model recording data from that same image? The main reason is that we can send instructions to GPT to only focus on the object in the photo and ignore any background, which would only serve as irrelevant noise in our similarity search. We can also ask it to focus specifically on features recorded in the CSV dataset (things like function and finish).&lt;/p&gt;

&lt;p&gt;This can add more weight to data points that we're especially interested in comparing, and &lt;em&gt;may&lt;/em&gt; improve our results. It's definitely trial and error, so I recommend playing around with different messages to see how they change results.&lt;/p&gt;

&lt;p&gt;For our app, we're sending the following to GPT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MessageBag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;forSystem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'You are an image analyzer bot that helps identify objects in images.'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;ofUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s1"&gt;'Describe the object in the foreground, ignoring the background or the person holding it. Try to focus primarily on the function. Color, shape, material, and finish of the object may also be included.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;$image&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Create vector embedding
&lt;/h4&gt;

&lt;p&gt;Armed with our text description and image data, we can generate another multimodal embedding by sending it all off to Voyage AI. The resulting vector can then be used as a search query for a vector search!&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Perform vector search
&lt;/h4&gt;

&lt;p&gt;For our final trick, we use the Doctrine ODM aggregation builder to execute a vector search aggregation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentManager&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Artefact&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createAggregationBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;hydrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MatchCandidate&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$builder&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;vectorSearch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numCandidates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'default'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'embeddingVector'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;queryVector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$vectorized&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getData&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;project&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'artefact'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'$$ROOT'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'score'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'vectorSearchScore'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$builder&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getAggregation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This aggregation will return the 10 best matches it could find, including their scores. By default, they're ordered by score (descending), so the first match will be the most similar.&lt;/p&gt;

&lt;p&gt;Et voilà! Pass them on to a template, and the user is presented with a cool object from Amsterdam's history and some other close matches. For example, this picture of the ring my stepdad made me when I was a kid yields the following results:&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg6ctagcmps4auuxz1e3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg6ctagcmps4auuxz1e3.png" alt="A photograph of a hand holding a small ring. The band is silver. A square made of gold and silver sits on the band, and a small round ruby is embedded into the square." width="800" height="926"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznn61gaslg4flh6tzrwf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznn61gaslg4flh6tzrwf.png" alt="A screenshot of a web app showing two columns. The left column provides a description of the ring held in the previous image, and below it a picture of an old metal ring with a header saying " width="800" height="735"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Want to see more? &lt;a href="https://main-bvxea6i-yatli5hfzl4qo.eu-5.platformsh.site/search" rel="noopener noreferrer"&gt;Try out the app&lt;/a&gt;! You can also build it yourself using &lt;a href="https://symfony.com/doc/current/ai/index.html" rel="noopener noreferrer"&gt;SymfonyAI&lt;/a&gt; and &lt;a href="https://www.doctrine-project.org/projects/doctrine-mongodb-odm/en/stable/" rel="noopener noreferrer"&gt;MongoDB&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate vector embeddings and store them in MongoDB Atlas to implement similarity search.
&lt;/li&gt;
&lt;li&gt;Combine different types of data in multimodal embeddings to enhance search results.
&lt;/li&gt;
&lt;li&gt;Symfony AI can help you tie your store and platform together to easily index and embed your data.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>symfony</category>
      <category>php</category>
      <category>ai</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Building With Symfony: Scaling Surfshark’s Backend</title>
      <dc:creator>Rishabh Bisht</dc:creator>
      <pubDate>Mon, 28 Jul 2025 13:00:00 +0000</pubDate>
      <link>https://dev.to/mongodb/building-with-symfony-scaling-surfsharks-backend-39pf</link>
      <guid>https://dev.to/mongodb/building-with-symfony-scaling-surfsharks-backend-39pf</guid>
      <description>&lt;p&gt;&lt;em&gt;Curious about how leading companies leverage Symfony to build and scale their platforms? In the new &lt;strong&gt;Building With Symfony&lt;/strong&gt; article series, we dive into the tech stacks of innovative organizations and share first-hand insights from their engineering teams. Discover the key tools, architecture choices, and best practices that help them deliver reliable, high-performance applications at scale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was contributed by &lt;a href="https://notrix.lt/" rel="noopener noreferrer"&gt;&lt;strong&gt;Vaidas Lažauskas&lt;/strong&gt;&lt;/a&gt;, the Head of Backend Engineering at Surfshark.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Launched in 2018, &lt;a href="https://surfshark.com/" rel="noopener noreferrer"&gt;Surfshark&lt;/a&gt; is a prominent cybersecurity company that provides a range of online privacy and security solutions. Best known for its VPN service, Surfshark has a team of over 500 employees spread globally and has been honored with more than 30 international awards. The company's mission is to build the most beloved security products for everyone. Surfshark has over 35 million application downloads, supported by more than 3,200 servers in 100 countries. &lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural overview
&lt;/h2&gt;

&lt;p&gt;Surfshark processes over 5,000 API requests per second! It leverages a robust backend that utilizes the Symfony framework with MongoDB as its primary database. Their system is designed to handle operations at a massive scale—the largest MongoDB collection has over 45 million documents! The back end is fully structured around microservices architecture. In fact, Surfshark has never used a monolith; from the very beginning, all services have been built as microservices and all the infrastructure is deployed on AWS.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Symfony + MongoDB stack
&lt;/h3&gt;

&lt;p&gt;Surfshark’s backend is built with PHP and the Symfony framework. Since the very beginning (over seven years ago), they've integrated MongoDB using Doctrine ODM, making full use of its query builder for application development. All new services are built on this stack, with plans underway to retire MySQL entirely. The Surfshark team has also contributed pull requests to Doctrine ODM in the past and actively supports the Symfony community. Surfshark is committed to staying up-to-date with their dependencies, upgrading major versions annually and keeping libraries fresh every week. They've recently migrated from MongoDB 7 to 8, with a smooth, zero-downtime upgrade that required no application-level changes—a testament to the robustness of their integration and the power of MongoDB Atlas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“We use Compass. It’s a really great tool for entry level developers for index management and analyzing collections.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Development at Surfshark is fully Dockerized using Docker Compose, with Mutagen syncing local files into the Docker environment for a seamless workflow. Developers predominantly use PHPStorm and work with MongoDB running in Docker containers during local development. For database management and analytics, the team enjoys a variety of tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB Compass&lt;/strong&gt;: Excellent for entry-level developers and for index/collection analytics
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB Shell&lt;/strong&gt;: Preferred for flexible scripting and rapid prototyping&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Overall thoughts on working with Symfony and MongoDB
&lt;/h2&gt;

&lt;p&gt;When Vaidas joined Surfshark, the backend relied heavily on MySQL—with multiple services built on top of it. However, as the platform scaled and handled thousands of requests per second, it experienced interruptions and errors under load. To solve this, Vaidas introduced MongoDB, and all new services immediately started using it as the primary database. Today, about 90% of Surfshark’s data is housed in MongoDB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“The main thing every new developer notices is that there is no migration schema madness. We still have MySQL for some legacy services, so we need to run migrations, using Percona toolkit, whenever we have to add a new field to the schema. We get thousands of requests per second. Just renaming the table will cause us to lose user requests because of errors that might appear while migrating. The new devs are amazed that there is no fixed schema with MongoDB. You make a change and just push to production.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They also appreciate that schema and index changes can be performed without interrupting service, which is crucial for a platform serving thousands of requests per second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Creating an index is also seamless. You pass the new index and it just works without any interruption.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Expiring data like auth tokens is easy and robust. They found MongoDB’s TTL handling superior to other solutions they have used, as the data is truly physically removed on schedule.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“TTL index is also a great feature. Many other databases have this feature but it works so much better in MongoDB because the data is always physically deleted right away.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Surfshark’s story shows how Symfony and MongoDB can work together to power a fast-growing, reliable backend. By focusing on flexibility, easy scaling, and solid developer tools, they’ve built a system ready for millions of users. If you’d like to learn more about modern backend architecture or see how other companies use Symfony in production, stay tuned for the next articles in this series. Curious about integrating Symfony and MongoDB in your own projects? Check out our &lt;a href="https://www.mongodb.com/docs/drivers/php-frameworks/symfony/?utm_campaign=devrel&amp;amp;utm_source=third-party-content&amp;amp;utm_medium=cta&amp;amp;utm_content=symfony-surfshark&amp;amp;utm_term=megan.grant" rel="noopener noreferrer"&gt;Symfony MongoDB integration&lt;/a&gt; to get started!  &lt;/p&gt;

</description>
      <category>symfony</category>
      <category>php</category>
      <category>webdev</category>
      <category>database</category>
    </item>
    <item>
      <title>Best Database for C++ Development</title>
      <dc:creator>Rishabh Bisht</dc:creator>
      <pubDate>Thu, 12 Jan 2023 19:32:46 +0000</pubDate>
      <link>https://dev.to/therish2050/best-database-for-c-development-1fc</link>
      <guid>https://dev.to/therish2050/best-database-for-c-development-1fc</guid>
      <description>&lt;p&gt;C++, &lt;a href="https://www.tiobe.com/tiobe-index/" rel="noopener noreferrer"&gt;TIOBE’s&lt;/a&gt; most popular language for 2022,  is a powerful &amp;amp; high-performance programming language that is widely used for a variety of different purposes. One area where C++ is particularly useful is in the development of database-driven applications.&lt;/p&gt;

&lt;p&gt;Are you wondering what is the best database for C++ application you are building? In this article, we will take a look at the databases used by C++ developers. The source of the data is Stack Overflow survey 2022 raw data, filtered for C++ developers. There were over 70,000 developers who responded to this survey, and 16,024 of those had experience working with C++.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, which database do C++ developers use?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda75533rajdf7sdj1my3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda75533rajdf7sdj1my3.png" alt="Best Database for C++" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL (44.2%) and PostgreSQL (34.2%)&lt;/strong&gt;&lt;br&gt;
MySQL and PostgreSQL are open-source relational database management systems, which organize the data into tables with rows and columns. C++ developers can interact with these databases using libraries such as MySQL Connector/C++ and libpqxx (PostgreSQL C++ library).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite (37.4%)&lt;/strong&gt;&lt;br&gt;
SQLite is a small, lightweight, and self-contained SQL database engine that can be embedded in applications and used to store and retrieve data. C++ developers can use SQLite C++ library such as sqlite3pp, SQLiteCpp to work with SQLite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MongoDB (26.9%)&lt;/strong&gt;&lt;br&gt;
MongoDB is the most popular NoSQL database used by C++ developers based on the survey results.It is a document based database, designed to handle large amounts of unstructured data and can handle a very high volume of read and write operations with a wide variety of use cases. C++ developers can interact with MongoDB by using the MongoDB C++ Driver.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which database do C++ developers want to use?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzooc9laowjjo1v5lijr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzooc9laowjjo1v5lijr.png" alt="Database C++ developers want to use" width="800" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stackoverflow survey also reports data on the database that C++ developers want to use. Here we can see PostgreSQL and MongoDB climbing up the ladder, but overall the result is similar.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL (30.0 %)&lt;/li&gt;
&lt;li&gt;SQLite (25.1%)&lt;/li&gt;
&lt;li&gt;MongoDB (23.96%)&lt;/li&gt;
&lt;li&gt;MySQL (23.66%)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>writing</category>
    </item>
  </channel>
</rss>
