<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nejc Korasa</title>
    <description>The latest articles on DEV Community by Nejc Korasa (@nejckorasa).</description>
    <link>https://dev.to/nejckorasa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F21834%2F441c6680-8e6d-4b54-b6a4-606db581d956.jpeg</url>
      <title>DEV Community: Nejc Korasa</title>
      <link>https://dev.to/nejckorasa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nejckorasa"/>
    <language>en</language>
    <item>
      <title>Kafka Backfill Patterns: A Guide to Accessing Historical Data</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Tue, 04 Nov 2025 10:00:00 +0000</pubDate>
      <link>https://dev.to/nejckorasa/kafka-backfill-patterns-a-guide-to-accessing-historical-data-53ep</link>
      <guid>https://dev.to/nejckorasa/kafka-backfill-patterns-a-guide-to-accessing-historical-data-53ep</guid>
      <description>&lt;p&gt;Event-driven architectures with Kafka have become a standard way of building modern microservices. At first, everything works smoothly - services communicate via events, state is rebuilt from event streams, and the system scales well. But as your data grows, you face an inevitable challenge: what happens when you need to access historical events that are no longer in Kafka?&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Problem: Finite Retention &amp;amp; The Need for Backfills
&lt;/h2&gt;

&lt;p&gt;In a perfect world, we would keep the full event log in Kafka forever. In the real world, however, storing an ever-growing history on high-performance broker disks is prohibitively expensive.&lt;/p&gt;

&lt;p&gt;This leads to the inevitable compromise: &lt;strong&gt;data retention policies&lt;/strong&gt;. We keep a few weeks or months of events in Kafka for real-time processing and offload the rest to cheaper, long-term cold storage like Amazon S3. This process becomes part of a general Data Lake sink strategy.&lt;/p&gt;

&lt;p&gt;This works well until a scenario arises that demands access to the full historical record, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrapping a New Service:&lt;/strong&gt; A new microservice needs to build its own materialized view of the world by processing the entire history of events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recovering from a Bug:&lt;/strong&gt; A subtle bug is discovered in a service, and you need to rebuild its state from a point in time months ago.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enriching Data for New Features:&lt;/strong&gt; A new feature requires historical context, forcing a re-process of old events.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core problem is the same: how do we gracefully rehydrate our services with data that now lives in cold storage?&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Backfill Blueprint: A Two-Phase Process
&lt;/h2&gt;

&lt;p&gt;Backfill can be broken down into two distinct phases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1: Sourcing the Data:&lt;/strong&gt; First, we must establish a reliable way to get the stream of historical events from cold storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 2: Consuming the Data:&lt;/strong&gt; Second, we need a robust strategy for our service to process this historical stream safely, without disrupting live traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Phase 1: Sourcing Historical Data
&lt;/h2&gt;

&lt;p&gt;There are three primary architectural patterns for sourcing historical data that is no longer in Kafka.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Kafka Tiered Storage
&lt;/h3&gt;

&lt;p&gt;The most elegant solution is one that eliminates the need for a separate ETL process: using a Kafka distribution that supports &lt;a href="https://docs.confluent.io/platform/current/kafka/tiered-storage.html" rel="noopener noreferrer"&gt;Tiered Storage&lt;/a&gt;. This feature allows Kafka to automatically move older event segments to object storage like S3, while the topic's log remains logically intact and infinitely queryable. The data is physically in two places, but Kafka presents it as a single, seamless stream.&lt;/p&gt;
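As a sketch, with Apache Kafka's open-source tiered storage (KIP-405) this split is expressed purely in topic configuration; the property names below assume Kafka 3.6+ and may differ for other distributions:

```shell
# Hypothetical topic setup: ~7 days on fast broker disks, ~2 years of total
# retention, with older segments offloaded transparently to object storage.
# Requires tiered storage enabled on the brokers.
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic events \
  --config remote.storage.enable=true \
  --config local.retention.ms=604800000 \
  --config retention.ms=63072000000
```

Consumers that need deep history simply read from the earliest offset; the broker fetches old segments from object storage behind the scenes.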

&lt;h3&gt;
  
  
  Pattern 2: ETL Bridge
&lt;/h3&gt;

&lt;p&gt;If you don’t have Tiered Storage, you need a safe, reliable bridge between your S3 data lake and Kafka. The core of this pattern is a generic, on-demand ETL job (AWS Glue or Spark is a perfect fit) that reads events from S3 and produces them onto a &lt;strong&gt;dedicated, temporary backfill topic&lt;/strong&gt; (e.g., &lt;code&gt;events.backfill&lt;/code&gt;). This isolates the historical load from the live stream, preventing disruption to real-time consumers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handling Schema Evolution:&lt;/strong&gt; Using a schema registry, the ETL job can perform a "schema-on-read" transformation. It reads multiple historical Avro schema versions from S3, evolves each record to the latest schema version, and writes the clean data to the backfill topic. This means the service consumer only needs to be aware of the latest schema.&lt;/p&gt;
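The evolve-to-latest step can be sketched in plain Java. The record types below are illustrative stand-ins (none of these names come from the article): each historical shape carries an implicit schema version, and one function lifts every version to the latest shape before it is produced to the backfill topic:

```java
// Sketch (illustrative types): lift historical records of any schema version
// to the latest shape before producing them to the backfill topic.
public class SchemaEvolver {
    // Latest schema: v3 stores amounts in minor units and adds a currency field.
    public record EventV3(String id, long amountMinor, String currency) {}

    // Historical shapes as they might sit in S3.
    public sealed interface Historical permits V1, V2 {}
    public record V1(String id, double amount) implements Historical {}    // v1: fractional amount
    public record V2(String id, long amountMinor) implements Historical {} // v2: minor units, no currency

    // Evolve any historical version to the latest schema ("schema-on-read").
    public static EventV3 evolve(Historical h) {
        return switch (h) {
            case V1(String id, double amount) -> new EventV3(id, Math.round(amount * 100), "GBP");
            case V2(String id, long amountMinor) -> new EventV3(id, amountMinor, "GBP");
        };
    }
}
```

In a real Glue or Spark job the same role is played by Avro schema resolution against the registry's latest schema; the point is that only the ETL job knows about old versions.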

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0tdsm8qzk6tfvephcrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0tdsm8qzk6tfvephcrk.png" alt="Glue Backfill Job" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Pull-Based Backfill (Bypassing Kafka)
&lt;/h3&gt;

&lt;p&gt;In some scenarios, re-populating a Kafka topic is unnecessary overhead. Instead, the service needing the data can fetch it directly from its long-term storage location. &lt;/p&gt;

&lt;p&gt;This pattern simplifies the data platform but shifts complexity to the consuming service, which must now contain logic to read from two different sources, merge the streams, and handle potential event ordering conflicts. The exception is when you can isolate the backfill and run it to completion before the service goes live.&lt;/p&gt;

&lt;h4&gt;
  
  
  Alternative A: Direct Lake Query
&lt;/h4&gt;

&lt;p&gt;If you have query engines like &lt;a href="https://trino.io/" rel="noopener noreferrer"&gt;Trino&lt;/a&gt; set up, your service can bypass Kafka for historical data. It can implement a job that directly queries S3 via Trino, fetching and processing data in controlled chunks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Alternative B: Service-to-Service Backfill
&lt;/h4&gt;

&lt;p&gt;This alternative applies when the historical data still resides in the source service's live database. The source service provides a paginated API, allowing the consuming service to pull the history in manageable batches.&lt;/p&gt;

&lt;p&gt;While often faster to set up, this approach puts a direct and heavy read load on a live production service. This can degrade the source service's performance, so mitigation is essential:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control the Load:&lt;/strong&gt; Throttle and rate-limit the backfill requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedule Wisely:&lt;/strong&gt; Run the backfill during off-peak hours if possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolate the Impact:&lt;/strong&gt; Scale resources accordingly and use a database read replica if possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor:&lt;/strong&gt; Watch the source service's latency and error rates for the duration of the backfill.&lt;/li&gt;
&lt;/ul&gt;
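A minimal sketch of such a pull loop, with an in-memory list standing in for the source service's paginated API (the `fetchPage` shape, the page size, and the fixed pause are all assumptions you would tune against the source's capacity):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: pull history from the source service in throttled, fixed-size pages.
public class PaginatedBackfill {

    // Stand-in for a call like GET /events?offset=..&limit=..
    public static List<Integer> fetchPage(List<Integer> source, int offset, int pageSize) {
        if (offset >= source.size()) return List.of();
        return source.subList(offset, Math.min(offset + pageSize, source.size()));
    }

    // Pull the full history page by page, pausing between requests to limit load.
    public static List<Integer> backfill(List<Integer> source, int pageSize, long pauseMillis) {
        List<Integer> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(source, offset, pageSize);
            if (page.isEmpty()) break;     // source exhausted: backfill complete
            all.addAll(page);
            offset += page.size();
            try {
                Thread.sleep(pauseMillis); // crude throttle protecting the source
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return all;
    }
}
```

The offset here doubles as a checkpoint: persisting it after each page makes the pull resumable after a failure.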

&lt;h2&gt;
  
  
  4. Phase 2: Consuming Historical Data
&lt;/h2&gt;

&lt;p&gt;Getting the data is only half the challenge. The consuming service must be architected to handle rehydration safely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idempotent Processing is Non-Negotiable
&lt;/h3&gt;

&lt;p&gt;When a service re-processes historical events, it will inevitably encounter data it has already seen. The consumer logic must be &lt;strong&gt;idempotent&lt;/strong&gt;, meaning that processing the same event multiple times produces the same result as processing it once. This is the foundational prerequisite for any safe backfill strategy.&lt;/p&gt;
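A minimal illustration of the idea, tracking processed event IDs in memory (a real service would typically rely on a unique constraint or upsert in its database; the event shape here is made up):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: an idempotent consumer that applies each event at most once.
public class IdempotentConsumer {
    public record Event(String eventId, String accountId, long amount) {}

    private final Set<String> seen = new HashSet<>();           // processed event IDs
    private final Map<String, Long> balances = new HashMap<>(); // materialized state

    // Re-delivering the same event is a no-op, so replays are safe.
    public void process(Event e) {
        if (!seen.add(e.eventId())) return; // already applied: skip duplicate
        balances.merge(e.accountId(), e.amount(), Long::sum);
    }

    public long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}
```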

&lt;h3&gt;
  
  
  Choose Your Consumption Strategy
&lt;/h3&gt;

&lt;h4&gt;
  
  
  A. The Simple Replay
&lt;/h4&gt;

&lt;p&gt;For many use cases, like enriching data for an analytics model or rebuilding a non-critical cache, the strategy is simple. A dedicated consumer reads from the backfill source until it is empty. The job is complete when all historical data has been processed. This approach is perfect for stateless tasks or systems that can afford a brief maintenance window to switch over.&lt;/p&gt;

&lt;h4&gt;
  
  
  B. The Zero-Downtime Migration (The Shadow Pattern)
&lt;/h4&gt;

&lt;p&gt;For critical, stateful services that cannot have downtime, a more sophisticated strategy is required. This strategy rebuilds a system using the &lt;strong&gt;Shadow Migration&lt;/strong&gt; pattern. It's a specific implementation of &lt;strong&gt;&lt;a href="https://martinfowler.com/bliki/ParallelChange.html" rel="noopener noreferrer"&gt;Parallel Change&lt;/a&gt;&lt;/strong&gt;, sometimes called the &lt;strong&gt;&lt;a href="https://www.infoq.com/articles/shadow-table-strategy-data-migration/" rel="noopener noreferrer"&gt;Shadow Table Strategy&lt;/a&gt;&lt;/strong&gt;, where a "shadow" process runs alongside the live service before a final, coordinated cutover.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqmfd8opvg7tkndd3w01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqmfd8opvg7tkndd3w01.png" alt="Shadow Migration" width="800" height="846"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run in Parallel&lt;/strong&gt;: A &lt;strong&gt;"shadow" consumer&lt;/strong&gt; reads the entire event history, writing to the new table (&lt;code&gt;v2&lt;/code&gt;). Simultaneously, the existing "live" consumer continues its normal operation, writing only to the old table (&lt;code&gt;v1&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Catch Up&lt;/strong&gt;: The shadow consumer runs until it has processed all historical data and is keeping up with the live topic in near real-time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify Consistency&lt;/strong&gt;: Run validation jobs to ensure data in &lt;code&gt;v2&lt;/code&gt; is consistent with &lt;code&gt;v1&lt;/code&gt;. This critical go/no-go step confirms that the migration is safe to complete.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execute the Cutover&lt;/strong&gt;: The final switch can be handled in two ways, depending on the system's downtime tolerance.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A. Hard Cutover (Simpler/Faster)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For systems that can tolerate a brief service pause, you can skip dual writes. This involves stopping the live consumer, reconfiguring it to write &lt;strong&gt;only to &lt;code&gt;v2&lt;/code&gt;&lt;/strong&gt;, and restarting it at the same time you repoint the application's reads to &lt;code&gt;v2&lt;/code&gt;. This must be a single, atomic action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B. Dual-Write Cutover (Safer/Zero-Downtime)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For critical systems, reconfigure the live consumer to &lt;strong&gt;write to both &lt;code&gt;v1&lt;/code&gt; and &lt;code&gt;v2&lt;/code&gt;&lt;/strong&gt;. This keeps both tables perfectly in sync, creating a safe, indefinite window to verify &lt;code&gt;v2&lt;/code&gt; under a live load before repointing the application reads at your leisure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decommission&lt;/strong&gt;: After a period of monitoring the new table, the process is complete. If you used the dual-write method, reconfigure the consumer one last time to write only to &lt;code&gt;v2&lt;/code&gt;. Finally, remove the old &lt;code&gt;v1&lt;/code&gt; table and any legacy code.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To prevent any missed events during the handoff, the &lt;strong&gt;live consumer should rewind its offset&lt;/strong&gt; to slightly before where the shadow consumer finished. This creates a small, intentional overlap of events. For this reason, &lt;strong&gt;idempotent processing&lt;/strong&gt; is absolutely essential, as it allows the system to handle these duplicates gracefully without corrupting data.&lt;/p&gt;
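The handoff can be sketched concretely. Below, the live consumer resumes a few offsets before the point where the shadow consumer stopped, and the intentional overlap is absorbed by offset-level deduplication (the numbers and the dedupe scheme are illustrative):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: the live consumer rewinds to before the shadow consumer's last
// offset, creating an intentional overlap that idempotent processing absorbs.
public class CutoverRewind {
    private final Set<Integer> appliedOffsets = new HashSet<>();
    private long total = 0;

    public void apply(int offset, long value) {
        if (!appliedOffsets.add(offset)) return; // duplicate from the overlap: skip
        total += value;
    }

    public long total() { return total; }

    public static long handoff(List<Long> values, int shadowStoppedAt, int overlap) {
        CutoverRewind state = new CutoverRewind();
        // Shadow consumer processed offsets [0, shadowStoppedAt).
        for (int o = 0; o < shadowStoppedAt; o++) state.apply(o, values.get(o));
        // Live consumer rewinds and resumes slightly earlier, re-reading a few events.
        for (int o = Math.max(0, shadowStoppedAt - overlap); o < values.size(); o++) {
            state.apply(o, values.get(o));
        }
        return state.total();
    }
}
```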

&lt;h2&gt;
  
  
  5. Optimizing the Backfill with Snapshots
&lt;/h2&gt;

&lt;p&gt;Replaying every event from the beginning of time can be slow. For many use cases, you can accelerate the process by using a &lt;strong&gt;snapshot&lt;/strong&gt;—a precomputed, materialized state of your data.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;State Snapshots:&lt;/strong&gt; A periodically generated full snapshot of an entity's state. Rehydration then involves loading this snapshot and replaying only the events from Kafka that have occurred &lt;em&gt;since&lt;/em&gt; the snapshot was created.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Kafka-Native Snapshots (Log Compaction):&lt;/strong&gt; For services that only need the &lt;em&gt;current state&lt;/em&gt; of an entity, Kafka's &lt;a href="https://docs.confluent.io/kafka/design/log_compaction.html" rel="noopener noreferrer"&gt;log compaction&lt;/a&gt; provides a powerful, built-in solution. A compacted topic retains at least the last known value for each message key. Reading this topic from the beginning provides a consumer with a full, live snapshot of the current state.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In short: use State Snapshots when you need a point-in-time view plus the full event history that followed; use Log Compaction when you only need the latest value for every entity, not their history.&lt;/p&gt;
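The snapshot-plus-delta idea in miniature: `snapshotAt` marks the snapshot's position in the log, and only events after it are replayed (all names here are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rehydrate state from a snapshot, then replay only the events
// that occurred after the snapshot was taken.
public class SnapshotRehydrate {
    public record Event(long offset, String key, long delta) {}

    public static Map<String, Long> rehydrate(Map<String, Long> snapshot,
                                              long snapshotAt,
                                              List<Event> log) {
        Map<String, Long> state = new HashMap<>(snapshot); // start from the snapshot
        for (Event e : log) {
            if (e.offset() > snapshotAt) {                 // replay only the delta
                state.merge(e.key(), e.delta(), Long::sum);
            }
        }
        return state;
    }
}
```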

&lt;h2&gt;
  
  
  6. Execution
&lt;/h2&gt;

&lt;p&gt;A successful backfill requires more than a solid architectural blueprint; it demands disciplined execution. The following operational best practices mitigate risk and help ensure a predictable outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring and Observability:&lt;/strong&gt; A backfill should never be a "black box." Track key metrics like consumer lag, processing throughput, and resource utilization in real-time. This is the only way to detect bottlenecks or failures before they cascade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience and Failure Handling:&lt;/strong&gt; The process must be resumable. Large backfills can take hours or days, and failures are inevitable. By checkpointing its progress, the job can resume from where it left off, saving significant time and resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Awareness:&lt;/strong&gt; A large-scale data replay can incur significant costs from compute resources (ETL jobs, consumer pods) and cloud data egress. Model these costs beforehand to avoid budget surprises.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental Testing:&lt;/strong&gt; Never run a full-scale backfill for the first time in production. Validate the entire process with a small, representative slice of data in a staging environment to catch logical errors and performance issues early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, a historical data backfill is a planned, two-phase process: source the data, then consume it. When you combine the right architectural patterns with operational best practices, it can be done in a controlled and repeatable manner.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>distributedsystems</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Data Oriented Programming in Java</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Sun, 20 Apr 2025 14:22:20 +0000</pubDate>
      <link>https://dev.to/nejckorasa/data-oriented-programming-in-java-mkb</link>
      <guid>https://dev.to/nejckorasa/data-oriented-programming-in-java-mkb</guid>
      <description>&lt;ul&gt;
&lt;li&gt;What is Data Oriented Programming?&lt;/li&gt;
&lt;li&gt;Why Consider DOP? The Benefits&lt;/li&gt;
&lt;li&gt;Java's Embrace of Data&lt;/li&gt;
&lt;li&gt;
Textbook Example

&lt;ul&gt;
&lt;li&gt;Introducing New Behavior&lt;/li&gt;
&lt;li&gt;Introducing New Data&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Handling Outcomes with Clarity&lt;/li&gt;

&lt;li&gt;In Conclusion: Clear Benefits of DOP and Modern Java&lt;/li&gt;

&lt;li&gt;References and Further Reading&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Data Oriented Programming?
&lt;/h2&gt;

&lt;p&gt;Data Oriented Programming (DOP) is gaining momentum in the Java ecosystem due to recent language features streamlining its adoption. While conceptually straightforward, DOP offers significant advantages. But what is it?&lt;/p&gt;

&lt;p&gt;How do we build our objects? Where does the state go? Where does the behavior go? OOP encourages us to bundle state and behavior together. But what if we separated them? What if data became the primary focus, with logic completely separated? That simple shift is the central idea of Data Oriented Programming (DOP).&lt;/p&gt;

&lt;p&gt;So instead of emphasizing objects with bundled state and methods, &lt;strong&gt;DOP centers around simple data structures&lt;/strong&gt;. The application's logic and behavior are implemented as independent functions that operate on this data. The data itself is passive; the intelligence lies in the functions. &lt;a href="https://inside.java/2024/05/23/dop-v1-1-introduction/" rel="noopener noreferrer"&gt;Inside Java&lt;/a&gt; defines DOP with the following principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model data immutably and transparently.&lt;/li&gt;
&lt;li&gt;Model the data, the whole data, and nothing but the data.&lt;/li&gt;
&lt;li&gt;Make illegal states unrepresentable.&lt;/li&gt;
&lt;li&gt;Separate operations from data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Consider DOP? The Benefits
&lt;/h2&gt;

&lt;p&gt;Why might you choose this approach? Here are a few compelling reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simpler and More Readable Code:&lt;/strong&gt; Separating data from behavior leads to clearer data structures and focused functions, making the code easier to understand and follow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Maintainability:&lt;/strong&gt; With simple data structures and distinct logic, modifications are less likely to create ripple effects across the codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Code Optionality and Reduced Coupling:&lt;/strong&gt; Adding new functionality often involves creating new functions rather than modifying existing data structures, leading to less invasive changes and reduced coupling between different parts of the system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier Testing:&lt;/strong&gt; Functions operating on plain data with simple inputs and outputs are often easier to test in isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Java's Embrace of Data
&lt;/h2&gt;

&lt;p&gt;Modern Java provides excellent tools that make DOP a viable option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Records:&lt;/strong&gt; Simple, immutable data carriers. Less boilerplate, letting you focus on the data itself.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Point&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sealed Classes:&lt;/strong&gt; These allow you to restrict the possible subtypes of a class or interface. This is crucial for ensuring you can have exhaustive knowledge of the data you're dealing with.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;sealed&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;Shape&lt;/span&gt; &lt;span class="n"&gt;permits&lt;/span&gt; &lt;span class="nc"&gt;Circle&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Rectangle&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Circle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;radius&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Shape&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Rectangle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;topLeft&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;bottomRight&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Shape&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;Triangle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Shape&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Switch and Pattern Matching with Exhaustiveness Checks:&lt;/strong&gt; This is the key piece, the one that closes the loop and delivers the main advantage. The enhanced &lt;code&gt;switch&lt;/code&gt; statement in Java, with its support for pattern matching, works hand in hand with sealed classes. The compiler helps you with exhaustiveness checks, shifting runtime errors to compile time. This is not limited to sealed classes; pure enums work too.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;numOfEdges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nc"&gt;Circle&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nc"&gt;Rectangle&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nc"&gt;Triangle&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Textbook Example
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introducing New Behavior
&lt;/h3&gt;

&lt;p&gt;Consider the &lt;code&gt;Shape&lt;/code&gt; example. In a traditional OOP approach, you might add a &lt;code&gt;getCenter()&lt;/code&gt; method to the &lt;code&gt;Shape&lt;/code&gt; interface and implement it in each concrete shape class. If you later needed to perform a new operation or modify an existing one, you'd likely need to update the &lt;code&gt;Shape&lt;/code&gt; interface and all its implementations, which can lead to tightly coupled code.&lt;/p&gt;

&lt;p&gt;With DOP, we define the data structures and then create separate functions to operate on them. This separation of concerns makes adding new functionality cleaner and less coupled. Here's how the &lt;code&gt;getCenter&lt;/code&gt; function looks in a DOP style:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="nf"&gt;getCenter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Shape&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;switch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;Circle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;center&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;Rectangle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;topLeft&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;bottomRight&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Point&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topLeft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bottomRight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topLeft&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bottomRight&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;Triangle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Point&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Point&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;};&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The enhanced &lt;code&gt;switch&lt;/code&gt; statement, combined with sealed classes, ensures that all possible cases are handled at compile time. For those familiar with design patterns, this largely makes the &lt;a href="https://refactoring.guru/design-patterns/visitor" rel="noopener noreferrer"&gt;Visitor Pattern&lt;/a&gt; redundant: the new features let you handle all cases directly in a type-safe and concise manner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing New Data
&lt;/h3&gt;

&lt;p&gt;While DOP simplifies introducing new behavior, it also ensures consistency when introducing new data types. The compiler enforces the implementation of all missing operations, ensuring your code remains consistent and complete. This is one of the most powerful advantages of these new Java features.&lt;/p&gt;

&lt;p&gt;For instance, if you add a new &lt;code&gt;Pentagon&lt;/code&gt; shape, the compiler will flag the switch statement in the &lt;code&gt;getCenter()&lt;/code&gt; method as incomplete, requiring you to implement the logic for the new shape. This compile-time enforcement not only prevents runtime errors but also ensures that your codebase evolves safely and predictably as new data types are added.&lt;/p&gt;

&lt;p&gt;However, it's important to avoid using a default branch in your switch statements. A default branch bypasses the exhaustiveness checks provided by the compiler, which can lead to missed cases and potential bugs.&lt;/p&gt;
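To make this concrete, here is a small self-contained sketch in the same spirit (using an edge-count function rather than `getCenter` to keep it short); deleting the `Pentagon` case from the switch makes the code fail to compile, which is exactly the safety net described above:

```java
// Sketch: adding a new data type forces every exhaustive switch to handle it.
public class Edges {
    public sealed interface Shape permits Circle, Rectangle, Pentagon {}
    public record Circle(double radius) implements Shape {}
    public record Rectangle(double w, double h) implements Shape {}
    public record Pentagon(double side) implements Shape {} // newly added data type

    public static int numOfEdges(Shape shape) {
        return switch (shape) { // no default: the compiler demands the Pentagon case
            case Circle c -> 0;
            case Rectangle r -> 4;
            case Pentagon p -> 5;
        };
    }
}
```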

&lt;h2&gt;
  
  
  Handling Outcomes with Clarity
&lt;/h2&gt;

&lt;p&gt;Data Oriented Programming also lends itself well to scenarios where clear and explicit handling of outcomes is required, such as processing different types of results or managing errors and failures. Consider this example for handling the result of a &lt;code&gt;process&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;sealed&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nc"&gt;Ok&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Does some actual processing…&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operationSuccessful&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Success return value"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Processing error"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;IllegalStateException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Processing error: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach allows callers to easily handle all possible results using a &lt;code&gt;switch&lt;/code&gt; expression, promoting explicit and type-safe result processing. It eliminates ambiguity about potential return values and encourages explicit error handling. &lt;/p&gt;

&lt;p&gt;See &lt;a href="https://www.infoq.com/articles/data-oriented-programming-java/" rel="noopener noreferrer"&gt;this InfoQ article&lt;/a&gt; for more examples of complex return types and how they can be implemented in DOP style.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Conclusion: Clear Benefits of DOP and Modern Java
&lt;/h2&gt;

&lt;p&gt;By focusing on data and keeping it separate from business logic and processing, Data Oriented Programming combined with modern Java offers some great advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simpler and More Readable Code:&lt;/strong&gt; Easier to understand and follow due to the separation of concerns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Maintainability:&lt;/strong&gt; Modifications are less likely to have widespread impact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Code Optionality and Reduced Coupling:&lt;/strong&gt; Adding new features is less invasive and reduces dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier Testing:&lt;/strong&gt; Functions operating on plain data are more straightforward to test.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keeps Data Clean and Decoupled from Business Logic.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safer (and Cheaper) to Refactor and Change:&lt;/strong&gt; Minimizing coupling reduces the cost of future changes; as explained in &lt;a href="https://www.oreilly.com/library/view/tidy-first/9781098151232/" rel="noopener noreferrer"&gt;Tidy First? by Kent Beck&lt;/a&gt;, the cost of software is approximately the cost of changing it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References and Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://inside.java/2024/05/23/dop-v1-1-introduction/" rel="noopener noreferrer"&gt;Inside Java DOP v1.1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/articles/data-oriented-programming-java/" rel="noopener noreferrer"&gt;Data Oriented Programming in Java InfoQ Article&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=UQAw3pvZPCY" rel="noopener noreferrer"&gt;Data-Oriented Programming in Java on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=8FRU_aGY4mY" rel="noopener noreferrer"&gt;Data Oriented Programming in Java 21 by Nicolai Parlog on YouTube&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Do You Think?
&lt;/h2&gt;

&lt;p&gt;Have you tried Data Oriented Programming in your Java projects? What challenges or benefits have you experienced? Share your thoughts in the comments or reach out to discuss.&lt;/p&gt;

</description>
      <category>java</category>
      <category>designpatterns</category>
      <category>oop</category>
      <category>dop</category>
    </item>
    <item>
      <title>Idempotent Processing with Kafka</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Sun, 12 Feb 2023 12:00:00 +0000</pubDate>
      <link>https://dev.to/nejckorasa/idempotent-processing-with-kafka-11l5</link>
      <guid>https://dev.to/nejckorasa/idempotent-processing-with-kafka-11l5</guid>
      <description>&lt;ul&gt;
&lt;li&gt;Duplicate Messages are Inevitable&lt;/li&gt;
&lt;li&gt;Understanding the Intricacies of exactly-once semantics in Kafka&lt;/li&gt;
&lt;li&gt;
Achieving Idempotent Processing with Kafka

&lt;ul&gt;
&lt;li&gt;Idempotent Consumer Pattern&lt;/li&gt;
&lt;li&gt;Ordering of Messages&lt;/li&gt;
&lt;li&gt;Retry Handling&lt;/li&gt;
&lt;li&gt;Idempotent Processing and External Side Effects&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

Publishing Output Messages to Kafka and Maintaining Data Consistency

&lt;ul&gt;
&lt;li&gt;The Simplest Solution&lt;/li&gt;
&lt;li&gt;Transactional Outbox Pattern&lt;/li&gt;
&lt;li&gt;Without Transactional Outbox&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;How it compares to Synchronous REST APIs&lt;/li&gt;

&lt;li&gt;Final Thoughts&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Duplicate Messages are Inevitable
&lt;/h2&gt;

&lt;p&gt;Duplicate messages are an inherent aspect of message-based systems and can occur for various reasons. In the context of Kafka, it is essential to ensure that your application is able to handle these duplicates effectively. As a Kafka consumer, there are several scenarios that can lead to the consumption of duplicate messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There can be an actual duplicate message in the Kafka topic you are consuming from - the consumer reads two distinct messages that should be treated as duplicates.&lt;/li&gt;
&lt;li&gt;You consume the same message more than once due to error scenarios, either in your application or in the communication with a Kafka broker.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To ensure idempotent processing and handle these scenarios, it's important to have a proper strategy to detect and handle duplicate messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Intricacies of exactly-once semantics in Kafka
&lt;/h2&gt;

&lt;p&gt;Kafka offers different message delivery guarantees, or &lt;a href="https://kafka.apache.org/documentation/#semantics" rel="noopener noreferrer"&gt;delivery semantics&lt;/a&gt;, between producers and consumers, namely &lt;em&gt;at-least-once&lt;/em&gt;, &lt;em&gt;at-most-once&lt;/em&gt; and &lt;em&gt;exactly-once&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Exactly-once would seem like an obvious choice to guard against duplicate messages, but it is not that simple and the devil is in the details. Confluent has spent a lot of resources to deliver an exactly-once delivery guarantee, and you can read &lt;a href="https://www.confluent.io/en-gb/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/" rel="noopener noreferrer"&gt;here&lt;/a&gt; about how it works in detail. It requires enabling specific Kafka features (i.e. Idempotent Producer and Kafka Transactions). &lt;/p&gt;

&lt;p&gt;First of all, it is only applicable to an application that consumes a Kafka message, does some processing, and writes a resulting message to a Kafka topic. Exactly-once messaging semantics ensures the &lt;strong&gt;combined&lt;/strong&gt; outcome of multiple steps will happen exactly once. The key word here is combined: a message will be consumed, processed, and resulting messages produced, exactly once. &lt;/p&gt;

&lt;p&gt;Critical points to understand about exactly-once delivery are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;All other actions occurring as part of the processing can still happen multiple times, if the original message is re-consumed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The guarantee only covers resulting messages from the processing being written exactly once, so downstream transaction-aware consumers will not have to handle duplicates. Hence, each individual action (internal or external) still needs to be processed in an idempotent fashion to ensure true end-to-end exactly-once processing. The application may, for example, need to perform REST calls to other applications or write to the database.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;All participating consumers and producers need to be configured correctly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kafka exactly-once semantics is achieved by enabling &lt;a href="https://www.conduktor.io/kafka/idempotent-kafka-producer" rel="noopener noreferrer"&gt;Kafka Idempotent Producers&lt;/a&gt; and &lt;a href="https://www.confluent.io/en-gb/blog/transactions-apache-kafka/" rel="noopener noreferrer"&gt;Kafka Transactions&lt;/a&gt; in &lt;strong&gt;all&lt;/strong&gt; consumers and producers involved. That includes the upstream producer and downstream consumers from the perspective of your application. If you are using event-driven architecture to implement inter-service communication in your system, it is likely that you will consume messages you don't control or own - a Kafka topic is simply an asynchronous API you consume. The topic and the producer can be owned by another team or a 3rd party. Similarly, you may not control downstream consumers. To add to the first point, outbound messages can still be written to the topic multiple times before being successfully committed; it is the responsibility of any downstream consumers to only read committed messages (i.e. be transaction aware) in order to meet the exactly-once guarantee.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;It comes with a performance impact&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Exactly-once delivery comes with a performance overhead. There are simply more steps involved in processing a single Kafka message (e.g. Kafka performs a two-phase commit to support transactions), and that results in lower throughput and increased latency.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, it's often much simpler, and more common, to settle for at-least-once semantics and de-duplicate messages on the consumer side - especially in cases where application processing is either expensive or more involved and consists of other actions (e.g. REST calls and DB writes). It's important to remember there is a transaction boundary gap between a DB transaction and a Kafka transaction; more on that later.&lt;/p&gt;
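&lt;p&gt;For reference, the features mentioned above map to standard kafka-clients configuration keys, sketched below (the &lt;code&gt;transactional.id&lt;/code&gt; value is a hypothetical placeholder):&lt;/p&gt;

```java
import java.util.Properties;

// Sketch of the client configuration behind Kafka's exactly-once semantics.
// Keys are standard kafka-clients settings; values marked as placeholders
// are hypothetical.
class ExactlyOnceConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true");           // Kafka Idempotent Producer
        props.put("transactional.id", "order-service-tx"); // enables Kafka Transactions (placeholder id)
        props.put("acks", "all");                          // required with idempotence
        return props;
    }

    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("isolation.level", "read_committed");    // transaction-aware: skip uncommitted messages
        props.put("enable.auto.commit", "false");          // offsets are committed within the transaction
        return props;
    }
}
```

&lt;p&gt;Note that every producer and consumer in the chain needs this configuration for the guarantee to hold end to end.&lt;/p&gt;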

&lt;h2&gt;
  
  
  Achieving Idempotent Processing with Kafka
&lt;/h2&gt;

&lt;p&gt;This will depend on the nature of processing, and on the shape of the output. To enable idempotent processing, the trigger for the processing - whether it be a Kafka message or an HTTP request - must carry a unique identifier (i.e. an idempotency key).&lt;/p&gt;

&lt;h3&gt;
  
  
  Idempotent Consumer Pattern
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://microservices.io/patterns/communication-style/idempotent-consumer.html" rel="noopener noreferrer"&gt;Idempotent Consumer Pattern&lt;/a&gt; ensures that a Kafka consumer can handle duplicate messages correctly. A consumer can be made idempotent by recording in the database the IDs of the messages it has processed successfully. When processing a message, the consumer can detect and discard duplicates by querying the database. To illustrate that with pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consume&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isDuplicate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processMessageIdempotently&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;updateAndRecordProcessed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitOffset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ordering of Messages
&lt;/h3&gt;

&lt;p&gt;Choosing an appropriate topic key can help to ensure ordering guarantees within the same Kafka partition. For example, if messages are being processed in the context of a customer, using a customer ID as the topic key will ensure that messages for any individual customer will always be processed in the correct order.&lt;/p&gt;
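&lt;p&gt;The idea can be illustrated with a hash-based sketch - the same key always maps to the same partition, and ordering is guaranteed within a partition. Kafka's real default partitioner uses murmur2 hashing; plain &lt;code&gt;hashCode&lt;/code&gt; is used here only to keep the example self-contained:&lt;/p&gt;

```java
// Illustrative key-to-partition mapping: a stable customer ID key means all
// of that customer's messages land on one partition, preserving their order.
// Kafka's actual default partitioner uses murmur2, not String.hashCode.
class KeyPartitioner {
    static int partitionFor(String customerId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(customerId.hashCode(), numPartitions);
    }
}
```

&lt;p&gt;Note that repartitioning a topic changes this mapping, which is one reason partition counts are usually chosen up front.&lt;/p&gt;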

&lt;h3&gt;
  
  
  Retry Handling
&lt;/h3&gt;

&lt;p&gt;Kafka's offset commits can be used to create a "transaction boundary" (not to be confused with Kafka transactions mentioned before) for retrying message processing in case of failure. The same message can then be consumed again until the consumer offset is committed. Retry handling is a complex topic and various strategies can be employed depending on the specific requirements of the application. Confluent has written about &lt;a href="https://www.confluent.io/en-gb/blog/error-handling-patterns-in-kafka/" rel="noopener noreferrer"&gt;Kafka Error Handling Patterns&lt;/a&gt; that can be used to handle retries in a Kafka-based application.&lt;/p&gt;
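&lt;p&gt;An in-process retry boundary can be sketched as follows - a simplified illustration, with retry topics and dead-letter queues (covered in the Confluent article above) as alternatives for non-transient failures:&lt;/p&gt;

```java
import java.util.function.Supplier;

// Hedged sketch of the retry boundary: processing is retried in-process, and
// the consumer offset would only be committed once the action succeeds.
class RetryingHandler {
    static <T> T withRetries(Supplier<T> action, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // transient failure: offset stays uncommitted, so try again
            }
        }
        throw last; // attempts exhausted: message is re-delivered or dead-lettered
    }
}
```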

&lt;h3&gt;
  
  
  Idempotent Processing and External Side Effects
&lt;/h3&gt;

&lt;p&gt;As mentioned before, there is no exactly-once guarantee for application processing. All actions occurring as part of the processing, and all external side effects, can still happen multiple times. For example, in case of REST calls to other services, calls themselves need to be idempotent, and the same idempotency key needs to be relayed over to those calls. Similarly, all database writes need to be idempotent.&lt;/p&gt;
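&lt;p&gt;Relaying the idempotency key to a downstream call can be sketched like this (the &lt;code&gt;Idempotency-Key&lt;/code&gt; header name is a common convention, and the endpoint is hypothetical):&lt;/p&gt;

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of relaying the consumed message's idempotency key to a downstream
// REST call: every retry of the same Kafka message sends the same key, so the
// downstream service can de-duplicate. Endpoint and header name are
// illustrative conventions, not something the article's services prescribe.
class IdempotentCalls {
    static HttpRequest transferRequest(String idempotencyKey) {
        return HttpRequest.newBuilder(URI.create("https://payments.example.com/transfers"))
                .header("Idempotency-Key", idempotencyKey)
                .POST(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
    }
}
```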

&lt;h2&gt;
  
  
  Publishing Output Messages to Kafka and Maintaining Data Consistency
&lt;/h2&gt;

&lt;p&gt;When it comes to publishing messages back to Kafka after processing is complete, the complexity increases. In a microservices architecture, services, along with updating their own local data store, often need to notify other services within the organization of changes that have occurred. This is where event-driven architecture shines, allowing individual services to publish changes as events to a Kafka topic that can be consumed by other services. But how can this be achieved in a way that ensures data consistency and enables idempotent processing?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Simplest Solution
&lt;/h3&gt;

&lt;p&gt;Consuming from Kafka has a built-in retry mechanism. If the processing is naturally idempotent, deterministic, and does not interact with other services (i.e. all its state resides in Kafka), then the solution can be relatively simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consume&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processMessageIdempotently&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaOutputMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toKafkaOutputMessage&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;produceAndFlush&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaOutputMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitOffset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Consume the message from a Kafka topic.&lt;/li&gt;
&lt;li&gt;Process the message.&lt;/li&gt;
&lt;li&gt;Publish the resulting message to a Kafka topic.&lt;/li&gt;
&lt;li&gt;Commit the consumer offset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach ensures data consistency and enables idempotent processing. It guarantees that at least one published message is produced for every consumed message. &lt;/p&gt;

&lt;p&gt;To ensure at-least-once delivery of published messages, it's also necessary to ensure that the message is &lt;em&gt;actually&lt;/em&gt; sent to the Kafka broker and that the Kafka producer has flushed its outgoing message queue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transactional Outbox Pattern
&lt;/h3&gt;

&lt;p&gt;Another approach is to utilize the &lt;a href="https://microservices.io/patterns/data/transactional-outbox.html" rel="noopener noreferrer"&gt;Transactional Outbox Pattern&lt;/a&gt;, which bridges the gap between the database and Kafka transaction boundaries by recording state changes and outgoing messages atomically within a single database transaction. The reason is that it is not possible to have a single transaction that spans both the application's database and Kafka.&lt;/p&gt;

&lt;p&gt;One possible implementation of this pattern is to have an “&lt;em&gt;outbox&lt;/em&gt;” table and instead of publishing resulting messages directly to Kafka, the messages are written to the outbox table in a compatible format (e.g. &lt;a href="https://www.confluent.io/en-gb/blog/avro-kafka-data/" rel="noopener noreferrer"&gt;Avro&lt;/a&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consume&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isDuplicate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processMessageIdempotently&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;transaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;startTransaction&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;updateAndRecordProcessed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeOutbox&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commit&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitOffset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this pattern comes with additional complexity: the message must not only be written to the database but also published to Kafka. This can be implemented with a separate message relay service that continuously polls the database for new outbox messages, publishes them to Kafka, and marks them as processed. This approach has several drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increased load on the database: Frequently polling the database can cause a high level of read traffic, which can lead to increased load on the database and potentially slow down other processes that are trying to access it.&lt;/li&gt;
&lt;li&gt;Latency: Depending on the interval at which the database is polled, there may be a significant delay between when a message is added to the outbox and when it is published to Kafka.&lt;/li&gt;
&lt;li&gt;Scalability: If the number of messages to be published to Kafka increases, the rate of polling will need to be increased, which can further increase the load on the database and make the system less scalable.&lt;/li&gt;
&lt;li&gt;Schema incompatibility issues: If the message schema is incompatible with a destination topic, application processing will succeed, but the poller could be unable to publish a message to Kafka. The risk of this can be minimized by verifying Avro schema with a schema registry before writing to the outbox table.&lt;/li&gt;
&lt;li&gt;Ordering of messages: The poller needs to ensure the order of messages written to the outbox table is retained when publishing to Kafka.&lt;/li&gt;
&lt;li&gt;Missed messages: There is a chance that a message is not picked up by the poller and not published to Kafka.&lt;/li&gt;
&lt;li&gt;Lack of real-time delivery: Messages are not published to Kafka in real time, as publishing depends on the polling interval.&lt;/li&gt;
&lt;/ul&gt;
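&lt;p&gt;The relay loop described above can be sketched in-memory - the queue and list below are stand-ins for the outbox table and the Kafka topic; a real relay would poll the database, publish each row, and mark it processed:&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal in-memory sketch of an outbox relay. The FIFO queue preserves the
// order in which messages were written to the outbox, addressing the ordering
// concern above; everything Kafka- or database-specific is abstracted away.
class OutboxRelay {
    final Deque<String> outbox = new ArrayDeque<>();  // stands in for the outbox table
    final List<String> published = new ArrayList<>(); // stands in for the Kafka topic

    int relayOnce() {
        int relayed = 0;
        String message;
        while ((message = outbox.poll()) != null) { // FIFO: outbox order retained
            published.add(message);                 // stands in for produceAndFlush(...)
            relayed++;                              // a real relay would now mark the row processed
        }
        return relayed;
    }
}
```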

&lt;p&gt;A better approach is to utilize &lt;a href="https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver16" rel="noopener noreferrer"&gt;CDC (change data capture)&lt;/a&gt; if your database supports it. You can use &lt;a href="https://debezium.io" rel="noopener noreferrer"&gt;Debezium&lt;/a&gt; and &lt;a href="https://docs.confluent.io/platform/current/connect/index.html" rel="noopener noreferrer"&gt;Kafka Connect&lt;/a&gt; to integrate CDC with a PostgreSQL database, for example. That way, the database and Kafka stay in sync, and you don't have to deal with the drawbacks of database polling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Without Transactional Outbox
&lt;/h3&gt;

&lt;p&gt;However, even with CDC, this still introduces another component that needs to be managed and monitored - another possible point of failure. In certain situations it is easier to avoid the Transactional Outbox Pattern and handle writes to Kafka within the application. That can be achieved by combining the first simple solution explained above with the Idempotent Consumer Pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumeKafkaMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaClient&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isDuplicate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processMessageIdempotently&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;updateAndRecordProcessed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;kafkaOutputMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toKafkaOutputMessage&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;kafkaProducer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;produceAndFlush&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaOutputMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;kafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitOffset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafkaMessage&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Consume the message from a Kafka topic.&lt;/li&gt;
&lt;li&gt;Consult the database to confirm the message has not been previously processed. 
If it has, read the stored result and proceed to step 5.&lt;/li&gt;
&lt;li&gt;Process the message, taking care to handle any external actions in an idempotent manner.&lt;/li&gt;
&lt;li&gt;Write results to the database and mark the message as successfully processed.&lt;/li&gt;
&lt;li&gt;Publish the resulting message to a Kafka topic.&lt;/li&gt;
&lt;li&gt;Commit the consumer offset.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The approach outlined above combines the use of the Idempotent Consumer pattern with direct publishing to Kafka, resulting in a streamlined solution for handling duplicate messages. &lt;/p&gt;

&lt;p&gt;Additionally, by eliminating the need for an intermediate "&lt;em&gt;outbox&lt;/em&gt;" table, this approach reduces the number of components that need to be managed and monitored, resulting in a simpler overall architecture. &lt;/p&gt;

&lt;p&gt;Furthermore, it benefits from reduced latency in message publishing, as it avoids the added step of writing messages to an outbox table before publishing them to Kafka.&lt;/p&gt;

&lt;p&gt;This approach has some downsides to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It might simplify the overall architecture, but it increases the complexity of processing within the application.&lt;/li&gt;
&lt;li&gt;The addition of a Kafka publish step can cause a performance overhead and prolong overall processing time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How it compares to Synchronous REST APIs
&lt;/h2&gt;

&lt;p&gt;Similarly to the Idempotent Consumer Pattern, in the case of a REST API, received message IDs could also be tracked in a database to handle idempotency. However, there are drawbacks to using a REST call as a trigger for processing, namely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The retry strategy is out of the control of the application, and the caller is responsible for retrying the operation. That makes it more susceptible to failure scenarios and inconsistent states.&lt;/li&gt;
&lt;li&gt;There is no ordering guarantee when responding to HTTP calls, and additional care must be taken to avoid certain race conditions during processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Publishing output messages to Kafka in a way that maintains data consistency can be achieved by using the &lt;a href="https://microservices.io/patterns/data/transactional-outbox.html" rel="noopener noreferrer"&gt;Transactional Outbox Pattern&lt;/a&gt; to atomically update the database and publish a message to Kafka.&lt;/p&gt;
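The core idea of the outbox pattern can be sketched with in-memory lists standing in for the database tables and the Kafka topic. The names are illustrative: the point is that the business write and the outbox write happen in the same transaction, and a separate relay publishes from the outbox.

```java
import java.util.ArrayList;
import java.util.List;

public class OutboxSketch {
    // stand-ins for two tables updated in the same database transaction
    public final List<String> accounts = new ArrayList<>();
    public final List<String> outbox = new ArrayList<>();
    // stand-in for the Kafka topic the relay publishes to
    public final List<String> topic = new ArrayList<>();

    // the business write and the outbox write commit (or roll back) together,
    // so an event is never lost between the database and Kafka
    public void createAccount(String account) {
        accounts.add(account);
        outbox.add("AccountCreated:" + account);
    }

    // a separate relay process polls the outbox table and publishes to Kafka
    public void relay() {
        topic.addAll(outbox);
        outbox.clear();
    }
}
```

The relay may republish an event if it crashes between publishing and clearing the outbox, which is why downstream consumers still need to handle duplicates.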

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Kafka is an ideal platform for implementing idempotent processing in your application, and it offers several key advantages over traditional synchronous processing methods such as REST APIs. Its built-in retry mechanism and ordering guarantees are essential for ensuring idempotence and maintaining data consistency in the presence of failures.&lt;/p&gt;

&lt;p&gt;When it comes to message delivery guarantees, the exactly-once semantics offered by Kafka can be a powerful tool to guard against duplicate messages. However, it's important to understand the intricacies of this feature, the requirements for its implementation, and its limitations. Additionally, the performance impact and complexity of exactly-once semantics should be taken into consideration.&lt;/p&gt;

&lt;p&gt;Achieving idempotent processing requires a thorough understanding of the triggers, actions, and outputs of the processing. Different approaches such as &lt;a href="https://microservices.io/patterns/communication-style/idempotent-consumer.html" rel="noopener noreferrer"&gt;Idempotent Consumer Pattern&lt;/a&gt; and &lt;a href="https://microservices.io/patterns/data/transactional-outbox.html" rel="noopener noreferrer"&gt;Transactional Outbox Pattern&lt;/a&gt; can be used to ensure that messages are processed correctly and that data consistency is maintained. It's important to weigh the complexity and potential drawbacks of each approach before deciding on the best solution for your application. As we have seen, Transactional Outbox is not always necessary.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>architecture</category>
      <category>eventdriven</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Avoid Tight Coupling of Tests to Implementation Details</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Tue, 10 Jan 2023 13:17:54 +0000</pubDate>
      <link>https://dev.to/nejckorasa/avoid-tight-coupling-of-tests-to-implementation-details-a7</link>
      <guid>https://dev.to/nejckorasa/avoid-tight-coupling-of-tests-to-implementation-details-a7</guid>
      <description>&lt;p&gt;Building backend systems today will likely involve building many small, independent services that communicate and coordinate with one another to form a distributed system. While there are many resources available discussing the pros and cons of microservices, the architecture, and when it is appropriate to use, I want to focus on the functional testing of microservices and how it differs from traditional approaches.&lt;/p&gt;

&lt;p&gt;In my experience, "best testing practices" have evolved with the introduction of microservices, and traditional &lt;em&gt;testing pyramids&lt;/em&gt; may not be the most effective approach in this context - they can even be harmful. Working across various projects and companies, including the development of new digital banks and the migration of older systems to microservices as they scale, I have often encountered disagreements about the most appropriate testing strategies for microservices.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do we have tests?
&lt;/h3&gt;

&lt;p&gt;As software engineers, we rely on testing to verify that our code functions as expected. Testing should support refactoring, but it can sometimes make it more difficult. The purpose of testing is to define the intended behavior of the code, rather than the details of its implementation. In summary, tests should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm that the code does what it should.&lt;/li&gt;
&lt;li&gt;Provide fast, accurate, reliable, and predictable feedback.&lt;/li&gt;
&lt;li&gt;Make maintenance easier, which is often overlooked when writing tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Effective testing is crucial for building reliable software, and it is important to keep these goals in mind when writing tests. By focusing on the intended behavior of the code and the needs of maintenance, we can write tests that give us confidence in our code and make the development process more efficient.&lt;/p&gt;

&lt;h4&gt;
  
  
  Common Mistakes
&lt;/h4&gt;

&lt;p&gt;It is not uncommon to come across codebases with a large number of tests and high test coverage percentages, only to find that the code is not truly tested and that refactoring or adding new features is difficult. In my experience, this is often due to the following pitfalls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overreliance on unit tests&lt;/li&gt;
&lt;li&gt;Tight coupling of tests to implementation details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoiding these mistakes is key to writing effective tests that support the development process and ensure the reliability of the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do you need Unit Tests?
&lt;/h3&gt;

&lt;p&gt;One common approach to testing is the belief that all classes, functions, and methods must be tested. This can lead to a large number of unit tests and a high test coverage percentage. However, an excess of unit tests can make it difficult to change the code without also having to modify the tests. This can undermine the confidence in the code and negate the benefits of testing if the tests must be rewritten every time the code is changed.&lt;/p&gt;

&lt;p&gt;In the case of microservices, which are small and independent by definition, it could be argued that the microservice itself is a unit and should be tested as an isolated component through its contracts, or in a black-box fashion. In this sense, the term "unit tests" for microservices can be thought of as implementation detail tests. Instead of focusing on unit tests, it may be more effective to consider the testing of microservices at a higher level, such as through integration tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Don't couple your tests to Implementation Details
&lt;/h3&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--LZhImt4T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1448561325668454401/u11Xlu8j_normal.jpg" alt="Nejc Korasa profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Nejc Korasa
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        &lt;a class="mentioned-user" href="https://dev.to/nejckorasa"&gt;@nejckorasa&lt;/a&gt;
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Please don't couple your tests to implementation details. Tests should support refactoring, not make it harder. &lt;a href="https://twitter.com/hashtag/SoftwareEngineering"&gt;#SoftwareEngineering&lt;/a&gt; &lt;a href="https://twitter.com/hashtag/testing"&gt;#testing&lt;/a&gt; &lt;a href="https://t.co/oeHhbXD2Pc"&gt;twitter.com/KentBeck/statu…&lt;/a&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      13:31 - 04 Dec 2022
    &lt;/div&gt;

      &lt;div class="ltag__twitter-tweet__quote"&gt;
        &lt;div class="ltag__twitter-tweet__quote__header"&gt;
          &lt;span class="ltag__twitter-tweet__quote__header__name"&gt;
            Kent Beck 🌻
          &lt;/span&gt;
          @KentBeck
        &lt;/div&gt;
        Tests should be coupled to the behavior of code and decoupled from the structure of code. Seeing tests that fail on both counts.
      &lt;/div&gt;

    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1599395920281743361" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1599395920281743361" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1599395920281743361" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;
 

&lt;p&gt;When writing tests, it is important to avoid coupling them to implementation details. This ensures that tests serve as a reliable safety net, allowing you to refactor the internals of your microservice without having to modify the tests. &lt;/p&gt;

&lt;h3&gt;
  
  
  Focus on Integration Tests
&lt;/h3&gt;

&lt;p&gt;To avoid testing implementation details, we should test from the edges of the microservice: examine the inputs and outputs of the service and verify their correctness in an isolated manner, focusing on the interaction points and making them explicit.&lt;/p&gt;

&lt;h4&gt;
  
  
  Define Inputs and Outputs
&lt;/h4&gt;

&lt;p&gt;Look at the entrypoint of the service (e.g. a REST API, Kafka consumer) to define the inputs for your tests and find the corresponding outputs (e.g. HTTP response, published Kafka message). It may be necessary to assert multiple outputs for a single input, as processing an HTTP request could result in a database update, a new Kafka message, and an HTTP response.&lt;/p&gt;
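As an illustration of asserting several outputs for one input, here is a minimal sketch with a hypothetical service and in-memory stand-ins for the database and the output topic. A real integration test would exercise a running service through its actual edges, but the shape of the assertions is the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PaymentService {
    // stand-ins for the service's observable edges: database state and output topic
    public final Map<String, Integer> balances = new HashMap<>();
    public final List<String> events = new ArrayList<>();

    // one input (an HTTP-style request) produces three observable outputs:
    // a response, a database update, and a published event
    public String deposit(String account, int amount) {
        balances.merge(account, amount, Integer::sum);
        events.add("Deposited:" + account + ":" + amount);
        return "OK";
    }
}
```

A test for one input then asserts all three outputs - the response, the stored balance, and the emitted event - without ever inspecting how `deposit` is implemented internally.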

&lt;h4&gt;
  
  
  Test the Microservice as an Isolated Component (Unit)
&lt;/h4&gt;

&lt;p&gt;Spin up the microservice and all necessary infrastructure components, such as web servers and databases, and send inputs to verify the outputs. Tools like &lt;a href="https://www.testcontainers.org"&gt;Testcontainers for Java&lt;/a&gt; can help by running the application in a short-lived test mode with dependencies, such as databases and message queues, running in Docker containers.&lt;/p&gt;

&lt;p&gt;By setting up specific infrastructure components in a separate test setup stage, you can isolate them from the actual tests, allowing you to change the underlying infrastructure without modifying the test methods themselves (e.g. migrating the database from PostgreSQL to a NoSQL store).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This approach is similar to &lt;a href="https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)"&gt;hexagonal architecture&lt;/a&gt;, which decouples infrastructure and domain logic, but the testing strategy will differ.&lt;/p&gt;

&lt;p&gt;There is a cost to it as it adds some complexity, but I have seen codebases where the benefits were worth it. Ultimately, the decision of how much complexity to add through isolation should be based on how often you anticipate changing the infrastructure of the service and whether the added complexity is justified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Clear Definition of Microservice Behavior through Testing
&lt;/h4&gt;

&lt;p&gt;A test suite with a focus on integration tests will likely have fewer tests overall, but they will clearly define the expected behavior of the microservice. When examining the test suite, you should be able to get a clear understanding of what the microservice is intended to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  There is still a place for Implementation Detail Tests
&lt;/h3&gt;

&lt;p&gt;There will be parts of the code that are domain specific and contain only business logic. These naturally isolated parts have an internal complexity of their own, and this is where implementation detail tests should be used: testing all their variations and edge cases through integration tests would be cumbersome and too heavy.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2018/01/testing-of-microservices/"&gt;Spotify: Testing of Microservices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.testcontainers.org"&gt;Testcontainers for Java&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>testing</category>
      <category>integrationtests</category>
      <category>microservices</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Unzip files in S3 with Java</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Thu, 03 Nov 2022 14:43:37 +0000</pubDate>
      <link>https://dev.to/nejckorasa/stream-unzip-files-in-s3-with-java-16b4</link>
      <guid>https://dev.to/nejckorasa/stream-unzip-files-in-s3-with-java-16b4</guid>
      <description>&lt;p&gt;I've been spending a lot of time with AWS S3 recently building data pipelines and have encountered a surprisingly non-trivial challenge of unzipping files in an S3 bucket. &lt;br&gt;
A few minutes with Google and StackOverflow made it clear many others have faced the same issue.&lt;/p&gt;

&lt;p&gt;I'll explain a few options to handle the unzipping as well as the end solution which has led me to build &lt;a href="https://github.com/nejckorasa/s3-stream-unzip"&gt;nejckorasa/s3-stream-unzip&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To sum up: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there is no built-in support to unzip files in S3 in place,&lt;/li&gt;
&lt;li&gt;there is also no unzip API available in the AWS SDK.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To unzip, you therefore need to download the files from S3, decompress them, and upload the decompressed files back.&lt;/p&gt;

&lt;p&gt;This solution is simple to implement with the &lt;a href="https://aws.amazon.com/sdk-for-java/"&gt;Java AWS SDK&lt;/a&gt;, and it is probably good enough if you are dealing with smaller files - if files are small enough, you can simply keep the decompressed files in memory and upload them back.&lt;/p&gt;

&lt;p&gt;Alternatively, in case of memory constraints, files can be persisted to disk storage. Great, that works.&lt;/p&gt;

&lt;p&gt;Problems arise with larger files. AWS Lambda, for example, &lt;a href="https://aws.amazon.com/lambda/faqs/"&gt;has a 1024MB memory and disk space limit&lt;/a&gt;. A dedicated EC2 instance would solve the disk space issue, but it requires more maintenance. I'd also argue that writing 500MB+ files to disk is not an optimal approach. &lt;br&gt;
That will, of course, depend on how many files need to be unzipped and how frequently the operation runs - it's fine as a one-off, but maybe not if it needs to run daily. In any case, we can do better.&lt;/p&gt;
&lt;h3&gt;
  
  
  Streaming solution
&lt;/h3&gt;

&lt;p&gt;A better approach is to stream the file from S3, downloading it in chunks, unzip those chunks, and upload them back to S3 using multipart upload. That way you completely avoid the need for disk storage, and you can minimize the memory footprint by tuning the download and upload chunk sizes.&lt;/p&gt;

&lt;p&gt;There are 2 parts of this solution that need to be integrated:&lt;/p&gt;
&lt;h4&gt;
  
  
  1) Download and unzip
&lt;/h4&gt;

&lt;p&gt;Streaming S3 objects is natively supported by the AWS SDK: the &lt;code&gt;getObjectContent()&lt;/code&gt; method returns an input stream containing the contents of the S3 object.&lt;/p&gt;

&lt;p&gt;Java provides &lt;a href="https://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html"&gt;ZipInputStream&lt;/a&gt; as an input stream filter for reading files in the ZIP file format. It reads ZIP content entry-by-entry and thus allows custom handling for each entry.&lt;/p&gt;

&lt;p&gt;Streaming object content from S3 and feeding that into &lt;code&gt;ZipInputStream&lt;/code&gt; will give us decompressed chunks of object content we can buffer in memory.&lt;/p&gt;
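A minimal sketch of this idea in plain Java, using an in-memory zip as a stand-in for the S3 object stream. The chunk handling is illustrative - in the real pipeline each decompressed chunk would be buffered for the multipart upload rather than just counted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamUnzip {
    // read zip entries in fixed-size chunks, as you would before handing
    // each decompressed chunk to a multipart upload
    public static int unzipInChunks(InputStream zipped, int chunkSize) {
        int chunks = 0;
        try (ZipInputStream zis = new ZipInputStream(zipped)) {
            while (zis.getNextEntry() != null) {
                byte[] buf = new byte[chunkSize];
                while (zis.read(buf) != -1) {
                    chunks++; // real pipeline: buffer this chunk for upload
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    // helper building a small in-memory zip, standing in for the stream
    // returned by s3Object.getObjectContent()
    public static InputStream zip(String name, byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ByteArrayInputStream(bos.toByteArray());
    }
}
```

Note that only one chunk buffer is live at a time, which is what keeps the memory footprint bounded regardless of the object's size.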
&lt;h4&gt;
  
  
  2) Upload unzipped chunks to S3
&lt;/h4&gt;

&lt;p&gt;Uploading files to S3 is a common task, and the SDK supports several options to choose from, including &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html"&gt;multipart upload&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What is multipart upload?&lt;/p&gt;

&lt;p&gt;Multipart upload allows you to upload a single object as a set of parts. &lt;br&gt;
Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. &lt;br&gt;
If transmission of any part fails, you can retransmit that part without affecting other parts. &lt;/p&gt;

&lt;p&gt;After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. &lt;/p&gt;

&lt;p&gt;In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.&lt;/p&gt;
&lt;/blockquote&gt;
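The buffering side of this can be sketched in plain Java: decompressed chunks accumulate in a part buffer, and a part is "uploaded" whenever the buffer reaches the configured limit. The class is illustrative; real code would issue upload-part requests to S3 via the SDK instead of recording part sizes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartBuffer {
    private final int partSizeLimit;
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    // stand-in for the parts uploaded to S3
    public final List<Integer> uploadedPartSizes = new ArrayList<>();

    public MultipartBuffer(int partSizeLimit) {
        this.partSizeLimit = partSizeLimit;
    }

    // feed decompressed chunks; flush a part whenever the buffer fills up
    public void write(byte[] chunk) {
        current.write(chunk, 0, chunk.length);
        if (current.size() >= partSizeLimit) {
            uploadedPartSizes.add(current.size()); // real code: upload this part
            current.reset();
        }
    }

    // on stream end, upload the remainder and complete the multipart upload
    public void complete() {
        if (current.size() > 0) {
            uploadedPartSizes.add(current.size());
            current.reset();
        }
    }
}
```

Tuning `partSizeLimit` is the trade-off mentioned above: larger parts mean fewer upload requests but a bigger in-memory buffer.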
&lt;h3&gt;
  
  
  nejckorasa/s3-stream-unzip
&lt;/h3&gt;

&lt;p&gt;All that is left to do now is to integrate stream download, unzip, and multipart upload. &lt;br&gt;
I've done all the hard work and built &lt;a href="https://github.com/nejckorasa/s3-stream-unzip"&gt;nejckorasa/s3-stream-unzip&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Java library to manage unzipping of large files and data in AWS S3 without knowing the size beforehand and without keeping it all in memory or writing to disk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because the size does not need to be known beforehand and nothing is kept fully in memory or written to disk, the library is suitable for large data files - it has been used to unzip files of 100GB+.&lt;/p&gt;

&lt;p&gt;It supports different unzip strategies including an option to split zipped files (suitable for larger files, e.g. csv files). It's lightweight and only requires an AmazonS3 client to run.&lt;/p&gt;

&lt;p&gt;It has a simple API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// initialize AmazonS3 client&lt;/span&gt;
&lt;span class="nc"&gt;AmazonS3&lt;/span&gt; &lt;span class="n"&gt;s3CLient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AmazonS3ClientBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;standard&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;// customize the client&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;// create UnzipStrategy&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NoSplitUnzipStrategy&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SplitTextUnzipStrategy&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withFileBytesLimit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// or create UnzipStrategy with additional config&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;S3MultipartUpload&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Config&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withThreadCount&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withQueueSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withAwaitTerminationTimeSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withCannedAcl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CannedAccessControlList&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BucketOwnerFullControl&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withUploadPartBytesLimit&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;MB&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withCustomizeInitiateUploadRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// customize request&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;});&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NoSplitUnzipStrategy&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// create S3UnzipManager&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;um&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;S3UnzipManager&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3Client&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;um&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;S3UnzipManager&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3Client&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withContentTypes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/zip"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="c1"&gt;// unzip options&lt;/span&gt;
&lt;span class="n"&gt;um&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unzipObjects&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bucket-name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"input-path"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"output-path"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;um&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unzipObjectsKeyMatching&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bucket-name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"input-path"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"output-path"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;".*\\.zip"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;um&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unzipObjectsKeyContaining&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"bucket-name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"input-path"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"output-path"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-part-of-object-"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;um&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unzipObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3Object&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"output-path"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Library is available on &lt;a href="https://search.maven.org/artifact/io.github.nejckorasa/s3-stream-unzip/1.0.1/jar"&gt;Maven Central&lt;/a&gt; and on &lt;a href="https://github.com/nejckorasa/s3-stream-unzip"&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can see the original blog post here: &lt;a href="https://nejckorasa.github.io/posts/s3-unzip/"&gt;https://nejckorasa.github.io/posts/s3-unzip/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>aws</category>
      <category>s3</category>
      <category>unzip</category>
    </item>
    <item>
      <title>Open source: Instagram Analyzer</title>
      <dc:creator>Nejc Korasa</dc:creator>
      <pubDate>Tue, 17 Jul 2018 11:51:11 +0000</pubDate>
      <link>https://dev.to/nejckorasa/open-source-instagram-analyzer-1obd</link>
      <guid>https://dev.to/nejckorasa/open-source-instagram-analyzer-1obd</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/nejckorasa/instagram-analyzer"&gt;instagram-analyzer&lt;/a&gt; is an application written in Python that analyzes geotags using reverse geocoding in user's Instagram photos and videos. &lt;/p&gt;

&lt;p&gt;It shows which specific locations, countries and cities you've visited so far, how many times, and which Instagram posts match each location.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I want to hear feedback, good or bad, so please go check it out!&lt;/strong&gt;&lt;br&gt;
Thanks&lt;/p&gt;
&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;
&lt;h3&gt;
  
  
  📍Store all instagram media data 📷
&lt;/h3&gt;

&lt;p&gt;The application loads all of the user's Instagram media and saves it in JSON format. This data includes all media metadata: likes, location, tagged users, comments, image URLs ...&lt;/p&gt;
&lt;h3&gt;
  
  
  📍Store all instagram location data 📊
&lt;/h3&gt;

&lt;p&gt;The application analyzes geotags and saves locations in JSON format. This data includes the occurrence count for each location as well as image and Instagram media URLs ...&lt;/p&gt;
&lt;h3&gt;
  
  
  📍Store all instagram countries and cities location data
&lt;/h3&gt;

&lt;p&gt;Countries and cities are additionally analyzed using reverse geocoding with &lt;a href="https://locationiq.com"&gt;LocationIQ API&lt;/a&gt;. Data is saved in JSON files.&lt;/p&gt;
&lt;h3&gt;
  
  
  📍Print occurrence counts for locations, countries and cities ✈️
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have visited 99 different locations
You have visited 7  different countries
You have visited 32 different cities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Print table view of most visited locations, countries and cities 🌍
&lt;/h3&gt;

&lt;p&gt;For example, when executed for &lt;a href="https://www.instagram.com/nejckorasa"&gt;nejckorasa&lt;/a&gt; print for countries looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Countries: 

+------+-----------------+-------------+
| rank | country         | occurrences |
+------+-----------------+-------------+
|  1   | Slovenia        |     51      |
+------+-----------------+-------------+
|  2   | The Netherlands |     12      |
+------+-----------------+-------------+
|  3   | Spain           |      8      |
+------+-----------------+-------------+
|  4   | Poland          |      8      |
+------+-----------------+-------------+
|  5   | Russia          |      7      |
+------+-----------------+-------------+
|  6   | Croatia         |      7      |
+------+-----------------+-------------+
|  7   | Hungary         |      6      |
+------+-----------------+-------------+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Similar tables are printed for specific locations and cities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;To install instagram-analyzer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;instagram-analyzer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To update instagram-analyzer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ pip install instagram-analyzer --upgrade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;Once installed, import it, configure it and run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;instagram_analyzer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InstaAnalyzer&lt;/span&gt;

&lt;span class="n"&gt;InstaAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;insta_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;INSTAGRAM_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;location_iq_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;LOCATION_IQ_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you run it, see &lt;a href="https://github.com/nejckorasa/instagram-analyzer/blob/master/README.md#configuration--options"&gt;Configuration &amp;amp; Options&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration &amp;amp; Options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Acquire Tokens
&lt;/h3&gt;

&lt;h5&gt;
  
  
  Acquire Instagram Access Token
&lt;/h5&gt;

&lt;p&gt;Go to &lt;a href="http://instagram.pixelunion.net/"&gt;Pixelunion&lt;/a&gt;, generate token, don't forget the token!&lt;/p&gt;

&lt;h5&gt;
  
  
  Acquire Location IQ Access Token
&lt;/h5&gt;

&lt;p&gt;Go to &lt;a href="https://locationiq.com/"&gt;Location IQ&lt;/a&gt;, sign up, get the token, don't forget the token!&lt;/p&gt;

&lt;h3&gt;
  
  
  Configure and run
&lt;/h3&gt;

&lt;p&gt;Create an &lt;code&gt;InstaAnalyzer&lt;/code&gt; instance with your token values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InstaAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;insta_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;INSTAGRAM_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;location_iq_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;LOCATION_IQ_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_media_from_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Once Instagram media data is stored in JSON, you can read it from there instead of loading it again via the Instagram API (the API is limited to 200 requests per hour). Set &lt;code&gt;analyzer.read_media_from_file = True&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
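&lt;p&gt;The caching pattern behind &lt;code&gt;read_media_from_file&lt;/code&gt; can be sketched in plain Python. This is an illustration, not the library's actual code; &lt;code&gt;fetch_from_api&lt;/code&gt; and &lt;code&gt;CACHE_FILE&lt;/code&gt; are hypothetical stand-ins:&lt;/p&gt;

```python
import json
import os

CACHE_FILE = "insta_media.json"  # hypothetical cache file name

def fetch_from_api():
    # Stand-in for the real Instagram API call (limited to 200 requests/hour)
    return [{"id": "<post_id>", "link": "https://www.instagram.com/p/example/"}]

def load_media(read_media_from_file=True):
    # Reuse the cached JSON when available, to avoid hitting the API again
    if read_media_from_file and os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    # Otherwise fetch fresh data and cache it for the next run
    media = fetch_from_api()
    with open(CACHE_FILE, "w") as f:
        json.dump(media, f)
    return media
```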

&lt;h3&gt;
  
  
  Options
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;location_iq_token&lt;/code&gt; is optional. If not set, only basic location analysis will be run and saved to file.&lt;/li&gt;
&lt;li&gt;Once &lt;code&gt;InstaAnalyzer&lt;/code&gt; has been run, all data is available to access:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Configure InstaAnalyzer
&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;InstaAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;insta_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;INSTAGRAM_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;location_iq_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'&amp;lt;LOCATION_IQ_TOKEN_HERE&amp;gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run InstaAnalyzer    
&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Access cities, countries and location data
&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt;
&lt;span class="n"&gt;countires&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;countires&lt;/span&gt;
&lt;span class="n"&gt;locations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;locations&lt;/span&gt;

&lt;span class="c1"&gt;# Access instagram media data
&lt;/span&gt;&lt;span class="n"&gt;instagram_media&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;insta_media_data&lt;/span&gt;

&lt;span class="c1"&gt;# Print locations later
&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;print_locations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Stored data examples
&lt;/h2&gt;

&lt;p&gt;When executed for &lt;a href="https://www.instagram.com/nejckorasa"&gt;nejckorasa&lt;/a&gt;, the data for one country item (Spain) looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"Spain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"media_items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/e7705068da5e289f5e44c0c396c08f74/5BD54C95/t51.2885-15/sh0.08/e35/p640x640/36149213_609452269436842_8766778259800064000_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/Bkh3-KfgxL9/"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/2b239894a363f6bbe93d604ab2cdfa8a/5BE953CD/t51.2885-15/sh0.08/e35/p640x640/33941046_171665143683479_8766885676932136960_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/Bj7Uj56gxBs/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/9d7003f674af9ca05accf9961df893a6/5BE28FDA/t51.2885-15/sh0.08/e35/p640x640/33120615_197967877520708_8731075699906969600_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/Bjmp-6bAYus/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/1e7ca79fc44823ff3ef8b24e6dd55e61/5BD1E8C3/t51.2885-15/sh0.08/e35/p640x640/33608474_597094857325212_724188974242856960_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/BjR_9lpAqpc/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/1b046c05b1cbe9708f57f5e591b68d1c/5BD8E039/t51.2885-15/sh0.08/e35/p640x640/32947036_172314443452529_4611639929133334528_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/BjNEIwiA6Py/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/5ac0e05fb60700cba4c41d6d1216eb5b/5BC8A9DB/t51.2885-15/e15/10802615_318814311644936_1896556761_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/vdWuHBkwuY/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/40620d8f5e7e01a546e2b958d18bd42a/5BE9E99F/t51.2885-15/e15/10784835_319487204924131_388050040_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/vYybQyEwiA/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/b733c0bdf312ee5c21bb3fd6148e6221/5BE263EA/t51.2885-15/e15/10802986_691193854310946_2042620114_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/vc9ZFakwrq/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/875bff08c310444273eae90a67e525dd/5BC8F29F/t51.2885-15/e15/928044_671144066338855_1666493611_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/vaWbQLEwqX/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, &lt;code&gt;&amp;lt;post_id&amp;gt;&lt;/code&gt; will be an actual post ID.&lt;/p&gt;

&lt;p&gt;Data for cities is almost the same. For specific locations, one location item looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"236678869"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"latitude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;45.7925&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"longitude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;15.1647&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Novo Mesto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;236678869&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"media_items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/6941d16b164ec488dd3a303004344f78/5BE40DE8/t51.2885-15/sh0.08/e35/p640x640/31270267_1592482480868234_8257495365851283456_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/Bij24yzAdHB/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/3189c0f2e5931f47b4506046ff26afff/5BDB6109/t51.2885-15/e15/10724200_1496985983889525_746072573_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/uDDPHekwtW/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/fbf31b5c410c9036ce43862012249d02/5BEC3F36/t51.2885-15/e15/10488704_250740985124191_1862853011_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/q94LWMkwlk/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;post_id&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://scontent.cdninstagram.com/vp/27c6681709c7b71fc86d8477c11d2b88/5BCAD041/t51.2885-15/e15/10013254_641464529259998_1091484863_n.jpg?efg=eyJ1cmxnZW4iOiJ1cmxnZW5fZnJvbV9pZyJ9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"link"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.instagram.com/p/mKDvsikwsC/"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Novo mesto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"additional_data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"place_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"113385772"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"licence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u&lt;/span&gt;&lt;span class="s2"&gt;00a9 LocationIQ.org CC BY 4.0, Data &lt;/span&gt;&lt;span class="se"&gt;\u&lt;/span&gt;&lt;span class="s2"&gt;00a9 OpenStreetMap contributors, ODbL 1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"osm_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"way"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"osm_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"167321715"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"45.7897769"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"lon"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"15.1680662"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"display_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Krka, Novo mesto, Jugovzhodna Slovenija, 8000, Slovenia"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"suburb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Krka"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"town"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Novo mesto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"state_district"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jugovzhodna Slovenija"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"postcode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"8000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"country"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slovenia"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"country_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"si"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"boundingbox"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"45.7858017"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"45.7927137"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"15.1640388"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"15.1725268"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;additional_data&lt;/code&gt; field; it is populated using the &lt;a href="https://locationiq.com"&gt;Location IQ API&lt;/a&gt;.&lt;/p&gt;
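&lt;p&gt;Once this data is on disk, summarising it takes only a few lines of plain Python. A minimal sketch, assuming the countries data has the shape shown above (the inline &lt;code&gt;countries_json&lt;/code&gt; string is a trimmed-down stand-in for the stored file):&lt;/p&gt;

```python
import json

# A trimmed-down version of the stored countries JSON shown above
countries_json = """
{
  "Spain": {"count": 8, "media_items": [{"id": "<post_id>", "link": "https://www.instagram.com/p/Bkh3-KfgxL9/"}]},
  "Slovenia": {"count": 12, "media_items": []}
}
"""

countries = json.loads(countries_json)

# Rank countries by how many posts were taken there
ranking = sorted(countries.items(), key=lambda kv: kv[1]["count"], reverse=True)
for name, data in ranking:
    print(f"{name}: {data['count']} posts")
```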

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Why does it take so long to load additional location data?
&lt;/h4&gt;

&lt;p&gt;For reverse geocoding, the Location IQ API is used. The free version of that API is rate limited to 1 request per second, which is why loading the additional data takes roughly &lt;code&gt;&amp;lt;different_location_count&amp;gt;&lt;/code&gt; seconds.&lt;/p&gt;
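&lt;p&gt;The 1 request/second limit is straightforward to respect client-side. Here is a minimal throttle sketch, not the library's actual implementation; &lt;code&gt;reverse_geocode&lt;/code&gt; is a hypothetical stand-in for the real Location IQ call:&lt;/p&gt;

```python
import time

def reverse_geocode(lat, lon):
    # Hypothetical stand-in for a Location IQ reverse-geocoding request
    return {"lat": lat, "lon": lon}

def geocode_all(coords, min_interval=1.0):
    # Space requests at least min_interval seconds apart to respect
    # the free tier's 1 request/second limit
    results = []
    last_call = 0.0
    for lat, lon in coords:
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        results.append(reverse_geocode(lat, lon))
    return results
```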

&lt;h2&gt;
  
  
  Go check it out, leave feedback 🙏
&lt;/h2&gt;

&lt;p&gt;Here's a link to GitHub: &lt;a href="https://github.com/nejckorasa/instagram-analyzer"&gt;instagram-analyzer&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>opensource</category>
      <category>python</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
