<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Romina Elena Mendez Escobar</title>
    <description>The latest articles on DEV Community by Romina Elena Mendez Escobar (@r_elena_mendez_escobar).</description>
    <link>https://dev.to/r_elena_mendez_escobar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F719582%2F2d700dae-2335-4c2f-9a32-4435184a4f4f.jpeg</url>
      <title>DEV Community: Romina Elena Mendez Escobar</title>
      <link>https://dev.to/r_elena_mendez_escobar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/r_elena_mendez_escobar"/>
    <language>en</language>
    <item>
      <title>More women in Tech. Fewer women leading</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:41:07 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/more-women-in-tech-fewer-women-leading-5f9e</link>
      <guid>https://dev.to/r_elena_mendez_escobar/more-women-in-tech-fewer-women-leading-5f9e</guid>
      <description>&lt;p&gt;Every March invites me to pause, and on a personal level, it’s a moment to acknowledge the progress made toward equality, but also to reflect honestly on the challenges that still remain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8238lzz0pm8nq5dm01jm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8238lzz0pm8nq5dm01jm.png" alt=" " width="780" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In recent years, we have seen encouraging signs: more women are pursuing careers in technology, science, and data. At the same time, initiatives to promote diversity within organizations have grown, along with conversations around female leadership and inclusion programs across the sector.&lt;/p&gt;

&lt;p&gt;However, when we look at who occupies decision-making roles in technology (who leads teams, defines strategy, or drives innovation) the reality still reflects an uneven path.&lt;/p&gt;

&lt;p&gt;From my experience working in IT, one question keeps coming up: if more women are studying STEM fields (science, technology, engineering, and mathematics) and developing technical skills, why is it still so difficult to see them in technical leadership roles?&lt;/p&gt;

&lt;p&gt;With that question in mind, I reviewed several recent reports and what I found is that there is no single cause, but rather a combination of structural and cultural factors that reinforce one another.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding them together is key to explaining why progress remains so slow.&lt;/strong&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  A persistent gap: the numbers behind the reality
&lt;/h1&gt;

&lt;p&gt;To frame the conversation, it is worth starting with a few recent data points: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(1)&lt;/strong&gt; Globally, women represent around 50% of the working-age population, yet they hold only 40% of total employment and approximately 35.4% of management positions, according to the International Labour Organization &lt;strong&gt;[1]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(2)&lt;/strong&gt; Within the technology sector, the situation is even more pronounced. In Europe, women account for fewer than one in five tech workers &lt;strong&gt;[6]&lt;/strong&gt;, and according to McKinsey’s analysis, their presence in core technical roles has not only failed to improve over time but has actually declined: from 22% in earlier reports to approximately 19% in more recent ones. This suggests that, rather than closing, the gap may in fact be widening &lt;strong&gt;[2]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(3)&lt;/strong&gt; At the highest levels, the numbers are equally telling. In 2025, women lead just 11% of Fortune 500 companies, compared to 10.4% the previous year &lt;strong&gt;[3]&lt;/strong&gt;, a modest increase that, in perspective, highlights the slow pace of progress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(4)&lt;/strong&gt; According to the 2025 Women’s Power Gap report, of the 64 new CEOs appointed in the S&amp;amp;P 500 in 2024, only 11 were women (17% of the total), and none were founders of the companies they were set to lead &lt;strong&gt;[4]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(5)&lt;/strong&gt; The gender pay gap adds another layer to this picture: in the European Union, women earn on average around 12% less than men &lt;strong&gt;[5]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;〰️ &lt;strong&gt;(6)&lt;/strong&gt; Of the female leaders surveyed, 82% say they have had to change companies at least once in order to take the next step in their careers &lt;strong&gt;[9]&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These figures describe the outcome, but not the process. To understand why this situation persists, we need to look inside organizations and examine the mechanisms shaping women’s career progression.&lt;/p&gt;




&lt;h1&gt;
  
  
  The broken rung: when careers start at a disadvantage
&lt;/h1&gt;

&lt;p&gt;One of the most useful concepts for explaining this gap is the &lt;strong&gt;broken rung&lt;/strong&gt;. The metaphor is precise: it is not a glass ceiling preventing women from reaching the top, but a damaged rung at the very bottom of the ladder that makes it harder for many women to take their first step into leadership.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9isf5azz7typ9w0zhya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9isf5azz7typ9w0zhya.png" alt=" " width="780" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to a McKinsey study conducted in the United States, for every 100 men promoted to their first management role, only around 80 women achieve the same advancement [2]. At first glance, this may seem like a small difference, but its consequences compound over time. If fewer women reach the first step of leadership, there will also be fewer candidates at the next level, and even fewer at the level above.&lt;/p&gt;

&lt;p&gt;With each promotion, the starting pool shrinks, and female representation gradually diminishes as one moves up the hierarchy.&lt;/p&gt;
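&lt;p&gt;A rough way to see this compounding is a small pipeline model. The sketch below is purely illustrative and rests on my own simplifying assumption: it applies the 100:80 promotion ratio at every rung, whereas McKinsey reports that figure for the first management promotion only.&lt;/p&gt;

```python
# Hypothetical pipeline model: 80 women promoted for every 100 men at each rung.
# Applying the ratio at every level is an illustrative assumption; McKinsey
# reports the 100:80 figure for the first management promotion.

def share_of_women(rungs, ratio=0.80):
    """Percentage of women at each leadership level, starting from parity."""
    women, men = 100.0, 100.0
    shares = []
    for _ in range(rungs):
        women *= ratio  # the broken rung: fewer women promoted than men
        shares.append(round(100 * women / (women + men), 1))
    return shares

print(share_of_women(3))  # parity at entry, then roughly 44%, 39%, 34%
```

&lt;p&gt;Even a modest per-step gap, repeated over a few promotion cycles, pushes representation well below parity at senior levels.&lt;/p&gt;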

&lt;p&gt;This cascading effect largely explains why executive levels in technology companies show such limited representation. The problem is not the final barrier before roles such as CEO or CTO; it lies in the initial moment when decisions are made about who takes on early management and leadership responsibilities and who does not.&lt;/p&gt;

&lt;p&gt;At this point, it is worth adding another insight highlighted by McKinsey: 49% of women in the European technology sector reported experiencing sexism or bias in the past year, and 82% said they feel the need to prove their competence more than their male peers in order to be recognized [2].&lt;/p&gt;

&lt;p&gt;These are not just individual experiences; they are indicators of an environment where the standards of evaluation are not the same for everyone, and where promotion decisions may be influenced by different expectations based on gender.&lt;/p&gt;




&lt;h1&gt;
  
  
  Invisible work: tasks that consume time without building careers
&lt;/h1&gt;

&lt;p&gt;Alongside the broken rung, there is a second mechanism that operates more quietly but just as effectively: &lt;code&gt;non-promotable work&lt;/code&gt;. This refers to all the tasks that are necessary for the day-to-day functioning of organizations but are not recognized in performance evaluations nor contribute to career advancement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosxiypkj6fga72mtou83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fosxiypkj6fga72mtou83.png" alt=" " width="780" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The list is familiar to anyone who has worked in an organization: taking meeting notes, organizing team events, coordinating onboarding logistics for new hires, managing recognition initiatives or gifts, or participating in committees that have no direct impact on the business. These tasks are essential, yet they are not reflected in any performance metric and, when it comes to evaluating promotions, they simply do not count.&lt;/p&gt;

&lt;p&gt;The issue is not only that these tasks go unrecognized, but also that they are not distributed equitably. According to an analysis published by The Guardian in 2022 [7], women tend to take on these responsibilities more frequently. This results in less time available for strategic projects and reduced visibility within the organization. In some cases, this difference can amount to nearly a month of work per year spent on tasks that do not contribute to professional growth, compared to their male counterparts.&lt;/p&gt;

&lt;p&gt;Over time, this pattern not only limits individual development but also structurally reinforces the gap in access to leadership roles.&lt;/p&gt;




&lt;h1&gt;
  
  
  Learning to stay relevant: the challenge of continuous upskilling
&lt;/h1&gt;

&lt;p&gt;In this context, one of the most important responses is continuous reskilling and upskilling: the ability to learn new skills and adapt to ongoing market transformations. Developing capabilities in areas such as AI, data, cloud computing, infrastructure, DevOps, and security will be critical in the coming years for those who want to remain relevant and grow professionally.&lt;/p&gt;

&lt;p&gt;However, technical training, while necessary, is not sufficient on its own. It is equally essential to develop a deep understanding of the industries where technology is applied: understanding the real challenges organizations face, identifying the most appropriate solutions for each context, and being able to design realistic implementation paths. In this sense, training in project management, agile methodologies, and research and development practices is not an optional complement, but a core component of the professional profile the market will demand.&lt;/p&gt;

&lt;p&gt;As Meirav Oren, CEO and co-founder of Versatile, noted during the World Economic Forum:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3j2xr23mvhu7jvu0785.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3j2xr23mvhu7jvu0785.png" alt=" " width="780" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This insight points to a well-documented phenomenon: many women tend to apply for new positions only when they feel they meet all the requirements, whereas men often apply when they meet only part of them. This is not a difference in capability, but rather a reflection of how the environment has shaped confidence and risk perception.&lt;/p&gt;

&lt;p&gt;For this reason, fostering environments where women can take on challenges, learn through the process, and make their work visible is just as important as any technical training program.&lt;/p&gt;




&lt;h1&gt;
  
  
  Systemic barriers in transition: the added impact of AI
&lt;/h1&gt;

&lt;p&gt;When viewed together, what emerges is not a list of isolated issues, but a system of barriers that reinforce one another. The broken rung reduces, from the outset, the number of women who enter leadership, while non-promotable work consumes the time and energy that could otherwise be invested in building visibility and career progression.&lt;/p&gt;

&lt;p&gt;And to this already complex system, we must now add a new and accelerating force: artificial intelligence.&lt;/p&gt;

&lt;p&gt;AI is redefining skills, roles, and organizational dynamics. As new opportunities emerge, others evolve or transform at an increasing pace.&lt;/p&gt;

&lt;p&gt;However, this transformation also presents a specific challenge for women's participation in technology. In many teams, women have historically had stronger representation in areas such as design, user experience, and product management. According to McKinsey, &lt;strong&gt;women represent approximately 53% of design roles and 39% of product management positions [2]&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These same areas are among those most affected by the adoption of AI-driven tools. In particular, early-career roles are already showing signs of decline, &lt;strong&gt;with a 3% decrease in design and a 2% decrease in product roles between 2024 and 2025 [2]&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This does not mean these roles will disappear, but rather that they are evolving rapidly and demanding new technical and strategic capabilities. Entry-level profiles, in particular, face greater challenges, as they require structured support, continuous learning, and real opportunities to adapt.&lt;/p&gt;

&lt;p&gt;In this context, the risk is not technological but structural: if women do not have equitable access to reskilling, upskilling, and leadership opportunities within these transformations, the gap may widen even further in the coming years.&lt;/p&gt;

&lt;p&gt;None of these dynamics operate in isolation. Rather, it is their combination that explains why, despite the growing number of women entering the technology sector, representation in leadership roles remains so limited.&lt;/p&gt;

&lt;p&gt;And precisely because the problem is systemic, the solutions must be as well.&lt;/p&gt;




&lt;h1&gt;
  
  
  Building the future of technology is also a matter of diversity
&lt;/h1&gt;

&lt;p&gt;Technological progress opens up enormous opportunities for society, but it also raises a question we cannot ignore: who is designing the systems we will use in the future?&lt;/p&gt;

&lt;p&gt;Algorithms, digital platforms, and artificial intelligence systems are not neutral. They are shaped by the decisions, experiences, and contexts of those who build them.&lt;/p&gt;

&lt;p&gt;In software architecture, there is a principle known as &lt;strong&gt;Conway’s Law&lt;/strong&gt;, which states that organizations design systems that mirror their communication structures. Applied to diversity, this means that if technology teams are not diverse — or if communication is hierarchical and limited — those same constraints may be reflected in the solutions we create.&lt;/p&gt;

&lt;p&gt;This is not only a matter of equality, but also of innovation, social impact, and the quality of the technology we bring into the world. Diverse teams make better decisions, consider more perspectives, and ultimately build more robust solutions.&lt;/p&gt;

&lt;p&gt;March 8 serves as a reminder that, although progress has been made, the path toward equitable participation in technology leadership remains unfinished. And this challenge does not belong to a single day or a single sector: it is an ongoing, shared responsibility.&lt;/p&gt;

&lt;p&gt;Promoting inclusion, supporting the professional development of women in technology, and creating real pathways to leadership are not just goals. They are ways of building teams where different perspectives can coexist and enrich the decisions we shape — now more than ever — in technology.&lt;/p&gt;

&lt;p&gt;Because the future of technology will not only be defined by what we build... but by who is given the opportunity to build it.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[1]&lt;/strong&gt; Deloitte. (n.d.). Women at work: Global outlook. &lt;a href="https://www.deloitte.com/global/en/issues/work/content/women-at-work-global-outlook.html" rel="noopener noreferrer"&gt;https://www.deloitte.com/global/en/issues/work/content/women-at-work-global-outlook.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[2]&lt;/strong&gt; McKinsey &amp;amp; Company. (n.d.). Women in tech and AI in Europe: Can the region close its gender gap? &lt;a href="https://www.mckinsey.com/capabilities/mckinsey-technology/our-insights/women-in-tech-and-ai-in-europe-can-the-region-close-its-gender-gap#/" rel="noopener noreferrer"&gt;https://www.mckinsey.com/capabilities/mckinsey-technology/our-insights/women-in-tech-and-ai-in-europe-can-the-region-close-its-gender-gap#/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[3]&lt;/strong&gt; Fortune. (2025, June 2). Fortune 500 female CEOs 2025. &lt;a href="https://fortune.com/2025/06/02/fortune-500-female-ceos-2025/" rel="noopener noreferrer"&gt;https://fortune.com/2025/06/02/fortune-500-female-ceos-2025/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[4]&lt;/strong&gt; Women’s Power Gap. (2025). CEO report 2025. &lt;a href="https://www.womenspowergap.org/wp-content/uploads/2025/05/WPG_CEO-Report_2025.pdf" rel="noopener noreferrer"&gt;https://www.womenspowergap.org/wp-content/uploads/2025/05/WPG_CEO-Report_2025.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[5]&lt;/strong&gt; Council of the European Union. (n.d.). The EU’s gender pay gap: Facts and figures. &lt;a href="https://www.consilium.europa.eu/en/policies/the-eu-s-gender-pay-gap-facts-and-figures/" rel="noopener noreferrer"&gt;https://www.consilium.europa.eu/en/policies/the-eu-s-gender-pay-gap-facts-and-figures/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[6]&lt;/strong&gt; Euronews. (2026, March 8). Why women are disappearing from Europe’s tech workforce. &lt;a href="https://www.euronews.com/next/2026/03/08/why-women-are-disappearing-from-europes-tech-workforce" rel="noopener noreferrer"&gt;https://www.euronews.com/next/2026/03/08/why-women-are-disappearing-from-europes-tech-workforce&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[7]&lt;/strong&gt;  The Guardian. (2022, May 9). They feel guilty: Why women should say no to office housework. &lt;a href="https://www.theguardian.com/society/2022/may/09/they-feel-guilty-why-women-should-say-no-to-office-housework" rel="noopener noreferrer"&gt;https://www.theguardian.com/society/2022/may/09/they-feel-guilty-why-women-should-say-no-to-office-housework&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[8]&lt;/strong&gt; World Economic Forum. (2025, June). What to know about AI and the gender gap. &lt;a href="https://www.weforum.org/stories/2025/06/amnc25-what-to-know-about-ai-and-the-gender-gap/" rel="noopener noreferrer"&gt;https://www.weforum.org/stories/2025/06/amnc25-what-to-know-about-ai-and-the-gender-gap/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;[9]&lt;/strong&gt; KPMG. (2025). Global female leaders outlook 2025. &lt;a href="https://assets.kpmg.com/content/dam/kpmgsites/pt/pdf/kpmg-global-female-leaders-outlook-2025.pdf.coredownload.inline.pdf" rel="noopener noreferrer"&gt;https://assets.kpmg.com/content/dam/kpmgsites/pt/pdf/kpmg-global-female-leaders-outlook-2025.pdf.coredownload.inline.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>womenintech</category>
      <category>inclusion</category>
      <category>career</category>
    </item>
    <item>
      <title>AI in healthcare: how OpenAI is transforming medical care</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 19 Jan 2026 10:17:37 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/ai-in-healthcare-how-openai-is-transforming-medical-care-ffn</link>
      <guid>https://dev.to/r_elena_mendez_escobar/ai-in-healthcare-how-openai-is-transforming-medical-care-ffn</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Artificial intelligence is increasingly being adopted in highly regulated industries, and &lt;strong&gt;healthcare&lt;/strong&gt; is a clear example of how this technology can improve processes, access to information, and the quality of care.&lt;/p&gt;

&lt;p&gt;According to OpenAI’s latest product announcements, more than &lt;strong&gt;230 million people worldwide&lt;/strong&gt; use ChatGPT every week to ask questions related to health and wellbeing. This growing adoption reflects a broader shift in how individuals and professionals seek medical information and support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspeqkq4wi310dqdf73vf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fspeqkq4wi310dqdf73vf.png" alt=" " width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Healthcare systems face significant challenges: clinical staff are often overwhelmed, medical knowledge is highly fragmented, and administrative complexity continues to grow. &lt;strong&gt;AI is beginning to address these issues&lt;/strong&gt; by supporting decision-making, reducing operational burdens, and making medical information more accessible.&lt;/p&gt;

&lt;p&gt;This month, OpenAI introduced new &lt;strong&gt;healthcare-focused capabilities&lt;/strong&gt; designed to support both medical professionals and patients. These services aim to bring trusted information and care-related workflows closer to people, while prioritizing &lt;strong&gt;security, compliance, and responsible use&lt;/strong&gt; in one of the most sensitive and regulated industries.&lt;/p&gt;




&lt;h1&gt;
  
  
  OpenAI for Healthcare: Operationalizing AI in Healthcare Organizations
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;OpenAI for Healthcare&lt;/strong&gt; is specifically designed for healthcare organizations such as hospitals, research centers, clinic networks, and integrated health systems. Its primary goal is to provide a &lt;strong&gt;secure, enterprise-grade platform&lt;/strong&gt; that enables these institutions to deliver more consistent, high-quality care, while reducing the administrative burden that consumes a significant amount of clinicians’ time.&lt;/p&gt;

&lt;p&gt;One of the platform’s most distinctive capabilities is its &lt;strong&gt;evidence retrieval with clear citations&lt;/strong&gt;. Responses are grounded in trusted medical sources, including millions of peer-reviewed studies, public health guidelines, and up-to-date clinical directives. This allows healthcare professionals to verify information more easily and support clinical decisions with &lt;strong&gt;reliable, evidence-based insights&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Another particularly valuable feature is the use of &lt;strong&gt;reusable templates to streamline workflows&lt;/strong&gt;. These shared templates support common tasks such as drafting discharge summaries, patient instructions, clinical letters, and prior authorization requests. As a result, clinical teams spend less time rewriting repetitive documentation and searching for information, while patients benefit from clearer guidance and smoother transitions of care.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93kdyc81rgb2gbqbmeyr.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93kdyc81rgb2gbqbmeyr.webp" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;image source: &lt;a href="https://openai.com/es-419/index/openai-for-healthcare/" rel="noopener noreferrer"&gt;https://openai.com/es-419/index/openai-for-healthcare/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Below is an overview of the main capabilities offered by this solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk2r8aogpvf5pymucp3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk2r8aogpvf5pymucp3j.png" alt=" " width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  ChatGPT Health: A Smarter Way to Understand Your Health
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT Health&lt;/strong&gt; is designed for individual users who want to better understand their own health and navigate a complex healthcare system. Health is already one of the most popular topics on ChatGPT, with users turning to it every week with questions about health and wellbeing.&lt;/p&gt;

&lt;p&gt;Users can securely &lt;strong&gt;connect personal health data from multiple sources&lt;/strong&gt;, including electronic health records and wellness apps. They can also &lt;strong&gt;upload their own documents or images&lt;/strong&gt;, such as lab results or medical reports. This centralization allows ChatGPT Health to provide &lt;strong&gt;more relevant, personalized responses&lt;/strong&gt;, helping users interpret information, summarize results, and prepare for appointments.&lt;/p&gt;

&lt;p&gt;The tool is designed for practical, everyday use. It can help users review lab results, prepare questions for medical visits, provide guidance on diet, exercise, or wellness routines, and support understanding of insurance options based on personal health habits. It also includes features like voice input, dictation, and advanced search, making the experience more accessible and tailored to individual needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehz5mbi4t2uzotwpe102.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehz5mbi4t2uzotwpe102.webp" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;image source: &lt;a href="https://openai.com/es-ES/index/introducing-chatgpt-health/" rel="noopener noreferrer"&gt;https://openai.com/es-ES/index/introducing-chatgpt-health/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Below is an overview of the types of data sources users can integrate with ChatGPT Health.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1wlevgx4jwf3w10z7a5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1wlevgx4jwf3w10z7a5.png" alt=" " width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Comparative Overview
&lt;/h1&gt;

&lt;p&gt;A side-by-side look at how OpenAI for Healthcare and ChatGPT Health support clinical teams and individual users with AI-driven health insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotitpi3t2g888as5cdyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotitpi3t2g888as5cdyy.png" alt=" " width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The introduction of AI in healthcare is showing &lt;strong&gt;real potential&lt;/strong&gt;, not only as a tool to support clinical workflows, but also as a way to provide reliable information and guidance to people who may not have easy access to specialized care. &lt;strong&gt;OpenAI for Healthcare and ChatGPT Health&lt;/strong&gt; represent a major step forward in applying AI to one of the most regulated and sensitive industries.&lt;/p&gt;

&lt;p&gt;Currently, these tools are &lt;strong&gt;limited in availability&lt;/strong&gt;: OpenAI for Healthcare serves select institutions, and ChatGPT Health operates through a waitlist. How and when these solutions expand to smaller clinics, rural areas, or other countries will be key in determining their ability to truly democratize access to high-quality health support.&lt;/p&gt;

&lt;p&gt;Healthcare is constantly evolving, with &lt;strong&gt;new scientific evidence, clinical guidelines, and regulatory updates&lt;/strong&gt;. AI solutions like these can help by keeping pace with these changes, providing relevant and accurate information over time.&lt;/p&gt;

&lt;p&gt;While AI will not &lt;strong&gt;replace healthcare professionals&lt;/strong&gt;, these tools offer opportunities to reduce administrative burdens, improve efficiency, and empower both clinicians and patients with personalized insights. By making healthcare more accessible, understandable, and responsive, AI can &lt;strong&gt;complement human care&lt;/strong&gt;, helping to achieve better outcomes while supporting professionals rather than replacing them.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚References
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;OpenAI. (2025)&lt;/strong&gt;. OpenAI for Healthcare. OpenAI. &lt;a href="https://openai.com/es-419/index/openai-for-healthcare/" rel="noopener noreferrer"&gt;https://openai.com/es-419/index/openai-for-healthcare/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI. (2025)&lt;/strong&gt;. Introducing ChatGPT Health. OpenAI. &lt;a href="https://openai.com/es-ES/index/introducing-chatgpt-health" rel="noopener noreferrer"&gt;https://openai.com/es-ES/index/introducing-chatgpt-health&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  📌 How to cite this article
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;APA style&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Mendez Escobar, Romina Elena. (2025). &lt;strong&gt;AI in healthcare: how OpenAI is transforming medical care&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/r_elena_mendez_escobar/ai-in-healthcare-how-openai-is-transforming-medical-care-ffn"&gt;https://dev.to/r_elena_mendez_escobar/ai-in-healthcare-how-openai-is-transforming-medical-care-ffn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTeX&lt;/strong&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
text
@article{mendez2025aihealthcare,
  title  = {AI in healthcare: how OpenAI is transforming medical care},
  author = {Mendez Escobar, Romina Elena},
  year   = {2025},
  url    = {https://dev.to/r_elena_mendez_escobar/ai-in-healthcare-how-openai-is-transforming-medical-care-ffn}
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>openai</category>
      <category>ai</category>
      <category>data</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>TOON vs JSON for LLM Prompts: Can We Reduce Token Usage Without Losing Response Quality?</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 05 Jan 2026 08:41:45 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/toon-vs-json-for-llm-prompts-can-we-reduce-token-usage-without-losing-response-quality-59ed</link>
      <guid>https://dev.to/r_elena_mendez_escobar/toon-vs-json-for-llm-prompts-can-we-reduce-token-usage-without-losing-response-quality-59ed</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Over the past months, I came across several articles claiming that &lt;strong&gt;TOON&lt;/strong&gt; can significantly reduce token usage in LLM prompts compared to traditional &lt;strong&gt;JSON&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmwoxyu8jbloeqvnasc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmwoxyu8jbloeqvnasc0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That raised a few questions for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does TOON still provide benefits with &lt;strong&gt;real-world API responses?&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;How much does it actually reduce tokens?&lt;/li&gt;
&lt;li&gt;And more importantly: &lt;strong&gt;does changing the format affect how an LLM interprets the data or the quality of the response?&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Answering these questions isn’t simple, and the results can vary depending on the dataset, the structure of the data, and even the LLM itself. It’s also not simply a matter of counting tokens: different formats may influence how the model understands and processes the information.&lt;/p&gt;

&lt;p&gt;In this article, I aim to run a practical benchmark to explore whether TOON could be useful in production pipelines, in what contexts it performs best, and whether it works well across different types of JSON.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article walks through the experiment, the results, and the conclusions.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  What Is TOON (and How Is It Different from JSON)?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;TOON (Token-Oriented Object Notation)&lt;/strong&gt; is a data serialization format designed specifically for LLM prompts. The goal is simple: reduce syntactic overhead while remaining readable for both humans and machines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fqktut0uee255noit04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fqktut0uee255noit04.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
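&lt;p&gt;To make the difference concrete, here is a small sketch comparing the same records in both formats. The TOON string below is hand-written from the format’s documented tabular syntax; the &lt;code&gt;toon-format&lt;/code&gt; library’s exact output may differ in minor details such as indentation:&lt;/p&gt;

```python
import json

# Hypothetical sample data: a flat, repetitive list of records.
articles = [
    {"article": "Main_Page", "views": 473},
    {"article": "Python", "views": 311},
]

json_text = json.dumps(articles)

# A hand-written TOON rendering of the same data (a sketch of the
# format's tabular array syntax, not library output).
toon_text = (
    "[2]{article,views}:\n"
    "  Main_Page,473\n"
    "  Python,311"
)

# TOON declares the field names once in a header, so the payload is shorter.
print(len(json_text), len(toon_text))
```

&lt;p&gt;The field names appear once in the TOON header instead of once per record, which is where most of the savings come from.&lt;/p&gt;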




&lt;h1&gt;
  
  
  The Experiment
&lt;/h1&gt;

&lt;p&gt;This experiment evaluates whether alternative data serialization formats can reduce token usage in LLM prompts without degrading response quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08j9ix49vdatle7whlcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08j9ix49vdatle7whlcz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The experiment follows four main stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Dataset Fetching&lt;/strong&gt;: Data is retrieved from public APIs and prepared for downstream processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Benchmarking&lt;/strong&gt;: Each dataset is encoded in JSON and TOON, and token counts are computed using a tokenizer to measure size differences across formats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Interaction:&lt;/strong&gt; The serialized data is sent to an LLM via Amazon Bedrock to generate responses and embeddings under deterministic settings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Evaluation:&lt;/strong&gt; Outputs generated from JSON and TOON prompts are compared using semantic (cosine similarity) and lexical (ROUGE, BLEU) metrics to assess equivalence.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is not to optimize prompt content, but to isolate the impact of serialization format on token efficiency and response consistency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Datasets
&lt;/h2&gt;

&lt;p&gt;In this experiment, I wanted to test TOON with &lt;strong&gt;realistic, publicly available data&lt;/strong&gt;, rather than small, manually created datasets. Using real API responses allows us to see how token savings and LLM behavior hold up in practical scenarios.&lt;br&gt;
I selected &lt;strong&gt;two public APIs with very different characteristics&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Events API&lt;/strong&gt;: Returns a &lt;strong&gt;stream of recent public events on GitHub&lt;/strong&gt;, such as pushes, pull requests, issues, and comments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔗 URL&lt;/strong&gt;: &lt;a href="https://api.github.com/events" rel="noopener noreferrer"&gt;https://api.github.com/events&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧩 Data structure&lt;/strong&gt;: Deeply nested, heterogeneous objects with multiple levels of dictionaries and arrays.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;💡 Why this matters&lt;/strong&gt;: Represents the kind of &lt;strong&gt;complex operational API&lt;/strong&gt; data you might send to an LLM in real projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Wikipedia Page Views API&lt;/strong&gt;: Returns the &lt;strong&gt;top-viewed articles on English Wikipedia&lt;/strong&gt; for a given day.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔗 URL&lt;/strong&gt;: &lt;a href="https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2024/01/01" rel="noopener noreferrer"&gt;https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2024/01/01&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧩 Data structure&lt;/strong&gt;: Flat, repetitive lists of articles, each with numeric metrics (title, views, category).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;💡 Why this matters&lt;/strong&gt;: Ideal for testing TOON’s efficiency with &lt;strong&gt;flat, repetitive data&lt;/strong&gt;, where token savings are expected to be highest.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using these two APIs allows us to evaluate TOON in both &lt;strong&gt;complex nested&lt;/strong&gt; and &lt;strong&gt;flat list&lt;/strong&gt; scenarios, giving a more comprehensive view of its performance in real-world LLM prompts.&lt;/p&gt;


&lt;h2&gt;
  
  
  Fetching the Data
&lt;/h2&gt;

&lt;p&gt;To extract data from these APIs, I created the following utility class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DatasetFetcher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch datasets from different sources&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_github_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch recent GitHub events&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.github.com/events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_wikipedia_pages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch popular Wikipedia pages&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOON-Benchmark/1.0 (Research)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2024/01/01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;This class allows you to quickly fetch &lt;strong&gt;sample datasets&lt;/strong&gt; for testing token efficiency with TOON and JSON formats.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h1&gt;
  
  
  Part 1: Token Reduction
&lt;/h1&gt;
&lt;h2&gt;
  
  
  🏗️ Methodology
&lt;/h2&gt;

&lt;p&gt;To measure token usage, I used &lt;a href="https://pypi.org/project/tiktoken/" rel="noopener noreferrer"&gt;tiktoken&lt;/a&gt;, the same tokenizer employed by many OpenAI-compatible models. This allows us to estimate how many tokens are consumed by the &lt;strong&gt;prompt payload itself&lt;/strong&gt;, independent of the model’s output.&lt;br&gt;
For TOON generation, I used the &lt;strong&gt;toon-format&lt;/strong&gt; library, which converts Python objects into TOON while preserving structure and ordering.&lt;br&gt;
The following &lt;strong&gt;classes&lt;/strong&gt; implement token counting and incremental benchmarking using these libraries:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenCounter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count tokens using tiktoken&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encoding_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count tokens in a text string&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This class allows you to quickly count tokens in any string, whether it’s JSON, TOON, or plain text.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenBenchmark&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Benchmark token reduction: JSON vs TOON&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BenchmarkConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TokenCounter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;incremental_benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Perform incremental benchmark comparing JSON vs TOON

        Args:
            data_list: List of objects to analyze
            dataset_name: Name of dataset for identification

        Returns:
            DataFrame with benchmark results
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;accum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;accum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Encode in both formats
&lt;/span&gt;            &lt;span class="n"&gt;json_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;toon_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;toon_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Count tokens
&lt;/span&gt;            &lt;span class="n"&gt;json_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;toon_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toon_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Calculate reduction
&lt;/span&gt;            &lt;span class="n"&gt;saved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;toon_tokens&lt;/span&gt;
            &lt;span class="n"&gt;reduction_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;saved&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;json_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;json_tokens&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;num_items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSON_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOON_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;toon_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens_saved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;saved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reduction_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reduction_pct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dataset_name&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;These &lt;strong&gt;classes&lt;/strong&gt; allow us to &lt;strong&gt;incrementally benchmark token usage&lt;/strong&gt;, providing a detailed view of how much TOON reduces tokens compared to JSON as items accumulate in a prompt.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🧪 Results: Token Reduction Metrics
&lt;/h2&gt;

&lt;p&gt;In the code available in the repository, you can see the classes used to compute these results.&lt;br&gt;&lt;br&gt;
What stands out, however, is that token reduction is &lt;strong&gt;not uniform across datasets&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;dataset&lt;/th&gt;
&lt;th&gt;mean reduction (%)&lt;/th&gt;
&lt;th&gt;std (%)&lt;/th&gt;
&lt;th&gt;min (%)&lt;/th&gt;
&lt;th&gt;max (%)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;github_events&lt;/td&gt;
&lt;td&gt;2.77&lt;/td&gt;
&lt;td&gt;0.26&lt;/td&gt;
&lt;td&gt;2.60&lt;/td&gt;
&lt;td&gt;4.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;wikipedia_pages&lt;/td&gt;
&lt;td&gt;42.61&lt;/td&gt;
&lt;td&gt;6.66&lt;/td&gt;
&lt;td&gt;13.64&lt;/td&gt;
&lt;td&gt;46.70&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Events (complex, nested data) - Average token reduction:&lt;/strong&gt; ~3%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wikipedia Pages (flat, repetitive data) - Average token reduction:&lt;/strong&gt; ~43%&lt;/li&gt;
&lt;/ul&gt;
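&lt;p&gt;To put these percentages in perspective, a quick back-of-the-envelope projection (illustrative numbers only, not new measurements) shows what the mean reductions from the table imply for a hypothetical 10,000-token JSON prompt:&lt;/p&gt;

```python
# Apply the mean reduction percentages from the table above to a
# hypothetical 10,000-token JSON prompt.
mean_reduction_pct = {"github_events": 2.77, "wikipedia_pages": 42.61}

json_tokens = 10_000
projected = {
    dataset: round(json_tokens * (1 - pct / 100))
    for dataset, pct in mean_reduction_pct.items()
}

# Nested data barely shrinks; flat, repetitive data shrinks a lot.
print(projected)
```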


&lt;h3&gt;
  
  
  💡 Why the difference?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;GitHub Events&lt;/strong&gt;, the reduction is only ~3%, which means that using TOON instead of JSON &lt;strong&gt;does not significantly reduce token usage&lt;/strong&gt;. The reason is that &lt;strong&gt;deep nesting and heterogeneous keys limit how much syntactic overhead can be removed&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Wikipedia Pages&lt;/strong&gt;, the reduction is ~43% because &lt;strong&gt;flat, repetitive lists benefit greatly from removing braces, commas, and repeated field names&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
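&lt;p&gt;A quick way to see why repeated field names dominate the savings is to measure how much of a flat JSON payload is spent on the keys alone. This is a stdlib sketch over hypothetical records shaped like the Wikipedia pageviews data, not output from the benchmark itself:&lt;/p&gt;

```python
import json

# Hypothetical flat, repetitive records: every object repeats the
# same two field names, much like the Wikipedia pageviews payload.
records = [{"title": f"Article_{i}", "views": 100 + i} for i in range(50)]
json_text = json.dumps(records)

# Characters spent on the quoted field names alone, across all records.
key_overhead = sum(
    json_text.count(f'"{key}"') * (len(key) + 2) for key in ("title", "views")
)
share = key_overhead / len(json_text)

# TOON's tabular arrays state each field name once, so this entire
# per-record overhead collapses into a single header line.
print(f"{share:.0%} of the payload is repeated key names")
```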


&lt;h1&gt;
  
  
  Part 2: Does Response Quality Stay the Same?
&lt;/h1&gt;

&lt;p&gt;The second experiment focuses on &lt;strong&gt;response quality&lt;/strong&gt;: the goal is to verify whether using the &lt;strong&gt;same prompt&lt;/strong&gt;, but providing the data encoded in &lt;strong&gt;JSON&lt;/strong&gt; versus &lt;strong&gt;TOON&lt;/strong&gt;, produces equivalent outputs from the LLM.&lt;br&gt;
For this experiment, I used the &lt;strong&gt;Wikipedia dataset&lt;/strong&gt;, since it showed the highest token reduction (~43%). This makes it an ideal candidate to evaluate whether aggressive token savings have any negative impact on output quality.&lt;br&gt;
To compare the responses, I generated outputs using both formats and evaluated them using several &lt;strong&gt;text similarity metrics&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  🧪 Results: Evaluation Metrics
&lt;/h2&gt;

&lt;p&gt;To assess output quality, I used the following metrics, each capturing a different aspect of similarity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7d5bmoorq3lz77gze66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7d5bmoorq3lz77gze66.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
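&lt;p&gt;Among these, cosine similarity is the semantic metric: it compares the embedding vectors of the two responses. Here is a minimal stdlib sketch using toy 3-dimensional vectors as stand-ins (real comparisons use much higher-dimensional Titan embedding vectors):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of the two responses.
emb_json_response = [0.80, 0.10, 0.30]
emb_toon_response = [0.79, 0.12, 0.31]

# Values close to 1.0 indicate the responses are semantically equivalent.
print(round(cosine_similarity(emb_json_response, emb_toon_response), 4))
```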


&lt;h2&gt;
  
  
  LLM and Embeddings Setup (AWS Bedrock)
&lt;/h2&gt;

&lt;p&gt;All responses and embeddings were generated using AWS Bedrock, Amazon’s fully managed service for accessing foundation models.&lt;br&gt;
The following models were used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;⚡ Amazon Nova Lite (amazon.nova-lite-v1:0)&lt;/strong&gt;: A lightweight, cost-efficient LLM optimized for fast inference. In this experiment, it was used for prompt completion and response generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡ Amazon Titan Embeddings (amazon.titan-embed-text-v2:0):&lt;/strong&gt; A text embedding model that converts text into high-dimensional vectors. It was used to generate vector representations of the responses for semantic similarity comparison.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Bedrock Client Implementation
&lt;/h2&gt;

&lt;p&gt;The following class encapsulates interaction with &lt;strong&gt;AWS Bedrock&lt;/strong&gt; for both &lt;strong&gt;prompt generation and embedding extraction.&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;code&gt;invoke_prompt&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This method sends a prompt to the LLM and returns the generated response.&lt;br&gt;
It accepts the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;💬 prompt&lt;/strong&gt;: The base instruction or question provided to the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📄 dataset&lt;/strong&gt;: The data to analyze, encoded either in &lt;strong&gt;JSON&lt;/strong&gt; or &lt;strong&gt;TOON&lt;/strong&gt;, which is appended to the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🌡️ temperature&lt;/strong&gt;: Controls the randomness of the model’s output.&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;
  
  
  🌡️ Why &lt;code&gt;temperature = 0&lt;/code&gt;?
&lt;/h5&gt;

&lt;p&gt;Setting the temperature to 0 matters for three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It reduces randomness&lt;/strong&gt; in model outputs&lt;/li&gt;
&lt;li&gt;It makes responses &lt;strong&gt;deterministic&lt;/strong&gt; across multiple runs&lt;/li&gt;
&lt;li&gt;It ensures that any differences in the outputs are due to the &lt;strong&gt;input format (JSON vs TOON)&lt;/strong&gt;, not sampling variability&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Without fixing the temperature, it would be impossible to reliably attribute differences in response quality to the serialization format alone.&lt;/p&gt;
&lt;/blockquote&gt;
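&lt;p&gt;As a rough sketch, the request body that &lt;code&gt;invoke_prompt&lt;/code&gt; assembles might look like the following. This assumes the Bedrock Converse-style message schema; the &lt;code&gt;build_request&lt;/code&gt; helper is hypothetical, and the actual implementation in the repository may build its payload differently:&lt;/p&gt;

```python
import json

def build_request(prompt, dataset, temperature=0.0):
    """Sketch of a Converse-style request body (hypothetical helper;
    the article's AWSBedrockClient may assemble its payload differently)."""
    return {
        "modelId": "amazon.nova-lite-v1:0",
        "messages": [
            {"role": "user",
             "content": [{"text": f"{prompt}\n\nData:\n{dataset}"}]},
        ],
        # temperature=0 keeps outputs deterministic, so any difference
        # between runs comes from the input format, not sampling.
        "inferenceConfig": {"temperature": temperature},
    }

request = build_request("Summarize the top articles.", "[1]{article}: Main_Page")
print(json.dumps(request, indent=2))
```

&lt;p&gt;With &lt;code&gt;boto3&lt;/code&gt;, a body like this maps onto the &lt;code&gt;bedrock-runtime&lt;/code&gt; client’s &lt;code&gt;converse&lt;/code&gt; call.&lt;/p&gt;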
&lt;h4&gt;
  
  
  &lt;code&gt;get_embeddings&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This method generates &lt;strong&gt;vector embeddings&lt;/strong&gt; for a given text using the embedding model.&lt;br&gt;
The resulting vectors are later used to compute &lt;strong&gt;cosine similarity&lt;/strong&gt;, allowing us to measure &lt;strong&gt;semantic equivalence&lt;/strong&gt; between responses generated from &lt;code&gt;JSON&lt;/code&gt; and &lt;code&gt;TOON&lt;/code&gt; inputs.&lt;/p&gt;

&lt;p&gt;Overall, these parameters allow us to &lt;strong&gt;control model behavior&lt;/strong&gt; and isolate the impact of input serialization, with &lt;strong&gt;temperature&lt;/strong&gt; being the most important variable for this experiment.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AWSBedrockClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Client to interact with AWS Bedrock&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_prompt&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_embedding&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;aws_secret_access_key&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Invoke model with prompt

        Args:
            prompt: Base prompt
            dataset: Data to analyze (JSON or TOON encoded)
            temperature: 0 = deterministic, higher = more random
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;prompt_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt_final&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inferenceConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_new_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error invoking prompt model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate embeddings for a text&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error generating embeddings: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Experimental Setup
&lt;/h3&gt;

&lt;p&gt;The experiment is based on the following principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Same prompt structure&lt;/strong&gt;, changing only the data serialization format (&lt;strong&gt;JSON vs TOON&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;25 independent runs per format&lt;/strong&gt; to capture variability and compute robust statistics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temperature = 0&lt;/strong&gt; to minimize randomness and ensure deterministic model behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup allows us to isolate the impact of the serialization format on the model’s output.&lt;/p&gt;
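The run protocol itself is a simple loop; a minimal sketch with a stub standing in for the real LLM comparison (the stub's scores are fabricated for illustration, not measured values):

```python
import random
import statistics

random.seed(0)  # reproducible stub

def evaluate_single_run():
    # Stub: in the real experiment this invokes the LLM twice (JSON and TOON)
    # and scores the pair; here we fake a cosine-similarity score.
    return {"cosine_similarity": 0.99 + random.uniform(-0.005, 0.005)}

runs = [evaluate_single_run() for _ in range(25)]
scores = [r["cosine_similarity"] for r in runs]
print(f"mean={statistics.mean(scores):.4f} stdev={statistics.stdev(scores):.5f}")
```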


&lt;h3&gt;
  
  
  Prompt Design
&lt;/h3&gt;

&lt;p&gt;The following is the prompt used for testing. The same prompt is used in every execution; only the data appended to it changes between the &lt;strong&gt;JSON&lt;/strong&gt; and &lt;strong&gt;TOON&lt;/strong&gt; formats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimwknb6os5rkrqerdg4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimwknb6os5rkrqerdg4a.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By concatenating the dataset directly to the prompt, we ensure that the &lt;strong&gt;instruction remains identical&lt;/strong&gt;, and any differences in the response are attributable solely to the input format.&lt;/p&gt;
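To make the size difference concrete, here is a small sketch comparing a JSON payload against a simplified TOON-style tabular encoding (illustrative only — the repository names and the encoder are made up here, not the official TOON library):

```python
import json

rows = [
    {"id": 1, "repo": "pandas", "events": 12},
    {"id": 2, "repo": "numpy", "events": 7},
]

json_payload = json.dumps(rows)

# Simplified TOON-style encoding: declare the fields once, then one row per line.
header = ",".join(rows[0].keys())
body = "\n".join("  " + ",".join(str(v) for v in r.values()) for r in rows)
toon_payload = f"rows[{len(rows)}]{{{header}}}:\n{body}"

print(len(json_payload), "chars as JSON vs", len(toon_payload), "chars as TOON")
```

The savings grow with the number of rows, because JSON repeats every field name per record while the tabular encoding states them once.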


&lt;h2&gt;
  
  
  Evaluation Procedure
&lt;/h2&gt;

&lt;p&gt;To assess response equivalence between &lt;code&gt;JSON&lt;/code&gt; and &lt;code&gt;TOON&lt;/code&gt;, the experiment relies on the &lt;strong&gt;SemanticEvaluator&lt;/strong&gt; class, which encapsulates response generation and similarity evaluation.&lt;br&gt;
At the core of the evaluation is the comparison of &lt;strong&gt;two responses per run&lt;/strong&gt;, generated using the same prompt but different data encodings (JSON vs TOON), with temperature fixed at 0 to ensure deterministic behavior.&lt;br&gt;
The evaluation is structured as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cosine_similarity&lt;/strong&gt; computes semantic similarity between the two responses using embedding vectors generated by Amazon Titan. This metric captures meaning-level equivalence and is insensitive to surface-level wording changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evaluate_single_run&lt;/strong&gt; performs a full comparison for one run. It invokes the &lt;strong&gt;LLM&lt;/strong&gt; twice (&lt;strong&gt;JSON&lt;/strong&gt; and &lt;strong&gt;TOON&lt;/strong&gt;), generates embeddings, and computes cosine similarity along with lexical overlap metrics (&lt;strong&gt;ROUGE-1&lt;/strong&gt;, &lt;strong&gt;ROUGE-2&lt;/strong&gt;, &lt;strong&gt;ROUGE-L&lt;/strong&gt;) and BLEU. The output is a consolidated set of similarity scores for that run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evaluate_multiple_runs&lt;/strong&gt; repeats the single-run evaluation 25 times using the same prompt and dataset. Results from all runs are aggregated into a DataFrame, enabling statistical analysis such as mean values, variance, and stability across runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design allows us to determine whether TOON’s token savings preserve response quality, both semantically and lexically, across multiple deterministic evaluations.&lt;/p&gt;
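For reference, cosine similarity over two embedding vectors reduces to a normalized dot product; a minimal pure-Python sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors -> 0.0
```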


&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;After running &lt;strong&gt;25 deterministic evaluations&lt;/strong&gt; (temperature = 0), the analysis focused exclusively on &lt;strong&gt;response equivalence&lt;/strong&gt;, measuring whether JSON and TOON produce comparable outputs when token savings are significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Equivalence (Cosine Similarity ≈ 0.991)&lt;/strong&gt;&lt;br&gt;
The most important signal comes from &lt;strong&gt;cosine similarity&lt;/strong&gt;, computed using embeddings generated by Amazon Titan.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An average score of &lt;strong&gt;0.991&lt;/strong&gt; indicates that, for the LLM, responses generated from TOON-encoded data are &lt;strong&gt;semantically equivalent&lt;/strong&gt; to those generated from JSON.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Despite the removal of structural syntax such as braces, quotes, and repeated field names, the model preserved its ability to reason over the data and extract the same insights.&lt;br&gt;
Across all runs, the meaning of the responses remained consistent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5c0lx4sd6qq687hw6ou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5c0lx4sd6qq687hw6ou.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c9ute9bf3emm3ckozt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c9ute9bf3emm3ckozt0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Lexical Variability vs. Data Accuracy
&lt;/h2&gt;

&lt;p&gt;Lexical similarity metrics such as ROUGE-1 and BLEU report lower absolute values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ROUGE-1 F1&lt;/strong&gt; = 0.747&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROUGE-L F1&lt;/strong&gt; = 0.608&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BLEU&lt;/strong&gt; = 0.563&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These scores indicate a moderate degree of lexical and structural variation between responses generated from JSON and TOON inputs. In particular, ROUGE-1 suggests partial overlap at the word level, while the lower ROUGE-L score highlights differences in sentence structure and ordering, consistent with paraphrasing and reformulation rather than content loss. Similarly, BLEU, which is sensitive to exact n-gram matches and word order, penalizes these variations even when responses remain correct and informative.&lt;/p&gt;
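Why paraphrasing drags these scores down can be seen with a simplified unigram-overlap F1 (a rough stand-in for ROUGE-1, without the stemming or normalization a real implementation applies; the example sentences are invented):

```python
def unigram_f1(reference, candidate):
    """Simplified ROUGE-1-style F1: unigram overlap between two texts."""
    ref, cand = reference.split(), candidate.split()
    overlap = sum(min(ref.count(w), cand.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Same claim, different wording: only partial word-level overlap survives.
score = unigram_f1(
    "the most active repository is repo A",
    "repo A shows the highest activity",
)
print(f"{score:.3f}")
```

Both sentences convey the same fact, yet the score lands well below 1.0, which is exactly the pattern observed in the experiment.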

&lt;p&gt;Importantly, these lexical differences do not correspond to a degradation in response quality. When inspecting the actual content of the responses, including rankings, averages, and detected trends, the results were numerically and logically consistent across formats.&lt;/p&gt;



&lt;p&gt;🗂️ Code repository&lt;br&gt;
If you want to explore the code and reproduce all of these experiments, everything is available in my repository.&lt;br&gt;
If you find this tutorial useful, don’t forget to leave a star ⭐️ on the repository and follow me to receive notifications about new articles. Your support helps me keep creating valuable technical content for the community 🚀&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/experiment-toon-vs-json" rel="noopener noreferrer"&gt;
        experiment-toon-vs-json
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      This repository contains a practical benchmark comparing JSON and TOON (Token-Oriented Object Notation) as data serialization formats for LLM prompts.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;TOON vs JSON for LLM Prompts: Can We Reduce Token Usage Without Losing Response Quality?&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;A practical benchmark comparing TOON and JSON formats for LLM prompts&lt;/em&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;|&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;llm&lt;/code&gt;, &lt;code&gt;ai&lt;/code&gt;, &lt;code&gt;optimization&lt;/code&gt;, &lt;code&gt;python&lt;/code&gt;|&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/experiment-toon-vs-json/img/preview.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fexperiment-toon-vs-json%2Fimg%2Fpreview.png" alt="img-preview"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introduction&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;Over the past months, I came across several articles claiming that &lt;strong&gt;TOON&lt;/strong&gt; can significantly reduce token usage in &lt;strong&gt;LLM&lt;/strong&gt; prompts compared to traditional &lt;strong&gt;JSON&lt;/strong&gt;. Most of these examples, however, relied on small or artificial datasets.
That raised a few questions for me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does TOON still provide benefits with &lt;strong&gt;real-world API responses&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;How much does it actually reduce tokens?&lt;/li&gt;
&lt;li&gt;And more importantly: &lt;strong&gt;does changing the format affect how an LLM interprets the data or the quality of the response&lt;/strong&gt;?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this article, I aim to run a &lt;strong&gt;practical benchmark&lt;/strong&gt; to explore whether TOON could be useful in production pipelines, in what contexts it performs best, and whether it works well across…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/experiment-toon-vs-json" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;







&lt;h1&gt;
  
  
  Conclusions
&lt;/h1&gt;

&lt;p&gt;This experiment shows that TOON can significantly reduce token usage while preserving response quality, as long as it is applied to the right type of data. For flat, repetitive structures, TOON acts as an effective form of prompt compression: the LLM retains semantic understanding, and any differences in wording are superficial rather than affecting meaning or correctness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚠️ &lt;code&gt;Key limitations&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only a single LLM was tested (Amazon Nova Lite)&lt;/li&gt;
&lt;li&gt;Only specific datasets were used (GitHub Events and Wikipedia Page Views)&lt;/li&gt;
&lt;li&gt;Evaluation was conducted in English only&lt;/li&gt;
&lt;li&gt;Prompts were simple analytical tasks, not complex reasoning scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As in any systems project, &lt;strong&gt;solutions should be carefully evaluated&lt;/strong&gt; to determine whether they are truly optimal for a given use case. Outcomes often depend on &lt;strong&gt;many variables&lt;/strong&gt;, so testing and validation in the specific context are essential before making decisions or implementing at scale.&lt;/p&gt;




&lt;h3&gt;
  
  
  📌 How to cite this article
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;APA style&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Mendez Escobar, Romina Elena. (2025). &lt;strong&gt;TOON vs JSON for LLM Prompts: Can We Reduce Token Usage Without Losing Response Quality?&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/r_elena_mendez_escobar/toon-vs-json-for-llm-prompts-can-we-reduce-token-usage-without-losing-response-quality-59ed"&gt;https://dev.to/r_elena_mendez_escobar/toon-vs-json-for-llm-prompts-can-we-reduce-token-usage-without-losing-response-quality-59ed&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTeX&lt;/strong&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
@article{mendez2025ai,
  title  = {TOON vs JSON for LLM Prompts: Can We Reduce Token Usage Without Losing Response Quality?},
  author = {Mendez Escobar, Romina Elena},
  year   = {2025},
  url    = {https://dev.to/r_elena_mendez_escobar/toon-vs-json-for-llm-prompts-can-we-reduce-token-usage-without-losing-response-quality-59ed}
}



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>database</category>
      <category>performance</category>
    </item>
    <item>
      <title>From Coffee Products to AI Search: Building a Serverless Semantic Search Architecture with Amazon S3 Vectors and Bedrock</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Wed, 31 Dec 2025 10:11:22 +0000</pubDate>
      <link>https://dev.to/aws-builders/from-coffee-products-to-ai-search-building-a-serverless-semantic-search-architecture-with-amazon-5g5b</link>
      <guid>https://dev.to/aws-builders/from-coffee-products-to-ai-search-building-a-serverless-semantic-search-architecture-with-amazon-5g5b</guid>
      <description>&lt;p&gt;In recent months, we have increasingly incorporated artificial intelligence into our solutions, and with it a recurring need has emerged: searching and querying our own data using natural language efficiently.&lt;/p&gt;

&lt;p&gt;Use cases such as semantic search or building solutions based on Retrieval-Augmented Generation (RAG) are no longer optional. Today, we need to understand the meaning of text, combine it with structured filters, and do so in an efficient and scalable way.&lt;br&gt;
In this article, I explore a recent alternative within the AWS ecosystem: Amazon S3 Vectors 🪣, a serverless approach for vector storage and querying that aims to balance scalability, simplicity, and cost.&lt;/p&gt;

&lt;p&gt;To make it more concrete (and a bit more entertaining), we will work with a dataset of coffee products ☕ and build a complete flow that goes from generating embeddings with Amazon Bedrock 🧠 to an application deployed on AWS with Streamlit ✨, which allows natural language searches combined with filters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcgqcdvuwd4t6gyeou44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcgqcdvuwd4t6gyeou44.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;
  
  
  A quick note on embeddings and semantic search
&lt;/h1&gt;

&lt;p&gt;Before diving into the implementation, it is worth briefly clarifying two key concepts used throughout this tutorial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embeddings&lt;/strong&gt; are numerical representations of text that capture semantic meaning. Instead of relying on exact word matching, embeddings map text into high-dimensional vector spaces where semantically similar pieces of text are positioned closer together. This representation allows systems to reason about intent and context rather than purely lexical similarity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic search&lt;/strong&gt; builds on top of embeddings by retrieving results based on meaning rather than exact terms. A user query is first transformed into an embedding and then compared against stored vectors using similarity metrics such as cosine or Euclidean distance. This approach enables more flexible, intent-aware searches and can be further refined by combining semantic similarity with structured metadata filters to improve precision and relevance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
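The two ideas combine into a very small retrieval loop; a toy sketch with hand-made 2-D vectors standing in for real model embeddings (the catalog entries and query are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity: normalized dot product between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings": in a real system these would come from an embedding model.
catalog = {
    "intense espresso blend": [0.90, 0.10],
    "fruity light roast": [0.15, 0.95],
}
query_vector = [0.85, 0.20]  # pretend embedding of "strong dark coffee"

best_match = max(catalog, key=lambda name: cosine(query_vector, catalog[name]))
print(best_match)
```

The query never mentions "espresso", yet the nearest vector wins: that is the meaning-over-keywords behavior semantic search provides.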

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13nzlac3a3bdqa9klo6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13nzlac3a3bdqa9klo6t.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;
  
  
  What is Amazon S3 Vectors?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Amazon S3 Vectors&lt;/strong&gt; is a new type of storage within Amazon S3 designed specifically to natively &lt;strong&gt;store and query vectors&lt;/strong&gt;.&lt;br&gt;
 In addition to storing vectors, this type of bucket allows associating &lt;strong&gt;structured metadata&lt;/strong&gt;, which enables queries that combine &lt;strong&gt;semantic search&lt;/strong&gt; with filters on those attributes.&lt;br&gt;
Vector buckets support searches based on distance metrics, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine similarity&lt;/strong&gt;: measures how similar two vectors are based on the angle between them; it is very common for text embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean distance&lt;/strong&gt;: measures the “geometric” (straight-line) distance between two vectors in space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike traditional vector databases, Amazon S3 Vectors makes it possible to &lt;strong&gt;implement a fully serverless architecture&lt;/strong&gt;, achieving a good balance between &lt;code&gt;scalability&lt;/code&gt;, &lt;code&gt;operational simplicity&lt;/code&gt;, and &lt;code&gt;cost&lt;/code&gt;.&lt;br&gt;
Below are some of the main benefits of using this functionality:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqta0037jxq5gc1cju7uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqta0037jxq5gc1cju7uo.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  How do vectors work in Amazon S3?
&lt;/h2&gt;

&lt;p&gt;Amazon S3 Vectors is based on the following main components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🪣 1. Vector buckets&lt;/strong&gt;&lt;br&gt;
These are specialized buckets optimized for vector storage.&lt;br&gt;
They support encryption and organize data internally through &lt;strong&gt;vector indexes&lt;/strong&gt;, which enables efficient large-scale searches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧭 2. Vector indexes&lt;/strong&gt;&lt;br&gt;
An index defines how vectors are stored and queried within the bucket.&lt;br&gt;
In addition to the vector, it allows associating &lt;strong&gt;metadata&lt;/strong&gt;, which can later be used in queries through filters with a syntax similar to well-known operators, such as those used in MongoDB.&lt;/p&gt;
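To illustrate that MongoDB-like filter style, the sketch below defines a filter document and a tiny in-memory evaluator mimicking how such a filter narrows results. The operator names (`$and`, `$lte`, `$gte`) are illustrative; check the S3 Vectors documentation for the exact set it supports:

```python
def matches(metadata: dict, condition: dict) -> bool:
    """Tiny evaluator for a subset of MongoDB-style operators,
    mimicking how a metadata filter narrows search results."""
    for key, rule in condition.items():
        if key == "$and":
            if not all(matches(metadata, c) for c in rule):
                return False
        else:
            value = metadata.get(key)
            for op, expected in rule.items():
                if op == "$lte" and not (value is not None and value <= expected):
                    return False
                if op == "$gte" and not (value is not None and value >= expected):
                    return False
                if op == "$eq" and value != expected:
                    return False
    return True

# Example filter: affordable, well-rated products only
metadata_filter = {
    "$and": [
        {"price": {"$lte": 20.0}},
        {"average_rating": {"$gte": 4.0}},
    ]
}

print(matches({"price": 12.5, "average_rating": 4.6}, metadata_filter))  # True
print(matches({"price": 35.0, "average_rating": 4.6}, metadata_filter))  # False
```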

&lt;p&gt;&lt;strong&gt;🔍 3. Queries&lt;/strong&gt;&lt;br&gt;
Queries are based on &lt;strong&gt;similarity searches&lt;/strong&gt;, using the distance metric configured when creating the index, such as &lt;strong&gt;cosine&lt;/strong&gt; or &lt;strong&gt;Euclidean&lt;/strong&gt;.&lt;br&gt;
These searches can be combined with metadata filters to refine results and reduce ambiguities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚙️ 4. API&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Amazon S3 Vectors&lt;/strong&gt; exposes an API that allows querying data through operations such as &lt;code&gt;QueryVectors&lt;/code&gt;.&lt;br&gt;
These queries can be executed using tools like the &lt;strong&gt;AWS CLI&lt;/strong&gt; or &lt;strong&gt;Boto3&lt;/strong&gt;, combining a query vector with metadata-based filters and parameters such as the number of results to return or whether to include the distance between vectors.&lt;/p&gt;
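As a sketch of what such a call looks like from Boto3, the snippet below builds a `QueryVectors` request. The client name (`s3vectors`), method, and parameter names reflect my reading of the S3 Vectors API and should be verified against the official reference; the actual call is commented out since it requires AWS credentials:

```python
# Build the request first so it can be inspected or logged before sending.
# Bucket and index names are the ones used throughout this tutorial.
query_request = {
    "vectorBucketName": "coffee-products-tutorial",
    "indexName": "idx-coffee-products",
    "queryVector": {"float32": [0.01] * 1024},  # placeholder; use a real Titan embedding
    "topK": 5,                                  # number of nearest neighbours to return
    "returnDistance": True,
    "returnMetadata": True,
}

# import boto3
# client = boto3.client("s3vectors")
# response = client.query_vectors(**query_request)
# for match in response["vectors"]:
#     print(match["key"], match.get("distance"))
```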


&lt;h1&gt;
  
  
  Process Flow
&lt;/h1&gt;

&lt;p&gt;The following image shows the complete workflow for implementing semantic search with Amazon S3 Vectors, divided into three main stages:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfq3zqssetmuur7kdgx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfq3zqssetmuur7kdgx3.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  1️⃣ Generate Vector Embeddings
&lt;/h2&gt;

&lt;p&gt;The process starts from the input documents. These documents are sent to an embeddings model, in this case &lt;strong&gt;Amazon Titan&lt;/strong&gt; through &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;, which transforms the text into numerical vectors.&lt;br&gt;
At this stage, not only are the vectors generated, but metadata describing each document is also associated.&lt;/p&gt;
&lt;h2&gt;
  
  
  2️⃣ Store Vector Data
&lt;/h2&gt;

&lt;p&gt;The generated vectors, together with their metadata, are stored in an &lt;strong&gt;S3 Vector Bucket&lt;/strong&gt;.&lt;br&gt;
Within the bucket, the data is organized through one or more &lt;strong&gt;vector indexes&lt;/strong&gt;, defined with a specific distance metric.&lt;br&gt;
Being integrated into AWS, this data can be consumed by other services such as &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;, &lt;strong&gt;Amazon SageMaker&lt;/strong&gt;, or &lt;strong&gt;Amazon OpenSearch&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  3️⃣ Semantic Search via Vector Index
&lt;/h2&gt;

&lt;p&gt;To perform a search, a natural language query is transformed again into a vector using the same embeddings model.&lt;br&gt;
This query vector, together with metadata filters and the topK parameter, is used to query the vector index and retrieve the most semantically similar results.&lt;/p&gt;
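The three stages above can be simulated end-to-end in memory. In this sketch, `fake_embed` is a deterministic stand-in for the Titan call (in the real flow it would be one Bedrock `invoke_model` request), and the filter plus top-K ranking mimic what the vector index does:

```python
import math

def fake_embed(text: str) -> list[float]:
    """Stand-in for the Titan embeddings call: a deterministic toy vector."""
    buckets = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        buckets[i % 8] += ord(ch) / 1000.0
    norm = math.sqrt(sum(x * x for x in buckets)) or 1.0
    return [x / norm for x in buckets]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalised

# "Stored" vectors with metadata, as they would live in the vector index
index = [
    {"key": "p1", "vector": fake_embed("dark roast espresso beans"), "metadata": {"rating": 4.5}},
    {"key": "p2", "vector": fake_embed("instant decaf coffee"), "metadata": {"rating": 3.0}},
    {"key": "p3", "vector": fake_embed("cold brew concentrate"), "metadata": {"rating": 4.8}},
]

def search(query: str, top_k: int, min_rating: float) -> list[str]:
    qv = fake_embed(query)  # same embedding model for queries and documents
    candidates = [item for item in index if item["metadata"]["rating"] >= min_rating]
    candidates.sort(key=lambda item: cosine(qv, item["vector"]), reverse=True)
    return [item["key"] for item in candidates[:top_k]]

print(search("espresso", top_k=2, min_rating=4.0))
```

Note that the query must be embedded with the same model used for the documents; otherwise the vectors live in incompatible spaces and distances are meaningless.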


&lt;h1&gt;
  
  
  Reference Architecture
&lt;/h1&gt;

&lt;p&gt;In this tutorial, the use case is based on processing data initially stored in &lt;strong&gt;JSON&lt;/strong&gt; format, which is transformed into &lt;strong&gt;Parquet&lt;/strong&gt; as part of a data preparation workflow. From this processed data, the &lt;strong&gt;Amazon Titan&lt;/strong&gt; model is invoked through &lt;strong&gt;Amazon Bedrock&lt;/strong&gt; to generate embeddings, which are then stored together with their metadata in an &lt;strong&gt;Amazon S3 Vectors bucket&lt;/strong&gt;, thus enabling semantic queries over the information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31ha7s4qltdjph27c99p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31ha7s4qltdjph27c99p.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data processing is carried out through an &lt;strong&gt;AWS Glue job in Python&lt;/strong&gt;, which implements the data-cleaning stage typical of any production data pipeline. In this phase, only the relevant fields are selected, text descriptions are normalized and corrected when necessary, and only after this cleaning is completed is the Titan model invoked. This approach helps optimize costs and performance by avoiding unnecessary model calls on data that will not be used later.&lt;/p&gt;

&lt;p&gt;Finally, the data stored in the vector bucket is consumed by an application developed with &lt;strong&gt;Streamlit&lt;/strong&gt;, which is deployed on &lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt; within a VPC. The application allows user queries to be transformed back into embeddings and used to query the vector index, combining semantic search with metadata-based filters, while access to services and system observability are managed through &lt;strong&gt;IAM&lt;/strong&gt; roles and &lt;strong&gt;CloudWatch&lt;/strong&gt; Logs.&lt;/p&gt;


&lt;h1&gt;
  
  
  Amazon Bedrock and Amazon Titan
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; is a fully managed service that allows developers to build, deploy, and scale applications powered by artificial intelligence without the need to manage infrastructure. Through a unified API, Bedrock provides access to foundation models from different providers, making their integration into cloud architectures simple and secure.&lt;/p&gt;

&lt;p&gt;For this tutorial, we use &lt;strong&gt;Amazon Titan Text Embeddings V2&lt;/strong&gt;, a model available in Bedrock that can process up to &lt;code&gt;8,192 tokens&lt;/code&gt; or &lt;code&gt;50,000 characters&lt;/code&gt; and generate &lt;code&gt;1,024-dimensional vectors&lt;/code&gt;. This model is optimized for information retrieval tasks, semantic search, similarity measurement, and clustering, making it a suitable choice for RAG scenarios and large-scale text analysis.&lt;/p&gt;
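Given the limits just mentioned, a simple guard before invoking the model might look like the sketch below. It only checks the character limit; enforcing the 8,192-token limit precisely would require the model's tokenizer, and a production pipeline might chunk long texts instead of truncating:

```python
TITAN_MAX_CHARS = 50_000  # character limit cited for Titan Text Embeddings V2

def prepare_for_embedding(text: str, max_chars: int = TITAN_MAX_CHARS) -> str:
    """Trim and truncate over-long inputs so the invoke_model call does not fail."""
    text = text.strip()
    if len(text) > max_chars:
        text = text[:max_chars]
    return text

print(len(prepare_for_embedding("x" * 60_000)))  # 50000
```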


&lt;h1&gt;
  
  
  AWS Elastic Beanstalk
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt; is a managed service that allows you to deploy and run web applications without directly managing the underlying infrastructure. It automatically handles resource provisioning, load balancing, scaling, and monitoring, allowing the focus to remain on application development rather than operations.&lt;br&gt;
In this tutorial, we use &lt;strong&gt;Elastic Beanstalk&lt;/strong&gt; to deploy the application developed with &lt;strong&gt;Streamlit&lt;/strong&gt;, taking advantage of its native integration with services such as &lt;strong&gt;EC2&lt;/strong&gt;, &lt;strong&gt;Auto Scaling&lt;/strong&gt;, and &lt;strong&gt;CloudWatch&lt;/strong&gt;, which enables a fast, secure, and scalable deployment.&lt;/p&gt;

&lt;p&gt;Below is a summary of some of the main benefits of using this solution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfruc4crs9yw9iw6jry7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfruc4crs9yw9iw6jry7.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;
  
  
  📊 Dataset
&lt;/h1&gt;

&lt;p&gt;The dataset used in this tutorial was obtained from the &lt;strong&gt;Amazon Reviews 2023 project&lt;/strong&gt;, presented in the paper “Bridging Language and Items for Retrieval and Recommendation” (Hou et al., 2024). This dataset contains reviews and metadata for Amazon products, including titles, descriptions, categories, stores, and ratings.&lt;br&gt;
For this use case, only the &lt;strong&gt;“Grocery_and_Gourmet_Food”&lt;/strong&gt; category was selected, and within it, products related to coffee were filtered. This allows us to work with rich textual information and structured attributes that are ideal for semantic search scenarios.&lt;br&gt;
The project repository includes both the filtered coffee product datasets and the already processed versions containing vector embeddings, making it easier to reproduce the tutorial and analyze the complete workflow.&lt;/p&gt;
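The coffee filtering described above can be sketched with a simple predicate. The field names (`title`, `categories`) are assumptions based on the Amazon Reviews 2023 metadata format and may differ from the repository's exact schema:

```python
def is_coffee_product(item: dict) -> bool:
    """Illustrative predicate: keep items that mention coffee
    in the title or category path (field names are assumptions)."""
    haystack = " ".join([item.get("title", "")] + item.get("categories", [])).lower()
    return "coffee" in haystack

products = [
    {"title": "Organic Ground Coffee", "categories": ["Grocery_and_Gourmet_Food"]},
    {"title": "Green Tea Sampler", "categories": ["Grocery_and_Gourmet_Food", "Tea"]},
]
print([p["title"] for p in products if is_coffee_product(p)])  # ['Organic Ground Coffee']
```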


&lt;h1&gt;
  
  
  Use Case
&lt;/h1&gt;

&lt;p&gt;The use case presented in this tutorial starts from a simple but representative scenario: a user who wants to query &lt;strong&gt;coffee products&lt;/strong&gt; using &lt;strong&gt;natural language&lt;/strong&gt;, exploring the available catalog in a more flexible and intuitive way than a traditional search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7jteavn4iyfvf17nlut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7jteavn4iyfvf17nlut.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To enable this type of query, different textual attributes of the product are used, such as the &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;category&lt;/code&gt;, which helps better capture user intent. Within the dataset, several coffee-related &lt;strong&gt;categories&lt;/strong&gt; are included, such as Coffee, Instant Coffee, Ground Coffee, Whole Coffee Beans, Single-Serve Capsules &amp;amp; Pods, Iced Coffee &amp;amp; Cold-Brew, among others.&lt;/p&gt;

&lt;p&gt;Based on this, an application is designed in which the user can interact primarily through natural language, while complementing the search with structured filters to reduce ambiguities. These filters include, for example, &lt;strong&gt;product rating&lt;/strong&gt;, &lt;strong&gt;store name&lt;/strong&gt; (a detail that users often do not know or remember precisely), and &lt;strong&gt;price&lt;/strong&gt;, allowing more accurate and relevant results without relying exclusively on a textual query.&lt;/p&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;
&lt;h3&gt;
  
  
  (1) 🗂️ Code repository
&lt;/h3&gt;

&lt;p&gt;To follow this tutorial, it is necessary to &lt;strong&gt;clone the project repository&lt;/strong&gt;, where the complete solution code is available.&lt;br&gt;
In the following sections, the most relevant aspects of the implementation and design decisions are highlighted, rather than providing an exhaustive walkthrough of the entire source code.&lt;br&gt;
If you find this tutorial useful, do not forget to leave &lt;strong&gt;a star ⭐️&lt;/strong&gt; on the repository and follow me to receive notifications about new articles. Your support helps keep creating valuable technical content for the community 🚀&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/s3-vector-coffee-tutorial" rel="noopener noreferrer"&gt;
        s3-vector-coffee-tutorial
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      S3 Vector tutorial using cafe data and creating a Streamlit app deployed on Elastic Beanstalk
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;From Coffee Products to AI Search: Building a Serverless Semantic Search Architecture with Amazon S3 Vectors and Bedrock&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/s3-vector-coffee-tutorial/img/1-preview.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fs3-vector-coffee-tutorial%2Fimg%2F1-preview.png" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In recent months, we have increasingly incorporated artificial intelligence into our solutions, and with it a recurring need has emerged: searching and querying our own data using natural language efficiently.&lt;/p&gt;
&lt;p&gt;Use cases such as semantic search or building solutions based on Retrieval-Augmented Generation (RAG) are no longer optional. Today, we need to understand the meaning of text, combine it with structured filters, and do so in an efficient and scalable way.&lt;br&gt;
In this article, I explore a recent alternative within the AWS ecosystem: Amazon S3 Vectors 🪣, a serverless approach for vector storage and querying that aims to balance scalability, simplicity, and cost.&lt;/p&gt;
&lt;p&gt;To make it more concrete (and a bit more entertaining)...we will work with a dataset of coffee products ☕ and build a complete flow that goes from generating embeddings…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/s3-vector-coffee-tutorial" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;







&lt;h3&gt;
  
  
  (2) 🪣 Create Amazon S3 buckets
&lt;/h3&gt;

&lt;p&gt;As part of this workflow, we need two Amazon S3 buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A standard bucket&lt;/strong&gt; to store raw and processed data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An Amazon S3 Vectors bucket&lt;/strong&gt; to store vectors and their metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, the following names are used as references:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coffee-products-tutorial-full-data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coffee-products-tutorial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idx-coffee-products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  (2.1) 🪣 Creating the S3 Vectors bucket
&lt;/h4&gt;

&lt;p&gt;The first step is to create the &lt;strong&gt;vector bucket&lt;/strong&gt;. From the Amazon S3 console, in the Vector buckets section, select Create vector bucket and define a unique name for the bucket.&lt;br&gt;
In the encryption configuration, you can use Amazon S3–managed encryption (SSE-S3), which is sufficient for this use case. It is worth noting that this setting cannot be modified later, so it is important to define it correctly from the beginning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0733eqqqeacx544zlg7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0733eqqqeacx544zlg7.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  (2.2) 🧭 Creating the vector index
&lt;/h4&gt;

&lt;p&gt;Once the bucket is created, the next step is to define a &lt;strong&gt;vector index&lt;/strong&gt;, which will be responsible for organizing and querying the vectors efficiently.&lt;/p&gt;

&lt;p&gt;During this configuration, three key aspects must be specified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index name&lt;/strong&gt;, which must be unique within the bucket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector dimension&lt;/strong&gt;, which must match the output of the embeddings model (in this case, 1,024 dimensions for Amazon Titan).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distance metric&lt;/strong&gt;, where you can choose between cosine or Euclidean. For text embeddings, cosine similarity is usually the most commonly used option.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like the bucket, the index also inherits the encryption configuration, and this cannot be modified once it has been created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k6vwaia33hbyx0l6eyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5k6vwaia33hbyx0l6eyh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  (3) 🔐 Policies
&lt;/h3&gt;

&lt;p&gt;To work on this project, it is necessary to configure a set of &lt;strong&gt;IAM policies&lt;/strong&gt; that allow access to the different services involved in the workflow.&lt;/p&gt;

&lt;p&gt;In particular, the following are required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Titan policy:&lt;/strong&gt; allows invoking the Amazon Titan embeddings model through Amazon Bedrock to generate vectors from text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3 policy:&lt;/strong&gt; enables reading and writing data in the Amazon S3 bucket used to store raw and processed data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3 Vectors policy:&lt;/strong&gt; allows writing and querying vectors, along with their metadata, in the Amazon S3 Vectors bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, these policies are attached to an &lt;strong&gt;IAM role&lt;/strong&gt; that is used by the application deployed on &lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt;, ensuring controlled and secure access to the required resources.&lt;/p&gt;
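As a rough shape of the S3 Vectors policy, the sketch below builds an illustrative policy document in Python. The action names in the `s3vectors` namespace are my best guess and the resource ARN format is assumed; the authoritative versions live in the project repository:

```python
import json

# Illustrative policy for the vector bucket; see the repository for the real ones.
s3_vectors_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3vectors:PutVectors",    # write vectors + metadata
                "s3vectors:QueryVectors",  # similarity queries
                "s3vectors:GetVectors",    # fetch vectors by key
            ],
            "Resource": "arn:aws:s3vectors:*:*:bucket/coffee-products-tutorial/*",
        }
    ],
}
print(json.dumps(s3_vectors_policy, indent=2))
```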

&lt;blockquote&gt;
&lt;p&gt;All the policies mentioned are available in the &lt;strong&gt;project repository&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h1&gt;
  
  
  🛠️ Implementation Guide
&lt;/h1&gt;
&lt;h2&gt;
  
  
  ✅ Step 1: Dataset
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, we start from a dataset in &lt;strong&gt;JSON&lt;/strong&gt; format, which we download and then process into &lt;strong&gt;Parquet&lt;/strong&gt;, since this format is more efficient for reading, storage, and processing in data pipelines.&lt;br&gt;
The dataset used in this tutorial is available in my repository, inside the &lt;code&gt;data/&lt;/code&gt; folder.&lt;/p&gt;


&lt;h2&gt;
  
  
  ⚙️ Step 2: Process data (embedding generation)
&lt;/h2&gt;

&lt;p&gt;To generate the &lt;strong&gt;embeddings&lt;/strong&gt;, we use a class that I created to simplify the code and encapsulate the interaction with &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;. By default, the class uses the &lt;code&gt;amazon.titan-embed-text-v2:0&lt;/code&gt; model, although the design allows it to be easily changed if you want to try another model.&lt;/p&gt;

&lt;p&gt;This class includes three main methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;create_client():&lt;/strong&gt; creates the Bedrock Runtime client with Boto3, using region and credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;get_embeddings(text):&lt;/strong&gt; invokes the Titan model by sending the text and returns the generated vector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;generate_embeddings_batch(texts):&lt;/strong&gt; generates embeddings in batches by iterating over a list of texts and showing progress with tqdm.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingsGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amazon.titan-embed-text-v2:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;
                &lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;


   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
               &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;
           &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
           &lt;span class="p"&gt;})&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
       &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_embeddings_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;embeddings_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
       &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;embeddings_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings_list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;To run it locally, you need a &lt;code&gt;.env&lt;/code&gt; file with your credentials and region:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;AWS_ACCESS_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_ACCESS_KEY
&lt;span class="nv"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_AWS_SECRET_ACCESS_KEY
&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_AWS_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a minimal usage example would be the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;


&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS_ACCESS_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;emb_generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingsGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;input_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instant coffee sweet creamy vanilla flavor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;emb_generator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;input_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🪣 Step 3: Store data (S3 + S3 Vectors)
&lt;/h2&gt;

&lt;p&gt;To simplify data ingestion, I created an &lt;strong&gt;S3&lt;/strong&gt; class that encapsulates access to both the standard S3 bucket and the &lt;strong&gt;Amazon S3 Vectors bucket&lt;/strong&gt;. The idea is to keep the code clean and reusable, separating connection logic from write logic.&lt;/p&gt;

&lt;p&gt;This class includes four main methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;create_client()&lt;/strong&gt;: creates a Boto3 client for the specified service (&lt;strong&gt;s3&lt;/strong&gt; or &lt;strong&gt;s3vectors&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;upload_file()&lt;/strong&gt;: uploads files to the standard S3 bucket (useful for raw and processed data).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;upload_vector_data()&lt;/strong&gt;: loads vectors into S3 Vectors using &lt;strong&gt;put_vectors&lt;/strong&gt;, sending them in batches to respect the per-request limit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;query_embedding()&lt;/strong&gt;: enables semantic search by querying the vector index using an embedding and optional metadata filters, returning the most relevant results ranked by similarity.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Class to handle S3 operations including uploading files and vector data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;
                &lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;
       &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;


   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
       Create a boto3 client for the specified AWS service.
       &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
       &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;


   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;object_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
       Upload a file to an S3 bucket.
       &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
       &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;object_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; uploaded to bucket &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; as &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;object_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_vector_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
       Upload vector data to S3 Vectors in batches with tqdm for progress tracking.
        batch_size: number of vectors per request, to stay under the per-request maximum.
       &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
       &lt;span class="n"&gt;s3_vector_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3vectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


       &lt;span class="c1"&gt;# Helper for chunking data into batches
&lt;/span&gt;       &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
               &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


       &lt;span class="n"&gt;batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


       &lt;span class="c1"&gt;# see the progress of the upload
&lt;/span&gt;       &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Uploading batches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;s3_vector_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                   &lt;span class="n"&gt;vectorBucketName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;indexName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;
               &lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error uploading batch &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;filter_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform complete search with text and filters&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
       &lt;span class="n"&gt;s3_vector_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3vectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

       &lt;span class="c1"&gt;# Prepare base parameters
&lt;/span&gt;       &lt;span class="n"&gt;query_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vectorBucketName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indexName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queryVector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;returnDistance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;returnMetadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;

       &lt;span class="c1"&gt;# Only add filter if exists
&lt;/span&gt;       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;query_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filter_data&lt;/span&gt;

       &lt;span class="c1"&gt;# Execute search
&lt;/span&gt;       &lt;span class="n"&gt;query_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_vector_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;query_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;vectors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
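The `chunked` helper inside `upload_vector_data` is easy to verify on its own. This standalone sketch copies that helper and splits 250 placeholder items (toy values, not real vectors) into batches of at most 100, matching the per-request limit:

```python
# Standalone copy of the chunked() helper used by upload_vector_data,
# shown here to illustrate how the 100-vector limit is respected.
def chunked(lst, size):
    for i in range(0, len(lst), size):
        yield lst[i:i + size]

# 250 placeholder items split into batches of at most 100
fake_vectors = [{"key": str(i)} for i in range(250)]
batches = list(chunked(fake_vectors, 100))

print(len(batches))               # 3
print([len(b) for b in batches])  # [100, 100, 50]
```

Materializing the generator with `list(...)` is what lets `tqdm` know the total number of batches and render a proper progress bar.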



&lt;p&gt;To upload vectors to S3 Vectors, we first need to build the structure expected by &lt;strong&gt;put_vectors&lt;/strong&gt;. Each item must include a &lt;strong&gt;key&lt;/strong&gt; (a unique string identifier), the vector under &lt;code&gt;data.float32&lt;/code&gt;, and a &lt;strong&gt;metadata&lt;/strong&gt; object with the attributes we will later use as query filters.&lt;br&gt;
In addition, since &lt;strong&gt;no more than 100 vectors can be sent per request&lt;/strong&gt;, the upload is performed in batches controlled by the &lt;strong&gt;batch_size&lt;/strong&gt; parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vector_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;


&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
   &lt;span class="n"&gt;vector_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;  &lt;span class="c1"&gt;# always need to be string
&lt;/span&gt;       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
       &lt;span class="p"&gt;},&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;average_rating&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rating_number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_coffee_filter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
       &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_BUCKET_VECTOR_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_INDEX_VECTOR_NAME&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_vector_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔍 Step 4: Retrieve (QueryVectors + filters)
&lt;/h2&gt;

&lt;p&gt;To retrieve results from Amazon S3 Vectors, the flow is always the same. First, we convert a natural-language query into an embedding (vector) using the same model that was used during indexing. Then, we execute &lt;strong&gt;query_vectors&lt;/strong&gt;, passing that vector as &lt;strong&gt;queryVector&lt;/strong&gt;. The service returns the &lt;strong&gt;top K&lt;/strong&gt; most similar vectors according to the distance metric configured in the index (&lt;code&gt;Cosine&lt;/code&gt; or &lt;code&gt;Euclidean&lt;/code&gt;). Optionally, we can apply metadata filters to reduce ambiguity and improve precision.&lt;/p&gt;
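Before looking at the parameters, it helps to know what the returned distance actually measures. Here is a minimal sketch of cosine distance (1 minus cosine similarity), using only the standard library; the two-dimensional vectors are toy values, not real embeddings:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

Lower distances therefore mean more semantically similar items, which is why the query results come back ranked by ascending distance.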

&lt;p&gt;The most important &lt;strong&gt;query_vectors&lt;/strong&gt; parameters are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;queryVector&lt;/strong&gt;: the embedding of the search text (in the format {"float32": [...]}).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;topK&lt;/strong&gt;: how many results we want to retrieve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;filter&lt;/strong&gt;: filters based on the metadata stored together with the vector (for example shop_name, average, price).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;returnDistance&lt;/strong&gt;: whether to return the distance for each result. This is useful for applying a &lt;strong&gt;threshold&lt;/strong&gt; and discarding results that are returned but not truly relevant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;returnMetadata&lt;/strong&gt;: whether to also return the metadata associated with the vector, to display information in the app or apply additional logic.&lt;/li&gt;
&lt;/ul&gt;
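Because the distance comes back with each result, applying a threshold is a one-line post-filter. The list below is a hypothetical response in the shape `query_embedding` returns, and the 0.5 cutoff is an arbitrary value chosen for illustration:

```python
# Hypothetical results in the shape returned by query_embedding():
# each item has a 'key', a 'distance', and a 'metadata' dict.
results = [
    {"key": "a", "distance": 0.21, "metadata": {"shop_name": "nescafé"}},
    {"key": "b", "distance": 0.48, "metadata": {"shop_name": "nescafé"}},
    {"key": "c", "distance": 0.83, "metadata": {"shop_name": "nescafé"}},
]

MAX_DISTANCE = 0.5  # arbitrary threshold; tune it for your data

# Keep only the results close enough to the query
relevant = [r for r in results if r["distance"] <= MAX_DISTANCE]
print([r["key"] for r in relevant])  # ['a', 'b']
```

A good threshold depends on the embedding model and distance metric, so it is worth inspecting real distances for a few known-relevant queries before fixing a value.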

&lt;blockquote&gt;
&lt;p&gt;To reduce the complexity of query implementation, a helper method is provided and encapsulated within the S3 utility class. This abstraction centralizes the interaction with Amazon S3 Vectors, simplifying semantic search execution and making the codebase cleaner, more reusable, and easier to maintain.&lt;/p&gt;
&lt;/blockquote&gt;





&lt;h3&gt;
  
  
  Query Examples with Metadata Filters
&lt;/h3&gt;

&lt;h4&gt;
  
  
  🔎 Query by Single Metadata Field (Exact Match)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Example: filter by &lt;code&gt;shop_name&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;filter_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nescafé&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.41610199213027954&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;de46725d-ef52-47ca-80e2-f1ba82c0353d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;11.48&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nescafé&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;4.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rating_number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;248&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.47703248262405396&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;03915b9f-e592-40ec-b806-bd06b4213d90&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;13.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nescafé&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rating_number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;471&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.514411211013794&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;5037ea28-b789-427a-9b1f-d825ad68dd2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rating_number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3052&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shop_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nescafé&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;4.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;17.75&lt;/span&gt;&lt;span class="p"&gt;}}]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  🔢 Query Using Comparison Operators
&lt;/h4&gt;

&lt;p&gt;In filters, you can use comparison operators, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$gt&lt;/strong&gt;: greater than&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$gte&lt;/strong&gt;: greater than or equal&lt;/li&gt;
&lt;li&gt;(and others such as $lt, $lte, $eq, $ne depending on the case)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find more information about the supported operators here:&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/es_es/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/es_es/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: average rating greater than or equal to 4.2&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;filter_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;4.2&lt;/span&gt;&lt;span class="p"&gt;}})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  🔗 Query with Combined Conditions
&lt;/h4&gt;

&lt;p&gt;When you need more than one condition, you can combine filters with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$and&lt;/strong&gt;: logical AND between multiple conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$or&lt;/strong&gt;: logical OR between multiple conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: average rating ≥ 4.2 AND price ≤ 20&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;filter_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$and&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;4.2&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
           &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$lte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
       &lt;span class="p"&gt;]&lt;/span&gt;
   &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🖥️ Step 5: App (Streamlit)
&lt;/h2&gt;

&lt;p&gt;While developing this tutorial, I realized that although it is possible to run the entire flow directly from Python code, it is &lt;strong&gt;not the most convenient approach for an end user&lt;/strong&gt;. For this reason, I decided to build &lt;strong&gt;a web application using Streamlit&lt;/strong&gt;, a framework that allows you to create interactive interfaces in Python with very few lines of code.&lt;/p&gt;

&lt;p&gt;In the repository, you will find a single file called &lt;strong&gt;app.py&lt;/strong&gt;, which contains all the application logic. This makes it easy to clearly see how embedding generation, querying Amazon S3 Vectors, and result visualization are integrated, while keeping the focus on a simple and straightforward flow.&lt;/p&gt;

&lt;p&gt;Streamlit provides an API with many interactive components such as text inputs, selectors, sliders, and chat-oriented elements. These components are ideal for this use case. For more details about the available components, you can check the official documentation:&lt;br&gt;
&lt;a href="https://docs.streamlit.io/develop/api-reference" rel="noopener noreferrer"&gt;https://docs.streamlit.io/develop/api-reference&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6jjg0ahs6eu5814pifs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6jjg0ahs6eu5814pifs.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2k85u2pampxjwplwdhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2k85u2pampxjwplwdhn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zye4vjcd74rjkh92lk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zye4vjcd74rjkh92lk1.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvec10fp201kgzhzlnc80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvec10fp201kgzhzlnc80.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🚀 Step 6: Configure the project to deploy the app (Elastic Beanstalk)
&lt;/h2&gt;

&lt;p&gt;To deploy the application on &lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt;, we will package the project into a &lt;code&gt;.zip&lt;/code&gt; with a specific structure. &lt;strong&gt;Beanstalk&lt;/strong&gt; uses these files to configure the environment, install dependencies, and define how the app is executed when the instance starts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    app.zip
    |__ 📂.ebextensions/
    |    |__ 📄 iam-role.config
    |    |__ 📄 securitygroup.config
    |__ 📂img/
    |    |__ 🏞️ preview_app.png
    |__ 📄 .ebignore
    |__ 📄 app.py
    |__ 📄 Procfile
    |__ 📄 requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
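&lt;p&gt;A key detail is that these files must sit at the &lt;strong&gt;root&lt;/strong&gt; of the archive, not inside a subfolder. The sketch below shows one portable way to build the bundle with Python's standard library; running &lt;code&gt;zip -r app.zip .ebextensions img .ebignore app.py Procfile requirements.txt&lt;/code&gt; from the project root achieves the same result.&lt;br&gt;
&lt;/p&gt;

```python
import os
import zipfile

# Elastic Beanstalk expects these entries at the ROOT of the archive,
# not nested inside a project folder (names taken from the tree above).
SOURCES = [".ebextensions", "img", ".ebignore", "app.py", "Procfile", "requirements.txt"]


def package_app(zip_path="app.zip", sources=SOURCES):
    """Build the deployment bundle, preserving relative paths as archive names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            if os.path.isdir(src):
                for root, _dirs, files in os.walk(src):
                    for name in files:
                        path = os.path.join(root, name)
                        zf.write(path, arcname=os.path.relpath(path))
            elif os.path.isfile(src):
                zf.write(src, arcname=src)
    return zip_path
```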






&lt;h3&gt;
  
  
  📁 .ebextensions/iam-role.config (instance IAM role)
&lt;/h3&gt;

&lt;p&gt;This file configures which &lt;strong&gt;IAM Instance Profile&lt;/strong&gt; the &lt;strong&gt;Elastic Beanstalk&lt;/strong&gt; instance will use. It is key because that role is what allows your app to have permissions to invoke Bedrock and query S3 and S3 Vectors (based on the policies you defined).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;option_settings:
  aws:autoscaling:launchconfiguration:
    IamInstanceProfile: ElasticBeanstalk-CoffeeApp-Role
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🔒 .ebextensions/securitygroup.config (restrict access by IP)
&lt;/h3&gt;

&lt;p&gt;By default, the app is publicly accessible (depending on how the environment is configured). In this case, this configuration restricts access to the application only to your IP by adding inbound rules to the Beanstalk security group for HTTP (80) and HTTPS (443). This is useful in test environments or demos to prevent unwanted access.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tip: you can get your public IP by searching “what is my ip” and replace &lt;code&gt;&amp;lt;your_ip&amp;gt;&lt;/code&gt; in the configuration below.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Resources:
  httpSecurityGroupIngress: 
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"Fn::GetAtt"&lt;/span&gt; : &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"AWSEBSecurityGroup"&lt;/span&gt;, &lt;span class="s2"&gt;"GroupId"&lt;/span&gt;&lt;span class="o"&gt;]}&lt;/span&gt;
      IpProtocol: tcp
      ToPort: 80
      FromPort: 80
      CidrIp: &amp;lt;your_ip&amp;gt;/32

  httpsSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"Fn::GetAtt"&lt;/span&gt; : &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"AWSEBSecurityGroup"&lt;/span&gt;, &lt;span class="s2"&gt;"GroupId"&lt;/span&gt;&lt;span class="o"&gt;]}&lt;/span&gt;
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443
      CidrIp: &amp;lt;your_ip&amp;gt;/32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🚫 .ebignore
&lt;/h3&gt;

&lt;p&gt;This file works like a &lt;code&gt;.gitignore&lt;/code&gt;, but for deployment. It indicates which files should not be uploaded to Elastic Beanstalk. This helps avoid including credentials, system junk, or unnecessary files that increase the package size.&lt;/p&gt;




&lt;h3&gt;
  
  
  🖥️ app.py (Streamlit application)
&lt;/h3&gt;

&lt;p&gt;This is the main application file, where the Streamlit interface and the logic to generate &lt;code&gt;embeddings&lt;/code&gt;, query &lt;strong&gt;S3 Vectors&lt;/strong&gt;, and display results are defined. In this tutorial, the entire app lives in this single file to keep it simple and easy to follow.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧾 Procfile (startup command)
&lt;/h3&gt;

&lt;p&gt;Elastic Beanstalk needs to know which command to run to start your application. The &lt;strong&gt;Procfile&lt;/strong&gt; defines that entrypoint. In this case, we start &lt;strong&gt;Streamlit&lt;/strong&gt; listening on &lt;code&gt;0.0.0.0&lt;/code&gt; to accept external traffic, on port &lt;code&gt;8000&lt;/code&gt;, the port the environment’s reverse proxy forwards requests to by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;web: streamlit run app.py &lt;span class="nt"&gt;--server&lt;/span&gt;.port&lt;span class="o"&gt;=&lt;/span&gt;8000 &lt;span class="nt"&gt;--server&lt;/span&gt;.address&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  📦 requirements.txt (dependencies)
&lt;/h3&gt;

&lt;p&gt;This file lists the libraries required for the app to run. Beanstalk installs them automatically during deployment.&lt;/p&gt;
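&lt;p&gt;For this app, the file would at minimum look something like the following (illustrative; the repository may pin versions or include additional packages):&lt;br&gt;
&lt;/p&gt;

```
streamlit
boto3
```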




&lt;h2&gt;
  
  
  🚀 Step 7: Deploy the solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  (1) Create a new application
&lt;/h3&gt;

&lt;p&gt;In this step, a new application is created in &lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt;, which acts as the logical container for the project.&lt;br&gt;
You only need to define an &lt;strong&gt;application&lt;/strong&gt; &lt;strong&gt;name&lt;/strong&gt; and, optionally, a short &lt;strong&gt;description&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  (2) Environment
&lt;/h3&gt;

&lt;p&gt;In this step, &lt;strong&gt;the environment&lt;/strong&gt; where the application will be deployed is configured. For this use case, a &lt;strong&gt;Web server environment&lt;/strong&gt; is selected, since it is a web application built with Streamlit that exposes an HTTP interface for users.&lt;/p&gt;

&lt;p&gt;By default, Elastic Beanstalk suggests an &lt;strong&gt;environment name&lt;/strong&gt; based on the application name, which is sufficient for this tutorial. This environment will be responsible for running the app, handling traffic, and applying scaling and monitoring configurations in the following steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nxxcvn4qxgwiwrqfjzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8nxxcvn4qxgwiwrqfjzo.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  (2) Environment – Step 1: Configure environment
&lt;/h4&gt;

&lt;p&gt;In this step, the basic environment parameters are defined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment tier:&lt;/strong&gt; select Web server environment, since the application exposes a web interface over HTTP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application name:&lt;/strong&gt; automatically filled with the name defined in the previously created application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment name:&lt;/strong&gt; name of the environment; the default suggested value can be used.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain:&lt;/strong&gt;  can be left empty so that Elastic Beanstalk automatically generates the subdomain.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Platform: Python&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Platform branch&lt;/code&gt;: Python 3.11 running on 64bit Amazon Linux 2023&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Platform version&lt;/code&gt;: leave the default recommended version.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Application code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Upload your code.&lt;/li&gt;
&lt;li&gt;Upload the &lt;code&gt;.zip&lt;/code&gt; file generated previously.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Presets:&lt;/strong&gt; Select Single instance (free tier eligible) for this tutorial.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;


&lt;h4&gt;
  
  
  (2) Environment – Step 2: Configure service access
&lt;/h4&gt;

&lt;p&gt;In this step, the &lt;strong&gt;IAM roles&lt;/strong&gt; that allow Elastic Beanstalk and EC2 instances to access AWS resources are configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service role&lt;/strong&gt;: the role that Elastic Beanstalk uses to create and manage the environment (Auto Scaling, Load Balancer, logs, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EC2 instance profile:&lt;/strong&gt; the role used by the EC2 instances where the application runs. This role must include the necessary policies to access Amazon Bedrock, Amazon S3, and Amazon S3 Vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EC2 key pair (optional):&lt;/strong&gt; can be omitted if SSH access to the instances is not required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this configuration, the application is correctly authorized to interact with AWS services in a secure manner.&lt;/p&gt;




&lt;h4&gt;
  
  
  (2) Environment – Step 3: Set up networking, database, and tags (optional)
&lt;/h4&gt;

&lt;p&gt;In this step, the network where the environment will run is configured. For this tutorial, the default VPC values are used, making only the following adjustments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VPC&lt;/strong&gt;: select the account’s default VPC to simplify the configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public IP address:&lt;/strong&gt; enable it so the application is accessible from the Internet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance subnets:&lt;/strong&gt; select two subnets in different Availability Zones, as shown in the image.
Selecting more than one subnet allows Elastic Beanstalk to distribute instances across multiple Availability Zones, improving resilience and fault tolerance, even when using a simple deployment for tests or demos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The remaining options (database and tags) can be left unconfigured for this use case.&lt;/p&gt;




&lt;h4&gt;
  
  
  (2) Environment – Step 4: Configure instance traffic and scaling
&lt;/h4&gt;

&lt;p&gt;In this step, you define how the application runs and what type of resources it uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment type&lt;/strong&gt;: select Single instance, which is sufficient for this tutorial and helps reduce costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fleet composition&lt;/strong&gt;: use On-Demand instances, avoiding the complexity of Spot instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: choose x86_64 to ensure compatibility with all Python dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance type&lt;/strong&gt;: select a lightweight type such as t3.small, suitable for running a low-consumption Streamlit application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring and metadata&lt;/strong&gt;: keep the default values, enabling CloudWatch metrics and using IMDSv2.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This configuration allows the application to be deployed in a simple, stable, and cost-effective way, ideal for tests, demos, and development environments.&lt;/p&gt;




&lt;h4&gt;
  
  
  (2) Environment – Step 5: Configure updates, monitoring, and logging
&lt;/h4&gt;

&lt;p&gt;In this step, monitoring, update, and observability options for the environment are configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; enable basic or enhanced monitoring so Elastic Beanstalk reports instance metrics to CloudWatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health reporting:&lt;/strong&gt; allows you to visualize the application status and detect failures early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed platform updates:&lt;/strong&gt; automatic environment updates (minor and patch) can be enabled during a defined weekly window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email notifications:&lt;/strong&gt; allows configuring an email address to receive notifications about relevant environment events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rolling updates and deployments:&lt;/strong&gt; defines how deployments and configuration changes are applied (for this tutorial, default values can be used).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs:&lt;/strong&gt; enable sending instance logs to CloudWatch Logs to facilitate debugging and observability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment properties:&lt;/strong&gt; here you can define environment variables required by the application (for example AWS region, bucket names, or other configuration values the app needs).&lt;/li&gt;
&lt;/ul&gt;
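&lt;p&gt;Inside &lt;code&gt;app.py&lt;/code&gt;, these environment properties arrive as regular environment variables and can be read with the standard library (the variable names and defaults below are illustrative, not taken from the repository):&lt;br&gt;
&lt;/p&gt;

```python
import os

# Environment properties defined in the Beanstalk console reach the app as
# regular environment variables. The names and defaults here are illustrative.
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
VECTOR_BUCKET = os.environ.get("VECTOR_BUCKET_NAME", "coffee-vector-bucket")
VECTOR_INDEX = os.environ.get("VECTOR_INDEX_NAME", "coffee-index")
```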

&lt;p&gt;With this configuration, the environment is prepared to operate in a stable and observable way, with controlled updates and no additional adjustments required for this use case.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Step 8: Validate the application deployment (Elastic Beanstalk)
&lt;/h2&gt;

&lt;p&gt;Once the application is deployed, it is important to validate that everything is working correctly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvjnbtwqppluq8jhhmn1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvjnbtwqppluq8jhhmn1.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  (1) Environment status
&lt;/h3&gt;

&lt;p&gt;The first step is to verify that the environment status is &lt;strong&gt;Health: OK&lt;/strong&gt;. This indicates that Elastic Beanstalk was able to start the application correctly and that no critical errors were detected during deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  (2) Application access
&lt;/h3&gt;

&lt;p&gt;If the status is correct, you can click on the &lt;strong&gt;environment domain&lt;/strong&gt; to access the application from the browser and confirm that the Streamlit interface loads correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  (3) Log review
&lt;/h3&gt;

&lt;p&gt;If the application does not work as expected or the status is not OK, go to the Logs tab. From there, you can &lt;strong&gt;request logs&lt;/strong&gt;; downloading &lt;strong&gt;the last 100 lines&lt;/strong&gt; is usually enough to make error analysis easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  (4) Deploy a new version
&lt;/h3&gt;

&lt;p&gt;If an issue is detected in the logs and the code needs to be fixed, you can deploy a new version using the &lt;strong&gt;Upload and deploy&lt;/strong&gt; button. In this step, you only need to upload the updated &lt;code&gt;.zip&lt;/code&gt; file and assign a new application version.&lt;/p&gt;




&lt;h1&gt;
  
  
  🧩 Conclusions
&lt;/h1&gt;

&lt;p&gt;This tutorial presents a complete workflow for processing and querying data through semantic search, where it is essential not to lose sight of &lt;strong&gt;best practices&lt;/strong&gt; in data cleaning and the correct definition of metadata. &lt;strong&gt;Metadata&lt;/strong&gt; plays a fundamental role in guiding searches, reducing the amount of information queried, and significantly improving the relevance of results.&lt;/p&gt;

&lt;p&gt;During the tests performed, &lt;strong&gt;query performance&lt;/strong&gt; was notably fast, to the point that in some cases the spinner implemented in the application barely had time to appear. This shows that &lt;strong&gt;Amazon S3 Vectors&lt;/strong&gt; can deliver suitable performance even for interactive, end-user–oriented scenarios.&lt;/p&gt;

&lt;p&gt;When exploring the &lt;strong&gt;Boto3 API&lt;/strong&gt;, it becomes apparent that some features commonly found in traditional databases are still missing, such as aggregated statistics or an equivalent of &lt;strong&gt;count(*)&lt;/strong&gt;. Currently, to determine the number of stored vectors, it is necessary to use operations like &lt;strong&gt;list_vectors&lt;/strong&gt; with pagination. This suggests that, as a relatively new feature, there are clear opportunities for improvement in future versions of the service.&lt;/p&gt;
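&lt;p&gt;As an example, a &lt;code&gt;count(*)&lt;/code&gt; workaround can page through &lt;strong&gt;list_vectors&lt;/strong&gt; until the pagination token runs out (parameter names follow the Boto3 S3 Vectors API; the page size is arbitrary):&lt;br&gt;
&lt;/p&gt;

```python
def count_vectors(client, bucket_name, index_name, page_size=500):
    """Approximate a count(*) by paging through list_vectors, since
    S3 Vectors currently exposes no aggregate count operation."""
    total, next_token = 0, None
    while True:
        kwargs = {
            "vectorBucketName": bucket_name,
            "indexName": index_name,
            "maxResults": page_size,
        }
        if next_token:
            kwargs["nextToken"] = next_token
        page = client.list_vectors(**kwargs)
        total += len(page.get("vectors", []))
        next_token = page.get("nextToken")
        if not next_token:
            return total

# usage sketch (bucket/index names are placeholders):
# count_vectors(boto3.client("s3vectors"), "coffee-vector-bucket", "coffee-index")
```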

&lt;p&gt;On the other hand, &lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt; proves to be a very good solution for deploying this type of application quickly and easily. However, in production scenarios, combining it with tools such as &lt;strong&gt;Terraform&lt;/strong&gt; and &lt;strong&gt;CI/CD&lt;/strong&gt; pipelines would allow deployments to be automated and manual intervention to be further reduced. In this tutorial, a console-based deployment was chosen to keep complexity under control and focus on the main use case.&lt;/p&gt;

&lt;p&gt;Finally, this approach demonstrates how unstructured &lt;strong&gt;text analysis&lt;/strong&gt; use cases, combined with structured data, offer a very compelling balance. In particular, building a chat-like interface that does not rely exclusively on &lt;strong&gt;natural language&lt;/strong&gt;, but also incorporates explicit filters, makes it possible to create a hybrid model that improves precision, reduces ambiguity, and enriches the search experience.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services.&lt;/strong&gt; (n.d.). Amazon S3 Vectors: Revolutionizing AI data storage with use cases. AWS re:Post.
&lt;a href="https://repost.aws/articles/ARY9EKiGFISfisAyvigDX3lQ/amazon-s3-vectors-revolutionizing-ai-data-storage-with-use-cases" rel="noopener noreferrer"&gt;https://repost.aws/articles/ARY9EKiGFISfisAyvigDX3lQ/amazon-s3-vectors-revolutionizing-ai-data-storage-with-use-cases&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services.&lt;/strong&gt; (n.d.). Amazon S3 Vectors.
&lt;a href="https://aws.amazon.com/es/s3/features/vectors/" rel="noopener noreferrer"&gt;https://aws.amazon.com/es/s3/features/vectors/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services.&lt;/strong&gt; (n.d.). Vector buckets for Amazon S3.
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-buckets-details.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-buckets-details.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services.&lt;/strong&gt; (n.d.). Metadata filtering for Amazon S3 Vectors.
&lt;a href="https://docs.aws.amazon.com/es_es/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/es_es/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hou, Y., Li, J., He, Z., Yan, A., Chen, X., &amp;amp; McAuley, J.&lt;/strong&gt; (2024). Bridging language and items for retrieval and recommendation.
&lt;a href="https://amazon-reviews-2023.github.io/" rel="noopener noreferrer"&gt;https://amazon-reviews-2023.github.io/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit Inc.&lt;/strong&gt; (n.d.). Streamlit API reference.
&lt;a href="https://docs.streamlit.io/develop/api-reference" rel="noopener noreferrer"&gt;https://docs.streamlit.io/develop/api-reference&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services.&lt;/strong&gt; (n.d.). Amazon S3 Vectors (Boto3 API reference).
&lt;a href="https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3vectors.html" rel="noopener noreferrer"&gt;https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3vectors.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;







&lt;h3&gt;
  
  
  📌 How to cite this article
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;APA style&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Mendez Escobar, Romina Elena. (2025). &lt;strong&gt;From Coffee Products to AI Search: Building a Serverless Semantic Search Architecture with Amazon S3 Vectors and Bedrock&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/aws-builders/from-coffee-products-to-ai-search-building-a-serverless-semantic-search-architecture-with-amazon-5g5b"&gt;https://dev.to/aws-builders/from-coffee-products-to-ai-search-building-a-serverless-semantic-search-architecture-with-amazon-5g5b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTeX&lt;/strong&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
@article{mendez2025aiawscoffee,
  title  = {From Coffee Products to AI Search: Building a Serverless Semantic Search Architecture with Amazon S3 Vectors and Bedrock},
  author = {Mendez Escobar, Romina Elena},
  year   = {2025},
  url    = {https://dev.to/aws-builders/from-coffee-products-to-ai-search-building-a-serverless-semantic-search-architecture-with-amazon-5g5b}
}



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>python</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Data-Driven Project Analysis: Analyzing Trello Kanban Projects with AI on AWS Bedrock</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Tue, 23 Dec 2025 11:22:10 +0000</pubDate>
      <link>https://dev.to/aws-builders/data-driven-project-analysis-analyzing-trello-kanban-projects-with-ai-on-aws-bedrock-15f4</link>
      <guid>https://dev.to/aws-builders/data-driven-project-analysis-analyzing-trello-kanban-projects-with-ai-on-aws-bedrock-15f4</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Modern software projects often involve multiple distributed teams working on high-complexity initiatives, with frequent releases and ongoing production fixes. While tools like Kanban boards help organize tasks, epics, and workflows, they also generate large volumes of unstructured data in the form of comments, status changes, and timelines.&lt;br&gt;
As the number of interdependent tasks and contributors grows, understanding the real state of a project, and identifying early risks or bottlenecks, becomes increasingly difficult. As a result, manual analysis is time-consuming and often subjective, limiting timely and objective decision-making.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesn1w1ecuclllif54qcf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesn1w1ecuclllif54qcf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I present a practical use case that leverages AWS services and generative AI to enhance project analysis and interpretation. By analyzing task metadata and detecting semantic patterns in comments (such as ambiguity, implicit dependencies, missing definitions, or scope creep), AI enables more objective insights, early warnings, and data-driven decision-making.&lt;/p&gt;


&lt;h1&gt;
  
  
  Understanding Kanban Board and Trello
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Kanban&lt;/strong&gt; is a visual project management methodology that originated in Toyota’s manufacturing system. It focuses on limiting work in progress and enabling continuous delivery by representing work items across different stages of a workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trello&lt;/strong&gt; is a widely used web-based project management tool that implements &lt;strong&gt;Kanban principles&lt;/strong&gt; through &lt;code&gt;boards&lt;/code&gt;, &lt;code&gt;lists&lt;/code&gt;, and &lt;code&gt;cards&lt;/code&gt;. Each card typically represents a task, feature, or user story, and includes not only a status but also descriptive text, comments, and historical changes over time.&lt;br&gt;
While Kanban boards are primarily designed for human collaboration, they also generate a rich source of textual and contextual data that can be analyzed programmatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatd9nn8wtka3bhxine6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatd9nn8wtka3bhxine6k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  User Stories as a Data Structure
&lt;/h2&gt;

&lt;p&gt;A well-defined user story usually follows a consistent structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Who&lt;/strong&gt;: the requester (As a…)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What&lt;/strong&gt;: the objective (I want to…)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why&lt;/strong&gt;: the purpose (So that…)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance Criteria&lt;/strong&gt;: explicit conditions for completion&lt;/li&gt;
&lt;/ul&gt;
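
&lt;p&gt;Written consistently, a user story maps naturally onto a small record that both humans and models can parse. A minimal sketch (the field names are illustrative, not a Trello schema):&lt;/p&gt;

```python
def format_user_story(story):
    """Render a structured user story dict into the canonical sentence form."""
    lines = [
        f"As a {story['who']}, I want to {story['what']}, so that {story['why']}.",
        "Acceptance criteria:",
    ]
    # One bullet per explicit completion condition.
    lines += [f"  - {c}" for c in story["acceptance_criteria"]]
    return "\n".join(lines)
```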

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qgp0qi40sysunylpl5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7qgp0qi40sysunylpl5v.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This structure is not only useful for aligning teams, it also provides a clear semantic pattern that can be leveraged by AI models. When tasks are written consistently, the model can more easily understand intent, scope, dependencies, and completion expectations.&lt;br&gt;
In other words, &lt;strong&gt;writing better user stories improves both human understanding and machine interpretation&lt;/strong&gt;, making it a best practice for data-driven project analysis.&lt;/p&gt;


&lt;h1&gt;
  
  
  AWS Bedrock and Amazon Nova
&lt;/h1&gt;

&lt;p&gt;For this tutorial, we leverage Amazon’s generative AI services, which provide a variety of pre-trained foundation models accessible through a single, unified platform.&lt;br&gt;
&lt;strong&gt;AWS Bedrock&lt;/strong&gt; is a fully managed service that allows developers to build, deploy, and scale AI-powered applications without the overhead of managing infrastructure. It provides seamless access to state-of-the-art foundation models from leading AI providers, all through a simple API.&lt;br&gt;
For our implementation, we use &lt;strong&gt;Amazon Nova&lt;/strong&gt;, AWS’s family of foundation models designed for tasks such as text generation, analysis, and summarization. In particular, &lt;strong&gt;Nova Lite&lt;/strong&gt; offers a balanced combination of ⚡️performance and 💰cost-efficiency, making it ideal for analyzing project data and generating actionable insights.&lt;br&gt;
In the following sections, we will demonstrate how to implement this service in Python, showing how AI can be applied to extract meaningful insights from Kanban project data.&lt;/p&gt;


&lt;h1&gt;
  
  
  Reference Architecture
&lt;/h1&gt;

&lt;p&gt;Before diving into the implementation details, it is useful to understand the overall architecture that supports this use case. The following reference architecture illustrates how project data flows from Trello through AWS services and into an AI-powered analysis pipeline.&lt;/p&gt;

&lt;p&gt;The entire process is executed through an AWS Glue job implemented in Python, which orchestrates data extraction, transformation, AI inference, and report generation in a scalable and automated manner. &lt;/p&gt;

&lt;p&gt;At a high level, the architecture ingests Kanban project data from Trello, enriches it with temporal and contextual metadata, applies semantic analysis using generative AI models on AWS Bedrock, and produces structured, human-readable reports for project stakeholders.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzb03plv9bn7xa7xh1je.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzb03plv9bn7xa7xh1je.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Core Components
&lt;/h2&gt;
&lt;h4&gt;
  
  
  (1). 📋Trello Integration Class
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Connects to Trello boards via the Trello API&lt;/li&gt;
&lt;li&gt;Retrieves boards, lists, and cards with enriched metadata&lt;/li&gt;
&lt;li&gt;Calculates time-based metrics (e.g., days until due date)&lt;/li&gt;
&lt;li&gt;Exports structured data to Amazon S3 in JSON format&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  (2). ✨AWS Bedrock Integration
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Invokes the Amazon Nova model using custom prompts&lt;/li&gt;
&lt;li&gt;Processes project datasets to generate semantic insights&lt;/li&gt;
&lt;li&gt;Uses configurable inference parameters to balance cost and accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  (3).📊 Report Generation (MarkdownPDFReport)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Converts AI-generated markdown into professional PDF reports&lt;/li&gt;
&lt;li&gt;Applies custom styling for readability and consistency&lt;/li&gt;
&lt;li&gt;Supports tables, lists, and structured summaries&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  (4). Supporting Services
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔐 &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;: securely stores Trello API credentials&lt;/li&gt;
&lt;li&gt;🪣 &lt;strong&gt;Amazon S3&lt;/strong&gt;: stores datasets, prompts, and generated reports&lt;/li&gt;
&lt;li&gt;📩 &lt;strong&gt;Amazon SES&lt;/strong&gt;: distributes automated reports via email&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Implementation Guide
&lt;/h1&gt;

&lt;p&gt;The use case presented in this guide is based on a simulated Trello board representing an e-commerce software project. The board includes typical development activities such as feature implementation, backlog items, in-progress tasks, and delivery milestones, closely mirroring how Kanban is used in production environments.&lt;br&gt;
This example is intentionally designed to resemble a realistic project scenario, allowing us to analyze both structured data (task metadata, statuses, due dates) and unstructured data (descriptions and comments). The following diagram illustrates the initial project setup and serves as the input for the implementation steps described in the next sections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstp0kct4qtjyy6606xco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstp0kct4qtjyy6606xco.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before running the solution, a few &lt;strong&gt;AWS&lt;/strong&gt; and &lt;strong&gt;Trello&lt;/strong&gt; prerequisites must be in place. These prerequisites ensure secure access to project data, proper execution of the Glue job, and automated report delivery.&lt;/p&gt;
&lt;h3&gt;
  
  
  (1). 🔑 Trello API credentials
&lt;/h3&gt;

&lt;p&gt;To access Trello boards and cards programmatically, you need valid Trello API credentials, consisting of an API key and an access token.&lt;/p&gt;
&lt;h5&gt;
  
  
  Step 1: Obtain the API key
&lt;/h5&gt;

&lt;p&gt;The API key can be generated from the Trello Power-Ups administration page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://trello.com/power-ups/admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h5&gt;
  
  
  Step 2: Generate the access token
&lt;/h5&gt;

&lt;p&gt;Once you have the API key, you must authorize your application and generate a token using the following endpoint (replace {API_KEY} with your own key):&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://trello.com/1/authorize?expiration&lt;span class="o"&gt;=&lt;/span&gt;never&amp;amp;name&lt;span class="o"&gt;=&lt;/span&gt;MyApp&amp;amp;scope&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt;,write&amp;amp;response_type&lt;span class="o"&gt;=&lt;/span&gt;token&amp;amp;key&lt;span class="o"&gt;={&lt;/span&gt;API_KEY&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;This authorization flow grants read and write access to Trello resources and returns a token that will be used by the application to query boards, lists, cards, and comments. Both the API key and token should be treated as sensitive credentials.&lt;/p&gt;
&lt;/blockquote&gt;
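&lt;p&gt;With the key and token in hand, every Trello REST call is authenticated by appending them as query parameters. A minimal sketch using only the standard library (&lt;code&gt;members/me/boards&lt;/code&gt; is Trello's boards-listing endpoint; the helper itself is illustrative):&lt;/p&gt;

```python
from urllib.parse import urlencode

TRELLO_API = "https://api.trello.com/1"

def trello_url(path, api_key, token, **params):
    """Build an authenticated Trello REST URL (key and token as query params)."""
    params.update({"key": api_key, "token": token})
    return f"{TRELLO_API}/{path}?{urlencode(params)}"

# e.g. urllib.request.urlopen(trello_url("members/me/boards", KEY, TOKEN))
```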
&lt;h3&gt;
  
  
  (2). ⚙️ AWS IAM role
&lt;/h3&gt;

&lt;p&gt;On the AWS side, an IAM role is required to execute the AWS Glue job and interact with the supporting services used in this solution.&lt;br&gt;
The role must include permissions for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AWS Glue&lt;/code&gt; (job execution)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Amazon S3&lt;/code&gt; (data storage and retrieval)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AWS Secrets Manager&lt;/code&gt; (secure storage of Trello credentials)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Amazon Bedrock&lt;/code&gt; (AI model)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Amazon SES&lt;/code&gt; (email delivery)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A complete example IAM policy with the required permissions is provided in the project repository. You can attach this policy to the IAM role used by the Glue job to ensure the pipeline runs end to end without permission issues.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  (3). 📩 Amazon SES configuration
&lt;/h3&gt;

&lt;p&gt;Finally, Amazon Simple Email Service (SES) must be configured to enable automated report delivery.&lt;br&gt;
This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;☑️ Verifying at least one sender email address or domain (SES identities)&lt;/li&gt;
&lt;li&gt;☑️ Ensuring your AWS account has sufficient sending limits&lt;/li&gt;
&lt;li&gt;☑️ Confirming the SES region matches the region used by the Glue job&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once configured, SES will be used to send the generated PDF reports to stakeholders automatically as part of the pipeline execution.&lt;/p&gt;
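
&lt;p&gt;Because SES sends attachments as raw MIME messages, the report email can be assembled entirely with the Python standard library. A hedged sketch (the function and its arguments are illustrative; sending would go through the real &lt;code&gt;send_raw_email&lt;/code&gt; SES operation):&lt;/p&gt;

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_report_email(sender, recipient, subject, html_body, pdf_bytes, pdf_name):
    """Assemble a multipart email with an HTML body and a PDF attachment."""
    msg = MIMEMultipart()
    msg["From"], msg["To"], msg["Subject"] = sender, recipient, subject
    msg.attach(MIMEText(html_body, "html"))
    part = MIMEApplication(pdf_bytes, _subtype="pdf")
    part.add_header("Content-Disposition", "attachment", filename=pdf_name)
    msg.attach(part)
    return msg

# Sending via SES (sketch):
#   boto3.client("ses").send_raw_email(
#       Source=sender, Destinations=[recipient],
#       RawMessage={"Data": build_report_email(...).as_string()})
```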


&lt;h1&gt;
  
  
  Implementation Steps
&lt;/h1&gt;

&lt;p&gt;The following steps describe the end-to-end implementation of the solution, from secure credential management to AI-driven analysis and automated report distribution.&lt;/p&gt;
&lt;h2&gt;
  
  
  🔐 Step 1: Configure Secrets Manager
&lt;/h2&gt;

&lt;p&gt;Store your Trello credentials securely in AWS Secrets Manager. This avoids hardcoding sensitive information and follows AWS security best practices. The secret should contain the Trello API key and token in JSON format.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1pbsy6o14tivnrx5yji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj1pbsy6o14tivnrx5yji.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  ⚙️ Step 2: Set Up the AWS Glue Environment
&lt;/h2&gt;

&lt;p&gt;For this tutorial, the solution is implemented using an AWS Glue Python notebook, which provides a fully managed, serverless environment for running data processing jobs. The complete source code is available in the project repository; the following sections highlight the most relevant implementation details and design decisions rather than providing a full code walkthrough.&lt;/p&gt;

&lt;p&gt;If you find this tutorial helpful, feel free to leave a star ⭐️ and follow me to get notified about new articles. Your support helps me grow within the tech community and create more valuable content! 🚀&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/aws-trello-ai-tutorial" rel="noopener noreferrer"&gt;
        aws-trello-ai-tutorial
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      End-to-end AWS Glue pipeline for extracting Trello Kanban data, analyzing it with Amazon Bedrock, and generating automated PDF reports.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;🏷️ Data-Driven Project Analysis: Analyzing Trello Kanban Projects with AI on AWS Bedrock&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Introduction&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Modern software projects often involve multiple distributed teams working on high-complexity initiatives, with frequent releases and ongoing production fixes. While tools like Kanban boards help organize tasks, epics, and workflows, they also generate large volumes of unstructured data in the form of comments, status changes, and timelines
As the number of interdependent tasks and contributors grows, understanding the real state of a project, and identifying early risks or bottlenecks, becomes increasingly difficult. Manual analysis is time-consuming and often subjective.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/aws-trello-ai-tutorial/img/trello-aws-preview.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Faws-trello-ai-tutorial%2Fimg%2Ftrello-aws-preview.png" alt="preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this repository, I present a practical use case that leverages AWS services and generative AI to enhance project analysis and interpretation. By analyzing task metadata and detecting semantic patterns in comments (such as ambiguity, implicit dependencies, missing definitions, or scope creep) AI enables more objective insights, early warnings, and data-driven decision-making&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;🗂️ Folder Structure&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;The repository…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/aws-trello-ai-tutorial" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;







&lt;h3&gt;
  
  
  📦 Step 2.1: Installing Additional Python Packages
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt; comes with a predefined Python environment, but this solution requires additional libraries to interact with AWS services, process text, and generate reports.&lt;/p&gt;

&lt;p&gt;The following directive installs the required dependencies at runtime:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install required Python packages&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%additional_python_modules boto3==1.34.34,botocore==1.34.34,markdown==3.5.2,beautifulsoup4==4.12.3,reportlab==4.0.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These packages are used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;boto3 / botocore&lt;/strong&gt;: AWS SDK for Python, used to interact with services such as S3, Secrets Manager, Bedrock, and SES&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;markdown&lt;/strong&gt;: Converts AI-generated Markdown into HTML&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;beautifulsoup4&lt;/strong&gt;: Parses and transforms HTML content before PDF generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;reportlab&lt;/strong&gt;: Generates styled PDF documents programmatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Installing only the required dependencies helps keep the Glue job lightweight and efficient.&lt;/p&gt;


&lt;h3&gt;
  
  
  📋 Step 2.2: Trello Data Extraction Class
&lt;/h3&gt;

&lt;p&gt;The Trello class encapsulates all interactions with the Trello REST API and is responsible for retrieving, enriching, and preparing project data for AI analysis.&lt;/p&gt;
&lt;h4&gt;
  
  
  Key input parameters
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BUCKET_NAME&lt;/strong&gt;: Target S3 bucket for exporting processed data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API_KEY / API_TOKEN&lt;/strong&gt;: Trello credentials retrieved securely from Secrets Manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt;: Helper class instance used to write data to Amazon S3&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Dataset design considerations
&lt;/h4&gt;

&lt;p&gt;Although Trello provides a large number of fields, the implementation intentionally selects a &lt;strong&gt;minimal but meaningful subset&lt;/strong&gt; of columns:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.DATAFRAME_COLUMNS = [
    'id', 'dueComplete', 'desc', 'listName', 'name',
    'start', 'checkItems', 'checkItemsChecked', 'due', 'time_to_due']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This design choice offers several benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces token usage during AI inference (lower cost)&lt;/li&gt;
&lt;li&gt;Avoids passing empty or unused fields&lt;/li&gt;
&lt;li&gt;Improves model focus and processing efficiency&lt;/li&gt;
&lt;/ul&gt;
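
&lt;p&gt;Applied to a raw card payload, the projection is a one-line dict comprehension (a sketch; &lt;code&gt;card.get&lt;/code&gt; defaults absent fields to &lt;code&gt;None&lt;/code&gt; so every row has the same shape):&lt;/p&gt;

```python
DATAFRAME_COLUMNS = [
    "id", "dueComplete", "desc", "listName", "name",
    "start", "checkItems", "checkItemsChecked", "due", "time_to_due"]

def project_card(card):
    """Keep only the selected columns, defaulting absent fields to None."""
    return {col: card.get(col) for col in DATAFRAME_COLUMNS}
```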
&lt;h4&gt;
  
  
  Temporal enrichment
&lt;/h4&gt;

&lt;p&gt;The class automatically calculates the number of days remaining until each task’s due date (time_to_due). This temporal context helps the AI model reason about urgency, delays, and potential risks.&lt;br&gt;
Finally, the data can be exported to Amazon S3 in CSV format or returned as filtered JSON, typically limited to tasks in To Do and Doing states.&lt;/p&gt;


&lt;h3&gt;
  
  
  🧩 Step 2.3: AWS Helper Classes (boto3 Abstractions)
&lt;/h3&gt;

&lt;p&gt;To keep the AWS Glue notebook readable, modular, and maintainable, all AWS service interactions are encapsulated into small helper classes built on top of boto3.&lt;/p&gt;
&lt;h4&gt;
  
  
  aws_s3
&lt;/h4&gt;

&lt;p&gt;Handles all Amazon S3 operations, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading prompt templates and input files&lt;/li&gt;
&lt;li&gt;Writing intermediate datasets&lt;/li&gt;
&lt;li&gt;Persisting generated PDF reports&lt;/li&gt;
&lt;li&gt;Automatically partitioning outputs by execution date&lt;/li&gt;
&lt;/ul&gt;
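
&lt;p&gt;Partitioning outputs by execution date is just a key-naming convention. A hypothetical sketch of the layout such a helper might produce (the &lt;code&gt;dt=&lt;/code&gt; prefix style is an assumption, chosen because it is Hive/Athena-friendly):&lt;/p&gt;

```python
from datetime import date

def partitioned_key(prefix, filename, run_date=None):
    """Build an S3 key partitioned by execution date, e.g. reports/dt=2025-12-23/r.pdf."""
    run_date = run_date or date.today()
    return f"{prefix}/dt={run_date.isoformat()}/{filename}"
```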
&lt;h4&gt;
  
  
  aws_secrets_manager
&lt;/h4&gt;

&lt;p&gt;Responsible for securely retrieving sensitive configuration from AWS Secrets Manager; in our use case, the Trello API credentials.&lt;/p&gt;
&lt;h4&gt;
  
  
  aws_ses
&lt;/h4&gt;

&lt;p&gt;Manages email delivery workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads the generated PDF report from S3&lt;/li&gt;
&lt;li&gt;Renders an HTML email body (template stored in the repository)&lt;/li&gt;
&lt;li&gt;Attaches the PDF report&lt;/li&gt;
&lt;li&gt;Sends emails to configured recipients&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  🧠 Step 2.4: AWS Bedrock Integration and Inference Strategy
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;AWSBedrock&lt;/code&gt; class manages the interaction with &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;, invoking the &lt;strong&gt;Amazon Nova&lt;/strong&gt; Lite model to analyze Trello project data.&lt;/p&gt;
&lt;h4&gt;
  
  
  Model inputs
&lt;/h4&gt;

&lt;p&gt;The model receives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A filtered dataset (JSON) containing only relevant tasks and fields&lt;/li&gt;
&lt;li&gt;A custom prompt defining the analysis objectives, expected insights, and report structure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both the dataset and the prompt can be adjusted to fit different team practices or project types. The prompt used in this tutorial is provided in the repository as a reference example.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AWSBedrock&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;PROMPT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;DATASET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PROMPT&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DATASET&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;REGION&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_bedrock_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_final&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inferenceConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_new_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_bedrock_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_payload&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
             &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Inference configuration
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inferenceConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_new_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;max_new_tokens (5000)&lt;/strong&gt;: Allows the model to generate detailed, structured reports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;temperature (0.4)&lt;/strong&gt;: Ensures consistent and reliable analysis while preserving enough flexibility to detect patterns and nuances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;top_p (0.9)&lt;/strong&gt;: Enables controlled diversity in model responses&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A temperature of &lt;strong&gt;0.4&lt;/strong&gt; was selected after iterative testing, as higher values introduced unnecessary variability, while lower values reduced the model’s ability to surface implicit risks and insights.&lt;br&gt;
Before finalizing this configuration, multiple test runs were performed, refining both the dataset and the prompt to ensure the output aligned with the intended project analysis goals.&lt;/p&gt;
&lt;/blockquote&gt;
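&lt;p&gt;The iterative tuning described above can be sketched as a small helper that builds the request body for each candidate temperature. This is a minimal illustration; the function and variable names are mine, not part of the original project:&lt;/p&gt;

```python
# Minimal sketch of the temperature-sweep tests (hypothetical helper names).
# build_payload is a pure function, so candidate configurations can be
# inspected and compared before any call to Bedrock is made.

def build_payload(prompt: str, temperature: float) -> dict:
    """Build a Nova-style request body with a configurable temperature."""
    return {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "max_new_tokens": 5000,
            "temperature": temperature,
            "top_p": 0.9,
        },
    }

# Candidate temperatures compared during the iterative runs.
candidates = [0.2, 0.4, 0.7]
payloads = [build_payload("Analyze this Kanban board...", t) for t in candidates]
```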

&lt;p&gt;If you want to learn more about how these parameters work, see the article below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/r_elena_mendez_escobar/genai-foundations-chapter-2-prompt-engineering-in-action-unlocking-better-ai-responses-l28"&gt;GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses&lt;/a&gt;&lt;/p&gt;








&lt;h3&gt;
  
  
  📄 Step 2.5: Report Generation and Distribution
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;MarkdownPDFReport&lt;/strong&gt; class converts AI-generated Markdown into a professional, styled PDF document.&lt;/p&gt;

&lt;h4&gt;
  
  
   Input parameters
&lt;/h4&gt;

&lt;p&gt;The class requires only:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Markdown text generated by the AI model&lt;/li&gt;
&lt;li&gt;An optional output path (in-memory or file-based)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Key features
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Custom heading hierarchies and typography&lt;/li&gt;
&lt;li&gt;Styled tables and lists&lt;/li&gt;
&lt;li&gt;Emoji-to-symbol mapping for visual status indicators&lt;/li&gt;
&lt;li&gt;Fully customizable styles defined in internal methods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All visual styles are centralized and can be easily adapted to match organizational branding or reporting standards.&lt;/p&gt;
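&lt;p&gt;As one concrete illustration of the feature list above, the emoji-to-symbol mapping can be sketched as a simple substitution pass: PDF fonts often lack emoji glyphs, so status emojis are swapped for renderable markers before rendering. The mapping below is illustrative, not the exact one used by &lt;strong&gt;MarkdownPDFReport&lt;/strong&gt;:&lt;/p&gt;

```python
# Illustrative sketch of the emoji-to-symbol mapping (hypothetical mapping,
# not the project's actual table): status emojis in the AI-generated Markdown
# are replaced with symbols the PDF renderer can display reliably.

EMOJI_TO_SYMBOL = {
    "✅": "[OK]",
    "🔴": "[RISK]",
    "❌": "[FAIL]",
}

def replace_emojis(markdown_text: str) -> str:
    """Replace known status emojis with printable symbols."""
    for emoji, symbol in EMOJI_TO_SYMBOL.items():
        markdown_text = markdown_text.replace(emoji, symbol)
    return markdown_text
```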

&lt;p&gt;Once generated, the PDF is stored in &lt;strong&gt;🪣 Amazon S3&lt;/strong&gt; and sent via 📩 email using the previously described SES class. The HTML email template used for embedding the report is also available in the repository and can be modified as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlaldv4v7fl6zqbce07n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlaldv4v7fl6zqbce07n.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📄 Example Output: Email and Report Preview
&lt;/h2&gt;

&lt;p&gt;Below is an example of the report generated by the solution. The complete output consists of a six-page PDF, but for illustration purposes, the following screenshots show the cover page and a selection of summary tables used to highlight key project insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsovka6pvhhzwtorm0qi5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsovka6pvhhzwtorm0qi5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  Conclusions
&lt;/h1&gt;

&lt;p&gt;This article demonstrates how combining Kanban project data with generative AI can significantly enhance the way teams understand, communicate, and manage complex software projects. Beyond the technical implementation, several key insights and lessons emerged from this use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  📉 Reducing Bias and Improving Decision-Making
&lt;/h3&gt;

&lt;p&gt;One of the main benefits of this approach is the ability to reduce subjective bias in project analysis. By evaluating task metadata, timelines, and written communication through AI-driven semantic analysis, teams gain a more objective view of project status, risks, and bottlenecks.&lt;br&gt;
This enables more focused stakeholder discussions and allows follow-up meetings to be based on concrete, data-driven insights rather than individual perceptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  🗣️ Enhancing Stakeholder Communication
&lt;/h3&gt;

&lt;p&gt;In projects with a large number of tasks and contributors, explaining delays or risks can be challenging. Automatically generated reports help translate complex project data into clear, structured summaries, making it easier to communicate issues, dependencies, and priorities to non-technical stakeholders and leadership teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 Dataset and Tooling Flexibility
&lt;/h3&gt;

&lt;p&gt;Although this example is based on Trello, the same approach can be applied to other project management tools such as Jira, Azure DevOps, Odoo, or similar platforms. By adapting the data extraction layer, teams can reuse the same analysis and reporting pipeline across different tools and project types.&lt;br&gt;
Selecting only relevant fields remains critical, as passing unnecessary or empty data increases token usage without improving insight quality.&lt;/p&gt;
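&lt;p&gt;The field-selection step mentioned above can be sketched as a small filter applied to each card before building the prompt. The field names below are illustrative Trello-style keys, not a definitive list:&lt;/p&gt;

```python
# Sketch of the field-selection step (assumed field names): keep only the
# card fields that carry analytical signal, and drop empty values that would
# inflate token usage without improving insight quality.

RELEVANT_FIELDS = {"name", "desc", "due", "dueComplete", "labels", "idList"}

def select_relevant_fields(card: dict) -> dict:
    """Return a slimmed-down card with only non-empty, relevant fields."""
    return {
        key: value
        for key, value in card.items()
        if key in RELEVANT_FIELDS and value not in (None, "", [])
    }
```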

&lt;h3&gt;
  
  
  💬 Prompt Design as a Key Success Factor
&lt;/h3&gt;

&lt;p&gt;Prompt engineering plays a central role in the quality of the generated insights. Providing better context—such as project goals, roadmap expectations, risks, or delivery constraints—helps the model produce more accurate and actionable conclusions.&lt;br&gt;
During experimentation, iterative prompt refinement proved essential. In some cases, enforcing a strict output format (such as JSON) reduced the depth of the analysis, whereas allowing freer, unstructured responses resulted in richer conclusions. This highlights the importance of testing different prompt strategies rather than assuming a single optimal format.&lt;/p&gt;

&lt;h3&gt;
  
  
  📑 Output Formats and Performance Considerations
&lt;/h3&gt;

&lt;p&gt;While this solution generates Markdown and converts it into a PDF report, alternative output formats such as JSON can also be produced. However, structured formats may negatively impact model performance if they overly constrain the response. Choosing the right output format depends on the downstream use case—human consumption, system integration, or further automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 Model Selection Matters
&lt;/h3&gt;

&lt;p&gt;Model choice significantly affects the quality of insights. Initial experiments using Amazon Titan did not produce sufficiently meaningful conclusions for this use case. After evaluating multiple options, Amazon Nova proved to be the best fit, offering a better balance between contextual understanding, analytical depth, and consistency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;AI should not replace project management practices, but it can act as a powerful decision-support layer, helping teams identify risks earlier, communicate more effectively, and focus discussions on what truly matters. With careful dataset selection, prompt design, and model evaluation, this approach can be adapted to a wide range of project environments and organizational needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  📚 References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services&lt;/strong&gt;. (n.d.). AWS Glue documentation. &lt;a href="https://docs.aws.amazon.com/glue/" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/glue/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services&lt;/strong&gt;. (n.d.). AWS Bedrock.
&lt;a href="https://aws.amazon.com/en/bedrock/" rel="noopener noreferrer"&gt;https://aws.amazon.com/en/bedrock/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Web Services&lt;/strong&gt;. (n.d.). Amazon Nova: Generative AI models.
&lt;a href="https://aws.amazon.com/es/ai/generative-ai/nova/" rel="noopener noreferrer"&gt;https://aws.amazon.com/es/ai/generative-ai/nova/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asana&lt;/strong&gt;. (n.d.). What is Kanban?
&lt;a href="https://asana.com/es/resources/what-is-kanban" rel="noopener noreferrer"&gt;https://asana.com/es/resources/what-is-kanban&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kanban Tool&lt;/strong&gt;. (n.d.). Kanban history and evolution.
&lt;a href="https://kanbantool.com/kanban-guide/kanban-history" rel="noopener noreferrer"&gt;https://kanbantool.com/kanban-guide/kanban-history&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  📌 How to cite this article
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;APA style&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Mendez Escobar, Romina Elena. (2025). &lt;strong&gt;Data-Driven Project Analysis: Analyzing Trello Kanban Projects with AI on AWS Bedrock&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/aws-builders/data-driven-project-analysis-analyzing-trello-kanban-projects-with-ai-on-aws-bedrock-15f4"&gt;https://dev.to/aws-builders/data-driven-project-analysis-analyzing-trello-kanban-projects-with-ai-on-aws-bedrock-15f4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTeX&lt;/strong&gt;&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
@article{mendez2025aiawstrello,
  title  = {Data-Driven Project Analysis: Analyzing Trello Kanban Projects with AI on AWS Bedrock},
  author = {Mendez Escobar, Romina Elena},
  year   = {2025},
  url    = {https://dev.to/aws-builders/data-driven-project-analysis-analyzing-trello-kanban-projects-with-ai-on-aws-bedrock-15f4}
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>From Raw Clinical Data to AI: Building a Modern Healthcare Data Platform on AWS</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Tue, 09 Dec 2025 10:04:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/from-raw-clinical-data-to-ai-building-a-modern-healthcare-data-platform-on-aws-1mi7</link>
      <guid>https://dev.to/aws-builders/from-raw-clinical-data-to-ai-building-a-modern-healthcare-data-platform-on-aws-1mi7</guid>
      <description>&lt;p&gt;The &lt;strong&gt;OMOP&lt;/strong&gt; Common Data Model (&lt;code&gt;CDM&lt;/code&gt;) is a standard for observational health data that allows the analysis of clinical data in a consistent and reproducible way. Implementing &lt;strong&gt;OMOP CDM&lt;/strong&gt; in &lt;strong&gt;AWS&lt;/strong&gt; requires a &lt;code&gt;robust architecture&lt;/code&gt; that handles everything from data ingestion to advanced AI analysis, maintaining the highest standards of security and regulatory compliance, especially &lt;code&gt;HIPAA&lt;/code&gt; for health data.&lt;/p&gt;

&lt;p&gt;This guide describes a set of components for an architecture on AWS. These do not define the only possible solution; I am simply presenting one proposal built from the many services this platform makes available.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbqidpxxcryscuszrm0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbqidpxxcryscuszrm0k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;────────────────────────────────&lt;/p&gt;

&lt;h1&gt;
  
  
  🗂️ What is OMOP CDM?
&lt;/h1&gt;

&lt;p&gt;The &lt;strong&gt;OMOP Common Data Model (CDM)&lt;/strong&gt; is a standard designed by the OHDSI community to represent observational health data in a uniform way. Its main objective is to enable the &lt;strong&gt;standardization of medical data&lt;/strong&gt; where different institutions, clinical systems and databases speak the same “language,” in order to facilitate reproducible analysis, cohort comparisons and multicenter studies.&lt;br&gt;
The model is based on a set of normalized tables, standardized vocabularies and modeling conventions that define how patients, diagnoses, procedures, medication, clinical measurements, visits and temporal events should be represented.&lt;/p&gt;

&lt;p&gt;────────────────────────────────&lt;/p&gt;
&lt;h1&gt;
  
  
  👤 Model Structure: Patient as Central Entity
&lt;/h1&gt;

&lt;p&gt;OMOP organizes the information around &lt;strong&gt;the patient&lt;/strong&gt;, who acts as the central unit of the model, and this structure allows the reconstruction of the patient’s clinical timeline and the analysis of their events in a temporal way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m175vfadtsnyajpou9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1m175vfadtsnyajpou9w.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;────────────────────────────────&lt;/p&gt;
&lt;h1&gt;
  
  
  ❤️ Standardized Vocabularies: the semantic heart of OMOP
&lt;/h1&gt;

&lt;p&gt;One of the most important strengths of the CDM is the use of standardized vocabularies, which replace the diversity of ways of writing the same text with numeric IDs. These IDs allow the representation of clinical concepts in a consistent, interoperable and computable way.&lt;br&gt;
In addition, the vocabularies have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hierarchies (for example, “type 2 diabetes mellitus” is a subconcept of “endocrine and metabolic diseases”),&lt;/li&gt;
&lt;li&gt;Semantic relationships,&lt;/li&gt;
&lt;li&gt;Standard and non-standard concepts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to these hierarchies, an analyst can perform broad studies without knowing all the specific codes. For example, to analyze metabolic diseases, they can query the higher-level category and automatically include all of its subclasses (including the different types of diabetes).&lt;/p&gt;
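&lt;p&gt;This hierarchy expansion can be illustrated with a toy example, in the spirit of OMOP's &lt;code&gt;concept_ancestor&lt;/code&gt; table. The concept names and relationships below are made up for illustration, not real OMOP concept IDs:&lt;/p&gt;

```python
# Toy illustration of hierarchy expansion (made-up concepts, not real OMOP
# concept IDs): querying a broad category automatically covers all subclasses.

CONCEPT_DESCENDANTS = {
    "metabolic_disease": {"type_1_diabetes", "type_2_diabetes", "obesity"},
}

def expand_concept(concept: str) -> set:
    """Return the concept plus every descendant, recursively."""
    result = {concept}
    for child in CONCEPT_DESCENDANTS.get(concept, set()):
        result |= expand_concept(child)
    return result
```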

&lt;p&gt;────────────────────────────────&lt;/p&gt;
&lt;h1&gt;
  
  
  ☁️ OMOP in AWS
&lt;/h1&gt;

&lt;p&gt;The architecture of the &lt;strong&gt;OMOP Common Data Model&lt;/strong&gt; can be implemented in multiple environments (on-premise, hybrid or in different cloud providers). However, AWS offers a particularly robust ecosystem to address the challenges of standardization, integration, governance and advanced clinical data analysis.&lt;/p&gt;

&lt;p&gt;In this section, we explore how to combine &lt;strong&gt;AWS services&lt;/strong&gt; to build a complete pipeline that allows ingesting, transforming, standardizing and analyzing health data under the OMOP standard, maintaining high levels of security, regulatory compliance and operational efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivvbq6obrbkwo7fv17on.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivvbq6obrbkwo7fv17on.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ This approach is not intended to be the only way to implement OMOP, but a practical and modular guide that will allow you to understand which AWS services can help you in each phase of the process.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  OMOP in AWS: Services by section
&lt;/h2&gt;
&lt;h3&gt;
  
  
  (1) 📄 Data: Clinical Sources, APIs and Personal Devices
&lt;/h3&gt;

&lt;p&gt;In a modern health ecosystem, data no longer comes only from a hospital’s internal systems. Today, clinical information is distributed across multiple platforms, technologies and devices, requiring architectures capable of integrating, unifying and standardizing heterogeneous sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc48ok9rbor9pwo1aseh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc48ok9rbor9pwo1aseh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  (2) 🔧 Pipeline Services: Data Ingestion and Initial Processing
&lt;/h3&gt;

&lt;p&gt;To build a robust pipeline that enables the standardization of clinical data toward OMOP, it is essential to define how the data is extracted, ingested and prepared before transformation.&lt;br&gt;
In this stage, the main objective is to capture data from the different sources and store it in raw format in &lt;strong&gt;Amazon S3&lt;/strong&gt;, always preserving traceability and the original state of the information.&lt;br&gt;
Below are the key services used in this phase:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon MWAA (Managed Workflows for Apache Airflow)&lt;/strong&gt;&lt;br&gt;
Amazon MWAA allows running Apache Airflow DAGs without managing the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Kinesis&lt;/strong&gt;&lt;br&gt;
Hospitals and health devices generate more and more real-time data; for these scenarios, Amazon Kinesis offers a highly scalable streaming solution.&lt;br&gt;
The combined use of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kinesis Data Streams&lt;/strong&gt; (real-time ingestion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kinesis Data Firehose&lt;/strong&gt; (automated delivery to S3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these allow capturing data streams without additional infrastructure and storing them directly in the raw bucket, ready to be processed by Airflow or other services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Lambda&lt;/strong&gt;&lt;br&gt;
This service allows executing serverless functions without provisioning servers, which makes it ideal for small tasks and specific events within the pipeline.&lt;br&gt;
In this context, it is used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight pre-validation or normalization processes before sending files to S3.&lt;/li&gt;
&lt;li&gt;Moving or restructuring files when new data arrives.&lt;/li&gt;
&lt;li&gt;Automatic triggers when new objects are detected in S3 (for example, activating notifications).&lt;/li&gt;
&lt;/ul&gt;
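&lt;p&gt;An S3-triggered function of the kind described above can be sketched as follows. The handler only parses the event; in a real deployment, the body would use boto3 to move files or emit notifications (bucket and key names here are illustrative):&lt;/p&gt;

```python
# Minimal sketch of an S3-triggered Lambda handler (illustrative names).
# It extracts the bucket/key of each new object from the S3 event payload;
# real validation or file moves would call boto3 from inside the loop.

def lambda_handler(event, context):
    """React to new objects landing in the raw bucket."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # e.g. trigger a notification or restructure the file here
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```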


&lt;h4&gt;
  
  
  (3) 🗂️ RAW Storage
&lt;/h4&gt;

&lt;p&gt;Once extracted, all data will be stored initially in Amazon S3, which will act as the RAW zone of the data lake. This layer preserves the data in its original format, without transformations, to guarantee traceability, auditing and reprocessing capability.&lt;br&gt;
Storage in S3 must be complemented with a set of key practices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM + S3 Bucket Policies ensure role-based access.&lt;/li&gt;
&lt;li&gt;Tags help automate governance and classification.&lt;/li&gt;
&lt;li&gt;Lake Formation adds granular control at table/column level.&lt;/li&gt;
&lt;li&gt;Lifecycle policies ensure retention and cost efficiency.&lt;/li&gt;
&lt;/ul&gt;
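&lt;p&gt;One way to support the traceability and lifecycle practices listed above is a partitioned key layout for the RAW zone. The convention below is an assumption of mine, not something prescribed by OMOP or AWS:&lt;/p&gt;

```python
# Sketch of a partitioned RAW-zone key layout (assumed naming convention):
# source/entity/date partitions keep raw objects traceable, easy to audit,
# and simple to target with S3 lifecycle policies.

from datetime import date

def raw_key(source: str, entity: str, ingestion_date: date, filename: str) -> str:
    """Build a Hive-style partitioned key for the raw layer."""
    return (
        f"raw/{source}/{entity}/"
        f"year={ingestion_date.year}/month={ingestion_date.month:02d}/"
        f"day={ingestion_date.day:02d}/{filename}"
    )
```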


&lt;h4&gt;
  
  
  (4) 📌 Orchestration
&lt;/h4&gt;

&lt;p&gt;In this section we describe the key DAGs needed to coordinate the different stages of the pipeline. Orchestration is essential to ensure that extractions, transformations and loads are executed in a consistent, auditable and scalable way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7omjg590d2uv4922bgnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7omjg590d2uv4922bgnp.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  (5) 🧠 AI &amp;amp; Unstructured Data
&lt;/h3&gt;

&lt;p&gt;To process clinical notes and other unstructured data, we need to incorporate NLP techniques that extract entities, map clinical concepts and automatically encode information.&lt;br&gt;
For this type of processing, we can rely on the following AWS services:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon SageMaker&lt;/strong&gt;&lt;br&gt;
Allows training, tuning and deploying custom NLP models, from classic models to advanced transformer-based ones. It is ideal when full control of the ML pipeline, preprocessing, fine-tuning and integration with other system components is needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Comprehend Medical&lt;/strong&gt;&lt;br&gt;
Managed service that extracts clinical entities, relationships and conditions directly from medical text.&lt;br&gt;
Important: Comprehend Medical supports a limited set of languages, so it is necessary to validate documentation before integrating it into the project.&lt;/p&gt;
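&lt;p&gt;A small sketch of how the output of &lt;code&gt;detect_entities_v2&lt;/code&gt; can be filtered by category and confidence; the API call itself is commented because it needs AWS credentials, and the sample response only mirrors the documented output shape:&lt;/p&gt;

```python
# The detect_entities_v2 call requires AWS credentials, so it is shown commented:
# import boto3
# client = boto3.client("comprehendmedical")
# response = client.detect_entities_v2(Text=clinical_note)

def filter_entities(response, category, min_score=0.8):
    """Keep high-confidence entities of one category (e.g. MEDICATION)."""
    return [
        e["Text"]
        for e in response["Entities"]
        if e["Category"] == category and e["Score"] >= min_score
    ]

# Sample response trimmed to the fields used above.
sample_response = {
    "Entities": [
        {"Text": "ibuprofen", "Category": "MEDICATION", "Score": 0.99},
        {"Text": "headache", "Category": "MEDICAL_CONDITION", "Score": 0.97},
        {"Text": "daily", "Category": "MEDICATION", "Score": 0.42},
    ]
}
```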

&lt;blockquote&gt;
&lt;p&gt;In the following article you can find a complete implementation of a batch process using this service&lt;br&gt;


&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/r_elena_mendez_escobar/employing-aws-comprehend-medical-for-medical-data-extraction-in-healthcare-analytics-2dd8" class="crayons-story__hidden-navigation-link"&gt;Employing AWS Comprehend Medical for Medical Data Extraction in Healthcare Analytics&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/r_elena_mendez_escobar" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F719582%2F2d700dae-2335-4c2f-9a32-4435184a4f4f.jpeg" alt="r_elena_mendez_escobar profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/r_elena_mendez_escobar" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Romina Elena Mendez Escobar
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Romina Elena Mendez Escobar
                
              
              &lt;div id="story-author-preview-content-1947809" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/r_elena_mendez_escobar" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F719582%2F2d700dae-2335-4c2f-9a32-4435184a4f4f.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Romina Elena Mendez Escobar&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/r_elena_mendez_escobar/employing-aws-comprehend-medical-for-medical-data-extraction-in-healthcare-analytics-2dd8" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Aug 7 '24&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/r_elena_mendez_escobar/employing-aws-comprehend-medical-for-medical-data-extraction-in-healthcare-analytics-2dd8" id="article-link-1947809"&gt;
          Employing AWS Comprehend Medical for Medical Data Extraction in Healthcare Analytics
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/aws"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;aws&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/nlp"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;nlp&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/r_elena_mendez_escobar/employing-aws-comprehend-medical-for-medical-data-extraction-in-healthcare-analytics-2dd8" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;4&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/r_elena_mendez_escobar/employing-aws-comprehend-medical-for-medical-data-extraction-in-healthcare-analytics-2dd8#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            13 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;





&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Amazon Bedrock integrated with SageMaker&lt;/strong&gt;&lt;br&gt;
Although Bedrock is a separate service, it can be integrated into ML flows in SageMaker. Its main contribution is enabling foundation models and generative AI capabilities, opening the door to new use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic classification of clinical text.&lt;/li&gt;
&lt;li&gt;Concept normalization assisted by generative models.&lt;/li&gt;
&lt;li&gt;Semantic searches and context retrieval through vector databases (for example, to enrich mapping results or suggest probable clinical codes).&lt;/li&gt;
&lt;/ul&gt;
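&lt;p&gt;For the classification use case, a minimal sketch of prompt construction and reply parsing is shown below; the label set and the model ID in the commented invocation are assumptions, not part of the original design:&lt;/p&gt;

```python
# Candidate labels are illustrative; a real deployment would align them with
# the clinical document types used in the OMOP mapping.
LABELS = ["discharge summary", "radiology report", "progress note"]

def build_classification_prompt(note: str) -> str:
    """Build a constrained prompt so the model answers with one known label."""
    options = ", ".join(LABELS)
    return (
        "Classify the following clinical note as exactly one of: "
        f"{options}.\nAnswer with the label only.\n\nNote:\n{note}"
    )

def parse_label(model_reply: str) -> str:
    """Map a free-text reply back onto the closed label set."""
    reply = model_reply.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "unknown"

# Invoking the model (requires boto3 credentials; the model ID is an assumption):
# import boto3, json
# runtime = boto3.client("bedrock-runtime")
# body = json.dumps({"inputText": build_classification_prompt(note)})
# reply = runtime.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
```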




&lt;h2&gt;
  
  
  &lt;strong&gt;(6) 🩺 OMOP CDM&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;All processing stages converge in the implementation of the &lt;strong&gt;OMOP Common Data Model (CDM)&lt;/strong&gt;, stored in a relational database optimized for analytical and mixed workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Aurora PostgreSQL&lt;/strong&gt;&lt;br&gt;
The recommended engine for hosting the CDM is &lt;strong&gt;Amazon Aurora PostgreSQL&lt;/strong&gt;, because it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains full SQL compatibility and supports OHDSI ecosystem tools.&lt;/li&gt;
&lt;li&gt;Provides high availability, automatic replication, and fast recovery.&lt;/li&gt;
&lt;li&gt;Scales horizontally with read replicas, ideal for analytical and concurrent workloads.&lt;/li&gt;
&lt;li&gt;Integrates seamlessly with ETL/ELT pipelines across AWS services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on the use case, Aurora can be complemented with additional analytics-oriented services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt;&lt;br&gt;
For advanced analytics over large datasets derived from the CDM, &lt;strong&gt;Amazon Redshift&lt;/strong&gt; offers a distributed, high-performance environment for complex analytical queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Athena&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Amazon Athena&lt;/strong&gt; enables querying raw data stored in S3 without loading it into a database. It is especially useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick validations before loading data into the CDM.&lt;/li&gt;
&lt;li&gt;Debugging and data quality checks using SQL.&lt;/li&gt;
&lt;li&gt;Exploring semi-structured files (CSV, JSON, Parquet).&lt;/li&gt;
&lt;/ul&gt;
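&lt;p&gt;A quick validation of raw files can be expressed as a generated SQL query submitted through Athena; the database, table and column names below are illustrative:&lt;/p&gt;

```python
def null_check_query(database: str, table: str, columns) -> str:
    """Build a quick Athena validation query counting NULLs per key column."""
    checks = ",\n  ".join(
        f"SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END) AS {c}_nulls"
        for c in columns
    )
    return f"SELECT COUNT(*) AS total_rows,\n  {checks}\nFROM {database}.{table}"

# Submitting it against S3 data (requires boto3, a Glue table and a result bucket):
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=null_check_query("raw_db", "encounters", ["person_id", "visit_date"]),
#     ResultConfiguration={"OutputLocation": "s3://my-athena-results/"})
```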

&lt;p&gt;&lt;strong&gt;Amazon ElastiCache&lt;/strong&gt;&lt;br&gt;
When the solution requires high-frequency or computationally expensive queries on the OMOP model, adding a cache layer with &lt;strong&gt;Redis&lt;/strong&gt; or &lt;strong&gt;Memcached&lt;/strong&gt; helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce latency for repeated queries.&lt;/li&gt;
&lt;li&gt;Store results of heavy computations (e.g., cohort definitions, vocabulary lookups).&lt;/li&gt;
&lt;li&gt;Improve performance for dashboards and clinical applications that require fast responses.&lt;/li&gt;
&lt;/ul&gt;
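&lt;p&gt;The cache-aside pattern behind this layer can be sketched as follows; a plain dict stands in for the Redis client (in production it would be &lt;code&gt;redis.Redis&lt;/code&gt;, with &lt;code&gt;setex&lt;/code&gt; attaching a TTL to each entry):&lt;/p&gt;

```python
import json

class CohortCache:
    """Cache-aside lookup for expensive OMOP queries.

    The `store` argument is a plain dict here for illustration; in production
    it would be a Redis client, with setex used to attach a TTL to each entry.
    """

    def __init__(self, store, compute):
        self.store = store        # stand-in for ElastiCache
        self.compute = compute    # expensive query against Aurora/Redshift
        self.misses = 0

    def get(self, cohort_id: str):
        key = f"cohort:{cohort_id}"
        cached = self.store.get(key)
        if cached is not None:
            return json.loads(cached)       # cache hit: skip the database
        self.misses += 1
        result = self.compute(cohort_id)    # cache miss: run the heavy query
        self.store[key] = json.dumps(result)
        return result
```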




&lt;h3&gt;
  
  
  (7) 📊 Data Visualization
&lt;/h3&gt;

&lt;p&gt;Data visualization is essential not only to consume information but also to analyze, monitor and validate each stage of the pipeline. As we process clinical data, vocabularies, transformations and AI results, we need tools that make the quality, behavior and evolution of the data evident.&lt;/p&gt;

&lt;p&gt;Below are various options depending on the use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon QuickSight&lt;/strong&gt;: It enables fast, interactive dashboards connected to Aurora, Redshift, Athena or S3. Its in-memory SPICE engine accelerates visualizations at scale while reducing load on source databases, making it ideal for data quality tracking and clinical monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon SageMaker Model Dashboard&lt;/strong&gt;: The SageMaker Model Dashboard centralizes observability for ML workflows, displaying metrics such as precision, recall and F1-score, along with model versions, drift indicators and execution history. This makes it easier to detect degradation early and maintain reliable NLP or predictive models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Fargate / Amazon EKS&lt;/strong&gt;: When fully custom dashboards are required—such as advanced visualizations, semantic comparisons or interactive analytics—Fargate and EKS provide the compute layer to run applications built with tools like Plotly, Dash, Streamlit or React-based libraries. This allows teams to create fully customized visualization applications tailored to their clinical use cases.&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  (8) 🧭 Data Governance
&lt;/h4&gt;

&lt;p&gt;Data governance is critical when working with sensitive health information, ensuring that data remains cataloged, documented and protected throughout every stage of the pipeline. &lt;strong&gt;A strong governance layer enforces access policies&lt;/strong&gt;, allowing only authorized users to interact with clinical datasets under strict regulatory requirements. &lt;strong&gt;It also guarantees full traceability&lt;/strong&gt;, enabling auditing of how data is accessed, transformed and shared across environments. &lt;strong&gt;Finally, governance provides controlled discoverability&lt;/strong&gt;, ensuring that curated datasets can be safely searched and consumed while maintaining consistent metadata.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrq87wstyvwunkxfbnls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsrq87wstyvwunkxfbnls.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Lake Formation&lt;/strong&gt;&lt;br&gt;
AWS Lake Formation centralizes governance for data stored in S3, offering fine-grained permissions at the table, column or row level, enforcing traceability and integrating tightly with the Glue Data Catalog to maintain consistent metadata.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon DataZone&lt;/strong&gt;&lt;br&gt;
Amazon DataZone supports the organized publication and controlled sharing of datasets across the organization, enabling teams to work within structured data domains—such as Clinical, NLP, OMOP or Research—while unifying cataloging, governance and collaboration in one environment.&lt;/p&gt;




&lt;h3&gt;
  
  
  (9) 🔐 Security and Networking
&lt;/h3&gt;

&lt;p&gt;Security and connectivity are fundamental pillars in any health data architecture, especially to comply with regulations such as HIPAA. In AWS, there are multiple services that protect both data and infrastructure. Below we describe the main components and their role within our OMOP CDM architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1580554hlnuonqvqp8vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1580554hlnuonqvqp8vd.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;(10) 🎚️ Monitoring and Billing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Monitoring and cost control are essential in health data architectures, especially when processing large clinical datasets or running AI workloads where training and inference can be resource-intensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Monitoring&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;AWS CloudWatch&lt;/strong&gt; provides centralized metrics, logs and events from all AWS services, enabling teams to track infrastructure health, Airflow DAG execution and the behavior of ETL/ELT pipelines while receiving alerts for anomalies. For deeper inspection, &lt;strong&gt;AWS X-Ray&lt;/strong&gt; traces requests across distributed systems—such as containerized services on &lt;strong&gt;ECS/EKS&lt;/strong&gt; or &lt;strong&gt;APIs&lt;/strong&gt; that expose OMOP data—making it easier to detect bottlenecks and debug complex data flows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧾 Billing&lt;/strong&gt;&lt;br&gt;
To maintain financial visibility and prevent cost overruns, &lt;strong&gt;AWS Cost Explorer&lt;/strong&gt; offers detailed insights into usage patterns across services, including AI and data-intensive components. Complementing this, &lt;strong&gt;AWS Budgets&lt;/strong&gt; allows setting custom spending limits and automated alerts, ensuring that project costs remain predictable and aligned with operational goals.&lt;/p&gt;




&lt;h2&gt;
  
  
&lt;strong&gt;(11) 🧱 Code &amp;amp; Deployment&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Managing code and deploying infrastructure is essential to guarantee reproducibility, traceability and security in cloud-based health projects. This includes not only provisioning resources, but also maintaining reliable pipelines, consistent environments and well-governed ML assets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔧 Infrastructure as Code&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Terraform&lt;/code&gt; allows defining the entire AWS architecture in a declarative way, ensuring that environments remain consistent and reproducible across development, staging and production. It supports provisioning core components such as S3 buckets, VPCs, databases and IAM roles while enforcing infrastructure governance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🗂️ Versioning &amp;amp; CI/CD&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;GitHub&lt;/code&gt; serves as the central platform for code collaboration, offering pull requests, reviews and issue management. With &lt;code&gt;GitHub Advanced Security&lt;/code&gt;, teams &lt;strong&gt;can catch vulnerabilities&lt;/strong&gt; early through dependency scanning and code analysis. &lt;br&gt;
&lt;code&gt;GitHub Actions&lt;/code&gt; complements this by automating &lt;strong&gt;CI/CD pipelines&lt;/strong&gt;: building containers, validating data quality, deploying Airflow DAGs or updating infrastructure definitions, ensuring that each change is tested and safely promoted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🏷️ Models &amp;amp; Containers&lt;/strong&gt;&lt;br&gt;
For containerized workloads, &lt;code&gt;Amazon ECR&lt;/code&gt; provides a secure and scalable registry for images used in &lt;code&gt;ECS&lt;/code&gt;, &lt;code&gt;EKS&lt;/code&gt; or &lt;code&gt;Fargate&lt;/code&gt;, ensuring consistency across environments. In parallel, the &lt;code&gt;Amazon SageMaker Model Registry&lt;/code&gt; manages &lt;strong&gt;ML model versions&lt;/strong&gt;, capturing lineage, approvals and metadata so that each model deployed into production remains auditable and reproducible.&lt;/p&gt;




&lt;h3&gt;
  
  
  (12) 🚀 AI Consumption
&lt;/h3&gt;

&lt;p&gt;Once the data is standardized and loaded into the OMOP CDM, it becomes the foundation for advanced analytics, AI-driven insights and secure data consumption. This unlocks opportunities for clinical research, decision support and the development of intelligent health applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;☁️ Data Consumption through APIs&lt;/strong&gt;&lt;br&gt;
Standardized OMOP data can be exposed through secure API layers, enabling internal and external systems to retrieve curated clinical information. Services such as Amazon API Gateway combined with AWS Lambda provide scalable, low-latency endpoints that support both real-time and batch consumption.&lt;/p&gt;
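&lt;p&gt;A minimal sketch of the Lambda side of such an endpoint is shown below; the in-memory lookup is a stand-in for a query against the Aurora-hosted CDM, and the route shape assumes an API Gateway proxy integration:&lt;/p&gt;

```python
import json

# In a real deployment this lookup would query the Aurora-hosted OMOP CDM;
# the in-memory dict keeps the sketch self-contained.
PERSONS = {
    "1001": {"person_id": 1001, "gender_concept_id": 8532, "year_of_birth": 1975},
}

def lambda_handler(event, context):
    """Handle GET /persons/{person_id} proxied through API Gateway."""
    person_id = event.get("pathParameters", {}).get("person_id")
    person = PERSONS.get(person_id)
    if person is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(person),
    }
```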

&lt;p&gt;&lt;strong&gt;📊 Advanced Analysis and Machine Learning&lt;/strong&gt;&lt;br&gt;
Amazon SageMaker enables training, evaluating and deploying Machine Learning models directly on top of OMOP data. This supports use cases such as predicting clinical risks, classifying patients by comorbidities or analyzing treatment response patterns, all while integrating seamlessly with the existing data pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧩 Vector Search with Aurora and pgvector&lt;/strong&gt;&lt;br&gt;
By storing patient feature vectors in Aurora PostgreSQL using pgvector, the system can perform semantic similarity searches between patients or clinical cases. This capability enhances cohort discovery and enables personalized recommendation workflows.&lt;/p&gt;
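&lt;p&gt;pgvector’s cosine distance operator computes 1 - cos(a, b); the sketch below reproduces that value locally, with the equivalent SQL (table and column names are assumptions) shown as a comment:&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """Cosine distance as used by pgvector's cosine operator: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Equivalent similarity search in Aurora PostgreSQL with pgvector (table and
# column names are assumptions):
#   SELECT person_id
#   FROM patient_embeddings
#   ORDER BY embedding <=> '[0.1, 0.9, 0.3]'
#   LIMIT 10;
```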

&lt;p&gt;&lt;strong&gt;🧠 Generative AI with Amazon Bedrock&lt;/strong&gt;&lt;br&gt;
Amazon Bedrock provides access to foundation models that can summarize clinical notes, extract information from unstructured text or augment concept mapping processes, expanding analytical depth through generative AI.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Researchers can query patients with similar disease profiles using pgvector, deploy readmission prediction models in SageMaker or generate automated insights from clinical notes using Bedrock-powered NLP.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  📚 Conclusions
&lt;/h1&gt;

&lt;p&gt;This guide presents a compact proposal for implementing OMOP CDM on AWS, showing how its services can support secure, scalable and efficient clinical data processing. The architecture is flexible and can be adapted to different project needs.&lt;/p&gt;

&lt;p&gt;AWS provides an ecosystem that covers the entire data lifecycle, allowing integration with open-source tools and containerized workloads while maintaining control over performance and costs. This balance is especially important in health and AI-driven environments.&lt;/p&gt;

&lt;p&gt;Building on strong governance and security practices, the proposed approach demonstrates that AWS enables compliant and reliable data workflows. With the right configuration, clinical data can be transformed into meaningful insights for research, analytics and innovation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>cloud</category>
      <category>architecture</category>
    </item>
    <item>
      <title>AWS re:Invent 2025: Updates in Infrastructure, Security, and Compute + Learning Path Summary</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 08 Dec 2025 09:52:24 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-reinvent-2025-updates-in-infrastructure-security-and-compute-learning-path-summary-3i72</link>
      <guid>https://dev.to/aws-builders/aws-reinvent-2025-updates-in-infrastructure-security-and-compute-learning-path-summary-3i72</guid>
      <description>&lt;h1&gt;
  
  
  📖 Introduction
&lt;/h1&gt;

&lt;p&gt;At &lt;code&gt;re:Invent 2025&lt;/code&gt;, AWS placed &lt;strong&gt;Generative AI&lt;/strong&gt; at the center, moving from simple chats to agents that understand context, execute tasks, and integrate natively with infrastructure, security, and data services. As part of this push, AWS launched a learning path in &lt;a href="https://skillbuilder.aws/" rel="noopener noreferrer"&gt;Skill Builder&lt;/a&gt; with &lt;strong&gt;33 courses&lt;/strong&gt; and more than 60 hours of content covering these new services, from fundamental to advanced level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kogvjso94nros4kahdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kogvjso94nros4kahdo.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  🔍 Why is this re:Invent a turning point?
&lt;/h1&gt;

&lt;p&gt;The big novelty this year is how generative AI stops being an isolated component and becomes a central engine that drives automation, security, infrastructure, and operations. We are entering a stage where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;Agents&lt;/strong&gt; not only process language: they execute real actions in AWS.&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;IaC&lt;/strong&gt; automation is complemented by intelligent flows that detect, decide, and act.&lt;/li&gt;
&lt;li&gt;🔓 &lt;strong&gt;Security&lt;/strong&gt; is transformed thanks to the ability to analyze large volumes of logs in seconds, where every minute is critical.&lt;/li&gt;
&lt;li&gt;🗂️ &lt;strong&gt;Data engineering&lt;/strong&gt; and &lt;strong&gt;observability&lt;/strong&gt; are rewritten with agents that contextualize, correlate, and recommend.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To support this technological leap, AWS launched new services (some very recent) and updated others, which motivated the design of an integrated learning path to learn them in a structured way.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛠️ Learning path details
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;33 total courses and more than 60 hours of content.&lt;/li&gt;
&lt;li&gt;26 fundamental-level courses, 4 intermediate, and 3 advanced, combining updates of existing services with completely new launches.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  📘 Service Overviews &amp;amp; Course Levels
&lt;/h1&gt;

&lt;p&gt;The learning path organizes 33 courses by technical depth to help learners navigate new AWS services efficiently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Link&lt;/strong&gt; 👉 &lt;a href="https://skillbuilder.aws/learning-plan/JZQY2Z8DG4/aws-reinvent-2025-announcements-learning-plan/VWQU3VK65K" rel="noopener noreferrer"&gt;https://skillbuilder.aws/learning-plan/JZQY2Z8DG4/aws-reinvent-2025-announcements-learning-plan/VWQU3VK65K&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1jgiuw770usz5nesvs6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz1jgiuw770usz5nesvs6.png" alt=" " width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Course Levels:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🟢 &lt;strong&gt;Beginner&lt;/strong&gt; (&lt;code&gt;26 courses&lt;/code&gt;): Introduces core services and fundamental concepts.&lt;/li&gt;
&lt;li&gt;🟡 &lt;strong&gt;Intermediate&lt;/strong&gt; (&lt;code&gt;4 courses&lt;/code&gt;): Covers integration, automation, and real-world deployments.&lt;/li&gt;
&lt;li&gt;🔴 &lt;strong&gt;Advanced&lt;/strong&gt; (&lt;code&gt;3 courses&lt;/code&gt;): Focuses on autonomous agents, high-performance compute, and advanced security.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Kiro
&lt;/h3&gt;

&lt;p&gt;It is a development environment (IDE) with AI agents that start from a written specification and generate code, tests, and documentation, helping to design and maintain applications more quickly and consistently.&lt;br&gt;
⏱ 3:30 hours | 📚 3 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Kiro Getting Started&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Kiro powers (Update)&lt;/li&gt;
&lt;li&gt;🟡 Spec-Driven Development with Kiro&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon Nova 2
&lt;/h3&gt;

&lt;p&gt;It is a family of multimodal generative AI models (text, image, audio, video) designed for advanced reasoning, conversational assistants, and content generation in enterprise applications.&lt;br&gt;
⏱ 04:15 hours | 📚 4 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Amazon Nova 2: Understanding Models (New)&lt;/li&gt;
&lt;li&gt;🟢 Amazon Nova 2 Sonic: Next-Generation Conversational AI (Update)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Amazon Nova Forge (New)&lt;/li&gt;
&lt;li&gt;🟡 Extended Thinking with Amazon Nova (Update)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon Quick Suite
&lt;/h3&gt;

&lt;p&gt;An integrated analytics and business intelligence platform powered by generative AI that unifies agents for research, data visualization, and workflow automation, accessible via chat and embedded in tools like browser, Slack, or Office.&lt;br&gt;
⏱ 03:10 hours | 📚 3 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to Amazon Quick Suite&lt;/li&gt;
&lt;li&gt;🟢 Getting Started with Administering Amazon Quick Suite&lt;/li&gt;
&lt;li&gt;🟡 Amazon Quick Automate – Building Intelligent Workflows (Update)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  AWS DevOps Agent
&lt;/h3&gt;

&lt;p&gt;AI agent for operations that analyzes events and metrics, automates incident response, assists with root cause analysis, and suggests preventive actions to improve reliability.&lt;br&gt;
⏱ 1:00 hour | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to AWS DevOps Agent (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  AWS AI Factories
&lt;/h3&gt;

&lt;p&gt;It is a dedicated AI infrastructure solution deployed in the customer’s data center, with specialized hardware to train and run models while maintaining data sovereignty.&lt;br&gt;
⏱ 30 minutes | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to AWS AI Factories (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon SageMaker
&lt;/h3&gt;

&lt;p&gt;It is AWS’s managed machine learning platform that offers notebooks, data preparation tools, model training, and model deployment, now with more serverless options and a focus on foundation models. In this latest update, it includes a set of “SageMaker AI” capabilities such as serverless notebooks, simplified customization of foundation models, and elastic training with HyperPod to scale without managing infrastructure.&lt;br&gt;
⏱ 03:30 hours | 📚 4 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to Amazon SageMaker Notebooks (Update)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Model Customization in Amazon SageMaker AI (Update)&lt;/li&gt;
&lt;li&gt;🔴 Elastic Training on Amazon SageMaker HyperPod (New)&lt;/li&gt;
&lt;li&gt;🔴 Checkpointless Training on Amazon SageMaker HyperPod (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  AWS Security Agent
&lt;/h3&gt;

&lt;p&gt;Security agent that reviews from code to production environment, automates configuration assessments and penetration tests, and generates recommendations to reduce risk throughout the development lifecycle.&lt;br&gt;
⏱ 30 minutes | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to AWS Security Agent (Tech Preview) (Update)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon Bedrock
&lt;/h3&gt;

&lt;p&gt;It is the service that allows building and operating AI agents based on foundation models, with security controls, continuous evaluation, and policies to govern their behavior.&lt;br&gt;
⏱ 02:10 hours | 📚 2 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 AgentCore Evaluation on Amazon Bedrock (New)&lt;/li&gt;
&lt;li&gt;🟢 AgentCore Policy on Amazon Bedrock (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon EC2
&lt;/h3&gt;

&lt;p&gt;Amazon EC2 adds new instance types with next-generation GPUs designed to train and serve large AI models with high performance. These instances are optimized for frontier model training, combining the new GPUs with network and storage improvements to deliver several times the performance of previous generations.&lt;br&gt;
⏱ 02:30 hours | 📚 5 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Introduction to Amazon EC2 P6e-GB300 UltraServers (Update)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Capacity Manager for Amazon EC2 (Update)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Amazon EC2 Instance Attestation (New)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Amazon EC2 P6-B300 Instances (New)&lt;/li&gt;
&lt;li&gt;🟢 Introduction to Capacity Manager for Amazon EC2 (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon S3 Vectors
&lt;/h3&gt;

&lt;p&gt;It is an S3 capability to store vectors (embeddings) and perform semantic and similarity searches on documents, images, or other objects.&lt;br&gt;
⏱ 1:00 hour | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Amazon S3 Vectors Getting Started (Update)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon FSx for NetApp ONTAP
&lt;/h3&gt;

&lt;p&gt;Fully managed service that provides ONTAP file systems with enterprise features (snapshots, clones, replication) and the elasticity and pay-as-you-go model of AWS cloud.&lt;br&gt;
⏱ 1:15 hours | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 Amazon FSx for NetApp ONTAP Primer (Update)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Amazon Aurora PostgreSQL
&lt;/h3&gt;

&lt;p&gt;It is a relational database compatible with PostgreSQL that adds policies to hide or transform sensitive data. This new functionality allows defining dynamic masking policies so that sensitive data is displayed differently depending on the user’s role, reinforcing access control at column and row level.&lt;br&gt;
⏱ 1:30 hours | 📚 1 course&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔴 Dynamic Data Masking in Aurora PostgreSQL (New)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Transform
&lt;/h3&gt;

&lt;p&gt;It is a suite of AI-powered tools for modernizing .NET applications, full-stack Windows workloads, and custom code, automating analysis, refactoring, and migration to accelerate the move from legacy systems to cloud-native architectures.&lt;br&gt;
⏱ 3:00 hours | 📚 3 courses&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 AWS Transform for .NET Getting Started (Update)&lt;/li&gt;
&lt;li&gt;🟢 AWS Transform Custom (New)&lt;/li&gt;
&lt;li&gt;🟢 AWS Transform Full-Stack Windows (New)&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  🚀 Conclusion: AI as the Engine of the Cloud Ecosystem
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;AWS re:Invent 2025&lt;/strong&gt; marks a decisive turning point: &lt;code&gt;Generative AI&lt;/code&gt; has moved beyond being an isolated tool to become the central engine that drives the transformation of the cloud ecosystem.&lt;/p&gt;

&lt;p&gt;This learning path of &lt;strong&gt;33 courses&lt;/strong&gt; is not just a set of trainings but a strategic roadmap showing how &lt;code&gt;infrastructure&lt;/code&gt;, &lt;code&gt;security&lt;/code&gt;, and &lt;code&gt;operations&lt;/code&gt; converge with &lt;code&gt;AI&lt;/code&gt; to enable a new generation of solutions.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;incorporation of agents&lt;/strong&gt;, along with the evolution of compute and security improvements, is creating environments that are much more autonomous, efficient, and prepared for new use cases.&lt;/p&gt;

&lt;p&gt;Specialized infrastructure plays a key role: AWS AI Factories ensure data sovereignty in regulated industries, while the new EC2 instances optimized for AI increase performance for model training and deployment at scale. Across this set of updates, it is clear that foundation models are becoming more powerful and now underpin decision-making, intelligent automation, and the creation of AI-powered products, generating real competitive advantage for organizations that can combine AI + infrastructure + security as a single strategy.&lt;/p&gt;

&lt;p&gt;Therefore, this learning path is the ideal starting point to learn the new features, prepare your skills, and put them into practice in your next project within the AWS ecosystem.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>From Search to Story: Using Gemini API to Automate Brand Content Analysis with Python</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 20 Oct 2025 17:54:15 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/from-search-to-story-using-gemini-api-to-automate-brand-content-analysis-with-python-2i1a</link>
      <guid>https://dev.to/r_elena_mendez_escobar/from-search-to-story-using-gemini-api-to-automate-brand-content-analysis-with-python-2i1a</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This article was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiinwni9w2ltd5h22jjqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiinwni9w2ltd5h22jjqh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
   What is Brand Journalism?
&lt;/h1&gt;

&lt;p&gt;According to an article by &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;The New York Times Licensing Group&lt;/a&gt;, readers experience significant content fatigue: there are more than 1.8 billion websites and over 70 million blogs published each month.&lt;/p&gt;

&lt;p&gt;Brand Journalism is a communication strategy where brands adopt journalistic techniques to tell relevant and engaging stories. Instead of direct advertising messages, content is created with a narrative, informative, and value-added approach, similar to traditional media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cr9w8fpz7gddlzqu3tq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cr9w8fpz7gddlzqu3tq.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Journalistic techniques:&lt;/strong&gt; Application of rigorous journalistic methods to create credible and well-structured content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience interests:&lt;/strong&gt; Focus on the real interests of the audience, not just the messages the brand wants to convey.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality and useful information&lt;/strong&gt;: Content that educates, informs, or solves concrete problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use of different formats:&lt;/strong&gt; Variety of formats (reports, interviews, analyses, infographics, videos) to maintain engagement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling:&lt;/strong&gt; Narratives that connect emotionally with values, experiences, and social impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1905tnjax72ha9hxju1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1905tnjax72ha9hxju1r.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;p&gt;The benefits we can identify based on this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand Positioning:&lt;/strong&gt; Establish yourself as a thought leader in your industry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience Loyalty:&lt;/strong&gt; Build authentic and lasting relationships with your audience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Differentiation against the Competition:&lt;/strong&gt; Stand out from competitors through higher-quality editorial content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greater Organic Reach:&lt;/strong&gt; Valuable content is naturally shared, amplifying reach without direct advertising investment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vyxb7bqplqzjr1marfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vyxb7bqplqzjr1marfn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  What is Generative AI?
&lt;/h1&gt;

&lt;p&gt;Generative AI is a branch of artificial intelligence focused on creating new and original content: text, images, audio, video, or synthetic data. Its development has been possible thanks to deep learning, especially through advanced architectures such as transformers, which process information in parallel and capture complex relationships in large data volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources on GenAI
&lt;/h2&gt;

&lt;p&gt;I have written a series of articles on the fundamentals of generative AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmxi7r8nyuejaornk27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmxi7r8nyuejaornk27.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-1-prompt-basics-from-theory-to-practice-1a5"&gt;GenAI Foundations – Chapter 1: Prompt Basics: From Theory to Practice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-2-prompt-engineering-in-action-unlocking-better-ai-responses-l28"&gt;GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-3-rag-patterns-and-best-practices-cpc"&gt;GenAI Foundations – Chapter 3: RAG Patterns and Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-4-model-customization-evaluation-can-we-trust-the-outputs-i21"&gt;GenAI Foundations – Chapter 4: Model Customization &amp;amp; Evaluation – Can We Trust the Outputs?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73"&gt;GenAI Foundations – Chapter 5: Project Planning with the Generative AI Canvas&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Gemini
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; is a family of multimodal AI models developed by Google DeepMind. It integrates into multiple Google products and can process text, images, and other data types simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grounding with Google Search
&lt;/h3&gt;

&lt;p&gt;For this use case, we will use the Grounding with Google Search functionality, which connects the model directly to Google to perform searches and obtain up-to-date information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdazimmkpdxzdnybj6ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdazimmkpdxzdnybj6ej.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Advantages:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;📏Increased Accuracy:&lt;/strong&gt; Reduces model hallucinations by accessing verifiable information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡️Real-Time Information:&lt;/strong&gt; Access to current data, reducing uncertainty about the model's knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📚Citations and References:&lt;/strong&gt; Retrieves source links and provides control over consulted data sources.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Use Case
&lt;/h1&gt;

&lt;p&gt;Brand Journalism is a strategic tool for companies to communicate their values from an authentic perspective. However, we often need to find topics that might interest our target audience, so it is essential to search for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mentions of the company on different sites&lt;/li&gt;
&lt;li&gt;Reputation and notable aspects&lt;/li&gt;
&lt;li&gt;Trends and relevant conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This starting point helps those who write articles or create storytelling based not only on what the company wants to show but also on the external perspective others have of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example: 📱iPhone 17
&lt;/h2&gt;

&lt;p&gt;Using the latest iPhone launch as an example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for recently published articles&lt;/li&gt;
&lt;li&gt;Classify and analyze these documents&lt;/li&gt;
&lt;li&gt;Generate a report with visualizations, conclusions, and structured narratives&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Next, we will see how to implement this strategy through an automated workflow that integrates AI and data analysis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Implementation Process
&lt;/h2&gt;

&lt;p&gt;The following diagram illustrates how our automated analysis system works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptgtujbp39akqmt6szbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptgtujbp39akqmt6szbg.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Search with Google Search
&lt;/h3&gt;

&lt;p&gt;We use &lt;strong&gt;Grounding with Google Search&lt;/strong&gt; to find relevant articles and request output in JSON format using this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
   &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"full article title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"source_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"media name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"publication date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"article link"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"site_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"website name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-4 line summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive/negative/neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rumor/analysis/comparison/market/technical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1-10 score"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
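&lt;p&gt;Before analyzing the results, it is worth checking that each record the model returns actually contains every field of this structure. A minimal sketch in plain Python (the sample record below is hypothetical, not API output):&lt;/p&gt;

```python
# Minimal sketch: validating that a record returned by the model
# contains every field of the requested JSON structure.
# The sample record is hypothetical.
REQUIRED_FIELDS = {
    "title", "source_name", "date", "url", "site_name",
    "summary", "sentiment", "category", "sentiment_score",
}

def is_valid_article(record):
    """Return True when the record has all required fields."""
    return REQUIRED_FIELDS.issubset(record)

article = {
    "title": "iPhone 17 first impressions",
    "source_name": "TechSite",
    "date": "2025-09-20",
    "url": "https://example.com/article",
    "site_name": "example.com",
    "summary": "Early hands-on coverage of the launch.",
    "sentiment": "positive",
    "category": "analysis",
    "sentiment_score": "8",
}

print(is_valid_article(article))  # True for a complete record
```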

&lt;h3&gt;
  
  
  2️⃣ Storytelling Generation
&lt;/h3&gt;

&lt;p&gt;We use another prompt to generate different types of narratives based on the articles found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytical Insights:&lt;/strong&gt; Compact analytical summary with concrete data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling Narrative:&lt;/strong&gt; Engaging mini-narrative based on dataset evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tone Variants (A/B/C):&lt;/strong&gt; Three versions with different focuses: objective, emotional, and strategic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3️⃣ Report Creation
&lt;/h3&gt;

&lt;p&gt;We generate a PDF report including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charts created with Seaborn and Matplotlib&lt;/li&gt;
&lt;li&gt;Visual trend analyses&lt;/li&gt;
&lt;li&gt;Narrative conclusions based on generated storytelling&lt;/li&gt;
&lt;li&gt;A custom layout built with ReportLab&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Tutorial
&lt;/h1&gt;
&lt;h2&gt;
  
  
  How Does Gemini Work with Google Search?
&lt;/h2&gt;

&lt;p&gt;When performing a query, Gemini not only relies on its internal knowledge but also actively searches updated information on Google Search. This grounding capability allows the model to access real-time data, verify facts, and provide responses based on concrete sources, reducing hallucination risk and ensuring relevance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhe427glwz3m59rfbvto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhe427glwz3m59rfbvto.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Pre-requisite: Access to Gemini API
&lt;/h2&gt;

&lt;p&gt;Before starting, you need to get access to the Gemini API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an account in &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create or log in with your Google account&lt;/li&gt;
&lt;li&gt;Generate your API key from the control panel&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can use Gemini's free tier to test this project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm22ktvs6q2my515n4ai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm22ktvs6q2my515n4ai.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your API key, configure it in a .env file:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"your_gemini_api_key"&lt;/span&gt;
MODEL_ID &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gemini-2.5-flash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;We use Gemini 2.5 Flash because it is optimized for price-performance, making it a good fit for frequent, low-cost tasks like this one.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;

&lt;p&gt;To follow this tutorial, clone the following repository, which contains the complete code.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;
        brand-journalism-gemini
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Tutorial about Brand Journalism Code Using Google Gemini
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introduction&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This repository was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini/img/readme/1.google-search.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fbrand-journalism-gemini%2Fimg%2Freadme%2F1.google-search.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;# What is Brand Journalism
According to an…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;




&lt;blockquote&gt;
&lt;p&gt;If you find this tutorial helpful, feel free to leave a star ⭐️ and follow me to get notified about new articles. Your support helps me grow within the tech community and create more valuable content! 🚀&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;project/
   ├── img/                    &lt;span class="c"&gt;# Generated graphics&lt;/span&gt;
   ├── prompt/
   │   ├── prompt_search.txt       &lt;span class="c"&gt;# Search Prompt&lt;/span&gt;
   │   └── prompt_storytelling.txt &lt;span class="c"&gt;# Prompt for narrative&lt;/span&gt;
   ├── report/                &lt;span class="c"&gt;# PDFs generated&lt;/span&gt;
   ├── brand_journalist_analyzer.py
   ├── report_plots.py
   ├── report_analysis.py
   ├── main.py
   └── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Main Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1️⃣ Prompts (/prompt)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_search.txt&lt;/code&gt;&lt;/strong&gt;: Here we define how to perform the search in Google Search and structure the results in JSON. This prompt instructs the model to return structured information with fields such as the article's title, source, date, URL, summary, sentiment, and category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_storytelling.txt&lt;/code&gt;&lt;/strong&gt;: In this file, we define how to generate conclusions and storytelling based on the articles found. It requests different types of outputs, including objective analysis, immersive narratives, and three tone variants (objective, emotional, and strategic).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;2️⃣ Brand Journalism Analyzer&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🗒&lt;code&gt;brand_journalist_analyzer.py&lt;/code&gt;&lt;/strong&gt;: This class is the core of the application and handles all interaction with the Gemini API. It implements three main functionalities: news retrieval using Google Search, structured storytelling generation, and analytical insights extraction. 
The most important method is &lt;strong&gt;search_news()&lt;/strong&gt;, which executes real-time searches and returns structured data in JSON format. To use integrated Google Search, simply set &lt;code&gt;config={"tools": [{"google_search": {}}]}&lt;/code&gt; in the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search for news on a topic using Google Search.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}]}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process and clean JSON response
&lt;/span&gt;    &lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_clean_json_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
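&lt;p&gt;With &lt;code&gt;create_dataframe=True&lt;/code&gt;, the JSON list can be loaded into a pandas DataFrame for analysis. A minimal sketch with two hypothetical records following the schema requested earlier (a real run would use the API response instead):&lt;/p&gt;

```python
# Minimal sketch: turning the JSON list returned by search_news()
# into a pandas DataFrame. The records below are hypothetical samples
# that follow the requested schema, not real API output.
import pandas as pd

articles = [
    {"title": "iPhone 17 review", "source_name": "TechSite",
     "date": "2025-09-20", "url": "https://example.com/a",
     "site_name": "example.com", "summary": "A positive review.",
     "sentiment": "positive", "category": "analysis", "sentiment_score": "8"},
    {"title": "iPhone 17 rumors", "source_name": "NewsSite",
     "date": "2025-09-18", "url": "https://example.com/b",
     "site_name": "example.com", "summary": "Pre-launch speculation.",
     "sentiment": "neutral", "category": "rumor", "sentiment_score": "5"},
]

df = pd.DataFrame(articles)
# sentiment_score arrives as text in the schema, so cast it to numeric
df["sentiment_score"] = pd.to_numeric(df["sentiment_score"])

# Quick aggregate: average score per sentiment label
summary = df.groupby("sentiment")["sentiment_score"].mean()
print(summary)
```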






&lt;p&gt;&lt;strong&gt;3️⃣ Visualization Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_plots.py:&lt;/code&gt; This class creates all the report visualizations using Seaborn and Matplotlib. It generates three essential chart types: a bar chart showing which media outlets publish the most on the topic, a timeline visualizing the evolution of publications over time, and a heatmap that cross-references sentiment with content categories. 
All visual aspects are customizable: color palette, titles, axis labels, and save paths. The methods first prepare the data with Pandas aggregations and then generate the visualizations, automatically saving them as PNG files.&lt;/li&gt;
&lt;/ul&gt;
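&lt;p&gt;As one illustration, the sentiment-by-category heatmap can be sketched with Seaborn and Matplotlib as follows. This is a minimal version assuming a DataFrame with the schema from the search step (the sample data and output filename are hypothetical; the repository saves its charts under &lt;code&gt;img/&lt;/code&gt;):&lt;/p&gt;

```python
# Minimal sketch of the sentiment-by-category heatmap described above.
# The DataFrame contents are hypothetical sample data.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "sentiment": ["positive", "neutral", "positive", "negative"],
    "category": ["analysis", "rumor", "market", "analysis"],
})

# Cross-tabulate counts of sentiment vs. content category
pivot = pd.crosstab(df["sentiment"], df["category"])

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(pivot, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_title("Sentiment by category")
fig.tight_layout()
fig.savefig("sentiment_heatmap.png")  # saved as PNG, like the repo's charts
```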




&lt;p&gt;&lt;strong&gt;4️⃣ PDF Report Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_analysis.py&lt;/code&gt;: This class assembles the final report in professional PDF format using ReportLab. It combines multiple elements: a customizable logo, corporate-style headers, informative tables about the analyzed dataset, pre-generated visualizations, formatted narratives with full Markdown support (including headings, lists, code, and emphasis), and conclusions and storytelling sections with different tone variations. &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯Process Orchestration
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;main.py&lt;/code&gt; file constitutes the application's main entry point, orchestrating the entire Brand Journalism pipeline. This script coordinates the interaction between all the developed classes, managing the flow from real-time information retrieval to the generation of the final document, ensuring that each component executes in the correct order and with the necessary dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🐍main.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;brand_journalist_analyzer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrandJournalistAnalyzer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_analysis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ReportAnalysis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_plots&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataVisualizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Cargar variables de entorno
&lt;/span&gt;    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar ruta con timestamp
&lt;/span&gt;    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report/news_report_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Inicializar analizador
&lt;/span&gt;    &lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BrandJournalistAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Buscar o cargar noticias (usa caché si existe)
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_or_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;search_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar storytelling y conclusiones
&lt;/span&gt;    &lt;span class="n"&gt;storytelling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_storytelling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conclusion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conclusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Crear visualizaciones
&lt;/span&gt;    &lt;span class="n"&gt;visualizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataVisualizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_by_source&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_over_time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_sentiment_category_heatmap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar reporte PDF
&lt;/span&gt;    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ReportAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_report&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🗒 Report Generation
&lt;/h3&gt;

&lt;p&gt;The system automatically generates a professional PDF report using Seaborn/Matplotlib for visuals and ReportLab for document layout. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media coverage charts&lt;/li&gt;
&lt;li&gt;Temporal trends&lt;/li&gt;
&lt;li&gt;Heatmap crossing content categories with sentiment&lt;/li&gt;
&lt;li&gt;Structured storytelling and analytical conclusions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Report Structure
&lt;/h3&gt;

&lt;p&gt;In this use case, we generated a four-page PDF report that provides a comprehensive overview of the analysis, starting with complete details of the websites and media outlets where relevant news stories on the researched topic were found.&lt;/p&gt;

&lt;p&gt;The document includes graphical visualizations specifically designed to analyze temporal publishing trends, allowing for the identification of patterns of interest over time, as well as categorical classifications based on the criteria identified by the AI ​​model following the instructions defined in the search prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl8wktg9u3u4z7htzjok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl8wktg9u3u4z7htzjok.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final section of the report presents analytical conclusions based on quantitative data and storytelling narratives structured in different tones, providing multiple perspectives on the same information.&lt;/p&gt;




&lt;h1&gt;
  
  
  💡 Conclusions
&lt;/h1&gt;

&lt;p&gt;AI can be a powerful tool for optimizing research and analysis processes, but I still believe that authentic company communication requires the perspective, sensitivity, and values ​​that only people can provide.&lt;/p&gt;

&lt;p&gt;This tutorial offers an automated &lt;strong&gt;starting point&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects and structures scattered information&lt;/li&gt;
&lt;li&gt;Identifies patterns and trends in large data volumes&lt;/li&gt;
&lt;li&gt;Generates evidence-based insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, Brand Journalism work should remain in the hands of professionals who can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interpret data within the organizational context&lt;/li&gt;
&lt;li&gt;Align narratives with real corporate values&lt;/li&gt;
&lt;li&gt;Add nuances, experiences, and internal perspectives&lt;/li&gt;
&lt;li&gt;Ensure the message genuinely reflects brand identity&lt;/li&gt;
&lt;li&gt;Humanize content with empathy and authentic connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI&lt;/strong&gt; provides the knowledge foundation, but people create the true connection with the audience. Therefore, effective storytelling emerges from combining automated analysis with human narrative craftsmanship.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What Is Brand Journalism — and Why It Matters.&lt;br&gt;
The New York Times Licensing Group.&lt;/strong&gt;&lt;br&gt;
Retrieved from &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini About.&lt;/strong&gt;&lt;br&gt;
Google.&lt;br&gt;
Retrieved from &lt;a href="https://gemini.google/about/" rel="noopener noreferrer"&gt;https://gemini.google/about/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pichai, S., &amp;amp; Hassabis, D. (2023, December 6). Introducing Gemini: Our largest and most capable AI model.&lt;/strong&gt;&lt;br&gt;
Google Blog.&lt;br&gt;
Retrieved from &lt;a href="https://blog.google/technology/ai/google-gemini-ai/#sundar-note" rel="noopener noreferrer"&gt;https://blog.google/technology/ai/google-gemini-ai/#sundar-note&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grounding with Google Search.&lt;/strong&gt;&lt;br&gt;
Google AI Documentation.&lt;br&gt;
Retrieved from &lt;a href="https://ai.google.dev/gemini-api/docs/google-search?hl=es-419" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/google-search?hl=es-419&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Do you have any other thoughts or suggestions?&lt;/strong&gt; Leave them in the comments.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>cloud</category>
      <category>gemini</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 20 Oct 2025 08:32:49 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-4al3</link>
      <guid>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-4al3</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This article was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0w2fljlngynn2erhyfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0w2fljlngynn2erhyfa.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
   What is Brand Journalism?
&lt;/h1&gt;

&lt;p&gt;According to an article by &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;The New York Times Licensing Group&lt;/a&gt;, readers experience significant content fatigue: there are more than 1.8 billion websites and over 70 million blogs published each month.&lt;/p&gt;

&lt;p&gt;Brand Journalism is a communication strategy where brands adopt journalistic techniques to tell relevant and engaging stories. Instead of direct advertising messages, content is created with a narrative, informative, and value-added approach, similar to traditional media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0v8nmvc9808p7bepx3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0v8nmvc9808p7bepx3y.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Journalistic techniques:&lt;/strong&gt; Application of rigorous journalistic methods to create credible and well-structured content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience interests:&lt;/strong&gt; Focus on the real interests of the audience, not just the messages the brand wants to convey.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality and useful information&lt;/strong&gt;: Content that educates, informs, or solves concrete problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use of different formats:&lt;/strong&gt; Variety of formats (reports, interviews, analyses, infographics, videos) to maintain engagement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling:&lt;/strong&gt; Narratives that connect emotionally with values, experiences, and social impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5p8wn06a3uvnjjkmslx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5p8wn06a3uvnjjkmslx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;p&gt;The benefits we can identify based on this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand Positioning:&lt;/strong&gt; Establish yourself as a thought leader in your industry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience Loyalty:&lt;/strong&gt; Build authentic and lasting relationships with your audience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Differentiation against the Competition:&lt;/strong&gt; Stand out from competitors through higher-quality editorial content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greater Organic Reach:&lt;/strong&gt; Valuable content is naturally shared, amplifying reach without direct advertising investment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8om3s7cq6o6mooithofs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8om3s7cq6o6mooithofs.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  What is Generative AI?
&lt;/h1&gt;

&lt;p&gt;Generative AI is a branch of artificial intelligence focused on creating new and original content: text, images, audio, video, or synthetic data. Its development has been possible thanks to deep learning, especially through advanced architectures such as transformers, which process information in parallel and capture complex relationships in large data volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources on GenAI
&lt;/h2&gt;

&lt;p&gt;I have written a series of articles on the fundamentals of generative AI&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqir56tz3mi32h2hwm6mf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqir56tz3mi32h2hwm6mf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-1-prompt-basics-from-theory-to-practice-1a5"&gt;GenAI Foundations – Chapter 1: Prompt Basics: From Theory to Practice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-2-prompt-engineering-in-action-unlocking-better-ai-responses-l28"&gt;GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-3-rag-patterns-and-best-practices-cpc"&gt;GenAI Foundations – Chapter 3: RAG Patterns and Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-4-model-customization-evaluation-can-we-trust-the-outputs-i21"&gt;GenAI Foundations – Chapter 4: Model Customization &amp;amp; Evaluation – Can We Trust the Outputs?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73"&gt;GenAI Foundations – Chapter 5: Project Planning with the Generative AI Canvas&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Gemini
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; is a family of multimodal AI models developed by Google DeepMind. It integrates into multiple Google products and can process text, images, and other data types simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grounding with Google Search
&lt;/h3&gt;

&lt;p&gt;For this use case, we will use the Grounding with Google Search functionality, which connects the model directly to Google to perform searches and obtain up-to-date information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fzmliazb4757vbwgw4h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2fzmliazb4757vbwgw4h.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Advantages:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;📏Increased Accuracy:&lt;/strong&gt; Reduces model hallucinations by accessing verifiable information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡️Real-Time Information:&lt;/strong&gt; Access to current data, reducing uncertainty about the model's knowledge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📚Citations and References:&lt;/strong&gt; Retrieves source links and provides control over consulted data sources.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Use Case
&lt;/h1&gt;

&lt;p&gt;Brand Journalism is a strategic tool for companies to communicate their values from an authentic perspective. However, we often need to find topics that might interest our target audience, so it is essential to search for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mentions of the company on different sites&lt;/li&gt;
&lt;li&gt;Reputation and notable aspects&lt;/li&gt;
&lt;li&gt;Trends and relevant conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This starting point helps those who write articles or create storytelling based not only on what the company wants to show but also on the external perspective others have of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example: 📱iPhone 17
&lt;/h2&gt;

&lt;p&gt;Using the latest iPhone launch as an example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for recently published articles&lt;/li&gt;
&lt;li&gt;Classify and analyze these documents&lt;/li&gt;
&lt;li&gt;Generate a report with visualizations, conclusions, and structured narratives&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Next, we will see how to implement this strategy through an automated workflow that integrates AI and data analysis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Implementation Process
&lt;/h2&gt;

&lt;p&gt;The following diagram illustrates how our automated analysis system works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2cup991yat975lcwa9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2cup991yat975lcwa9u.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Search with Google Search
&lt;/h3&gt;

&lt;p&gt;We use &lt;strong&gt;Grounding with Google Search&lt;/strong&gt; to find relevant articles and request output in JSON format using this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
   &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"full article title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"source_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"media name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"publication date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"article link"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"site_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"website name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-4 line summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive/negative/neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rumor/analysis/comparison/market/technical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1-10 score"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  2️⃣ Storytelling Generation
&lt;/h3&gt;

&lt;p&gt;We use another prompt to generate different types of narratives based on the articles found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytical Insights:&lt;/strong&gt; Compact analytical summary with concrete data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling Narrative:&lt;/strong&gt; Engaging mini-narrative based on dataset evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tone Variants (A/B/C):&lt;/strong&gt; Three versions with different focuses: objective, emotional, and strategic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3️⃣ Report Creation
&lt;/h3&gt;

&lt;p&gt;We generate a PDF report including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charts created with Seaborn and Matplotlib&lt;/li&gt;
&lt;li&gt;Visual trend analyses&lt;/li&gt;
&lt;li&gt;Narrative conclusions based on generated storytelling&lt;/li&gt;
&lt;li&gt;Customizing the layout using ReportLab&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Tutorial
&lt;/h1&gt;
&lt;h2&gt;
  
  
  How Does Gemini Work with Google Search?
&lt;/h2&gt;

&lt;p&gt;When performing a query, Gemini not only relies on its internal knowledge but also actively searches updated information on Google Search. This grounding capability allows the model to access real-time data, verify facts, and provide responses based on concrete sources, reducing hallucination risk and ensuring relevance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn4uuw6qmzatdntnoso1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn4uuw6qmzatdntnoso1.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Pre-requisite: Access to Gemini API
&lt;/h2&gt;

&lt;p&gt;Before starting, you need to get access to the Gemini API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an account in &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create or log in with your Google account&lt;/li&gt;
&lt;li&gt;Generate your API key from the control panel&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can use Gemini's free tier to test this project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu089z6p0l4z738jbuqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu089z6p0l4z738jbuqz.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your API key, configure it in a .env file:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tu_api_key_de_gemini"&lt;/span&gt;
MODEL_ID &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gemini-2.5-flash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;We use Gemini 2.5 Flash because it is the most cost-efficient model optimized for frequent, low-cost tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;

&lt;p&gt;For this tutorial you must clone the following repository and you can get the complete code from this tutorial.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;
        brand-journalism-gemini
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Tutorial about Brand Journalism Code Using Google Gemini
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introduction&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This repository was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini/img/readme/1.google-search.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fbrand-journalism-gemini%2Fimg%2Freadme%2F1.google-search.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;# What is Brand Journalism
According to an…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;project/
   ├── img/                    &lt;span class="c"&gt;# Generated graphics&lt;/span&gt;
   ├── prompt/
   │   ├── prompt_search.txt       &lt;span class="c"&gt;# Search Prompt&lt;/span&gt;
   │   └── prompt_storytelling.txt &lt;span class="c"&gt;# Prompt for narrative&lt;/span&gt;
   ├── report/                &lt;span class="c"&gt;# PDFs generated&lt;/span&gt;
   ├── brand_journalist_analyzer.py
   ├── report_plots.py
   ├── report_analysis.py
   ├── main.py
   └── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Main Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1️⃣ Prompts (/prompt)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_search.txt&lt;/code&gt;&lt;/strong&gt;: Here we define how to perform the search in Google Search and structure the results in JSON. This prompt instructs the model to return structured information with fields such as the article's title, source, date, URL, summary, sentiment, and category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_storytelling.txt&lt;/code&gt;&lt;/strong&gt;: In this file, we define how to generate conclusions and storytelling based on the articles found. It requests different types of outputs, including objective analysis, immersive narratives, and three tone variants.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
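&lt;p&gt;Because the search prompt pins down those exact fields, each record the model returns can be checked before further processing. A minimal sketch (the field list comes from the description above; the helper and the sample record are hypothetical):&lt;/p&gt;

```python
# Fields the search prompt asks the model to return for each article
EXPECTED_FIELDS = {"title", "source", "date", "url", "summary", "sentiment", "category"}

def validate_article(record):
    """Return the set of missing fields (empty set means the record is complete)."""
    return EXPECTED_FIELDS - set(record)

# Hypothetical record shaped like the prompt's expected output
article = {
    "title": "Example headline",
    "source": "example.com",
    "date": "2025-10-20",
    "url": "https://example.com/a",
    "summary": "Short summary.",
    "sentiment": "positive",
    "category": "product",
}
print(validate_article(article))  # set()
```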




&lt;p&gt;&lt;strong&gt;2️⃣ Brand Journalism Analyzer&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🗒&lt;code&gt;brand_journalist_analyzer.py&lt;/code&gt;&lt;/strong&gt;: This class is the core of the application and handles all interaction with the Gemini API. It implements three main functionalities: news retrieval using Google Search, structured storytelling generation, and analytical insights extraction. 
The most important method is &lt;strong&gt;search_news()&lt;/strong&gt;, which executes real-time searches and returns structured data in JSON format. To use integrated Google Search, simply set &lt;code&gt;config={"tools": [{"google_search": {}}]}&lt;/code&gt; in the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search for news on a topic using Google Search.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}]}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process and clean JSON response
&lt;/span&gt;    &lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_clean_json_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
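&lt;p&gt;The &lt;code&gt;_clean_json_response&lt;/code&gt; helper is referenced but not shown above. Gemini often wraps JSON output in markdown fences, which must be stripped before &lt;code&gt;json.loads&lt;/code&gt; succeeds; a plausible minimal sketch of such a helper (an assumption, not the repository's exact code):&lt;/p&gt;

```python
import json
import re

def clean_json_response(text):
    """Strip markdown code fences around a JSON payload.
    A guess at what a helper like _clean_json_response does -- not the repo's code."""
    text = text.strip()
    # remove a leading ```json (or bare ```) fence and a trailing ``` fence
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return text

raw = '```json\n[{"title": "Example", "sentiment": "positive"}]\n```'
articles = json.loads(clean_json_response(raw))
print(articles[0]["title"])  # Example
```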






&lt;p&gt;&lt;strong&gt;3️⃣ Visualization Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_plots.py:&lt;/code&gt; This class creates all the report visualizations using Seaborn and Matplotlib. It generates three essential chart types: a bar chart showing which media outlets publish the most on the topic, a timeline visualizing the evolution of publications over time, and a heatmap that cross-references sentiment with content categories. 
All visual aspects are customizable: color palette, titles, axis labels, and save paths. The methods first prepare the data with Pandas aggregations and then generate the visualizations, automatically saving them as PNG files.&lt;/li&gt;
&lt;/ul&gt;
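&lt;p&gt;The data preparation behind the heatmap boils down to counting articles per (sentiment, category) pair. A standard-library sketch of just that aggregation step (the class itself uses Pandas; the sample records here are invented):&lt;/p&gt;

```python
from collections import Counter

# Invented sample records with the sentiment/category fields the heatmap uses
articles = [
    {"sentiment": "positive", "category": "product"},
    {"sentiment": "positive", "category": "product"},
    {"sentiment": "negative", "category": "pricing"},
    {"sentiment": "neutral",  "category": "product"},
]

# Count articles per (sentiment, category) pair -- the matrix a heatmap plots
counts = Counter((a["sentiment"], a["category"]) for a in articles)
print(counts[("positive", "product")])  # 2
```

&lt;p&gt;With Pandas, the equivalent is &lt;code&gt;pd.crosstab(df["sentiment"], df["category"])&lt;/code&gt;, whose result can be passed directly to &lt;code&gt;seaborn.heatmap&lt;/code&gt;.&lt;/p&gt;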




&lt;p&gt;&lt;strong&gt;4️⃣ PDF Report Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_analysis.py&lt;/code&gt;: This class assembles the final report in professional PDF format using ReportLab. It combines multiple elements: a customizable logo, corporate-style headers, informative tables about the analyzed dataset, pre-generated visualizations, formatted narratives with full Markdown support (including headings, lists, code, and emphasis), and conclusions and storytelling sections with different tone variations. &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯Process Orchestration
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;main.py&lt;/code&gt; file constitutes the application's main entry point, orchestrating the entire Brand Journalism pipeline. This script coordinates the interaction between all the developed classes, managing the flow from real-time information retrieval to the generation of the final document, ensuring that each component executes in the correct order and with the necessary dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🐍main.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;brand_journalist_analyzer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrandJournalistAnalyzer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_analysis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ReportAnalysis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_plots&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataVisualizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Cargar variables de entorno
&lt;/span&gt;    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar ruta con timestamp
&lt;/span&gt;    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report/news_report_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Inicializar analizador
&lt;/span&gt;    &lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BrandJournalistAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Buscar o cargar noticias (usa caché si existe)
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_or_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;search_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar storytelling y conclusiones
&lt;/span&gt;    &lt;span class="n"&gt;storytelling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_storytelling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conclusion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conclusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Crear visualizaciones
&lt;/span&gt;    &lt;span class="n"&gt;visualizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataVisualizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_by_source&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_over_time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_sentiment_category_heatmap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar reporte PDF
&lt;/span&gt;    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ReportAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_report&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🗒 Report Generation
&lt;/h3&gt;

&lt;p&gt;The system automatically generates a professional PDF report using Seaborn/Matplotlib for visuals and ReportLab for document layout. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media coverage charts&lt;/li&gt;
&lt;li&gt;Temporal trends&lt;/li&gt;
&lt;li&gt;Heatmap crossing content categories with sentiment&lt;/li&gt;
&lt;li&gt;Structured storytelling and analytical conclusions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Report Structure
&lt;/h3&gt;

&lt;p&gt;In this use case, we generated a four-page PDF report that provides a comprehensive overview of the analysis, starting with complete details of the websites and media outlets where relevant news stories on the researched topic were found.&lt;/p&gt;

&lt;p&gt;The document includes graphical visualizations specifically designed to analyze temporal publishing trends, allowing for the identification of patterns of interest over time, as well as categorical classifications based on the criteria identified by the AI model following the instructions defined in the search prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1o6v3ghzui9xt0epwld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1o6v3ghzui9xt0epwld.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final section of the report presents analytical conclusions based on quantitative data and storytelling narratives structured in different tones, providing multiple perspectives on the same information.&lt;/p&gt;




&lt;h1&gt;
  
  
  💡 Conclusions
&lt;/h1&gt;

&lt;p&gt;AI can be a powerful tool for optimizing research and analysis processes, but I still believe that authentic company communication requires the perspective, sensitivity, and values that only people can provide.&lt;/p&gt;

&lt;p&gt;This tutorial offers an automated &lt;strong&gt;starting point&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects and structures scattered information&lt;/li&gt;
&lt;li&gt;Identifies patterns and trends in large data volumes&lt;/li&gt;
&lt;li&gt;Generates evidence-based insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, Brand Journalism work should remain in the hands of professionals who can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interpret data within the organizational context&lt;/li&gt;
&lt;li&gt;Align narratives with real corporate values&lt;/li&gt;
&lt;li&gt;Add nuances, experiences, and internal perspectives&lt;/li&gt;
&lt;li&gt;Ensure the message genuinely reflects brand identity&lt;/li&gt;
&lt;li&gt;Humanize content with empathy and authentic connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI&lt;/strong&gt; provides the knowledge foundation, but people create the true connection with the audience. Therefore, effective storytelling emerges from combining automated analysis with human narrative craftsmanship.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What Is Brand Journalism — and Why It Matters.&lt;br&gt;
The New York Times Licensing Group.&lt;/strong&gt;&lt;br&gt;
Retrieved from &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini About.&lt;/strong&gt;&lt;br&gt;
Google.&lt;br&gt;
Retrieved from &lt;a href="https://gemini.google/about/" rel="noopener noreferrer"&gt;https://gemini.google/about/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pichai, S., &amp;amp; Hassabis, D. (2023, December 6). Introducing Gemini: Our largest and most capable AI model.&lt;/strong&gt;&lt;br&gt;
Google Blog.&lt;br&gt;
Retrieved from &lt;a href="https://blog.google/technology/ai/google-gemini-ai/#sundar-note" rel="noopener noreferrer"&gt;https://blog.google/technology/ai/google-gemini-ai/#sundar-note&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grounding with Google Search.&lt;/strong&gt;&lt;br&gt;
Google AI Documentation.&lt;br&gt;
Retrieved from &lt;a href="https://ai.google.dev/gemini-api/docs/google-search?hl=es-419" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/google-search?hl=es-419&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Do you have any other thoughts or suggestions?&lt;/strong&gt; Leave them in the comments.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>python</category>
      <category>cloud</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Mon, 20 Oct 2025 08:08:06 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-202k</link>
      <guid>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-202k</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This article was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt2039xahlq44ubd8akt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frt2039xahlq44ubd8akt.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
   What is Brand Journalism?
&lt;/h1&gt;

&lt;p&gt;According to an article by &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;The New York Times Licensing Group&lt;/a&gt;, readers experience significant content fatigue: there are more than 1.8 billion websites and over 70 million blog posts published each month.&lt;/p&gt;

&lt;p&gt;Brand Journalism is a communication strategy where brands adopt journalistic techniques to tell relevant and engaging stories. Instead of direct advertising messages, content is created with a narrative, informative, and value-added approach, similar to traditional media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87408ds515b2j0ek4wd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87408ds515b2j0ek4wd.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Journalistic techniques:&lt;/strong&gt; Application of rigorous journalistic methods to create credible and well-structured content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience interests:&lt;/strong&gt; Focus on the real interests of the audience, not just the messages the brand wants to convey.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality and useful information&lt;/strong&gt;: Content that educates, informs, or solves concrete problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use of different formats:&lt;/strong&gt; Variety of formats (reports, interviews, analyses, infographics, videos) to maintain engagement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling:&lt;/strong&gt; Narratives that connect emotionally with values, experiences, and social impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feshyu7cfka0kyj1ebyz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feshyu7cfka0kyj1ebyz3.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;p&gt;Based on this approach, the key benefits are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand Positioning:&lt;/strong&gt; Establish yourself as a thought leader in your industry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience Loyalty:&lt;/strong&gt; Build authentic and lasting relationships with your audience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Differentiation against the Competition:&lt;/strong&gt; Stand out from competitors through higher-quality editorial content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greater Organic Reach:&lt;/strong&gt; Valuable content is naturally shared, amplifying reach without direct advertising investment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru6de5g4ncc44xfz6qoo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru6de5g4ncc44xfz6qoo.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  What is Generative AI?
&lt;/h1&gt;

&lt;p&gt;Generative AI is a branch of artificial intelligence focused on creating new and original content: text, images, audio, video, or synthetic data. Its development has been possible thanks to deep learning, especially through advanced architectures such as transformers, which process information in parallel and capture complex relationships in large data volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources on GenAI
&lt;/h2&gt;

&lt;p&gt;I have written a series of articles on the fundamentals of generative AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3nruzp3mjknrh9r1vfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs3nruzp3mjknrh9r1vfw.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-1-prompt-basics-from-theory-to-practice-1a5"&gt;GenAI Foundations – Chapter 1: Prompt Basics: From Theory to Practice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-2-prompt-engineering-in-action-unlocking-better-ai-responses-l28"&gt;GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-3-rag-patterns-and-best-practices-cpc"&gt;GenAI Foundations – Chapter 3: RAG Patterns and Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-4-model-customization-evaluation-can-we-trust-the-outputs-i21"&gt;GenAI Foundations – Chapter 4: Model Customization &amp;amp; Evaluation – Can We Trust the Outputs?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73"&gt;GenAI Foundations – Chapter 5: Project Planning with the Generative AI Canvas&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Gemini
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; is a family of multimodal AI models developed by Google DeepMind. It integrates into multiple Google products and can process text, images, and other data types simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltxawwdiudtxpzvkaj7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltxawwdiudtxpzvkaj7g.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Grounding with Google Search
&lt;/h3&gt;

&lt;p&gt;For this use case, we will use the Grounding with Google Search functionality, which connects the model directly to Google to perform searches and obtain up-to-date information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kkjfcyb2x8tih6vx91l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kkjfcyb2x8tih6vx91l.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Advantages:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;📏Increased Accuracy:&lt;/strong&gt; Reduces model hallucinations by accessing verifiable information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡️Real-Time Information:&lt;/strong&gt; Access to current data beyond the model's training cutoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📚Citations and References:&lt;/strong&gt; Retrieves source links and provides control over consulted data sources.&lt;/li&gt;
&lt;/ul&gt;
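&lt;p&gt;In code, grounding is enabled with a single entry in the request configuration. The sketch below builds that configuration as it is used later in this tutorial with the &lt;code&gt;google-genai&lt;/code&gt; Python SDK; the surrounding client call is shown only as a comment because it requires an API key.&lt;/p&gt;

```python
# The request configuration that enables Grounding with Google Search
# in the google-genai SDK (the same dict used later in this tutorial).
grounding_config = {"tools": [{"google_search": {}}]}

# The grounded call would then look like this (requires a valid API key,
# so it is shown here only as a comment):
# response = client.models.generate_content(
#     model="gemini-2.5-flash",
#     contents=prompt,
#     config=grounding_config,
# )
print(grounding_config)
```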




&lt;h1&gt;
  
  
  Use Case
&lt;/h1&gt;

&lt;p&gt;Brand Journalism is a strategic tool for companies to communicate their values from an authentic perspective. However, we often need to find topics that might interest our target audience, so it is essential to search for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mentions of the company on different sites&lt;/li&gt;
&lt;li&gt;Reputation and notable aspects&lt;/li&gt;
&lt;li&gt;Trends and relevant conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This starting point helps writers and storytellers build narratives based not only on what the company wants to show but also on how others perceive it from the outside.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example: 📱iPhone 17
&lt;/h2&gt;

&lt;p&gt;Using the latest iPhone launch as an example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for recently published articles&lt;/li&gt;
&lt;li&gt;Classify and analyze these documents&lt;/li&gt;
&lt;li&gt;Generate a report with visualizations, conclusions, and structured narratives&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Next, we will see how to implement this strategy through an automated workflow that integrates AI and data analysis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Implementation Process
&lt;/h2&gt;

&lt;p&gt;The following diagram illustrates how our automated analysis system works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhn2zaiq0v2f42wfm91e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhn2zaiq0v2f42wfm91e.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Search with Google Search
&lt;/h3&gt;

&lt;p&gt;We use &lt;strong&gt;Grounding with Google Search&lt;/strong&gt; to find relevant articles and request output in JSON format using this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
   &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"full article title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"source_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"media name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"publication date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"article link"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"site_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"website name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-4 line summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive/negative/neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rumor/analysis/comparison/market/technical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1-10 score"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
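&lt;p&gt;Because LLM output can drift from the requested schema, it is worth validating each record before analysis. The following is a minimal sketch of such a check; the field names come from the structure above, while the validator itself is our own addition, not part of the tutorial repository.&lt;/p&gt;

```python
import json

# Fields and values requested in the search prompt's JSON structure.
REQUIRED_FIELDS = {"title", "source_name", "date", "url", "site_name",
                   "summary", "sentiment", "category", "sentiment_score"}
VALID_SENTIMENTS = {"positive", "negative", "neutral"}

def validate_article(record):
    """Return a list of problems found in one article record (empty = OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if record.get("sentiment") not in VALID_SENTIMENTS:
        problems.append(f"unexpected sentiment: {record.get('sentiment')!r}")
    return problems

sample = json.loads(
    '{"title": "t", "source_name": "s", "date": "2025-10-01", "url": "u",'
    ' "site_name": "w", "summary": "x", "sentiment": "positive",'
    ' "category": "analysis", "sentiment_score": "8"}'
)
print(validate_article(sample))  # → []
```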

&lt;h3&gt;
  
  
  2️⃣ Storytelling Generation
&lt;/h3&gt;

&lt;p&gt;We use another prompt to generate different types of narratives based on the articles found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytical Insights:&lt;/strong&gt; Compact analytical summary with concrete data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling Narrative:&lt;/strong&gt; Engaging mini-narrative based on dataset evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tone Variants (A/B/C):&lt;/strong&gt; Three versions with different focuses: objective, emotional, and strategic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3️⃣ Report Creation
&lt;/h3&gt;

&lt;p&gt;We generate a PDF report including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charts created with Seaborn and Matplotlib&lt;/li&gt;
&lt;li&gt;Visual trend analyses&lt;/li&gt;
&lt;li&gt;Narrative conclusions based on generated storytelling&lt;/li&gt;
&lt;li&gt;A custom layout created with ReportLab&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Tutorial
&lt;/h1&gt;
&lt;h2&gt;
  
  
  How Does Gemini Work with Google Search?
&lt;/h2&gt;

&lt;p&gt;When performing a query, Gemini not only relies on its internal knowledge but also actively searches updated information on Google Search. This grounding capability allows the model to access real-time data, verify facts, and provide responses based on concrete sources, reducing hallucination risk and ensuring relevance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcbxxco5lcz3viy2k6fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcbxxco5lcz3viy2k6fk.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Pre-requisite: Access to Gemini API
&lt;/h2&gt;

&lt;p&gt;Before starting, you need to get access to the Gemini API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Sign in with your Google account (or create one)&lt;/li&gt;
&lt;li&gt;Generate your API key from the control panel&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can use Gemini's free tier to test this project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1i6fbxt80e2jqlal8z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1i6fbxt80e2jqlal8z0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your API key, configure it in a .env file:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tu_api_key_de_gemini"&lt;/span&gt;
MODEL_ID &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gemini-2.5-flash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
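&lt;p&gt;The project later loads these values with &lt;code&gt;python-dotenv&lt;/code&gt;'s &lt;code&gt;load_dotenv()&lt;/code&gt;. If you want to see what that step does (or avoid the extra dependency), a few lines of standard-library code can parse a simple &lt;code&gt;.env&lt;/code&gt; file. This sketch only handles &lt;code&gt;KEY=value&lt;/code&gt; and &lt;code&gt;KEY="value"&lt;/code&gt; lines, not the full dotenv syntax.&lt;/p&gt;

```python
import os

def load_env_file(path):
    """Minimal .env loader: handles KEY=value and KEY="value" lines only."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# Example: write a sample .env file and load it.
with open(".env.example", "w") as fh:
    fh.write('API_KEY="your_gemini_api_key"\nMODEL_ID="gemini-2.5-flash"\n')
load_env_file(".env.example")
print(os.environ["MODEL_ID"])  # → gemini-2.5-flash
```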


&lt;blockquote&gt;
&lt;p&gt;We use Gemini 2.5 Flash because it is optimized for cost efficiency on frequent, high-volume tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;

&lt;p&gt;For this tutorial, clone the following repository, which contains the complete code.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;
        brand-journalism-gemini
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Tutorial about Brand Journalism Code Using Google Gemini
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introduction&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This repository was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini/img/readme/1.google-search.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fbrand-journalism-gemini%2Fimg%2Freadme%2F1.google-search.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;# What is Brand Journalism
According to an…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;project/
   ├── img/                    &lt;span class="c"&gt;# Generated graphics&lt;/span&gt;
   ├── prompt/
   │   ├── prompt_search.txt       &lt;span class="c"&gt;# Search Prompt&lt;/span&gt;
   │   └── prompt_storytelling.txt &lt;span class="c"&gt;# Prompt for narrative&lt;/span&gt;
   ├── report/                &lt;span class="c"&gt;# PDFs generated&lt;/span&gt;
   ├── brand_journalist_analyzer.py
   ├── report_plots.py
   ├── report_analysis.py
   ├── main.py
   └── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Main Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1️⃣ Prompts (/prompt)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_search.txt&lt;/code&gt;&lt;/strong&gt;: Here we define how to perform the search in Google Search and structure the results in JSON. This prompt instructs the model to return structured information with fields such as the article's title, source, date, URL, summary, sentiment, and category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_storytelling.txt&lt;/code&gt;&lt;/strong&gt;: In this file, we define how to generate conclusions and storytelling based on the articles found. It requests different types of outputs, including objective analysis, immersive narratives, and three tone variants (objective, emotional, and strategic).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;2️⃣ Brand Journalism Analyzer&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🗒&lt;code&gt;brand_journalist_analyzer.py&lt;/code&gt;&lt;/strong&gt;: This class is the core of the application and handles all interaction with the Gemini API. It implements three main functionalities: news retrieval using Google Search, structured storytelling generation, and analytical insights extraction. 
The most important method is &lt;strong&gt;search_news()&lt;/strong&gt;, which executes real-time searches and returns structured data in JSON format. To use integrated Google Search, simply set &lt;code&gt;config={"tools": [{"google_search": {}}]}&lt;/code&gt; in the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search for news on a topic using Google Search.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}]}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process and clean JSON response
&lt;/span&gt;    &lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_clean_json_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
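&lt;p&gt;The &lt;code&gt;_clean_json_response&lt;/code&gt; helper called above is not shown in the excerpt. A typical implementation strips the Markdown code fences that models often wrap around JSON output; the sketch below is our reconstruction of such a helper, not the repository's exact code.&lt;/p&gt;

```python
def clean_json_response(text):
    """Strip Markdown code fences (```json ... ```) that models often add."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.splitlines()
        # Drop the closing fence if present, then the opening fence line.
        if lines[-1].strip() == "```":
            lines = lines[:-1]
        cleaned = "\n".join(lines[1:])
    return cleaned.strip()

raw = '```json\n[{"title": "iPhone 17 review"}]\n```'
print(clean_json_response(raw))  # → [{"title": "iPhone 17 review"}]
```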






&lt;p&gt;&lt;strong&gt;3️⃣ Visualization Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_plots.py&lt;/code&gt;: This class creates all the report visualizations using Seaborn and Matplotlib. It generates three essential chart types: a bar chart showing which media outlets publish the most on the topic, a timeline visualizing the evolution of publications over time, and a heatmap that cross-references sentiment with content categories. 
All visual aspects are customizable: color palette, titles, axis labels, and save paths. The methods first prepare the data with Pandas aggregations and then generate the visualizations, automatically saving them as PNG files.&lt;/li&gt;
&lt;/ul&gt;
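&lt;p&gt;The aggregations behind those three charts are straightforward with Pandas. The sketch below runs them on a toy dataset with the same columns the search prompt requests; the actual method names in &lt;code&gt;report_plots.py&lt;/code&gt; may differ.&lt;/p&gt;

```python
import pandas as pd

# Toy dataset with the columns the search prompt requests.
df = pd.DataFrame({
    "source_name": ["MacRumors", "The Verge", "MacRumors"],
    "date": ["2025-09-10", "2025-09-11", "2025-09-11"],
    "sentiment": ["positive", "neutral", "positive"],
    "category": ["rumor", "analysis", "rumor"],
})

# 1. Articles per media outlet (input for the bar chart).
by_source = df["source_name"].value_counts()

# 2. Publications over time (input for the timeline).
over_time = df.groupby("date").size()

# 3. Sentiment x category matrix (input for the heatmap).
heatmap_data = pd.crosstab(df["sentiment"], df["category"])

print(by_source.to_dict())  # → {'MacRumors': 2, 'The Verge': 1}
```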




&lt;p&gt;&lt;strong&gt;4️⃣ PDF Report Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_analysis.py&lt;/code&gt;: This class assembles the final report in professional PDF format using ReportLab. It combines multiple elements: a customizable logo, corporate-style headers, informative tables about the analyzed dataset, pre-generated visualizations, formatted narratives with full Markdown support (including headings, lists, code, and emphasis), and conclusions and storytelling sections with different tone variations. &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯Process Orchestration
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;main.py&lt;/code&gt; file constitutes the application's main entry point, orchestrating the entire Brand Journalism pipeline. This script coordinates the interaction between all the developed classes, managing the flow from real-time information retrieval to the generation of the final document, ensuring that each component executes in the correct order and with the necessary dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🐍main.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;brand_journalist_analyzer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrandJournalistAnalyzer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_analysis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ReportAnalysis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_plots&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataVisualizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Cargar variables de entorno
&lt;/span&gt;    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar ruta con timestamp
&lt;/span&gt;    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report/news_report_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Inicializar analizador
&lt;/span&gt;    &lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BrandJournalistAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Buscar o cargar noticias (usa caché si existe)
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_or_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;search_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar storytelling y conclusiones
&lt;/span&gt;    &lt;span class="n"&gt;storytelling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_storytelling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conclusion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conclusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Crear visualizaciones
&lt;/span&gt;    &lt;span class="n"&gt;visualizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataVisualizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_by_source&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_over_time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_sentiment_category_heatmap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar reporte PDF
&lt;/span&gt;    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ReportAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_report&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🗒 Report Generation
&lt;/h3&gt;

&lt;p&gt;The system automatically generates a professional PDF report using Seaborn/Matplotlib for visuals and ReportLab for document layout. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media coverage charts&lt;/li&gt;
&lt;li&gt;Temporal trends&lt;/li&gt;
&lt;li&gt;Heatmap crossing content categories with sentiment&lt;/li&gt;
&lt;li&gt;Structured storytelling and analytical conclusions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Report Structure
&lt;/h3&gt;

&lt;p&gt;In this use case, we generated a four-page PDF report that provides a comprehensive overview of the analysis, starting with complete details of the websites and media outlets where relevant news stories on the researched topic were found.&lt;/p&gt;

&lt;p&gt;The document includes visualizations designed to analyze temporal publishing trends, making it possible to identify patterns of interest over time, as well as categorical classifications based on the criteria the AI model applied following the instructions defined in the search prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ea7zmtqtb3zj6vduufv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ea7zmtqtb3zj6vduufv.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final section of the report presents analytical conclusions based on quantitative data and storytelling narratives structured in different tones, providing multiple perspectives on the same information.&lt;/p&gt;




&lt;h1&gt;
  
  
  💡 Conclusions
&lt;/h1&gt;

&lt;p&gt;AI can be a powerful tool for optimizing research and analysis processes, but I still believe that authentic company communication requires the perspective, sensitivity, and values that only people can provide.&lt;/p&gt;

&lt;p&gt;This tutorial offers an automated &lt;strong&gt;starting point&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects and structures scattered information&lt;/li&gt;
&lt;li&gt;Identifies patterns and trends in large data volumes&lt;/li&gt;
&lt;li&gt;Generates evidence-based insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, Brand Journalism work should remain in the hands of professionals who can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interpret data within the organizational context&lt;/li&gt;
&lt;li&gt;Align narratives with real corporate values&lt;/li&gt;
&lt;li&gt;Add nuances, experiences, and internal perspectives&lt;/li&gt;
&lt;li&gt;Ensure the message genuinely reflects brand identity&lt;/li&gt;
&lt;li&gt;Humanize content with empathy and authentic connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI&lt;/strong&gt; provides the knowledge foundation, but people create the true connection with the audience. Therefore, effective storytelling emerges from combining automated analysis with human narrative craftsmanship.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What Is Brand Journalism — and Why It Matters.&lt;br&gt;
The New York Times Licensing Group.&lt;/strong&gt;&lt;br&gt;
Retrieved from &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini About.&lt;/strong&gt;&lt;br&gt;
Google.&lt;br&gt;
Retrieved from &lt;a href="https://gemini.google/about/" rel="noopener noreferrer"&gt;https://gemini.google/about/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pichai, S., &amp;amp; Hassabis, D. (2023, December 6). Introducing Gemini: Our largest and most capable AI model.&lt;/strong&gt;&lt;br&gt;
Google Blog.&lt;br&gt;
Retrieved from &lt;a href="https://blog.google/technology/ai/google-gemini-ai/#sundar-note" rel="noopener noreferrer"&gt;https://blog.google/technology/ai/google-gemini-ai/#sundar-note&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grounding with Google Search.&lt;/strong&gt;&lt;br&gt;
Google AI Documentation.&lt;br&gt;
Retrieved from &lt;a href="https://ai.google.dev/gemini-api/docs/google-search?hl=es-419" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/google-search?hl=es-419&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Do you have any other thoughts or suggestions?&lt;/strong&gt; Leave them in the comments.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Sun, 19 Oct 2025 22:53:37 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-akl</link>
      <guid>https://dev.to/r_elena_mendez_escobar/how-to-use-ai-in-brand-journalism-with-gemini-to-transform-digital-information-into-strategic-akl</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This article was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gnm27wnto1tpket2ycf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gnm27wnto1tpket2ycf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  What is Brand Journalism?
&lt;/h1&gt;

&lt;p&gt;According to an article by &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;The New York Times Licensing Group&lt;/a&gt;, readers experience significant content fatigue: there are more than 1.8 billion websites, and over 70 million blog posts are published each month.&lt;/p&gt;

&lt;p&gt;Brand Journalism is a communication strategy where brands adopt journalistic techniques to tell relevant and engaging stories. Instead of direct advertising messages, content is created with a narrative, informative, and value-added approach, similar to traditional media.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focvlilzlaxt3taxvxkr4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Focvlilzlaxt3taxvxkr4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Journalistic techniques:&lt;/strong&gt; Application of rigorous journalistic methods to create credible and well-structured content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience interests:&lt;/strong&gt; Focus on the real interests of the audience, not just the messages the brand wants to convey.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality and useful information&lt;/strong&gt;: Content that educates, informs, or solves concrete problems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use of different formats:&lt;/strong&gt; Variety of formats (reports, interviews, analyses, infographics, videos) to maintain engagement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling:&lt;/strong&gt; Narratives that connect emotionally with values, experiences, and social impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1awd76drx6clov95gnrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1awd76drx6clov95gnrw.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;p&gt;The benefits we can identify based on this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand Positioning:&lt;/strong&gt; Establish yourself as a thought leader in your industry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience Loyalty:&lt;/strong&gt; Build authentic and lasting relationships with your audience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Differentiation against the Competition:&lt;/strong&gt; Stand out from competitors through higher-quality editorial content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greater Organic Reach:&lt;/strong&gt; Valuable content is naturally shared, amplifying reach without direct advertising investment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3oxihjetmvmos9emjek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3oxihjetmvmos9emjek.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  What is Generative AI?
&lt;/h1&gt;

&lt;p&gt;Generative AI is a branch of artificial intelligence focused on creating new and original content: text, images, audio, video, or synthetic data. Its development has been possible thanks to deep learning, especially through advanced architectures such as transformers, which process information in parallel and capture complex relationships in large data volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional Resources on GenAI
&lt;/h2&gt;

&lt;p&gt;I have written a series of articles on the fundamentals of generative AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg05c4jyhwh00201v91wn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg05c4jyhwh00201v91wn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-1-prompt-basics-from-theory-to-practice-1a5"&gt;GenAI Foundations – Chapter 1: Prompt Basics: From Theory to Practice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-2-prompt-engineering-in-action-unlocking-better-ai-responses-l28"&gt;GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-3-rag-patterns-and-best-practices-cpc"&gt;GenAI Foundations – Chapter 3: RAG Patterns and Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-4-model-customization-evaluation-can-we-trust-the-outputs-i21"&gt;GenAI Foundations – Chapter 4: Model Customization &amp;amp; Evaluation – Can We Trust the Outputs?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/r0mymendez/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73"&gt;GenAI Foundations – Chapter 5: Project Planning with the Generative AI Canvas&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Gemini
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemini&lt;/strong&gt; is a family of multimodal AI models developed by Google DeepMind. It integrates into multiple Google products and can process text, images, and other data types simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grounding with Google Search
&lt;/h3&gt;

&lt;p&gt;For this use case, we will use the Grounding with Google Search functionality, which connects the model directly to Google to perform searches and obtain up-to-date information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zed6muwjo5dtua4tffx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zed6muwjo5dtua4tffx.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Advantages:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;📏Increased Accuracy:&lt;/strong&gt; Reduces model hallucinations by accessing verifiable information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡️Real-Time Information:&lt;/strong&gt; Access to current data beyond the model's training cutoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📚Citations and References:&lt;/strong&gt; Retrieves source links and provides control over consulted data sources.&lt;/li&gt;
&lt;/ul&gt;
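&lt;p&gt;In the Gemini API, grounding is enabled by passing the Google Search tool in the request configuration; this is the same setting used later in &lt;code&gt;search_news()&lt;/code&gt;:&lt;/p&gt;

```python
# Request configuration that turns on Grounding with Google Search
config = {"tools": [{"google_search": {}}]}
```

&lt;p&gt;Passing this &lt;code&gt;config&lt;/code&gt; to &lt;code&gt;generate_content&lt;/code&gt; lets the model issue live searches and return source citations alongside its answer.&lt;/p&gt;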




&lt;h1&gt;
  
  
  Use Case
&lt;/h1&gt;

&lt;p&gt;Brand Journalism is a strategic tool for companies to communicate their values from an authentic perspective. However, we often need to find topics that might interest our target audience, so it is essential to search for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mentions of the company on different sites&lt;/li&gt;
&lt;li&gt;Reputation and notable aspects&lt;/li&gt;
&lt;li&gt;Trends and relevant conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This starting point helps writers and storytellers build content based not only on what the company wants to show, but also on how others perceive it from the outside.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Example: 📱iPhone 17
&lt;/h2&gt;

&lt;p&gt;Using the latest iPhone launch as an example, we will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for recently published articles&lt;/li&gt;
&lt;li&gt;Classify and analyze these documents&lt;/li&gt;
&lt;li&gt;Generate a report with visualizations, conclusions, and structured narratives&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Next, we will see how to implement this strategy through an automated workflow that integrates AI and data analysis.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Implementation Process
&lt;/h2&gt;

&lt;p&gt;The following diagram illustrates how our automated analysis system works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycev350svy25mblvdw6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycev350svy25mblvdw6k.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1️⃣ Search with Google Search
&lt;/h3&gt;

&lt;p&gt;We use &lt;strong&gt;Grounding with Google Search&lt;/strong&gt; to find relevant articles and request output in JSON format using this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
   &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"full article title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"source_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"media name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"publication date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"article link"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"site_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"website name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2-4 line summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive/negative/neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rumor/analysis/comparison/market/technical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"sentiment_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1-10 score"&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
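&lt;p&gt;Models sometimes wrap the requested JSON in Markdown code fences, so the raw text has to be cleaned before parsing. Below is a minimal sketch of such a cleaner (the repository implements its own in &lt;code&gt;_clean_json_response&lt;/code&gt;; the regex here is illustrative):&lt;/p&gt;

```python
import json
import re

def clean_json_response(text):
    """Strip Markdown code fences that models sometimes wrap around JSON."""
    return re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())

# Hypothetical raw model output wrapped in a ```json fence
raw = '```json\n[{"title": "iPhone 17 first look", "sentiment": "positive"}]\n```'
articles = json.loads(clean_json_response(raw))
```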

&lt;h3&gt;
  
  
  2️⃣ Storytelling Generation
&lt;/h3&gt;

&lt;p&gt;We use another prompt to generate different types of narratives based on the articles found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytical Insights:&lt;/strong&gt; Compact analytical summary with concrete data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storytelling Narrative:&lt;/strong&gt; Engaging mini-narrative based on dataset evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tone Variants (A/B/C):&lt;/strong&gt; Three versions with different focuses: objective, emotional, and strategic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3️⃣ Report Creation
&lt;/h3&gt;

&lt;p&gt;We generate a PDF report including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charts created with Seaborn and Matplotlib&lt;/li&gt;
&lt;li&gt;Visual trend analyses&lt;/li&gt;
&lt;li&gt;Narrative conclusions based on generated storytelling&lt;/li&gt;
&lt;li&gt;A custom layout built with ReportLab&lt;/li&gt;
&lt;/ul&gt;
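&lt;p&gt;As an illustration of the charting step, here is a minimal sketch that counts article sentiments and saves a bar chart as a PNG for the report (the labels and filename are hypothetical):&lt;/p&gt;

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render straight to file, no display needed
import matplotlib.pyplot as plt
from collections import Counter

# Hypothetical sentiment labels extracted from the retrieved articles
sentiments = ["positive", "positive", "neutral", "negative", "positive", "neutral"]
counts = Counter(sentiments)

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(list(counts.keys()), list(counts.values()),
       color=["#4caf50", "#9e9e9e", "#f44336"])
ax.set_title("Sentiment distribution of retrieved articles")
ax.set_ylabel("Number of articles")
fig.tight_layout()
fig.savefig("sentiment_distribution.png")  # the PDF builder embeds this PNG
plt.close(fig)
```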


&lt;h1&gt;
  
  
  Tutorial
&lt;/h1&gt;
&lt;h2&gt;
  
  
  How Does Gemini Work with Google Search?
&lt;/h2&gt;

&lt;p&gt;When performing a query, Gemini not only relies on its internal knowledge but also actively searches updated information on Google Search. This grounding capability allows the model to access real-time data, verify facts, and provide responses based on concrete sources, reducing hallucination risk and ensuring relevance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhe427glwz3m59rfbvto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhe427glwz3m59rfbvto.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Pre-requisite: Access to Gemini API
&lt;/h2&gt;

&lt;p&gt;Before starting, you need to get access to the Gemini API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Sign in or create a Google account&lt;/li&gt;
&lt;li&gt;Generate your API key from the control panel&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can use Gemini's free tier to test this project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1i6fbxt80e2jqlal8z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1i6fbxt80e2jqlal8z0.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have your API key, configure it in a .env file:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;API_KEY &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tu_api_key_de_gemini"&lt;/span&gt;
MODEL_ID &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gemini-2.5-flash"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;We use Gemini 2.5 Flash because it is the family's most cost-efficient model, optimized for frequent, high-volume tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;

&lt;p&gt;To follow this tutorial, clone the repository below; it contains the complete code used throughout.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/RominaElenaMendezEscobar" rel="noopener noreferrer"&gt;
        RominaElenaMendezEscobar
      &lt;/a&gt; / &lt;a href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;
        brand-journalism-gemini
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Tutorial about Brand Journalism Code Using Google Gemini
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a href="https://www.buymeacoffee.com/r0mymendez" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/b96fd4ea89ea15fcec30a4f86382eef0bbd17454aa3a8d4de8c8c5e92b55cf6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4275792532304d6525323041253230436f666665652d737570706f72742532306d79253230776f726b2d4646444430303f7374796c653d666c6174266c6162656c436f6c6f723d313031303130266c6f676f3d6275792d6d652d612d636f66666565266c6f676f436f6c6f723d7768697465" alt="Buy Me A Coffee"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;How to Use AI in Brand Journalism with Gemini to Transform Digital Information into Strategic Editorial Content?&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Introduction&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;In a hyperconnected world, every post, comment, or interaction contributes to building a brand's reputation. Therefore, identifying what people are talking about and turning it into stories that inform, inspire, and connect is essential for any modern communication strategy.&lt;/p&gt;

&lt;p&gt;This repository was born from a concrete question: &lt;strong&gt;how can Generative AI be used to discover what is being said about a company and transform that information into relevant stories?&lt;/strong&gt; Stories that reflect real experiences and concerns, turning them into inspiring narratives that strengthen brand identity.&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini/img/readme/1.google-search.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FRominaElenaMendezEscobar%2Fbrand-journalism-gemini%2Fimg%2Freadme%2F1.google-search.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this tutorial, you will learn how to use Google Gemini to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🔍 Search for information&lt;/strong&gt; using generative AI integrated with Google Search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;✍️ Transform findings&lt;/strong&gt; into structured journalistic narratives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Generate visual reports&lt;/strong&gt; with graphics and automated storytelling&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;# What is Brand Journalism
According to an…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/RominaElenaMendezEscobar/brand-journalism-gemini" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;br&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;project/
   ├── img/                    &lt;span class="c"&gt;# Generated graphics&lt;/span&gt;
   ├── prompt/
   │   ├── prompt_search.txt       &lt;span class="c"&gt;# Search Prompt&lt;/span&gt;
   │   └── prompt_storytelling.txt &lt;span class="c"&gt;# Prompt for narrative&lt;/span&gt;
   ├── report/                &lt;span class="c"&gt;# PDFs generated&lt;/span&gt;
   ├── brand_journalist_analyzer.py
   ├── report_plots.py
   ├── report_analysis.py
   └── main.py
   └── .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Main Files
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1️⃣ Prompts (/prompt)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_search.txt&lt;/code&gt;&lt;/strong&gt;: Here we define how to perform the search in Google Search and structure the results in JSON. This prompt instructs the model to return structured information with fields such as the article's title, source, date, URL, summary, sentiment, and category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🗒&lt;code&gt;prompt_storytelling.txt&lt;/code&gt;&lt;/strong&gt;: In this file, we define how to generate conclusions and storytelling based on the articles found. It requests different types of outputs, including objective analysis, immersive narratives, and three tone variants (objective, emotional, and strategic).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
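&lt;p&gt;The prompt files are plain text templates with the topic injected at runtime. A hypothetical inline version of what &lt;code&gt;prompt_search.txt&lt;/code&gt; might contain:&lt;/p&gt;

```python
# Hypothetical inline version of the search prompt; the real template
# lives in the repository's /prompt directory.
SEARCH_PROMPT = (
    'Search for recent news articles about "{topic}". '
    "Return ONLY a JSON array where each item has the fields: title, "
    "source_name, date, url, site_name, summary, sentiment, category, "
    "sentiment_score."
)

prompt = SEARCH_PROMPT.format(topic="iPhone 17")
```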




&lt;p&gt;&lt;strong&gt;2️⃣ Brand Journalism Analyzer&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🗒&lt;code&gt;brand_journalist_analyzer.py&lt;/code&gt;&lt;/strong&gt;: This class is the core of the application and handles all interaction with the Gemini API. It implements three main functionalities: news retrieval using Google Search, structured storytelling generation, and analytical insights extraction. 
The most important method is &lt;strong&gt;search_news()&lt;/strong&gt;, which executes real-time searches and returns structured data in JSON format. To use integrated Google Search, simply set &lt;code&gt;config={"tools": [{"google_search": {}}]}&lt;/code&gt; in the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search for news on a topic using Google Search.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}]}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process and clean JSON response
&lt;/span&gt;    &lt;span class="n"&gt;txt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;clean_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_clean_json_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
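&lt;p&gt;Because &lt;code&gt;search_news()&lt;/code&gt; returns a list of dictionaries, the results drop straight into a pandas DataFrame for aggregation. A sketch with hypothetical sample data standing in for a real API response:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical sample standing in for the JSON returned by search_news()
articles = [
    {"title": "iPhone 17 first look", "source_name": "TechSite",
     "sentiment": "positive", "category": "analysis", "sentiment_score": "8"},
    {"title": "iPhone 17 pricing rumors", "source_name": "NewsBlog",
     "sentiment": "neutral", "category": "rumor", "sentiment_score": "5"},
    {"title": "Battery concerns persist", "source_name": "TechSite",
     "sentiment": "negative", "category": "technical", "sentiment_score": "3"},
]

df = pd.DataFrame(articles)
# Scores arrive as strings per the prompt's schema, so cast before aggregating
df["sentiment_score"] = df["sentiment_score"].astype(int)
avg_by_sentiment = df.groupby("sentiment")["sentiment_score"].mean()
```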






&lt;p&gt;&lt;strong&gt;3️⃣ Visualization Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_plots.py:&lt;/code&gt; This class creates all the report visualizations using Seaborn and Matplotlib. It generates three essential chart types: a bar chart showing which media outlets publish the most on the topic, a timeline visualizing the evolution of publications over time, and a heatmap that cross-references sentiment with content categories. 
All visual aspects are customizable: color palette, titles, axis labels, and save paths. The methods first prepare the data with Pandas aggregations and then generate the visualizations, automatically saving them as PNG files.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;4️⃣ PDF Report Generator&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;report_analysis.py&lt;/code&gt;: This class assembles the final report in professional PDF format using ReportLab. It combines multiple elements: a customizable logo, corporate-style headers, informative tables about the analyzed dataset, pre-generated visualizations, formatted narratives with full Markdown support (including headings, lists, code, and emphasis), and conclusions and storytelling sections with different tone variations. &lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯Process Orchestration
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;main.py&lt;/code&gt; file constitutes the application's main entry point, orchestrating the entire Brand Journalism pipeline. This script coordinates the interaction between all the developed classes, managing the flow from real-time information retrieval to the generation of the final document, ensuring that each component executes in the correct order and with the necessary dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🐍main.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;brand_journalist_analyzer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrandJournalistAnalyzer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_analysis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ReportAnalysis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;report_plots&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataVisualizer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Cargar variables de entorno
&lt;/span&gt;    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar ruta con timestamp
&lt;/span&gt;    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y%m%d%H%M%S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report/news_report_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Inicializar analizador
&lt;/span&gt;    &lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BrandJournalistAnalyzer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Buscar o cargar noticias (usa caché si existe)
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_or_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;search_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar storytelling y conclusiones
&lt;/span&gt;    &lt;span class="n"&gt;storytelling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_storytelling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conclusion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conclusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Crear visualizaciones
&lt;/span&gt;    &lt;span class="n"&gt;visualizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataVisualizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_by_source&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_news_over_time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;visualizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot_sentiment_category_heatmap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Generar reporte PDF
&lt;/span&gt;    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ReportAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conclusion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storytelling&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_report&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🗒 Report Generation
&lt;/h3&gt;

&lt;p&gt;The system automatically generates a professional PDF report using Seaborn/Matplotlib for visuals and ReportLab for document layout. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media coverage charts&lt;/li&gt;
&lt;li&gt;Temporal trends&lt;/li&gt;
&lt;li&gt;Heatmap crossing content categories with sentiment&lt;/li&gt;
&lt;li&gt;Structured storytelling and analytical conclusions&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Final Report Structure
&lt;/h3&gt;

&lt;p&gt;In this use case, we generated a four-page PDF report that provides a comprehensive overview of the analysis, starting with complete details of the websites and media outlets where relevant news stories on the researched topic were found.&lt;/p&gt;

&lt;p&gt;The document includes visualizations designed to analyze temporal publishing trends, making it possible to identify patterns of interest over time, as well as categorical classifications based on the criteria the AI model applied following the instructions defined in the search prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj3t4bnborcbccydg21n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj3t4bnborcbccydg21n.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final section of the report presents analytical conclusions based on quantitative data and storytelling narratives structured in different tones, providing multiple perspectives on the same information.&lt;/p&gt;




&lt;h1&gt;
  
  
  💡 Conclusions
&lt;/h1&gt;

&lt;p&gt;AI can be a powerful tool for optimizing research and analysis processes, but I still believe that authentic company communication requires the perspective, sensitivity, and values that only people can provide.&lt;/p&gt;

&lt;p&gt;This tutorial offers an automated &lt;strong&gt;starting point&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects and structures scattered information&lt;/li&gt;
&lt;li&gt;Identifies patterns and trends in large data volumes&lt;/li&gt;
&lt;li&gt;Generates evidence-based insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, Brand Journalism work should remain in the hands of professionals who can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interpret data within the organizational context&lt;/li&gt;
&lt;li&gt;Align narratives with real corporate values&lt;/li&gt;
&lt;li&gt;Add nuances, experiences, and internal perspectives&lt;/li&gt;
&lt;li&gt;Ensure the message genuinely reflects brand identity&lt;/li&gt;
&lt;li&gt;Humanize content with empathy and authentic connection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI&lt;/strong&gt; provides the knowledge foundation, but people create the true connection with the audience. Therefore, effective storytelling emerges from combining automated analysis with human narrative craftsmanship.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What Is Brand Journalism — and Why It Matters.&lt;br&gt;
The New York Times Licensing Group.&lt;/strong&gt;&lt;br&gt;
Retrieved from &lt;a href="https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/" rel="noopener noreferrer"&gt;https://nytlicensing.com/latest/marketing/brand-journalism-and-why-it-matters/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini About.&lt;/strong&gt;&lt;br&gt;
Google.&lt;br&gt;
Retrieved from &lt;a href="https://gemini.google/about/" rel="noopener noreferrer"&gt;https://gemini.google/about/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pichai, S., &amp;amp; Hassabis, D. (2023, December 6). Introducing Gemini: Our largest and most capable AI model.&lt;/strong&gt;&lt;br&gt;
Google Blog.&lt;br&gt;
Retrieved from &lt;a href="https://blog.google/technology/ai/google-gemini-ai/#sundar-note" rel="noopener noreferrer"&gt;https://blog.google/technology/ai/google-gemini-ai/#sundar-note&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Grounding with Google Search.&lt;/strong&gt;&lt;br&gt;
Google AI Documentation.&lt;br&gt;
Retrieved from &lt;a href="https://ai.google.dev/gemini-api/docs/google-search?hl=es-419" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/google-search?hl=es-419&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Do you have any other thoughts or suggestions?&lt;/strong&gt; Leave them in the comments.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>GenAI Foundations – Chapter 5: Project Planning with the Generative AI Canvas</title>
      <dc:creator>Romina Elena Mendez Escobar</dc:creator>
      <pubDate>Tue, 09 Sep 2025 16:42:55 +0000</pubDate>
      <link>https://dev.to/r_elena_mendez_escobar/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73</link>
      <guid>https://dev.to/r_elena_mendez_escobar/genai-foundations-chapter-5-project-planning-with-the-generative-ai-canvas-2o73</guid>
      <description>&lt;p&gt;&lt;code&gt;👉 “A structured framework to design and validate AI initiatives”&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs03i0zpa116u8cweeo0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs03i0zpa116u8cweeo0.png" alt=" " width="800" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This chapter represents the &lt;strong&gt;closing of the GenAI Foundations series&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
After exploring prompts, engineering techniques, RAG patterns, and model evaluation, we now turn to a crucial step: &lt;strong&gt;capturing and documenting requirements in an agile way&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Rather than relying on heavy processes, the goal here is to provide a &lt;strong&gt;lightweight, structured approach&lt;/strong&gt; that synthesizes all the fundamentals we've covered into a single framework: &lt;strong&gt;the Generative AI Project Canvas&lt;/strong&gt;.  &lt;/p&gt;




&lt;h1&gt;
  
  
  AI Project Planning
&lt;/h1&gt;

&lt;p&gt;When managing and planning an AI project, it is essential to prepare key questions for the user in advance and clearly define the scope of the work.&lt;br&gt;
In this context, I decided to develop a specific canvas for generative AI projects, inspired by tools such as the Business Model Canvas (used to design business models) and the ML Canvas (focused on the planning of machine learning projects). As in those methodologies, the objective is to offer a visual and structured representation that facilitates the organization of ideas and the definition of priorities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzctn9ktzkyvts48f75o7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzctn9ktzkyvts48f75o7.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Canvas Structure
&lt;/h1&gt;

&lt;p&gt;The canvas is organized into four main blocks:&lt;/p&gt;

&lt;h2&gt;
  
  
  Define Value (Why?)
&lt;/h2&gt;

&lt;p&gt;At the center, the purpose of the project is established. Not everything that can be automated should be automated; this block helps identify which problems are truly high-priority and high-impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build (What?)
&lt;/h2&gt;

&lt;p&gt;Once the purpose is defined, what to build is determined: requirements, limits, and scope of the solution. Here data dependencies are considered (what information is needed, how it is obtained, and under what conditions it can be used) along with the costs and the required investment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deliver (How?)
&lt;/h2&gt;

&lt;p&gt;This block answers how the solution will be implemented. It includes the selection of the appropriate model, the deployment architecture, and the integration with other systems. However, Deliver is not limited to the technical implementation, but also incorporates the definition of evaluation metrics that allow measuring, correcting, and continuously improving the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validate
&lt;/h2&gt;

&lt;p&gt;Finally, the project must be constantly validated. This block covers the comparison of the results with the initial objectives, the monitoring of risks, and the verification that the solution is safe, reliable, and sustainable over time.&lt;/p&gt;




&lt;h1&gt;
  
  
  From Concept to Canvas: Structuring AI Projects
&lt;/h1&gt;

&lt;p&gt;Below, each section of the Generative AI Project Canvas is accompanied by a series of questions and subtopics that help guide the definition of the project. These guides make it possible to dig deeper into the key aspects, from purpose and value to data strategy, implementation, and risks, in a structured way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1q8p23f494uowirjpus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1q8p23f494uowirjpus.png" alt=" " width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The key questions to answer
&lt;/h3&gt;

&lt;h4&gt;
  
  
  🎯 Define Value (Why?)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Value Proposition:&lt;/strong&gt; What unique value are we creating with this solution?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  🛠️ Build (What?)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; What will the AI generate as the final deliverable?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Strategy:&lt;/strong&gt; How will we source, prepare, and update the data?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Costs &amp;amp; ROI:&lt;/strong&gt; What will it cost and what return will it bring?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  ⚙️ Deliver (How?)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Implementation:&lt;/strong&gt; How will we deploy, integrate, and maintain it?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Approach:&lt;/strong&gt; Which model will we use and how will we adapt it?&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  ✅ Validate
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Metrics:&lt;/strong&gt; How will we evaluate quality and success?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risks &amp;amp; Monitoring:&lt;/strong&gt; What risks exist and how will we mitigate them?&lt;/li&gt;
&lt;/ul&gt;
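&lt;p&gt;For teams that prefer to keep the canvas next to the code, the blocks and questions above can also be captured as a simple data structure and versioned with the project. This is only an illustrative sketch, not an official schema; every field name here is shorthand for the blocks described above.&lt;/p&gt;

```python
# Hypothetical representation of a filled-in Generative AI Project Canvas.
# Field names mirror the four blocks and their guiding questions.
from dataclasses import dataclass, field

@dataclass
class GenAICanvas:
    # Define Value (Why?)
    value_proposition: str = ""
    # Build (What?)
    output: str = ""
    data_strategy: str = ""
    costs_and_roi: str = ""
    # Deliver (How?)
    implementation: str = ""
    model_approach: str = ""
    # Validate
    evaluation_metrics: list[str] = field(default_factory=list)
    risks_and_monitoring: list[str] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        """Return the blocks still left blank, to drive the next workshop."""
        return [name for name, value in vars(self).items() if not value]

# Example: a canvas with only the "Why" and part of the "What" filled in
canvas = GenAICanvas(
    value_proposition="Cut first-draft time for support answers by 50%",
    output="Grounded draft replies with source citations",
)
print(canvas.open_questions())
```

&lt;p&gt;Listing the empty blocks makes it explicit which questions remain unanswered before the project moves forward.&lt;/p&gt;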

&lt;blockquote&gt;
&lt;p&gt;💡 You can access the complete template in its editable version on &lt;a href="https://docs.google.com/presentation/d/1yCDRXrgdTkHtH_Wd1HeyyKyn98bBcW7nbcresRFvzao/edit?usp=sharing" rel="noopener noreferrer"&gt;Google Presentation&lt;/a&gt; to reuse it directly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With this canvas, you now have a complete toolkit: from prompt design to RAG architectures, evaluation, and project planning. The GenAI Foundations series is a starting point to explore, adapt, and responsibly scale generative AI in your domain.&lt;/p&gt;




&lt;h1&gt;
  
  
  📚 References
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;OpenAI Academy. (2025, February 13). Advanced prompt engineering. &lt;a href="https://academy.openai.com/home/videos/advanced-prompt-engineering-2025-02-13" rel="noopener noreferrer"&gt;https://academy.openai.com/home/videos/advanced-prompt-engineering-2025-02-13&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic. (n.d.). Creating message batches. Anthropic Documentation. &lt;a href="https://docs.anthropic.com/en/api/creating-message-batches" rel="noopener noreferrer"&gt;https://docs.anthropic.com/en/api/creating-message-batches&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS. (n.d.). What are foundation models? &lt;a href="https://aws.amazon.com/es/what-is/foundation-models/" rel="noopener noreferrer"&gt;https://aws.amazon.com/es/what-is/foundation-models/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AWS. (n.d.). What is Retrieval-Augmented Generation (RAG)? &lt;a href="https://aws.amazon.com/es/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;https://aws.amazon.com/es/what-is/retrieval-augmented-generation/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cloud Skills Boost. (n.d.). Introduction to generative AI. Google Cloud. &lt;a href="https://www.cloudskillsboost.google/course_templates/536" rel="noopener noreferrer"&gt;https://www.cloudskillsboost.google/course_templates/536&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google Developers. (n.d.). Prompt engineering for generative AI. &lt;a href="https://developers.google.com/machine-learning/resources/prompt-eng?hl=es-419" rel="noopener noreferrer"&gt;https://developers.google.com/machine-learning/resources/prompt-eng?hl=es-419&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google Developers. (n.d.). Overview: What is a generative model? &lt;a href="https://developers.google.com/machine-learning/gan/generative?hl=es-419" rel="noopener noreferrer"&gt;https://developers.google.com/machine-learning/gan/generative?hl=es-419&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;IBM. (n.d.). What is LLM Temperature? &lt;a href="https://www.ibm.com/think/topics/llm-temperature" rel="noopener noreferrer"&gt;https://www.ibm.com/think/topics/llm-temperature&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;IBM. (n.d.). What is prompt engineering? &lt;a href="https://www.ibm.com/es-es/think/topics/prompt-engineering" rel="noopener noreferrer"&gt;https://www.ibm.com/es-es/think/topics/prompt-engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;IBM. (n.d.). AI hallucinations. &lt;a href="https://www.ibm.com/es-es/think/topics/ai-hallucinations" rel="noopener noreferrer"&gt;https://www.ibm.com/es-es/think/topics/ai-hallucinations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Luke Salamone. (n.d.). What is temperature? &lt;a href="https://blog.lukesalamone.com/posts/what-is-temperature/" rel="noopener noreferrer"&gt;https://blog.lukesalamone.com/posts/what-is-temperature/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;McKinsey &amp;amp; Company. (2024, April 2). What is generative AI? &lt;a href="https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai" rel="noopener noreferrer"&gt;https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;New York Times. (2025, May 8). AI is getting more powerful, but its hallucinations are getting worse. &lt;a href="https://www.nytimes.com/es/2025/05/08/espanol/negocios/ia-errores-alucionaciones-chatbot.html" rel="noopener noreferrer"&gt;https://www.nytimes.com/es/2025/05/08/espanol/negocios/ia-errores-alucionaciones-chatbot.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompt Engineering. (2024, April 6). Complete Guide to Prompt Engineering with Temperature and Top-p. &lt;a href="https://promptengineering.org/prompt-engineering-with-temperature-and-top-p/" rel="noopener noreferrer"&gt;https://promptengineering.org/prompt-engineering-with-temperature-and-top-p/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompting Guide. (n.d.). ReAct prompting. &lt;a href="https://www.promptingguide.ai/techniques/react" rel="noopener noreferrer"&gt;https://www.promptingguide.ai/techniques/react&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prompting Guide. (n.d.). Consistency prompting. &lt;a href="https://www.promptingguide.ai/techniques/consistency" rel="noopener noreferrer"&gt;https://www.promptingguide.ai/techniques/consistency&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn Prompting. (2024, September 27). Self-Calibration Prompting. &lt;a href="https://learnprompting.org/docs/advanced/self_criticism/self_calibration" rel="noopener noreferrer"&gt;https://learnprompting.org/docs/advanced/self_criticism/self_calibration&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;AI Prompt Theory. (2026, July 8). Temperature and Top p: Controlling Creativity and Predictability. &lt;a href="https://aiprompttheory.com/temperature-and-top-p-controlling-creativity-and-predictability/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;https://aiprompttheory.com/temperature-and-top-p-controlling-creativity-and-predictability/?utm_source=chatgpt.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Vellum. (n.d.). How to use JSON Mode. &lt;a href="https://www.vellum.ai/llm-parameters/json-mode?utm_source=www.vellum.ai&amp;amp;utm_medium=referral" rel="noopener noreferrer"&gt;https://www.vellum.ai/llm-parameters/json-mode?utm_source=www.vellum.ai&amp;amp;utm_medium=referral&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OpenAI. (2025, August). What are tokens and how to count them? &lt;a href="https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them" rel="noopener noreferrer"&gt;https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Milvus. (n.d.). What are benchmark datasets in machine learning, and where can I find them? &lt;a href="https://milvus.io/ai-quick-reference/what-are-benchmark-datasets-in-machine-learning-and-where-can-i-find-them" rel="noopener noreferrer"&gt;https://milvus.io/ai-quick-reference/what-are-benchmark-datasets-in-machine-learning-and-where-can-i-find-them&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>data</category>
      <category>rag</category>
      <category>agile</category>
    </item>
  </channel>
</rss>
