DEV Community

Suriya Kumar


AI Agent on Kaggle

Building an AI Agent on Kaggle: A Journey Through the Confounding Challenges of a Generative Agent

  1. Introduction: Stepping into the World of AI Agents
  When I first embarked on the AI Agents Intensive program, I realized that Agents are not just theoretical concepts but practical frameworks for solving real-world problems. For our Capstone Project, we chose to build a Smart Content Generator Agent whose primary objective is to cut the time spent writing compelling advertising copy for social media. This article details the challenges I faced in designing the Agent, such as structuring the prompt and dealing with performance issues, and the lessons I learned in guiding an Agent to produce accurate, high-quality content.
  2. Agent Architecture and Tools
  Our Agent's architecture is straightforward and tightly focused on a single goal.
    • Agent’s Goal: To analyze the provided product features and desired tone, and instantly generate an engaging social media post.
    • The Engine (LLM): We utilized Google's Gemini-2.5-Flash as the core LLM. Its high speed and efficiency made it ideal for a short-duration hackathon project.
    • Tools: We made the conscious decision not to integrate any external tools (like Web Search or a Calculator). Because everything needed for the ad copy arrives in the prompt, skipping tools avoids extra round trips and keeps the Agent's latency low.
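The tool-free flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, the prompt wording, and the `fake_llm` stand-in are assumptions made for the example. In a real notebook, `llm` would be a small wrapper around a call to the Gemini model.

```python
from typing import Callable, Sequence

# Illustrative system prompt; the persona comes from the article,
# the rest of the wording is an assumption.
SYSTEM_PROMPT = (
    "You are a High-level Advertising Consultant. "
    "Write one engaging social media post for the product you are given."
)


def build_user_prompt(product_name: str, features: Sequence[str], tone: str) -> str:
    """Pack everything the model needs into one prompt, so no tools are required."""
    feature_lines = "\n".join(f"- {feature}" for feature in features)
    return (
        f"Product name: {product_name}\n"
        f"Key features:\n{feature_lines}\n"
        f"Desired tone: {tone}"
    )


def generate_post(
    llm: Callable[[str, str], str],
    product_name: str,
    features: Sequence[str],
    tone: str,
) -> str:
    """One model call: system prompt plus user prompt in, ad copy out."""
    user_prompt = build_user_prompt(product_name, features, tone)
    return llm(SYSTEM_PROMPT, user_prompt).strip()


def fake_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical offline stand-in so the sketch runs without an API key."""
    return "Meet AquaBottle: leakproof, insulated, and BPA-free. Stay cool!"


if __name__ == "__main__":
    post = generate_post(
        fake_llm, "AquaBottle", ["leakproof", "insulated", "BPA-free"], "Humorous"
    )
    print(post)
```

Passing the model in as a plain callable keeps the prompt logic testable offline and makes it easy to swap models when comparing latency.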
  3. Challenges and Solutions
  Making an AI Agent perform exactly as expected is the most critical part of development. Here are the key challenges I encountered and the solutions I implemented:

| The Challenge | The Solution |
|---|---|
| Output Consistency: The Agent often failed to maintain the desired 'Humorous' or 'Formal' tone, instead defaulting to a generic voice. | System Prompt Refinement: I provided a much clearer System Prompt at the beginning, defining the Agent as a "High-level Advertising Consultant." |
| Input Control: The Agent sometimes omitted critical input, such as the product name or the key features, in the final ad copy. | Forced Formatting: I strictly mandated in the prompt that "the output must contain the product name and all three specified features." |
| Performance (Latency): Initial tests using larger models resulted in wait times exceeding 15 seconds per output. | Model Selection: Choosing Gemini-2.5-Flash, which strikes the perfect balance between speed and quality, reduced the time-to-output to under 5 seconds. |
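The "Forced Formatting" rule can also be checked in code rather than trusted to the prompt alone. The sketch below is one possible approach, not what the project necessarily did: it appends the rule to the prompt, verifies that the product name and every feature appear in the output, retries with a note about what was dropped, and times the whole exchange. The helper names, the retry wording, and the `max_attempts` default are assumptions.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Rule quoted from the article's "Forced Formatting" solution.
FORMAT_RULE = (
    "The output must contain the product name and all three specified features."
)


def missing_items(text: str, product_name: str, features: Sequence[str]) -> List[str]:
    """Return the required strings that do not appear in text (case-insensitive)."""
    lowered = text.lower()
    required = [product_name, *features]
    return [item for item in required if item.lower() not in lowered]


def generate_checked(
    llm: Callable[[str], str],
    prompt: str,
    product_name: str,
    features: Sequence[str],
    max_attempts: int = 3,
) -> Tuple[str, List[str], float]:
    """Call the model until the copy passes the check or attempts run out.

    Returns (last_text, items_still_missing, total_seconds).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = time.perf_counter()
    base_prompt = f"{prompt}\n{FORMAT_RULE}"
    current_prompt = base_prompt
    text = ""
    missing: List[str] = [product_name, *features]
    for _ in range(max_attempts):
        text = llm(current_prompt)
        missing = missing_items(text, product_name, features)
        if not missing:
            break
        # Tell the model exactly what it dropped before retrying.
        current_prompt = (
            f"{base_prompt}\n"
            f"Your previous draft omitted: {', '.join(missing)}. "
            "Include them verbatim."
        )
    elapsed = time.perf_counter() - start
    return text, missing, elapsed
```

The substring check is deliberately strict: a post that paraphrases a feature would count as missing it, so a looser match may be preferable in practice. The returned elapsed time gives a simple way to compare per-output latency between models.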
  4. Future Vision
  Building on this prototype, I plan to enhance the Agent in the following ways:
    • Multimodality: Currently, the Agent only generates text. In the future, I intend to add image generation using an image-capable model (for example Gemini 2.5 Flash Image, since Gemini 2.5 Pro returns text only), enabling the Agent to create complete text-and-image social media posts in one pass.
    • Memory: I plan to add a Session Memory feature. This will allow the Agent to recall Brand Guidelines after learning them once, ensuring high Consistency in tone and style across multiple outputs.
    • Self-Improvement: Finally, I plan to develop a Feedback Loop to automatically adjust the Agent’s prompt based on user feedback, enabling the Agent to continuously improve its own output quality.
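To make the Memory and Self-Improvement ideas more concrete, here is a small sketch of how session state could feed back into the system prompt. Everything in it is hypothetical: the `AgentSession` class, the feedback tags, and the rules they map to are invented for illustration, and a production feedback loop would likely be more involved than a fixed lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BASE_PROMPT = "You are a High-level Advertising Consultant."

# Hypothetical mapping from user feedback tags to extra prompt rules.
FEEDBACK_RULES: Dict[str, str] = {
    "too_long": "Keep the post under 40 words.",
    "too_formal": "Use a lighter, more conversational voice.",
    "missing_cta": "End with a clear call to action.",
}


@dataclass
class AgentSession:
    """Holds what the Agent has learned during one session."""

    brand_guidelines: List[str] = field(default_factory=list)
    learned_rules: List[str] = field(default_factory=list)

    def remember_guideline(self, guideline: str) -> None:
        """Store a brand guideline once so it applies to every later post."""
        if guideline not in self.brand_guidelines:
            self.brand_guidelines.append(guideline)

    def record_feedback(self, tag: str) -> bool:
        """Turn a known feedback tag into a prompt rule; return True if added."""
        rule = FEEDBACK_RULES.get(tag)
        if rule is None or rule in self.learned_rules:
            return False
        self.learned_rules.append(rule)
        return True

    def system_prompt(self) -> str:
        """Rebuild the system prompt from the base persona plus session state."""
        parts = [BASE_PROMPT]
        if self.brand_guidelines:
            parts.append("Brand guidelines:")
            parts.extend(f"- {g}" for g in self.brand_guidelines)
        if self.learned_rules:
            parts.append("Rules learned from feedback:")
            parts.extend(f"- {r}" for r in self.learned_rules)
        return "\n".join(parts)
```

Because the prompt is rebuilt from state on every call, guidelines learned once carry over to each subsequent post in the session, which is the consistency benefit described above.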
  5. Conclusion and Acknowledgements
  The AI Agents Intensive Capstone Project was not just a competition; it was an invaluable lesson in understanding and deploying Generative AI Agents. Despite the challenges, we successfully harnessed the power of Gemini to create an efficient and accurate AI Copywriter. This experience has given me great confidence to build larger, more complex multi-tool Agents in the future. Finally, I extend my heartfelt gratitude to the Kaggle Team and the Google Mentors for their guidance during this 5-day Intensive. This experience marks a significant turning point in my AI journey.
