Akash Raidas

Posted on Aug 20

Day 3 of Experimenting with Open Source AI

#ai #github #learning #opensource

Remember Day 2's terminal disaster? Well, I'm happy to report that Day 3 went significantly better! I've made some real headway on my infographics generator project, though I've also discovered a new challenge that's keeping me on my toes.

The Wins: API Setup & Tool Decisions

API Victory!
I successfully generated my API key and got it working. Instead of using the Claude API as originally planned, I went with the Gemini API for now. Why the switch?

The inspiration actually came from Gemini's own features - they have "Deep Research" and "Canvas" capabilities that work beautifully together: first you perform comprehensive research on a topic, then you can create visual presentations in their canvas interface. I thought, "What if we could bring this workflow to Weam AI?" So I'm using Gemini API to recreate this powerful research-to-infographic pipeline for the open source community.

IDE Switch: Welcome Cursor!
I also made the jump from Trae IDE to Cursor IDE, and wow - what a difference! The AI integration feels much more natural, and the code indexing capabilities we discussed on Day 2 are working like magic.

Quick refresher: Code indexing creates a searchable database of your entire codebase, helping both you and AI assistants understand how all your code connects together.
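
To make that refresher a bit more concrete, here's a toy sketch of an index in Python. This is not how Cursor actually does it (real tools are far more sophisticated); it just maps every identifier-like word to the files that mention it. The function name `build_index` and the two sample files are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map every identifier-like word to the set of files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, source in files.items():
        for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
            index[word].add(path)
    return index


# Hypothetical mini codebase: file path -> file contents
sample_files = {
    "src/api/gemini.py": "def fetch_content(topic): return call_model(topic)",
    "src/utils/render.py": "def render(data): return fetch_content(data)",
}

index = build_index(sample_files)

# Which files mention fetch_content? Both the definition and the caller show up.
print(sorted(index["fetch_content"]))
```

Looking up a name gives you every file that touches it, which is the "how does my code connect together" part in miniature.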

The New Challenge: Folder Structure Integration Hell

Here's where things get interesting (and by interesting, I mean slightly panic-inducing). I have two folder structures that need to become one:

My infographics generator solution - nicely organized, clean structure
Weam AI's existing codebase - established patterns, existing architecture

The question is: How do you merge these without breaking everything or creating a maintenance nightmare?

Understanding the Key Terms

Before diving deeper, let me explain some terms that might be confusing:

Weam AI: This is the open source AI platform I'm building on top of. Think of it as a foundation that already has user management, AI integrations, and core functionality - I'm adding my infographics generator as a new feature.

API (Application Programming Interface): This is basically how different software components talk to each other. When I say "Gemini API," I mean Google's Gemini AI service that my app can send requests to (like "generate content for an infographic about climate change") and get structured responses back.
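
Here's a minimal sketch of what "sending a request" can look like, using only Python's standard library. It builds the request but doesn't send it. The endpoint and body shape follow Google's public REST docs as I understand them; the model name is an assumption (swap in whatever model your key has access to), and `build_request` is just a helper name I picked.

```python
import json
import urllib.request

# Assumption: the model name. Check Google's docs for the models available to you.
MODEL = "gemini-1.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a single text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


request = build_request(
    "Generate content for an infographic about climate change",
    "YOUR_API_KEY",  # placeholder, not a real key
)
print(request.get_method(), request.full_url)

# To actually call the service you would pass the request to
# urllib.request.urlopen(request) with a real key and read the JSON reply.
```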

Folder Structure/Architecture: This refers to how you organize your code files and folders. Good structure makes code maintainable; bad structure makes future development a nightmare. It's like organizing your house - you want related things grouped together logically.

Codebase: The complete collection of source code for a software project. Think of it as all the code files that make up an application.

The Folder Structure Dilemma Explained

Imagine you've built a beautiful LEGO castle (your infographics generator), and now you want to add it to an existing LEGO city (Weam AI). You have a few options:

Plop it down as-is - Quick but might not fit the city's style
Take it apart and rebuild it to match the city - Time-consuming but cohesive
Find a middle ground - Adapt parts while keeping the essence

That's essentially what I'm facing with code.

The Current Situation

My Infographics Generator Structure:

infographics-generator/
├── src/
│   ├── components/
│   ├── api/
│   ├── utils/
│   └── styles/
├── public/
└── docs/

Weam AI's Structure:

weam-ai/
├── apps/
├── packages/
├── libs/
├── tools/
└── docs/

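See the problem? These are completely different organizational philosophies! Mine is a single app organized by file type, while Weam AI's top-level folders (apps, packages, libs) look like a monorepo that holds several apps and shared code side by side.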
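
One way I've been thinking about it is as a "move plan" I can dry-run before touching any files. The sketch below is exactly that and nothing more: every target path (like `apps/infographics/...`) is a guess for illustration, not Weam AI's actual convention, and `plan_move` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: my folder -> where it might live inside weam-ai/.
# The targets are guesses, not Weam AI's real layout rules.
MOVE_PLAN = {
    "src/components": "apps/infographics/components",
    "src/api": "apps/infographics/api",
    "src/utils": "libs/infographics-utils",
    "src/styles": "apps/infographics/styles",
    "public": "apps/infographics/public",
    "docs": "docs/infographics",
}


def plan_move(file_path: str) -> str:
    """Return the proposed new location for a file; raise if nothing matches."""
    path = PurePosixPath(file_path)
    for source, target in MOVE_PLAN.items():
        try:
            rest = path.relative_to(source)
        except ValueError:
            continue  # this file is not under that source folder
        return str(PurePosixPath(target) / rest)
    raise ValueError(f"No mapping for {file_path}")


# Dry run only: this prints a path, it does not move anything on disk.
print(plan_move("src/components/Chart.html"))
```

Nothing here decides the right structure for me, but writing the mapping down makes it obvious which folders have a natural home and which ones don't.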

What I've Learned So Far

Weam AI Is Way More Sophisticated Than I Expected
I thought I was adding a feature to a simple AI chat app. Turns out, I'm integrating into a production-ready platform that handles multi-workspace environments, role-based access control, document processing pipelines, and ready-to-deploy automation workflows. The bar is much higher!

Find more about the platform on GitHub.

There Are Multiple Integration Paths
After studying their architecture, I can see several ways to approach this:

As an "AI App Solution" (like their existing QA Agent, Video Analyzer)
As a specialized Agent with custom knowledge base
As a Brain extension with infographic capabilities
As a standalone service that plugs into their platform

Code Organization Matters More Than I Thought
With teams of 20+ members potentially using this, I can't just "make it work" - it needs to follow their patterns, be maintainable, secure, and scalable. This isn't a weekend hack project anymore.

The Next Challenge: Prompt Engineering

With the API ready and Cursor set up, I need to work on prompt engineering - essentially, crafting the perfect instructions to tell Gemini AI exactly how to generate infographic content. This is trickier than it sounds because:

The prompt needs to be specific enough to get consistent results
But flexible enough to work with different topics
And structured enough to create usable data for my HTML templates
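
Here's a rough sketch of how those three requirements could fit together: a fixed template (specific), a topic placeholder (flexible), and a JSON shape that gets checked before it goes anywhere near a template (structured). The field names (`title`, `sections`, `heading`, `body`) and both helper functions are my own assumptions, and the "reply" is hand-written, not real Gemini output.

```python
import json

# Hypothetical prompt template; the JSON field names are my own choice.
PROMPT_TEMPLATE = """Create content for an infographic about "{topic}".
Respond with JSON only, using exactly this shape:
{{"title": "...", "sections": [{{"heading": "...", "body": "..."}}]}}
Include {section_count} sections."""


def build_prompt(topic: str, section_count: int = 3) -> str:
    """Fill the topic and section count into the fixed template."""
    return PROMPT_TEMPLATE.format(topic=topic, section_count=section_count)


def parse_response(raw: str) -> dict:
    """Parse the model's reply and check it has the fields a template would need."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("title"), str):
        raise ValueError("missing title")
    sections = data.get("sections")
    if not isinstance(sections, list) or not sections:
        raise ValueError("missing sections")
    for section in sections:
        if not isinstance(section, dict) or not {"heading", "body"} <= section.keys():
            raise ValueError("each section needs a heading and a body")
    return data


# A hand-written stand-in for a model reply, used only to exercise the parser.
fake_reply = (
    '{"title": "Climate Change", '
    '"sections": [{"heading": "Causes", "body": "..."}]}'
)

print(build_prompt("climate change"))
print(parse_response(fake_reply)["title"])
```

The validation step matters more than it looks: if the model drifts from the requested shape, I'd rather get a clear error than a half-rendered infographic.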

Day 4 Preview: The Integration Strategy

Tomorrow, I'll be diving deep into the integration challenge. Based on what I've learned about Weam's architecture, I need to decide between:

Option 1: AI App Solution Route
Build it like their existing automation workflows (QA Agent, Video Analyzer, SEO Content Writer) - complete with specialized APIs and ready-to-use functionality.

Option 2: Agent-Based Approach
Create it as a specialized agent that users can deploy with "@infographics" in any chat, leveraging their existing agent framework.

Option 3: Brain Enhancement
Integrate infographic creation directly into their Brain system, so any team workspace can generate visual content.

Each approach has different technical requirements, user experiences, and maintenance implications.

Questions for the Community

If you've dealt with integrating a new feature into an existing codebase:

What approach worked best for you?
Any horror stories about folder structure decisions you regret?
How do you balance "doing it right" with "getting it done"?

Progress Update: 15% - API ready, tools selected, problem identified. The foundation is solid; now it's time to build!

Stay tuned for Day 4, where we either achieve integration harmony or create a beautiful disaster. Either way, it'll be educational! 😄

Following along with this chaotic learning adventure? Drop a comment with your own integration challenges - misery loves company, and solutions love sharing!

DEV Community