DEV Community

Abhi nandan


Social Butler: The Ultimate AI Toolkit for Social Media Creators

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built Social Butler, an AI-powered toolkit that streamlines the creative workflow for social media managers, content creators, and marketers. Creating engaging social media assets is time-consuming, so the applet brings a set of specialized tools together in one cohesive interface.

Social Butler features three core modules:

  1. YouTube Thumbnail Generator: A highly customizable tool that goes beyond simple text-to-image. It allows users to define a theme, art style, lighting, framing, and specific text overlays to generate eye-catching thumbnails that drive clicks. Users can generate from a detailed prompt or upload their own base image for the AI to edit and enhance.
  2. Social Media Post Generator: This module writes platform-specific copy for LinkedIn or Instagram from a user's core idea and chosen post type (e.g., promotional, educational). It also generates a contextually relevant image to accompany the text, producing a complete, ready-to-publish content package.
  3. Background Remover: A simple but essential utility that removes the background from any uploaded image and returns a clean PNG with a transparent background, ready for layering into other designs.
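The Social Media Post Generator is essentially a small pipeline: idea to copy, copy to image prompt, image prompt to image. The post doesn't publish its code, so the following is only a minimal Python sketch of that chaining under stated assumptions: `generate_post`, `copy_prompt`, `image_prompt_request`, `PostPackage`, and the `PLATFORM_GUIDANCE` style notes are all hypothetical, and the two model arguments are plain callables standing in for Gemini API calls so the flow can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: each "model" is any callable taking a prompt string.
# In the real app these would wrap Gemini API calls (gemini-2.5-flash for text,
# gemini-2.5-flash-image-preview for images); injecting them keeps this testable.
TextModel = Callable[[str], str]
ImageModel = Callable[[str], bytes]

# Illustrative style notes per platform (assumed, not taken from the app).
PLATFORM_GUIDANCE = {
    "linkedin": "professional tone, short paragraphs, 3-5 hashtags at the end",
    "instagram": "casual, upbeat tone, emojis welcome, up to 10 hashtags",
}


@dataclass
class PostPackage:
    post_text: str
    image_prompt: str
    image: bytes


def copy_prompt(idea: str, platform: str, post_type: str) -> str:
    """Stage 1 prompt: ask the text model for platform-specific copy."""
    key = platform.lower()
    if key not in PLATFORM_GUIDANCE:
        raise ValueError(f"unsupported platform: {platform}")
    return (
        f"Write a {platform} post ({post_type}) about: {idea}.\n"
        f"Style: {PLATFORM_GUIDANCE[key]}.\n"
        "Return only the post text."
    )


def image_prompt_request(post_text: str, platform: str) -> str:
    """Stage 2 prompt: ask the text model to describe an image that fits the copy."""
    return (
        f"Here is a {platform} post:\n---\n{post_text}\n---\n"
        "Describe one image that would accompany this post. "
        "Return only the image description."
    )


def generate_post(idea: str, platform: str, post_type: str,
                  text_model: TextModel, image_model: ImageModel) -> PostPackage:
    # Idea -> post copy.
    post_text = text_model(copy_prompt(idea, platform, post_type))
    # Post copy -> image prompt, derived from the generated text rather than the raw idea.
    image_prompt = text_model(image_prompt_request(post_text, platform))
    # Image prompt -> image bytes.
    image = image_model(image_prompt)
    return PostPackage(post_text, image_prompt, image)
```

Deriving the image prompt from the finished copy, rather than from the user's original idea, is what keeps the visual aligned with what the post actually says.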

Demo

How I Used Google AI Studio

The application is built entirely on the Gemini API. I prototyped and refined its prompts in Google AI Studio, which was essential for testing different multimodal prompting strategies and reaching the quality and control I wanted across all features.
I used two models:

  • gemini-2.5-flash: The text-generation workhorse. In the Thumbnail Generator it handles "meta-prompting", turning a user's simple selections into a detailed, descriptive prompt for the image model. It also writes the platform-aware copy for the Social Media Post Generator.
  • gemini-2.5-flash-image-preview: The core of the app's visual capabilities. As a multimodal model, it handles all image-related tasks:
      • Text-to-image generation, for creating thumbnails and social media images from scratch.
      • Image-and-text editing, for enhancing a user's uploaded base image in the Thumbnail Generator.
      • Mask-free image editing, for the Background Remover, where it follows the instruction to isolate the subject without needing a mask.
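To make "meta-prompting" concrete, here is a minimal Python sketch of the kind of instruction that might be sent to gemini-2.5-flash. The `ThumbnailOptions` fields, the instruction wording, and the 16:9 framing for from-scratch thumbnails are my assumptions, not the app's actual prompt. The returned string would go to the text model, and its reply would then become the prompt for the image model.

```python
from dataclasses import dataclass


@dataclass
class ThumbnailOptions:
    # Hypothetical field names mirroring the choices the Thumbnail Generator offers.
    theme: str
    art_style: str
    lighting: str
    framing: str
    overlay_text: str = ""
    has_base_image: bool = False


def build_meta_prompt(opts: ThumbnailOptions) -> str:
    """Ask the text model to expand terse selections into one detailed image prompt."""
    selections = [
        f"- Theme: {opts.theme}",
        f"- Art style: {opts.art_style}",
        f"- Lighting: {opts.lighting}",
        f"- Framing: {opts.framing}",
    ]
    if opts.overlay_text:
        # Quote the overlay so the downstream prompt carries the exact words.
        selections.append(f'- Text overlay (render exactly): "{opts.overlay_text}"')
    # Editing an uploaded image and generating from scratch need different instructions.
    task = (
        "edit the attached base image to match these selections"
        if opts.has_base_image
        else "generate a new 16:9 YouTube thumbnail from scratch"
    )
    lines = [
        "You are a prompt engineer for an image-generation model.",
        f"Write one detailed, descriptive prompt that will {task}.",
        "User selections:",
        *selections,
        "Return only the prompt.",
    ]
    return "\n".join(lines)
```

For example, `build_meta_prompt(ThumbnailOptions("tech review", "comic book", "neon rim light", "close-up", overlay_text="IS IT WORTH IT?"))` yields a labeled list of selections that the text model can expand into a full scene description. Keeping each selection on its own labeled line makes it easier for the model to honor every choice.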

Multimodal Features

Social Butler is fundamentally multimodal, integrating text and images as both inputs and outputs to create a seamless user experience.

  1. Combined Image and Text Input for Thumbnail Editing: The Thumbnail Generator's most powerful feature takes a user's uploaded image together with a set of text instructions (theme, style, text to add, etc.) and produces a new, edited image. The model doesn't just overlay text; it reinterprets the whole image in light of the request, producing a cohesive, professional result. This gives users far more creative control than a simple filter, so they can realize their exact vision.
  2. Text-to-Multimodal Output for Social Posts: The Social Post Generator runs a chained multimodal workflow. It starts from the user's text prompt and first generates text output (the post copy). It then derives a new text prompt from the post's content and context and feeds it to the image model, which generates a matching visual. Packaging two distinct creative tasks into one click keeps the text and image thematically aligned and saves the user significant time.
  3. Instruction-Based Image Editing: The Background Remover uses multimodal input in its simplest, most practical form: an image combined with a direct text instruction ("remove the background"). Because the model understands this command and performs the edit without further input such as manual masking, a tedious task becomes trivial, and the direct, instruction-based interaction keeps the tool intuitive and efficient.
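Items 1 and 3 both send an image and a text instruction in the same request. Assuming the app calls the Gemini REST `generateContent` endpoint (the post doesn't say whether it uses REST or an SDK), the request body pairs an `inline_data` part holding the base64-encoded image with a `text` part, and generated images come back as base64 `inlineData` parts. The helper names here are hypothetical and the field names follow my reading of the public REST reference, so check them against the current docs. `png_has_alpha` is an extra safeguard for the Background Remover, not something the post describes: it reads the PNG header to confirm the returned image actually has an alpha channel before offering it as a transparent PNG.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_edit_request(image_bytes: bytes, mime_type: str, instruction: str) -> dict:
    """JSON body for a generateContent call pairing one image with a text instruction."""
    return {
        "contents": [{
            "parts": [
                # The image travels inline as base64 text.
                {"inline_data": {"mime_type": mime_type,
                                 "data": base64.b64encode(image_bytes).decode("ascii")}},
                # The instruction, e.g. "remove the background".
                {"text": instruction},
            ]
        }]
    }


def first_image(response: dict) -> bytes:
    """Return the decoded bytes of the first inline image in a generateContent response."""
    for part in response["candidates"][0]["content"]["parts"]:
        # Responses use camelCase; accept snake_case too, to be lenient.
        blob = part.get("inlineData") or part.get("inline_data")
        if blob:
            return base64.b64decode(blob["data"])
    raise ValueError("response contained no image part")


def png_has_alpha(data: bytes) -> bool:
    """True if data is a PNG whose IHDR color type includes an alpha channel."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width (4), height (4),
    # bit depth (1), color type (1) at offset 25.
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return False
    return data[25] in (4, 6)  # 4 = grayscale + alpha, 6 = RGBA
```

The alpha check looks only at the IHDR color type, so a palette PNG that gets transparency from a `tRNS` chunk would be reported as opaque; that trade-off keeps the check to a few header bytes.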
