DEV Community: Florian

Cloudflare AI Challenge Submission: Image to Lyrics Generator 🎨🎵

Florian — Wed, 10 Apr 2024 15:28:31 +0000

This is a submission for the Cloudflare AI Challenge.

What I Built

You're probably familiar with the phrases "an image tells a thousand words" and "music is a way to express yourself". I combined the core ideas of both and came up with ImageHarmoni, an AI tool that uses Cloudflare Workers AI with the Llama and Mistral models to generate lyrics in a user-selected genre from a user-provided image.

The Whisper model is also used, but only for some additional fun. 🤫

Demo

Live Link

The GIF is sped up, since generation takes around 30 seconds.

My Code

GitHub Link

Tools, Components, and Inspiration Used

Dark Mode Toggle from Justin Schroeder - CodePen
Color Themes for light and dark mode - Realtime Colors
Inspiration for moving emojis in background - CSS glassmorphism generator

Journey

Since I had never worked with AI models before and the Cloudflare environment was also completely new to me, I searched for some beginner guidance and luckily stumbled upon this amazing Cloudflare Workers tutorial on YouTube.

During the tutorial, the difference between the user and system roles is explained, along with how to use them. In short, the system content tells the model how it should respond to the user request. This is where I came up with my project idea, because my genre selection relies on exactly this behavior.

For example, if you select the classical music genre, the llama-2 model is told that it is now Wolfgang Amadeus Mozart and that its task is to create a classical song. I also added the three adjectives that best fit the selected genre to get somewhat better lyrics.
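The genre-to-persona idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `GENRES` table, the function name `buildGenreMessages`, the prompt wording, and the three adjectives are all assumptions. Only the Mozart persona for the classical genre comes from the description above.

```typescript
// Hypothetical genre table. The adjectives are placeholders for illustration.
type GenreConfig = {
  persona: string;
  style: string;
  adjectives: [string, string, string];
};

const GENRES: Record<string, GenreConfig> = {
  classical: {
    persona: "Wolfgang Amadeus Mozart",
    style: "classical",
    adjectives: ["elegant", "dramatic", "graceful"],
  },
};

type ChatMessage = { role: "system" | "user"; content: string };

// Builds the system/user message pair for one lyric request.
// The system message carries the persona; the user message carries the request.
function buildGenreMessages(genreKey: string, userContent: string): ChatMessage[] {
  const genre = GENRES[genreKey];
  if (!genre) {
    throw new Error(`Unknown genre: ${genreKey}`);
  }
  const system =
    `You are ${genre.persona}. Your task is to write a ${genre.style} song. ` +
    `The lyrics should feel ${genre.adjectives.join(", ")}.`;
  return [
    { role: "system", content: system },
    { role: "user", content: userContent },
  ];
}
```

An array like this is the kind of system/user message list that a chat-style text generation model takes as input, so adding a genre only means adding one entry to the table.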

Now I only needed a fitting user request to generate lyrics from. Since plain text input from the user would be a bit boring, I looked through the models available for the challenge and saw that there were also two that could analyze the content of images. With that, I had a complete concept for my tool.

I tested both the resnet-50 model and mistral and ended up using mistral because it gave better descriptions of the provided images for my use case. I could have taken the short route and passed the image description directly to llama-2 to generate the lyrics, but the results were, to put it gently, improvable.

Therefore, I used llama-2 a second time, this time with the task of extracting all persons, objects, emotions, moods, and whatever else lyrics need from the image description. These extracted keywords are then fed to the llama-2 call that is responsible for the lyric generation itself.
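The two llama-2 steps could be chained roughly like this. It is a sketch under assumptions: the prompt texts, the function names, and the injected `runLlama` callback (a stand-in for whatever actually calls the model) are hypothetical and not taken from the project.

```typescript
type Message = { role: "system" | "user"; content: string };

// Stand-in for the real model call; injected so the flow can be read and tested in isolation.
type RunModel = (messages: Message[]) => Promise<string>;

// Step 1: ask the model to reduce the image description to lyric-relevant keywords.
function buildExtractionMessages(description: string): Message[] {
  return [
    {
      role: "system",
      content:
        "Extract all persons, objects, emotions, and moods from the text. " +
        "Answer with a comma-separated list only.",
    },
    { role: "user", content: description },
  ];
}

// Step 2: ask the model (with the genre persona as system content) for lyrics.
function buildLyricMessages(genreSystemPrompt: string, keywords: string): Message[] {
  return [
    { role: "system", content: genreSystemPrompt },
    { role: "user", content: `Write song lyrics using these keywords: ${keywords}` },
  ];
}

// Runs extraction first, then feeds its output into the generation step.
async function lyricsFromDescription(
  description: string,
  genreSystemPrompt: string,
  runLlama: RunModel,
): Promise<string> {
  const keywords = await runLlama(buildExtractionMessages(description));
  return runLlama(buildLyricMessages(genreSystemPrompt, keywords.trim()));
}
```

Keeping the model call behind a callback means the prompt chain can be exercised with a fake model, which saves requests while iterating on the wording.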

And then there is the story behind the whisper model. If you open ImageHarmonie, the UI experts among you will probably wonder why I chose a 3 by 3 grid for the genre selection when only 8 genres can be selected. But when night falls (toggle dark mode) and you look closely, you can see a very cool vegetable that asks you something about fruits, and to answer it you have to talk. Depending on your response, the 3 by 3 grid in the genre selection actually makes sense.

When you talk to the vegetable, I use the whisper model to transcribe the user input. At first I only checked whether the required code word appeared in the input, but then something like "no code word" would still trigger it. So it was once again time for llama-2 to shine: this time, its task was to determine whether the user input was pro or contra the code word. (Unfortunately, this pro/contra classification is not very reliable, so I left the evaluation in as a console.log to show how llama-2 decided.)
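The problem with a plain presence check, and one way to read a pro/contra answer from the model, might look like the sketch below. The code word `pineapple` is a made-up placeholder (the real one is not revealed here), and the function names and the expected one-word reply format are assumptions.

```typescript
// Hypothetical placeholder; the real code word is not given in this post.
const EXAMPLE_CODE_WORD = "pineapple";

// Naive check: only tests whether the code word occurs anywhere in the transcript.
function mentionsCodeWord(transcript: string, codeWord: string): boolean {
  return transcript.toLowerCase().includes(codeWord.toLowerCase());
}

type Stance = "pro" | "contra" | "unclear";

// Interprets a model reply that was instructed to answer with "pro" or "contra".
// Anything else is treated as unclear instead of guessing.
function parseStance(modelReply: string): Stance {
  const reply = modelReply.trim().toLowerCase();
  if (/^pro\b/.test(reply)) return "pro";
  if (/^contra\b/.test(reply)) return "contra";
  return "unclear";
}

// The naive check cannot tell these two apart, which is why a second step is needed.
const rejects = mentionsCodeWord(`no ${EXAMPLE_CODE_WORD}`, EXAMPLE_CODE_WORD);
const accepts = mentionsCodeWord(`I love ${EXAMPLE_CODE_WORD}`, EXAMPLE_CODE_WORD);
console.log(rejects, accepts);
```

Returning an explicit `unclear` value leaves room to ask the user again when the model's answer does not match the expected format.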

While waiting for the used neurons to reset (testing whether the lyrics were somewhat okay took quite a few requests), I styled the site a bit with good old CSS and made it responsive with media queries so the tool can also be used on the go. Because the deadline was quite short, I took no time to look further into Pages or into using Workers with a framework like Angular, so the complete code is basically in one file. Sorry to all clean code enthusiasts, I vow to do better in the future.

Multiple Models and/or Triple Task Types

llama-2:

lyric generation based on selected genre
extracting keywords from image description
check whether input is pro or contra code word

mistral:

generating image description

whisper:

listening to user for code word input

Glam Up My Markup: Camp bunny hop 🐰

Florian — Mon, 25 Mar 2024 20:00:00 +0000

This is a submission for DEV Challenge v24.03.20, Glam Up My Markup: Camp Activities

What I Built

I created a colorful form for the camp activities that should remind you of a wooden bulletin board like the ones you find in summer camps. And since Easter is just around the corner and cute bunnies make everything better, I used bunnies to visualize the selected activity.

Demo

Journey

When looking at the HTML code, I noticed that there were no image tags, which is why I worked exclusively with background images to achieve my goal.

Depending on the selected activity, a suitable image is displayed in the h1. To keep both the image and the text in the h1 recognizable, I use a combination of padding and flexbox to place the text at the bottom center of the image, but not too far down.

I added the required attribute to the select element via JavaScript and set the first option to disabled and selected so that it behaves purely as a placeholder.
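A small sketch of that placeholder setup, assuming nothing about the actual markup beyond a select element with at least one option. The `SelectLike` interface and the function name are hypothetical; the interface only lists the properties the function touches, which a real `HTMLSelectElement` also provides.

```typescript
// Minimal structural view of a select element: just the properties used below.
interface SelectLike {
  required: boolean;
  options: ArrayLike<{ disabled: boolean; selected: boolean }>;
}

// Marks the select as required and turns its first option into a placeholder:
// preselected so it is shown initially, disabled so it cannot be picked again.
function makeFirstOptionPlaceholder(select: SelectLike): void {
  select.required = true;
  const first = select.options[0];
  if (!first) return; // nothing to do for an empty select
  first.disabled = true;
  first.selected = true;
}

// In the browser this would be called with the real element, for example:
// makeFirstOptionPlaceholder(document.querySelector("select")!);
```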

I then adjusted the colors of the remaining elements such as the textarea or the button to match the images.
And when submitting the form, you receive feedback via an alert that the information has been successfully submitted.

All images were generated with AI. Unfortunately, the end result is a bit too tall in my opinion, which could be fixed with better-fitting images or adjustments to the HTML itself, for example.

The Frontend Challenge: 🍌 bananas are the superior fruit

Florian — Mon, 25 Mar 2024 17:00:00 +0000

This is a submission for DEV Challenge v24.03.20, CSS Art: Favorite Snack.

Inspiration

I know apples have many great qualities, but if we are honest, bananas are simply the better apples. Humans have optimized the code of bananas so that they come in practical packaging and, unlike apples, have no inedible core. That's why bananas are the perfect snack for me on the go, whether in the morning or in the evening, wonderful before sport or as an ingredient in a smoothie or muesli.

Demo

Journey

Since I'm not artistically talented and the banana is unfortunately crooked, I couldn't just use a grid to draw it. But thanks to my past as a Minecraft player, I knew that you can create great artwork with just blocks or, in the case of CSS, pixels. Since creating pixel art is not one of my strengths either, I took a template and pimped it up a bit with a few faces.

At first I wanted to just create countless divs with different background colors in a grid to build the banana. But the banana is about 30 by 50 pixels in size, which would mean 1500 divs, so I went looking for a cleaner approach and came across this great article, which creates pixel art in CSS with box-shadows.
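To illustrate the box-shadow technique: a single element can carry one shadow per pixel, each offset by a multiple of the pixel size. The helper below is a hypothetical sketch that turns a small character grid into such a `box-shadow` value. The function name, the grid format, and the palette are assumptions and not how the banana was actually produced.

```typescript
// Converts rows of palette symbols into a CSS box-shadow value.
// Each mapped symbol becomes one shadow of the form "x y blur color";
// symbols without a palette entry are treated as transparent and skipped.
function pixelsToBoxShadow(
  rows: string[],
  palette: Record<string, string>,
  pixelSize: number,
): string {
  const shadows: string[] = [];
  for (let y = 0; y < rows.length; y++) {
    const row = rows[y] ?? "";
    for (let x = 0; x < row.length; x++) {
      const color = palette[row.charAt(x)];
      if (!color) continue;
      // Offsets start at one pixel so no shadow sits directly behind the element itself.
      shadows.push(`${(x + 1) * pixelSize}px ${(y + 1) * pixelSize}px 0 ${color}`);
    }
  }
  return shadows.join(", ");
}

// A 2x2 checker of yellow pixels, 10px each.
const demoShadow = pixelsToBoxShadow(["Y.", ".Y"], { Y: "#fd0" }, 10);
console.log(demoShadow); // 10px 10px 0 #fd0, 20px 20px 0 #fd0
```

The resulting string would be assigned to the `box-shadow` property of one element whose width and height equal the pixel size, so the whole drawing needs a single div instead of 1500.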

To make the whole thing a bit more interactive, I use the :hover and :active pseudo-classes to peel the banana. And of course a keyframes animation could not be missing, to hold attention on the banana for more than 3 seconds.