
Peter Goshulak

I built a website with GPT and it almost left me vulnerable to hackers

“Is my job safe?”

There’s been a lot of speculation in recent weeks about AI “coming for our jobs”. Millions of jobs will be affected by AI in some manner. But what does “affected” actually mean? Are we all going to be replaced? Made more “productive”?

I tried using an AI to build a simple website, and the results were pretty great, except for the one giant security flaw it tried to introduce. Seriously! Read on!

What did I build?

Remember (or have you heard the legends of) “Hot or Not”? It was a pretty … controversial … website from the early 2000s where users rated photos of each other on a scale of hotness.

Given the recent surge in AI image generators, I had the idea to build a website which lets you guess whether an image was AI-generated, or created by a human artist. What better to call the game than “Bot or Not”? How meta! Surely this won’t open any ethical can of worms!

Check out the finished result at bot-or-not.xyz, then come back and learn how it was made!

First, who am I? Why does my opinion matter?

I’ve been a front-end developer for about 6 years. I taught myself programming basics, attended a bootcamp to refine and “professionalize” my skillset, then started working for an agriculture-tech startup as their frontend developer. I’ve designed, built, and iterated on several versions of our VueJS webapps, grown our team, and coached new junior developers.

Recently I’ve dipped my toes in Product Management, then Account Management roles, so my coding skills are covered in a healthy layer of dust, but it’s all still in there… somewhere?

All this is to say, I’ve seen app development from many angles and skillsets; I was curious to see where AI helps, and where AI is more of a hindrance. Basically, when AI inevitably comes for our jobs, does anyone come out on top?

Building the app

Getting started

So, I had the general idea in my head about what this website should do. Time to Product Manager it up.

Modifying the base prompt from @mortenjust’s thread, I opened with the following:

Initial prompt for GPT to create the app

GPT4 did a pretty great job at getting the general gist of things.

GPT4 writing the basic pseudocode of my app

Unsurprisingly, it included a very simple placeholder of fill-in-the-blank images. I asked it to pull images from an external file, then tweak the randomizer algorithm to give a truly 50/50 randomization.

GPT's initial placeholder for the image randomizer

I ask GPT to tweak the randomizer to first pick an image type, AI or Human, then pick a random image

GPT updates the image randomizer exactly how I hoped
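The pattern GPT landed on can be sketched in plain JavaScript. The image lists and function name here are my own placeholders, not the exact generated code:

```javascript
// Placeholder image pools -- in the real app these came from an external file.
const aiImages = ["ai-1.jpg", "ai-2.jpg", "ai-3.jpg"];
const humanImages = ["human-1.jpg", "human-2.jpg"];

// First flip a fair coin for the image type, THEN pick uniformly within
// that type. This guarantees a true 50/50 AI-vs-human split even when
// the two pools have different sizes (a naive pick from one combined
// list would be biased toward the larger pool).
function pickImage() {
  const type = Math.random() < 0.5 ? "ai" : "human";
  const pool = type === "ai" ? aiImages : humanImages;
  const src = pool[Math.floor(Math.random() * pool.length)];
  return { type, src };
}
```

Picking the type first is what makes the game fair: the correct answer is attached to the image up front, rather than decided later.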

Next, I asked it where to get these images from. It gave back a few examples, and after some manual digging I realized that DeviantArt’s “topics” would be a pretty easy way to distinguish AI-generated vs human-created artwork (as a proof-of-concept app, I deemed this Good Enough™️). I asked it to write the script:

I ask GPT where I could find images via an API, and it responds with DeviantArt and Behance API links

I suggest a plan to use DeviantArt's 'topics', and GPT writes the script to do so

I ran into some issues with getting the API to work, and GPT was great at troubleshooting this

I ask GPT to fix the API call, telling it what parameters I have access to in the DeviantArt dev console
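For reference, the request the script built looked roughly like this. The endpoint path and parameter names below are my assumptions based on DeviantArt’s public OAuth2 API, not the exact code GPT produced:

```javascript
// Hypothetical sketch of the DeviantArt request -- endpoint and parameter
// names are assumptions, so check them against the actual API docs.
const API_BASE = "https://www.deviantart.com/api/v1/oauth2";

// Build the URL for browsing a topic (e.g. "ai-art") with an OAuth token.
function buildTopicUrl(topic, accessToken, limit = 24) {
  const params = new URLSearchParams({
    topic,
    limit: String(limit),
    access_token: accessToken,
  });
  return `${API_BASE}/browse/topic?${params}`;
}
```

From there it’s a single `fetch(buildTopicUrl("ai-art", token))` per topic, with one “AI” topic and one “human” topic feeding the two image pools.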

I continued in this way: asking for incremental changes, copy+pasting into my code editor, and guiding GPT4 to the eventual result I wanted. So, how did it go?

The Good: where GPT is strong

GPT was pretty great at getting the general essence of what I was trying to do. It created the app skeleton, scaffolded the general game logic, and even did the majority of CSS given only the prompt “Can you add a general pass of CSS that gives the entire page a futuristic, synthwave-esque feel?”. This is huge, especially for a front-end developer who hates CSS.

Take a look at these two first-passes of the game logic, first by GPT3.5 then by GPT4:

GPT3.5 tries to write the game logic, but doesn't quite get it

If you’re new to code, first try to work out what’s going on! This image shows GPT3.5’s initial attempt at the image randomizer. The game:

  1. chooses a random image
  2. waits for the user’s guess
  3. randomly decides whether the image was AI or human-created
  4. tells you whether your guess matches its random choice.
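
Reduced to a sketch (the names are mine, not GPT3.5’s), the broken logic looked something like this:

```javascript
// A sketch of the GPT-3.5 bug: the "answer" is rolled AFTER the user
// guesses, completely independently of which image is on screen.
function checkGuessBroken(userGuess) {
  const secretAnswer = Math.random() < 0.5 ? "ai" : "human"; // ignores the image!
  return userGuess === secretAnswer;
}
```

Because the answer comes from a coin flip rather than from the image’s real origin, every guess is “right” exactly 50% of the time no matter how sharp your eye is.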

It’s close, and it runs, but a) it’s not correct (or fair), and b) it would’ve taken a while to troubleshoot just by playing the game and not reading the code. Let's see how GPT4 manages:

GPT4 nails the game logic on the first attempt

In contrast, this image shows (part of) GPT4’s first attempt at a randomizer, following the same prompt. The game:

  1. creates a list of images pre-labelled with “AI” or “human”
  2. chooses a random image from this list
  3. waits for the user’s guess
  4. tells you whether your guess matches the image’s type.

This was a huge improvement, mostly because it showed that GPT4 understood the essence of the game, over GPT3.5’s patchwork of functional-but-illogical code. Awesome.

GPT4 was also great at troubleshooting relatively surface-level issues that can be encapsulated in an entire code snippet. It also gave great boilerplate scaffolding, and specific instructions (e.g. “How do I set up Firebase hosting?”). It feels like a supercharged, instantaneous, “white glove” version of our favourite coding tool (read: Ctrl+C and Ctrl+V from Stack Overflow).

The Bad: where GPT is weak

GPT4 had a hard time keeping variable names consistent and remembering specific details from earlier. Sure, it understood that there was a “Bot” and a “Not” button, but over the course of one chat session it included code for <button id="bot">, <button id="bot-btn">, and <button id="botButton">. In a self-contained snippet, this is fine, but when other code elsewhere tries and fails to find “the button with ID bot”, things break fast: both the code and your own patience.

Towards the end, I was mostly manually retyping GPT’s suggestions because it was 10x easier than doing the back-and-forth.

When we were making tweaks, quite often the solution was painfully hard to troubleshoot verbally (the “Again” button was off-screen → I initially thought the main image itself was too big → we kept shrinking it, but the “Again” button was still “below the fold”), but was very obvious as soon as you “looked under the hood” (i.e. “GPT, you set the game’s min-height: 100vh but it should be max-height to keep the whole page from scrolling”). In some cases, it’s much faster to just look at the code yourself and ignore GPT entirely. Some cases.

Finally, GPT4 gets stuck in its own rabbit-holes. Many times I tried to fix something in one area of the code, only for it to balloon in size and complexity when it could be fixed very easily by refactoring a different part of the code.

For a very technical example, GPT4 used 41 lines of code to manually build a DOM tree in JS, but it didn’t realize it could declare the DOM in the HTML using only 9 lines. Because we were iterating on the logical interaction (JS) part of the code, it got stuck and overlooked that the same thing could be done much more simply elsewhere.

The fix? Prompting “Let’s back up”, then pointing it in a specific direction. Not easy if you don’t understand what’s going on.

The Ugly: where GPT tried to let the hackers in

GPT introduced a significant, common, high-risk security vulnerability, and I had to manually a) spot it, and b) specifically ask GPT for the fix.

Very simply, a Cross-Site Scripting, or XSS, vulnerability happens when a website runs code (“Scripting”) from a different, unintended website (“Cross-Site”). What did this look like?

GPT suggests displaying the results in a big block by assigning directly to innerHTML

Imagine a hacker uploads an image to DeviantArt, which is then displayed on Bot or Not. Instead of specifying a normal image title like Robot drinking coffee, the hacker sets the title to Robot drinking coffee'}<script>alert("you've been hacked")</script>. When your browser tries to display this title, it is tricked into also running the additional code (in this case, a harmless popup saying “you’ve been hacked”). In a real attack, the injected code would be far more malicious, typically stealing personal data and sending it back to the hacker.

I asked GPT to use a different method for displaying results, and it corrected its mistake easily.

GPT corrects its mistake
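I won’t reproduce GPT’s corrected code verbatim, but the two standard fixes are to assign untrusted text via textContent instead of innerHTML, or to escape it first. A minimal escaping helper, as a sketch of the second approach:

```javascript
// Escape the characters HTML treats as markup, so an attacker-supplied
// title like <script>...</script> renders as literal text instead of running.
function escapeHtml(untrusted) {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// In the browser, the even simpler fix needs no escaping at all:
//   resultEl.textContent = imageTitle;  // never parsed as HTML
```

textContent is the safer default whenever you’re displaying plain text; escaping is only needed when you genuinely have to build an HTML string from untrusted input.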

The critical issues:

  • GPT suggested the vulnerable code in the first place
  • if I hadn’t noticed the vulnerability, and explicitly asked it to be fixed, it would still be vulnerable

I’m assuming most people using AI wouldn’t catch this type of vulnerability, and would trust the AI to generate safe code. Hopefully future AI versions will be more proactive.

Give me the TLDR

GPT4 was like a stereotypical teacher’s pet: it has the right answers, it’s really keen to do the hard work, but it still makes silly (or dangerous) mistakes and gets stuck in its own rabbit holes unless you pull it out.

GPT4 still needs a “coach”: someone to maintain the overall vision, choose what details to focus on, identify what bugs matter, and (for now) give it some common sense. Think Tony Stark and Jarvis.

I ask GPT to add a futuristic coat of paint

Tony Stark asks Jarvis to add a hot-rod red coat of paint

“So… is my job safe?”

It appears that for now*, AI will not completely** replace programmers.

* Who knows how fast this will grow. Probably pretty fast.

** AI will definitely replace part of your daily tasks. You will almost certainly need to embrace it as a very, very powerful tool, but don’t trust it blindly… yet.

Centuries ago there was a job called a “human computer”. Their job was to literally do math all day, for example calculating the position of planets. Certainly, their jobs have been automated, changed, and eliminated by electronic computers, but largely it was the rote computation - the “hard” work - that was automated away. With the hard work taken care of, new jobs appeared which used the new tools, opening up new avenues for discovery and creativity.

A question you could ask yourself is therefore: what part of your day-to-day is similar to a “human computer”? What tasks are simply a means to an end?

What’s next for Bot or Not?

First, check out bot-or-not.xyz. Let me know what you think!

Functionally, I’d like to expand the scope a bit (local and online scorekeeping). Learning-wise, it looks like I’ll need to ask GPT to help me design and create a back-end. There are new languages, frameworks, and skillsets to learn, and nuances to AI prompting to make things run smoother (and safer!). I’m also curious to see how other AIs (e.g. Copilot X, which was announced just as I finished up this experiment) handle larger-scope projects. Let me know what looks promising!
