Over the years, I've always been drawn to riddles and brainteasers. It's no surprise, then, that as a software engineer, I became fascinated by Capture The Flag (CTF) cybersecurity challenges. In these challenges, you need to find a way (a hack) to retrieve a secret string hidden somewhere: in a website, social media, assembly code, images, or any other medium that can conceal information. These challenges require a broad range of knowledge, particularly in computer science and software engineering, but also creativity and inventiveness. However, I never dared to try because, in my mind, CTFs were reserved for the elite: seasoned hackers with skills far beyond my own.
Now, at 35, life is busier than ever. I recently became a father, and finding time for new pursuits feels almost impossible. But the itch to finally try a CTF remained. My main obstacles? A lack of time and the belief that I didn’t have enough knowledge.
That's when generative AI tools like ChatGPT, Gemini, and GitHub Copilot sparked an idea: what if I tried hacking with AI as my sidekick? To my surprise, not only were CTF challenges more accessible than I'd imagined, but AI assistants also helped me save precious time by accelerating my learning and problem-solving. Thanks to this partnership, I completed the 404 CTF — one of France's most famous competitions — and finished in a respectable 165th place out of more than 2,800 participants.
In this article, I'll share how I used AI to supercharge my hacking journey, the tips I picked up along the way, and where the limits of AI became clear. As a disclaimer before diving in: while I used GitHub Copilot for this article, other technologies like Cursor, Claude, and similar tools are equally valid choices. You can even use several of them simultaneously; the end goal remains the same.
💬 Prompts I used the most and how I used them
Explain the purpose of XXX?
GitHub Copilot is an excellent assistant that integrates seamlessly with VS Code. Microsoft’s plugin provides a powerful interface where you can query specific lines of code or use a dedicated console to focus Copilot’s attention on particular files or directories.
Since most CTF challenges begin with analyzing files, Copilot proved to be a perfect investigation tool. My typical workflow starts by asking Copilot to analyze the project, explain its purpose, and map out the function of each file. Then, I examine each file in detail, and whenever I encounter something unfamiliar, I ask Copilot to explain specific sections.
"Copilot, could you explain the purpose of this project? Could you describe the function of each file and directory?" — prompt example
For instance, one web security challenge included an nginx configuration file. While I don't regularly work with nginx configurations, by asking Copilot to explain specific instructions, I could quickly understand their purpose without leaving VS Code. This allowed me to learn just enough to efficiently identify potential attack vectors.
An important consideration is that files aren't always immediately readable by Copilot (or any AI assistant). Sometimes you need to be creative with file formatting. For example, when challenges provide binary files, Copilot can't interpret them directly. The solution is to export the binary's hexadecimal or assembly representation to a text file; this conversion makes the content accessible to Copilot for analysis. While this approach is valuable for initial investigation, sometimes specialized tools are necessary for deeper analysis. In that binary example, after the initial examination with Copilot, I had to switch to BinaryNinja for a more thorough investigation of the file.
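To illustrate that conversion step, here's a minimal Python sketch that writes a classic hex dump to a text file an assistant can read; the file names are hypothetical placeholders for the challenge's actual files (the command-line tool xxd achieves the same result):

```python
# Dump a binary file as an annotated hex listing so an AI assistant can read it.
# "challenge.bin" and "challenge_hex.txt" are hypothetical placeholder names.
with open("challenge.bin", "rb") as f:
    data = f.read()

with open("challenge_hex.txt", "w") as out:
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        out.write(f"{offset:08x}  {hex_part:<47}  {ascii_part}\n")
```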
Another interesting example involves analyzing a pcapng file containing network captures. The challenge was to discover how a malicious user extracted a password from numerous network packets. This file wasn't readable in VS Code and required Wireshark for viewing, and it contained an overwhelming amount of data, including many irrelevant packets. To effectively analyze such data with an AI assistant, you first need to keep only the relevant packets (in my case, HTTP packets) in Wireshark and then export them to a more digestible format like JSON. This initial filtering step is essential; otherwise, the AI assistant won't be able to analyze the data effectively due to the sheer volume of information.
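If you'd rather script that step, here's a minimal sketch that shells out to tshark (Wireshark's command-line companion) from Python; tshark must be installed, and the capture file name is a hypothetical placeholder:

```python
import subprocess

# Keep only the HTTP packets from the capture and export them as JSON,
# producing something small and structured enough for an AI assistant.
with open("http_packets.json", "w") as out:
    subprocess.run(
        [
            "tshark",
            "-r", "capture.pcapng",  # read the capture file
            "-Y", "http",            # display filter: keep HTTP packets only
            "-T", "json",            # emit the result as JSON on stdout
        ],
        stdout=out,
        check=True,
    )
```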
Do you see any security issues or unusual elements?
After analyzing the structure of a challenge, there are times when I don't immediately spot relevant issues. In such scenarios, I ask Copilot to identify potential security vulnerabilities. Often, Copilot suggests multiple issues that provide a good foundation for starting to hack. This is where your skills as a hacker/developer become crucial: you need to evaluate and filter these suggestions, identifying which are viable and which aren't. You also need to find synergies and discover exploits that Copilot didn't explicitly mention. Frequently, challenges require exploiting multiple security vulnerabilities in combination rather than just a single issue.
"Copilot, do you see any security issue or unusual elements in route.py? Do you see any deprecated dependencies?" — prompt example
For example, in one web security challenge from the 404CTF, solving it required combining multiple vulnerabilities that Copilot helped identify such as issues in both the backend and proxy cache server. Out of the various potential problems Copilot pointed out, I had to analyze, select, and connect the relevant ones to reach the solution and extract the flag. This experience shows how AI can accelerate the investigative process, but still requires human intuition to piece everything together.
Finally, when Copilot identifies vulnerabilities, the method of exploitation isn't always obvious. You can continue the conversation with your AI assistant by asking how to exploit these vulnerabilities and requesting more detailed information. Don't hesitate to ask about specific tools or request step-by-step instructions for executing your hack. Such questions are always instructive and provide valuable learning opportunities.
"Copilot, what's a smuggle request and how can I do it ? Can you guide me step-by-step?" — prompt example
Write a Python script to accomplish XXX?
The analysis process generates numerous ideas. Generally, these ideas require some coding to create a proof of concept for a vulnerability. When you're unsure where to start and time is limited, asking an AI to write a small code snippet can be valuable. Copilot can adapt to your repository structure and generate files in the appropriate locations with the requested code. While not essential, this approach saves time.
One crucial point: you need to be precise with your requirements; otherwise, the generated code may not align with your needs. Don't hesitate to specify the programming language, preferred libraries, and your desired outcome. The goal isn't to obtain perfect code, but rather to get something you can easily customize to suit your specific needs.
"Copilot, can you write a Python script with pwntools that connects to [IP] and [PORT], receives two numbers, adds them, and sends back the result." — prompt example
🔄 It's an iterative process
This process is often iterative. After the global analysis, I enter a loop: filter the data I have, analyze it, ask questions, inquire about security issues, and reflect on the results. Then I iterate, again and again, until the pieces of the puzzle start to come together and the challenge is solved.
💔 What are the drawbacks of using AI for a CTF
Using AI is exhausting
Using AI allows me to learn quickly, but it's exhausting. While you can access specific information and documentation faster than ever, it involves processing a lot of knowledge in a short time. You need to read extensively, maintain focus, and review every suggestion and generated code snippet. It's a different style of work. Counter-intuitively, working with an AI assistant actually consumes more energy than working alone. While one might think that having a computer do the thinking would require less effort, the reality is quite different.
Working without an AI assistant is a steady hike; working with one is a sprint - you move faster, but it takes a lot more energy.
AI == cheat?
A legitimate question might be: is using AI cheating in a CTF? For the easiest challenges, like the Intro or Easy levels in 404CTF, AI can sometimes hand you the solution on a silver platter. But as soon as you move up to Medium or harder, the problems get too complex for Copilot or any AI to just solve for you.
That's why I don't see AI as a cheat code. In reality, it's just one tool among many, like the ones real hackers use to break systems in the wild. Sometimes it helps, sometimes it doesn't, and it rarely cracks the tough challenges on its own. In cybersecurity, there are no shortcuts, just different methods and tools to reach your goal. And honestly, no hacker would refuse a useful tool just because it feels "unfair."
AI mostly raises the floor, not the ceiling. It helps beginners level up faster, but experts won't suddenly become stronger thanks to AI alone.
AI can be censored for certain types of questions
Sometimes I encountered limitations with Copilot, likely due to ethical considerations. When I asked for help crafting attacks, Copilot sometimes refused to assist. As a workaround, I needed to specify in my initial prompt that I was working in a CTF context. However, even with this clarification, Copilot would still censor certain responses.
It's amusing to realize that you sometimes have to find creative ways to "hack" Copilot itself. It reminds me of Asimov's novels, where clever loopholes are found in the famous three laws of robotics. In a similar way, you need to think outside the box to get Copilot to do what you want.
AI can push you to take insecure actions
While asking Copilot how to exploit vulnerabilities, I sometimes received recommendations for tools whose safety I wasn't entirely confident about. When investigating these tools on GitHub, regardless of their quality, they often had few stars, indicating limited usage, and were frequently unmaintained with commits dating back years. This didn't inspire much confidence.
For example, when I was looking for a password generator based on social knowledge, Copilot recommended tools like pypasswords. I won't judge the tool's quality (it may well be fine), but its last commit was five years ago and the project had a single star. Installing such a library without a closer look can be dangerous.
That's why, as personal advice, I recommend not blindly trusting what your AI suggests when it comes to running commands or installing libraries. Always verify what you're about to execute.
AI can lead you down the wrong path
I think it's part of the experience, but I've lost count of the times Copilot has hallucinated issues where there were none, or assured me everything was fine when there was actually a subtle problem. I once spent hours trying to craft a request smuggling attack that turned out to be impossible, simply because Copilot suggested it might work. But as I mentioned earlier, this is a valuable lesson: don't blindly trust everything your AI suggests.
That said, I'd put this into perspective based on your goals with CTFs. If your aim is to learn new skills, sometimes following the wrong leads can be incredibly educational.
🚀 Beyond the CTF
Using AI to solve CTFs is fun, and I strongly encourage you to try it. You'll sharpen your AI skills for analysis and programming while learning extensively about specific domains. Personally, I focused mainly on web security since I specialize in web development. The knowledge I gained is valuable and helps me spot issues I might have missed before. It's all about honing your skills.
At Doctolib, we actively encourage software engineers to embrace AI, not to work less, but to free up time for higher-value tasks that truly matter. This philosophy mirrors exactly what I experienced during my CTF journey.
Just as I used AI to accelerate my learning and problem-solving in cybersecurity challenges, at Doctolib we leverage tools like GitHub Copilot, Cursor, and Claude Code to streamline routine coding tasks, accelerate debugging, and enhance code exploration. The goal isn't to replace our expertise, but to augment it, allowing us to focus on architectural decisions, complex problem-solving, and innovative solutions that improve healthcare for millions of patients.
Whether it's analyzing nginx configurations in a CTF challenge or refactoring critical healthcare infrastructure at Doctolib, AI serves as an amplifier of human intelligence. The key lesson from both contexts remains the same: AI is most powerful when it enhances human judgment, not when it replaces it.
Thank you for reading. If you have any questions or insights to share, feel free to reach out or leave a comment.
Special thanks to Justin Rabiller for reviewing this article and providing valuable insights.