Previously, I covered what AI prototyping projects are, how to launch AI prototyping projects, and running AI prototyping projects. In this article we'll discuss managing risks in AI.
While communicating the vision of an AI project through an interactive demo is generally understood to be the main goal of an AI prototyping project, one of the key pieces of value is the ability of these projects to uncover and resolve risks.
With AI projects your organization may not know the things they don’t know. You don’t know what aspects of your data or prompts will prove insufficient to get the results you’re looking for. You don’t know how people will interact with your application or the specific areas in which your application may prove inaccurate.
If you’re working with new models or new technologies, the performance, reliability, formats, and specific behaviors of these systems may not be known to you in advance.
For example, I was part of an AI prototyping project team at Leading EDJE where we fed pictures of users to a model hosted on Azure for Azure OpenAI GPT-4 to remark on the user’s attire. While this project was successful, one of the things we discovered in testing was that our model frequently remarked on the user’s face being blurry or mysterious.
We were initially confused by this behavior as our source images were fine, but the more it kept happening the more we realized that there was a hidden layer between us and the model that blurred out faces in images before they even reached our model.
While this behavior had a specific purpose (ensuring that said models aren’t being abused to endanger users), it was unexpected to us and noticeably altered our agent’s behavior. While we were able to minimize its impact with careful prompting, our prototyping phase informed us that we’d need to use a different model for real-world usage if we wanted to be completely free of the issue.
In software engineering you rarely know all of the behaviors you may encounter, particularly with working with external APIs, services, and data sources. This is even more true with artificial intelligence as AI systems are complex and have many hidden characteristics.
Your mission in designing AI systems is to identify these unknown risks and either resolve them or create a plan for resolving them.
Some of these risks can be found through basic interactions while others will only appear once you get a wide variety of users using your system in earnest as an internal prototype. Your role as a member of a prototyping team is to challenge your system to see what it’s good at as well as what it’s bad at.
Additionally, some systems will perform adequately for a single user but come to a crawl with several concurrent users or sudden usage spikes. Determining the scalability characteristics of your application is an important activity that should be done before an application is deployed to production but is not necessarily critical to do within the first few days of prototyping.
Finally, many AI systems will be attacked by end users, including some internal users. Users may try to use prompt injection attacks or simply convince the AI system using logic to act in a certain manner. Keep in mind that your AI systems are real software systems and therefore are part of the surface an attacker might try to exploit when looking for ways of getting access to your data or systems.
I strongly recommend you spend some time “red teaming” your AI systems by trying to find ways of getting them to behave in unacceptable ways before you release a system. You want to be able to detect and deter these attacks on the system, but the importance of this activity varies based on the data your system has access to.
For example, a simple system to summarize information from your corporate web site has a different impact if compromised versus an AI orchestration system that is capable of retrieving data about sales and even inserting data into databases for subsequent processing.
Gaining Certainty with AI
We’ve now discussed the importance of identifying risk in AI systems, but let’s talk now about resolving those unexpected issues that come up during AI systems prototyping.
Getting Unstuck
When working with new systems and APIs it can be normal to get stuck by the issues your system encounters.
I remember a time when I was working on a RAG application prototype for a client of Leading EDJE and my application worked just fine on the initial request to an external API, but all subsequent interactions resulted in 400 Bad Request responses without any additional details about the request.
Since the first request was very similar to the second one (in fact it was made using the exact same code) I was confused and blocked by this issue for a number of hours as I performed troubleshooting.
Searching for the issue online didn’t yield any helpful results and what I was doing was cutting edge enough that there weren’t many help resources available at the time.
I ultimately decided to simplify my solution bit by bit until the application fully worked. In my scenario the culprit wound up being that I was explicitly giving the user and the AI agent a name in code and this name was going to the external API and confusing the API as a result. I removed the custom name code, changed the name displayed on the user interface instead, and restored the rest of my code and the application finally fully worked.
I suspect that the more you and your team conduct AI prototyping projects the more you’ll have stories like mine.
In general I’ve found the following approach helpful:
- Working in small batches of changes at a time and checking in code when it works
- Asking coworkers when I encounter unexpected problems
- Searching when our knowledge isn’t sufficient
- Retracing my steps to get back to a working piece of code
- Consulting external documentation and samples for additional insights
This list is suspiciously similar to the list of recommendations I used to give my students who were learning software engineering during my time as a teacher, but it turns out that these steps are remarkably effective at not just helping us with the basics but with the advanced.
Involve Experts
One of the dangers of doing something completely new to you is that you don’t know what you don’t know. There are a number of pitfalls with AI projects that must be learned - just as new programmers must learn to ensure database connections and file handles are closed, guard against SQL injection attacks, and deal with unavailable external services.
Machine learning and artificial intelligence are powerful tools, but the potential for inconsistent performance across different types of requests or types of users is real. What’s more, this inconsistency can manifest itself in terms of biased behavior that can be hard to detect if you’re not explicitly looking for it.
Your AI projects will be similar to software engineering projects in some respects, but in others they’ll have entirely new concerns to worry about, such as dealing with model drift and detecting and preventing certain types of attacks such as data poisoning attacks (in systems that are retrained periodically) or prompt injection attacks.
In this regard, it’s helpful to have someone on your team who has been through these problems before and can help you and your team identify the things they’ll need to watch out for with your projects - and how to successfully maintain AI projects that remain effective as time goes on and the world changes.
Targeted Spikes
When you’ve identified key risk areas that you feel could make or break your projects, sometimes the only way to resolve these risks is to explore them in a dedicated manner through a “spike”.
A spike is a targeted piece of work designed to explore the viability of an idea, approach, or service. Spikes can be used to determine areas that will be problematic for the organization going forward and will need additional engineering effort to resolve.
Just as an AI prototyping project is essentially a functional prototype of an AI product, a spike is a technical prototype of a new capability needed to serve that larger product.
An example spike would be a team that has never worked with a multimodal LLM before using that model for the first time, sending it both images and text content and dealing with its responses. The team knows this is possible and has seen documentation around it, however, their confidence in the service working reliably might not be high, or they may be worried about costs, latency, or the accuracy of the model with their data.
In a spike, you spend a fixed period of time looking at a targeted area of risk in an effort to resolve those areas of risk into either a problem-free service or a list of issues that need to be resolved to move forwards.
In the context of a short-lived AI prototyping project, a spike might be committing half a day for a single developer to investigate an area and learn all they can, then make and adjust plans based on what they found.
Larger projects and problems would likely require more time, but the key point to emphasize is that you’re not building a fully-functioning feature with all of its bells and whistles, you’re just looking to see what areas will require additional time investments.
In our next and final article in the series, we'll cover concluding AI prototyping projects.
Top comments (0)