Brad Micklea for KitOps

Originally published at jozu.com

Accelerating into AI: Lessons from AWS

One of the hallmarks of the best businesses is that they move fast, and consistently get the long-term strategy right. While running Amazon API Gateway, I was struck by the interplay between two of their leadership principles:

  • Bias For Action
  • Right, A Lot

To radically oversimplify: “when in doubt, start doing...but remember that huge impact comes from getting the long-term bets right.”

AWS made the early big bet that utility computing in the cloud would change the face of software and IT. They moved fast, but never lost sight of that goal, and were rewarded for it. Having spent time with people who were there in the early days, I know that path was hard and uncertain, because the easiest answers rarely worked for what they were building - something that is always true for big changes.

I’ve been reminded of this while speaking to people about how AI is being adopted in enterprises.

The Enterprise AI Divide

Today there is a sharp divide - many organizations are taking a wait-and-see approach, but a few enterprises are building out internal AI/ML development teams that will train, tune, and manage AI models and agents tailored to their business and customers.

These companies are making a long-term bet that AI will be a market changer and that those with in-house AI skills will win. Looking at previous seismic shifts like the internet and mobility, they’re likely right.
For now, though, the road they’re walking is difficult. They’re struggling to:

  • Choose among roughly 1,000 MLOps tools that follow no standards, rarely work with each other, and are periodically abandoned
  • Find and hire strong people in not-yet-well-defined job areas like AI research, data science, ML engineering, and MLOps
  • Establish processes that keep AI projects moving quickly and safely, without compromising enterprise data privacy or violating compliance regulations

The simpler route is to use public LLMs from OpenAI, Google, Mistral, or Perplexity and simply avoid any use cases that touch sensitive data. That’s a reasonable place to start, but it won’t be where the best companies end up, because the greatest customer impact comes from using the deepest data.

Too Many Tools!

There are a host of reasons why companies are struggling to move AI projects out of the pilot phase, from a lack of strategic clarity, to worries about hallucinations, to a lack of tooling and talent. But according to a recent McKinsey study, “too many tools” was the biggest reason.

Focus on tools with open standards; protect yourself from vendor changes.

In many cases each group, team, or division has selected a set of tools and an LLM, and started on prototypes. This is a fine way to start (after all, n solutions are better than 0 solutions), but to protect the organization there should be a focus on standards and flexibility. The hard truth is that many AI/ML tool vendors will disappear along with their tools, and focusing on standards is great protection. Unfortunately, much of today’s crop of MLOps tools has favored proprietary options over standards. Instead, look for solutions that let you leverage the standards in other parts of your software development chain: Terraform for infrastructure-as-code, AI project storage in your OCI registry, and open source LLMs are a good start. There are other open source projects that might help too:

  • Pachyderm is an open source platform for managing datasets and the workflows around cleaning and changing them.
  • Feast is a feature store that helps teams track feature changes during in-house model development.
  • Comet ML and MLflow are popular development and experimentation tools, although some have concerns about their proprietary storage and its lack of tamper-proof guarantees.
  • KitOps lets you store all your AI project artifacts (models, datasets, parameters, code, and documentation) in a tamper-proof and versionable package you can store and share through your existing OCI / container registry. It can help protect you from costly migration work if you need to change part of your toolchain (a minimal example follows this list).
  • Kubeflow simplifies the task of deploying, running, and managing ML models on Kubernetes… which you’re probably already using.
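
To make the standards-first idea concrete, here’s a minimal sketch of a KitOps Kitfile that declares a model, its dataset, code, and docs as one versionable package. The project name, version, and file paths are hypothetical placeholders, not from the original post:

```yaml
# Kitfile - declares everything that belongs to this AI project
manifestVersion: "1.0"
package:
  name: churn-predictor            # hypothetical project name
  version: 0.1.0
  description: Customer churn prediction model
model:
  name: churn-model
  path: ./model.onnx               # trained model weights
datasets:
  - name: training-data
    path: ./data/train.csv         # dataset the model was trained on
code:
  - path: ./src                    # training and inference code
docs:
  - path: ./README.md
```

You would then pack and publish it with something like `kit pack . -t <your-registry>/churn-predictor:v0.1.0` followed by `kit push <your-registry>/churn-predictor:v0.1.0`, storing the versioned package in the same OCI registry that already holds your containers.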

While it’s more “exciting” to get a chatbot deployed and start playing with your data, companies that focus on building a repeatable, fast, and safe workflow will be able to learn faster, deploy faster, and beat their competitors. A solid, enterprise-approved AI project development lifecycle and toolset should be the first big milestone in any company’s journey with AI.

What separates the best is the operational maturity around the AI project.

A Framework for Selecting AI Projects

Once you have a solid foundation, the fight about which projects to focus on begins (I know, it began long ago…). The key is prioritizing them based on customer value and risk avoidance.

I’ve used the following framework to prioritize projects because it divides potential AI use cases into four quadrants based on their customer value and organizational risk.

A Framework for Prioritizing Enterprise AI Use Cases

The X-axis focuses on your customers and the amount of value they’d get from an AI-driven use case (which in a good business equates to an increase in value for your organization).

The Y-axis is about your business and the amount of risk (severity and likelihood) that would result from a failure or problem with the AI-driven solution.
The four quadrants are:

  • Now: These projects have high value but low risk, so start work here. You can build MVPs using hosted LLMs’ APIs, but ultimately you want this handled in-house - so after the MVP is launched, use these projects as a testing ground for an in-house AI team. Can they train a model that works better, faster, and cheaper?
  • Next: This is where your differentiation and long-term value will be unlocked. However, the risk is high, so you can’t offload this to a public LLM; it will need to be built and managed in-house. This isn’t where you start, but once an in-house team has proven themselves, it’s where they need to go next.
  • Later: This area is dangerous because it can be a distraction and a resource suck. It looks appealing because it’s easy, but the customer value is low. Unless you need to do something here to keep up with a competitor (and are actually losing deals because of it), keep teams focused on the high-value projects.
  • Stop: The risk / value equation is wrong. Don’t touch these unless that materially changes.
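
To make the classification concrete, here’s a small Python sketch of the quadrant logic. The 0-10 scoring scale, the threshold, and the example use cases are illustrative assumptions, not part of the framework itself:

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    customer_value: float  # estimated value to the customer, 0-10 (assumed scale)
    risk: float            # severity x likelihood of failure, 0-10 (assumed scale)

def quadrant(uc: UseCase, threshold: float = 5.0) -> str:
    """Classify a use case into the Now / Next / Later / Stop quadrants."""
    high_value = uc.customer_value >= threshold
    high_risk = uc.risk >= threshold
    if high_value and not high_risk:
        return "Now"    # high value, low risk: start here
    if high_value and high_risk:
        return "Next"   # long-term differentiation; build in-house once proven
    if not high_value and not high_risk:
        return "Later"  # easy but low value; watch for distraction
    return "Stop"       # high risk, low value: don't touch

if __name__ == "__main__":
    backlog = [
        UseCase("FAQ chatbot on public docs", customer_value=6, risk=2),
        UseCase("churn prediction on customer data", customer_value=9, risk=8),
        UseCase("auto-tagging internal wiki pages", customer_value=3, risk=2),
    ]
    for uc in backlog:
        print(f"{uc.name}: {quadrant(uc)}")
```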

Over time use cases will shift as their value or risk profile changes. Re-classify projects every 6-12 months.

For mid-to-large enterprises, most use cases will sit in the top two quadrants, because valuable data is almost always sensitive. Again, these are areas where you shouldn’t outsource to a public LLM - because of data privacy, but also because you’d be helping your competitors by training a public model, which they may also use, on your unique data.

Setting Up an AI Product Team

Once you decide you need an internal AI team focused on your products, how do you make it happen?

Make sure the engineering and business milestones are aligned and realistic

Before starting, ensure you have clarity on goals and milestones (I prefer writing a document to slides, but different company cultures will dictate what is best).

  • Define the Mission and Impact: It’s important to establish clear goals that map directly to the organization’s business problems. Include the things the group won’t do that people may expect. Details matter here, so this should include metrics and milestones with dates for accountability. If time will be needed for experimentation (it often is with AI projects), be clear about what the milestones of learning will be along the way and treat them like any other deadline.

  • Secure Executive Sponsorship: With a strong mission and set of objectives, gaining executive support shouldn’t be difficult. However, executives are busy, so practice your “pitch” and keep it short. Executives learn by asking questions, so treat your pitch as a movie trailer, not the movie, and leave them time to ask questions.
  • Create a Roadmap: It’s tempting to do this as part of the mission and impact, but in most cases, only a high level roadmap is needed at that point. Once you have executive sponsorship your mission may shift. This roadmap should go a level or two deeper but align to the same milestones you agreed to when getting sponsorship. Classifying opportunities based on the framework above keeps things simple for stakeholders and consistent with how your team will execute.
  • Emphasize Agile and Iterative Approaches: AI is changing at a rapid pace and no one knows the future. Don’t give an impossible sense of determinism to your future plans, instead focus on how the team will be built to adapt and iterate quickly (faster than competitors).

Creating compelling but realistic milestones can be tricky. Below is just an example, but it shows how you can mix engineering milestones (we’re going to get faster, safer, and smoother over time so we can win long-term) and business milestones (we solved a customer problem faster than expected).

Be up-front that your roadmap will need to balance engineering maturity and business impact milestones

Choosing the Organization Structure

There are several ways to set up an AI team. The best teams, regardless of their specialization, do their best work when they are connected and focused on the customer benefit. That means that they need to be close to the “edge” where the customer is.


Don’t over-staff a central AI team; you need “doers” close to the customer.

Centralized Model

A centralized model works for small organizations with a single product, where decisions are made centrally. In this case the organization is small enough that everyone in the product organization should know the customer and their use cases, including the AI team. They should be included in key meetings, not only to report on their progress but also to learn from other teams’ successes and struggles.

Hub and Spoke Model

Hub and Spoke is best for larger organizations where there are multiple products and customers, and where decisions are made independently. In this model there’s a natural division of responsibility:

The Spokes sit in each business area and should feel like (and be treated as) a core part of that team. Their priorities should be set by the business, not the hub, because they are closest to the product and customer. They are also closest to the ground and should be able to feed valuable data and insights back to the hub. There isn’t a lot of ambiguity in this realm, making it a good fit for both new and experienced team members.

The Hub provides standards and tools that will elevate every team member in the spokes. Its primary customers are the spokes, so it should listen to them and enable them. This job is hard because it’s tempting for this team to pull more and more into “standards” and then push those standards down. Instead, they should listen for common problems across spokes and decide whether a shared solution would benefit all. The hub is also responsible for creating a consistent career path and performance expectations. There is much more ambiguity and balance needed in the hub roles, so it’s a better place for more senior and experienced team members, or for people who excel at solving ambiguous problems.

One challenge with this model is that it can get “center heavy,” pulling resources from the spokes, which have the greatest customer impact. To prevent this, I question situations where the number of people in the hub is more than 30% of the number of people in the spokes, because that can indicate an imbalance.
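
As a trivial illustration of that heuristic (the 30% threshold is the rule of thumb above; the headcounts are made up):

```python
def hub_too_heavy(hub_headcount: int, spoke_headcount: int, limit: float = 0.30) -> bool:
    """Flag a hub-and-spoke org whose hub exceeds `limit` of spoke headcount."""
    return hub_headcount > limit * spoke_headcount

# Example: a hub of 8 supporting 20 people across the spokes is at 40%,
# which this heuristic says is worth questioning.
print(hub_too_heavy(8, 20))  # True
```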

Seizing the AI Opportunity

We are in the early stages of an AI revolution in the enterprise. Organizations that take this time to work through the challenges will be rewarded with a competitive advantage in the future. OpenAI and Google might get you started, but they won’t solve your AI problems for you - it’s time to take the reins yourself.
