Most of us techies have experimented with large language models (LLMs) like GPT in some shape or form, and the promise is that they will help businesses work smarter and more efficiently. While there's been plenty of experimentation, we're now at an exciting point where these AI applications and tools could really become integrated into companies' core operations in a scalable, reliable way.
But making that happen is easier said than done. We can't just drop an LLM in the middle of our systems, even though some companies sell that dream. We need an ecosystem that assures the longevity and scalable growth of AI in your enterprise. Like others, I have been experimenting and finding the places where LLMs and LLM applications can have the most impact, and this post is part of a series about the components of what I think the near-future AI platform will look like.
What is the next step for an AI platform?
- HITL - Human in The Loop Subsystem
- Feedback Subsystem
- Evaluation Subsystem
- Decision Executor Subsystem
HITL - Human In The Loop as a subsystem
LLMs can't always make perfect decisions; especially when they're first deployed, a lot of fine-tuning, prompt engineering, and context testing is needed. Humans in the loop are essential to review and approve or reject the decisions the LLMs are unsure about.
Over time, as the system proves itself, more decisions can be fully automated. But that initial human oversight builds trust in your AI platform.
For any LLM app to function effectively within a business context, it needs a subsystem that can handle decision-making with varying levels of confidence.
Low-confidence decisions should be routed to this subsystem, where a human can evaluate and approve or decline the decisions. This ensures that while LLM applications learn and improve, human oversight maintains decision accuracy and reliability.
Even high-confidence decisions may initially require human auditing, until the business is confident in the LLM app's performance. This process builds trust and allows for a gradual transition of decision-making responsibilities.
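To make this concrete, here's a minimal sketch of what confidence-based routing could look like. The thresholds, queue names, and field names are illustrative assumptions, not a prescription for how your HITL subsystem must work.

```python
from dataclasses import dataclass

# Illustrative thresholds -- real values would be tuned per LLM app.
AUTO_APPROVE_THRESHOLD = 0.95
HUMAN_REVIEW_THRESHOLD = 0.60

@dataclass
class LLMDecision:
    action: str        # e.g. "issue_refund", "flag_fraud"
    confidence: float  # model-reported or externally estimated confidence
    payload: dict      # everything a reviewer needs to evaluate the decision

def route_decision(decision: LLMDecision) -> str:
    """Decide where an LLM decision goes next."""
    if decision.confidence >= AUTO_APPROVE_THRESHOLD:
        # High confidence: execute, though you might still sample a share of
        # these for human audit while the business is building trust.
        return "decision-executor"
    if decision.confidence >= HUMAN_REVIEW_THRESHOLD:
        # Medium confidence: queue for a human to approve or decline.
        return "hitl-review-queue"
    # Low confidence: hold back and log for later analysis.
    return "rejected-for-analysis"
```

As trust grows, you can raise or lower the thresholds per app instead of changing any application code.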
Feedback as a Subsystem
Feedback can take many forms. Take a fraud-detection app as an example; the decisions it gets wrong fall into two buckets:
- False positives, i.e. the LLM incorrectly flags a legitimate transaction as fraud.
- False negatives, i.e. the LLM fails to flag a fraudulent transaction.
Humans involved in the decision-making process, or advanced LLMs auditing a process, must provide actionable feedback, whether through simple upvotes or downvotes, textual feedback, or discrepancies between human and LLM decisions.
This feedback helps identify whether issues stem from input quality, inference problems, or contextual misunderstandings, allowing for targeted improvements.
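As a rough illustration, a feedback record might capture all of these signals in one place. The field names below are my own assumptions, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class FeedbackRecord:
    decision_id: str                       # links back to the original LLM decision
    source: str                            # "human_reviewer", "customer", "auditor_llm"
    rating: Optional[int] = None           # e.g. +1 upvote / -1 downvote
    comment: Optional[str] = None          # free-text feedback
    human_decision: Optional[str] = None   # what the human decided, if they overrode
    llm_decision: Optional[str] = None     # what the LLM originally decided
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def is_discrepancy(self) -> bool:
        """True when the human and the LLM disagreed -- a strong improvement signal."""
        return (
            self.human_decision is not None
            and self.llm_decision is not None
            and self.human_decision != self.llm_decision
        )
```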
Evaluation as a Subsystem
LLMs, like any other model, drift over time, and system prompts can change in ways that stop your LLM apps from functioning as expected. At this stage, you have a good amount of data from your LLM apps, feedback from your customers and the HITL system, and the decisions that were made, so you can take it to the next level.
Evaluation is a key subsystem in our platform, because the outputs of our customers, system prompts, HITL, and feedback all pour into it as parameters for weighing the efficiency of our LLM apps. The components of an evaluation subsystem can vary. You can start simple, by introducing metrics to measure the quality of your prompts, context, and output; then, for each LLM app, this can grow in different directions. You might even have custom-built models to evaluate certain scenarios and applications.
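One simple starting metric is to compare the LLM's decisions against the outcomes humans approved in the HITL subsystem. The sketch below assumes that pairing is available; the metric and names are illustrative only.

```python
from typing import Iterable, Tuple

def evaluate_decisions(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Compare LLM decisions against HITL-approved outcomes.

    `pairs` yields (llm_decision, approved_decision) tuples collected
    from the HITL and Feedback subsystems.
    """
    total = agreed = 0
    for llm_decision, approved_decision in pairs:
        total += 1
        agreed += llm_decision == approved_decision
    agreement_rate = agreed / total if total else 0.0
    return {"samples": total, "agreement_rate": agreement_rate}

# Tracked over time (per app, per prompt version), a falling agreement
# rate is an early signal of drift or of a prompt change that hurt quality.
```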
With all that feedback collected, you can't just let it sit there. It has to flow back into actually retraining and fine-tuning the LLM model itself through reinforcement learning techniques. This closes the loop so the AI keeps getting smarter.
Decision Executor as a subsystem
Once an LLM makes a decision (issuing a refund, flagging fraud, etc.), you need a component to implement that decision across your company's systems.
It's the final piece of the path-to-production ecosystem: the action-taker subsystem, responsible for executing decisions across your business.
It acts as a proxy that integrates seamlessly with existing systems. Whether issuing refunds, canceling orders, or reporting fraud, the action-taker ensures that decisions are implemented efficiently and accurately.
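Here's a minimal sketch of how such an action-taker could be structured as a dispatcher. The handlers are placeholders standing in for calls to your real payments, order-management, and fraud systems.

```python
from typing import Callable, Dict

# Placeholder handlers -- in practice these would call your existing systems.
def issue_refund(payload: dict) -> None:
    print(f"Refunding order {payload['order_id']}")

def cancel_order(payload: dict) -> None:
    print(f"Cancelling order {payload['order_id']}")

def report_fraud(payload: dict) -> None:
    print(f"Reporting transaction {payload['transaction_id']} as fraud")

HANDLERS: Dict[str, Callable[[dict], None]] = {
    "issue_refund": issue_refund,
    "cancel_order": cancel_order,
    "report_fraud": report_fraud,
}

def execute(action: str, payload: dict) -> None:
    """Proxy an approved LLM decision onto the system that owns the action."""
    handler = HANDLERS.get(action)
    if handler is None:
        raise ValueError(f"No executor registered for action: {action}")
    handler(payload)
```

Keeping this mapping in one place means new actions can be added without touching the LLM apps themselves.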
So far, I see this ecosystem being built as a separate platform that integrates with but is distinct from your core business systems. That way you get modularity and observability, as LLM apps move fast and change frequently.
Putting it all together
Integrating this platform into existing business infrastructure requires a standalone architecture that encapsulates LLM apps and their supporting systems. Communication with other systems can be managed through event-driven or request-response patterns, with robust observability to monitor and maintain the platform's stability.
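For the event-driven flavor, a shared event envelope is often enough to get started. The sketch below is one possible shape, with hypothetical event types and fields; any message bus could carry it.

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type: str, payload: dict, source: str) -> str:
    """Build a platform event as JSON, ready for the message bus of your choice."""
    return json.dumps({
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,   # e.g. "decision.approved", "feedback.recorded"
        "source": source,           # which subsystem emitted it
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    })

# Example: the HITL subsystem announcing an approved decision so the
# Decision Executor (and observability tooling) can react to it.
event = make_event(
    event_type="decision.approved",
    payload={"decision_id": "d-123", "action": "issue_refund", "order_id": "o-456"},
    source="hitl",
)
```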
I want to emphasize that such an ecosystem can grow over time; we are still figuring out where LLMs will have the biggest impact on our businesses.
Ethical Considerations
Ethical considerations also come into play, necessitating moderation agents and policies to ensure responsible usage of LLMs.
There are certainly challenges around trust and responsible-AI obstacles to overcome. But with the right supporting infrastructure, LLMs could soon graduate from experiments and SaaS add-ons to become powerful enterprise AI assistants woven into your day-to-day operations. Exciting times!