Luis Farzati

From APIs to ACIs: The Next Evolution in Software Interaction

Introduction

In the ever-evolving landscape of software development, we've seen significant shifts in how applications communicate and interact. From the early days of monolithic architectures to the rise of microservices, each evolution aimed to make software more efficient and user-friendly. Yet, as technology advances, so does the complexity of interactions.

This is where Application Conversational Interfaces (ACIs) come into play, and I'd like to share some thoughts on a new paradigm that could reduce cognitive load and make software interaction more intuitive than ever before.

Large Language Models are transforming how we interact with technology. We're entering an era where communicating with machines is as natural as conversing with a colleague. Users can now express their needs in everyday language, and systems respond accordingly. And this is the foundation of ACIs.

This proposal is not just a technology upgrade – it's a paradigm shift that also raises profound questions: how will this reshape the experience for users, developers and AI agents?

The dissolving interface

In an era where LLMs can respond with text, structured formats, images, voice, or even code that generates dynamic visual interfaces, we must ask ourselves: how much longer will we continue building layers between us and the software? What if we stop developing traditional user interfaces and application programming interfaces altogether?

Today, everything seems to be converging into a prompt or chat interface. While not all problems will be solved through such means, this is merely the beginning. The key isn't the interface itself but what it represents—a fundamental shift in how we interact with technology.

For example, integrating a relatively complex API requires navigating extensive documentation and understanding schemas, responses, and exceptions; this not only consumes valuable time but also diverts focus from building core features.

Stripe API spec

Now, imagine bypassing these complexities through an interface that understands your intent via natural language. This is where the era of UIs and APIs could be evolving into a new paradigm: ACIs.

Before we delve into the what, how, and why of ACIs, let's briefly revisit the journey that has led us to this pivotal moment.

Everything is an API

Let's trace the evolution of APIs to understand why they were invented. Initially, when we had systems with specific algorithms and data structures solving problems in certain domains, it was efficient to encapsulate and expose them through interfaces — Application Programming Interfaces (APIs). To avoid confusion later, let's call these specific APIs "Domain APIs." Then, on top of that, we added another interface, a User Interface, that would capture user intentions and translate them into corresponding API calls.

In essence, we implemented an interface (UI) to interact with another interface (Domain API).

UI interacting with an API

Eventually, we also needed to interface with programs running in remote places all over the Internet, thus our interfaces or even our backend systems would interact with these remote interfaces. And so our interface-building journey led us to where we are today:

UI and remote system interacting with API

So, everything is an I(nterface)... and?

Well, not only is everything an interface; each one operates at a different abstraction level, is written in a different language, and is built by different authors making distinct (and not always coherent) design choices in response to unique use cases.

As developers, when navigating all this interfacing, we spend considerable time moving through the different layers of a system and its dependencies. We're essentially translating data from one interface to another, all while trying to keep sight of our original intent. In many cases, implementing these interfaces correctly is more challenging and tedious than writing the actual domain logic that solves the problem!

A substantial part of this effort is primarily spent on steps 2 and 3 below:

  1. Developers implement backend systems.
  2. Developers expose programming interfaces in those backend systems.
  3. Developers consume programming interfaces to build backend systems and/or user interfaces.
  4. End-users interact with user interfaces to accomplish their tasks.

This complexity often obscures our primary goal: efficiently solving the user's problem.

The Dawn of Dynamically Generated Interfaces

In the past 16 months or so, we've observed a breakthrough: LLM applications like ChatGPT have started using actions that consume APIs and return data in multiple formats, including structured JSON as well as HTML, CSS, and JavaScript code.

We see again a user interface, in this case in the form of a prompt or chat. This is a UI that, like any other UI, is written beforehand by a group of engineers. But this one has a special characteristic: it generates new interfaces on the fly, particularly UIs (but not only UIs, as we'll see below).

Conversational interface generating a user interface

Today you can ask models like Sonnet or ChatGPT to create interactive interfaces with inputs and outputs, chart visualizations, SVGs, you name it. Combined with actions that enable the model to connect with APIs, this provides the user with an immediate, on-demand, ad-hoc user interface for addressing any particular task.

But that's not all. With more powerful Tool Calling and Structured Output capabilities, these interfaces can generate not only UIs but also APIs. They return data in whatever format and schema you specify.

Conversational interfaces generating APIs

And this is just the beginning.

The layers that sit between the end-user and the actual algorithms solving the user's needs —the intermediary interfaces designed to translate user intent into programmatic API calls— are becoming redundant.

Sure, most humans are visual, and visual people still want dashboards, panels, and buttons. But the point is not about replacing UIs with prompts; the point is what these UIs will really become from now on, and who will have to implement them, when, and how.
Is it really necessary to build a one-size-fits-all UI that caters to the average user? To try to anticipate every use case? To add localization for multiple languages? To ensure accessibility? To always limit ourselves to a visual interface?

All of these tasks could be addressed dynamically, in a way that best fits Every. Single. User.

These users can receive exactly the interface they need, showing the precise information they require, in a format that matches their preferences and accessibility needs: visual, audio, or text.

Multimodal interface

Towards a Single Interface

With these layers collapsing and interfaces being dynamically generated on the fly, we can envision a single interface serving all purposes:

  • It serves developers looking to integrate functionality into their applications.
  • It serves end-users seeking to interact directly with the system.
  • It serves AI agents performing tasks or seeking information autonomously.

I'd call this single interface an Application Conversational Interface (ACI).

ACI: The Single Interface

While their impact may be loosely comparable to how GraphQL changed API interactions relative to REST, ACIs go much further by proposing an entirely new paradigm.

We can outline several radical differences when compared to traditional APIs:

Intent-based interaction

ACIs don't have fixed or pre-established contracts, schemas, methods, or precise workflows. They are intent-based rather than procedural call-based.

Instead of calling specific methods with predefined parameters, developers and users can express their intentions in natural language. For example, "Create a new user account with admin privileges" is more intuitive than constructing one or two API calls with the correct endpoint and parameters.
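To make this concrete, here's a hypothetical exchange over an HTTP-based ACI (the endpoint, phrasing, and response shape below are illustrative assumptions, not an existing service):

# Hypothetical: the account-creation intent expressed in natural language
curl localhost -d "Create a new user account for jane@example.com with admin privileges"

# The same intent, asking for a structured response the caller can parse
curl localhost -d "Create an admin account for jane@example.com and reply with JSON {email, role}"
{"email":"jane@example.com","role":"admin"}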

Since ACIs are not bound by rigid contracts, they can understand and adapt to new requests without requiring explicit updates or version changes. This flexibility reduces maintenance overhead and accelerates development cycles.

Just imagine how you would build an API endpoint for converting names into base64 encoding that responds to consumer needs as diverse as these:

# Simple call, could be made by an ACI client or even by end-user
curl localhost -d "How does my name look like in base64? It's John"
Your name "John" in base64 is: Sm9obg==

# Call with a structured response that can be used by an ACI client
curl localhost -d "Write a JSON document with `code: (base64 of 'OpenACI')`"
{"code":"T3BlbkFDSQ=="}

# Call with a JSON request body and a JSON response
# – no different to a normal JSON API (could even
# be used during an API-to-ACI transition phase!)
curl localhost -d '{"intent":"convertToBase64","name":"OpenACI"}'
{"name":"OpenACI","base64":"T3BlbkFDSQ=="}

# Call with multiple parameters (ACI calls the intent
# handler multiple times - no need to change the
# implementation!)
curl localhost -d "Show me an ascii table with the base64 representation of these names: John Connor, Sarah Connor, Kyle Reese"
| Name          | Base64                |
|---------------|-----------------------|
| John Connor   | Sm9obiBDb25ub3I=      |
| Sarah Connor  | U2FyYWggQ29ubm9y      |
| Kyle Reese    | S3lsZSBSZWVzZQ==      |

Now compare the API you pictured with the ACI implementation that actually responded to the requests illustrated above:

import { HttpAci } from '@openaci/http';
import { z } from 'zod';

// An ACI served over HTTP, backed by an LLM that interprets incoming intents
const app = new HttpAci({ llmName: 'gpt-4o-mini' });

// The entities the model should extract from the natural-language request
const schema = z.object({
    name: z.string(),
});

// The handler only implements the domain logic; parsing the request and
// formatting the response are left to the ACI layer
app.intent('Convert name to base64', schema, ({ entities }) => {
    const { name } = entities;
    return Buffer.from(name).toString('base64');
});

Human-centric design

These interfaces position humans as the primary consumers, supporting them in any role — whether as end-users or developers. This is crucial: ACIs serve both of them through the same conversational framework. This unification simplifies the architecture and reduces the need for separate interfaces like UIs or specialized APIs.

For developers, as noted above, this means quickly integrating functionality without spending time learning a new API. We simply write client code that states what we want to achieve, and the ACI interprets and executes the intent.

End-users, in turn, can customize their interaction with the system on the fly. For instance, a user could ask, either by chat or by voice, "Show me all the documents I created yesterday" without navigating through multiple UI screens. Apps could still offer a default visual UI, but users could leverage ACIs to customize and adapt the interface to their needs and preferences, down to the smallest detail.
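As a hypothetical sketch (the endpoint and intent are assumptions for illustration, not part of the prototype above), that kind of request could be sent to an HTTP ACI as plainly as this:

# Hypothetical: an end-user intent against an ACI-enabled documents service
curl localhost -d "Show me all the documents I created yesterday"

# The same intent, asking for a machine-friendly shape instead
curl localhost -d "List the documents I created yesterday as a JSON array of {id, title}"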

Accessibility and convenience

With the previous point in mind, accessibility is a fundamental aspect of ACIs, not only for inclusivity but also for convenience. By supporting multiple languages and modalities (text, voice, visuals), ACIs make systems more accessible to a diverse range of users, including those with disabilities.
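For instance, the earlier "Convert name to base64" intent handler could, in principle, serve a request phrased in another language without any server-side changes (a hypothetical exchange, shown for illustration):

# Hypothetical: the same intent, expressed in Spanish
curl localhost -d "¿Cómo se ve mi nombre en base64? Me llamo Sarah"
Tu nombre "Sarah" en base64 es: U2FyYWg=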

Beyond human interaction

As LLMs continue evolving and AI agents become more sophisticated, they too become consumers of ACIs. In this sense, a consumer isn't only a human but anyone capable of interacting with these interfaces using natural language. This opens up possibilities for distributed multi-agent systems that collaborate and negotiate using natural language; what will probably happen, though, is that AI agents will leverage the flexibility of ACIs to agree on the best data format and exchange messages in the most optimized way, without our involvement.
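As a hypothetical illustration of that last point, an agent consuming the base64 ACI from earlier might skip human-friendly formatting altogether and ask for the tersest useful representation:

# Hypothetical: an AI agent negotiating a compact, machine-oriented response
curl localhost -d "Reply with only a JSON array of base64 values, in order, for: John Connor, Kyle Reese"
["Sm9obiBDb25ub3I=","S3lsZSBSZWVzZQ=="]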

The Road Ahead

ACIs represent a transformative approach to software interaction, aligning technology more closely with human communication patterns. They have the potential to reduce development overhead by eliminating the need for multiple intermediary layers, and to empower users, who gain direct and personalized access to system capabilities without depending on someone else designing an interface that might not fulfill all of their needs or preferences. ACIs could also foster innovation: with fewer barriers to integration, new services and collaborations can emerge more rapidly.

Now, there's still a road ahead. While I believe ACIs could be implemented today for some use cases, the reality is that performance, even for the fastest models, and the economics have yet to tilt in favor of widespread adoption for high-traffic applications. We are not yet at a stage where we could replace an API that handles anything above a few hundred requests per second, and the current cost structure remains prohibitive for many use cases.

But of course, this is just a matter of time. We've seen mind-blowing breakthroughs in just the past 12 months. I believe the shift from APIs to ACIs already lets us reimagine how we interact with software.

Conclusion

Application Conversational Interfaces could be a transformative shift in how we interact with software. By dissolving the traditional layers between users and applications, ACIs point to a future where interaction is more intuitive, personalized, and accessible than ever before. I don't see this just as an incremental improvement — it's a fundamental reimagining of our relationship with technology.

However, as with any paradigm shift, ACIs bring forth questions and challenges that I think we, as a community, need to address:

How will ACIs reshape the roles of developers and designers? With interfaces being dynamically generated, what new skills will professionals need to cultivate?

What are the implications for user privacy and security in a world dominated by intent-based interactions? How do we ensure that the convenience of ACIs doesn't compromise data protection?

How can we overcome the current performance and cost barriers to make ACIs viable for high-traffic applications? What innovations are needed in hardware or software to support this shift?

These questions are not just technical—they touch on ethical, social, and economic dimensions that will shape the future of our digital world.

Join the conversation

The journey towards fully realizing the potential of ACIs is just beginning, and it invites collaboration and dialogue. Your insights, experiences, and ideas are invaluable in navigating this new landscape.

The OpenACI specification

With this paradigm in sight, we want to propose an open specification for Application Conversational Interfaces. It's called OpenACI and its first draft will be published next week.

In the meantime, you can play with a very early (and simplistic) prototype of an HTTP OpenACI implementation in our GitHub repo:

GitHub: openaci/http-node (OpenACI Node implementation)

Looking forward to your feedback! If you want to join us discussing and defining the OpenACI spec, write me at lfarzati@gmail.com or ping me on LinkedIn.

[Anthropic Claude 3.5 Sonnet was used to review, proofread, and improve the readability of some sections in this article]
