<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Portia AI</title>
    <description>The latest articles on DEV Community by Portia AI (@portia_ai_mark).</description>
    <link>https://dev.to/portia_ai_mark</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3141027%2Fc939face-9418-4df6-8785-47a80b6bf01b.png</url>
      <title>DEV Community: Portia AI</title>
      <link>https://dev.to/portia_ai_mark</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/portia_ai_mark"/>
    <language>en</language>
    <item>
      <title>Design Highlight: Handling data at scale with Portia multi-agent systems</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 22 May 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/design-highlight-handling-data-at-scale-with-portia-multi-agent-systems-7nb</link>
      <guid>https://dev.to/portia-ai/design-highlight-handling-data-at-scale-with-portia-multi-agent-systems-7nb</guid>
<description>&lt;p&gt;At Portia, we love building in public. Our &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;agent framework is open-source&lt;/a&gt; and we want to involve our community in key design decisions. Recently, we’ve been focusing on improving how agents handle production data at scale in Portia. This has sparked some exciting design discussions that we wanted to share in this blog post. If you find them interesting, we’d love you to be involved in future discussions - just get in contact (details in the block below). We can’t wait to hear from you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calling All Devs
&lt;/h3&gt;

&lt;p&gt;We’d love to hear from you on the design decisions we’re making 💪 Check out the &lt;a href="https://github.com/portiaAI/portia-sdk-python/discussions/449" rel="noopener noreferrer"&gt;discussion thread&lt;/a&gt; for this blog post to have your say. If you want to join our wider community too (or just fancy saying hi!), head on over to our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, our &lt;a href="https://www.reddit.com/r/PortiaAI/" rel="noopener noreferrer"&gt;Reddit community&lt;/a&gt;, or our &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;repo on GitHub&lt;/a&gt; (give us a ⭐ while you’re there!).&lt;/p&gt;

&lt;p&gt;If you’re new to Portia, we’re building a multi-agent framework that’s designed to enable people to run agents reliably in production. Efficiently handling large and complex data sources is one of the key aspects of this, along with agent permissions, observability and reliability. We’ve seen numerous agent prototypes that work well on small datasets in restricted scenarios, but then start to fall over when faced with the scale and complexity of production data. We want to make sure this doesn’t happen when agents are built with Portia. In this blog post, we’ll explore the design decisions we’ve made to enable this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real agents handling data at scale&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#real-agents-handling-data-at-scale" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;As with all good design discussions, we work backwards from real-life use-cases that we’re looking to enable or improve. We’re working with many agent builders, and below is a selection of the exciting use-cases we’ve seen that require efficiently processing large data sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A debugging agent that can process many large server log files along with other debug information to diagnose issues.&lt;/li&gt;
&lt;li&gt;A research agent that can process many documents and search results over time as it conducts research into a particular company or person.&lt;/li&gt;
&lt;li&gt;A finance assistant capable of researching across a company’s financial data in a mixture of sheets and docs to answer questions - for example, “from this week’s sales data, identify the top 3 selling products and how their sales are split by geography”.&lt;/li&gt;
&lt;li&gt;A personal assistant capable of having long-running interactions with the user, including taking actions such as scheduling events and sending emails, adapting to their preferences over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In order to handle each of these use-cases well, our agents need to handle data correctly across complex, multi-step plans without being thrown by large documents or making repetitive mistakes. However, we were finding that these agent builders were hitting a couple of key issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plan run state overload:&lt;/strong&gt; Our execution agent has access to the full state of the plan run. When it runs a tool, it stores the output of the tool run into that state for future use. Over time, though, if tools produced large amounts of data, this state could become very large and congested. Because the full state was passed into the LLM, the execution agent became less accurate at retrieving the correct information for each step from it:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Example:&lt;/strong&gt; A debugging agent might download and analyse the logs from 10 different servers. It might then move on to another task, but the logs from each of those 10 servers would still be in its plan run state, distracting from other useful information when processing future steps.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Tool calling with large inputs:&lt;/strong&gt; Our execution agent calls a language model to produce the arguments for calling each tool. However, when we wanted to call a tool with a large argument (e.g. &amp;gt;1k tokens), we would either hit the output token limits of the model or we would hit latencies that would make the system incredibly slow.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Example:&lt;/strong&gt; A finance agent might want to read in a large spreadsheet and then pass its contents into a processing tool to extract the key data it needs. We saw occasions where just generating the args for the processing tool took more than 5 minutes because it needed to print out the full contents of the spreadsheet!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We needed to fix these two issues, so our agent builders could stop wrestling with context windows and focus on shipping features.&lt;/p&gt;

&lt;h2&gt;
  
  
  An aside on long-context models&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#an-aside-on-long-context-models" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into potential solutions, let’s explain why we think this is a problem worth solving even with the vast context windows of the latest models (e.g. &lt;a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/" rel="noopener noreferrer"&gt;Llama 4 has a 10M-token context window&lt;/a&gt; while &lt;a href="https://openai.com/index/gpt-4-1/" rel="noopener noreferrer"&gt;GPT-4.1 has a 1M-token one&lt;/a&gt;). These models have certainly changed the equation - before their arrival, we hit context window limits far more often than we do now. However, using them with large data sources is still difficult for multiple reasons:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;SOTA models boast strong accuracy scores in needle-in-a-haystack tests, but &lt;em&gt;real&lt;/em&gt; scenarios are more complex, requiring reasoning over and connecting different pieces of information in the context, and models get much weaker at this when the context is large.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Filling GPT-4.1’s context window costs around $2 of input processing for every LLM call. Agentic systems typically make many LLM calls, so this can quickly make your system very expensive!&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;As the token number increases, so does the latency, particularly for output tokens. &lt;a href="https://platform.openai.com/docs/guides/latency-optimization/3-use-fewer-input-tokens#generate-fewer-tokens" rel="noopener noreferrer"&gt;OpenAI states&lt;/a&gt; that while doubling input tokens increases latency 1-5%, doubling output tokens &lt;em&gt;doubles&lt;/em&gt; output latency. When compounded with the fact that agentic systems make many LLM calls, this can make systems very sluggish.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure Modes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interestingly, language models fail in different ways when the context length gets large. There’s a great study on this from &lt;a href="https://www.databricks.com/blog/long-context-rag-performance-llms" rel="noopener noreferrer"&gt;Databricks&lt;/a&gt;. This adds instability to the system because the prompts you’ve been iterating on in low-data scenarios suddenly don’t work as you expected in production.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Preventing our plan run state becoming overloaded&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#preventing-our-plan-run-state-becoming-overloaded" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Given the above, we can’t just rely on long-context models and need to handle the issue with overloading our plan run state within our framework. To solve this, we needed to reduce the size of the context used by our execution agent, and we did this as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we introduced agent memory. For large outputs (by default, those above 1k tokens), we store the full value in agent memory and keep just a reference in the plan run state. This prevents previous large outputs from clogging up the plan run state when they are no longer needed.

&lt;ul&gt;
&lt;li&gt;You can configure where your agent stores memories through our &lt;code&gt;storage_class&lt;/code&gt; configuration option (see our &lt;a href="https://docs.portialabs.ai/manage-config#manage-storage-options" rel="noopener noreferrer"&gt;docs&lt;/a&gt; for more details). If you choose Portia cloud, you’ll be able to view the memories in our dashboard:&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflaesuf76t874rk0u9ks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fflaesuf76t874rk0u9ks.png" alt="A screenshot of the Portia dashboard, showing memories from previous plan runs." width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our planner selects inputs for each step of our plan. If one of these inputs is in agent memory, we fetch the value from memory, as we know it is specifically needed for this step. This allows the execution agent to fully utilise the large values in agent memory when needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2obsl1fnu4m1thc93652.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2obsl1fnu4m1thc93652.png" alt="A sequence diagram, showing how agent memory fills up over multiple steps" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can check out the code for this feature &lt;a href="https://github.com/portiaAI/portia-sdk-python/pull/319" rel="noopener noreferrer"&gt;in this PR&lt;/a&gt; and the docs are &lt;a href="https://docs.portialabs.ai/agent-memory" rel="noopener noreferrer"&gt;here&lt;/a&gt;. For our first implementation of agent memory, we decided to only allow pulling the full value from agent memory, rather than indexing the values in a vector database (or other form of database) and allowing queries based on that. A key reason for this (as well as wanting to keep our initial implementation as simple as possible) is that the way memories need to be queried is very task dependent. There are times when a semantic similarity search of memories is required (e.g. a debugging agent looking for similar errors among log files), while other times require filtering on exact values (e.g. a debugging agent looking for logs between two timestamps from a particular service), a projection of the values (e.g. a finance assistant just taking several columns from a spreadsheet) or access to the full value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Our future vision - a memory agent&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#our-future-vision---a-memory-agent" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Ultimately, we’ll want to support all these patterns, but doing this efficiently requires indexing and querying the memories intelligently based on the task. We believe that this will be best done by a separate memory agent - an agent within our multi-agent system that indexes and queries agent memories so that the required pieces can be retrieved for the task and passed to the execution agent. This clearly adds complexity to the system though! So we wanted to see how our agent builders use agent memory before jumping to conclusions on the best way to index and query the memories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tell us what you think!
&lt;/h3&gt;

&lt;p&gt;We’d love to hear what you thought of the decision to not automatically ingest agent memories into a vector database. Is it something you’d like to see? Get involved in the &lt;a href="https://github.com/portiaAI/portia-sdk-python/discussions/449" rel="noopener noreferrer"&gt;GitHub discussion&lt;/a&gt; and let us know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving Tool calling with large inputs&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#solving-tool-calling-with-large-inputs" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Our introduction of agent memory meant that the execution agent was managing the context it sent to the language model much better. However, we were still facing the issue, mentioned above, where the language model struggled to produce large arguments for tools when needed. To solve this, we gave the language model the ability to use templates for its inputs. When calling the language model, the execution agent would outline in the prompt that, if the language model simply wanted to use a value from agent memory verbatim, it didn’t have to copy the value out - it could just write, for example, &lt;code&gt;{{$large_memory_value}}&lt;/code&gt;. We then extended the execution agent to retrieve &lt;code&gt;$large_memory_value&lt;/code&gt; from agent memory when this happened and template the value in, so that the tool received the full value.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjt1yr5rej1zcr2u97bb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkjt1yr5rej1zcr2u97bb.png" alt="A diagram showing how templating works when providing large memory values to tools." width="724" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can check out the code for this feature in this PR: &lt;a href="https://github.com/portiaAI/portia-sdk-python/pull/369" rel="noopener noreferrer"&gt;Add ability for execution agent to template large outputs&lt;/a&gt;. Interestingly, after a bit of initial tuning, we found that the language model was able to determine correctly whether it should template a variable or not. This has led to a massive improvement in latency and cost of our agents calling tools with large data sources. For example, a personal assistant use-case that involved analysing a large spreadsheet reduced in time from 3-5 minutes to &amp;lt;10s.&lt;/p&gt;
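&lt;p&gt;The templating step can be sketched in a few lines (a hypothetical helper, not the SDK’s actual implementation):&lt;/p&gt;

```python
# Sketch of template resolution for tool arguments: if the model emits
# {{$name}}, the execution agent swaps in the full value from agent memory
# before calling the tool. Names and the memory dict are illustrative only.
import re

TEMPLATE_RE = re.compile(r"\{\{\$(\w+)\}\}")


def resolve_templates(args: dict[str, str], memory: dict[str, str]) -> dict[str, str]:
    """Replace {{$key}} placeholders in tool args with values from agent memory."""
    def substitute(value: str) -> str:
        return TEMPLATE_RE.sub(lambda m: memory[m.group(1)], value)

    return {name: substitute(value) for name, value in args.items()}


memory = {"large_spreadsheet": "product,units\nwidget,900\ngadget,650"}
# The LLM produced short, cheap args instead of copying the spreadsheet out:
llm_args = {"data": "{{$large_spreadsheet}}", "question": "top seller?"}
tool_args = resolve_templates(llm_args, memory)
print(tool_args["data"])  # the full spreadsheet, never re-generated by the LLM
```

&lt;p&gt;The cost and latency win comes from the model emitting a handful of placeholder tokens instead of regenerating the whole value as output tokens.&lt;/p&gt;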

&lt;h3&gt;
  
  
  What do you think?
&lt;/h3&gt;

&lt;p&gt;What do you think of allowing language models to template variables rather than copy them out fully? Do you have a better approach for this? Get involved in the &lt;a href="https://github.com/portiaAI/portia-sdk-python/discussions/449" rel="noopener noreferrer"&gt;GitHub discussion&lt;/a&gt; and let us know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going forwards&lt;a href="https://blog.portialabs.ai/multi-agent-data-at-scale#going-forwards" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We believe this work sets a great foundation for building multi-agent systems with Portia that handle data at scale, and we’ve got an exciting roadmap of features to keep making this even better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-ingest knowledge / memories: we want to allow kicking off our agents with knowledge and memories already loaded, rather than requiring the agent to fetch all the information needed as part of the run.&lt;/li&gt;
&lt;li&gt;Improved pagination handling: we want to allow our execution agent to use paginated APIs more efficiently.&lt;/li&gt;
&lt;li&gt;Memory agent: as mentioned above, we’re excited about the possibilities opened up by a separate memory agent. Once we’ve got a good idea of how agent memory is being used in its current form, we’d love to start discussions on how this new agent might fit into our system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re really looking forward to finding out what these new large data capabilities will unlock for agent builders working on Portia!&lt;/p&gt;

&lt;h3&gt;
  
  
  What do &lt;em&gt;you&lt;/em&gt; think?
&lt;/h3&gt;

&lt;p&gt;Hopefully you enjoyed this blog post. If you did (or even if you didn’t!), we’d love to hear from you. Do you agree with the design decisions we’re taking? Do you think we should take a different approach? If you’ve got thoughts and ideas, we’d love to hear about them in the &lt;a href="https://github.com/portiaAI/portia-sdk-python/discussions/449" rel="noopener noreferrer"&gt;GitHub discussion&lt;/a&gt; associated with this post. And we love chatting about code even more - so if you’ve got an idea, fork our repo and we’d love to review the code 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>memory</category>
      <category>design</category>
    </item>
    <item>
      <title>Beyond APIs: Software interfaces in the agent era</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Fri, 09 May 2025 16:40:31 +0000</pubDate>
      <link>https://dev.to/portia-ai/beyond-apis-software-interfaces-in-the-agent-era-1p2a</link>
      <guid>https://dev.to/portia-ai/beyond-apis-software-interfaces-in-the-agent-era-1p2a</guid>
<description>&lt;p&gt;For decades, APIs have been the standard for connecting software systems. Whether REST, gRPC, or GraphQL, APIs follow the same principle: well-structured interfaces that are defined ahead of time to expose data and functionality to third parties. But as AI agents start taking on more autonomous operations, this rigid model is limiting what they can do.&lt;/p&gt;

&lt;p&gt;APIs work well when requirements are known in advance, but agents often lack full context at the start. They explore, iterate and adapt based on their goals and real-time learning. Relying solely on predefined API calls can restrict an agent’s ability to interact dynamically with software.&lt;/p&gt;

&lt;p&gt;Like many in our industry, we have been grappling with the challenges of agent-to-software interfaces. We think the future of these interfaces will move beyond static APIs toward more flexible, expressive, and adaptive mechanisms. More on our thinking below - we’d love to hear your thoughts!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Flost_agent-f2fd515a47c258184a863f19f2e714d9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Flost_agent-f2fd515a47c258184a863f19f2e714d9.png" alt="Lost Agent" width="800" height="801"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The limitations of APIs for agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;APIs are designed for predictable, developer-driven interactions. The developer writes a request, expects a response, and handles errors explicitly. While this works well for traditional software integrations, it introduces several friction points when applied to autonomous agents operating in dynamic environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Rigid interfaces don’t work well with dynamic reasoning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;APIs define fixed contracts with specific endpoints, request formats, and outputs. But agents operate in a less deterministic way. Whilst an API structure may work well for one use case, another may be impossible given the set of endpoints and data the API exposes. For example, an API might provide a &lt;code&gt;fetch_customer_data(id)&lt;/code&gt; function, but an agent will likely not start with an ID - perhaps just a name or an email. This forces agents to reason about chaining multiple API calls for tasks that could be a single step.&lt;/p&gt;
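&lt;p&gt;To make the chaining overhead concrete, here is a toy illustration (the endpoints and data are invented):&lt;/p&gt;

```python
# Toy illustration of the rigid-contract problem: an agent holding only an
# email must plan a lookup step before it can use fetch_customer_data(id).
# All functions and data here are hypothetical stand-ins for real API calls.

CUSTOMERS = {"c42": {"id": "c42", "email": "ada@example.com", "plan": "pro"}}


def find_customer_id(email: str) -> str:
    """Step 1: the extra call forced by the fixed contract."""
    for customer in CUSTOMERS.values():
        if customer["email"] == email:
            return customer["id"]
    raise LookupError(email)


def fetch_customer_data(customer_id: str) -> dict:
    """Step 2: the call the agent actually wanted to make."""
    return CUSTOMERS[customer_id]


# An intent-level interface could collapse both steps into one request:
record = fetch_customer_data(find_customer_id("ada@example.com"))
print(record["plan"])  # -> "pro"
```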

&lt;h3&gt;
  
  
  &lt;strong&gt;2. API interfaces need to be integrated ahead of time&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To achieve good performance, APIs often need to be wrapped in a tool that is passed to the agent. This means work is needed ahead of time to integrate the API. Again, this limits what the agent can achieve: it is unable to discover and integrate the APIs it needs at runtime based on the task it has been asked to achieve.&lt;/p&gt;

&lt;p&gt;Even for coding agents, the requirements for authentication and documentation mean some work is needed to integrate the API ahead of time. Not to mention that in a production system your agent needs to handle a whole other set of software engineering concerns, for example pagination, caching and rate-limiting.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Handling errors and change management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In traditional software, API failures are usually logged and errors are returned to the caller. But agents can dynamically react to errors and adjust their path toward the goal. This is a completely new paradigm and one of the advantages of agents, but existing APIs often don’t provide enough information or direction for the agent to plan how it will recover from an encountered error effectively.&lt;/p&gt;

&lt;p&gt;Likewise, having done all the work to integrate an API, your agent is tied to the current implementation. Any versioning changes or worse breaking changes will mean your shiny agent is now functionally broken. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What would good look like?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If APIs aren’t enough, what does an agent-first interface look like? Instead of rigid, predefined endpoints, future agent-to-software interfaces should be &lt;strong&gt;adaptive, declarative, and goal-oriented&lt;/strong&gt;. Rather than requiring agents to conform to static API contracts, the software itself should expose capabilities in a way that agents can &lt;strong&gt;reason about, compose, and execute dynamically&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Self-describing and discoverable interfaces&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents should not need hardcoded API specifications or to be wired up to them ahead of time. Instead, the third-party software should describe its available actions, parameters, and expected outputs in a form agents can discover. This could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema-based discovery&lt;/strong&gt; - Agents should be able to discover and call third-party systems dynamically. This means they should be able to connect, list available functionality and integrate with it, all at runtime.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution hints&lt;/strong&gt;  - Interfaces should provide execution hints beyond basic documentation. For example, “This action requires a valid session token” or “This function is expensive, use sparingly”.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained, agent-native authentication&lt;/strong&gt; - Most APIs’ authentication and authorization controls are designed around human users who sign up ahead of time. Future software-to-agent interfaces should allow an agent to securely sign up and have sensible authorization controls for what it can do. We’ve explored how we think authentication might evolve &lt;a href="https://blog.portialabs.ai/agent-auth-part-II" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
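&lt;p&gt;As a sketch of what such a self-describing interface might expose (an invented schema, not a real standard):&lt;/p&gt;

```python
# Invented capability schema combining the three ideas above: discoverable
# actions, execution hints, and machine-readable parameter descriptions.
# Nothing here is a real protocol; it illustrates the shape of the idea.

CAPABILITIES = {
    "service": "blog-platform",
    "actions": [
        {
            "name": "get_posts_since",
            "params": {"timestamp": "ISO-8601 string"},
            "returns": "list of post objects",
            "hints": ["requires a valid session token"],
        },
        {
            "name": "rebuild_search_index",
            "params": {},
            "returns": "job id",
            "hints": ["expensive, use sparingly"],
        },
    ],
}


def discover(capabilities: dict) -> list[str]:
    """What an agent would do first at runtime: list what the service can do."""
    return [action["name"] for action in capabilities["actions"]]


print(discover(CAPABILITIES))  # -> ['get_posts_since', 'rebuild_search_index']
```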

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Flexible, goal-oriented invocation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Today, callers of an API need to work out which set of APIs to call to achieve their goal. In the future, agents should be able to express intent and let the system handle how to achieve it. This would mean software that exposes declarative interfaces where the agent specifies &lt;em&gt;what&lt;/em&gt; it wants to achieve, and the system determines &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Robust error handling and introspection&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of returning opaque error codes, software should provide rich, structured feedback that agents can reason about. This could include execution hints that nudge agents towards other approaches (“This API requires X; you can get it by calling API Y”) or clear guidance if the error can’t be worked around. It would also likely involve providing additional context in the error response so that, if a human needs to be included in the resolution, simple steps can be provided to them.&lt;/p&gt;
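&lt;p&gt;A sketch of what such structured, agent-readable feedback might look like (the format is invented for illustration):&lt;/p&gt;

```python
# Invented structured-error format: instead of an opaque code, the response
# tells the agent whether it can recover, how, and what to tell a human.

error_response = {
    "error": "missing_session_token",
    "recoverable": True,
    "recovery_hint": "This API requires a session token; obtain one via POST /sessions",
    "human_steps": "Ask an admin to enable API access for this agent account.",
}


def plan_recovery(error: dict) -> str:
    """An agent branches on structured fields rather than parsing a message."""
    if error["recoverable"]:
        return f"retry after: {error['recovery_hint']}"
    return f"escalate to human: {error['human_steps']}"


print(plan_recovery(error_response))
```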

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Support for parallel execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agents benefit from running multiple actions in parallel, but this isn’t a model many APIs support. Pagination, for example, is an area where the design of the API has a big impact on how easily an agent can fetch data in parallel. Page-based APIs (GET /blogs?page=1) are easier to load in parallel than token-based ones (GET /blogs?token=1s2fsf), where the result of the current page is needed to load the next one.&lt;/p&gt;
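&lt;p&gt;A toy sketch of why page-based pagination parallelises easily (stdlib only; the endpoint shape is hypothetical):&lt;/p&gt;

```python
# Page-based pagination parallelises because every page number is known up
# front. fetch_page is a stand-in for GET /blogs?page=N.
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page: int) -> list[str]:
    """Stand-in for GET /blogs?page=N - any page can be requested independently."""
    return [f"post-{page}-{i}" for i in range(2)]


# All page numbers are known in advance, so the requests can run concurrently:
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch_page, range(1, 5)))
posts = [post for page in pages for post in page]
print(len(posts))  # -> 8

# A token-based API (GET /blogs?token=...) forces a sequential loop instead,
# because each response carries the token needed for the next request.
```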

&lt;h2&gt;
  
  
  &lt;strong&gt;The Path Forward&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To move beyond APIs, we need to rethink &lt;strong&gt;how software communicates its capabilities&lt;/strong&gt; to agents in a way that is &lt;strong&gt;flexible, interpretable, and adaptable&lt;/strong&gt;. The goal isn’t to replace APIs outright but to evolve toward a model where agents don’t just consume endpoints—they &lt;strong&gt;understand and navigate software functionality intelligently&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let’s explore some real-world approaches and frameworks pushing toward this future.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. LLM-assisted APIs, aka tools&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A feature of some agentic frameworks (like Portia AI) is to wrap the APIs that are provided to the agent in a level of LLM smarts. This gives more flexibility in how the tool is called. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the format of a timestamp in an input field has changed, the LLM can try again with the correct format, as long as the API returns a descriptive error field.
&lt;/li&gt;
&lt;li&gt;If the name of a field in the response has changed, the LLM can still use the response without needing a change to the tool definition.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_posts_since&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GetPostsInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_direct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_posts_since&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://my_hosted_blog.com/posts?since=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Raise an error for non-200 responses
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error fetching posts: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent only has to reason about function signatures rather than full API calls.
&lt;/li&gt;
&lt;li&gt;Allows some flexibility for the agent to try new approaches and recover from some errors.
&lt;/li&gt;
&lt;li&gt;The agent can handle extraction of the relevant data from the response itself. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still requires explicit coding of the functions ahead of time.
&lt;/li&gt;
&lt;li&gt;No built-in reasoning about dependencies or execution order, which is usually required when working with REST APIs.&lt;/li&gt;
&lt;/ul&gt;
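&lt;p&gt;To make the ordering problem concrete, here’s a toy sketch (hypothetical names, not Portia code): nothing in these function signatures tells an agent that a user must exist before a subscription can be created, so that knowledge has to live in the prompt or in the agent’s own reasoning.&lt;/p&gt;

```python
# Toy example (hypothetical names): the ordering constraint between these
# two calls is invisible in their signatures.
USERS: dict[str, dict] = {}

def create_user(email: str) -> str:
    user_id = f"user_{len(USERS) + 1}"
    USERS[user_id] = {"email": email, "subscription": None}
    return user_id

def create_subscription(user_id: str, plan: str) -> str:
    # Implicit dependency: the user must already exist.
    if user_id not in USERS:
        raise ValueError("user must be created before subscribing")
    USERS[user_id]["subscription"] = plan
    return f"sub_for_{user_id}"

uid = create_user("a@example.com")
print(create_subscription(uid, "pro"))  # works only in this order
```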

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Generating Code Instead of Calling APIs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of integrating directly with APIs, agents can generate and execute code using SDKs that wrap underlying functionality. This is common today for AI-assisted development, where models write API client code dynamically instead of calling endpoints directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An LLM generates a Python snippet using a cloud provider’s SDK instead of making raw API requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leverages existing SDK ecosystems (AWS SDK, Google Cloud SDK, etc.).
&lt;/li&gt;
&lt;li&gt;Gives agents more control over execution (e.g., handling exceptions, retries).
&lt;/li&gt;
&lt;li&gt;More resilient to API changes as the SDK abstracts away versioning differences.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires agents to understand and generate valid code. Depending on the goal, this might mean accurately chaining several functions within the SDK.
&lt;/li&gt;
&lt;li&gt;Performance relies on well-documented SDKs that work exactly as described.
&lt;/li&gt;
&lt;li&gt;Execution environments need to be properly sandboxed to prevent rogue code from being executed. Either you skip sandboxing, which is a security nightmare in production, or you sandbox the generated code, which is an engineering challenge and limits functionality.&lt;/li&gt;
&lt;/ul&gt;
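&lt;p&gt;As a hedged illustration of the sandboxing point: running generated code in a separate process with a timeout is about the minimum you’d do, and it is emphatically &lt;em&gt;not&lt;/em&gt; a real sandbox; production use needs OS-level isolation such as containers, seccomp, or microVMs.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout: int = 5) -> str:
    """Execute LLM-generated Python in a separate process with a timeout.
    NOTE: a subprocess only bounds runtime and isolates the interpreter;
    it is NOT a security sandbox."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        return f"Error: {result.stderr.strip()}"
    return result.stdout.strip()

print(run_generated_code("print(2 + 2)"))  # → 4
```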

&lt;h3&gt;
  
  
&lt;strong&gt;3. Computer and/or browser use&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Some agents bypass APIs entirely, interacting with software through web browsers just like humans do. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of integrating with a banking API, an agent navigates the bank’s website, logging in, clicking buttons, and extracting data from web pages.
&lt;/li&gt;
&lt;li&gt;Browser agents go beyond basic automation tools like Selenium, Playwright, and Puppeteer by leveraging multi-modal discovery of web pages and reasoning to navigate them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works even when APIs aren’t available.
&lt;/li&gt;
&lt;li&gt;Mimics real human interactions, reducing integration friction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be fragile: UI changes and new pop-ups can break automation, and they are far more common than API changes.
&lt;/li&gt;
&lt;li&gt;The approach is orders of magnitude slower than API access and far more expensive, since additional LLM calls are needed to guide each interaction.
&lt;/li&gt;
&lt;li&gt;Authentication can be a challenge: a great deal of energy has been expended over the last decade trying to prevent automation from interacting with websites (e.g. CAPTCHAs). We tinkered with browser agents a few months back. Check it out &lt;a href="https://www.linkedin.com/pulse/chatgpt-operator-out-so-whats-next-browser-agents-mounir-mouawad-eyfwe/?trackingId=2ZxzyR5S1X3SzrpDo0eTPw%3D%3D" rel="noopener noreferrer"&gt;here if you're interested(↗)&lt;/a&gt; and keep an eye out for a fresh look in the coming weeks!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Dynamically discoverable tools&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the most promising directions for agent-software interaction is Model Context Protocol (or MCP), which emphasizes dynamic tool discovery, self-registration, and agent-to-agent collaboration. Instead of hardcoded integrations, software components expose self-describing capabilities that agents can discover and use on demand. We’ve written about our experience &lt;br&gt;
&lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example" rel="noopener noreferrer"&gt;using MCP here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An agent contacts an MCP registry and lists a set of MCP servers relevant to its task. The agent can self-register with these servers using &lt;a href="https://datatracker.ietf.org/doc/html/rfc7591" rel="noopener noreferrer"&gt;OAuth Dynamic Registration&lt;/a&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once registered the agent can list all the tools a server has and call them based on the metadata supplied by them. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
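&lt;p&gt;A toy illustration of the discovery idea (this is not the real MCP wire protocol; the registry, tool names, and schemas are all made up): servers publish self-describing tool metadata, and the agent selects tools at runtime instead of being hardcoded against them.&lt;/p&gt;

```python
# Hypothetical registry: each "server" exposes self-describing tool metadata.
TOOL_REGISTRY = {
    "weather": {
        "tools": [
            {"name": "get_forecast", "description": "Fetch a forecast",
             "input_schema": {"city": "string", "days": "integer"}},
        ]
    },
    "calendar": {
        "tools": [
            {"name": "create_event", "description": "Create a calendar event",
             "input_schema": {"title": "string", "start": "string"}},
        ]
    },
}

def discover_tools(task_keywords: set[str]) -> list[dict]:
    """Return tool metadata whose description mentions any task keyword."""
    matches = []
    for server in TOOL_REGISTRY.values():
        for tool in server["tools"]:
            if task_keywords & set(tool["description"].lower().split()):
                matches.append(tool)
    return matches

print([t["name"] for t in discover_tools({"forecast"})])  # → ['get_forecast']
```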

&lt;p&gt;&lt;strong&gt;✅ Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No integration required ahead of time: thanks to the registry and dynamic registration, new tools can be integrated on the fly.
&lt;/li&gt;
&lt;li&gt;More composable and modular: tools can be combined in new ways based on the goal at hand.
&lt;/li&gt;
&lt;li&gt;Maintenance of tools is handled by the third party who owns the MCP server. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whilst we believe MCP will be a big part of the puzzle in the future, current implementations are somewhat lacking. There are few registries and authentication has only &lt;a href="https://spec.modelcontextprotocol.io/specification/2025-03-26/basic/authorization/" rel="noopener noreferrer"&gt;been added to the standard(↗)&lt;/a&gt; in the last couple of days.&lt;/li&gt;
&lt;li&gt;Even with good implementations in the future, the quality of tools will vary widely, and agents will need to identify good servers and tools to perform well.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Fmcp_market_map-c0439b6b810cc89bc971b5b0ef238d0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Fmcp_market_map-c0439b6b810cc89bc971b5b0ef238d0c.png" alt="MCP Market" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Agent &amp;lt;&amp;gt; Agent interfaces&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of imperative API calls (&lt;code&gt;POST /create_user&lt;/code&gt;), in the future declarative workflows will let agents specify what they want to achieve, and the external system will determine how to execute it. We see some natural language APIs today but true flexibility will be achieved when we can have inter-system agent handoff. &lt;/p&gt;

&lt;p&gt;Note we see this as different to intra-system agent handoffs; many multi-agent systems today allow you to have agents talking to agents, but we believe in the future the interface between systems will also be agent &amp;lt;&amp;gt; agent. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Instead of calling your billing software APIs manually, an agent would submit a high-level goal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"Ensure customer X has an active subscription and has been notified of their renewal"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The system maps this to the correct sequence of operations, handling retries and dependencies automatically.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
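&lt;p&gt;A minimal sketch of what a declarative, goal-oriented endpoint might do internally (all entity and function names here are hypothetical): the caller states the goal, and the receiving system maps it onto the required operations based on its own view of state.&lt;/p&gt;

```python
# Hypothetical billing system: the caller submits a goal, and the system
# decides which operations are needed to satisfy it.
def ensure_subscription_active(customer_id: str, state: dict) -> list[str]:
    """Map the goal 'customer has an active subscription and has been
    notified' onto the operations actually required right now."""
    operations = []
    customer = state.setdefault(
        customer_id, {"subscription": None, "notified": False}
    )
    if customer["subscription"] != "active":
        operations.append("create_subscription")
        customer["subscription"] = "active"
    if not customer["notified"]:
        operations.append("send_renewal_notification")
        customer["notified"] = True
    return operations

state = {}
print(ensure_subscription_active("customer_x", state))  # both steps needed
print(ensure_subscription_active("customer_x", state))  # → [] (goal already met)
```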

&lt;p&gt;✅ &lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shifts complexity from the agent to the software, moving the need to understand the domain logic of the problem to the party with the most context. In the above example I don’t need to understand how a user entity relates to a subscription and the set of API calls to update both.
&lt;/li&gt;
&lt;li&gt;Removes overhead of API management, versioning etc since execution logic is abstracted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Will require strong guardrails, authentication controls, and human-in-the-loop mechanisms.
&lt;/li&gt;
&lt;li&gt;This is a new paradigm in software engineering, which may limit adoption.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Bringing it all together&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At Portia, we’re excited about the future of agent-first software interfaces. The limitations of traditional APIs don’t mean the end of structured integrations, but rather the beginning of a new era—one where software exposes its capabilities in ways that agents can reason about, adapt to, and dynamically interact with.&lt;/p&gt;

&lt;p&gt;From LLM-assisted smart tools to dynamically discoverable interfaces like MCP, the path forward is clear: agents need more flexible, self-describing, securely authenticated and goal-oriented mechanisms to interact with software. &lt;/p&gt;

&lt;p&gt;We believe the best agent &amp;lt;&amp;gt; software interfaces are still ahead of us, and we’re excited to push the boundaries. If you’re thinking about these challenges too, we’d love to hear your thoughts!&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>mcp</category>
      <category>api</category>
      <category>tools</category>
    </item>
    <item>
      <title>Visualise your Obsidian notes with Qwen3</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 08 May 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/visualise-your-obsidian-notes-with-qwen3-3jj5</link>
      <guid>https://dev.to/portia-ai/visualise-your-obsidian-notes-with-qwen3-3jj5</guid>
      <description>&lt;p&gt;Many users with stringent security, privacy or latency requirements have told us they prefer to run their own LLM instances locally. We recently added support for interfacing with Ollama models running locally.&lt;/p&gt;

&lt;p&gt;To explore how we might use a local LLM practically, we decided to build an app that could turn an Obsidian note into a concept map – a visual diagram that shows how different ideas in the note are related. As an early-stage startup, we've actually been building our internal apps on top of local LLMs to keep our costs low: we use the Obsidian app in this post to visualise notes coming out of our weekly engineering design meetings!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Fmicroservices-ea0f6666ca41f7afb10f5c2e3c6d52b0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.portialabs.ai%2Fassets%2Fimages%2Fmicroservices-ea0f6666ca41f7afb10f5c2e3c6d52b0.png" alt="An example concept map for the subject of microservices." width="800" height="589"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A Microservice Concept Map&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The app reads a single note, uses Qwen3 4B to analyse its contents, extracts relationships between key concepts, and outputs a PNG file with a graph-style visualisation. The entire process runs locally, using Portia AI to handle orchestration between tools.&lt;/p&gt;
&lt;h2&gt;
  
  
  Obsidian is a great app for storing notes
&lt;/h2&gt;

&lt;p&gt;If you're serious about organising ideas, thoughts, or long-form research, &lt;a href="https://obsidian.md/" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt; is one of the best apps available. It's fast, extensible, and designed around a powerful but simple principle: your notes are plain text files that are stored locally by default (in Markdown).&lt;/p&gt;
&lt;h2&gt;
  
  
  Why local LLMs are worth using
&lt;/h2&gt;

&lt;p&gt;Most people experience language models through cloud-based APIs like ChatGPT or Claude. These are large, powerful models hosted on someone else's infrastructure. But for many developer workflows – especially apps that run against your own local data – there's a strong case for running smaller models directly on your own machine. The Portia AI SDK supports all models that can be run by Ollama.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tradeoff&lt;/th&gt;
&lt;th&gt;Local LLMs&lt;/th&gt;
&lt;th&gt;Hosted LLMs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Everything stays on your computer&lt;/td&gt;
&lt;td&gt;All prompts and data are sent over the internet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;Near-instant results&lt;/td&gt;
&lt;td&gt;Slower, network-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Free after setup&lt;/td&gt;
&lt;td&gt;Pay-per-token or subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control&lt;/td&gt;
&lt;td&gt;Full control over the model and environment&lt;/td&gt;
&lt;td&gt;Limited access to fine-tuning or weights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy and context&lt;/td&gt;
&lt;td&gt;Smaller context windows, less precision&lt;/td&gt;
&lt;td&gt;Larger, more capable, and usually more accurate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup complexity&lt;/td&gt;
&lt;td&gt;Requires installation and some configuration&lt;/td&gt;
&lt;td&gt;Works out of the box with API key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Why we chose Qwen3
&lt;/h2&gt;

&lt;p&gt;We decided to use the &lt;a href="https://qwenlm.github.io/blog/qwen3/" rel="noopener noreferrer"&gt;Qwen3&lt;/a&gt; family of models from Alibaba’s open-source LLM line. These models are trained with multilingual capabilities and perform well even at smaller sizes.&lt;/p&gt;

&lt;p&gt;The Qwen3 4B model in particular offers a nice balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can run comfortably on machines with under 10GB of VRAM.&lt;/li&gt;
&lt;li&gt;It’s available through Ollama, which makes setup simple.&lt;/li&gt;
&lt;li&gt;It’s reasonably competent at task execution and factual recall, especially for structured prompts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, Qwen3 4B can and will make mistakes, especially when extracting subtle relationships or summarising long-form content. That’s a tradeoff we accept for speed and local control.&lt;/p&gt;
&lt;h2&gt;
  
  
  What does this app &lt;em&gt;do&lt;/em&gt;?
&lt;/h2&gt;

&lt;p&gt;Let's talk about how to run the app, and what it does, before going into the code.&lt;/p&gt;

&lt;p&gt;You can run the app with &lt;code&gt;uv run main.py NOTE&lt;/code&gt; where &lt;code&gt;NOTE&lt;/code&gt; should be the name of one of your notes in an Obsidian vault. In the provided Obsidian vault, there's a note called &lt;code&gt;DDD&lt;/code&gt;, all about Domain Driven Design. So you could call the app with &lt;code&gt;uv run main.py DDD&lt;/code&gt;.&lt;/p&gt;
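&lt;p&gt;The argument handling might look something like this sketch (hypothetical; the real &lt;code&gt;main.py&lt;/code&gt; in the repo may differ):&lt;/p&gt;

```python
import argparse

# One positional argument naming the Obsidian note to visualise.
parser = argparse.ArgumentParser(
    description="Turn an Obsidian note into a concept map"
)
parser.add_argument("note", help="Name of a note in your Obsidian vault, e.g. DDD")

# Simulating `uv run main.py DDD`; normally you would call parse_args()
# with no arguments so it reads sys.argv.
args = parser.parse_args(["DDD"])
print(args.note)  # → DDD
```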

&lt;p&gt;The app will then configure an explicit plan, consisting of the following steps, using Portia’s plan builder:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;List all available vaults&lt;/td&gt;
&lt;td&gt;MCP Tool call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fetch the note from the obsidian vaults&lt;/td&gt;
&lt;td&gt;MCP Tool call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create a concept map visualization using the extracted relationships&lt;/td&gt;
&lt;td&gt;Custom visualisation Tool&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We chose to configure an explicit plan rather than rely on Portia's planning agent because tests showed Qwen3 4B to be unreliable at planning. In this case, the plan is always going to be the same, so it makes sense to outline it explicitly in code using the PlanBuilder interface.&lt;/p&gt;

&lt;p&gt;The plan is passed to Portia, which has been configured to use Qwen3 4B via the Ollama interface. (We'll show you that below.)&lt;/p&gt;
&lt;h3&gt;
  
  
  What is Ollama?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is a free, open-source app that allows you to run large language models (LLMs) locally on your computer or a server. It currently supports &lt;a href="https://www.ollama.com/library" rel="noopener noreferrer"&gt;30 different models&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Diving into the code
&lt;/h2&gt;

&lt;p&gt;Let's take a deeper look at the app, and what the code looks like. All the code is available in our &lt;a href="https://github.com/portiaAI/portia-agent-examples/tree/local-llm/local-llm" rel="noopener noreferrer"&gt;example code GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're not going to cover every line of code in this project. If you want to see all of the code, check out the full code example. We'll guide you through all the important code below.&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuring Portia to use a local LLM
&lt;/h3&gt;

&lt;p&gt;Portia supports running local LLMs via &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;. At the moment, that’s over 30 models, including Meta’s Llama Series, the Qwen series of models that we’re using here, and many others. The only requirement is that when you specify it in code, the model name begins with "ollama/" and then the specifier for the model you wish to run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;default_log_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;default_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama/qwen3:4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;execution_agent_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ExecutionAgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ONE_SHOT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/local-llm/local-llm/main.py#L117-L121" rel="noopener noreferrer"&gt;This code in GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Portia offers two types of execution agents that take care of executing a step. The &lt;a href="https://github.com/portiaAI/portia-sdk-python/blob/main/portia/execution_agents/default_execution_agent.py" rel="noopener noreferrer"&gt;DEFAULT&lt;/a&gt; agent parses and verifies the tool’s arguments before the tool is called, reducing hallucinated or made-up values. This is recommended for complex tasks and for tools with complex parameters (defaults etc.). The &lt;a href="https://github.com/portiaAI/portia-sdk-python/blob/main/portia/execution_agents/one_shot_agent.py" rel="noopener noreferrer"&gt;ONE_SHOT&lt;/a&gt; agent is faster and more cost-efficient when the tool call is simple. We generally recommend &lt;code&gt;ONE_SHOT&lt;/code&gt; for smaller models (like Qwen3 4B), as our default agent is optimised for larger, more capable models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding the required tools
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://github.com/StevenStavrakis/obsidian-mcp" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; already exists for Obsidian, and luckily Portia makes our lives much easier by supporting MCP out-of-the-box! The code below installs and runs &lt;code&gt;obsidian-mcp&lt;/code&gt; locally via &lt;code&gt;npx&lt;/code&gt;. The visualisation tool is part of this project, and is included in the code-base. (We'll tell you more about the visualisation tool in a moment.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;obsidian_mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;McpToolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_stdio_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;server_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OBSIDIAN_VAULT_PATH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add all tools to the registry
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obsidian_mcp&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nc"&gt;ToolRegistry&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;VisualizationTool&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;

&lt;span class="n"&gt;portia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Portia&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/local-llm/local-llm/main.py#L124-L140" rel="noopener noreferrer"&gt;This code in GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the tools have been configured, they're passed to Portia's constructor, along with the required configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vibe coding a visualisation tool 🏄🏻‍♂️
&lt;/h3&gt;

&lt;p&gt;Because we're on-trend at Portia, we decided to vibe-code the visualisation component, which renders concept maps from the relationships extracted in each note. Omar wrote the visualisation tool quickly, guided more by intuition and immediate usefulness (and a dash of Spidey sense) than formal design specs.&lt;/p&gt;

&lt;p&gt;It was an ideal candidate for this approach because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The requirements were loose: "make a diagram that looks decent"&lt;/li&gt;
&lt;li&gt;It could be tested easily and repeatedly with mock data&lt;/li&gt;
&lt;li&gt;The failure modes (e.g. cluttered layout, hard-to-read arrows) were visual and obvious&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was fun to build, and it works reliably, but we don't recommend using this code in production, and we're not going to talk about it here! One does not simply vibe code one’s way into production.&lt;/p&gt;
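&lt;p&gt;For a flavour of the idea, though, here’s a minimal, hypothetical sketch (not the actual tool) that turns extracted concept relationships into Graphviz DOT text, which the &lt;code&gt;dot&lt;/code&gt; command can render to a PNG:&lt;/p&gt;

```python
# Hypothetical sketch: relationships come in as (source, label, target)
# triples and go out as Graphviz DOT text.
def to_dot(relationships: list[tuple[str, str, str]]) -> str:
    lines = ["digraph concept_map {", '  rankdir="LR";']
    for source, label, target in relationships:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot([
    ("Microservices", "communicate via", "APIs"),
    ("Microservices", "deployed in", "Containers"),
])
print(dot)  # pipe this into `dot -Tpng -o concept_map.png`
```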

&lt;h3&gt;
  
  
  Making a plan
&lt;/h3&gt;

&lt;p&gt;We've designed a plan explicitly for completing this task. The PlanBuilder interface is useful when you want to implement a simple and/or repeatable plan. It's also useful when the underlying LLM is not strong at planning tasks, as is the case in this example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Teaching Portia’s planning agent
&lt;/h3&gt;

&lt;p&gt;Another option to increase reliability is to use Portia's new &lt;a href="https://blog.portialabs.ai/improve-planning-with-user-led-learning" rel="noopener noreferrer"&gt;User Led Learning&lt;/a&gt; feature to guide future planning in the right direction.&lt;/p&gt;

&lt;p&gt;The following code can be found in the &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/local-llm/local-llm/main.py#L25" rel="noopener noreferrer"&gt;&lt;code&gt;create_plan_local&lt;/code&gt;&lt;/a&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="nc"&gt;PlanBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a concept map image from the note with title &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;note_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List all available vaults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp:obsidian:list_available_vaults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetch the note named &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;note_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; from the obsidian vaults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp:obsidian:read_note&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a concept map visualization using the extracted relationships. Title the image &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;note_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; and output the image to the directory &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OBSIDIAN_VAULT_PATH&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/visualizations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;visualization_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/local-llm/local-llm/main.py#L41-L59" rel="noopener noreferrer"&gt;This code in GitHub&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Putting it all together
&lt;/h3&gt;

&lt;p&gt;And, honestly, that's kind of &lt;em&gt;it&lt;/em&gt;. This plan can be passed to Portia's &lt;code&gt;run_plan&lt;/code&gt; method, and Qwen will read the Obsidian note specified with &lt;code&gt;args.note&lt;/code&gt;, generate a concept map, and add it to a visualisations directory in your Obsidian vault!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_plan_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;portia&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;note&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;portia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/local-llm/local-llm/main.py#L136-L138" rel="noopener noreferrer"&gt;This code in GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The quality of the results can be a little varied, depending on the source material, and how the Qwen3 model is feeling when you run it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What did we learn?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Qwen3 4b is very capable, given its small size and modest resource requirements.&lt;/li&gt;
&lt;li&gt;It can be unreliable at planning, and sometimes even at tool-call generation, in cases that require large inputs.&lt;/li&gt;
&lt;li&gt;Planning issues can be avoided if you are able to explicitly design a plan or use &lt;a href="https://blog.portialabs.ai/improve-planning-with-user-led-learning" rel="noopener noreferrer"&gt;User Led Learning&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Our &lt;code&gt;ONE_SHOT&lt;/code&gt; agent is catching up to the &lt;code&gt;DEFAULT_AGENT&lt;/code&gt; in accuracy over time, thanks to the fast-paced improvement of the underlying models. We’re constantly evaluating the performance of both agents (on OpenAI 4o and the latest Claude 3.5). The &lt;code&gt;DEFAULT_AGENT&lt;/code&gt; is still better at resolving more complex tasks that require tools with lots of parameters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are many reasons you might want to run local models, and they come with upsides and downsides. Ultimately, whether you use something like Qwen3 locally or a larger model remotely comes down to your own requirements and what suits them best.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;First, you should definitely give our &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;SDK Repo on GitHub&lt;/a&gt; a ⭐️!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you enjoyed this post, check out our other post on &lt;a href="https://blog.portialabs.ai/improve-planning-with-user-led-learning" rel="noopener noreferrer"&gt;User Led Learning&lt;/a&gt;, a Portia feature that can dramatically increase the reliability of your agent's planning.&lt;/li&gt;
&lt;li&gt;If you want to build agents that can interact with websites, check out our most recent post on &lt;a href="https://dev.to/portia_ai_mark/a-unified-framework-for-browser-and-api-authentication-4a8h-temp-slug-651855"&gt;local and remote browser integration&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>local</category>
      <category>smallmodels</category>
      <category>qwen</category>
    </item>
    <item>
      <title>A unified framework for browser and API authentication</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 01 May 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/a-unified-framework-for-browser-and-api-authentication-1jf7</link>
      <guid>https://dev.to/portia-ai/a-unified-framework-for-browser-and-api-authentication-1jf7</guid>
      <description>&lt;p&gt;The core of the Portia authorization framework is the ability for an agent to pause itself to solicit a user's authorization for an action it wants to perform. With delegated OAuth, we do this by creating an OAuth link that the user clicks on to grant Portia a token that can be used for the API requests made by the agent. We generally like API based agents for reliability reasons – they're fast, predictable and the rise of MCP means integration is getting easier.&lt;/p&gt;

&lt;p&gt;However, there are some actions which are not easily accessible by API (my supermarket doesn't have a delegated OAuth flow surprisingly!), and so, there is huge power in being able to switch seamlessly between browser based and API based tasks. The question was, how to do this consistently and securely with our authorization framework.&lt;/p&gt;

&lt;p&gt;With OAuth, authorization is done via a token. The protocol for obtaining it has been solidified over many years, but fundamentally, if you have access to the token, you have access to the API. With browser-based auth, authentication is baked into the browser itself using cookies or local storage. Then you layer on bot protections: 20 years of sophistication have gone into detecting nefarious bots, which is how an agent looks to a website, irrespective of whether the agent is actually doing something useful.&lt;/p&gt;

&lt;p&gt;Luckily, the age of agents means various players are rethinking this, and we found a browser infrastructure provider called BrowserBase. Their product allows the creation of browser sessions with a lot of the components we needed to make this work. Combining &lt;a href="https://www.browserbase.com/" rel="noopener noreferrer"&gt;&lt;code&gt;BrowserBase&lt;/code&gt;&lt;/a&gt; with &lt;a href="https://browser-use.com/" rel="noopener noreferrer"&gt;&lt;code&gt;BrowserUse&lt;/code&gt;&lt;/a&gt; for goal-oriented tasks and the Portia framework for authorization means we can offer our developers a paradigm very similar to that of OAuth-based tools.&lt;/p&gt;

&lt;p&gt;The video below shows an agent we built using a combination of API and browser-based tasks to accomplish LinkedIn outreach.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/hSq8Ww-hagg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool creation
&lt;/h2&gt;

&lt;p&gt;With Portia, users can add browser-based capabilities to their applications with one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;PortiaToolRegistry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;BrowserToolForUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.linkedin.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This indicates to the planner that it can navigate to the LinkedIn website to achieve the overall user goal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browser agent
&lt;/h2&gt;

&lt;p&gt;When execution gets to a step requiring the browser, we create a session, either locally or remotely (using BrowserBase). The browser agent then navigates the website to achieve the step task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljtak8va4mj7e1b6sz2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljtak8va4mj7e1b6sz2n.png" alt="a screenshot of the browser, running in debug mode, with various elements of the page highlighted in different colours"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;p&gt;The browser agent is instructed to return from its task if it encounters authentication. We then produce an ActionClarification containing a link the user clicks to perform the authentication action. If the developer is using our end-user concept for scalable auth, each end user has a unique session on BrowserBase and a unique login URL. They can then log in, and their cookies are saved remotely using BrowserBase's secure cookie store.&lt;/p&gt;
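&lt;p&gt;The pause-and-resume flow described above can be sketched generically. The class and function names below are illustrative stand-ins of our own, not the Portia SDK's actual API:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class ActionClarification:
    """Surfaced to the user: a link to click to complete an action (e.g. log in)."""
    action_url: str
    resolved: bool = False

@dataclass
class BrowserStep:
    task: str
    needs_auth: bool = True  # simulate the agent hitting a login wall

def run_step(step: BrowserStep, clarifications: list) -> str:
    # If the browser agent hits an auth wall, it returns from its task
    # and a clarification is raised for the human to resolve.
    if step.needs_auth:
        clarifications.append(
            ActionClarification(action_url="https://example.com/login-session")
        )
        return "PAUSED"
    return "DONE"

def resume(step: BrowserStep, clarifications: list) -> str:
    # After the user clicks the link and authenticates, the session cookies
    # persist, so the step can be re-run without hitting the wall again.
    for c in clarifications:
        c.resolved = True
    step.needs_auth = False
    return run_step(step, clarifications)

clarifs: list = []
step = BrowserStep("Send LinkedIn connection request")
print(run_step(step, clarifs))  # PAUSED
print(resume(step, clarifs))    # DONE
```

The key design point is that the agent never handles credentials itself; it only pauses, hands control to a human, and resumes once the session is authenticated.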

&lt;h2&gt;
  
  
  API or Browser Agents: Who wins?
&lt;/h2&gt;

&lt;p&gt;In the rapidly evolving world of agents, we frequently get asked how to think about browser- vs API-based agents. Our general rule of thumb is 'use API-based tools when available', but here's a quick comparison between the two:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Browser agents&lt;/th&gt;
&lt;th&gt;API agents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Much slower – can require 10-100x the LLM calls vs API based agents&lt;/td&gt;
&lt;td&gt;Faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Much more expensive&lt;/td&gt;
&lt;td&gt;Cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Less predictable in terms of task completion, but more likely to succeed on retry. Possible to get blocked by bot protections&lt;/td&gt;
&lt;td&gt;Predictable. Requires more investment to create the tools that can be used in agentic systems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Types of tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exploratory research tasks. Small, well-defined tasks between API-based tasks&lt;/td&gt;
&lt;td&gt;Larger tasks, particularly those involving data processing and linking multiple systems together. Use whenever available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What about those bot protections?
&lt;/h2&gt;

&lt;p&gt;When we first started working on this feature, I assumed we would nearly always be blocked by bot protections. Thankfully, this frequently turned out not to be the case: on many websites, if you prove that there is genuinely a human in the loop at the point of authentication, your agent can proceed, though the 2FA or multi-factor checks the human needs to complete are often harder than when browsing the web as a human. Some websites still have more fundamental infrastructure blocks, but the approach many seem to be taking, authorizing as long as it's genuinely on behalf of a human, feels balanced and appropriate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>authentication</category>
      <category>agents</category>
      <category>browseragents</category>
    </item>
    <item>
      <title>A deep dive into our “User Led Learning” feature</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 17 Apr 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/a-deep-dive-into-our-user-led-learning-feature-bj6</link>
      <guid>https://dev.to/portia-ai/a-deep-dive-into-our-user-led-learning-feature-bj6</guid>
      <description>&lt;p&gt;At Portia, we believe building agents for production means &lt;strong&gt;balancing AI autonomy with human control&lt;/strong&gt; – something we call the ‘spectrum of autonomy’. We have previously seen how clarifications can be used during plan runs to handle the human:agent interface. With our new &lt;em&gt;User Led Learning&lt;/em&gt; feature, we’re bringing this level of feedback into the planning process as well.&lt;/p&gt;

&lt;p&gt;Developers now have a powerful way to shape the Planning agent’s behavior—without rewriting prompts or tweaking models. When you generate a plan using the Portia AI SDK, that plan can be stored in the Portia cloud where it can be highlighted as a preferred plan with a simple thumbs-up. Each “like” tells the Portia planning agent, &lt;em&gt;this was a good plan for this type of user intent&lt;/em&gt;—and over time, those signals help planning agents make better decisions on their own. It’s a subtle but powerful shift along the spectrum of autonomy: agents become more capable and self-directed, while still staying grounded in what users actually want.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsrzjgpvs8j80gplby1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsrzjgpvs8j80gplby1e.png" alt="A diagram showing an overview of Portia AI's planning and execution stages." width="800" height="528"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Portia AI’s overview&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What does User-Led Learning solve?
&lt;/h2&gt;

&lt;p&gt;We see three areas where the tension between AI autonomy and the predictability users want is at its peak:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Conversational user experiences:&lt;/strong&gt; When you interact with an agent, your instructions are usually written in natural language. That’s great for usability, but tough for precision. Human language is full of &lt;strong&gt;implications&lt;/strong&gt;. You might say “send a message,” but really mean “send a WhatsApp message.” Or you might expect the agent to summarize the message afterward – even if you never said that out loud. These gaps between what’s said and what’s &lt;em&gt;meant&lt;/em&gt; are where agents can go off-course.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context specific workflows:&lt;/strong&gt; A business might want to automate a complex set of steps that are specific to their data collection pipelines for example. This isn’t something an LLM can reason about reliably based on their pre-training data. For example, every business may have their own workflows for completing KYC / KYB processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;App specific tool chaining complexity:&lt;/strong&gt; Some apps require chaining several tools together but the tool descriptions and arguments are not sufficient to ensure the LLM chains them in the right sequence with high reliability across production-scale volumes of agentic workflows. For example, sending an email may require an id which first must be retrieved by mapping to your email address.&lt;/li&gt;
&lt;/ol&gt;
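&lt;p&gt;The email example in point 3 can be sketched as a minimal two-step chain. The functions below are toy stand-ins of our own, not real tool implementations: the send tool only accepts an id, so a lookup step must run first, and the planner has to get that ordering right every time.&lt;/p&gt;

```python
# Toy directory mapping email addresses to internal user ids
_DIRECTORY = {"ada@example.com": "user_42"}

def resolve_user_id(email: str) -> str:
    """Step 1: the id must be looked up before any send can happen."""
    return _DIRECTORY[email]

def send_email(user_id: str, body: str) -> str:
    """Step 2: the send tool only accepts an id, not an email address."""
    return f"sent to {user_id}: {body}"

# Correct chaining: the planner must sequence resolve -> send.
# Calling send_email with the raw address would fail silently or error.
result = send_email(resolve_user_id("ada@example.com"), "Hello!")
print(result)  # sent to user_42: Hello!
```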

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F996dh0pcjah97yjrbvnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F996dh0pcjah97yjrbvnn.png" alt="A meme captured from the movie Anchorman, with the caption 'LLMs ... 60% of the time, they work every time.'" width="671" height="372"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;They work every time, sometimes.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  So how does it work?
&lt;/h2&gt;

&lt;p&gt;By surfacing the plans you like, you give the LLM guidance about your preferences so it can bias towards them when it encounters user prompts with similar intent. At a high level this process involves the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You can “Like” plans saved to Portia Cloud from the dashboard to signal that they are your “ground truth”.&lt;/li&gt;
&lt;li&gt;You can then pull “Liked” plans based on semantic similarity to the user intent in a query by using our freshly minted &lt;code&gt;portia.storage.get_similar_plans&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;Finally, you can ingest those similar plans as example plans in the Planning agent using the &lt;code&gt;portia.plan&lt;/code&gt; method’s &lt;code&gt;example_plans&lt;/code&gt; property.&lt;/li&gt;
&lt;/ol&gt;
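&lt;p&gt;Step 2 above is a semantic-similarity lookup. As an illustrative sketch only, with toy embedding vectors and helper names of our own (not the SDK's &lt;code&gt;get_similar_plans&lt;/code&gt; implementation), ranking liked plans against a query could work like this:&lt;/p&gt;

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def get_similar(query_vec: list, liked_plans: list, top_k: int = 2) -> list:
    """liked_plans: (plan_name, embedding) pairs, pre-filtered to 'liked' only."""
    ranked = sorted(liked_plans, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy embeddings standing in for real ones from an embedding model
liked = [
    ("refund_plan", [0.9, 0.1, 0.0]),
    ("outreach_plan", [0.1, 0.9, 0.0]),
    ("report_plan", [0.2, 0.2, 0.9]),
]
print(get_similar([0.8, 0.2, 0.1], liked, top_k=1))  # ['refund_plan']
```

The retrieved plans are then passed to the planner as examples, biasing it towards the step orderings users have already endorsed.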

&lt;p&gt;Let’s take a scenario that we’ve written about before – &lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example" rel="noopener noreferrer"&gt;building a refund agent&lt;/a&gt; to process refund requests. This usually results in a relatively complex plan - usually around nine steps broadly covering the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the refund policy and customer request from file&lt;/li&gt;
&lt;li&gt;Use LLM smarts to assess the request&lt;/li&gt;
&lt;li&gt;Request human approval&lt;/li&gt;
&lt;li&gt;Process the refund through 3 Stripe interactions – find the customer ID, find the relevant payment for that customer, create the refund.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Without user-led learning: Improving reliability through painstaking prompt engineering
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Getting set up
&lt;/h3&gt;

&lt;p&gt;The code blocks in this post are available for you in our &lt;a href="https://github.com/portiaAI/portia-agent-examples/tree/main/improving-planning-with-ull" rel="noopener noreferrer"&gt;examples repository on GitHub&lt;/a&gt;. Make sure you have followed the steps in the README to get set up correctly, including minting a Stripe test API key and installing project dependencies.&lt;/p&gt;

&lt;p&gt;With the kind of multi-step plan we’re looking at here, the quality of your prompt is very important. For example, take the &lt;code&gt;vague_prompt&lt;/code&gt; in the code snippet below (accessible in our examples repository on GitHub in the file &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/main/improving-planning-with-ull/01_ull_vague_prompt_no_examples.py" rel="noopener noreferrer"&gt;01_ull_vague_prompt_no_examples.py&lt;/a&gt;). We found that such a relatively vague prompt resulted in the correct order of steps only 82% of the time. The LLM would sometimes get mixed up in the ordering of Stripe interactions, or omit one of them, e.g. it would skip loading payment intents for the customer. At times it would even skip the critical step of requesting human approval!&lt;/p&gt;
&lt;h3&gt;
  
  
  01_ull_vague_prompt_no_examples.py
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_portia&lt;/span&gt;

&lt;span class="c1"&gt;# Define the prompts for testing
&lt;/span&gt;&lt;span class="n"&gt;vague_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Read the refund request email from the customer and decide if it should be approved or rejected.
If you think the refund request should be approved, check with a human for final approval and then process the refund.

To process the refund, you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll need to find the customer in Stripe and then find their payment intent.

The refund policy can be found in the file: ./refund_policy.txt

The refund request email can be found in &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; file
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# This is function initializes Portia and all the tools.
# You can find it in common.py
&lt;/span&gt;&lt;span class="n"&gt;portia_instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_portia&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Generate a plan and print it out.
# 18% of the time, steps will be in the wrong order!
&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;portia_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vague_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_print&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We can improve the results of this by spending more time on the prompt, being more prescriptive about what we want to be done and when. Here’s a better prompt we had arrived at after some prompt engineering and eval running, which you can run a few times from &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/main/improving-planning-with-ull/02_ull_good_prompt_no_examples.py" rel="noopener noreferrer"&gt;02_ull_good_prompt_no_examples.py&lt;/a&gt; in our examples repository on GitHub:&lt;/p&gt;

&lt;p&gt;A more prescriptive prompt&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read the refund request email from the customer and decide if it should be approved or rejected.
If you think the refund request should be approved, check with a human for final approval and then process the refund.

Stripe instructions -- To create a refund in Stripe, you need to:
* Find the Customer using their email address from the List of Customers in Stripe.
* Find the Payment Intent ID using the Customer from the previous step, from the List of Payment Intents in Stripe.
* Create a refund against the Payment Intent ID.

The refund policy can be found in the file: ./refund_policy.txt

The refund request email can be found in "inbox.txt" file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above prompt has better results because it breaks down the steps and lists them in order. This &lt;em&gt;significantly&lt;/em&gt; increases reliability – in this case when we tested it, the generated plans were correct 94% of the time – that’s a 12% improvement (percentage points that is 🧐)! But this has two issues. Firstly, there’s still a 6% error rate – not terrible, but not perfect, and secondly, it’s very prescriptive. Instead of giving the agent the autonomy to do the planning for us, we’re pretty much having to program the plan ourselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enter user-led learning: home in on good plans and let Portia do the rest
&lt;/h3&gt;

&lt;p&gt;With user-led learning, the first thing we did was to optimise how the Planning agent uses example plans. Secondly, we wanted to capture reinforcing signals from end users as they continuously run plans in production, so that the example plans fed back to the Planning agent reflect the latest workflows in a particular context. So we introduced the ability to “like” plans to signal preferred outcomes, and then let the Planning agent pull the most semantically relevant “liked” plans to use as example plans.&lt;/p&gt;

&lt;p&gt;To bring this to life, let’s first simulate the process of a satisfactory &lt;code&gt;Plan&lt;/code&gt; being created and run several times. In a real-world scenario, you would be using Portia Cloud (i.e. have the PORTIA_API_KEY env variable set, such that the default storage class for all your configs is CLOUD). All plans and plan runs generated would be automatically saved to the cloud and accessible in the dashboard. So outside of this exercise, you would just visit the dashboard and “like” your favourite plans as they emerge! For now, we’re going to create a plan using the &lt;a href="https://docs.portialabs.ai/SDK/portia/plan#planbuilder-objects" rel="noopener noreferrer"&gt;&lt;code&gt;PlanBuilder&lt;/code&gt;&lt;/a&gt; and save it to Portia Cloud so we can then like it from the dashboard. Here’s the code (accessible in &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/main/improving-planning-with-ull/04_ull_create_example_plans.py" rel="noopener noreferrer"&gt;04_ull_create_example_plans.py&lt;/a&gt; in our examples repository on GitHub). Notice the subtle prompt (and therefore plan!) differences we're introducing.&lt;/p&gt;

&lt;h3&gt;
  
  
  04_ull_create_example_plans.py
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;portia.plan&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PlanBuilder&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_portia&lt;/span&gt;

&lt;span class="c1"&gt;# Create example plans for refund processing
&lt;/span&gt;&lt;span class="n"&gt;example_plans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c1"&gt;# Example 1: Create refund given user email
&lt;/span&gt;&lt;span class="n"&gt;plan1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;PlanBuilder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a refund for a customer with email john.doe@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the customer in Stripe by email john.doe@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp:stripe:list_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$customer_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract customer ID from response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract_customer_id_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$customer_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find payment intents for the customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp:stripe:list_payment_intents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$payment_intents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract payment intent ID from response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract_payment_intent_id_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$payment_intent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$payment_intents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create the refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp:stripe:create_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$refund_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$payment_intent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In the sample code, we add two more example plans here.
&lt;/span&gt;
&lt;span class="n"&gt;portia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_portia&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;portia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Plans saved in Portia cloud storage.

Now you should go to the Portia dashboard and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; them.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once these plans have been saved, you can go to the &lt;a href="https://app.portialabs.ai/dashboard/plans" rel="noopener noreferrer"&gt;Plans page&lt;/a&gt; on the Portia dashboard and click on the thumbs up next to the three plans you just created. (They’ll be on the last page of the list of plans.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsvlh3cl4bnb48rzibs9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsvlh3cl4bnb48rzibs9.png" alt="A screenshot of the Plans page in the Portia dashboard, showing three plans with green filled thumbs-up icons, showing that they've been approved." width="800" height="100"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Approving plans in the Portia dashboard&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The next step is to use the &lt;code&gt;portia.storage.get_similar_plans&lt;/code&gt; method to match a user prompt to the preferred plans. Given the semantic similarity between the vague prompt we introduced in this post and the prompts used to create our three preferred plans, we expect &lt;code&gt;get_similar_plans&lt;/code&gt; to retrieve all three of them. Note that this method also lets you tune the similarity threshold and cap the number of similar plans retrieved. To see how this comes together, make sure you have liked the plans created above in the dashboard, then run the code below (accessible in &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/main/improving-planning-with-ull/05_ull_vague_with_examples.py" rel="noopener noreferrer"&gt;05_ull_vague_with_examples.py&lt;/a&gt; in our examples repository on GitHub).&lt;/p&gt;

&lt;h3&gt;
  
  
  05_ull_vague_with_examples.py
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;common&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_portia&lt;/span&gt;

&lt;span class="c1"&gt;# Define the prompts for testing
&lt;/span&gt;&lt;span class="n"&gt;vague_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Read the refund request email from the customer and decide if it should be approved or rejected.
If you think the refund request should be approved, check with a human for final approval and then process the refund.

To process the refund, you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll need to find the customer in Stripe and then find their payment intent.

The refund policy can be found in the file: ./refund_policy.txt

The refund request email can be found in &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; file
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;portia_instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_portia&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;example_plans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;portia_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_similar_plans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vague_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No example plans were found in Portia storage. Did you remember to create and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; the plans from the previous step?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; similar plans were found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;portia_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;vague_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;example_plans&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_print&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
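&lt;p&gt;Under the hood, matching works on semantic similarity between your prompt and the prompts of your “liked” plans. As a rough, self-contained illustration of the retrieval idea only (a toy bag-of-words model, not Portia’s actual implementation; the &lt;code&gt;threshold&lt;/code&gt; and &lt;code&gt;limit&lt;/code&gt; parameter names here are hypothetical):&lt;/p&gt;

```python
# Toy illustration of similarity-based plan retrieval (NOT Portia's actual
# implementation): rank stored plan prompts against a user prompt using
# bag-of-words cosine similarity, keep matches above a threshold, cap the count.
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def get_similar_plans_toy(prompt, stored, threshold=0.2, limit=3):
    # Sort stored plan names by similarity to the prompt, most similar first.
    ranked = sorted(stored, key=lambda name: cosine_similarity(prompt, stored[name]), reverse=True)
    return [name for name in ranked if cosine_similarity(prompt, stored[name]) >= threshold][:limit]


stored_plans = {
    "refund_plan": "process a refund request from a customer email using stripe",
    "weather_plan": "fetch tomorrow's weather forecast for london",
}
print(get_similar_plans_toy("read the refund request email and process the refund", stored_plans))
```

&lt;p&gt;A production system would use real semantic embeddings rather than word counts, but the threshold-and-limit shape of the retrieval is the idea being illustrated.&lt;/p&gt;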



&lt;p&gt;To give you a sense of the reliability gains you can achieve with user-led learning, check out the chart below. We were able to increase plan reliability from 82% to 98% with just a single example plan, and two examples were enough to achieve 100% reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sbtwjjssma6ja56l53k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sbtwjjssma6ja56l53k.png" alt="A chart showing that zero approved plans results in 82% success rate, whereas two and three approved plans result in 100%, in a test of 50 plans generated." width="738" height="457"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Reliability improves quickly with only a few approved plans.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In conclusion
&lt;/h2&gt;

&lt;p&gt;User-led learning allows you to bias your Planning agent towards previous plan runs you consider a success. You simply “like” plans you like (doh!) amongst those you have saved in the Portia dashboard. Portia can then fetch the most semantically similar plans to a user prompt and load those as example plans to help steer the Planning agent. Et voilà!&lt;/p&gt;

&lt;p&gt;Give this feature a try and let us know how you find it. And as always, please show us some love if you like our content by giving our SDK a star ⭐️ over on &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>planning</category>
      <category>agents</category>
    </item>
    <item>
      <title>More features for your production agent … and a fundraising announcement</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Wed, 16 Apr 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/more-features-for-your-production-agent-and-a-fundraising-announcement-184f</link>
      <guid>https://dev.to/portia-ai/more-features-for-your-production-agent-and-a-fundraising-announcement-184f</guid>
      <description>&lt;p&gt;&lt;a href="https://blog.portialabs.ai/we-are-live" rel="noopener noreferrer"&gt;We came out of stealth a few weeks ago&lt;/a&gt;. Since then we’ve been working with our first few design partners on developing their production agents and have been heads down building out our SDK to solve their problems. &lt;strong&gt;To equip us with enough runway to grow, we’ve also been lucky enough to raise £4.4 million from some of the best investors we could ever hope for: &lt;a href="https://www.generalcatalyst.com/" rel="noopener noreferrer"&gt;General Catalyst&lt;/a&gt; (lead), &lt;a href="https://www.firstminute.capital/" rel="noopener noreferrer"&gt;First Minute Capital&lt;/a&gt;, &lt;a href="https://stemai.vc/" rel="noopener noreferrer"&gt;Stem AI&lt;/a&gt; and some outstanding angel investors 🚀&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this post we want to give you a sense of what’s coming over the next couple of months.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j5xv3ldy5kz9e7wd5lm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0j5xv3ldy5kz9e7wd5lm.png" alt="A diagram showing an overview of Portia AI's planning and execution stages." width="800" height="528"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Portia AI’s overview&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  If you're new here
&lt;/h2&gt;

&lt;p&gt;Portia AI is an open source SDK with a cloud component, focused on making it easy for developers to build agents in production. Our three pillars are predictability, controllability and authentication. What does this even mean and why does it matter?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Predictability: AI agents are attractive because they leverage LLM reasoning to offer a degree of autonomous decision making. Yet many pilot projects have fallen short of moving to production because they lack preemptive visibility into agents’ behavior. With our Planning agent, developers and/or their end users can pre-express and iterate on the intended course of action of the LLM (&lt;a href="https://docs.portialabs.ai/generate-plan" rel="noopener noreferrer"&gt;Plan&lt;/a&gt;) before execution begins.&lt;/li&gt;
&lt;li&gt;Controllability: Many companies we spoke to are concerned that existing options do not offer the ability to monitor agents’ progress or intervene when needed. These limitations are especially critical in regulated industries such as Financial Services, or in end-user-facing applications where compliance, auditability, and trust are non-negotiable. Portia’s Execution agents update the plan run state (&lt;a href="https://docs.portialabs.ai/run-plan" rel="noopener noreferrer"&gt;&lt;code&gt;PlanRunState&lt;/code&gt;&lt;/a&gt;) as they go and are able to pause execution to solicit human input in a structured interaction called a &lt;a href="https://docs.portialabs.ai/understand-clarifications" rel="noopener noreferrer"&gt;&lt;code&gt;Clarification&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Authentication: Users want to securely authenticate agents into their applications and confine them to a specific scope. We offer a cloud hosted catalogue of &lt;a href="https://docs.portialabs.ai/run-portia-tools" rel="noopener noreferrer"&gt;tools with built-in authentication&lt;/a&gt;. Get the full story from &lt;a href="https://blog.portialabs.ai/we-are-live" rel="noopener noreferrer"&gt;our recent blog post&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Teach our Planning agent new things 🧠&lt;a href="https://blog.portialabs.ai/funding-announcement-april-2025#teach-our-planning-agent-new-things-" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Our early adopters already love that they can supply &lt;strong&gt;example plans&lt;/strong&gt; to the Planning agent. This is a tried and tested approach known as “few-shot prompting”. This week, we released a feature that allows developers to simply “like” plans in the Portia dashboard and then we do the rest. Our Planning agent will retrieve the most relevant “liked” plans from Portia Cloud based on the user prompt and load those in as guidance for the agent. We found that the Planning agent can reliably adapt complex plans from previously completed tasks to a new task, even when the user only provides a high-level request (e.g. “Retrieve additional data to complete this supplier’s KYB missing information”). Using user-led learning, Portia produced an 8-step and even a 16-step data collection plan with 100% reliability. Previously, producing those plans would have been impossible without extensive prompt engineering.&lt;/p&gt;

&lt;p&gt;We will be sharing a step-by-step cookbook shortly if you’re curious to get hands-on with this feature. Make sure you’re signed up to our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; or &lt;a href="https://www.linkedin.com/company/portiaai/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; channel so you don’t miss it.&lt;/p&gt;

&lt;p&gt;Our current design partners have also shared that they may want access to different variants of our Planning agent, with a live A/B testing ability to select the best suited plan for the objective at hand. For example, the Planning agent underpinning some of their more generalised agent use cases should be optimised to handle very large tool sets (~500 tools) while another should be optimised for planning large, multi-step data processing tasks. With a smaller local model deployment, we have been able to reduce tool selection errors by 50% using this approach and are open to trialling this with more partners. Do give us a shout with this &lt;a href="https://tally.so/r/wolZQ5" rel="noopener noreferrer"&gt;contact form&lt;/a&gt; to learn more.&lt;/p&gt;

&lt;h2&gt;
  
  
  We’re even more showtime ready 🕺🏼&lt;a href="https://blog.portialabs.ai/funding-announcement-april-2025#were-even-more-showtime-ready-" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We’re making it simpler than ever to deploy the Portia SDK for complex tasks in any production environment. We now support all the commonly used models including OpenAI, Anthropic, Mistral, Gemini, Azure OpenAI and Bedrock. You can also wire up your own LLM instance into Portia AI so you can use your preferred local model, such as Llama or DeepSeek, in your own private deployment environment. In Q2 we are introducing the ability to handle very large inputs (e.g. large PDF files, books, etc.) as well as a context-aware approach to handling the mess of API pagination. Stay tuned for more on this (&lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, &lt;a href="https://www.linkedin.com/company/portiaai/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Elegant auth UX for web agents 🔜&lt;a href="https://blog.portialabs.ai/funding-announcement-april-2025#elegant-auth-ux-for-web-agents-" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Using our clarification construct to handle human:agent interfaces, we are releasing a headless browser agent that can handle seamless handovers to humans during a session whenever a login is needed, before resuming its task. With our solution, end users will be able to enter their login details directly into the website within the browser session: they will never be compelled to share them with an intermediary party. Be the first to get your hands on this one! (&lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;, &lt;a href="https://www.linkedin.com/company/portiaai/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If you’re looking to build agents in production that behave reliably and can be steered, please give us a whirl and share your thoughts. And if you need a white glove partner to help you deploy them, get in touch with us using &lt;a href="https://tally.so/r/wolZQ5" rel="noopener noreferrer"&gt;this form&lt;/a&gt;. &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Our SDK&lt;/a&gt; is available (give us a star ⭐ ️), you can get hands-on with some examples in our &lt;a href="https://github.com/portiaAI/portia-agent-examples" rel="noopener noreferrer"&gt;examples repo&lt;/a&gt; or check out this short code-along &lt;a href="https://youtu.be/g5qnYCmvXA8?si=LCRwjjOqh_rW9Idx" rel="noopener noreferrer"&gt;intro video&lt;/a&gt; on our YouTube channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>tools</category>
      <category>funding</category>
      <category>agents</category>
    </item>
    <item>
      <title>Agent-Agent interfaces and Google's new A2A protocol</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Mon, 14 Apr 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/agent-agent-interfaces-and-googles-new-a2a-protocol-1ond</link>
      <guid>https://dev.to/portia-ai/agent-agent-interfaces-and-googles-new-a2a-protocol-1ond</guid>
      <description>&lt;p&gt;This week, Google &lt;a href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/#:~:text=The%20A2A%20protocol%20will%20allow,various%20enterprise%20platforms%20or%20applications." rel="noopener noreferrer"&gt;announced&lt;/a&gt; their new Agent-to-Agent protocol, A2A, designed to standardise how AI agents collaborate, even when run by different organisations using different underlying models. Positioned as complementary to MCP – which standardises agent access to external tools – A2A aims to standardise direct agent-agent communication. Google even declared &lt;a href="https://google.github.io/A2A/#/topics/a2a_and_mcp" rel="noopener noreferrer"&gt;A2A ♥️ MCP&lt;/a&gt;, highlighting their vision for synergy between these protocols.&lt;/p&gt;

&lt;p&gt;At Portia, we’ve been thinking about how agents interact with external systems via tools and other agents for some time. You may have even read our post two weeks ago, &lt;a href="https://blog.portialabs.ai/beyond%20apis#5-agent--agent-interfaces" rel="noopener noreferrer"&gt;Software interfaces in the agent era&lt;/a&gt;. We divided the topic of agent integration with external systems into five categories based on increasing complexity, and A2A sits firmly at the top, at the Agent-Agent interface level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpotaxz220tojwxlelzb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpotaxz220tojwxlelzb1.png" alt="A diagram showing the increasing complexity going from manual tools to agent-agent communication." width="800" height="647"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Increasing complexity of communication&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Understandably, some of the reaction to A2A has been that it isn’t clear whether it is needed (and if it is, whether it is needed yet) and &lt;a href="https://x.com/jerryjliu0/status/1910014927521341801" rel="noopener noreferrer"&gt;how it fits together with MCP&lt;/a&gt;. In particular, with tools and agents ultimately both being a way to get a task done and facing many of the same challenges (discovery, task definition, input / output definition, auth etc.), some people are questioning whether we need another protocol on top of MCP, or whether it is enough to just wrap agents in tools. We’ve been diving into A2A over the last couple of days and wanted to share our thoughts on these topics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents vs Tools&lt;a href="https://blog.portialabs.ai/agent-agent-a2a-vs-mcp#agents-vs-tools" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Both agents and tools are mechanisms for achieving tasks, but they generally differ significantly in complexity, autonomy, and interaction patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task Definition:&lt;/strong&gt; Tools handle narrow, clearly defined tasks; agents handle broad, higher-level, open-ended goals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy:&lt;/strong&gt; Tools just do what they’ve been programmed to do; agents act autonomously, breaking down goals and seeking additional info if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; Tools take structured input; agents understand natural language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single Step vs Multi Step:&lt;/strong&gt; Tools are generally single-shot, with a call either returning outputs or an error. Conversely, agents break a task down and work through it in multiple steps. This may involve the agent proactively reaching out to collect more information for the task, or even asking the user some clarifying questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State:&lt;/strong&gt; Tools are stateless; agents can build context over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Length:&lt;/strong&gt; Tools generally run quickly, with most APIs returning in less than a second; agents may work over minutes, hours, or days.&lt;/li&gt;
&lt;/ul&gt;
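&lt;p&gt;To make the contrast concrete, here is a minimal sketch (all names hypothetical, not from any specific framework): the tool is a stateless function over structured inputs, while the agent takes a natural-language goal, breaks it into steps, and accumulates state as it goes:&lt;/p&gt;

```python
# Hypothetical sketch contrasting a tool and an agent (illustrative names only).
from dataclasses import dataclass, field


# A tool: stateless, structured input, single-shot; it returns or it errors.
def currency_convert_tool(amount: float, rate: float) -> float:
    return amount * rate


# An agent: takes a natural-language goal, "plans" it into steps, and builds
# up context (state) as it works through them over multiple steps.
@dataclass
class MiniAgent:
    context: list = field(default_factory=list)

    def run(self, goal: str) -> list:
        steps = [s.strip() for s in goal.split(",")]  # naive stand-in for planning
        for step in steps:
            self.context.append(f"done: {step}")  # state accumulates over time
        return self.context


print(currency_convert_tool(100.0, 1.25))  # single call, single structured result
agent = MiniAgent()
print(agent.run("find restaurants, pick one, book a table"))
```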

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8ezd35w0pziioxbb0lg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8ezd35w0pziioxbb0lg.png" alt="A diagram showing the increasing complexity going from manual tools to agent-agent communication." width="800" height="489"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The grey area between agent-agent and agent-tool communication&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;While this can seem a clear and natural divide, our work at Portia shows there's often a grey area between them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://browser-use.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Browser Use&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;:&lt;/strong&gt; Though agent-like in behavior (e.g., autonomous navigation via natural language), we’ve had success using browser tools in a single-turn, tool-like way to retrieve structured data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Research:&lt;/strong&gt; Some implementations behave like slower search tools, others like full agents asking clarifying questions. Sometimes the same implementation can display both behaviors, depending on the query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic Tools:&lt;/strong&gt; Tools can show agent-like traits: holding state (e.g., counters), running long processes (e.g., ML training), or even handling tasks that might require an agent in more complex scenarios (e.g. document retrieval).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single vs Multi step:&lt;/strong&gt; Even the clearest distinction – single vs. multi-step interaction – isn’t absolute. Just as agents ask for additional information, tools throw errors detailing the info they need. Often the loop needed to handle both is the same.&lt;/li&gt;
&lt;/ul&gt;
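&lt;p&gt;That last point can be sketched as a single driver loop (hypothetical names): whether the callee is a strict tool raising an error that names the missing input, or an agent asking a clarifying question, the caller handles both the same way:&lt;/p&gt;

```python
# Hypothetical sketch: one loop serves both "tool raises an error naming the
# missing input" and "agent asks a clarifying question" (names are illustrative).
class NeedsInput(Exception):
    def __init__(self, field_name: str):
        super().__init__(f"missing input: {field_name}")
        self.field_name = field_name


def book_table(inputs: dict) -> str:
    # Could be a strict tool or an agent mid-task; both signal missing info
    # the same way here, by raising a request for a named input.
    for required in ("restaurant", "time"):
        if required not in inputs:
            raise NeedsInput(required)
    return f"booked {inputs['restaurant']} at {inputs['time']}"


def drive(callee, inputs: dict, ask_user) -> str:
    # The caller's loop is identical either way: run, catch the request,
    # supply the missing info, retry until the task completes.
    while True:
        try:
            return callee(inputs)
        except NeedsInput as request:
            inputs[request.field_name] = ask_user(request.field_name)


answers = {"restaurant": "Luigi's", "time": "7pm"}
print(drive(book_table, {}, lambda field_name: answers[field_name]))
```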

&lt;p&gt;With the distinction between the categories quite blurry, it certainly adds complexity to the ecosystem if you need different protocols for the different sides of the spectrum.&lt;/p&gt;

&lt;h2&gt;
  
  
  A2A &amp;amp; MCP&lt;a href="https://blog.portialabs.ai/agent-agent-a2a-vs-mcp#a2a--mcp" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;A2A (Agent-to-Agent) is a protocol designed for enabling autonomous agents to communicate, discover each other, and collaborate on tasks. Positioned at the agentic end of the spectrum, A2A focuses on agents that can take higher-level responsibility for executing tasks, compared to MCP which is more tool-oriented.&lt;/p&gt;

&lt;p&gt;To demonstrate the difference, imagine booking a dinner using an agent. With MCP, a restaurant booking platform might expose tools such as ‘find_restaurants’ or ‘book_restaurant’. My agent must then use these tools to achieve the goal of organising dinner.&lt;/p&gt;

&lt;p&gt;Conversely, with A2A, the restaurant booking platform provides an agent with a skill for finding and booking restaurants – a concept deliberately looser than a tool. The remote agent will then take control of the full task, including tracking its state and deciding when to communicate with my local agent and when to mark the task as complete.&lt;/p&gt;
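&lt;p&gt;The split in responsibility can be sketched conceptually. This is illustrative Python showing who holds the control loop under each model, not the actual MCP or A2A wire formats:&lt;/p&gt;

```python
# Conceptual sketch only: who drives the task under each model.
# These are NOT the real MCP or A2A protocols; all names are illustrative.
RESTAURANTS = {"Luigi's": ["7pm", "8pm"]}


# MCP-style: the platform exposes narrow tools; MY agent orchestrates each step.
def find_restaurants(cuisine: str) -> list:
    return list(RESTAURANTS)


def book_restaurant(name: str, time: str) -> str:
    return f"confirmed {name} {time}"


def my_agent_with_mcp(goal: str) -> str:
    options = find_restaurants("italian")  # my agent decides every step
    return book_restaurant(options[0], RESTAURANTS[options[0]][0])


# A2A-style: I hand the whole goal to a remote agent, which owns the task
# state and decides when the task is complete.
class RemoteBookingAgent:
    def handle_task(self, goal: str) -> dict:
        name = next(iter(RESTAURANTS))
        return {"state": "completed", "result": f"confirmed {name} {RESTAURANTS[name][0]}"}


def my_agent_with_a2a(goal: str) -> str:
    task = RemoteBookingAgent().handle_task(goal)  # one high-level delegation
    return task["result"]


print(my_agent_with_mcp("book dinner"))
print(my_agent_with_a2a("book dinner"))
```

&lt;p&gt;With MCP, my agent makes every decision between narrow tool calls; with A2A, the remote agent owns the task state and simply reports completion.&lt;/p&gt;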

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dwyj8f41n4k9gbkmkq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dwyj8f41n4k9gbkmkq3.png" alt="Communication between an agent and tools using the MCP protocol." width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Restaurant reservations with MCP&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6gkqn2qad7il37odckh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs6gkqn2qad7il37odckh.png" alt="Communication between two agents using A2A." width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Restaurant reservations with A2A&lt;/em&gt;&lt;/p&gt;
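&lt;p&gt;The contrast can also be sketched in code. All names below are hypothetical illustrations, not part of any real MCP or A2A SDK: with MCP, the local agent orchestrates discrete tool calls itself; with A2A, it hands the whole task to a remote agent that owns the state and the notion of completion.&lt;/p&gt;

```python
# MCP style: the platform exposes fine-grained tools; the local agent
# decides how to chain them to reach the goal.
def find_restaurants(cuisine: str) -> list[str]:
    return ["Trattoria Uno", "Trattoria Due"] if cuisine == "italian" else []

def book_restaurant(name: str, party_size: int) -> str:
    return f"booked {name} for {party_size}"

def local_agent_books_dinner(cuisine: str, party_size: int) -> str:
    # The local agent holds the orchestration logic.
    options = find_restaurants(cuisine)
    if not options:
        return "no restaurants found"
    return book_restaurant(options[0], party_size)

# A2A style: the platform exposes a remote agent with a loose "skill";
# the local agent delegates the whole goal and receives an artifact.
class RemoteBookingAgent:
    def submit_task(self, goal: str) -> dict:
        # The remote agent owns state and decides when the task is done.
        return {"state": "completed", "artifact": f"reservation for: {goal}"}

print(local_agent_books_dinner("italian", 4))
print(RemoteBookingAgent().submit_task("dinner for 4, italian"))
```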

&lt;p&gt;To dive a bit deeper, let’s take a look at the core components of A2A:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent Description:&lt;/strong&gt; A2A agents have JSON "agent cards" outlining skills, auth methods, and input/output formats. These are higher-level and less structured than MCP's task-focused tool descriptions.

&lt;ul&gt;
&lt;li&gt;As the skills description within A2A is deliberately more vague, it will be interesting to see how people handle defining the boundary around what a particular agent can and can’t do.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc134l3jk9gvc7opvphpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc134l3jk9gvc7opvphpz.png" alt="JSON showing an A2A agent card." width="800" height="993"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A2A Agent card&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent Discovery:&lt;/strong&gt; Agents can be discovered via a well-known URL (&lt;code&gt;/.well-known/agent.json&lt;/code&gt;). Registries are likely to be added, similar to MCP’s tool registries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step Interactions:&lt;/strong&gt; A2A supports long-running tasks through multi-message exchanges, allowing agents to schedule, negotiate, and send progress updates. MCP does not yet support this richness of multi-message exchange.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline Handling:&lt;/strong&gt; As tasks are long-running and agents may not have a session open for the full duration, A2A supports push notifications that the client can receive later (e.g. for task updates). This is not supported natively in MCP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; A2A supports all OpenAPI auth schemes (e.g., API keys, OAuth2, JWTs), offering more flexibility than MCP’s OAuth2-only approach. However, this also means that my local agent needs to handle all of these auth schemes too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outputs:&lt;/strong&gt; Task results are called "artifacts". These are the equivalent of MCP’s tool outputs, with the key difference that artifacts are split into parts by default.&lt;/li&gt;
&lt;/ul&gt;
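&lt;p&gt;To make the agent card concrete, here is a minimal sketch for the restaurant example. The field names loosely follow published A2A examples but should be treated as an illustration rather than the authoritative schema:&lt;/p&gt;

```python
import json

# A hypothetical A2A agent card, reduced to the fields discussed above
# (skills, auth methods, input/output formats).
agent_card = {
    "name": "restaurant-booking-agent",
    "url": "https://bookings.example.com/a2a",
    "authentication": {"schemes": ["oauth2"]},
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "skills": [
        {
            "id": "find-and-book",
            "name": "Find and book restaurants",
            "description": "Finds restaurants matching a request and books one.",
        }
    ],
}

# A client that discovered this card might run a basic check before
# delegating a task to the agent:
def has_skill(card: dict, skill_id: str) -> bool:
    return any(s["id"] == skill_id for s in card.get("skills", []))

print(json.dumps(agent_card, indent=2))
print(has_skill(agent_card, "find-and-book"))
```

Note how little the skill description constrains the agent compared with an MCP tool's input schema: the boundary of what the agent can do is prose, not a contract.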

&lt;h3&gt;
  
  
  MCP vs A2A: Our Predictions&lt;a href="https://blog.portialabs.ai/agent-agent-a2a-vs-mcp#mcp-vs-a2a-our-predictions" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;User POV&lt;/th&gt;
&lt;th&gt;Agent::tools&lt;/th&gt;
&lt;th&gt;Agent::MCP servers&lt;/th&gt;
&lt;th&gt;Agent::Agent (A2A)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capability discovery and selection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 Tools have to be manually added to agent. Selection is limited to LLM’s ability to cope with large tool sets.&lt;/td&gt;
&lt;td&gt;🟡 Tools are automatically discovered through MCP. Selection is still limited to LLM’s ability to cope with large tool sets.&lt;/td&gt;
&lt;td&gt;🟡 Agent capabilities are advertised through their agent card. Once registries have been added to the protocol, agents will be discovered through a registry.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ease of interface to other systems&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 Developer has to understand the other system’s API in order to manually write / select tools.&lt;/td&gt;
&lt;td&gt;🟡 Agent calls MCP tools with a single-step interaction. My agent needs to understand the external system to determine how to chain tool calls together to achieve a goal.&lt;/td&gt;
&lt;td&gt;🟢 My agent connects to a remote agent to access other systems’ capabilities. The remote agent determines how to use these capabilities to achieve my goal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 Developer has to implement their own auth on tools for the agent to use&lt;/td&gt;
&lt;td&gt;🟡 Auth (based on OAuth2) has recently been released, though is yet to be widely adopted&lt;/td&gt;
&lt;td&gt;🟡 Launches with all auth schemes supported by OpenAPI, though this means my agent will need to support whichever auth scheme is supported by the remote agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task control &amp;amp; completion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 My agent has full control over how the task is performed, including deciding when it is complete. All output from the other system is accessible to my agent via tools, and my agent determines what information is retained during and after runtime&lt;/td&gt;
&lt;td&gt;🟢 As with simple tool usage, my agent has full control over the task, its completion, the output of tools and how information is retained.&lt;/td&gt;
&lt;td&gt;🟡 My agent relies on controlling the remote agent through negotiation and relies on the information sharing and retention decisions of the remote agent. It also relies on the remote agent to decide when a task is deemed complete or when further input is needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A summary of the various communication schemes between agents and tools. Traffic light symbols give an assessment of how well each technology solves the user problem.&lt;/p&gt;

&lt;p&gt;As discussed in our previous blog post (&lt;a href="https://blog.portialabs.ai/beyond%20apis" rel="noopener noreferrer"&gt;Beyond APIs&lt;/a&gt;), we believe agent-to-agent communication will eventually become widespread. This communication will be interactive, multi-turn and goal-oriented, rather than utilising single-shot, transactional, rigid APIs and tools that are common now. At Portia, we’ve built our &lt;a href="https://docs.portialabs.ai/understand-clarifications" rel="noopener noreferrer"&gt;clarifications architecture&lt;/a&gt; to handle this and it’s exciting to see the ecosystem progressing in this direction.&lt;/p&gt;

&lt;p&gt;However, we do not foresee A2A seeing the same rapid adoption as MCP. MCP addressed a clear, mainstream problem: enabling agents to interact with APIs. Agent builders wanted to move beyond simple chat or RAG systems without building custom tools for every API, while API providers wanted to support agents without adapting to every agent framework. MCP elegantly solved this MxN problem by allowing providers to repackage their existing APIs and documentation into an MCP server easily.&lt;/p&gt;

&lt;p&gt;In contrast, agent-to-agent communication hasn’t yet become mainstream and is significantly more complex. Deploying an agent in front of an API introduces challenges like managing ambiguous multi-turn requests, maintaining state, handling offline clients, and gracefully resolving cascading errors. Additionally, because the distinction between tools and agents is not clear cut, having separate protocols for the two adds complexity to the ecosystem.&lt;/p&gt;

&lt;p&gt;Therefore, in the short term we expect companies to continue to focus on MCP, and we expect to see growing usage of agents within tools. We then expect MCP to evolve to handle these ‘agents in tools’ use cases more natively and elegantly, leading to MCP covering the whole ‘tool-agent’ spectrum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>mcp</category>
      <category>api</category>
      <category>tools</category>
    </item>
    <item>
      <title>Build a refund agent with Portia AI and Stripe's MCP server</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 20 Mar 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/build-a-refund-agent-with-portia-ai-and-stripes-mcp-server-19f3</link>
      <guid>https://dev.to/portia-ai/build-a-refund-agent-with-portia-ai-and-stripes-mcp-server-19f3</guid>
      <description>&lt;p&gt;Anthropic open sourced its &lt;a href="https://www.anthropic.com/news/model-context-protocol" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;, or MCP for short, at the end of last year. The protocol is picking up steam as the go-to way to standardise the interface between agent frameworks and apps / data sources, with the &lt;a href="https://github.com/modelcontextprotocol/servers?tab=readme-ov-file#%EF%B8%8F-official-integrations" rel="noopener noreferrer"&gt;list of official MCP server implementations&lt;/a&gt; growing rapidly. Our early users have already asked for an easy way to expose tools from an MCP server to a Portia client so we just released support for MCP servers in our SDK ⭐️.&lt;/p&gt;

&lt;p&gt;In this blog post we show how you can combine the power of Portia AI’s abstractions with any tool set from an MCP server to create unique agent workflows. The example we go over is accessible in our agent examples repository &lt;a href="https://github.com/portiaAI/portia-agent-examples/tree/main/refund-agent-mcp" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect to MCP servers with the Portia SDK&lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example#connect-to-any-mcp-server-using-portia-ai%E2%80%99s-sdk" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Connecting to an MCP server allows you to load all tools from that server into a &lt;code&gt;ToolRegistry&lt;/code&gt; subclass called an &lt;code&gt;McpToolRegistry&lt;/code&gt;, which you can then combine with any other tools you offer to Portia’s planning and execution agents.&lt;/p&gt;

&lt;p&gt;We allow developers to load tools from MCP servers into an &lt;code&gt;McpToolRegistry&lt;/code&gt; using the two commonly available methods today (to find out more about these options, see the &lt;a href="https://modelcontextprotocol.io/docs/concepts/transports" rel="noopener noreferrer"&gt;official MCP docs&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stdio (Standard Input / Output):&lt;/strong&gt; The server kicks off as a subprocess of the Python process where your Portia client is running. Portia’s SDK only requires you to provide a server name and a command with args to spin up the subprocess, giving you the flexibility to integrate any server written in any language using any execution mechanism. This method is useful for local prototyping, e.g. you can load a local MCP server repo, kick off a process and interact with its tools in no time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sse (Server-Sent Events):&lt;/strong&gt; The server is accessible over HTTP. This could be a locally or remotely deployed server. We just need to specify the server name and URL for the Portia SDK to interact with it.&lt;/li&gt;
&lt;/ul&gt;
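&lt;p&gt;The stdio transport can be illustrated with a stripped-down sketch: a child process that exchanges one JSON message per line over stdin/stdout. Real MCP servers speak JSON-RPC and start with an initialize handshake; this only shows the subprocess-over-stdio shape, with a made-up message format:&lt;/p&gt;

```python
import json
import subprocess
import sys

# The "server": reads one JSON request per line, writes one JSON
# response per line. In real MCP this would be a full JSON-RPC server.
SERVER_CODE = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    if req.get("method") == "tools/list":
        out = {"tools": [{"name": "find_restaurants"}]}
    else:
        out = {"error": "unknown method"}
    print(json.dumps(out), flush=True)
"""

# The "client" spins up the server as a subprocess, exactly as a stdio
# MCP connection does, and asks it what tools it offers.
proc = subprocess.Popen(
    [sys.executable, "-c", SERVER_CODE],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
proc.stdin.write(json.dumps({"method": "tools/list"}) + "\n")
proc.stdin.flush()
response = json.loads(proc.stdout.readline())
proc.stdin.close()
proc.wait()
print(response)
```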

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx40h0xoy34ltokzg2nzn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx40h0xoy34ltokzg2nzn.png" alt="Standard Input / Output" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F360sxovsxvpp93zye5i8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F360sxovsxvpp93zye5i8.png" alt="Server-Sent Events" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our Stripe example, once you provide the NPX command, Portia’s SDK takes over and manages everything for you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We will spin up the &lt;a href="https://github.com/stripe/agent-toolkit/tree/main/modelcontextprotocol" rel="noopener noreferrer"&gt;Stripe agent toolkit MCP&lt;/a&gt; server locally using the NPX command args&lt;/li&gt;
&lt;li&gt;Our built-in MCP client, which uses the official &lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;MCP python SDK&lt;/a&gt; under the hood, will query the Stripe MCP server to understand what tools it provides, and make these available to your Planner and Execution Agents.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can then extract the tools and automatically convert them to Portia &lt;a href="https://docs.portialabs.ai/intro-to-tools" rel="noopener noreferrer"&gt;&lt;code&gt;Tool&lt;/code&gt;&lt;/a&gt; objects using a stdio connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stripe_mcp_registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;McpToolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_stdio_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;server_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stripe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@stripe/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--tools=all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--api-key=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;STRIPE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use a clarification to loop in a human&lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example#use-a-clarification-to-loop-in-a-human" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Most use cases we’ve seen out there don't leverage the refund tool from Stripe’s MCP server because it is high risk for an agent to act on. With Portia’s clarifications, we can ensure the agent pauses the plan run and solicits human approval before the refund is processed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Clarifications: A brief recap
&lt;/h2&gt;

&lt;p&gt;During agentic workflows, there may be tasks where your organisation's policies require explicit approvals from specific people, e.g. allowing bank transfers over a certain amount. Clarifications allow you to define these conditions so the agent running a particular step knows when to pause the plan run and solicit input in line with your policies. When Portia encounters a clarification and pauses a plan run, it serialises and saves the latest plan run state. Once the clarification is resolved, the human input captured during clarification handling is added to the plan run state and the agent can resume step execution.&lt;br&gt;&lt;br&gt;
For more on clarifications, visit our &lt;a href="https://docs.portialabs.ai/understand-clarifications" rel="noopener noreferrer"&gt;docs&lt;/a&gt;.&lt;/p&gt;
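&lt;p&gt;The pause / serialise / resume cycle described above can be sketched as follows. The class and field names here are illustrative only, not the Portia SDK:&lt;/p&gt;

```python
import json

class PlanRun:
    def __init__(self, steps):
        self.steps = steps
        self.index = 0
        self.state = "RUNNING"
        self.clarification = None

    def run(self):
        while self.index != len(self.steps):
            step = self.steps[self.index]
            if step.get("needs_approval") and step.get("approved") is None:
                # Pause: record what input is needed and serialise state.
                self.state = "NEED_CLARIFICATION"
                self.clarification = f"Approve step: {step['task']}?"
                return json.dumps(self.__dict__)
            self.index += 1
        self.state = "COMPLETE"
        return None

    def resolve(self, approved: bool):
        # Human input is written back into the saved state before resuming.
        self.steps[self.index]["approved"] = approved
        self.state = "RUNNING"
        self.clarification = None

run = PlanRun([
    {"task": "review refund"},
    {"task": "process refund", "needs_approval": True},
])
saved = run.run()            # pauses and returns serialised state
run.resolve(approved=True)   # human approves the clarification
run.run()                    # resumes to completion
print(run.state)
```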

&lt;p&gt;For our refund example, we want the following to happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Load a refund policy document and check the transaction details against it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reject the request if not within the policy and fail the plan run&lt;/li&gt;
&lt;li&gt;Else make a recommendation to approve, along with rationale&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;→ The &lt;code&gt;RefundReviewerTool&lt;/code&gt; offers this functionality&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Given the context for the agent’s refund approval recommendation,&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Solicit human approval and&lt;/li&gt;
&lt;li&gt;Reject the request and fail the plan run if the human did not approve it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;→ The &lt;code&gt;RefundHumanApprovalTool&lt;/code&gt; offers this functionality&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the plan run passes the previous step successfully, create a refund using the appropriate tool loaded from Stripe’s MCP server.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
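&lt;p&gt;As a rough illustration of the decision shape the &lt;code&gt;RefundReviewerTool&lt;/code&gt; produces: the real tool has an LLM judge the request against the policy document, whereas this sketch hard-codes two example policy rules.&lt;/p&gt;

```python
# Hypothetical policy check: returns a recommendation plus rationale,
# mirroring the approve-or-reject-with-reason output described above.
def review_refund(request: dict, policy: dict) -> dict:
    if request["days_since_purchase"] > policy["window_days"]:
        return {"approve": False, "rationale": "outside the refund window"}
    if request["amount"] > policy["max_amount"]:
        return {"approve": False, "rationale": "amount above policy limit"}
    return {"approve": True, "rationale": "request within policy"}

policy = {"window_days": 30, "max_amount": 500}
print(review_refund({"days_since_purchase": 10, "amount": 120}, policy))
print(review_refund({"days_since_purchase": 45, "amount": 120}, policy))
```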
&lt;h2&gt;
  
  
  Bringing it all together&lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example#bringing-it-all-together" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;With Portia AI, you don’t need to create individual agents and point them explicitly at each step in the above process or at the required tools. Our planning agent will do exactly that for you. All you need to do is to prompt it using the &lt;code&gt;plan&lt;/code&gt; method (the more detailed the prompt, the more reliable it will be) and share a superset of tools with it. In the &lt;code&gt;refund_agent.py&lt;/code&gt; code, we demonstrate the power of our planning agent by passing our &lt;code&gt;Portia&lt;/code&gt; client all the tools from the Stripe MCP tool registry, the two local tools described above (&lt;code&gt;RefundReviewerTool&lt;/code&gt; and &lt;code&gt;RefundHumanApprovalTool&lt;/code&gt;) and Portia’s catalogue of cloud tools (&lt;code&gt;DefaultToolRegistry&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;portia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Portia&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stripe_mcp_registry&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;InMemoryToolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_local_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nc"&gt;RefundReviewerTool&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;RefundHumanApprovalTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nc"&gt;DefaultRegistry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;portia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Read the refund request email from the customer and decide if it should be approved or rejected.
    If you think the refund request should be approved, check with a human for final approval and then process the refund.

    Stripe instructions:
    * Customers can be found in Stripe using their email address.
    * The payment can be found against the Customer.
    * Refunds can be processed by creating a refund against the payment.

    The refund policy can be found in the file: ./refund_policy.txt

    The refund request email is as follows:

    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_email&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you have the option of implementing an end user feedback loop to refine your plan before running it. We demonstrate this with the scheduling agent example &lt;a href="https://github.com/portiaAI/portia-agent-examples/blob/main/get_started_google_tools/README.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You also have the option of providing example plans to the Portia planning agent for added reliability e.g. if you want this refund process to always follow the same set of steps. The above code will produce a plan in line with the instructions we outlined in the prompt and include the relevant tools automatically. Below is an abridged version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Read the refund policy from the file to understand the conditions for a valid refund."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file_reader_tool"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Review the refund request email against the refund policy to decide if the refund should be approved or rejected."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refund_reviewer_tool"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Request human approval for processing the refund if the review indicates approval."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"human_approval_tool"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"If the refund is approved by the human reviewer, locate the customer in Stripe using their email address."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp:stripe:list_customers"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Retrieve the payment intent associated with the found customer to identify the payment to refund."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp:stripe:list_payment_intents"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Process the refund by creating a refund for the identified payment intent."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp:stripe:create_refund"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally running this plan will allow you to test how it all comes together. Here’s a snazzy snappify animation of the PlanRunState across all steps of the plan run. Note how a clarification is raised and then approved by a human before the refund creation agent executes the final step.&lt;/p&gt;

&lt;h3&gt;
  
  
  On our roadmap: Supporting conditionals
&lt;/h3&gt;

&lt;p&gt;Offering conditionals means you won’t need to hardcode the if / else logic within the &lt;code&gt;RefundHumanApprovalTool&lt;/code&gt; definition: The Portia planning agent will be able to add conditions against the create refund step and insert a separate step to fail the plan conditioned on the human rejecting the refund. We already built this and are in the process of tuning it in staging before it’s ready for show time. Watch this space 👀!&lt;/p&gt;

&lt;h2&gt;
  
  
  Our reflections on working with MCP servers&lt;a href="https://blog.portialabs.ai/portia-mcp-stripe-example#our-reflections-on-working-with-mcp-servers" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Here's what we learned experimenting with MCP servers so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most MCP servers make use of the tool primitives, but not the prompts or resources that are also supported in the MCP spec. It is not yet clear what a best-in-class implementation of those would look like.&lt;/li&gt;
&lt;li&gt;The power of a standardised protocol is real! In our Stripe refund agent example we seamlessly integrate Stripe tools provided by their &lt;em&gt;Javascript&lt;/em&gt; MCP server into a Python Portia Agent.&lt;/li&gt;
&lt;li&gt;Provided the app owner who publishes an MCP server is maintaining it, MCP servers can be a powerful and pain-free way of discovering and loading tools into your AI app.&lt;/li&gt;
&lt;li&gt;The limitation is that you are beholden to the MCP server owner’s tool definition, and those can vary in quality (e.g. tool and / or args description does not offer enough guidance for an LLM to invoke the tool reliably at scale). This can be a particular problem with community provided servers, so make sure you check out the quality of the tool definitions.&lt;/li&gt;
&lt;li&gt;The MCP specification does not include output schema for tools, and many MCP tool descriptions do not describe exactly what the tool returns. This can create challenges for the Agent using the tool.&lt;/li&gt;
&lt;li&gt;We need an MCP discovery service that allows an LLM to discover MCP servers from an app owner and load the details to connect to them (an MCP DNS server if you’re into acronym salads 🥗). The MCP folks have a &lt;a href="https://github.com/modelcontextprotocol/specification/discussions/69" rel="noopener noreferrer"&gt;registry concept&lt;/a&gt; in the works to address this issue.&lt;/li&gt;
&lt;li&gt;Tool auth is a challenge for many people looking to deploy Agents in the real world - we’re &lt;a href="https://blog.portialabs.ai/agent-auth-part-I" rel="noopener noreferrer"&gt;well aware of that&lt;/a&gt;. MCP servers today are generally run locally with credentials such as API keys provided at start-up. Again, the MCP specification is moving quickly and a draft for Auth support has been &lt;a href="https://spec.modelcontextprotocol.io/specification/draft/basic/authorization/" rel="noopener noreferrer"&gt;published&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
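&lt;p&gt;On the missing output schema point: one workaround is for the client to validate tool results against its own expectations. The convention below is our own sketch, not part of the MCP spec, and the example result fields are illustrative rather than Stripe's actual response shape:&lt;/p&gt;

```python
# Check a tool result against a locally-maintained set of required keys,
# failing loudly instead of letting a malformed result flow downstream.
def validate_output(result: dict, expected_keys: set[str]) -> dict:
    missing = expected_keys - result.keys()
    if missing:
        raise ValueError(f"tool output missing keys: {sorted(missing)}")
    return result

# e.g. we might expect a refund-creation tool to return at least these:
refund_result = {"id": "re_123", "status": "succeeded", "amount": 1200}
validate_output(refund_result, {"id", "status"})
print("output ok")
```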

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>mcp</category>
      <category>stripe</category>
    </item>
    <item>
      <title>Seamless human agent interactions with just-in-time authorization</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Thu, 13 Mar 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/seamless-human-agent-interactions-with-just-in-time-authorization-4cbi</link>
      <guid>https://dev.to/portia-ai/seamless-human-agent-interactions-with-just-in-time-authorization-4cbi</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/portia-ai/why-authentication-is-a-challenge-for-ai-agents-88n"&gt;part 1 of this series&lt;/a&gt;, we established why there is a need for a &lt;em&gt;Just-In-Time (JIT) authorization system&lt;/em&gt;, whereby an agent has the ability to authorize itself only at the point where it is very likely that they will 1/ need that authorization and 2/ that they are clear what they will use it for. In this section, we’ll look at how we have done this at Portia AI.&lt;/p&gt;

&lt;p&gt;A tenet of agentic systems is that they are designed to operate autonomously, but JIT auth requires interrupting the agent so that it can solicit human input.&lt;/p&gt;

&lt;p&gt;In reality, we think it’s becoming increasingly obvious that seamless hand-off back and forth between agents and humans collaborating on a task needs to be a well-supported expectation, and yet most agentic frameworks make this hard to do. In Portia, we refer to these agent-to-human requests as ‘clarifications’.&lt;/p&gt;

&lt;p&gt;If you’ve written an agent before, you’ve probably experienced an agent death loop – where the agent gets itself stuck and continually retries until you cancel the operation (or it hits its maximum retries).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk319rhjrgbin11ejyxqy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk319rhjrgbin11ejyxqy.png" alt="Traditional agent architecture with reflection" width="662" height="903"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For Just-In-Time auth, we want to accomplish a few things. Firstly, many agentic systems perceive a task as incomplete if they encounter a requirement for the end user to complete authentication. The agent then attempts retries – each retry hits the same authentication requirement, and the agent enters a death spiral. Sigh. We’ll refer to this problem as the ‘human-agent &lt;em&gt;short circuit&lt;/em&gt;’ problem.&lt;/p&gt;

&lt;p&gt;The second issue arises if your agentic system supports authorization within the flow of an agent: the end user needs to perform the actual authentication, most typically by clicking a link. This kicks off a somewhat complicated handshake to retrieve the authorization token, after which the agent needs to be notified and resume its task from where it left off. We’ll refer to this as the ‘human-agent &lt;em&gt;hand-off&lt;/em&gt;’ problem.&lt;/p&gt;

&lt;p&gt;The third problem is &lt;em&gt;almost&lt;/em&gt; trivial in the grand scheme of things. OAuth links are kinda long and ugly, but most agentic frameworks expect to hand things back to users in natural language. This means that a user would be presented with something rather incomprehensible like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Click the link to authenticate: &lt;a href="https://accounts.google.com/o/oauth2/v2/auth?redirect%5C_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&amp;amp;client%5C_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&amp;amp;scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&amp;amp;state=APP%5C_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW%5C_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END%5C_USER%5C_ID%3Dportia%253A%253A2%26ORG%5C_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION%5C_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&amp;amp;access%5C_type=offline&amp;amp;response%5C_type=code&amp;amp;prompt=consent" rel="noopener noreferrer"&gt;https://accounts.google.com/o/oauth2/v2/auth?redirect\_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&amp;amp;client\_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&amp;amp;scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&amp;amp;state=APP\_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW\_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END\_USER\_ID%3Dportia%253A%253A2%26ORG\_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION\_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&amp;amp;access\_type=offline&amp;amp;response\_type=code&amp;amp;prompt=consent&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s ugly and if the end user makes a mistake in copying that link, it won’t work! We’ll refer to this as the ‘human-agent &lt;em&gt;presentation&lt;/em&gt;’ problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making human-agent interaction a first class citizen for agentic AI&lt;a href="https://blog.portialabs.ai/agent-auth-part-II#making-human-agent-interaction-a-first-class-citizen-for-agentic-ai" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;These were three of the initial problems we wanted to tackle with Portia AI. The first problem is solvable as long as it’s built into the agentic system at a fundamental level: a task’s output is pre-inspected before agent introspection kicks in. Then, if an agent-to-human clarification is raised, it can be returned immediately to the end user rather than the agent trapping itself in an endless death loop of retries. Most agentic systems assume that human-in-the-loop actions should come after the agent has made its best attempt at completing its task (shown above). To handle this in Portia, it’s fundamental that any tool call can return either a clarification or the tool’s output; if a clarification is returned, it is handed back to the developer to present to the end user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswgax10mp5qd3kzla8vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswgax10mp5qd3kzla8vm.png" alt="Agent architecture with short circuit" width="646" height="884"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We use this as a critical part of our auth system, but it’s useful more broadly as it creates an extremely flexible system that developers can use to hand off seamlessly between human control and agent control. For example, if a tool returns too many results, and the user needs to select the right one to proceed, developers can return a &lt;code&gt;multiple choice clarification&lt;/code&gt;, or in the future, even trigger this behaviour automatically.&lt;/p&gt;
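&lt;p&gt;To make the multiple-choice case concrete, here’s a hedged sketch in the same simplified style: when a lookup returns several plausible matches, the tool hands the options back to the human rather than guessing. The class and function names are illustrative, not the exact Portia SDK API.&lt;/p&gt;

```python
# Sketch: a tool that raises a multiple-choice clarification when it cannot
# disambiguate on its own. Names are illustrative stand-ins.

from dataclasses import dataclass, field

@dataclass
class MultipleChoiceClarification:
    guidance: str
    options: list = field(default_factory=list)

def find_contact(name: str, directory: dict):
    matches = [email for contact, email in directory.items() if name in contact]
    if len(matches) == 1:
        return matches[0]
    # Ambiguous: let the human pick rather than have the agent guess.
    return MultipleChoiceClarification(
        guidance=f"Multiple contacts match '{name}' - which did you mean?",
        options=matches,
    )
```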

&lt;h3&gt;
  
  
  Scaling to 1000s of users&lt;a href="https://blog.portialabs.ai/agent-auth-part-II#scaling-to-1000s-of-users" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The second issue, the ‘human-agent &lt;em&gt;hand-off&lt;/em&gt;’ problem, requires a set of events to be synchronized back and forth between human and agent (e.g. “auth needs to be completed”, “auth has completed and the agent can resume”, etc.). It also requires the in-flight agent state to be saved so that the run can be resumed after the end user has completed the authentication. This is relatively easy if you assume you have only one end user, or that they will authenticate immediately. But we wanted to create a production-ready system that could scale up to 1,000s of end users, and we wanted end users to be able to respond to their agents in their own time, so they can get on with their day-to-day lives. So the Portia framework handles this for developers, and we support the concept of end users as a primitive in our framework, so tasks, tool calls and authentication sessions can be attributed to individuals across your organisation or production use case.&lt;/p&gt;
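&lt;p&gt;A minimal sketch of the state-saving piece, assuming an in-memory store and a simple dict-shaped run state (a real system would use durable storage): runs are keyed by end user as well as run ID, so thousands of users can each pause on auth and resume in their own time.&lt;/p&gt;

```python
# Sketch: persisting in-flight agent state per end user so a run can pause on
# an auth clarification and resume later. The in-memory dict and the state
# shape are assumptions for brevity; use durable storage in production.

import json

class RunStore:
    def __init__(self):
        self._runs = {}  # (end_user_id, run_id) -> serialized state

    def pause(self, end_user_id: str, run_id: str, state: dict) -> None:
        # Serialize so the state could survive process restarts in a real store.
        self._runs[(end_user_id, run_id)] = json.dumps(state)

    def resume(self, end_user_id: str, run_id: str) -> dict:
        # Called once the end user completes authentication, whenever that is.
        return json.loads(self._runs.pop((end_user_id, run_id)))
```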

&lt;h3&gt;
  
  
  Making it look good&lt;a href="https://blog.portialabs.ai/agent-auth-part-II#making-it-look-good" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The third issue, the ‘human-agent &lt;em&gt;presentation&lt;/em&gt;’ problem, is fairly easy to layer on to the previous concepts. Clarifications in Portia are structured, which means they can be easily rendered in different elegant UI formats to the end-user. Rather than an ugly link, the developer can easily render a button that hides the complexity from the end user. You can even configure the guidance you want to attach to your clarification:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Click the link to authenticate: &lt;a href="https://accounts.google.com/o/oauth2/v2/auth?redirect%5C_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&amp;amp;client%5C_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&amp;amp;scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&amp;amp;state=APP%5C_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW%5C_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END%5C_USER%5C_ID%3Dportia%253A%253A2%26ORG%5C_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION%5C_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&amp;amp;access%5C_type=offline&amp;amp;response%5C_type=code&amp;amp;prompt=consent" rel="noopener noreferrer"&gt;https://accounts.google.com/o/oauth2/v2/auth?redirect\_uri=https%3A%2F%2Fapi.portialabs.ai%2Fapi%2Fv0%2Foauth%2Fgoogle%2F&amp;amp;client\_id=1062040369470-6hqq9140gs1451mvb3fon3md1ekhnlns.apps.googleusercontent.com&amp;amp;scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fgmail.modify&amp;amp;state=APP\_NAME%3Dgoogle%253A%253Agmail%26WORKFLOW\_ID%3Dwkfl-87a960b7-f750-414b-8d5b-72c2c203c5fc%26END\_USER\_ID%3Dportia%253A%253A2%26ORG\_ID%3Dc31d809a-c6f3-48e2-9cf0-2cf079ead258%26CLARIFICATION\_ID%3Dclar-894c4a62-a092-4501-8501-174a9d78c7e5%26SCOPES%3D%2Bhttps%253A%252F%252Fwww.googleapis.com%252Fauth%252Fgmail.modify&amp;amp;access\_type=offline&amp;amp;response\_type=code&amp;amp;prompt=consent&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;becomes (with minimal developer effort!):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAYUAAABHCAYAAADodFtDAAAABGdBTUEAALGPC%2FxhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAUGVYSWZNTQAqAAAACAACARIAAwAAAAEAAQAAh2kABAAAAAEAAAAmAAAAAAADoAEAAwAAAAEAAQAAoAIABAAAAAEAAAGFoAMABAAAAAEAAABHAAAAABqaRM8AAAIxaVRYdFhNTDpjb20uYWRvYmUueG1wAAAAAAA8eDp4bXBtZXRhIHhtbG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJYTVAgQ29yZSA2LjAuMCI%2BCiAgIDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI%2BCiAgICAgIDxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiCiAgICAgICAgICAgIHhtbG5zOmV4aWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20vZXhpZi8xLjAvIgogICAgICAgICAgICB4bWxuczp0aWZmPSJodHRwOi8vbnMuYWRvYmUuY29tL3RpZmYvMS4wLyI%2BCiAgICAgICAgIDxleGlmOlBpeGVsWURpbWVuc2lvbj43MTwvZXhpZjpQaXhlbFlEaW1lbnNpb24%2BCiAgICAgICAgIDxleGlmOlBpeGVsWERpbWVuc2lvbj4zODk8L2V4aWY6UGl4ZWxYRGltZW5zaW9uPgogICAgICAgICA8ZXhpZjpDb2xvclNwYWNlPjE8L2V4aWY6Q29sb3JTcGFjZT4KICAgICAgICAgPHRpZmY6T3JpZW50YXRpb24%2BMTwvdGlmZjpPcmllbnRhdGlvbj4KICAgICAgPC9yZGY6RGVzY3JpcHRpb24%2BCiAgIDwvcmRmOlJERj4KPC94OnhtcG1ldGE%2BCr1uU5kAAA%2BJSURBVHgB7Z0JdE3XGsc%2FlRgjIrzSyZgYYqhW%2B3ilI03EFF5Rr0oUQYS2kaIERVRNRc3UPJTEkKREzKpVNZRqkCDmYak2VKNqSrzub%2Bs%2B9%2BQOyXLvzc1N7n%2BvlZx99j57%2Bp27vv%2FZw9mn0P%2BFIzgQAAEQAAEQEAQeAwUQAAEQAAEQUAQgCooEjiAAAiAAAugp4DcAAiAAAiBgIICegoEFfCAAAiDg8gQgCi7%2FEwAAEAABEDAQgCgYWMAHAiAAAi5PAKLg8j8BAAABEAABAwGIgoEFfCAAAiDg8gQgCi7%2FEwAAEAABEDAQgCgYWMAHAiAAAi5PAKLg8j8BAAABEAABAwE3gzd7H7ZIyp4PYkEABEAgrwgUKlTIbkXnKApKDPio%2FuxWOjICARAAARCwmgCLgV4Q9H5rM81WFJQIPHjwgKbPnE%2Fx6zdRyvFUun%2F%2FvrXlIR0IgAAIgIAdCLi7u1Otmr4U1Lo59Q%2FrSY899nA2wFZhKCQMv9mtszmYxeD0mXPUKzSCko4k26EZyAIEQAAEQMDeBOrV9aN5sz%2BnalUrS3GwRRjMioLqIWRmZtKbgR0gCPa%2Bg8gPBEAABOxMgIVha%2BJqKly4sBxSslYYLK4%2B4l7CNDFkhB6Cne8csgMBEACBXCDAtpptNttuW5yJKOh7CV%2BLOQQ4EAABEACB%2FEGAbTaP8Cg7bk2tTUSBM1HzCcdPnLImT6QBARAAARDIAwJss7mnwDbcWmdRFDhTrDKyFivSgQAIgIDjCbDNtqWXwDU2K
wocYYvScHo4EAABEAABxxOw1XZbFAXHNwUlggAIgAAI5DUBi6Jgq9rkdcNQPgiAAAi4IgFbbbdFUXBFmGgzCIAACLg6AYiCq%2F8C0H4QAAEQ0BGAKOhguIrXza0w1ajuQ8WLFzNp8quvNKa3O7bTwj8e9CG92ew17Tw7j3Ha7K511riSJUtQn17vkY9PVYdWsV7d2jT20%2BFUtGhRh5aLwkDAmABEwZhIAT4vX%2F5xWrRgBh078gPt3P41nUw5QKujF1HlSs9orW4jN9cKkefubm7Upk0gNWr0ohafnUefNrvrVFyXdztSw383UKd5cjSug6dnKRoxfCDVqV0r1%2Brj7V2GenR%2FlypVNHCvK7YoaN0ygLy9vexSrnG77JIpMnEJAhAFl7jNRBUqlKfYNUvJp1oV6hHyPtX0a0iBrTrK1m%2FetFYYozImJO5nZNBLTQIoasxEkzh7BEQOiaDA5s3skZXVeeRFHSpUeJyiRg2lGjV8tHqv%2BGo11a3fhK5cuaqF2eLJi3bZUl%2BkdR4CEAXnuRe5WpNOYkjo6aefouDuYbR7915Kv3mTjh5Nob5hH9G1a79TgP8bZsuPX7ecwvr21OKC2rSgNTGL6fixfRQn4po0aaTFGXs6vf1f%2BmbHenr%2BuXpZoliAvvsmgTw8SsqhKvY%2FI%2BrGrnRpT5o0YTTt%2B2Er7d61kcaMjqRixUyHuVSGXMb2rXGy15OwPppef62JiqIvpnxGE8aN1M7Zo9qTXR34Oh5amzt7CiWLXtX6%2BJXET%2FLK8UZjAz7sK8s9sHe7LIPbohzXg4egPhkxiH46uIuWL51Lr7z8koxu1vRV2Vvjk3FjR1Ds2mUyvE3rQMmkRIni8pzLiBgQRts2x9KRw7tpycKZ9NRTT8g4%2Ftfg%2BWcl%2F%2BPJ%2B2jXzg3Ut093GZddu%2BrUqUXz531BR3%2F%2BXvYQmwc01fKDBwQUAYiCIlHAj7Vr16Rz587TGbEVut79lnZN9gZWrlqrD9b8latUonLlvOU5G7QZ08ZTWtp1GhU1gW7d%2BotWLJtHNWv4atcrT9M3XqHxwiDPmbuIDv2UpILl8a%2B%2FbtPM2Qvo7t179HPSUem%2FceMP4uGqhfOnU1NRzoKFy2lVTCy1f6uNLNPcjo%2BvvdqEJk8aQ3v27Kehw6Lo1p%2B3aPGiWVT%2B8X%2FJctiIPvmkwZByoGqPpTrIhOLfwIh%2BdObsOZq%2FYJkc5lkwb5qKogHhfal%2FvxBaF7uBpk6bI4ToZZo1Y5L2sZMqlStS39Ae5FmqFM2Z83B4btaMiXJL49TUMxQTEyfz2pi4jRYvWSn9Xl6eVE304tSe%2BEM%2FDqcP3%2B9D23fsoslTZlElkWfc2uXE80EsnDGrFkn%2BQyOjaO%2FeH2lY5EfUsoU%2FWWoXDx0uWzKHPD096ZNR4%2BjChUtC9CbTiy8%2Br7ULHhBgAtl%2BZAeICg4Bv1o16MRJ2%2Fay6tP7PWHEj1FoWIR8452N4ucTo8jXtyodP5Gqwar%2FbB35lD127GRaFb1OC1eeO3fuyPCRIwbTcfHRJnUNzy%2F8R8xfdAlmY%2FitvDxNiBYb%2FsrCKJ49e15lIY9cZrOAdpScfEKeH%2F75KH0rnpobNKhPGxO3ZrnW%2BMRSHTxKPXzi543Fxk%2F4QibjDcYGD%2FpADsGlpaVRP9FzmjR5Js2cNV%2FGX7h4iaK%2FWkBVxV72p0%2BflWFct4iBw6WfxWWJEKsXRL32HzhEiZu3yV7At9%2FtoS1bd8pr9P9YHHuFBNPcL5fQZ%2BOnyqjv9%2ByTdahapbIcYmr3Vhd5P%2B%2FevUuxcQnk7%2F%2B6ZJewcYtZtjyHUUR8lKVb975SONauWy%2BHErt3e4cOiDrBgYAiAFFQJAr48cYf6VSqlIdNreTJ16XLVmlboLBB6vf%2Bo
Cx5smGMXrWQrl79VRi1xVnicjrhIRp%2B8WbPD%2Fu1S7kXwK5uHT8TUfjll6vkJZ6aQ3p2pYrPPE1VRK%2BGnX4oRwZY8e%2FosRQtVdKRY9JftmwZKlPGS64QertDW6r6T3klSz4UEuajROHIUcNHqZKEkLKrJCb0WRRyctXFXAN%2FVYuH%2BZQ7mXpazgWpc%2BbbNqgF1RRf3vIqXVr2jkp6lFDRJkfuEXDdeS5DOQ67fPmKOsURBCQBDB%2B5yA%2FhmDBy1cUyVHOOjUVOhpSHNdzEEyxPPufk0v%2B4KZ%2FseQjpURzPHfAOjxn3DWXcvXdPZlHczLxCu7ataMumdcRj4%2FfERmCpp04%2FSnHZXpuRkanFZ2Ya9qdXY%2F7nzl%2BkK0KU%2BO%2FU6TNymOnSpcuGNDpOmQ8MeWkXZOMpWqSIjM3IMP%2FZW15OvGNbvByi4qd%2FHgrKyRUr9nCpq6ozH3lobIkQeTgQ0BNAT0FPowD7eWjl3c4dqUP7IFq9Jl5rqTIwg4eMouUrYrRwYw8b65SUE%2FRc%2FbpaFI%2FzB3fpJL7bfZL27T8ow%2FkJtlFjfzmROXXyWGrm346u%2FvqblsbY417EXQs6Ij4Swl%2BN4vkPri%2B7evVqy2OS7slbBoh%2FwV070d59P1LHTu%2FJID%2B%2FGtQ7pJuKlnMMvjohZIPuLQTQ2OnrYBxnfJ6cfFwKV%2BKmbaTmYZgDL2Xl8fxHcUV0bdenS045KffEr1%2B%2FHu3%2Bfp%2BMKlvWmzq8FUQ8PBQkeghFhHAEBLYnHgZjwe7Zo4s%2BC%2BnXt%2BvgwcOyN8XzE%2BojLDyZ7lYYJsAEnIsHoKfgIj%2BAaDFpy2PYn0YNk6uJ%2BGUpXknEK2N49VFs3IYcScSsiSOe3A3%2FIJR4JUvkkAHyhSt9Qp585jH4sP4DpbGaPm2CNnmqv479yUJkXhf58coc7iUcPHRYTIZfEKuPoqjxSw3lSqKoUUPE1%2F%2BOUWqq6XzItWvX5SQ3D4Nwe0aPHJKliGQhVvwORr%2BwEClmE8aNyhJvrg4mFxgF3L59hzYkbKEhg8PlS308OTxm9FBKEiuEnnyigtHV5k%2B5jWzMO7Rvm0Vk1dUctyFhM33QvzfxqiReacQCGx4eSteuXxf36zrxS3atWvrLl%2BzGREVKUVLp%2BWjMdr3Ir1w5b7niiRcG8H3cvSuRRn4yWJ8MfhCwvHU22BQsAmyoe4cOEE%2Bee6Wx2bRxtVgxM1FMWv5CLVp3lCtZcmrxsuUx9PmUmRQc%2FD%2FakriWOr%2FTgSKHf6r1EvTpeb19eEQkNWncUBjlnvoozc8rk27fvk0rV3wpVglVoD%2FF6qHOXXv%2FEzZfLt28ePEydQ0OJf1wjspgnJiETU9Pl8tMo1cuIJ481bvZsxcSPyHzSp742K%2BIh9Cu%2F35Df4lcHaWvQ5ZICyfhEUNlD4VXHPFy2uYBzaiPYHv%2BwkULKbIGc49i2vR5VNuvplxtlTX24Vl4xDDasfM7OZHPS2J9fKpIDpyWVy9xHC%2B53bE1nn4XbUo9dSZLNsZsuRcW1n8QNWr4ghx6Wr50Dv0o2PBKJDgQ0BMoJCb2snyih0%2FZgPDTSsVqWK6mh1VQ%2FLys0dfXhy6KVTNsiK1xPJzBT6z2cLzaxniugod6eCyfJ7Nzcjwnkp5%2BU%2F5uzV3LSzh5%2BSv%2Fpi05c3WwdK0KZ45eXqXlEl0V9qjHnMrlMjw8PIiX7Bo7ngdisXzUdjEv7tHd%2B2e%2BxjhfnOdvAhdOH5I9bx6KNbeUO6fWQRRyIoR4EAABEMhHBGwVBcwp5KObjaqCAAiAQG4TgCjkNmHkDwIgAAL5iABEIR%2FdLFQVBEAABHKbAEQhtwkjfxAAARDIRwQsioI1s9b5qN2oKgiAAAgUSAK22m6LosC03N3xtmO
B%2FNWgUSAAAgWSgD1stkVRYLWp7lutQIJDo0AABECgIBJgm50rPQXOlP9atXizIHJDm0AABECgQBJgm63st7UNNPvyGr%2FVfF%2FsOslvSrYM6iz2UTlpbf5IBwIgAAIg4AACfrWqU0L8Cvk2M2%2B9bq04mAwfqYz4FWnOePrUscSFwYEACIAACDgnAbbRbKvZZqvtLdiWW%2BNMegqcCfcU1B5IvD8K9xhmz1tMiZt20KlTZ032qbGmYKQBARAAARCwngDvm8UbJQY2f4NCe3WTPQTeUj1XRIGryaLA%2B67z5ngZ4oMhPJzERz5X%2B7Fb3xykBAEQAAEQsIUAf0eDBYA%2FfsU9BD7yOYdb20vg%2BpjtKaiKqh6DEgc%2BqjC%2Bhv1wIAACIAACjiOgDL4a6lfioMRAxVtbo2xFgTNVhl%2BJgTq3tkCkAwEQAAEQsA8BJQxKCNTRltxzfDtNX4jeb0uhSAsCIAACIGA%2FAva0zTmKgqq2PQtVeeIIAiAAAiDgXARMlqQ6V%2FVQGxAAARAAAUcSgCg4kjbKAgEQAAEnJwBRcPIbhOqBAAiAgCMJQBQcSRtlgQAIgICTE4AoOPkNQvVAAARAwJEEIAqOpI2yQAAEQMDJCUAUnPwGoXogAAIg4EgCEAVH0kZZIAACIODkBCAKTn6DUD0QAAEQcCQBiIIjaaMsEAABEHByAhAFJ79BqB4IgAAIOJIARMGRtFEWCIAACDg5AYiCk98gVA8EQAAEHEkAouBI2igLBEAABJycAETByW8QqgcCIAACjiQAUXAkbZQFAiAAAk5O4G82ZnhrxWC%2FUAAAAABJRU5ErkJggg%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAYUAAABHCAYAAADodFtDAAAABGdBTUEAALGPC%2FxhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAAUGVYSWZNTQAqAAAACAACARIAAwAAAAEAAQAAh2kABAAAAAEAAAAmAAAAAAADoAEAAwAAAAEAAQAAoAIABAAAAAEAAAGFoAMABAAAAAEAAABHAAAAABqaRM8AAAIxaVRYdFhNTDpjb20uYWRvYmUueG1wAAAAAAA8eDp4bXBtZXRhIHhtbG5zOng9ImFkb2JlOm5zOm1ldGEvIiB4OnhtcHRrPSJYTVAgQ29yZSA2LjAuMCI%2BCiAgIDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI%2BCiAgICAgIDxyZGY6RGVzY3JpcHRpb24gcmRmOmFib3V0PSIiCiAgICAgICAgICAgIHhtbG5zOmV4aWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20vZXhpZi8xLjAvIgogICAgICAgICAgICB4bWxuczp0aWZmPSJodHRwOi8vbnMuYWRvYmUuY29tL3RpZmYvMS4wLyI%2BCiAgICAgICAgIDxleGlmOlBpeGVsWURpbWVuc2lvbj43MTwvZXhpZjpQaXhlbFlEaW1lbnNpb24%2BCiAgICAgICAgIDxleGlmOlBpeGVsWERpbWVuc2lvbj4zODk8L2V4aWY6UGl4ZWxYRGltZW5zaW9uPgogICAgICAgICA8ZXhpZjpDb2xvclNwYWNlPjE8L2V4aWY6Q29sb3JTcGFjZT4KICAgICAgICAgPHRpZmY6T3JpZW50YXRpb24%2BMTwvdGlmZjpPcmllbnRhdGlvbj4KICAgICAg
PC9yZGY6RGVzY3JpcHRpb24%2BCiAgIDwvcmRmOlJERj4KPC94OnhtcG1ldGE%2BCr1uU5kAAA%2BJSURBVHgB7Z0JdE3XGsc%2FlRgjIrzSyZgYYqhW%2B3ilI03EFF5Rr0oUQYS2kaIERVRNRc3UPJTEkKREzKpVNZRqkCDmYak2VKNqSrzub%2Bs%2B9%2BQOyXLvzc1N7n%2BvlZx99j57%2Bp27vv%2FZw9mn0P%2BFIzgQAAEQAAEQEAQeAwUQAAEQAAEQUAQgCooEjiAAAiAAAugp4DcAAiAAAiBgIICegoEFfCAAAiDg8gQgCi7%2FEwAAEAABEDAQgCgYWMAHAiAAAi5PAKLg8j8BAAABEAABAwGIgoEFfCAAAiDg8gQgCi7%2FEwAAEAABEDAQgCgYWMAHAiAAAi5PAKLg8j8BAAABEAABAwE3gzd7H7ZIyp4PYkEABEAgrwgUKlTIbkXnKApKDPio%2FuxWOjICARAAARCwmgCLgV4Q9H5rM81WFJQIPHjwgKbPnE%2Fx6zdRyvFUun%2F%2FvrXlIR0IgAAIgIAdCLi7u1Otmr4U1Lo59Q%2FrSY899nA2wFZhKCQMv9mtszmYxeD0mXPUKzSCko4k26EZyAIEQAAEQMDeBOrV9aN5sz%2BnalUrS3GwRRjMioLqIWRmZtKbgR0gCPa%2Bg8gPBEAABOxMgIVha%2BJqKly4sBxSslYYLK4%2B4l7CNDFkhB6Cne8csgMBEACBXCDAtpptNttuW5yJKOh7CV%2BLOQQ4EAABEACB%2FEGAbTaP8Cg7bk2tTUSBM1HzCcdPnLImT6QBARAAARDIAwJss7mnwDbcWmdRFDhTrDKyFivSgQAIgIDjCbDNtqWXwDU2KwocYYvScHo4EAABEAABxxOw1XZbFAXHNwUlggAIgAAI5DUBi6Jgq9rkdcNQPgiAAAi4IgFbbbdFUXBFmGgzCIAACLg6AYiCq%2F8C0H4QAAEQ0BGAKOhguIrXza0w1ajuQ8WLFzNp8quvNKa3O7bTwj8e9CG92ew17Tw7j3Ha7K511riSJUtQn17vkY9PVYdWsV7d2jT20%2BFUtGhRh5aLwkDAmABEwZhIAT4vX%2F5xWrRgBh078gPt3P41nUw5QKujF1HlSs9orW4jN9cKkefubm7Upk0gNWr0ohafnUefNrvrVFyXdztSw383UKd5cjSug6dnKRoxfCDVqV0r1%2Brj7V2GenR%2FlypVNHCvK7YoaN0ygLy9vexSrnG77JIpMnEJAhAFl7jNRBUqlKfYNUvJp1oV6hHyPtX0a0iBrTrK1m%2FetFYYozImJO5nZNBLTQIoasxEkzh7BEQOiaDA5s3skZXVeeRFHSpUeJyiRg2lGjV8tHqv%2BGo11a3fhK5cuaqF2eLJi3bZUl%2BkdR4CEAXnuRe5WpNOYkjo6aefouDuYbR7915Kv3mTjh5Nob5hH9G1a79TgP8bZsuPX7ecwvr21OKC2rSgNTGL6fixfRQn4po0aaTFGXs6vf1f%2BmbHenr%2BuXpZoliAvvsmgTw8SsqhKvY%2FI%2BrGrnRpT5o0YTTt%2B2Er7d61kcaMjqRixUyHuVSGXMb2rXGy15OwPppef62JiqIvpnxGE8aN1M7Zo9qTXR34Oh5amzt7CiWLXtX6%2BJXET%2FLK8UZjAz7sK8s9sHe7LIPbohzXg4egPhkxiH46uIuWL51Lr7z8koxu1vRV2Vvjk3FjR1Ds2mUyvE3rQMmkRIni8pzLiBgQRts2x9KRw7tpycKZ9NRTT8g4%2Ftfg%2BWcl%2F%2BPJ%2B2jXzg3Ut093GZddu%2BrUqUXz531BR3%2F%2BXvYQmwc01fKDBwQUAYiCIlHAj7Vr16Rz587TGbEVut79lnZN9gZWrlqrD9b8latUonLlvOU5G7QZ08ZTWtp1GhU1gW7d%2BotWLJtHNWv4atcrT9M3XqHxwiDPmbuIDv2UpI
Ll8a%2B%2FbtPM2Qvo7t179HPSUem%2FceMP4uGqhfOnU1NRzoKFy2lVTCy1f6uNLNPcjo%2BvvdqEJk8aQ3v27Kehw6Lo1p%2B3aPGiWVT%2B8X%2FJctiIPvmkwZByoGqPpTrIhOLfwIh%2BdObsOZq%2FYJkc5lkwb5qKogHhfal%2FvxBaF7uBpk6bI4ToZZo1Y5L2sZMqlStS39Ae5FmqFM2Z83B4btaMiXJL49TUMxQTEyfz2pi4jRYvWSn9Xl6eVE304tSe%2BEM%2FDqcP3%2B9D23fsoslTZlElkWfc2uXE80EsnDGrFkn%2BQyOjaO%2FeH2lY5EfUsoU%2FWWoXDx0uWzKHPD096ZNR4%2BjChUtC9CbTiy8%2Br7ULHhBgAtl%2BZAeICg4Bv1o16MRJ2%2Fay6tP7PWHEj1FoWIR8452N4ucTo8jXtyodP5Gqwar%2FbB35lD127GRaFb1OC1eeO3fuyPCRIwbTcfHRJnUNzy%2F8R8xfdAlmY%2FitvDxNiBYb%2FsrCKJ49e15lIY9cZrOAdpScfEKeH%2F75KH0rnpobNKhPGxO3ZrnW%2BMRSHTxKPXzi543Fxk%2F4QibjDcYGD%2FpADsGlpaVRP9FzmjR5Js2cNV%2FGX7h4iaK%2FWkBVxV72p0%2BflWFct4iBw6WfxWWJEKsXRL32HzhEiZu3yV7At9%2FtoS1bd8pr9P9YHHuFBNPcL5fQZ%2BOnyqjv9%2ByTdahapbIcYmr3Vhd5P%2B%2FevUuxcQnk7%2F%2B6ZJewcYtZtjyHUUR8lKVb975SONauWy%2BHErt3e4cOiDrBgYAiAFFQJAr48cYf6VSqlIdNreTJ16XLVmlboLBB6vf%2BoCx5smGMXrWQrl79VRi1xVnicjrhIRp%2B8WbPD%2Fu1S7kXwK5uHT8TUfjll6vkJZ6aQ3p2pYrPPE1VRK%2BGnX4oRwZY8e%2FosRQtVdKRY9JftmwZKlPGS64QertDW6r6T3klSz4UEuajROHIUcNHqZKEkLKrJCb0WRRyctXFXAN%2FVYuH%2BZQ7mXpazgWpc%2BbbNqgF1RRf3vIqXVr2jkp6lFDRJkfuEXDdeS5DOQ67fPmKOsURBCQBDB%2B5yA%2FhmDBy1cUyVHOOjUVOhpSHNdzEEyxPPufk0v%2B4KZ%2FseQjpURzPHfAOjxn3DWXcvXdPZlHczLxCu7ataMumdcRj4%2FfERmCpp04%2FSnHZXpuRkanFZ2Ya9qdXY%2F7nzl%2BkK0KU%2BO%2FU6TNymOnSpcuGNDpOmQ8MeWkXZOMpWqSIjM3IMP%2FZW15OvGNbvByi4qd%2FHgrKyRUr9nCpq6ozH3lobIkQeTgQ0BNAT0FPowD7eWjl3c4dqUP7IFq9Jl5rqTIwg4eMouUrYrRwYw8b65SUE%2FRc%2FbpaFI%2FzB3fpJL7bfZL27T8ow%2FkJtlFjfzmROXXyWGrm346u%2FvqblsbY417EXQs6Ij4Swl%2BN4vkPri%2B7evVqy2OS7slbBoh%2FwV070d59P1LHTu%2FJID%2B%2FGtQ7pJuKlnMMvjohZIPuLQTQ2OnrYBxnfJ6cfFwKV%2BKmbaTmYZgDL2Xl8fxHcUV0bdenS045KffEr1%2B%2FHu3%2Bfp%2BMKlvWmzq8FUQ8PBQkeghFhHAEBLYnHgZjwe7Zo4s%2BC%2BnXt%2BvgwcOyN8XzE%2BojLDyZ7lYYJsAEnIsHoKfgIj%2BAaDFpy2PYn0YNk6uJ%2BGUpXknEK2N49VFs3IYcScSsiSOe3A3%2FIJR4JUvkkAHyhSt9Qp585jH4sP4DpbGaPm2CNnmqv479yUJkXhf58coc7iUcPHRYTIZfEKuPoqjxSw3lSqKoUUPE1%2F%2BOUWqq6XzItWvX5SQ3D4Nwe0aPHJKliGQhVvwORr%2BwEClmE8aNyhJvrg4mFxgF3L59hzYk
bKEhg8PlS308OTxm9FBKEiuEnnyigtHV5k%2B5jWzMO7Rvm0Vk1dUctyFhM33QvzfxqiReacQCGx4eSteuXxf36zrxS3atWvrLl%2BzGREVKUVLp%2BWjMdr3Ir1w5b7niiRcG8H3cvSuRRn4yWJ8MfhCwvHU22BQsAmyoe4cOEE%2Bee6Wx2bRxtVgxM1FMWv5CLVp3lCtZcmrxsuUx9PmUmRQc%2FD%2FakriWOr%2FTgSKHf6r1EvTpeb19eEQkNWncUBjlnvoozc8rk27fvk0rV3wpVglVoD%2FF6qHOXXv%2FEzZfLt28ePEydQ0OJf1wjspgnJiETU9Pl8tMo1cuIJ481bvZsxcSPyHzSp742K%2BIh9Cu%2F35Df4lcHaWvQ5ZICyfhEUNlD4VXHPFy2uYBzaiPYHv%2BwkULKbIGc49i2vR5VNuvplxtlTX24Vl4xDDasfM7OZHPS2J9fKpIDpyWVy9xHC%2B53bE1nn4XbUo9dSZLNsZsuRcW1n8QNWr4ghx6Wr50Dv0o2PBKJDgQ0BMoJCb2snyih0%2FZgPDTSsVqWK6mh1VQ%2FLys0dfXhy6KVTNsiK1xPJzBT6z2cLzaxniugod6eCyfJ7Nzcjwnkp5%2BU%2F5uzV3LSzh5%2BSv%2Fpi05c3WwdK0KZ45eXqXlEl0V9qjHnMrlMjw8PIiX7Bo7ngdisXzUdjEv7tHd%2B2e%2BxjhfnOdvAhdOH5I9bx6KNbeUO6fWQRRyIoR4EAABEMhHBGwVBcwp5KObjaqCAAiAQG4TgCjkNmHkDwIgAAL5iABEIR%2FdLFQVBEAABHKbAEQhtwkjfxAAARDIRwQsioI1s9b5qN2oKgiAAAgUSAK22m6LosC03N3xtmOB%2FNWgUSAAAgWSgD1stkVRYLWp7lutQIJDo0AABECgIBJgm50rPQXOlP9atXizIHJDm0AABECgQBJgm63st7UNNPvyGr%2FVfF%2FsOslvSrYM6iz2UTlpbf5IBwIgAAIg4AACfrWqU0L8Cvk2M2%2B9bq04mAwfqYz4FWnOePrUscSFwYEACIAACDgnAbbRbKvZZqvtLdiWW%2BNMegqcCfcU1B5IvD8K9xhmz1tMiZt20KlTZ032qbGmYKQBARAAARCwngDvm8UbJQY2f4NCe3WTPQTeUj1XRIGryaLA%2B67z5ngZ4oMhPJzERz5X%2B7Fb3xykBAEQAAEQsIUAf0eDBYA%2FfsU9BD7yOYdb20vg%2BpjtKaiKqh6DEgc%2BqjC%2Bhv1wIAACIAACjiOgDL4a6lfioMRAxVtbo2xFgTNVhl%2BJgTq3tkCkAwEQAAEQsA8BJQxKCNTRltxzfDtNX4jeb0uhSAsCIAACIGA%2FAva0zTmKgqq2PQtVeeIIAiAAAiDgXARMlqQ6V%2FVQGxAAARAAAUcSgCg4kjbKAgEQAAEnJwBRcPIbhOqBAAiAgCMJQBQcSRtlgQAIgICTE4AoOPkNQvVAAARAwJEEIAqOpI2yQAAEQMDJCUAUnPwGoXogAAIg4EgCEAVH0kZZIAACIODkBCAKTn6DUD0QAAEQcCQBiIIjaaMsEAABEHByAhAFJ79BqB4IgAAIOJIARMGRtFEWCIAACDg5AYiCk98gVA8EQAAEHEkAouBI2igLBEAABJycAETByW8QqgcCIAACjiQAUXAkbZQFAiAAAk5O4G82ZnhrxWC%2FUAAAAABJRU5ErkJggg%3D%3D" alt="Structured button" width="389" height="71"&gt;&lt;/a&gt;&lt;/p&gt;
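&lt;p&gt;That transformation is possible precisely because the clarification is structured data (guidance plus an action URL) rather than free text. A minimal sketch, assuming those two fields – here rendered as a Markdown-style link, though the same data could just as easily drive an HTML button:&lt;/p&gt;

```python
# Sketch: rendering a structured clarification as a clickable control.
# Field names are illustrative; the point is that presentation is a pure
# function of the structured data, so the raw OAuth URL never reaches the
# user as text they must copy by hand.

def render_clarification(guidance: str, action_url: str) -> str:
    return f"[{guidance}]({action_url})"
```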

&lt;p&gt;When we were designing Portia, we started with authentication, but quickly realized that the things that made it hard to do this were more general than just authentication and much more about the fundamentals of human-agent interaction. We look forward to hearing your thoughts and feedback on the product and our &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;open-source SDK&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>authentication</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why authentication is a challenge for AI agents</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Mon, 10 Mar 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/why-authentication-is-a-challenge-for-ai-agents-88n</link>
      <guid>https://dev.to/portia-ai/why-authentication-is-a-challenge-for-ai-agents-88n</guid>
      <description>&lt;p&gt;&lt;strong&gt;AI Agents&lt;/strong&gt; are a rapidly evolving technology in the AI space. The introduction of LLMs and the ability for LLMs to interact with other software autonomously has paved the way for a new wave of technological innovation. This is an exciting development but it needs appropriate guardrails to ensure that an agent is really enacting your wishes and not sending rogue emails on your behalf to your entire address book. This is the first of a 2-part series that discusses some of the challenges of appropriately authenticating and authorizing agents so they can safely fulfill requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  The misaligned incentives of agents and authentication&lt;a href="https://blog.portialabs.ai/agent-auth-part-I#the-misaligned-incentives-of-agents-and-authentication" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Authentication is one of the most well-understood guardrails of the internet. These days, we take for granted that you cannot easily send an email on someone else’s behalf. In the earliest days of the internet, people wrote basic bots to brute-force usernames and passwords. But today, through widely deployed methods like OAuth combined with captchas, 2FA, and IP checks, authentication ensures that the appropriate human is present and is hard to impersonate.&lt;/p&gt;

&lt;p&gt;This presents a problem for agents. Their inherent value proposition is to act autonomously, whereas authentication has evolved precisely to ensure that the appropriate human is present for certain tasks.&lt;/p&gt;

&lt;p&gt;Most agentic systems solve this today by &lt;em&gt;pre-authenticating and authorizing&lt;/em&gt; the agent for any actions that it might want to take. The problem is that this pre-emptive access is far too broad and essentially removes one of the best safety guardrails we already have. You can replace it with a human-in-the-loop check instead, which many agentic systems have, but this is a bit like giving a burglar the keys to the castle but putting an electric fence around the perimeter – we’re essentially having to remove the authentication barrier and replace it with less sophisticated systems.&lt;/p&gt;

&lt;p&gt;Conversely, it’s also fundamentally limiting the potential of our agents – ideally, you want a system that grants the agent a minimal set of privileges so that it can achieve its tasks. &lt;em&gt;Pre-authorization&lt;/em&gt; means that, unless your end user is happy to sit and grant a bunch of authorizations to agents that they may not use, you’ll end up inadvertently limiting how many tools and systems your agent can access. Naive pre-authentication makes it hard to strike the balance between overgranting and undergranting and so ends up as both a limiting factor and also a far too permissive guardrail for agents.&lt;/p&gt;

&lt;p&gt;We’ve been trying to think about how just-in-time authorization can work for agents. How can we create an agentic system that enables agents to get the authorization they require only when it is clear that they require it?&lt;/p&gt;

&lt;h2&gt;
  
  
  An architecture for &lt;strong&gt;just-in-time&lt;/strong&gt; agent authentication and authorization&lt;a href="https://blog.portialabs.ai/agent-auth-part-I#an-architecture-for-just-in-time-agent-authentication-and-authorization" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Just-in-time&lt;/strong&gt; authorization means that an agent has the ability to authorize itself only at the point where it is very likely that it will 1/ need that authorization and 2/ be clear what it will use it for. If we think of agent execution as a graph of execution nodes against systems that might require authentication, there are 2 ways we can solve this problem:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1/ Authorization at the point of execution&lt;a href="https://blog.portialabs.ai/agent-auth-part-I#1-authorization-at-the-point-of-execution" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;In this case, the agent must pause itself to retrieve the authorization it requires from the user. For this to be useful in an autonomous agent scenario, this means that any state up to that point must be saved so that it can be resumed once the authentication action has been completed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmut409rl3fg5gn7o3wz1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmut409rl3fg5gn7o3wz1.png" alt="alt text" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The advantage of this is that the agent can request exactly the authentication and authorization it needs for the task it’s trying to execute. The disadvantage is that your agent can only proceed so far autonomously, and you risk it getting stuck every time it hits an authentication requirement.&lt;/p&gt;
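
&lt;p&gt;To make this concrete, here’s a minimal sketch of the pause-and-resume flow in Python. None of the names below come from a real framework – the plan shape, the checkpoint format, and the &lt;code&gt;run_steps&lt;/code&gt;/&lt;code&gt;resume&lt;/code&gt; helpers are all illustrative assumptions:&lt;/p&gt;

```python
import json

# Illustrative sketch only: plan shape, checkpoint format and helper names
# are made up to show the pause-and-resume idea, not a real agent framework.

def run_steps(state, granted):
    # Resume from wherever execution previously paused.
    while state["step"] != len(state["plan"]):
        step = state["plan"][state["step"]]
        if step["auth"] is not None and step["auth"] not in granted:
            # Pause: persist the run state, then surface an auth request.
            state["paused_for"] = step["auth"]
            return json.dumps(state)          # serialized checkpoint
        state["outputs"].append(f"ran {step['name']}")
        state["step"] += 1
    return None                               # finished, no pause needed

def resume(checkpoint, granted):
    # Restore the saved state and carry on with the newly granted auth.
    return run_steps(json.loads(checkpoint), granted)

plan = [{"name": "search", "auth": None},
        {"name": "send_email", "auth": "gmail"}]
state = {"plan": plan, "step": 0, "outputs": []}
checkpoint = run_steps(state, granted=set())      # pauses before send_email
done = resume(checkpoint, granted={"gmail"})      # run completes after auth
```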

&lt;h3&gt;
  
  
  2/ Scoped authorization based on an articulated plan&lt;a href="https://blog.portialabs.ai/agent-auth-part-I#2-scoped-authorization-based-on-an-articulated-plan" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Many agentic systems today rely on some amount of chain-of-thought reasoning and pre-planning. The idea with this kind of authentication is to pre-process the articulated plan to identify authentication requirements as early as possible, or to group them together to minimize round trips back to the user. This has become a core part of how we have designed Portia AI – the plan and what the agent is attempting to do should always be clear to the human. It also allows us to scope the authorization provided to the agent accurately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xre4lo6rtjrlzz2br93.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xre4lo6rtjrlzz2br93.png" alt="alt text" width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, agentic systems are increasingly evolving towards adaptive planning, so the authentication requirements may change as the agent discovers more information about its goal or backtracks from a particular route. An adaptation of this is to attempt probabilistic pre-authentication based on some combination of the articulated plan and the domain the agent is operating in. Ultimately, however, this is just an optimization on top of the plan-based pre-authentication described above, and so it has the same fundamental limitations and concerns.&lt;/p&gt;

&lt;p&gt;So what is the right solution? Having the ability to authenticate at the point of execution is a fundamental building block for effective authentication and authorization. With this enabled, you can then build probabilistic optimization systems on top of it which are optimizing for the agent progressing as far as it can without human interruption, but which can recover in the case that they diverge from the probabilistic outcome.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/portia-ai/seamless-human-agent-interactions-with-just-in-time-authorization-4cbi"&gt;part 2&lt;/a&gt;, we will look at how you can build this just-in-time authentication into agentic systems and how we’ve solved it at &lt;a href="https://www.portialabs.ai/" rel="noopener noreferrer"&gt;&lt;em&gt;Portia AI&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Like this article?&lt;/strong&gt; – &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt;. It really helps!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>portia</category>
      <category>authentication</category>
      <category>agents</category>
    </item>
    <item>
      <title>Start building authenticated and predictable agents with Portia AI</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Tue, 04 Mar 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/start-building-authenticated-and-predictable-agents-with-portia-ai-3flh</link>
      <guid>https://dev.to/portia-ai/start-building-authenticated-and-predictable-agents-with-portia-ai-3flh</guid>
      <description>&lt;p&gt;Tired of your &lt;a href="https://www.evidentlyai.com/blog/ai-failures-examples" rel="noopener noreferrer"&gt;AI agents going off the rails&lt;/a&gt;? Well look no further 😅! We are releasing an open source developer framework that allows you to build agents that pre-express their actions, share their progress and can be interrupted by a human.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk9bvbe1h2jhac8kr0km.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk9bvbe1h2jhac8kr0km.jpg" alt="Expectations vs. reality" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Portia AI was born from our tinkering together with Fintech co-pilots (closer to our home turf). We both believed that AI agents represented a paradigm shift in how software interacts with users and their environment. Amongst other major changes in the past year, AI has become a primary user interaction layer, as evidenced by the overwhelming focus of traditional players in the space (e.g. &lt;a href="https://www.salesforce.com/uk/news/press-releases/2024/12/17/agentforce-2-0-announcement" rel="noopener noreferrer"&gt;Salesforce’s Agentforce 2.0 press release&lt;/a&gt;); the shift from “Buy” to “Build” in the SaaS space is accelerating as folks like &lt;a href="https://www.inc.com/sam-blum/klarna-plans-to-shut-down-saas-providers-and-replace-them-with-ai.html" rel="noopener noreferrer"&gt;Klarna leverage AI to automate across functions&lt;/a&gt;; and web agents and multimodal models are on the rise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem space&lt;a href="https://blog.portialabs.ai/we-are-live#the-problem-space" rel="noopener noreferrer"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We were inspired by the explosion of AI-powered use cases, but the challenges we encountered as we tinkered were also sobering. Through this experience, and conversations with other developers, we homed in on the following pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Planning&lt;/strong&gt;: Many use cases require visibility into the LLM’s reasoning, particularly for complex tasks requiring multiple steps and tools. LLMs also struggle to pick the right tools as their tool set grows: a recurring limitation for production deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: Tracking an LLM’s progress mid-task is difficult, making it harder to intervene when guidance is needed. This is especially critical for enforcing company policies or correcting hallucinations (hello, missing arguments in tool calls!).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt;: Existing solutions often disrupt the user experience with cumbersome authentication flows or require pre-emptive, full access to every tool – an approach that doesn’t scale for multi-agent assistants.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Our proposed solution&lt;a href="https://blog.portialabs.ai/we-are-live#our-proposed-solution" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;While AI engineers with deep expertise have been hacking their way through these issues, we wanted to democratise the solutions for all developers with Portia AI. As a first step, we are offering an open source &lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;GitHub repo (↗)&lt;/a&gt;, augmented with elective cloud-hosted features to help speed up deployments, accessible from the &lt;a href="https://app.portialabs.ai/" rel="noopener noreferrer"&gt;Portia dashboard (↗)&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-expressed plans:&lt;/strong&gt; Our open source planning agent guides your LLM to produce an explicit Plan in response to a prompt, weaving the relevant tools, inputs, and outputs for every step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateful, controllable agents:&lt;/strong&gt; Portia will spin up a PlanRun and a series of execution agents to implement the generated plans and track the run state throughout execution. Using our Clarification abstraction, you can define points where you want to take control of plan runs, e.g. to resolve missing information or multiple-choice decisions. Portia serialises the PlanRun state, and you can manage its storage/retrieval yourself or use our cloud offering for simplicity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible, authenticated tool calling:&lt;/strong&gt; Bring your own tools on our extensible Tool abstraction, or use our growing plug and play authenticated tool library, which will include a number of popular SaaS providers over time (Google, Slack, Zendesk, Github etc.). All Portia tools feature just-in-time authentication with token refresh, offering security without compromising on user experience.&lt;/li&gt;
&lt;/ol&gt;
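
&lt;p&gt;To give a flavour of the Clarification idea, here is a toy Python model of a plan run that pauses on a missing argument. This is deliberately not the real Portia SDK API – every class and field name below is illustrative only (see the &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs&lt;/a&gt; for the real thing):&lt;/p&gt;

```python
# Toy model of the Clarification idea; the real Portia SDK differs, so treat
# every class and field name here as illustrative only.

from dataclasses import dataclass, field

@dataclass
class Clarification:
    step: int
    prompt: str
    response: str = ""

@dataclass
class PlanRun:
    plan: list
    step: int = 0
    clarifications: list = field(default_factory=list)
    state: str = "IN_PROGRESS"

def execute(run, answers):
    while run.step != len(run.plan):
        step = run.plan[run.step]
        missing = [a for a in step["args"] if a not in answers]
        if missing:
            # Raise a structured question instead of hallucinating a value.
            run.clarifications.append(
                Clarification(run.step, f"Please provide: {missing[0]}"))
            run.state = "NEED_CLARIFICATION"
            return run
        run.step += 1
    run.state = "COMPLETE"
    return run

run = execute(PlanRun([{"tool": "refund", "args": ["order_id"]}]), {})
run = execute(run, {"order_id": "A123"})   # human answered; run resumes
print(run.state)   # COMPLETE
```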

&lt;h3&gt;
  
  
  Intrigued?
&lt;/h3&gt;

&lt;p&gt;Give our live playground a try on &lt;a href="https://www.portialabs.ai/" rel="noopener noreferrer"&gt;our website&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It’s early days – for us and for the ecosystem at large. Everything from LLM reasoning to authentication and APIs in the age of AI agents is evolving rapidly. With Portia AI, we want to help developers ride this wave of innovation by combining intelligence, autonomy, and security.&lt;/p&gt;

&lt;p&gt;If this resonates with you, let’s connect on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel &lt;/a&gt;. We’re building and iterating based on feedback from our community, and we’d love to hear your thoughts. Together, let’s tackle the gnarly challenges standing in the way of the agentic future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Emma &amp;amp; Mounir&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation&lt;a href="https://blog.portialabs.ai/we-are-live#join-the-conversation" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/portiaAI/portia-sdk-python" rel="noopener noreferrer"&gt;Give us a ⭐ on GitHub&lt;/a&gt; – it really helps!&lt;/li&gt;
&lt;li&gt;Browse our website and try our (modest) playground at &lt;a href="http://www.portialabs.ai/" rel="noopener noreferrer"&gt;www.portialabs.ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Head over to our docs at &lt;a href="https://docs.portialabs.ai/" rel="noopener noreferrer"&gt;docs.portialabs.ai&lt;/a&gt; or get immersed in our &lt;a href="https://github.com/portiaAI/portia-sdk-python/tree/main/portia/open_source_tools" rel="noopener noreferrer"&gt;SDK&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hello</category>
      <category>portia</category>
      <category>launch</category>
    </item>
    <item>
      <title>What's next for Browser Agents? 🤔</title>
      <dc:creator>Portia AI</dc:creator>
      <pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/portia-ai/whats-next-for-browser-agents-2l59</link>
      <guid>https://dev.to/portia-ai/whats-next-for-browser-agents-2l59</guid>
      <description>&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;p&gt;I've been tinkering with browser automation recently (e.g., building a bot to search and buy on Amazon), and Operator’s release got me thinking about the future of these tools. Here are 3 key challenges browser agents face today:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Moving from text-only to multi-modal AI models.
&lt;/li&gt;
&lt;li&gt;Solving authentication without blending in with bad bots.
&lt;/li&gt;
&lt;li&gt;Enabling human-in-the-loop collaboration that's seamless and smart.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post we unpack these challenges, share insights, and explore what’s next for browser agents. Would you trust browser agents with your day-to-day tasks? Let me know your thoughts! 👇&lt;/p&gt;

&lt;h2&gt;
  
  
  ChatGPT Operator is out – what's next for browser agents?&lt;a href="https://blog.portialabs.ai/browser-agents#chatgpt-operator-is-out----whats-next-for-browser-agents" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;AI start-ups building browser agents must be losing sleep over Operator’s release 😱. We recently tinkered with browser agents ourselves to automate searching on Amazon and buying an item. After seeing some of the Operator demos, these are our three broad takeaways:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The future of browser agents is multi-modal models
&lt;/h3&gt;

&lt;p&gt;I tried building my Amazon browser agent using a “unimodal” text-based LLM to really grok that point. [For the tech savvy: I used Browserbase as my headless browser to automate browser tasks with code.] Because the LLM doesn’t understand web navigation, I had to figure out the exact structure of the webpage I wanted to automate and spoon-feed it to the LLM. Not only would a developer need to do this for every website, they’d need to revisit it every time a website changes structure. To make matters worse, my context window (the amount of data the LLM can hold in a given conversation) was constantly saturated by the sheer volume of HTML retrieved. I had to find all sorts of hacks, especially on websites like Amazon (filter for specific tags, convert to other formats like text or markdown, etc.). Operator is multi-modal, meaning it was trained on and processes both text and visual data. It can navigate a webpage dynamically and process elements visually rather than relying exclusively on verbose HTML dumps. There are a few other contenders in this space – I was able to successfully star a GitHub repository using the &lt;code&gt;browser-use&lt;/code&gt; open source framework, for example (star them on &lt;a href="https://github.com/browser-use/browser-use" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;!). You can see a GIF below of how they interpret web page layout visually. The accuracy of such models is improving fast, even though some benchmarks put them below 58% (see WebArena’s &lt;a href="https://webarena.dev/" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt;).&lt;/p&gt;
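
&lt;p&gt;As an aside, the tag-filtering hack can be sketched with nothing but the Python standard library: strip the page down to its visible text before handing it to the LLM, so markup doesn’t eat the context window. A rough, illustrative version:&lt;/p&gt;

```python
# Rough sketch of the "filter the HTML" hack: keep only visible text so the
# LLM's context window isn't saturated with markup. Standard library only.

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0            # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = "<html><script>track()</script><h1>Deals</h1><p>Save 20%</p></html>"
print(visible_text(page))   # Deals Save 20%
```

&lt;p&gt;A real page needs more than this (links and form fields matter too), but even this crude filter shrinks the payload dramatically.&lt;/p&gt;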

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-modal AI models can navigate the web because they interpret website content visually as well as textually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdgcc5kbmtffactfa9zj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdgcc5kbmtffactfa9zj.gif" alt="An animated GIF of an agent using a browser" width="800" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. There’s currently little to nothing distinguishing AI web agents (good bots!) from automated fraud (bad bots!)
&lt;/h3&gt;

&lt;p&gt;In our attempts at building browser agents, we got blocked at various points in the browsing session. Occasionally we would get blocked right at the very start and would have to use proxies and other incognito methods. Incidentally, we’ve seen &lt;a href="https://x.com/rowancheung/status/1882489741829972254" rel="noopener noreferrer"&gt;a demo shared on X&lt;/a&gt; where it seems that Operator occasionally struggles with that as well. More importantly, we could not find a way to load or fill the fields on login pages, even when we got the browser agent to hand the session over to us ☠️. Based on the demos we’ve seen, Operator solves authentication for some providers by handing control over to the user in order to complete a login (e.g. Booking.com, Thumbtack or Google Calendar &lt;a href="https://x.com/rowancheung/status/1882490355628560628" rel="noopener noreferrer"&gt;access&lt;/a&gt;). Based on the &lt;a href="https://openai.com/index/introducing-operator/" rel="noopener noreferrer"&gt;press release&lt;/a&gt;, they are relying on bespoke partnerships with those domains to identify Operator-managed browsing sessions and allow them to proceed "while respecting established norms". For now, none of the other multi-modal frameworks we’ve seen have an easy answer to this problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ecosystem needs a reliable security standard to establish this handshake between browser agents and websites. We envision this looking like an adapted version of delegated OAuth, and it will hopefully level the playing field for startups and make browser agents safer.&lt;/p&gt;
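
&lt;p&gt;To illustrate what such a handshake could involve (purely speculative – the field names and key exchange below are invented for the sketch), the mechanics might resemble a signed assertion that the website can verify:&lt;/p&gt;

```python
# Speculative sketch of an agent/website handshake: the agent platform signs
# an assertion of who the agent acts for and what it may do, and the site
# verifies it. Field names and the shared-secret setup are invented here.

import hashlib, hmac, json

SECRET = b"shared-between-platform-and-site"   # stand-in for real key exchange

def mint_agent_token(user, scopes):
    claims = json.dumps({"acting_for": user, "scopes": scopes, "agent": True})
    sig = hmac.new(SECRET, claims.encode(), hashlib.sha256).hexdigest()
    return claims, sig

def site_accepts(claims, sig, needed_scope):
    expected = hmac.new(SECRET, claims.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False              # forged or tampered: treat like a bad bot
    payload = json.loads(claims)
    return payload["agent"] and needed_scope in payload["scopes"]

claims, sig = mint_agent_token("emma@example.com", ["bookings.read"])
print(site_accepts(claims, sig, "bookings.read"))    # True
print(site_accepts(claims, sig, "payments.write"))   # False
```

&lt;p&gt;A production version of this would need asymmetric keys and standardised claims, which is exactly why it wants to be an open standard rather than bespoke partnerships.&lt;/p&gt;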

&lt;h3&gt;
  
  
  3. We’re missing a structured way to handle the back and forth between a web agent and a user
&lt;/h3&gt;

&lt;p&gt;While frameworks for building AI agents are starting to support human intervention (aka “human in the loop”), none of the well-known browser agent frameworks offer a structured way to handle the back and forth between the agent and the human user. Operator definitely stands out in that regard, as you can see in this flight booking &lt;a href="https://x.com/rowancheung/status/1882490129924624700" rel="noopener noreferrer"&gt;example&lt;/a&gt;. It’s not clear what the user experience would be like if the user was not immediately available to guide the LLM. Would it be able to “save its progress” and resume once the user responds? Can the human pre-emptively define conditions where the web agent should consult them, beyond the obvious ones like making a payment, e.g. “if you find any offers for travel insurance during the booking process, let me know so I can explore them before making a decision”?&lt;/p&gt;
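
&lt;p&gt;One way such pre-emptive conditions could be expressed (a hypothetical sketch, not any existing framework’s API) is as plain predicates over the agent’s observations:&lt;/p&gt;

```python
# Hypothetical sketch: the user registers "consult me if..." conditions as
# plain predicates, and the agent pauses whenever one fires on what it sees.

def make_checkpoints():
    return [
        ("payment", lambda obs: obs.get("action") == "pay"),
        ("insurance offer",
         lambda obs: "travel insurance" in obs.get("page_text", "")),
    ]

def needs_human(obs, checkpoints):
    # Return the labels of every condition the current observation triggers.
    return [label for label, predicate in checkpoints if predicate(obs)]

cps = make_checkpoints()
print(needs_human({"page_text": "Add travel insurance for 12 GBP?"}, cps))
# ['insurance offer']
```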

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Resilient and flexible human-in-the-loop support needs to be a core feature of browser agents to ensure that users have as much control as possible over the actions of their browser agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you feel about browser agents? Would you trust them? Would you use them more broadly to assist you in your day-to-day life? Let us know in the comments below, or join us on Discord and share your thoughts 🙏&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation​&lt;a href="https://blog.portialabs.ai/browser-agents#join-the-conversation" rel="noopener noreferrer"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Join the conversation on our &lt;a href="https://discord.gg/DvAJz9ffaR" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Watch us embarrass ourselves on our &lt;a href="https://www.youtube.com/@PortiaAI" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Follow us on &lt;a href="https://www.producthunt.com/posts/portia-ai" rel="noopener noreferrer"&gt;Product Hunt&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hello</category>
      <category>portia</category>
      <category>browseragents</category>
      <category>authentication</category>
    </item>
  </channel>
</rss>
