<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SapotaCorp</title>
    <description>The latest articles on DEV Community by SapotaCorp (@sapotacorp).</description>
    <link>https://dev.to/sapotacorp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948393%2F8481b860-d2b5-43c8-9641-4b83c9386e84.png</url>
      <title>DEV Community: SapotaCorp</title>
      <link>https://dev.to/sapotacorp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sapotacorp"/>
    <language>en</language>
    <item>
      <title>Polymorphic lookups in Dataverse: hidden cost tutorials skip</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:31:39 +0000</pubDate>
      <link>https://dev.to/sapotacorp/polymorphic-lookups-in-dataverse-hidden-cost-tutorials-skip-3n8g</link>
      <guid>https://dev.to/sapotacorp/polymorphic-lookups-in-dataverse-hidden-cost-tutorials-skip-3n8g</guid>
      <description>&lt;p&gt;The Customer lookup on a Sales Order table in Dataverse can point at either an Account row or a Contact row. One field, two possible target tables, resolved at runtime. The pattern is called a polymorphic lookup, and it looks elegant on a form designer - the user picks an account or a contact without caring which.&lt;/p&gt;

&lt;p&gt;The elegance has a cost that tutorials routinely leave out. After shipping polymorphic lookups on three projects and unwinding them on one, here is our working rule for when they earn their keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  What polymorphic lookups buy you
&lt;/h2&gt;

&lt;p&gt;Two flavors in Dataverse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Built-in polymorphic columns on system tables: the Customer-type customerid column, which targets either Account or Contact, and the regardingobjectid column on activity tables, which can point at many different tables. You cannot create more of these.&lt;/li&gt;
&lt;li&gt;  Polymorphic lookups on custom tables via many-to-one relationships to multiple targets. Added through the maker portal or solution XML.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The appeal is obvious: one column instead of two. A form with "who is this order for?" that can be either a B2B account or a B2C contact reads more naturally than "who is this order for (if account)" and "who is this order for (if contact)" as two separate fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three follow-on problems
&lt;/h2&gt;

&lt;p&gt;Problem 1: reporting and joining are messy.&lt;/p&gt;

&lt;p&gt;SQL people know the pattern: one column that references two tables means every join needs a CASE or UNION. The Dataverse equivalent is FetchXML or the Web API, which do not expose the relationship with the same syntactic ease as a simple lookup to a single target.&lt;/p&gt;

&lt;p&gt;Example: "Show me all sales orders with the customer name, regardless of whether the customer is an account or a contact." With a polymorphic lookup, the FetchXML needs two outer joins:&lt;/p&gt;
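
&lt;p&gt;A minimal sketch of such a query, assembled in Python with ElementTree rather than written as raw FetchXML. The salesorder, account, and contact table names and the acct and ctct aliases follow the text; the selected attributes and the helper names are assumptions for illustration.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_order_customer_fetch():
    """Build FetchXML that outer-joins salesorder to both customer targets."""
    fetch = ET.Element("fetch")
    order = ET.SubElement(fetch, "entity", {"name": "salesorder"})
    ET.SubElement(order, "attribute", {"name": "name"})
    # One outer join per possible target of the polymorphic customerid column.
    targets = [
        ("account", "accountid", "acct", "name"),
        ("contact", "contactid", "ctct", "fullname"),
    ]
    for table, key, alias, label in targets:
        link = ET.SubElement(order, "link-entity", {
            "name": table,
            "from": key,
            "to": "customerid",
            "link-type": "outer",
            "alias": alias,
        })
        ET.SubElement(link, "attribute", {"name": label})
    return ET.tostring(fetch, encoding="unicode")


def customer_name(row):
    """Coalesce the two aliased name columns; at most one is populated."""
    return row.get("acct.name") or row.get("ctct.fullname")


if __name__ == "__main__":
    print(build_order_customer_fetch())
```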

&lt;p&gt;Then you coalesce acct.name and ctct.fullname in the consumer. Every report, every view, every integration picks this up. Two non-polymorphic lookups would give you two simple joins, one of which is always null - clearer to reason about, less clever.&lt;/p&gt;

&lt;p&gt;Problem 2: query cost and indexing.&lt;/p&gt;

&lt;p&gt;Dataverse creates an index on lookup columns, but polymorphic lookups cover multiple target tables. A filter like customerid eq 'specific-guid' has to resolve the GUID against multiple index targets. The performance cost is small on lookups with two targets; it scales with the number of targets.&lt;/p&gt;

&lt;p&gt;More subtly, views that filter by "customer name" become expensive because they implicitly join both target tables. A large view with a polymorphic-lookup-based filter is where performance degrades first as the system grows.&lt;/p&gt;

&lt;p&gt;Problem 3: integration pain.&lt;/p&gt;

&lt;p&gt;Upserts from an external system into a polymorphic lookup require the integration to know which target table the value belongs to. The Web API syntax:&lt;/p&gt;

&lt;p&gt;versus&lt;/p&gt;

&lt;p&gt;The integration has to branch on "is this an account or a contact" for every write. A non-polymorphic lookup takes one fixed binding name. On a high-volume sync, this branching can reach roughly 20% of the integration code in our experience.&lt;/p&gt;
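
&lt;p&gt;A minimal Python sketch of that branching. It assumes the standard salesorder customer column, whose single-valued navigation properties we take to be customerid_account and customerid_contact; verify the names against your environment's metadata. The customer_binding helper is hypothetical.&lt;/p&gt;

```python
# Assumed mapping: target table to (navigation property, entity set name).
CUSTOMER_TARGETS = {
    "account": ("customerid_account", "accounts"),
    "contact": ("customerid_contact", "contacts"),
}


def customer_binding(target_table, row_id):
    """Return the @odata.bind fragment for a polymorphic customer write."""
    if target_table not in CUSTOMER_TARGETS:
        raise ValueError(f"unsupported customer target: {target_table}")
    nav_property, entity_set = CUSTOMER_TARGETS[target_table]
    # The binding name itself changes with the target table; this is the
    # branch every write has to take.
    return {f"{nav_property}@odata.bind": f"/{entity_set}({row_id})"}


if __name__ == "__main__":
    print(customer_binding("account", "00000000-0000-0000-0000-000000000001"))
```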

&lt;h2&gt;
  
  
  When polymorphic lookups are genuinely the right tool
&lt;/h2&gt;

&lt;p&gt;Two scenarios make us reach for them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  You genuinely need to store heterogeneous references. An activity-like table (calls, emails, meetings) that logically relates to "whatever is being discussed" benefits from a polymorphic link. The alternative - a separate lookup column per possible target - explodes the schema.&lt;/li&gt;
&lt;li&gt;  The UI payoff outweighs the reporting cost. A form designer needs one field, not four, and users do not care about the underlying model. If reporting against this column is rare (audit log use case, not a weekly exec report), the cost is modest.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Patterns we use instead
&lt;/h2&gt;

&lt;p&gt;When the requirement is "customer is sometimes B2B, sometimes B2C, but in practice each order has exactly one customer," we pick one of:&lt;/p&gt;

&lt;p&gt;Pattern A: single target table (Contact with a Type field).&lt;/p&gt;

&lt;p&gt;Make Contact the one target table. Add an acme_contact_type choice column: Individual or Company Representative. For B2B orders, create a contact row that represents the company's billing contact (linked to the account via another lookup). Every sales order lookup points at Contact; every report groups by contact's account if the contact has one.&lt;/p&gt;

&lt;p&gt;Works when the B2B orders almost always go through an identifiable contact person anyway.&lt;/p&gt;

&lt;p&gt;Pattern B: two non-polymorphic lookups with an exclusion rule.&lt;/p&gt;

&lt;p&gt;Two lookups on the sales order: acme_customer_account and acme_customer_contact. A Business Rule enforces "exactly one of these must be filled." Reports join on whichever is non-null.&lt;/p&gt;

&lt;p&gt;Works when you want the clarity of strong-typed references at the cost of form real estate.&lt;/p&gt;
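
&lt;p&gt;The Business Rule lives in Dataverse, but an integration or a report may want to apply the same check on its side. A small Python sketch of the "exactly one" rule, assuming rows arrive as dictionaries keyed by the two column names from the text; both function names are hypothetical.&lt;/p&gt;

```python
def exactly_one_customer(order):
    """True when exactly one of the two customer lookups is filled."""
    has_account = order.get("acme_customer_account") is not None
    has_contact = order.get("acme_customer_contact") is not None
    return has_account != has_contact


def customer_reference(order):
    """Return (table, id) for whichever lookup is non-null."""
    if not exactly_one_customer(order):
        raise ValueError("order must reference exactly one customer")
    if order.get("acme_customer_account") is not None:
        return ("account", order["acme_customer_account"])
    return ("contact", order["acme_customer_contact"])
```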

&lt;p&gt;Pattern C: a Customer Party table that both accounts and contacts link up to.&lt;/p&gt;

&lt;p&gt;Create an acme_party table. Every account automatically creates a party row; every contact does the same. The sales order's customer lookup points at acme_party, which has a simple owner reference back to either account or contact.&lt;/p&gt;

&lt;p&gt;Heaviest pattern, most data-warehouse-like, best for projects where you will want to unify reporting across customer types anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision we ran on the last project
&lt;/h2&gt;

&lt;p&gt;A retail client with both corporate contracts (B2B) and direct customers (B2C). Initial instinct was polymorphic customer lookup on sales order.&lt;/p&gt;

&lt;p&gt;We caught ourselves and ran three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Will exec reporting need to compare B2B and B2C performance? (Yes - monthly board review.)&lt;/li&gt;
&lt;li&gt; Will integrations distinguish account customers from contact customers? (Yes - different billing systems.)&lt;/li&gt;
&lt;li&gt; Does the UI benefit outweigh the query complexity? (No - users already work in contact-specific views for B2C and account-specific views for B2B.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three answers argued against the polymorphic lookup. We went with Pattern B: two non-polymorphic lookups + Business Rule. Reports are straightforward joins. Integrations target one of two simple bindings. Users pick "Business customer" or "Individual customer" on form load, and the relevant lookup appears.&lt;/p&gt;

&lt;p&gt;If we had chosen polymorphic, the first board report would have taken a week of FetchXML wrestling. Two years in, we are still glad we did not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule of thumb
&lt;/h2&gt;

&lt;p&gt;If the polymorphic lookup is truly the point (activity-like tables, audit references, rare reporting use), use it. If it is a convenience over two semantically different relationships, the convenience will cost you more than two cleaner columns ever would.&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>Dataverse webhooks to Service Bus: retry and dead-lettering</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:31:18 +0000</pubDate>
      <link>https://dev.to/sapotacorp/dataverse-webhooks-to-service-bus-retry-and-dead-lettering-9he</link>
      <guid>https://dev.to/sapotacorp/dataverse-webhooks-to-service-bus-retry-and-dead-lettering-9he</guid>
      <description>&lt;p&gt;A Dataverse row change needs to trigger downstream processing - update an ERP, index into a search engine, notify a mobile app. The "sync plugin that makes HTTP calls" path works for demos and fails in production the first time the downstream service has a bad hour. Plugins that make external HTTP calls hit the 2-minute plugin timeout, block the user's save, and accumulate retries that saturate everything downstream.&lt;/p&gt;

&lt;p&gt;The production-grade pattern: Dataverse fires a webhook into Azure Service Bus on the row change. Service Bus holds the message durably. A consumer (Azure Function, Logic App, or custom service) pulls from the queue on its own schedule, processes at its own pace, and dead-letters failures for later inspection.&lt;/p&gt;

&lt;p&gt;Here is the full pipeline, the retry semantics at each stage, and the gotchas we have debugged in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape
&lt;/h2&gt;

&lt;p&gt;Three queues total: the main queue, the dead-letter queue (automatic subqueue of the main), and optionally a "retry" queue for messages that failed once but might succeed on a delay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: Dataverse webhook plugin
&lt;/h2&gt;

&lt;p&gt;Dataverse's built-in Service Endpoint mechanism posts messages to Azure Service Bus natively. You register a Service Endpoint pointing at your Service Bus queue's URL and SAS token. Plugins registered with ServiceEndpoint as the destination post automatically.&lt;/p&gt;

&lt;p&gt;Alternatively, a traditional async plugin that uses the Azure SDK to send to Service Bus works the same way and gives you finer control (batch sends, custom message properties, etc.). We use this for high-volume scenarios.&lt;/p&gt;

&lt;p&gt;Plugin registered as: post-operation, async, on the table Update (or whatever event triggers the downstream work).&lt;/p&gt;

&lt;p&gt;The plugin's job is tiny:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Build a message body from the current row (or just include the row's primary key plus enough metadata for the consumer to fetch the full row).&lt;/li&gt;
&lt;li&gt; Send to Service Bus.&lt;/li&gt;
&lt;li&gt; Return.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Durations measured: typically 100-300ms per execution. The async service handles this on its own schedule; user saves never wait on it.&lt;/p&gt;
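
&lt;p&gt;Dataverse plugins are written in .NET, so the following Python is only a sketch of the message shape, not plugin code. It illustrates the "primary key plus enough metadata" option from step 1; the field names and the build_message helper are assumptions.&lt;/p&gt;

```python
import json
import uuid


def build_message(table, row_id, event, correlation_id=None):
    """Build a small envelope: row key plus metadata, not the full row."""
    body = {"table": table, "id": row_id, "event": event}
    return {
        # Unique per message; consumers can dedupe on it.
        "message_id": str(uuid.uuid4()),
        # Carried through every stage so one query can trace the message.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "content_type": "application/json",
        "body": json.dumps(body, sort_keys=True),
    }
```

&lt;p&gt;Sending only the key keeps the message small and means the consumer always fetches the current row, at the cost of one extra read per message.&lt;/p&gt;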

&lt;h2&gt;
  
  
  Stage 2: Service Bus queue configuration
&lt;/h2&gt;

&lt;p&gt;Queue settings that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Max delivery count: how many times the queue will re-deliver a message before moving it to dead-letter. Default 10. We set to 5 for most scenarios - enough retries for transient issues, few enough that persistent failures don't churn for an hour.&lt;/li&gt;
&lt;li&gt;  Message time-to-live: maximum time a message stays queued. Default 14 days. We set to 7 days - long enough for a weekend outage to recover.&lt;/li&gt;
&lt;li&gt;  Lock duration: how long a consumer holds a message before visibility returns. Default 30 seconds. Set to maximum expected processing time + buffer (e.g., if processing takes 10 seconds max, set lock to 30 seconds).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We usually do not enable the Sessions and Partitioning features. Sessions are for order-preserving scenarios; partitioning adds complexity we don't usually need at our volumes.&lt;/p&gt;
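
&lt;p&gt;The settings above can be captured as a small config object. This is a plain Python sketch, not an Azure SDK call; the QueueSettings class and the lock_duration_for helper are hypothetical, and the 20-second default buffer is an assumption chosen to reproduce the 10-seconds-to-30-seconds example. Service Bus caps lock duration at 5 minutes, so the helper clamps to that.&lt;/p&gt;

```python
from dataclasses import dataclass

MAX_LOCK_SECONDS = 300  # Service Bus does not allow locks longer than 5 minutes


@dataclass(frozen=True)
class QueueSettings:
    max_delivery_count: int = 5      # deliveries before dead-lettering
    message_ttl_days: int = 7        # survives a weekend outage
    lock_duration_seconds: int = 30  # processing time plus buffer


def lock_duration_for(max_processing_seconds, buffer_seconds=20):
    """Lock = longest expected processing time + buffer, clamped to the cap."""
    return min(max_processing_seconds + buffer_seconds, MAX_LOCK_SECONDS)


if __name__ == "__main__":
    print(QueueSettings(lock_duration_seconds=lock_duration_for(10)))
```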

&lt;h2&gt;
  
  
  Stage 3: consumer (Azure Function with Service Bus trigger)
&lt;/h2&gt;

&lt;p&gt;A Function App with the Service Bus Queue trigger pulls messages automatically:&lt;/p&gt;

&lt;p&gt;Three completion paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Complete: message processed successfully, removed from queue. This is the happy path.&lt;/li&gt;
&lt;li&gt;  Dead-letter: permanent failure (bad data, unknown row, business logic rejection). Moved to dead-letter queue for manual review.&lt;/li&gt;
&lt;li&gt;  Throw/abandon: transient failure. Message returns to the queue for retry until max delivery count is reached, then auto-moved to dead-letter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Categorizing exceptions into "permanent" and "transient" is the single most important decision in the consumer. Transient failures should retry; permanent ones should not churn the retry counter.&lt;/p&gt;
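
&lt;p&gt;A minimal sketch of that categorization, written as plain Python with no Azure SDK dependency. The two exception classes and the settle function are hypothetical; in a real Functions trigger the "abandon" branch would typically be a rethrow. Treating unclassified exceptions as transient is a design choice made here, not a rule.&lt;/p&gt;

```python
class PermanentError(Exception):
    """Bad data, unknown row, business rejection: retrying cannot help."""


class TransientError(Exception):
    """Timeouts, throttling, brief outages: retrying may help."""


def settle(message, process):
    """Run process(message) and return the settlement action to take."""
    try:
        process(message)
    except PermanentError as exc:
        # Skip the retry counter entirely and park the message for review.
        return ("dead_letter", str(exc))
    except Exception as exc:
        # TransientError and anything unclassified: hand back for redelivery.
        return ("abandon", str(exc))
    return ("complete", None)
```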

&lt;h2&gt;
  
  
  Retry behavior in detail
&lt;/h2&gt;

&lt;p&gt;When the consumer throws (transient path), Service Bus behaves as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; First delivery fails. The message is abandoned (or its lock expires if the consumer crashed) and becomes visible again.&lt;/li&gt;
&lt;li&gt; Next poll pulls the message. Delivery count is 2. Same failure? Same outcome.&lt;/li&gt;
&lt;li&gt; Repeat until delivery count hits max (5 in our config).&lt;/li&gt;
&lt;li&gt; Message auto-moves to dead-letter queue with system-reason "MaxDeliveryCountExceeded."&lt;/li&gt;
&lt;/ol&gt;
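
&lt;p&gt;The sequence above can be modelled as a short loop. This is a toy simulation of the delivery counter, not Service Bus itself; the deliver_until_settled function is hypothetical and uses the max delivery count of 5 from our config.&lt;/p&gt;

```python
def deliver_until_settled(process, max_delivery_count=5):
    """Redeliver until process succeeds or the delivery count is exhausted."""
    delivery_count = 0
    while max_delivery_count > delivery_count:
        delivery_count += 1
        try:
            process(delivery_count)
        except Exception:
            # Consumer threw: the message goes back to the queue.
            continue
        return ("completed", delivery_count)
    # Counter exhausted: the broker moves the message to the dead-letter queue.
    return ("dead_lettered:MaxDeliveryCountExceeded", delivery_count)
```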

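&lt;p&gt;Service Bus does not insert a back-off between redeliveries on its own: an abandoned message is available again immediately, and the consumer's maxAutoLockRenewalDuration only controls how long the lock is renewed while processing runs. If transient failures need spacing, re-enqueue with a delay or use the optional retry queue. The Function App Service Bus trigger has reasonable defaults; tune only if your failure characteristics require it.&lt;/p&gt;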

&lt;h2&gt;
  
  
  Dead-letter queue handling
&lt;/h2&gt;

&lt;p&gt;Messages in DLQ are the alerts that matter. Options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Manual review dashboard: a simple admin page that lists DLQ messages, shows their content, lets an operator decide: retry, discard, fix-data-then-retry.&lt;/li&gt;
&lt;li&gt;  Automated re-enqueue after a delay: for certain dead-letter reasons, moving the message back to the main queue on a schedule. Be careful with this - if the failure is actually permanent, you create a loop.&lt;/li&gt;
&lt;li&gt;  Automated alerting: DLQ depth &amp;gt; N → PagerDuty. For critical integrations, operators should know within minutes that messages are dead-lettering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use a combination: alerting for DLQ depth above zero (for critical queues) plus a manual review dashboard that captures the specific message and lets operators take action.&lt;/p&gt;
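
&lt;p&gt;A sketch of the guard that keeps automated re-enqueue from looping, plus the depth alert. Which dead-letter reasons count as retryable, the resubmit counter (assumed to travel as a custom message property), and both function names are assumptions for illustration.&lt;/p&gt;

```python
# Assumption: only exhausted-retry messages are worth an automatic second try.
RETRYABLE_REASONS = {"MaxDeliveryCountExceeded"}


def triage(dead_letter_reason, resubmit_count, max_resubmits=1):
    """Decide what to do with one dead-lettered message."""
    retryable = dead_letter_reason in RETRYABLE_REASONS
    if retryable and max_resubmits > resubmit_count:
        return "re-enqueue"
    # Unknown reason, or already resubmitted: a human has to look at it.
    return "manual-review"


def should_alert(dlq_depth, threshold=0):
    """Critical queues alert as soon as anything is dead-lettered."""
    return dlq_depth > threshold
```

&lt;p&gt;Capping resubmits is what breaks the loop: a message that dead-letters again after its one automatic retry is treated as permanent.&lt;/p&gt;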

&lt;h2&gt;
  
  
  Idempotency requirements on the consumer
&lt;/h2&gt;

&lt;p&gt;Because retries happen, the consumer must be idempotent: processing the same message twice should produce the same result as processing once.&lt;/p&gt;

&lt;p&gt;Common patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Upsert at the destination: if updating a row in ERP, use the order ID as a natural key and upsert. Same message twice yields the same end state.&lt;/li&gt;
&lt;li&gt;  Dedupe by message ID: each Service Bus message has a unique MessageId. Track processed IDs in a fast store (Redis, Cosmos); skip repeats.&lt;/li&gt;
&lt;li&gt;  Compare-and-swap patterns: for sequence-sensitive updates, include a version number; reject updates with stale version.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without idempotency, every retry is a chance for corruption. We audit this explicitly on every new consumer.&lt;/p&gt;
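
&lt;p&gt;A compact sketch combining the three patterns: dedupe by message ID, upsert keyed on the order ID, and a version check for stale updates. The in-memory set and dictionary stand in for a real store such as Redis or the ERP; the IdempotentConsumer class is hypothetical.&lt;/p&gt;

```python
class IdempotentConsumer:
    def __init__(self):
        self.seen_ids = set()  # stand-in for a fast dedupe store
        self.rows = {}         # stand-in for the destination: order_id to (version, payload)

    def handle(self, message_id, order_id, version, payload):
        """Apply a message at most once and never regress to an older version."""
        if message_id in self.seen_ids:
            return "skipped-duplicate"
        self.seen_ids.add(message_id)
        current = self.rows.get(order_id)
        if current is not None and current[0] >= version:
            # Compare-and-swap: the destination already has this version or newer.
            return "rejected-stale"
        # Upsert on the natural key: same input yields the same end state.
        self.rows[order_id] = (version, payload)
        return "applied"
```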

&lt;h2&gt;
  
  
  Monitoring the whole chain
&lt;/h2&gt;

&lt;p&gt;Observability per stage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Dataverse plugin: trace logs, System Jobs list. Failures here are rare once the plugin is stable, but Azure Service Bus connectivity blips show up as plugin failures.&lt;/li&gt;
&lt;li&gt; Service Bus queue: Azure Monitor metrics for message count, DLQ depth, incoming rate, active connections. Alert on DLQ depth &amp;gt; 0 for critical queues.&lt;/li&gt;
&lt;li&gt; Consumer Function: Application Insights for exceptions, duration, failure rate. Alert on error rate &amp;gt; 1% or duration &amp;gt; SLA.&lt;/li&gt;
&lt;li&gt; Downstream system: whatever native monitoring the target has (ERP dashboards, search engine metrics).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Correlate via MessageId or a custom correlation header set at plugin send time. Every log entry across stages includes the correlation ID; one query traces a message from Dataverse to final destination.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern vs the alternatives
&lt;/h2&gt;

&lt;p&gt;Why not Power Automate?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cost at scale: at one flow run per Dataverse change and a million changes per month, Power Automate licensing gets expensive.&lt;/li&gt;
&lt;li&gt;  Retry and dead-letter semantics are less mature: Power Automate has retry but less explicit dead-letter; operations visibility is weaker.&lt;/li&gt;
&lt;li&gt;  Concurrency limits differ: an Azure Function App can scale to hundreds of concurrent instances; a Power Automate flow has per-flow throughput limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why not direct plugin-to-ERP?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  2-minute plugin timeout kills long integrations.&lt;/li&gt;
&lt;li&gt;  Synchronous retry blocks users.&lt;/li&gt;
&lt;li&gt;  Failures become support tickets instead of operations dashboard alerts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When Power Automate is enough (low volume, simple routing), use it. When the integration is core to the business and failures have operational consequences, the Dataverse → Service Bus → Function pattern pays its complexity back within the first outage it absorbs gracefully.&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>Model-driven vs Canvas app: four questions before you commit</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:25:52 +0000</pubDate>
      <link>https://dev.to/sapotacorp/model-driven-vs-canvas-app-four-questions-before-you-commit-f1o</link>
      <guid>https://dev.to/sapotacorp/model-driven-vs-canvas-app-four-questions-before-you-commit-f1o</guid>
      <description>&lt;p&gt;Every Power Apps engagement begins with the same pivot point. Canvas or model-driven? Newcomers default to Canvas - it looks more modern, the pixel-level control feels familiar from Figma and other design tools. Experienced teams default to model-driven - it scales, it gets ALM right out of the box, it is boring in a good way.&lt;/p&gt;

&lt;p&gt;Neither default is correct universally. After building in both shapes for three years, here is the four-question framework we run before committing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question 1: does the core data model have more than ten tables with meaningful relationships?
&lt;/h2&gt;

&lt;p&gt;Model-driven wins when the answer is yes. The framework generates forms, views, and relationships automatically. Adding a new column shows up on the form without manual rebuild. Views support complex filters, saved queries, and personal pins without a line of code.&lt;/p&gt;

&lt;p&gt;Canvas wins when the answer is no. A Canvas app can be wired to two or three Dataverse tables cleanly. Past that, the formulas that bind controls to data get long enough that the maintenance cost exceeds the UI flexibility benefit.&lt;/p&gt;

&lt;p&gt;The boundary is not exactly ten tables - it is "does the app feel like a CRM or a workflow tool?" CRM-shaped apps are model-driven natural. Workflow-shaped apps (inspection forms, approval requests, dashboards) are Canvas natural.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question 2: does the UI need to be pixel-specific?
&lt;/h2&gt;

&lt;p&gt;Canvas wins when the UI is the point. A customer-facing kiosk app, a mobile field-service tool with big buttons for gloved hands, a dashboard with a specific corporate brand treatment - Canvas lets you design the exact pixel layout.&lt;/p&gt;

&lt;p&gt;Model-driven wins when UI consistency and auto-layout trump design control. An internal CRM where users need forms to follow a predictable pattern across 40 record types should not have bespoke layouts per type.&lt;/p&gt;

&lt;p&gt;The trap: a Canvas app designed for mobile portrait view breaks on desktop landscape. You either duplicate the screen for each form factor or build responsive layouts with App.Width conditional logic that is painful to maintain. Model-driven adapts to form factor for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question 3: which devices need to run it?
&lt;/h2&gt;

&lt;p&gt;Model-driven apps run well in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Desktop browsers&lt;/li&gt;
&lt;li&gt;  Mobile via the Power Apps app (responsive-ish, some features degrade)&lt;/li&gt;
&lt;li&gt;  Tablets via the Power Apps app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Canvas apps run well in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Mobile (native feel, offline support, camera/GPS/etc hardware access)&lt;/li&gt;
&lt;li&gt;  Tablets&lt;/li&gt;
&lt;li&gt;  Desktop via browser or Windows Player (functional but not as polished as model-driven desktop)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your app is desktop-first for back-office users, model-driven is cleaner. If it is mobile-first for field workers, Canvas is cleaner. The middle case ("both desktop and mobile equally") is where people get stuck, and the tiebreaker is usually Question 2 - if UI specificity matters, Canvas; if not, model-driven.&lt;/p&gt;

&lt;p&gt;Offline deserves a specific callout: Canvas apps have robust offline support via local collections and Dataverse mobile offline profiles. Model-driven apps have limited offline support via the Dataverse mobile offline sync, which is more restrictive. Offline-critical apps tend toward Canvas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Question 4: who owns this app a year from now?
&lt;/h2&gt;

&lt;p&gt;This is the question teams skip, and it is the one that usually decides.&lt;/p&gt;

&lt;p&gt;Model-driven ages well because the framework is structured. A developer who has never seen your app can open a model-driven app, understand the data model from the entity list, find a form, and make a change without reading formula trees. Handoff is short.&lt;/p&gt;

&lt;p&gt;Canvas apps age as well as the formula hygiene of the person who built them. A Canvas app with 40 screens, variables named var1 through var17, and 500-line OnSelect formulas on every button is unmaintainable. A Canvas app with disciplined naming, componentization, and consistent state management can be very maintainable - but it requires discipline the team may not have.&lt;/p&gt;

&lt;p&gt;If the app will be handed over to client admins with no Power Apps background, model-driven is the safer bet. If it will be maintained by a developer team with Canvas experience and you have time to enforce code hygiene, Canvas can work. If it will be handed over and the client does not have developers, model-driven wins decisively.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fifth question we do not ask explicitly
&lt;/h2&gt;

&lt;p&gt;"Does someone on the team have strong feelings?" If a senior maker on the client side loves Canvas, fighting them into model-driven on a close call usually costs more than the gain. Conversely, a team that already has 20 model-driven apps and no Canvas experience should not build a Canvas app for a borderline use case.&lt;/p&gt;

&lt;p&gt;Skill investment is real. Picking the framework your team maintains well is often the right answer even if the other framework would suit the problem slightly better.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the four questions apply
&lt;/h2&gt;

&lt;p&gt;Recent project: client asked for "a mobile-friendly app for their sales reps to log visits and orders on the road."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Q1 (data model depth): 6 tables with relationships: customer / visit / visit-contact / order / order-line / product. Medium.&lt;/li&gt;
&lt;li&gt;  Q2 (UI specificity): Reps want big buttons, step-by-step wizard, optional photo upload per visit. High specificity.&lt;/li&gt;
&lt;li&gt;  Q3 (devices): Mobile-first, occasional tablet.&lt;/li&gt;
&lt;li&gt;  Q4 (maintainer): Client's internal Power Apps team, two makers with Canvas background.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three answers pointed to Canvas, one was neutral. We built Canvas. Two years on, the app has evolved through three UI refreshes without touching the data model - exactly the pattern Canvas suits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid is an option
&lt;/h2&gt;

&lt;p&gt;The decision is not always one or the other. Some projects run model-driven for the back-office admin experience and a Canvas app for the mobile field experience, both hitting the same Dataverse tables.&lt;/p&gt;

&lt;p&gt;The cost is maintaining two apps. The benefit is that each app is fit for its audience. We run this pattern on any project where the "office user" and "field user" personas need genuinely different interfaces on the same data.&lt;/p&gt;

&lt;p&gt;If you are torn, write down your four answers and share them with the team. The decision usually becomes obvious once the questions are explicit rather than "Canvas feels right to me."&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>When vector RAG falls apart: five signs you need a graph instead</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:25:32 +0000</pubDate>
      <link>https://dev.to/sapotacorp/when-vector-rag-falls-apart-five-signs-you-need-a-graph-instead-5ge4</link>
      <guid>https://dev.to/sapotacorp/when-vector-rag-falls-apart-five-signs-you-need-a-graph-instead-5ge4</guid>
      <description>&lt;p&gt;A B2B SaaS team asked us to look at their support assistant after a sales engineer tried to use it to answer a customer call. The question was: "which of our enterprise customers complained about the new pricing tier in Q4 last year, and what is their current MRR?"&lt;/p&gt;

&lt;p&gt;The assistant returned a fluent paragraph about pricing complaints in general, with one specific customer name lifted from a chunk where they appeared in passing. The MRR was wrong. The complaint was actually from Q3, not Q4. The list of "enterprise customers" was missing two of the largest accounts.&lt;/p&gt;

&lt;p&gt;The team's first instinct was to throw a bigger model at it. The actual problem was structural: the question required reasoning over relationships between entities that a chunked vector store had already destroyed at index time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What chunking does to entity relationships
&lt;/h2&gt;

&lt;p&gt;When a CRM corpus or a support ticket history goes through a standard RAG pipeline, the chunker treats the text as a flat sequence. A chunk that mentions "Acme Corp called about pricing in October" is embedded and stored on its own. A separate chunk that mentions "Acme Corp's MRR is $48,000" is stored separately, in a different document. A third chunk says "Q4 starts in October."&lt;/p&gt;

&lt;p&gt;A vector retrieval over the question "enterprise customers complaining about pricing in Q4" might return the first chunk. It will not return the second chunk because it does not mention pricing. It will not return the third chunk because it does not mention Acme. The LLM gets one of the three pieces it needs and confidently makes up the other two.&lt;/p&gt;

&lt;p&gt;This is not a model problem. A larger LLM hallucinates more confidently with the same incomplete context. The fix has to happen earlier in the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five signs the corpus needs a graph layer
&lt;/h2&gt;

&lt;p&gt;Sapota's audit checklist for "is this a Graph RAG problem" is five questions. If three or more come back yes, we recommend adding a graph.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Do the questions involve more than one named entity?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"What products has customer X bought" is one entity. "What products has customer X bought that customers similar to X also returned" is three entities and two relationships. The second question is a graph query, not a search query, regardless of how it is phrased in natural language.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Do the questions filter on relationships, not just attributes?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"Engineers with more than five years of experience" is an attribute filter. "Engineers who reported to managers who left in the last year" is a relationship filter. Vector search has no concept of relationships. Graph databases are built around them.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Does the right answer depend on aggregating across many entities?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"Total revenue from accounts in the EMEA region" requires walking across all account entities, filtering by region, and summing. This is a SQL query disguised as natural language and should not be in a vector database at all. A graph database (or just a SQL agent) handles it cleanly.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Do the source documents contain dense entity references?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CRM notes, meeting transcripts, customer interviews, sales call summaries: these documents are mostly references to people, accounts, products, and dates. Their meaning lives in the relationships between these entities, not in the prose. Chunking them preserves the prose and destroys the relationships.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Do the users ask follow-up questions that build on the previous answer?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"Show me the top 10 customers by revenue. Now show me which of those have open support tickets. Now show me the average resolution time for those tickets." Each follow-up adds a hop to the implied query. By the third follow-up, a vector retrieval is essentially random.&lt;/p&gt;
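
&lt;p&gt;The checklist reduces to a simple tally against the three-of-five threshold. A tiny Python sketch; the sign keys and the needs_graph_layer function are names invented here for illustration.&lt;/p&gt;

```python
SIGNS = (
    "multi_entity_questions",
    "relationship_filters",
    "cross_entity_aggregation",
    "dense_entity_references",
    "follow_up_chains",
)


def needs_graph_layer(answers, threshold=3):
    """Recommend a graph layer when enough of the five signs are present."""
    yes_count = sum(1 for sign in SIGNS if answers.get(sign, False))
    return yes_count >= threshold
```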

&lt;h2&gt;
  
  
  What we actually build
&lt;/h2&gt;

&lt;p&gt;The default pattern Sapota ships for Graph RAG is hybrid: a vector store for unstructured content (long-form notes, articles, documentation) plus a graph database for entities and relationships extracted from that content.&lt;/p&gt;

&lt;p&gt;The pipeline at index time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Run an entity-and-relationship extraction pass over the corpus using an LLM (Claude Sonnet or GPT-4o, or GPT-4o mini for cost-sensitive cases). Output is a list of (entity, relationship, entity) triples plus the source chunk for each.&lt;/li&gt;
&lt;li&gt; Insert the triples into Neo4j (or LightRAG if the team wants the simpler embedded option). Each triple carries a reference to the source chunk in the vector store.&lt;/li&gt;
&lt;li&gt; Embed the original chunks into the vector store as usual.&lt;/li&gt;
&lt;/ol&gt;
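
&lt;p&gt;A sketch of the data shape the index-time steps produce. The LLM extraction pass is not shown; the triples are written by hand from the Acme example, and a small in-memory index stands in for Neo4j. The Triple and TripleIndex names and the relationship labels are assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    chunk_id: str  # reference back to the source chunk in the vector store


class TripleIndex:
    """In-memory stand-in for the graph database."""

    def __init__(self):
        self.by_entity = {}

    def add(self, triple):
        # Index the triple under both of its endpoints.
        for entity in (triple.subject, triple.obj):
            self.by_entity.setdefault(entity, []).append(triple)

    def chunks_for(self, entity):
        """Every source chunk that mentions the entity in some relationship."""
        triples = self.by_entity.get(entity, [])
        return sorted({t.chunk_id for t in triples})
```

&lt;p&gt;With the two Acme facts indexed this way, a lookup on the entity returns both source chunks, including the MRR chunk that never mentions pricing.&lt;/p&gt;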

&lt;p&gt;At query time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; A planning LLM classifies the query as graph-shaped, vector-shaped, or hybrid.&lt;/li&gt;
&lt;li&gt; Graph-shaped queries are translated to Cypher (or the equivalent for LightRAG) and executed against the graph. The result is a set of entities + the source chunks they came from.&lt;/li&gt;
&lt;li&gt; Vector-shaped queries hit the vector store as usual.&lt;/li&gt;
&lt;li&gt; Hybrid queries do both and merge: graph result narrows the candidate entities, vector retrieval pulls the supporting prose from the chunks linked to those entities.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The synthesis LLM gets both the structured graph result and the unstructured prose context, which is enough to answer multi-hop questions correctly.&lt;/p&gt;
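
&lt;p&gt;The query-time routing can be sketched as one function. The planning LLM's classification arrives as the query_kind argument, and graph_lookup and vector_search are caller-supplied stand-ins for the Cypher execution and the vector store; all names and signatures here are assumptions.&lt;/p&gt;

```python
def build_context(query_kind, question, graph_lookup, vector_search):
    """Assemble the context handed to the synthesis LLM.

    graph_lookup(question) returns a list of (entity, chunk_id) pairs.
    vector_search(question, allowed) returns chunk ids; allowed is a set of
    chunk ids to restrict the search to, or None for no restriction.
    """
    if query_kind == "vector":
        return {"entities": [], "chunks": vector_search(question, None)}
    if query_kind not in ("graph", "hybrid"):
        raise ValueError(f"unknown query kind: {query_kind}")
    hits = graph_lookup(question)
    entities = sorted({entity for entity, _ in hits})
    linked = sorted({chunk_id for _, chunk_id in hits})
    if query_kind == "graph":
        return {"entities": entities, "chunks": linked}
    # Hybrid: the graph narrows the candidates, the vector store ranks
    # only the chunks linked to those entities.
    return {"entities": entities, "chunks": vector_search(question, set(linked))}
```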

&lt;h2&gt;
  
  
  Why most teams skip Graph RAG
&lt;/h2&gt;

&lt;p&gt;Three reasons, in our experience.&lt;/p&gt;

&lt;p&gt;The setup looks scary. Neo4j has a learning curve. Cypher is a new query language. The entity extraction step requires LLM calls during indexing, which is not free. Compared to "spin up a Pinecone index and embed everything," the operational complexity is real.&lt;/p&gt;

&lt;p&gt;The product team does not realize the queries are graph-shaped. Until a sales engineer or analyst tries to use the system for the questions they actually have, the team thinks the chatbot is for FAQ lookups. By the time the multi-hop questions surface, the architecture is already in production and the migration is painful.&lt;/p&gt;

&lt;p&gt;Most tutorials and managed RAG services do not cover it. Vector RAG has the brand recognition. Graph RAG is the unsexy older sibling that solves the harder problem. It is the same dynamic as relational databases vs document stores in 2010, and it tends to resolve the same way: the boring older option turns out to be necessary for serious use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  When vector-only is genuinely enough
&lt;/h2&gt;

&lt;p&gt;We do not always recommend adding a graph layer. The cases where vector-only RAG is the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The corpus is genuinely document-shaped (long-form prose, blog posts, technical articles) without dense entity references.&lt;/li&gt;
&lt;li&gt;  The questions are mostly definitional or how-to, not multi-hop or comparative.&lt;/li&gt;
&lt;li&gt;  The team does not have the engineering capacity to maintain a graph plus a vector store, and the simpler architecture's failure modes are acceptable.&lt;/li&gt;
&lt;li&gt;  The product is early enough that the question distribution has not stabilized and over-engineering is the bigger risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A note on LightRAG and GraphRAG-Microsoft
&lt;/h2&gt;

&lt;p&gt;There are two managed-style options that reduce the operational lift: LightRAG (open-source from HKU) and Microsoft's GraphRAG. Both wrap the entity extraction, graph construction, and graph-aware retrieval into higher-level APIs.&lt;/p&gt;

&lt;p&gt;LightRAG is the lighter option and the one we use for projects that want graph capabilities without committing to a Neo4j cluster. GraphRAG-Microsoft is more opinionated and pulls in more of the Azure ecosystem. For teams already on Azure, it is a reasonable choice. For everyone else, the path of least resistance is to start with LightRAG and add Neo4j when scale demands it.&lt;/p&gt;

&lt;h2&gt;
  
  
  If your AI cannot answer multi-hop questions
&lt;/h2&gt;

&lt;p&gt;If your team has shipped a vector RAG system and the users keep asking questions that require connecting entities across documents, that is the pattern we resolve. Sapota runs a one-week graph readiness assessment that takes the production query log, classifies which queries are graph-shaped, and ships the entity extraction pipeline plus the hybrid retrieval as a working integration.&lt;/p&gt;

&lt;p&gt;Reach out via the &lt;a href="https://www.sapotacorp.vn/ai-and-machine-learning" rel="noopener noreferrer"&gt;AI engineering page&lt;/a&gt; with three example questions that are failing in production. The diagnosis usually surfaces within the first call.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Shopify theme editor: design tokens merchants can edit</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:20:05 +0000</pubDate>
      <link>https://dev.to/sapotacorp/shopify-theme-editor-design-tokens-merchants-can-edit-377i</link>
      <guid>https://dev.to/sapotacorp/shopify-theme-editor-design-tokens-merchants-can-edit-377i</guid>
      <description>&lt;p&gt;A merchant wants to experiment with design elements in the theme editor - button colors, font choices, border thickness, opacity. They're not comfortable editing Liquid code; they want to click, preview, save. The question for the developer: how do you expose the right knobs to the theme editor without giving merchants a way to break the theme?&lt;/p&gt;

&lt;h2&gt;
  
  
  The theme-editor control surface
&lt;/h2&gt;

&lt;p&gt;Shopify themes have a specific file that controls which settings merchants see in the theme editor: config/settings_schema.json.&lt;/p&gt;

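&lt;p&gt;Settings reach the editor at three levels. Only the first is declared in config/settings_schema.json; section and block settings are declared in the schema tag of each section file:&lt;/p&gt;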

&lt;ul&gt;
&lt;li&gt;  Theme settings - global controls (site typography, brand colors, button defaults)&lt;/li&gt;
&lt;li&gt;  Section settings - per-section controls (image, heading, call-to-action per hero section)&lt;/li&gt;
&lt;li&gt;  Block settings - nested controls within sections (individual feature blocks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A well-designed settings schema gives merchants meaningful control without overwhelming them with technical knobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The file structure
&lt;/h2&gt;

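&lt;p&gt;config/settings_schema.json defines what the merchant sees. It is a JSON array: after a theme metadata entry, each object is a named group holding the settings shown under that heading in the editor.&lt;/p&gt;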

&lt;p&gt;config/settings_data.json holds the merchant's saved values. This file isn't edited by developers directly - it's updated every time the merchant saves settings in the theme editor.&lt;/p&gt;

&lt;p&gt;Liquid templates consume the settings via settings.color_button_primary, settings.font_heading, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  The correct answer for "where to expose design tokens"
&lt;/h2&gt;

&lt;p&gt;The file that controls merchant-editable design is config/settings_schema.json. Adding a new setting there makes it appear in the theme editor's global settings. Merchants click, preview, save; the change propagates across every section that references the setting.&lt;/p&gt;

&lt;p&gt;This is distinct from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  config/settings_data.json - the saved values, not the schema&lt;/li&gt;
&lt;li&gt;  sections/*.liquid - per-section settings defined in schema tags at the bottom of each section file&lt;/li&gt;
&lt;li&gt;  assets/*.css - compiled CSS, not editable through the editor&lt;/li&gt;
&lt;li&gt;  templates/*.json - page template structure, edited through the theme editor's customization mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting types that fit design tokens
&lt;/h2&gt;

&lt;p&gt;Shopify's settings schema supports many types. For design tokens, the useful ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  color - hex color picker&lt;/li&gt;
&lt;li&gt;  color_background - CSS background value, which allows gradients as well as solid colors&lt;/li&gt;
&lt;li&gt;  color_scheme - references a color scheme defined elsewhere&lt;/li&gt;
&lt;li&gt;  font_picker - select from Shopify's font library (Google Fonts + system fonts)&lt;/li&gt;
&lt;li&gt;  range - numeric slider (for border thickness, opacity percentage, image border radius)&lt;/li&gt;
&lt;li&gt;  select - dropdown (for size options like "small/medium/large")&lt;/li&gt;
&lt;li&gt;  checkbox - boolean toggle&lt;/li&gt;
&lt;li&gt;  text - short text input (for custom CSS class names, if you really must)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the scenario at the top, the right setting types are color for button colors, font_picker for fonts, and range for both border thickness and opacity.&lt;/p&gt;
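
&lt;p&gt;To make the mapping concrete, here is a small sketch of one settings group for that scenario. Python is used only to build and sanity-check the JSON. The ids color_button_primary and font_heading follow the names used elsewhere in this article; every other id, label, range, and default is an illustrative assumption, and check_range_defaults is a hypothetical helper, not a Shopify tool.&lt;/p&gt;

```python
import json

# One settings group; ids, labels, ranges, and defaults are illustrative assumptions.
design_tokens_group = {
    "name": "Design tokens",
    "settings": [
        {"type": "color", "id": "color_button_primary",
         "label": "Primary button color", "default": "#1a1a1a"},
        {"type": "font_picker", "id": "font_heading",
         "label": "Heading font", "default": "helvetica_n4"},
        {"type": "range", "id": "button_border_thickness",
         "label": "Button border thickness", "unit": "px",
         "min": 0, "max": 10, "step": 1, "default": 1},
        {"type": "range", "id": "button_opacity",
         "label": "Button opacity", "unit": "%",
         "min": 0, "max": 100, "step": 5, "default": 100},
    ],
}


def check_range_defaults(group):
    """Return ids of range settings whose default is outside min..max or off-step."""
    bad = []
    for setting in group["settings"]:
        if setting["type"] != "range":
            continue
        low, high, step = setting["min"], setting["max"], setting["step"]
        default = setting["default"]
        in_bounds = min(max(default, low), high) == default
        on_step = (default - low) % step == 0
        if not (in_bounds and on_step):
            bad.append(setting["id"])
    return bad


print(check_range_defaults(design_tokens_group))    # ids that need fixing, if any
print(json.dumps([design_tokens_group], indent=2))  # group serialized as a JSON array entry
```

&lt;p&gt;Bounding border thickness and opacity with range sliders is the point of the exercise: the merchant cannot type a value the layout was never designed for.&lt;/p&gt;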

&lt;h2&gt;
  
  
  Consuming settings in Liquid and CSS
&lt;/h2&gt;

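&lt;p&gt;In Liquid, settings are accessed via the global settings variable, for example {{ settings.color_button_primary }}.&lt;/p&gt;

&lt;p&gt;For bulk CSS styling, many themes write settings into CSS custom properties in the layout file, so a value such as settings.color_button_primary is emitted once as a property on :root.&lt;/p&gt;

&lt;p&gt;Component CSS then reads the custom properties through var() instead of touching Liquid.&lt;/p&gt;

&lt;p&gt;This separation means a design-system change made in the theme editor propagates everywhere from one place.&lt;/p&gt;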

&lt;h2&gt;
  
  
  Color schemes for complex theming
&lt;/h2&gt;

&lt;p&gt;Modern Shopify themes support color schemes - named combinations of background, text, button, and accent colors that apply to sections. A merchant can define "Light scheme" and "Dark scheme" and apply either to any section.&lt;/p&gt;

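&lt;p&gt;Color schemes live in the schema too but have their own structure: a color_scheme_group setting defines the colors every scheme carries, and each section picks one scheme through a color_scheme setting.&lt;/p&gt;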

&lt;p&gt;This is the pattern Dawn and modern OS 2.0 themes use. Merchants pick a scheme per section; the theme renders with the scheme's colors consistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Font pickers and the font_url filter
&lt;/h2&gt;

&lt;p&gt;For fonts, the font_picker setting type returns a font object describing the chosen family, weight, and style. Themes pass that object to the font_face filter, for example {{ settings.font_heading | font_face }}, or call font_url to get the font file's URL and write the declaration by hand.&lt;/p&gt;

&lt;p&gt;The font_face filter emits the proper @font-face declaration for the chosen font. Merchants can pick any of Shopify's library fonts (Google Fonts + system) and the theme adapts automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Settings organization for non-technical merchants
&lt;/h2&gt;

&lt;p&gt;A well-organized settings schema:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Groups related settings under headers ("Colors", "Typography", "Buttons")&lt;/li&gt;
&lt;li&gt;  Uses clear labels ("Primary button color" not "color_button_primary")&lt;/li&gt;
&lt;li&gt;  Includes help text explaining the effect&lt;/li&gt;
&lt;li&gt;  Orders settings from most-impactful to least&lt;/li&gt;
&lt;li&gt;  Provides sensible defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A schema with 100 ungrouped settings is unusable even by technical merchants. A schema with 15 well-organized settings lets non-technical merchants confidently experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to expose vs hide
&lt;/h2&gt;

&lt;p&gt;Not every theme value belongs in the settings schema. Good candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Brand colors&lt;/li&gt;
&lt;li&gt;  Typography choices&lt;/li&gt;
&lt;li&gt;  Layout parameters (max content width, spacing scale)&lt;/li&gt;
&lt;li&gt;  Button styles&lt;/li&gt;
&lt;li&gt;  Section-level toggles (show/hide features)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Things requiring CSS knowledge (box-shadow syntax, custom positioning)&lt;/li&gt;
&lt;li&gt;  Things that would break layout at wrong values (minimum padding below which content overlaps)&lt;/li&gt;
&lt;li&gt;  Implementation details (cache timeouts, third-party IDs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boundary: if a wrong value breaks the theme meaningfully, it doesn't belong in merchant-editable settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ships with a well-designed theme settings schema
&lt;/h2&gt;

&lt;p&gt;A theme that supports non-technical merchant self-service has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  config/settings_schema.json with grouped, labeled, defaulted settings&lt;/li&gt;
&lt;li&gt;  Design tokens consumed via CSS custom properties&lt;/li&gt;
&lt;li&gt;  Settings exposed through the theme editor in an intuitive order&lt;/li&gt;
&lt;li&gt;  Defaults that produce a usable theme without any customization&lt;/li&gt;
&lt;li&gt;  Documentation (or help text in the editor) explaining what each setting does&lt;/li&gt;
&lt;li&gt;  Versioning discipline - renaming a setting's ID in code breaks merchants' saved values, because settings_data.json stores them by ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Merchants who can self-serve design tokens stay happier with their theme. The developer's work is making the right set of knobs available without offering footguns.&lt;/p&gt;

</description>
      <category>shopify</category>
    </item>
    <item>
      <title>Dataverse security restructure: lessons applied too late</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:19:44 +0000</pubDate>
      <link>https://dev.to/sapotacorp/dataverse-security-restructure-lessons-applied-too-late-12o2</link>
      <guid>https://dev.to/sapotacorp/dataverse-security-restructure-lessons-applied-too-late-12o2</guid>
      <description>&lt;p&gt;Dataverse gives you three access-control primitives that combine into a permission model: business units (BUs), security roles, and teams. On paper they are simple. In practice, every project that runs for more than a year develops the same failure mode: the security model grows by accretion - a new role for every department, a new team for every project, a new business unit every time someone says "but we need regional data separation." By year three, the model has twenty roles nobody remembers the purpose of, and access audits take a week.&lt;/p&gt;

&lt;p&gt;We have walked three projects through a security restructure. The first took five weeks because we waited too long. The last took a week because we caught it at month three. Here is the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the primitives actually do
&lt;/h2&gt;

&lt;p&gt;Business unit: the hierarchical container for users and records. A row in Dataverse is owned by a user or team, which sits in a BU. BUs form a tree from the root down.&lt;/p&gt;

&lt;p&gt;Security role: a set of privileges per table (Create / Read / Write / Delete / Append / Append To / Assign / Share) and per scope (User / Business Unit / Parent:Child / Organization). Users get one or more roles.&lt;/p&gt;

&lt;p&gt;Team: a group of users that can own records collectively and can have roles assigned to the team (all members inherit).&lt;/p&gt;

&lt;p&gt;The common misreading is thinking of roles as job titles ("Sales Rep") and BUs as departments ("Sales"). The native mapping is actually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  BUs for data isolation (regions, countries, acquired companies whose data should not mix)&lt;/li&gt;
&lt;li&gt;  Roles for capability (Read accounts, Edit opportunities, Export to Excel)&lt;/li&gt;
&lt;li&gt;  Teams for cross-BU collaboration and shared ownership of records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you try to use roles for data isolation ("a Europe Sales Rep role vs a US Sales Rep role") instead of BUs, you end up with one copy of the same role per region: N regions, N copies. When you try to use BUs for capability ("a Read-Only BU"), you get nonsense trees.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptoms of accretion
&lt;/h2&gt;

&lt;p&gt;A project's security model has drifted when we see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ten or more custom roles with names like "Sales Rep v2", "Sales Rep Special Cases", "Sales Rep US Mountain Region"&lt;/li&gt;
&lt;li&gt;  Users who have four or five roles stacked on one account to get the right combined access&lt;/li&gt;
&lt;li&gt;  Teams created to hold users that do not actually share record ownership (used as a proxy for group assignment)&lt;/li&gt;
&lt;li&gt;  A BU tree deeper than three levels&lt;/li&gt;
&lt;li&gt;  Business owners who cannot answer "who can see this record?" without an engineer running a query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We inherited all five symptoms at a client last year, and the fix below is what we ran.&lt;/p&gt;

&lt;h2&gt;
  
  
  The restructure in eight steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Inventory what exists. Export the security role matrix to a spreadsheet. Every role, every table, every privilege. This single document is the starting point of every conversation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Group roles by intent. Look at the column of privileges per role and find clusters. Most "custom roles" actually map to 3-5 intents: Read-Only, Standard User, Power User, Admin, System Admin. Anything more granular is almost always a slight variation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define capability roles. Replace the cluster from step 2 with 3-5 canonical roles:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;  acme_read_only - read on every business table, no write&lt;/li&gt;
&lt;li&gt;  acme_standard_user - read + write on assigned records (User scope)&lt;/li&gt;
&lt;li&gt;  acme_power_user - read + write + append + assign on BU scope&lt;/li&gt;
&lt;li&gt;  acme_admin - all privileges, Organization scope&lt;/li&gt;
&lt;li&gt;  Plus one optional: acme_integration for service principal integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;Identify real data isolation requirements. Most projects need ZERO BUs beyond the root. Ask concretely: "If a user sees a record they should not, does the business have an audit or compliance problem?" For most internal CRMs, the answer is no. For multi-country businesses with data residency rules, the answer is yes and you need one BU per country.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Collapse the BU tree. If you have seven BUs and zero of them have a compliance rationale, move all users to the root BU and delete the others. This is the biggest unlock - most "BU-scoped access" issues disappear when the tree flattens.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use teams for ownership, not assignment. A team should own records that multiple users share responsibility for (a deal desk queue, a support triage pool). Do not create teams to hold users who share a role - that is what the role is for.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Migrate users to the new role set. Assign the new canonical roles; remove the old custom roles one by one and verify nothing breaks. Do this in a UAT mirror first, not in Prod.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lock further role creation. New security roles require an explicit ticket and justification. "We need a new role for X" usually maps to "we need to add a capability to an existing role." Making the default answer "no" keeps the model clean for the long run.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
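
&lt;p&gt;Step 2 is easy to semi-automate once the matrix from step 1 is exported. The sketch below groups roles whose privilege sets are identical; the role_matrix shape and its contents are hypothetical, and a real export would need a parsing step first. Exact-match grouping is only a starting point, since near-duplicates still need a human look.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical rows from the exported role matrix: role -> {(table, privilege): scope}.
role_matrix = {
    "Sales Rep v2": {("account", "Read"): "User", ("account", "Write"): "User"},
    "Sales Rep Special Cases": {("account", "Read"): "User", ("account", "Write"): "User"},
    "Sales Rep US Mountain Region": {("account", "Read"): "Business Unit",
                                     ("account", "Write"): "Business Unit"},
    "Auditor": {("account", "Read"): "Organization"},
}


def cluster_by_privileges(matrix):
    """Group roles whose (table, privilege, scope) sets are identical."""
    clusters = defaultdict(list)
    for role, privileges in matrix.items():
        fingerprint = frozenset(
            (table, privilege, scope) for (table, privilege), scope in privileges.items()
        )
        clusters[fingerprint].append(role)
    return sorted(sorted(roles) for roles in clusters.values())


for roles in cluster_by_privileges(role_matrix):
    print(roles)  # each printed list is one candidate canonical role
```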

&lt;h2&gt;
  
  
  The quarterly access review we run
&lt;/h2&gt;

&lt;p&gt;Even a clean model drifts. Every quarter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Export the role-to-user assignment matrix&lt;/li&gt;
&lt;li&gt;  Flag users with three or more roles (possible consolidation opportunity)&lt;/li&gt;
&lt;li&gt;  Flag roles with only one user (possible deletion)&lt;/li&gt;
&lt;li&gt;  Spot-check ten users: "can they see what they should and nothing more?" Walk their access in the UI&lt;/li&gt;
&lt;li&gt;  Review the team membership: do all the teams still exist for their original purpose?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;45 minutes per quarter, one engineer. The output is either "no changes" or a ticket to consolidate a role. Either outcome is healthy.&lt;/p&gt;
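
&lt;p&gt;The two flagging steps are mechanical enough to script. A minimal sketch, assuming the assignment matrix has been exported as (user, role) rows; the user names are made up, and review_flags is a hypothetical helper rather than anything Dataverse provides.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical export rows: (user, role). Role names follow the canonical set above.
assignments = [
    ("ana", "acme_standard_user"),
    ("ben", "acme_standard_user"),
    ("ben", "acme_power_user"),
    ("ben", "acme_read_only"),
    ("cho", "acme_admin"),
]


def review_flags(rows, max_roles_per_user=2):
    """Return (users holding more than max_roles_per_user roles, roles with one user)."""
    roles_by_user = defaultdict(set)
    users_by_role = defaultdict(set)
    for user, role in rows:
        roles_by_user[user].add(role)
        users_by_role[role].add(user)
    stacked_users = sorted(
        user for user, roles in roles_by_user.items() if len(roles) > max_roles_per_user
    )
    single_user_roles = sorted(
        role for role, users in users_by_role.items() if len(users) == 1
    )
    return stacked_users, single_user_roles


stacked, lonely = review_flags(assignments)
print(stacked)  # users with three or more roles
print(lonely)   # roles assigned to exactly one user
```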

&lt;h2&gt;
  
  
  One gotcha that catches teams out
&lt;/h2&gt;

&lt;p&gt;Changing a security role that is actively assigned to users does not always propagate immediately. Role changes take effect on the next user action or after a cache refresh, which can take up to fifteen minutes.&lt;/p&gt;

&lt;p&gt;If you remove a privilege from a role expecting the change to apply now, and a user who has that role does something in the window between your change and their cache refresh, the old privilege is still in effect.&lt;/p&gt;

&lt;p&gt;For non-urgent changes: ignore it, the cache will clear. For security-critical revocations (compromised account, departing employee): disable the user account, not the role. Account disable is immediate; role changes are eventually consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do if you are in the accretion phase now
&lt;/h2&gt;

&lt;p&gt;If your project is two months in and you already see four custom roles with overlapping purpose: stop adding roles, but don't restructure yet. Let the scope settle for another month, note every "new role" ticket that comes in without approving it, then run the eight-step restructure at month three. One week of focused work saves four weeks later.&lt;/p&gt;

&lt;p&gt;If your project is two years in and unmaintainable: block out a dedicated two-week window, do the full restructure, treat it as a one-time debt payment, and then install the quarterly review. The pain does not go away on its own.&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>What to monitor in an AI agent before you launch (and after)</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:14:17 +0000</pubDate>
      <link>https://dev.to/sapotacorp/what-to-monitor-in-an-ai-agent-before-you-launch-and-after-1bap</link>
      <guid>https://dev.to/sapotacorp/what-to-monitor-in-an-ai-agent-before-you-launch-and-after-1bap</guid>
      <description>&lt;p&gt;A founder we work with texted six weeks after launching their AI agent: "Something is wrong. Cost has tripled this month, response times have gotten worse, but I don't actually know which part of the system is the problem. Help."&lt;/p&gt;

&lt;p&gt;The system was a multi-agent customer support setup with three agents and four tools. The team had built it well in most respects: solid prompts, good tool integration, decent eval before launch. They had skipped one thing: observability. Six weeks of production traffic had gone through the system with no traces, no per-step latency tracking, no cost attribution.&lt;/p&gt;

&lt;p&gt;Diagnosing the problem took us a week of forensic work, mostly because we had to retroactively instrument what should have been there from day one. The fix took an afternoon. By the time we found the actual issue (one of the agents had started looping more often after a vendor model update), the team had spent thousands of dollars in unnecessary LLM costs and lost weeks of customer trust.&lt;/p&gt;

&lt;p&gt;This is the most preventable category of agent failure. The minimum observability stack costs almost nothing to install before launch and saves weeks of debugging when something goes wrong. Here is what Sapota ships with every production agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three failure modes observability catches
&lt;/h2&gt;

&lt;p&gt;AI agents in production fail in three structurally different ways. None of them are visible without instrumentation.&lt;/p&gt;

&lt;p&gt;Silent quality drops. The agent's responses get worse over time. The corpus drifts, the model provider updates the underlying weights, the prompt template gets edited, the user query distribution shifts. Quality degrades by 5% per week and the team does not notice for two months. By then, customer-reported errors have spiked and trust has eroded.&lt;/p&gt;

&lt;p&gt;Cost spikes. Production cost runs 3 to 10 times the test projection. Sometimes this is because production queries are more complex than test queries. Sometimes it is because a specific user is abusing the system. Sometimes it is because one agent in a multi-agent setup has started looping more often. Without per-request, per-agent cost tracking, you cannot tell which.&lt;/p&gt;

&lt;p&gt;Cascade failures. Tool A starts timing out. Agent B retries it. Agent C waits for Agent B's output. The whole system slows to a crawl, but no single component is "down" enough to alert. Latency at the 95th percentile triples, but the average looks fine. Without traces across the agent system, the failure mode is invisible.&lt;/p&gt;

&lt;p&gt;Each of these failure modes has cost real production agents real money in the audits we have run. None of them needed to happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: Tracing
&lt;/h2&gt;

&lt;p&gt;The minimum trace setup captures, per request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Request ID that propagates through every step&lt;/li&gt;
&lt;li&gt;  User ID for cost attribution and abuse detection&lt;/li&gt;
&lt;li&gt;  Input and output of each LLM call (input tokens, output tokens, latency, cost)&lt;/li&gt;
&lt;li&gt;  Tool calls with input, output, latency, and any errors&lt;/li&gt;
&lt;li&gt;  Exit reason: success, max iterations hit, timeout, abort&lt;/li&gt;
&lt;/ul&gt;
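
&lt;p&gt;The fields above fit in a small data structure. The sketch below is a vendor-neutral illustration, not the API of Langfuse or any other tool: Step, Trace, and record are hypothetical names, and token counts and cost are passed in by the caller here, whereas a real wrapper would read them from the model response.&lt;/p&gt;

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One LLM or tool call inside a request."""
    kind: str  # "llm" or "tool"
    name: str
    latency_s: float
    output: object = None
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    error: Optional[str] = None


@dataclass
class Trace:
    """Everything recorded for a single request."""
    user_id: str
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: list = field(default_factory=list)
    exit_reason: str = "in_progress"  # success, max_iter, timeout, abort

    def record(self, kind, name, fn, **usage):
        """Run fn(), time it, and append a Step even if fn() raises."""
        started = time.perf_counter()
        output, error = None, None
        try:
            output = fn()
            return output
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            latency = time.perf_counter() - started
            self.steps.append(
                Step(kind, name, latency, output=output, error=error, **usage)
            )

    def total_cost(self):
        return sum(step.cost_usd for step in self.steps)


trace = Trace(user_id="user-42")
trace.record("tool", "lookup_order", lambda: {"status": "shipped"})
trace.record("llm", "draft_reply", lambda: "Your order has shipped.",
             input_tokens=350, output_tokens=40, cost_usd=0.0012)
trace.exit_reason = "success"
print(trace.request_id, trace.exit_reason, round(trace.total_cost(), 4))
```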

&lt;p&gt;The implementation is not complicated. Langfuse, Opik, LangSmith, and Helicone all provide this in about 10 lines of code that wrap your LLM client. The marginal cost is roughly 5% latency overhead, plus storage that runs about 1% of LLM cost. Both are negligible.&lt;/p&gt;

&lt;p&gt;We default to Langfuse self-hosted for most clients. Open source, no per-trace pricing, runs on a single small VM. LangSmith is fine if you are already in the LangChain ecosystem. Opik is good if you want minimal setup. Pick one and install it before launch.&lt;/p&gt;

&lt;p&gt;The practical value: when something breaks, the traces tell you exactly where. The founder's looping agent showed up in traces as "Agent B exit reason: max_iter" appearing 4x more often than the baseline. Without traces, that pattern was invisible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: Metrics
&lt;/h2&gt;

&lt;p&gt;Aggregated dashboards built on top of traces. The metrics that matter for production agents:&lt;/p&gt;

&lt;p&gt;Quality metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Task completion rate (% of requests where the agent reaches a "success" exit, not max_iter or abort)&lt;/li&gt;
&lt;li&gt;  Faithfulness score (if you have a faithfulness gate, track its output)&lt;/li&gt;
&lt;li&gt;  User satisfaction (thumbs up rate, if you have feedback collection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Latency at p50, p95, p99 (averages hide the tail)&lt;/li&gt;
&lt;li&gt;  Iteration count per task (a high count usually means the agent is stuck in a loop)&lt;/li&gt;
&lt;li&gt;  Tool call frequency by tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cost metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cost per task (mean, p95, max)&lt;/li&gt;
&lt;li&gt;  Cost per user per day (to catch abuse and runaway costs)&lt;/li&gt;
&lt;li&gt;  Cost by model (if using a multi-model setup)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reliability metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Error rate by error type (transient, permanent, validation)&lt;/li&gt;
&lt;li&gt;  Tool failure rate by tool&lt;/li&gt;
&lt;li&gt;  Fallback usage rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A production dashboard with eight to twelve metrics covers most of what you need to see. We typically build this in Grafana on top of the trace store, or in the native dashboards of Langfuse / Opik / LangSmith.&lt;/p&gt;
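
&lt;p&gt;A quick illustration of why the latency metrics are percentiles rather than averages. The latency values are invented, and percentile is a small nearest-rank helper written for this example; a dashboard tool would compute these for you.&lt;/p&gt;

```python
import statistics

# Hypothetical per-request latencies in seconds: mostly fast, with a slow tail.
latencies = [1.0] * 90 + [12.0] * 10


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[rank - 1]


print(statistics.mean(latencies))  # 2.1
print(percentile(latencies, 50))   # 1.0
print(percentile(latencies, 95))   # 12.0
```

&lt;p&gt;The mean of 2.1 seconds looks healthy, yet one request in ten takes 12 seconds. Only the p95 line on the dashboard shows that.&lt;/p&gt;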

&lt;h2&gt;
  
  
  Layer 3: Alerts
&lt;/h2&gt;

&lt;p&gt;Metrics on a dashboard are useful for investigation. Alerts are what catch problems before customers do.&lt;/p&gt;

&lt;p&gt;The minimum alert set:&lt;/p&gt;

&lt;p&gt;Error rate spike. Alert if error rate exceeds 5% over any 5-minute window. Catches sudden breakage from a tool outage, model provider issue, or bad deploy.&lt;/p&gt;

&lt;p&gt;Latency degradation. Alert if p95 latency exceeds your SLA for 10 minutes. Catches gradual degradation from increased load, retry loops, or downstream service issues.&lt;/p&gt;

&lt;p&gt;Cost overrun. Alert if daily cost exceeds budget threshold. Catches runaway loops, abuse, or pricing changes from the model provider.&lt;/p&gt;

&lt;p&gt;Quality regression. Alert if weekly eval scores drop more than 5%. Catches the silent degradation that customers notice before you do.&lt;/p&gt;

&lt;p&gt;Critical service unavailability. Alert if a circuit breaker opens on any external dependency. Catches cascading failures before they cascade.&lt;/p&gt;

&lt;p&gt;Five alert rules is the minimum. Most production agents accumulate more over time as the team learns what to watch. The pattern that does not work: zero alert rules, "we'll check the dashboard sometimes."&lt;/p&gt;
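
&lt;p&gt;The five rules reduce to five comparisons against aggregates the dashboard already has. A minimal sketch, with assumptions stated up front: WindowStats and its field names are hypothetical, the SLA and budget values are placeholders, the "for 10 minutes" condition is modeled by feeding in the lowest p95 seen over that window, and the 5% quality drop is read as five percentage points.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates the dashboard already computes (hypothetical field names)."""
    error_rate_5m: float       # fraction of failed requests, last 5 minutes
    p95_latency_10m_s: float   # lowest p95 seen over the last 10 minutes
    daily_cost_usd: float
    weekly_eval_drop: float    # last week's pass rate minus this week's
    open_circuit_breakers: int


def fired_alerts(stats, sla_p95_s, daily_budget_usd):
    """Return the names of the alert rules that fire for this snapshot."""
    rules = {
        "error_rate_spike": stats.error_rate_5m > 0.05,
        "latency_degradation": stats.p95_latency_10m_s > sla_p95_s,
        "cost_overrun": stats.daily_cost_usd > daily_budget_usd,
        "quality_regression": stats.weekly_eval_drop > 0.05,
        "dependency_unavailable": stats.open_circuit_breakers > 0,
    }
    return [name for name, fired in rules.items() if fired]


snapshot = WindowStats(
    error_rate_5m=0.01,
    p95_latency_10m_s=12.0,
    daily_cost_usd=80.0,
    weekly_eval_drop=0.02,
    open_circuit_breakers=0,
)
print(fired_alerts(snapshot, sla_p95_s=6.0, daily_budget_usd=100.0))
```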

&lt;h2&gt;
  
  
  Layer 4: Eval pipeline
&lt;/h2&gt;

&lt;p&gt;The fourth layer is offline evaluation. Traces tell you what is happening in production. Eval tells you whether what is happening is actually correct.&lt;/p&gt;

&lt;p&gt;The minimum eval pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A ground-truth eval set of 100 to 500 questions with expected answers&lt;/li&gt;
&lt;li&gt;  A weekly cron that runs the eval set through the production pipeline&lt;/li&gt;
&lt;li&gt;  Tracking of pass rate over time&lt;/li&gt;
&lt;li&gt;  An alert if pass rate drops more than 5% week over week&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use Ragas as the metric layer for most clients. It computes faithfulness, answer relevance, context recall, and answer correctness from a single eval run. The compute cost for a 100-question weekly eval is under $5.&lt;/p&gt;

&lt;p&gt;The eval set is the artifact most teams underestimate. It needs to represent the real production query distribution, not just what the engineering team thinks users will ask. The way to build a good eval set is to wait until you have two weeks of production traffic, sample 100 actual queries, write expected answers, and use that as the ground truth going forward. Refresh quarterly.&lt;/p&gt;
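
&lt;p&gt;The shape of the weekly run is simple even without a metrics library. The sketch below swaps Ragas for plain case-insensitive exact matching so that it stays self-contained; the questions, the canned answers, and the reading of "5%" as five percentage points are all assumptions made for illustration.&lt;/p&gt;

```python
def pass_rate(eval_set, answer_fn):
    """Fraction of eval questions whose answer matches the expected answer."""
    passed = sum(
        1 for question, expected in eval_set
        if answer_fn(question).strip().lower() == expected.strip().lower()
    )
    return passed / len(eval_set)


def regressed(previous_rate, current_rate, max_drop=0.05):
    """True when the pass rate fell by more than max_drop (absolute points)."""
    return previous_rate - current_rate > max_drop


# Tiny stand-in eval set; a real one holds 100 to 500 sampled production queries.
eval_set = [("capital of France?", "Paris"), ("2 + 2?", "4"),
            ("largest planet?", "Jupiter"), ("H2O is?", "water")]
# Canned answers standing in for a call to the production pipeline.
canned = {"capital of France?": "paris", "2 + 2?": "4",
          "largest planet?": "Saturn", "H2O is?": "Water"}

rate = pass_rate(eval_set, canned.get)
print(rate, regressed(previous_rate=0.9, current_rate=rate))
```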

&lt;h2&gt;
  
  
  What we found in the founder's system
&lt;/h2&gt;

&lt;p&gt;When we instrumented the agent six weeks after launch, the traces immediately showed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  One specific agent (the routing agent) had started exiting with "max_iter" 4x more often than at launch&lt;/li&gt;
&lt;li&gt;  Latency p95 had drifted from 4 seconds to 12 seconds&lt;/li&gt;
&lt;li&gt;  Cost per task had drifted from $0.04 to $0.13&lt;/li&gt;
&lt;li&gt;  The change had started exactly when the underlying model provider had pushed an update three weeks earlier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix was prompt-level: the new model version interpreted one specific instruction differently than the old one, and the routing agent kept asking for clarification on requests that the old model handled directly. We tightened the routing prompt with two more few-shot examples. The agent stopped looping. Costs and latency returned to baseline.&lt;/p&gt;

&lt;p&gt;The whole investigation was four hours once we had traces. Without traces, the team had spent six weeks unable to even pin down what was wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recommendation: install before launch
&lt;/h2&gt;

&lt;p&gt;The argument for waiting on observability is "we're moving fast, we'll add it after the launch." The argument for installing it before launch is that the cost of installing it later is much higher (retroactive instrumentation, lost data, catastrophic surprises) and the cost of installing it now is almost zero.&lt;/p&gt;

&lt;p&gt;The minimum stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Tracing (Langfuse / Opik / LangSmith): 1 day to install, propagate request IDs through your code&lt;/li&gt;
&lt;li&gt; Dashboard with 8-12 metrics: 1 day to build on top of traces&lt;/li&gt;
&lt;li&gt; Five alert rules: 1 day to define and wire to your alerting system&lt;/li&gt;
&lt;li&gt; Eval pipeline (Ragas + cron): 2 days to set up the eval set and weekly run&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One week of work. Less than the time spent debugging a single mystery production issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  If your agent is in production with no traces
&lt;/h2&gt;

&lt;p&gt;If your team has shipped an AI agent and you would not be able to answer "why did response time get worse last week" with data, the gap is observability.&lt;/p&gt;

&lt;p&gt;Sapota offers a one-week observability implementation that installs Langfuse (or your preferred stack), instruments your existing agent code, builds the production dashboard, configures the alert rules, and sets up the eval pipeline. We have done this for half a dozen agent systems, mostly for teams that shipped without instrumentation and started having mystery production issues.&lt;/p&gt;

&lt;p&gt;Reach out via the &lt;a href="https://www.sapotacorp.vn/ai-and-machine-learning" rel="noopener noreferrer"&gt;AI engineering page&lt;/a&gt; with your current agent stack and what kind of issues you are seeing. The first conversation usually surfaces which layer is missing and what to install first.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Canvas app performance: delegation, caching, and lazy patterns</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 05:13:57 +0000</pubDate>
      <link>https://dev.to/sapotacorp/canvas-app-performance-delegation-caching-and-lazy-patterns-fhn</link>
      <guid>https://dev.to/sapotacorp/canvas-app-performance-delegation-caching-and-lazy-patterns-fhn</guid>
      <description>&lt;p&gt;A Canvas app we inherited took 14 seconds to load. The screens were responsive once visible, but the initial splash sat blank. Users hated it. After three days of work, the same app loaded in 2.1 seconds. The fix was not one clever trick - it was four patterns applied in sequence. Here they are, in the order we apply them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: measure, don't guess
&lt;/h2&gt;

&lt;p&gt;Before optimizing anything, run the Canvas app's built-in Monitor tool against a realistic session. Monitor records every operation - data source call, formula execution, screen change - with duration.&lt;/p&gt;

&lt;p&gt;Sort by duration descending. The top 10 operations are almost always where 80% of the time is spent. Without this, you can spend a day optimizing formulas that were never the bottleneck.&lt;/p&gt;

&lt;p&gt;Typical findings on a slow app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  App.OnStart runs 3 sequential data source queries, each taking 1-2 seconds → 4-6 seconds before any screen is visible.&lt;/li&gt;
&lt;li&gt;  Screen.OnVisible on the default screen runs another query → additional 1-2 seconds before the screen is interactive.&lt;/li&gt;
&lt;li&gt;  Formulas depending on LookUp against a 5000-row table re-execute on every control refresh.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these maps to one of the patterns below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: don't wait on App.OnStart
&lt;/h2&gt;

&lt;p&gt;The common mistake: loading "everything the app needs" in App.OnStart before any screen renders. Every query in OnStart is time the user stares at a splash screen.&lt;/p&gt;

&lt;p&gt;The fix: load only what the first screen needs. Later screens load their own data on OnVisible, cached in a collection for subsequent visits.&lt;/p&gt;

&lt;p&gt;The app is interactive in 200-500ms. Data for each screen loads as the user navigates to it. Subsequent visits are instant because the collection is cached.&lt;/p&gt;
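&lt;p&gt;The load-on-first-visit idea can be sketched outside Power Fx. The following is a minimal, language-neutral illustration in Python, not the app's actual formulas: the ScreenDataCache class, the screen names, and the loader functions are all hypothetical stand-ins for an OnVisible handler that fills a collection only when it is still empty.&lt;/p&gt;

```python
from typing import Callable, Dict, List


class ScreenDataCache:
    """Loads each screen's rows on first visit and reuses them afterwards."""

    def __init__(self, loaders: Dict[str, Callable[[], List[dict]]]) -> None:
        self._loaders = loaders          # hypothetical per-screen query functions
        self._cache: Dict[str, List[dict]] = {}
        self.query_count = 0             # how many data source calls actually ran

    def on_visible(self, screen: str) -> List[dict]:
        # Only hit the data source the first time the screen is shown.
        if screen not in self._cache:
            self.query_count += 1
            self._cache[screen] = self._loaders[screen]()
        return self._cache[screen]


cache = ScreenDataCache({
    "Orders": lambda: [{"id": 1}, {"id": 2}],
    "Customers": lambda: [{"id": 10}],
})

cache.on_visible("Orders")      # first visit: runs the query
cache.on_visible("Orders")      # second visit: served from the cache
cache.on_visible("Customers")   # a different screen loads its own data
print(cache.query_count)        # 2 queries for 3 visits
```

&lt;p&gt;The cost is the one discussed next: the first visit to each screen pays for its own query.&lt;/p&gt;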

&lt;p&gt;Trade-off: on slow networks, the first visit to each screen has a perceptible pause. For most business apps this is acceptable; for latency-critical scenarios, preload in the background using Concurrent().&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2: Concurrent() for independent queries
&lt;/h2&gt;

&lt;p&gt;If App.OnStart genuinely needs multiple queries before the app can render (rare), run them in parallel instead of sequentially.&lt;/p&gt;

&lt;p&gt;The Concurrent function runs all its arguments in parallel. Total duration is the slowest single query, not the sum. For three 1-second queries, you save 2 seconds.&lt;/p&gt;
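&lt;p&gt;The "slowest call, not the sum" arithmetic is easy to check with a small timing sketch. This uses Python threads as an analogy for Concurrent(), not Power Fx itself; fake_query and the 0.05-second durations are made up for illustration.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(name: str, seconds: float) -> str:
    """Stand-in for a data source call; sleeps instead of hitting a server."""
    time.sleep(seconds)
    return name


queries = [("Categories", 0.05), ("Products", 0.05), ("Settings", 0.05)]

# Sequential: total time is roughly the sum of the three calls.
start = time.perf_counter()
sequential_results = [fake_query(name, secs) for name, secs in queries]
sequential_s = time.perf_counter() - start

# Parallel: total time is roughly the slowest single call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    futures = [pool.submit(fake_query, name, secs) for name, secs in queries]
    parallel_results = [f.result() for f in futures]
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```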

&lt;p&gt;Caveat: queries that depend on each other cannot be run concurrently. If the query that sets varUserSettings uses the result of varCategories, the two must stay sequential (or be split into two phases).&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3: cache lookups in collections
&lt;/h2&gt;

&lt;p&gt;A formula like LookUp(Users, ID = item.UserID).Name inside a gallery's label re-executes on every render. For a gallery of 50 items, that's 50 lookups per paint. If the gallery re-paints on scroll, thousands of lookups happen.&lt;/p&gt;

&lt;p&gt;Fix: load the lookup table once into a collection, index into the collection from the formula.&lt;/p&gt;

&lt;p&gt;LookUp against an in-memory collection is O(n) but the n is small and the operation is microseconds. Against a Dataverse table, it is a server round trip per call.&lt;/p&gt;

&lt;p&gt;For repeated lookups on the same key, build a hash-like collection keyed on that column, keeping only the columns the gallery actually displays.&lt;/p&gt;

&lt;p&gt;The reduction in columns keeps the collection small; the in-memory lookup is fast.&lt;/p&gt;
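&lt;p&gt;To make the round-trip difference concrete, here is a rough Python sketch (an analogy, not Power Fx). The USERS table, the round_trips counter, and both helper functions are hypothetical; the point is only that a per-row lookup costs one call per gallery item, while a cached key-to-name map costs one call in total.&lt;/p&gt;

```python
from typing import Dict, List

USERS: List[dict] = [
    {"ID": 1, "Name": "An", "Email": "an@example.com", "Phone": "000"},
    {"ID": 2, "Name": "Binh", "Email": "binh@example.com", "Phone": "111"},
]
round_trips = 0


def server_lookup(user_id: int) -> str:
    """Stand-in for a per-row lookup that costs one server round trip."""
    global round_trips
    round_trips += 1
    return next(u["Name"] for u in USERS if u["ID"] == user_id)


def load_names_once() -> Dict[int, str]:
    """One round trip; keep only the key and the column the gallery shows."""
    global round_trips
    round_trips += 1
    return {u["ID"]: u["Name"] for u in USERS}


gallery_items = [{"UserID": 1 + (i % 2)} for i in range(50)]

# Naive: one server call per gallery item.
naive_names = [server_lookup(item["UserID"]) for item in gallery_items]
naive_trips = round_trips

# Cached: one server call, then in-memory lookups.
round_trips = 0
names_by_id = load_names_once()
cached_names = [names_by_id[item["UserID"]] for item in gallery_items]
cached_trips = round_trips

print(naive_trips, cached_trips)  # 50 1
```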

&lt;h2&gt;
  
  
  Pattern 4: lazy loading galleries
&lt;/h2&gt;

&lt;p&gt;A gallery that shows 2000 items scrolls slowly because every item's controls re-render on scroll. The Canvas runtime has virtualized rendering but only when the gallery's data source supports paging.&lt;/p&gt;

&lt;p&gt;Fix: limit the gallery to a paged subset.&lt;/p&gt;

&lt;p&gt;With pageSize = 50, the gallery holds at most 50 rendered items regardless of the underlying data size. Navigation controls let the user page through.&lt;/p&gt;
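&lt;p&gt;The paging arithmetic behind this is small enough to sketch. This is an illustrative Python version, not the gallery's Items formula; page_of, page_count, and the 2000-row list are assumptions for the example.&lt;/p&gt;

```python
from typing import List, Sequence


def page_of(rows: Sequence[dict], page_index: int, page_size: int = 50) -> List[dict]:
    """Return the slice of rows the gallery should render for one page."""
    start = page_index * page_size
    return list(rows[start:start + page_size])


def page_count(total_rows: int, page_size: int = 50) -> int:
    """Number of pages needed, rounding up for a partial last page."""
    return (total_rows + page_size - 1) // page_size


all_rows = [{"id": i} for i in range(2000)]
first_page = page_of(all_rows, 0)
last_page = page_of(all_rows, page_count(len(all_rows)) - 1)
print(len(first_page), page_count(len(all_rows)), last_page[-1]["id"])  # 50 40 1999
```

&lt;p&gt;The "next" and "previous" controls then only need to change the page index.&lt;/p&gt;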

&lt;p&gt;For "infinite scroll" UX, trigger Collect() to append more items as the user approaches the end - but this only works for small growth steps before the gallery itself becomes a bottleneck again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 5: screen transition timing
&lt;/h2&gt;

&lt;p&gt;Screen transitions have a built-in animation. On slow devices, the animation itself adds 300-500ms to perceived navigation.&lt;/p&gt;

&lt;p&gt;Settings → Advanced settings → "Enable transitions" can be turned off. For business apps where speed matters more than polish, we disable transitions globally.&lt;/p&gt;

&lt;p&gt;For specific transitions where animation aids UX (a modal sliding in), override just that screen to keep transitions on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 6: reduce the visual complexity
&lt;/h2&gt;

&lt;p&gt;Canvas apps render all controls on the current screen, even those off-screen. A screen with 200 controls (nested galleries, headers, footers, hidden panels) is slower than a screen with 50.&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Minimize hidden-by-default panels (use separate screens instead).&lt;/li&gt;
&lt;li&gt;  Combine repetitive controls into a gallery where possible (instead of 10 labels, one gallery of 10 items).&lt;/li&gt;
&lt;li&gt;  Remove controls that only serve design purposes and can be replaced with background images.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a visual diet; it helps on low-end mobile devices more than on desktop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The measurement pipeline we run
&lt;/h2&gt;

&lt;p&gt;Every time a performance-sensitive Canvas app gets a significant change, we re-run Monitor with a standardized script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Launch the app.&lt;/li&gt;
&lt;li&gt; Wait for first screen to be interactive.&lt;/li&gt;
&lt;li&gt; Navigate to three key screens in sequence.&lt;/li&gt;
&lt;li&gt; Perform a sample action on each (open a form, filter a gallery).&lt;/li&gt;
&lt;li&gt; Record Monitor output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Compare the key durations to the previous baseline. Any regression triggers investigation before merge.&lt;/p&gt;

&lt;p&gt;The baselines live in a spreadsheet the team updates. Over two years, this has caught dozens of regressions before they hit users.&lt;/p&gt;
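&lt;p&gt;The baseline comparison itself can be automated once the durations are copied out of Monitor. A minimal sketch in Python, assuming durations in milliseconds keyed by step name and a 10% tolerance; the step names, numbers, and threshold are all hypothetical, not values from our spreadsheet.&lt;/p&gt;

```python
from typing import Dict, List


def find_regressions(
    baseline_ms: Dict[str, float],
    current_ms: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the steps whose duration grew by more than the tolerance."""
    regressions = []
    for step, old in baseline_ms.items():
        new = current_ms.get(step)
        if new is not None and new > old * (1 + tolerance):
            regressions.append(step)
    return regressions


baseline = {"launch": 2100, "open_orders": 800, "filter_gallery": 300}
current = {"launch": 2150, "open_orders": 1200, "filter_gallery": 290}
print(find_regressions(baseline, current))  # ['open_orders']
```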

&lt;h2&gt;
  
  
  The inherited-app checklist
&lt;/h2&gt;

&lt;p&gt;When we inherit a slow Canvas app, our first pass is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Measure with Monitor.&lt;/li&gt;
&lt;li&gt; Check App.OnStart for sequential queries - move them to Concurrent or to per-screen OnVisible.&lt;/li&gt;
&lt;li&gt; Check formulas inside galleries for repeated LookUps - cache into collections.&lt;/li&gt;
&lt;li&gt; Check gallery sizes - paginate if over 500 rows.&lt;/li&gt;
&lt;li&gt; Disable transitions if the team accepts it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three days of work, typically. For the 14-second app mentioned earlier, the fix was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Two of three App.OnStart queries moved to per-screen → saved 3 seconds.&lt;/li&gt;
&lt;li&gt;  Gallery LookUp against full Users table cached → saved 1.5 seconds per gallery paint × several paints at startup = 4+ seconds.&lt;/li&gt;
&lt;li&gt;  Transitions disabled → saved ~400ms per navigation.&lt;/li&gt;
&lt;li&gt;  Remaining optimizations: minor formula tweaks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The app now loads in 2.1 seconds and feels snappy. Users stopped complaining. And our team finally understands why Canvas apps sometimes feel slow even when the Power Fx formulas "look fine."&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>Seven Types of Data Extensions We Use on SFMC Projects</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 02:52:26 +0000</pubDate>
      <link>https://dev.to/sapotacorp/seven-types-of-data-extensions-we-use-on-sfmc-projects-83o</link>
      <guid>https://dev.to/sapotacorp/seven-types-of-data-extensions-we-use-on-sfmc-projects-83o</guid>
      <description>&lt;p&gt;"Data Extension" is the generic term, but SFMC supports several DE types with different behaviors. Knowing the distinctions saves you from trying to send email from a lookup DE or forgetting to refresh a Filtered DE before a campaign.&lt;/p&gt;

&lt;p&gt;Here's the reference we hand to every new engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Sendable DE
&lt;/h2&gt;

&lt;p&gt;The main type used for sending email. Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Is Sendable = true in Properties&lt;/li&gt;
&lt;li&gt;  A field of type EmailAddress&lt;/li&gt;
&lt;li&gt;  A Send Relationship mapping a field (usually Primary Key) to the Subscriber Key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the DE you pick when you send a campaign or set up a Journey with a Data Extension entry source. No send can happen without one.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Lookup DE (Non-sendable)
&lt;/h2&gt;

&lt;p&gt;Reference data that AMPscript Lookup() pulls into email templates at render time. Common examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sales rep directory: SalesRep_DE with RepID, name, email, phone&lt;/li&gt;
&lt;li&gt;  Product catalog: Product_DE with SKU, name, description, price&lt;/li&gt;
&lt;li&gt;  Store locations: Store_DE with StoreID, address, hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not sendable, no EmailAddress field needed. Used by the email, not as the source of the send.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Filtered DE
&lt;/h2&gt;

&lt;p&gt;A DE created by applying a Data Filter to another DE - point-and-click segmentation without SQL. Example: "subscribers from Master_DE where MemberTier = Gold" produces a Gold-only Filtered DE.&lt;/p&gt;

&lt;p&gt;Important: Filtered DEs need to be refreshed to reflect the current state of the source DE. They don't auto-update.&lt;/p&gt;

&lt;p&gt;Refresh via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Filter Activity in Automation Studio (scheduled or ad hoc)&lt;/li&gt;
&lt;li&gt;  Manually in Email Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the source DE changes (new imports, updates) but the Filtered DE isn't refreshed, the Filtered DE holds stale data. Campaigns targeting it send to outdated segments.&lt;/p&gt;

&lt;p&gt;For anything more complex than single-attribute filtering (joins, calculations), use SQL Query Activity writing to a standard DE, not a Filtered DE.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Random DE
&lt;/h2&gt;

&lt;p&gt;Splits a source DE into N equal random chunks. Use case: A/B/N testing where you need 10 equal groups to test 10 email variants.&lt;/p&gt;

&lt;p&gt;No SQL needed. Configure the split percentages in the UI; SFMC assigns rows randomly.&lt;/p&gt;

&lt;p&gt;Caveats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The "random" assignment happens at creation; refreshing re-randomizes, potentially putting the same subscriber into different buckets.&lt;/li&gt;
&lt;li&gt;  For more sophisticated random assignment (e.g. Journey Builder's Random Split), the Journey tool is usually the better fit.&lt;/li&gt;
&lt;/ul&gt;
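&lt;p&gt;To see why a refresh can move a subscriber between buckets, here is a rough Python sketch of a random equal split. It is not how SFMC implements Random DEs; random_split, the subscriber keys, and the seeds are hypothetical, with a different seed standing in for a refresh.&lt;/p&gt;

```python
import random
from typing import List, Optional


def random_split(rows: List[str], groups: int, seed: Optional[int] = None) -> List[List[str]]:
    """Shuffle the rows, then deal them round-robin into near-equal groups."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::groups] for i in range(groups)]


subscribers = [f"sub-{i:03d}" for i in range(100)]
first_run = random_split(subscribers, groups=10, seed=1)
second_run = random_split(subscribers, groups=10, seed=2)  # stands in for a refresh

print([len(g) for g in first_run])  # ten groups of 10
```

&lt;p&gt;Both runs cover every subscriber exactly once, but nothing ties a subscriber to the same group across runs, which is the caveat above.&lt;/p&gt;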

&lt;h2&gt;
  
  
  5. Shared DE
&lt;/h2&gt;

&lt;p&gt;A DE placed in the Shared Data Extensions folder in the parent Business Unit. Multiple child BUs in the Enterprise account can access the DE without copying it.&lt;/p&gt;

&lt;p&gt;Use when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The same reference data (product catalog, store list) is needed across multiple brands or regions.&lt;/li&gt;
&lt;li&gt;  You want a single source of truth rather than syncing copies between BUs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Access permissions are set via Shared Data Extension Permissions - which BUs can read/write.&lt;/p&gt;

&lt;p&gt;Watch for unintended cross-BU writes: if two BUs can write to the same Shared DE, coordinate schemas and import schedules.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Send Log DE
&lt;/h2&gt;

&lt;p&gt;A special DE that logs every email send - who received what, when, subject line, etc. Useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Audit requirements beyond SFMC's default 10-day retention&lt;/li&gt;
&lt;li&gt;  Joining send data with other data to build custom reporting&lt;/li&gt;
&lt;li&gt;  Investigating specific send events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Created from the TriggeredSendDataExtension template when used with Triggered Sends.&lt;/p&gt;

&lt;p&gt;Caveat: Test Sends don't write to Send Log. Only production sends do. If you're testing and expecting the Send Log to populate, it won't.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. DE with Data Retention Policy
&lt;/h2&gt;

&lt;p&gt;Any DE can have a retention policy configured on creation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Delete records older than X days - SFMC auto-purges rows past the threshold&lt;/li&gt;
&lt;li&gt;  Delete data and DE after X days - whole DE goes away&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Temporary event signup DEs (auto-purge 30 days after event)&lt;/li&gt;
&lt;li&gt;  PII-heavy DEs that shouldn't retain data beyond a defined window&lt;/li&gt;
&lt;li&gt;  Any Send Log or archive DE to prevent unbounded growth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set retention at creation if possible. Adding it later works but doesn't apply retroactively to existing rows until the next automation evaluation.&lt;/p&gt;
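&lt;p&gt;The "delete records older than X days" rule boils down to a date cutoff. A minimal Python sketch of that rule, assuming each row carries a creation date; the row shape, the purge_older_than helper, and the 30-day window are illustrative, not SFMC internals.&lt;/p&gt;

```python
from datetime import date, timedelta
from typing import List


def purge_older_than(rows: List[dict], days: int, today: date) -> List[dict]:
    """Keep only rows created within the retention window."""
    cutoff = today - timedelta(days=days)
    return [row for row in rows if row["created"] >= cutoff]


today = date(2026, 5, 24)
signups = [
    {"email": "a@example.com", "created": date(2026, 5, 20)},
    {"email": "b@example.com", "created": date(2026, 3, 1)},
]
kept = purge_older_than(signups, days=30, today=today)
print([row["email"] for row in kept])  # ['a@example.com']
```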

&lt;h2&gt;
  
  
  Picking the right type on a new DE
&lt;/h2&gt;

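&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;DE Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Send email from this list&lt;/td&gt;
&lt;td&gt;Sendable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reference data AMPscript will look up&lt;/td&gt;
&lt;td&gt;Lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple attribute-based segment&lt;/td&gt;
&lt;td&gt;Filtered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B test random splits&lt;/td&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-BU shared reference&lt;/td&gt;
&lt;td&gt;Shared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit / custom tracking&lt;/td&gt;
&lt;td&gt;Send Log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-purge old data&lt;/td&gt;
&lt;td&gt;DE with Retention Policy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;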

&lt;p&gt;Most projects end up with a mix: one or two Sendable DEs, several Lookup DEs, possibly a Shared DE for multi-brand setups, and retention policies on anything transient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;Naming the DE type right in your head before creating it prevents architecture rebuilds. The decision takes seconds; fixing an incorrectly-typed DE after it's been loaded with data can take a day.&lt;/p&gt;




&lt;p&gt;Designing SFMC data architecture? Our Salesforce team ships Data Extension layouts, shared-BU patterns, and retention strategies on production engagements. &lt;a href="https://www.sapotacorp.vn/contact" rel="noopener noreferrer"&gt;Get in touch -&amp;gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See our full &lt;a href="https://www.sapotacorp.vn/service" rel="noopener noreferrer"&gt;platform services&lt;/a&gt; for the stack we cover.&lt;/p&gt;

</description>
      <category>marketingcloud</category>
    </item>
    <item>
      <title>Rollup vs calculated columns in Dataverse: the async trap we fell for</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 02:52:20 +0000</pubDate>
      <link>https://dev.to/sapotacorp/rollup-vs-calculated-columns-in-dataverse-the-async-trap-we-fell-for-1mh1</link>
      <guid>https://dev.to/sapotacorp/rollup-vs-calculated-columns-in-dataverse-the-async-trap-we-fell-for-1mh1</guid>
      <description>&lt;p&gt;A deal desk dashboard showed the running total of opportunities per account. Total amount per account was a rollup column. Users opened the dashboard, saw a total, made a decision. Then someone added a new opportunity and checked the same account. The account's total did not change.&lt;/p&gt;

&lt;p&gt;Fifteen minutes later, refresh - still not changed. An hour later - changed. The rollup column was working exactly as documented, and the team had mistaken it for a real-time aggregate.&lt;/p&gt;

&lt;p&gt;Here is the difference between calculated and rollup columns, and the pattern we now use when the dashboard needs to be current.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two column types
&lt;/h2&gt;

&lt;p&gt;Calculated column: computes its value every time the row is read. If the column adds two other columns, every time anything loads the row (form view, API call, view row), Dataverse computes the sum. Results are always current because they are re-derived on demand.&lt;/p&gt;

&lt;p&gt;Rollup column: computes its value on a schedule. The default schedule is every 12 hours, configurable down to one hour per column. A rollup that sums child record amounts reads the current state of children when the schedule fires, stores the result, and serves that stored value until the next schedule.&lt;/p&gt;

&lt;p&gt;Both store their definition (the formula) in the solution. Both look identical to consuming code. The runtime behavior is radically different.&lt;/p&gt;
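&lt;p&gt;The difference in runtime behavior can be modeled in a few lines. This is a toy Python model, not Dataverse code: net_amount plays the role of a calculated column derived from same-row fields on every read, and rollup_total plays the role of a rollup column that only changes when the (hypothetical) run_rollup_job method stands in for the scheduled recalculation.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Opportunity:
    amount: float
    discount: float = 0.0

    @property
    def net_amount(self) -> float:
        # Calculated-style: derived from same-row fields on every read.
        return self.amount - self.discount


@dataclass
class Account:
    opportunities: List[Opportunity] = field(default_factory=list)
    rollup_total: float = 0.0   # rollup-style: stored, refreshed by a job

    def run_rollup_job(self) -> None:
        # Stand-in for the scheduled recalculation.
        self.rollup_total = sum(o.net_amount for o in self.opportunities)


account = Account()
account.opportunities.append(Opportunity(amount=1000.0))
account.run_rollup_job()

account.opportunities.append(Opportunity(amount=500.0, discount=50.0))
stale_total = account.rollup_total          # job has not run since the insert
account.run_rollup_job()
fresh_total = account.rollup_total
print(stale_total, fresh_total)             # 1000.0 1450.0
```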

&lt;h2&gt;
  
  
  When each one is correct
&lt;/h2&gt;

&lt;p&gt;Calculated columns are correct when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The formula depends only on fields on the same row, or on a parent's fields (via lookup)&lt;/li&gt;
&lt;li&gt;  The computation is cheap (arithmetic, string concatenation, conditionals)&lt;/li&gt;
&lt;li&gt;  You need the value to be current every time it is read&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rollup columns are correct when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The formula aggregates across many child records (SUM, COUNT, AVG, MIN, MAX)&lt;/li&gt;
&lt;li&gt;  The underlying data changes frequently but the aggregate does not need to be current within seconds&lt;/li&gt;
&lt;li&gt;  Scheduling async recomputation is acceptable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A calculated column that aggregates children is not allowed. Dataverse blocks it for performance reasons: reading an account would trigger a query across all its opportunities on every load. Rollup is the platform's answer: compute the aggregate ahead of time, serve it cached.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the async nature bites
&lt;/h2&gt;

&lt;p&gt;The request for a running total of child records usually comes from a business user who sees the number on a dashboard and expects it to match reality. When that user tells the team "the total on the Account form should show all linked Opportunity amounts," the team builds a rollup column. The dashboard shows the rollup.&lt;/p&gt;

&lt;p&gt;The user then creates an opportunity, looks at the account, and the total is stale. From the user's perspective, the number is wrong. From the platform's perspective, the number is accurate for the last time the schedule fired.&lt;/p&gt;

&lt;p&gt;The fix is either to lower the user's expectations (the dashboard updates on a schedule) or to change the implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-time totals: three alternatives
&lt;/h2&gt;

&lt;p&gt;When the aggregate genuinely needs to be current:&lt;/p&gt;

&lt;p&gt;Alternative 1: plugin that updates a stored total on every child change.&lt;/p&gt;

&lt;p&gt;A post-operation plugin on Opportunity (Create, Update on amount, Delete) recalculates acme_total_opportunities on the parent Account and writes it. Works for modest volumes - an account with 200 opportunities, each updating rarely, is fine. Fails for accounts with 20,000 opportunities updating frequently - every child change becomes a parent write, and the Dataverse execution pipeline starts throttling.&lt;/p&gt;

&lt;p&gt;Good for: master-detail relationships with medium fanout.&lt;/p&gt;

&lt;p&gt;Alternative 2: calculated column summing via the lookup.&lt;/p&gt;

&lt;p&gt;You cannot aggregate children from a parent via a calculated column, but you can compute the child's contribution per row using a calculated column on the child. acme_attributable_amount = IF(acme_is_counted, amount, 0) on Opportunity, then use that in any consumer query. The consumer does the aggregation at read time.&lt;/p&gt;

&lt;p&gt;Works for: dashboards that can run FetchXML with aggregate operators (SUM the acme_attributable_amount across children in a single query). Fails if the consumer is a Dataverse form or view that cannot run aggregate queries.&lt;/p&gt;

&lt;p&gt;Good for: Power BI reports, custom dashboards, anywhere you control the read query.&lt;/p&gt;
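&lt;p&gt;The split of responsibilities in Alternative 2 can be sketched as follows. This is plain Python standing in for a consumer-side aggregate query, not FetchXML; the row shape and the two helper functions are hypothetical, and attributable_amount mirrors the IF(acme_is_counted, amount, 0) formula above.&lt;/p&gt;

```python
from typing import Dict, List

opportunities: List[dict] = [
    {"account": "A1", "amount": 1000.0, "is_counted": True},
    {"account": "A1", "amount": 250.0, "is_counted": False},
    {"account": "A2", "amount": 400.0, "is_counted": True},
]


def attributable_amount(row: dict) -> float:
    """Per-row value, mirroring IF(acme_is_counted, amount, 0)."""
    return row["amount"] if row["is_counted"] else 0.0


def totals_by_account(rows: List[dict]) -> Dict[str, float]:
    """Read-time aggregation done by the consumer, not by the platform."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["account"]] = totals.get(row["account"], 0.0) + attributable_amount(row)
    return totals


print(totals_by_account(opportunities))  # {'A1': 1000.0, 'A2': 400.0}
```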

&lt;p&gt;Alternative 3: rollup with one-hour schedule plus "Refresh" button.&lt;/p&gt;

&lt;p&gt;Keep the rollup for the baseline case. Add a button on the form that explicitly triggers a rollup recalculation via the CalculateRollupField message (CalculateRollupFieldRequest in the SDK). The user experiences it as "I just added a child, now I hit Refresh, the total updates."&lt;/p&gt;

&lt;p&gt;Works for: dashboards with bursty usage patterns - the baseline is acceptable, explicit refresh covers the "just changed it" case.&lt;/p&gt;

&lt;p&gt;Good for: sales management screens, deal desk reviews, scenarios where users understand the refresh semantics.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit we run on rollups
&lt;/h2&gt;

&lt;p&gt;On any project with more than three rollup columns, we audit quarterly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Is the rollup still in use? Rollups accumulate - someone adds one for a requirement that later gets dropped, the column stays. Deprecated rollups still consume schedule slots.&lt;/li&gt;
&lt;li&gt; Is the schedule frequency right? A rollup set to 12 hours when the underlying data changes every hour is both wrong (stale) and wasteful (the 12-hour window sees many state changes).&lt;/li&gt;
&lt;li&gt; Is the formula still correct? Filters in rollup definitions reference specific statecode/statuscode values. When those values change semantics in later releases, rollups silently compute the wrong thing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The audit takes 20 minutes per project. Half the time, at least one rollup can be deleted; a third of the time, at least one schedule needs adjusting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotcha: deleting a child does not trigger rollup
&lt;/h2&gt;

&lt;p&gt;A rollup of child-record amounts updates when a child is created or its amount changes. It does not update when a child is deleted, until the scheduled refresh fires.&lt;/p&gt;

&lt;p&gt;The window between "child deleted" and "rollup refreshed" shows an overstated total. This is especially painful in audit scenarios - a user deletes a duplicate, the total still shows the duplicate's contribution for up to twelve hours.&lt;/p&gt;

&lt;p&gt;The fix is either a plugin on the child Delete event (post-operation, trigger a rollup recalculation explicitly via CalculateRollupFieldRequest) or a tighter schedule. We usually go with the plugin when the table supports deletions, because the time-lag is user-visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rule we enforce
&lt;/h2&gt;

&lt;p&gt;Rollup columns ship with a form tooltip: "This total refreshes every N hours. Click Refresh after adding new records to see updated totals immediately."&lt;/p&gt;

&lt;p&gt;The tooltip is a single field-description line. It costs nothing to add, and it heads off the "why is the number stale" ticket that would otherwise land every other week. Every rollup we ship has it. Every one we inherit gets it added on the first pass.&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
    <item>
      <title>MES integration with D365 Supply Chain: Azure middleware pattern</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 02:52:14 +0000</pubDate>
      <link>https://dev.to/sapotacorp/mes-integration-with-d365-supply-chain-azure-middleware-pattern-4698</link>
      <guid>https://dev.to/sapotacorp/mes-integration-with-d365-supply-chain-azure-middleware-pattern-4698</guid>
      <description>&lt;p&gt;Manufacturers running Dynamics 365 Supply Chain Management almost always also run a dedicated Manufacturing Execution System (MES) on the shop floor. Production order updates, inventory movements, quality tests, and traceability data flow between them continuously. The integration has to be low-latency (shop floor runs on seconds, not hours), high-throughput (hundreds of events per minute at peak), and reliable (lost messages mean lost traceability).&lt;/p&gt;

&lt;p&gt;Four integration patterns come up in evaluations. Three have documented failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The options that don't fit manufacturing
&lt;/h2&gt;

&lt;p&gt;Nightly batch jobs via Data Management Framework. Designed for bulk data movement, not real-time signaling. Production orders complete hours before D365 knows about them. The inventory view, meant to be real-time, always lags. Traceability data arrives after the batches have shipped.&lt;/p&gt;

&lt;p&gt;Custom OData polling with a loop that queries MES every few seconds. Introduces polling overhead for no latency benefit, and MES systems aren't typically designed to handle heavy poll loads. Also creates a custom code dependency that needs maintenance.&lt;/p&gt;

&lt;p&gt;Database-level triggers on the MES database pushing directly to F&amp;amp;O's database. Breaks supportability completely. D365 F&amp;amp;O is a managed platform - direct database writes aren't supported, aren't upgrade-safe, and will break the next time Microsoft changes schema. Also creates a security nightmare (MES has privileged write access to F&amp;amp;O's database?).&lt;/p&gt;

&lt;p&gt;The only answer that fits the requirements is Azure middleware between the two systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Azure-native pattern
&lt;/h2&gt;

&lt;p&gt;Logic Apps or Service Bus as middleware between MES and D365, with F&amp;amp;O Business Events on the D365 side.&lt;/p&gt;

&lt;p&gt;What each piece does:&lt;/p&gt;

&lt;p&gt;Azure Service Bus for the guaranteed-delivery, ordered messaging. Production-order status updates, inventory moves, quality-test results flow through Service Bus queues with FIFO ordering per production order.&lt;/p&gt;

&lt;p&gt;Azure Logic Apps for the orchestration where branching and transformation happen. A pick-complete event from MES fires a Logic App that transforms the payload, updates inventory in D365, and triggers the next production-flow message back to MES.&lt;/p&gt;

&lt;p&gt;F&amp;amp;O Business Events for the D365-side publishing. When a production order is created, released, or completed in F&amp;amp;O, a business event fires to Service Bus or Event Grid. MES subscribers pick it up.&lt;/p&gt;

&lt;p&gt;Custom Services on F&amp;amp;O for the inbound - when MES has a state change D365 needs to record, the Logic App (or Function) calls a custom service endpoint on F&amp;amp;O. Custom services are designed for low-latency targeted writes, unlike data entities which are bulk-optimized.&lt;/p&gt;
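&lt;p&gt;The per-production-order FIFO requirement mentioned for Service Bus can be pictured with a small sketch. In Service Bus this kind of ordering is typically achieved with message sessions keyed by the order; that detail is our assumption here, and the Python below only models the grouping, using a hypothetical (order id, status) message shape rather than the Service Bus SDK.&lt;/p&gt;

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (production order id, status) - hypothetical shape


def group_by_order(messages: List[Message]) -> Dict[str, Deque[str]]:
    """Keep arrival order within each production order, as a session would."""
    sessions: Dict[str, Deque[str]] = defaultdict(deque)
    for order_id, status in messages:
        sessions[order_id].append(status)
    return sessions


incoming = [
    ("PO-1", "released"), ("PO-2", "released"),
    ("PO-1", "started"), ("PO-2", "started"), ("PO-1", "completed"),
]
sessions = group_by_order(incoming)
print(list(sessions["PO-1"]), list(sessions["PO-2"]))
```

&lt;p&gt;Messages for different orders can interleave freely; only the order within one production order has to be preserved.&lt;/p&gt;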

&lt;h2&gt;
  
  
  Traceability architecture
&lt;/h2&gt;

&lt;p&gt;Traceability has a specific meaning in manufacturing: regulators and customers need to know which raw materials went into which finished-goods batch. D365's batch tracking combines with MES's shop-floor batch recording to produce the full lineage. The integration ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  MES tracks the physical movement (machine X processed batch Y at time Z)&lt;/li&gt;
&lt;li&gt;  D365 records the ERP-level batch (raw material batch A consumed in production order B producing finished batch C)&lt;/li&gt;
&lt;li&gt;  Integration correlates the two via batch numbers and production order references&lt;/li&gt;
&lt;li&gt;  Recall scenarios can trace backward from sold finished goods to source materials, or forward from suspect materials to affected finished goods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The integration isn't just about moving data - it's about keeping the correlation intact under all failure modes.&lt;/p&gt;
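&lt;p&gt;The backward and forward recall traces described above reduce to two lookups over the correlated records. A minimal Python sketch, assuming each record links a raw material batch, a production order, and a finished batch; the CONSUMPTION data and function names are hypothetical and much simpler than real batch genealogy.&lt;/p&gt;

```python
from typing import List, Set, Tuple

# (raw material batch, production order, finished batch) - hypothetical records
CONSUMPTION: List[Tuple[str, str, str]] = [
    ("RAW-A", "PO-1", "FIN-C"),
    ("RAW-B", "PO-1", "FIN-C"),
    ("RAW-A", "PO-2", "FIN-D"),
]


def trace_backward(finished_batch: str) -> Set[str]:
    """Raw material batches that went into a finished batch."""
    return {raw for raw, _, fin in CONSUMPTION if fin == finished_batch}


def trace_forward(raw_batch: str) -> Set[str]:
    """Finished batches affected by a suspect raw material batch."""
    return {fin for raw, _, fin in CONSUMPTION if raw == raw_batch}


print(sorted(trace_backward("FIN-C")), sorted(trace_forward("RAW-A")))
```

&lt;p&gt;If a message is lost and one of these records never lands, the corresponding trace silently comes back incomplete, which is why the correlation has to survive failures.&lt;/p&gt;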

&lt;h2&gt;
  
  
  High-throughput considerations
&lt;/h2&gt;

&lt;p&gt;At manufacturing scale (large plants with multiple lines, adding up to hundreds of events per minute at peak), throughput planning matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Service Bus sizing - Standard tier suffices for most deployments; Premium only when message volumes exceed what the Standard tier can sustain&lt;/li&gt;
&lt;li&gt;  Logic Apps concurrency - configured per workflow, default is 20 concurrent runs; high-throughput flows need higher&lt;/li&gt;
&lt;li&gt;  F&amp;amp;O write capacity - custom services are faster than data entities for single-record writes; batching is appropriate when MES aggregates multiple updates&lt;/li&gt;
&lt;li&gt;  Dead-letter monitoring - alerts when Service Bus DLQ gets non-zero entries; usually indicates transformation errors that need human review&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reliability patterns
&lt;/h2&gt;

&lt;p&gt;Manufacturing can't afford lost messages. The architecture carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Retry with exponential backoff in Logic Apps for transient failures&lt;/li&gt;
&lt;li&gt;  Dead-letter queues for poison messages&lt;/li&gt;
&lt;li&gt;  Idempotency keys on custom service calls so retry doesn't double-record&lt;/li&gt;
&lt;li&gt;  Correlation IDs flowing end-to-end for cross-system debugging&lt;/li&gt;
&lt;li&gt;  Monitoring dashboards on message throughput and latency per queue&lt;/li&gt;
&lt;/ul&gt;
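&lt;p&gt;Two of these patterns, retry with exponential backoff and idempotency keys, only work as a pair: a retry is safe because the receiver ignores a key it has already applied. A rough Python sketch of that interaction, not Logic Apps or a real custom service; TransientError, call_with_backoff, InventoryService, and the "move-001" key are all hypothetical.&lt;/p&gt;

```python
import time
from typing import Callable, Dict


class TransientError(Exception):
    """Raised by the hypothetical endpoint when a retry may succeed."""


def call_with_backoff(
    action: Callable[[], str],
    attempts: int = 4,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry on TransientError, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")


class InventoryService:
    """Toy receiver that applies each idempotency key at most once."""

    def __init__(self) -> None:
        self.applied: Dict[str, int] = {}
        self.on_hand = 0
        self._calls = 0

    def record_move(self, key: str, quantity: int) -> str:
        self._calls += 1
        if key not in self.applied:
            self.applied[key] = quantity
            self.on_hand += quantity
        if self._calls == 1:
            # Simulate a response lost after the write succeeded.
            raise TransientError("timeout")
        return "ok"


service = InventoryService()
result = call_with_backoff(lambda: service.record_move("move-001", 5))
print(result, service.on_hand)  # ok 5
```

&lt;p&gt;Without the key check, the retried call would have added the quantity twice.&lt;/p&gt;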

&lt;h2&gt;
  
  
  Direction-specific patterns
&lt;/h2&gt;

&lt;p&gt;Flows in each direction have different shapes:&lt;/p&gt;

&lt;p&gt;MES → D365 (updates ERP from shop floor):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  MES publishes to Service Bus topic&lt;/li&gt;
&lt;li&gt;  Logic App subscribes, transforms, calls F&amp;amp;O custom service&lt;/li&gt;
&lt;li&gt;  F&amp;amp;O updates inventory, production order status, quality records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;D365 → MES (issues work to shop floor):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  F&amp;amp;O business event on production order release&lt;/li&gt;
&lt;li&gt;  Event flows to Service Bus&lt;/li&gt;
&lt;li&gt;  MES subscriber picks up the work package, distributes to lines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bidirectional correlation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Correlation IDs in headers&lt;/li&gt;
&lt;li&gt;  Handshake patterns for critical state transitions (e.g., "order ready" → "order accepted by line")&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where custom code fits
&lt;/h2&gt;

&lt;p&gt;Not everything is Logic Apps. Sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Azure Functions for complex transformations Logic Apps can't express cleanly&lt;/li&gt;
&lt;li&gt;  Custom services in F&amp;amp;O for writes standard entities don't cover&lt;/li&gt;
&lt;li&gt;  Durable Functions for long-running orchestrations (multi-day production runs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each has a clear use case. Reach for them when the declarative tool hits a limit, not as the default.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ships with the architecture
&lt;/h2&gt;

&lt;p&gt;A working MES-to-D365 integration has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Service Bus queues and topics partitioned by production-order or line&lt;/li&gt;
&lt;li&gt;  Logic Apps for each directional flow with transformation logic&lt;/li&gt;
&lt;li&gt;  Business events published on production order lifecycle&lt;/li&gt;
&lt;li&gt;  Custom services on F&amp;amp;O for inbound writes from MES&lt;/li&gt;
&lt;li&gt;  Dead-letter queue monitoring with alerting&lt;/li&gt;
&lt;li&gt;  Traceability validation - spot-check recall scenarios quarterly&lt;/li&gt;
&lt;li&gt;  Runbook for extending the integration when a new MES module is added&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is architect-grade because manufacturing systems won't tolerate the simpler options. Azure middleware is the supported, scalable, maintainable middle.&lt;/p&gt;

</description>
      <category>dynamics365</category>
    </item>
    <item>
      <title>Custom API vs Custom Action vs Azure Function: Dataverse decision</title>
      <dc:creator>SapotaCorp</dc:creator>
      <pubDate>Sun, 24 May 2026 02:52:09 +0000</pubDate>
      <link>https://dev.to/sapotacorp/custom-api-vs-custom-action-vs-azure-function-dataverse-decision-2lo4</link>
      <guid>https://dev.to/sapotacorp/custom-api-vs-custom-action-vs-azure-function-dataverse-decision-2lo4</guid>
      <description>&lt;p&gt;A client needs to expose a "calculate the loyalty rebate for this customer" operation. It reads three Dataverse tables, applies some business rules, writes a result. Every consumer - the Dynamics web app, a Power Automate flow, an external integration - should call the same operation.&lt;/p&gt;

&lt;p&gt;Three places we could put it. Three different cost, latency, and scale profiles. Here is the matrix we now run on every "new operation" request.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three options
&lt;/h2&gt;

&lt;p&gt;Custom Action (legacy): a process defined in Dataverse that can be invoked through the SDK. Steps are Workflow Activity Actions. Old-school but still widely deployed.&lt;/p&gt;

&lt;p&gt;Custom API (modern): the successor to custom actions. Defined as Dataverse table rows (customapi, customapirequestparameter, customapiresponseproperty), backed by a plugin that implements the logic. Exposed through the Web API with a typed OpenAPI schema.&lt;/p&gt;
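
&lt;p&gt;As a rough illustration of what "exposed through the Web API" means for a caller, the sketch below builds (but does not send) the HTTP request for an unbound custom API, assuming it is defined as an action rather than a function. The organization URL, the unique name sap_CalculateLoyaltyRebate, and the CustomerId parameter are hypothetical, and a real call needs a valid OAuth bearer token.&lt;/p&gt;

```python
import json
import urllib.request


def build_custom_api_request(org_url, unique_name, parameters, token):
    """Build a POST request for an unbound custom API (action) call.

    Nothing is sent here; pass the result to urllib.request.urlopen to call it.
    """
    url = f"{org_url.rstrip('/')}/api/data/v9.2/{unique_name}"
    body = json.dumps(parameters).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
            "OData-MaxVersion": "4.0",
            "OData-Version": "4.0",
        },
    )


# Hypothetical names and values, for illustration only.
request = build_custom_api_request(
    "https://contoso.crm.dynamics.com",
    "sap_CalculateLoyaltyRebate",
    {"CustomerId": "00000000-0000-0000-0000-000000000001"},
    token="PLACEHOLDER",
)
print(request.get_method(), request.full_url)
```

&lt;p&gt;Every consumer ends up issuing this same request shape, which is the point of defining the operation once.&lt;/p&gt;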

&lt;p&gt;Azure Function: fully external .NET function, invoked from Power Automate or direct HTTP. Runs in its own compute, scales separately, has its own pricing.&lt;/p&gt;

&lt;p&gt;The three solve overlapping problems. The choice is about latency, cost structure, and who maintains the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision matrix
&lt;/h2&gt;

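&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Dimension&lt;/th&gt;&lt;th&gt;Custom Action&lt;/th&gt;&lt;th&gt;Custom API&lt;/th&gt;&lt;th&gt;Azure Function&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Latency&lt;/td&gt;&lt;td&gt;Low (in-process)&lt;/td&gt;&lt;td&gt;Low (in-process)&lt;/td&gt;&lt;td&gt;Medium (cross-service)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Concurrency limit&lt;/td&gt;&lt;td&gt;Dataverse's&lt;/td&gt;&lt;td&gt;Dataverse's&lt;/td&gt;&lt;td&gt;Function App's&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Timeout&lt;/td&gt;&lt;td&gt;2 minutes&lt;/td&gt;&lt;td&gt;2 minutes&lt;/td&gt;&lt;td&gt;10 minutes (Consumption), configurable higher&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cost&lt;/td&gt;&lt;td&gt;Included in Dataverse&lt;/td&gt;&lt;td&gt;Included in Dataverse&lt;/td&gt;&lt;td&gt;Per-execution + compute&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Long-running work&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes (durable functions)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;External dependency calls&lt;/td&gt;&lt;td&gt;Limited (sandbox)&lt;/td&gt;&lt;td&gt;Limited (sandbox)&lt;/td&gt;&lt;td&gt;Full flexibility&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;OpenAPI schema&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Invocable from flow&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Yes (first-class)&lt;/td&gt;&lt;td&gt;Yes (HTTP connector)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Maintenance&lt;/td&gt;&lt;td&gt;.NET + Workflow&lt;/td&gt;&lt;td&gt;.NET&lt;/td&gt;&lt;td&gt;.NET (preferred)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;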

&lt;h2&gt;
  
  
  When each one is correct
&lt;/h2&gt;

&lt;p&gt;Custom Action is correct when you are maintaining an existing system that already uses them. For new work, skip them - custom APIs are the modern equivalent with better tooling.&lt;/p&gt;

&lt;p&gt;Custom API is correct when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The logic is primarily Dataverse queries and writes&lt;/li&gt;
&lt;li&gt;  Latency matters (under 500ms per call)&lt;/li&gt;
&lt;li&gt;  The caller is inside the Microsoft stack (flow, another plugin, Dynamics app)&lt;/li&gt;
&lt;li&gt;  The operation completes in under 2 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Azure Function is correct when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The logic calls external services with uncertain latency&lt;/li&gt;
&lt;li&gt;  The operation may take longer than 2 minutes&lt;/li&gt;
&lt;li&gt;  Durability, retries, or checkpointing matter (durable functions)&lt;/li&gt;
&lt;li&gt;  The code needs packages the Dataverse sandbox blocks (file IO, sockets, native dependencies)&lt;/li&gt;
&lt;li&gt;  You want the compute cost on a different meter from Dataverse&lt;/li&gt;
&lt;/ul&gt;
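
&lt;p&gt;The criteria in this section can be condensed into a small decision helper. This is only a sketch of the rules as stated here, not a complete policy; the Operation fields and the choose_host name are hypothetical, and the 120-second constant is the 2-minute limit from the matrix.&lt;/p&gt;

```python
from dataclasses import dataclass

DATAVERSE_TIMEOUT_SECONDS = 120  # the 2-minute limit from the matrix


@dataclass
class Operation:
    """Hypothetical description of a new operation request."""
    expected_seconds: float
    calls_external_services: bool = False
    needs_durability: bool = False
    needs_sandbox_blocked_packages: bool = False
    extends_existing_custom_action: bool = False


def choose_host(op):
    """Return the hosting option suggested by the criteria in this section."""
    # Custom actions only make sense when maintaining an existing one.
    if op.extends_existing_custom_action:
        return "Custom Action"
    # Any one of these pushes the work out of the Dataverse sandbox.
    needs_function = (
        op.calls_external_services
        or op.needs_durability
        or op.needs_sandbox_blocked_packages
        or op.expected_seconds >= DATAVERSE_TIMEOUT_SECONDS
    )
    return "Azure Function" if needs_function else "Custom API"


print(choose_host(Operation(expected_seconds=0.5)))
# prints "Custom API"
print(choose_host(Operation(expected_seconds=4, calls_external_services=True)))
# prints "Azure Function"
```

&lt;p&gt;The helper deliberately leaves out cost and the sub-500ms latency target, which need judgment rather than a boolean.&lt;/p&gt;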

&lt;h2&gt;
  
  
  Cost reality
&lt;/h2&gt;

&lt;p&gt;Custom APIs consume Dataverse capacity. A high-volume custom API (tens of thousands of calls per day) is paid for by your existing Dataverse licensing - no incremental per-call charge.&lt;/p&gt;

&lt;p&gt;Azure Functions on Consumption plan: $0.000016 per GB-second and $0.20 per million executions. A typical function using 128 MB for 300 ms consumes 0.0375 GB-seconds, roughly $0.0000006 per call in compute. A million calls per month is around $0.60 in compute plus $0.20 in executions - effectively free.&lt;/p&gt;
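
&lt;p&gt;The arithmetic is easy to rerun for other workloads. The sketch below uses only the two rates quoted in this section, ignores any monthly free grant, and assumes billing is simply memory times duration; the function name is hypothetical.&lt;/p&gt;

```python
GB_SECOND_PRICE = 0.000016           # USD per GB-second, as quoted in this section
PRICE_PER_MILLION_EXECUTIONS = 0.20  # USD, as quoted in this section


def consumption_cost(calls, memory_mb, duration_ms):
    """Return (compute_usd, executions_usd) for a Consumption plan workload."""
    gb_seconds = calls * (memory_mb / 1024) * (duration_ms / 1000)
    compute = gb_seconds * GB_SECOND_PRICE
    executions = calls / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    return compute, executions


compute, executions = consumption_cost(1_000_000, memory_mb=128, duration_ms=300)
total = compute + executions
print(f"compute ${compute:.2f}, executions ${executions:.2f}, total ${total:.2f}")
# prints "compute $0.60, executions $0.20, total $0.80"
```

&lt;p&gt;At 128 MB and 300 ms, a million calls is 37,500 GB-seconds, so the compute line stays well under a dollar.&lt;/p&gt;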

&lt;p&gt;The cost math usually favors custom API for Dataverse-heavy work and Azure Function for external-dependency-heavy work. Mixing the two via a custom API that calls a function (rare pattern) pays both bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas per option
&lt;/h2&gt;

&lt;p&gt;Custom API gotcha: the plugin that implements a custom API registers against a synthetic "Custom API message" step. Debugging is via plugin trace logs, same as any plugin. Deployment is through the solution. The OpenAPI schema is auto-generated from the custom API row definitions, so a parameter name typed one way in Dataverse and another way in the plugin code produces a runtime failure that the solution checker does not catch.&lt;/p&gt;
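
&lt;p&gt;One way to catch a name mismatch before deployment is a plain set comparison between the parameter names declared in Dataverse and the names the plugin reads. The sketch below assumes both lists are gathered by hand or by a script of your own, treats names as needing an exact match, and uses hypothetical parameter names.&lt;/p&gt;

```python
def find_name_mismatches(declared, used_in_plugin):
    """Compare declared parameter names with the names the plugin code reads."""
    declared_set, used_set = set(declared), set(used_in_plugin)
    return {
        # Read by the plugin but never declared: fails at runtime.
        "missing_in_dataverse": sorted(used_set - declared_set),
        # Declared but never read: usually a typo on one side.
        "unused_in_plugin": sorted(declared_set - used_set),
    }


# Hypothetical names; note the different casing of "CustomerId".
declared = ["CustomerId", "OrderId"]
used = ["CustomerID", "OrderId"]
print(find_name_mismatches(declared, used))
# prints {'missing_in_dataverse': ['CustomerID'], 'unused_in_plugin': ['CustomerId']}
```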

&lt;p&gt;Azure Function gotcha: cold start on Consumption plan can add 2-5 seconds for the first call after idle. For a user-facing interactive call, this is unacceptable. Options: Premium plan (always warm, higher baseline cost), Dedicated App Service plan, or accept the cold-start delay if the call is async.&lt;/p&gt;

&lt;p&gt;Authentication gotcha for Azure Function: calling Dataverse from an Azure Function requires auth. The cleanest pattern is a managed identity on the Function App with an application user in Dataverse. Tutorials that use a user's personal credentials in app settings are security incidents waiting to happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decision we ran last month
&lt;/h2&gt;

&lt;p&gt;Requirement: "Calculate and apply a loyalty rebate on every Order created."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Reads three tables (Customer, OrderHistory, RebateProgram)&lt;/li&gt;
&lt;li&gt;  Writes a calculated value back to the Order&lt;/li&gt;
&lt;li&gt;  Must run on every order creation&lt;/li&gt;
&lt;li&gt;  Expected volume: 2,000 orders per day, bursty up to 50/minute&lt;/li&gt;
&lt;li&gt;  Latency target: under one second on order save&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Walking the matrix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Latency: sub-second is tight for Azure Function with cold start. Custom API stays in-process.&lt;/li&gt;
&lt;li&gt;  External dependencies: none. All Dataverse.&lt;/li&gt;
&lt;li&gt;  Long-running: no, under 500ms estimated.&lt;/li&gt;
&lt;li&gt;  Cost: Dataverse is already paid for; Azure Function would be cheap but not free.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Custom API won clearly. Implementation was a plugin registered against a custom API definition, called from both a post-operation plugin on Order Create (synchronous, blocks save) and directly from a Power Automate flow (for retroactive recalculation).&lt;/p&gt;

&lt;p&gt;Later, the client wanted to integrate with an external tax service that takes 3-4 seconds per call. We did not put that in the same custom API - the 2-minute timeout plus the unpredictability of the external service made it fragile. We built a separate Azure Function for the tax call, invoked async from a Power Automate flow. Two tools for two different latency profiles, as it should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rule of thumb
&lt;/h2&gt;

&lt;p&gt;If the operation stays inside Dataverse, start with a custom API. You get low latency, integrated ALM, and zero incremental cost. You can always move it to an Azure Function later if the requirements shift.&lt;/p&gt;

&lt;p&gt;If the operation has any external dependency with unpredictable behavior, start with an Azure Function. You get the durability patterns and the 10-minute timeout that the Dataverse sandbox cannot give.&lt;/p&gt;

&lt;p&gt;If you find yourself using custom actions for new work, stop. Custom APIs have everything custom actions did plus a schema and better tooling.&lt;/p&gt;

</description>
      <category>powerplatform</category>
    </item>
  </channel>
</rss>
