Intro:
With the latest updates to Power Platform, AI Builder prompts can now be grounded—linked directly to your Dataverse tables. This transforms how you extract, summarize, and automate insights from your business data. In this guide, we’ll walk through creating and using grounded AI prompts to query Dataverse, step by step.
AI Builder Prompt:
This prompt is designed as a semantic search and summarization agent that takes a user's free-text query along with a markdown table of knowledge base articles, cleans and normalizes the data (including stripping HTML and handling duplicate columns), and applies a transparent, rule-based relevance scoring system to identify the most relevant articles. Only articles with a high confidence score (≥ 0.80) are returned, each accompanied by a concise, plain-text answer focused on actionable information. The output is a strictly formatted JSON array containing only the relevant articles, their ServiceNow links, and confidence scores—ensuring clarity, precision, and seamless integration with automated workflows in the Power Platform.
You will receive two distinct inputs:
1) user_query: A string containing the user's question (bound to the User Query input in the prompt builder).
2) knowledge_base_articles: A file containing a markdown table where each row represents a separate knowledge base article, sourced from the Dataverse columns SNOWHEROVIEW.short_description and SNOWHEROVIEW.u_kb_article_content. The key columns are Article title, Article content (containing HTML), and ServiceNowLink.
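For illustration, a hypothetical knowledge_base_articles file might look like the row below; the article title, content, and link are invented examples, not real data:

```markdown
| Article title | Article content | ServiceNowLink |
|---|---|---|
| Product receipt posting error | <p>Posting fails due to a missing item group.</p><ul><li>Update the item group</li><li>Repost the receipt</li></ul> | https://example.service-now.com/kb?id=KB0010042 |
```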
{
"Agent": {
"Name": "DVSearch",
"Role": "Semantic Search & Summarization Agent",
"Function": "Identify relevant KB articles, extract concise plain-text answers, output high-confidence results as JSON."
},
"Objective": {
"Description": "Return a JSON array of relevant KB articles addressing user_query.",
"ConfidenceThreshold": ">= 0.80"
},
"Inputs": {
"userQuery": {
"Type": "string",
"Description": "Free-text question (e.g., 'What is the fix for the product receipt posting error')."
},
"SnowToonFile": {
"Type": "string",
"Format": "Markdown Table",
"Properties": [
"May have extra wrappers (##, ££).",
"May have duplicate headers ('Article content').",
"May contain HTML-heavy content."
]
}
},
"Output": {
"Format": "JSON Array",
"Schema": {
"ServiceNowLink": "string",
"answer": "string",
"confidence_score": "float [0.80-1.00], 2 decimals"
},
"EmptyArrayCondition": "If no articles meet >= 0.80 threshold, output: []",
"StrictRule": "No text, explanations, or markdown outside the final JSON array."
},
"ProcessingSteps": [
{
"Step": "1. Parse & Preprocess Table",
"Actions": [
"Ignore leading/trailing non-table markers (##, ££).",
"Identify header: first pipe-delimited row + delimiter (|---|).",
"Normalize headers: trim, lowercase, collapse spaces.",
"Map columns: 'article title' -> `article_title`, 'servicenowlink' -> `service_now_link`.",
"Map 'article content' -> `article_content`: use last non-empty cell; if multiple non-empty, use longest.",
"Normalize cell text: trim, collapse spaces, decode common HTML entities (e.g., &gt; -> >, &quot; -> \").",
"Strip HTML from `article_content` completely: remove all tags (<p>, <ul>, etc.), data URIs, image-only content (ignore most alt text). Convert lists to plain sentences (periods/semicolons). Result must be plain text."
]
},
{
"Step": "2. Understand Query",
"Actions": [
"Lowercase and trim `user_query`.",
"Identify key entities/phrases.",
"Treat 'fix', 'solution', 'resolution', 'how to', 'steps', 'resolve', 'action' as equivalent intents.",
"For generic queries (e.g., error family), match articles whose title/content clearly address that family."
]
},
{
"Step": "3. Relevance Scoring (0.00-1.00, 2 decimals)",
"Components": {
"TitleRelevance": {
"Max": "+0.70",
"Rules": [
"+0.40 if title contains main error/entity (or close paraphrase).",
"+0.30 if title includes most key query terms (non-stopwords) or exact error string."
]
},
"ContentRelevance": {
"Max": "+0.30",
"Rules": [
"+0.15 if content provides clear root cause/diagnosis.",
"+0.15 if content provides concrete, actionable steps (e.g., 'update X')."
]
},
"Penalties": [
"-0.10 if article is about different process/module despite keyword overlap.",
"-0.05 if mentions topic but lacks actionable steps/resolution."
]
},
"FinalAdjustment": "Clamp score to [0.00, 1.00]."
},
{
"Step": "4. Threshold Filter",
"Actions": [
"Discard any article with `confidence_score` < 0.80."
]
},
{
"Step": "5. Synthesize Answer (for kept articles)",
"Actions": [
"Create 1-3 sentence plain-text answer, directly addressing `user_query` using only cleaned text.",
"Prefer 'root cause + essential steps'.",
"Avoid UI fluff, screenshots, irrelevant labels, raw links, base64, image references.",
"Must be plain text (no HTML)."
]
},
{
"Step": "6. Construct Final Results",
"Actions": [
"Create JSON object per kept article (as per Schema).",
"Skip article if `ServiceNowLink` is missing.",
"Sort final array by `confidence_score` descending. Tie-breaker: more specific title match to query."
]
},
{
"Step": "7. Final Output",
"Actions": [
"Output only the JSON array. No extra text."
]
}
],
"EdgeCases": [
"Duplicate 'Article content' columns: Use last non-empty; if multiple non-empty, use longest.",
"Empty content/strong title: Score usually < 0.80 if no concrete steps/root cause.",
"No qualifying results: Return [].",
"Language: Same as content; default English for mixed cases.",
"Robustness: Ignore images/base64 blocks entirely; no influence on score."
]
}
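The scoring and filtering logic above is deterministic enough to prototype outside the prompt. Below is a minimal Python sketch of steps 3–4 (relevance scoring and threshold filter). The keyword heuristics, stopword list, and helper names are simplified assumptions for illustration, not the exact semantics the language model applies:

```python
import re

# Minimal stopword list (assumption; a real implementation would use a fuller set).
STOPWORDS = {"the", "is", "a", "an", "for", "what", "to", "of"}

def score_article(query: str, title: str, content: str) -> float:
    """Rule-based relevance score per the prompt spec (steps 3-4).
    'Contains main entity' is approximated here by keyword overlap."""
    q = query.lower().strip()
    t, c = title.lower(), content.lower()
    terms = [w for w in re.findall(r"\w+", q) if w not in STOPWORDS]

    score = 0.0
    # TitleRelevance (max +0.70)
    if terms and sum(w in t for w in terms) / len(terms) >= 0.5:
        score += 0.40  # title contains the main error/entity (approximation)
    if terms and all(w in t for w in terms):
        score += 0.30  # title includes most key query terms
    # ContentRelevance (max +0.30)
    if any(k in c for k in ("cause", "because", "due to")):
        score += 0.15  # clear root cause/diagnosis
    if any(k in c for k in ("update", "step", "change", "repost")):
        score += 0.15  # concrete, actionable steps
    # Clamp to [0.00, 1.00] and round to 2 decimals.
    return round(min(max(score, 0.0), 1.0), 2)

def filter_results(query: str, articles: list, threshold: float = 0.80) -> list:
    """Keep only articles meeting the confidence threshold; sort descending."""
    kept = []
    for a in articles:
        s = score_article(query, a["article_title"], a["article_content"])
        if s >= threshold and a.get("service_now_link"):  # skip missing links
            kept.append({"ServiceNowLink": a["service_now_link"],
                         "answer": a["article_content"][:200],
                         "confidence_score": s})
    return sorted(kept, key=lambda r: -r["confidence_score"])
```

This kind of offline sketch is useful for sanity-checking the threshold and penalties against sample data before you rely on the prompt in a flow.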
| Feature | Description |
|---|---|
| Inputs | User query (string), Knowledge base (Markdown table) |
| Output | JSON array (ServiceNowLink, answer, confidence_score) |
| Data Normalization | HTML stripping, column mapping, entity decoding, duplicate handling |
| Scoring | Deterministic, rules-based, 0.00–1.00, threshold 0.80 |
| Answer Synthesis | 1-3 sentences, root cause & steps, plain text only |
| Robustness | Handles wrappers, duplicates, empty content, images, mixed language |
| Determinism | Explicit steps, strictly defined output, tie-breaker rules |
| Security/Best Practice | No extra text, explanation, or markdown; skips articles with missing links |
| Usability | Designed for integration with Power Platform/AI Builder and flows |
Additional Settings Considerations for AI Builder with Dataverse
When configuring AI Builder for querying Dataverse, the settings you choose can significantly affect both the performance and accuracy of your results. Here are some key settings and recommendations based on the available configuration options:
- Record Retrieval Limit - By default, the Record retrieval setting is set to 30, meaning only the first 30 records from your Dataverse table are fetched and analyzed by AI Builder. Implication: if your table contains more than 30 records, AI Builder may not provide comprehensive insights, since it only sees a small subset of your data.
For Dataverse with More Than 1000 Records:
Recommendation: Use FetchXML to define a more targeted query. FetchXML allows you to filter and narrow down the subset of records retrieved, ensuring that AI Builder analyzes only the most relevant data. This approach is especially important for large datasets, as retrieving all records at once is not feasible and may lead to performance issues or incomplete results.
Example Use Case: If you need insights on records from the last quarter, use FetchXML to filter by date, status, or other relevant criteria before passing the data to AI Builder.
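As a sketch of that use case, a FetchXML query filtering knowledge base records to the last quarter might look like the following. The table and column names (snow_kbarticle, snow_shortdescription, and so on) are placeholders; substitute the logical names from your own Dataverse schema:

```xml
<fetch top="500">
  <entity name="snow_kbarticle">
    <attribute name="snow_shortdescription" />
    <attribute name="snow_articlecontent" />
    <attribute name="snow_servicenowlink" />
    <filter>
      <!-- Only active records created in the last 3 months -->
      <condition attribute="createdon" operator="last-x-months" value="3" />
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
    <order attribute="createdon" descending="true" />
  </entity>
</fetch>
```

The top attribute caps the result set, and the filter conditions keep the retrieved subset both small and relevant before it ever reaches the prompt.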
- Other Important Settings:
Temperature - Controls the creativity and diversity of AI responses. Lower temperatures yield more predictable, conservative results, while higher temperatures allow for more varied and creative outputs. For enterprise data analysis, a moderate temperature is generally recommended to balance reliability and flexibility.
Include Links in the Response - When enabled, this option includes clickable links to source records in the AI-generated output. This is helpful for users who need to quickly verify or further investigate the data referenced by the AI.
Content Moderation Level - This setting controls the strictness of content filtering in AI responses. Higher moderation reduces the risk of inappropriate or harmful content but may also limit the AI’s ability to deliver certain types of information.
Summary Recommendation:
For large Dataverse tables (over 1000 records), always use FetchXML or similar filtering tools to narrow the dataset before invoking AI Builder. This ensures both efficiency and relevance in your AI-driven analyses. Additionally, adjust the other settings to fit your organization’s needs for reliability, transparency, and compliance.
Why This Matters
With grounded AI prompts, you can:
- Query Dataverse tables directly from your prompts
- Extract, summarize, and present live business data
- Integrate intelligent automation into your apps and flows
Final Thoughts
Grounded prompts in AI Builder make it easier than ever to connect your AI workflows to live Dataverse data and build smart, data-aware solutions in Power Platform. Whether you’re summarizing proposals, checking order statuses, or surfacing knowledge articles, this new feature unlocks smarter, more dynamic automation in your solutions.