Feature Request: Response Attribution Monitoring for Rovo Agents


      Summary

      Provide visibility into whether agent responses are grounded in knowledge source content. Flag responses containing claims that cannot be traced back to any retrieved source, indicating potential fabrication or hallucination.

      Problem Statement

      Rovo agents retrieve content from knowledge sources and synthesise responses. When the agent provides a response, there is no indication of whether the response content is drawn from actual knowledge source material or whether the agent has fabricated information that sounds plausible but has no basis in the configured sources.

      This is a known characteristic of large language models - they can generate confident, authoritative responses that are entirely fabricated. For enterprise support agents, this presents a direct brand and reputational risk: users trust the agent's responses and may act on incorrect information.

      The risk is amplified for agents with org-wide knowledge scope, where the knowledge base is too large to comprehensively evaluate with a 50-prompt dataset. Fabrication detection provides a production safety net that catches hallucinations the evaluation datasets did not anticipate.

      Proposed Solution

      Response attribution analysis

      For a sample of responses, drawn at a customer-defined rate, the system analyses whether the claims in each response can be traced back to content retrieved from the agent's configured knowledge sources. Each response receives an attribution score representing the percentage of claims that are grounded in source content.

      Attribution level      Score     Meaning
      Fully attributed       90-100%   All or nearly all claims traceable to knowledge sources
      Partially attributed   50-89%    Some claims grounded, some unattributed
      Low attribution        0-49%     Majority of claims cannot be traced to knowledge sources

      Responses with low attribution are flagged for platform team review.
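The scoring and banding above can be sketched as follows. This is a minimal illustration, assuming the thresholds from the table; the level names and function are hypothetical, not an actual Rovo API.

```python
# Hypothetical thresholds mirroring the table above; in practice these
# would be configurable per agent (see Considerations).
FULL_THRESHOLD = 90
PARTIAL_THRESHOLD = 50

def attribution_level(grounded_claims: int, total_claims: int) -> str:
    """Classify a response by the share of claims traceable to sources."""
    if total_claims == 0:
        return "fully_attributed"  # nothing to attribute, nothing to flag
    score = 100 * grounded_claims / total_claims
    if score >= FULL_THRESHOLD:
        return "fully_attributed"
    if score >= PARTIAL_THRESHOLD:
        return "partially_attributed"
    return "low_attribution"  # flagged for platform team review
```

A response with 9 of 10 claims grounded scores 90% and lands in the fully attributed band; 2 of 10 lands in low attribution and is flagged.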

      Response attribution page

      A new "Response attribution" item in the Agent Studio sidebar.

      Summary metrics (30 days):

      • Total responses
      • Percentage fully attributed
      • Percentage partially attributed
      • Percentage with unattributed claims
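The summary metrics above could be aggregated from per-response attribution levels roughly like this. A sketch only; the level strings and return shape are assumptions carried over from the banding table, not a defined schema.

```python
from collections import Counter

def summary_metrics(levels):
    """Aggregate per-response attribution levels into dashboard percentages.

    `levels` is an iterable of level strings for responses in the
    30-day window, e.g. "fully_attributed" / "partially_attributed" /
    "low_attribution".
    """
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0}

    def pct(key):
        return round(100 * counts[key] / total, 1)

    return {
        "total": total,
        "fully_attributed_pct": pct("fully_attributed"),
        "partially_attributed_pct": pct("partially_attributed"),
        "unattributed_pct": pct("low_attribution"),
    }
```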

      Flagged responses table: A filterable table of responses with low or partial attribution, showing:

      • Date and time
      • User prompt (truncated preview)
      • Attribution score with visual meter
      • Number of unattributed claims
      • Actions: Review (drill into detail)

      Response detail view

      When the platform team reviews a flagged response, they see a claim-by-claim breakdown:

      • Attributed claims (green) - The specific statement from the response, with a link to the knowledge source page that contains the supporting content
      • Unattributed claims (red) - The specific statement from the response that could not be matched to any retrieved source

      This allows the platform team to quickly identify exactly which parts of the response were fabricated and take appropriate action:

      • Add to accuracy dataset - Create a prompt/expected-response pair to ensure the agent handles this query correctly in future evaluations
      • View full conversation - Review the full interaction in context
      • Dismiss - Mark as acceptable (the claim may be general knowledge that doesn't require a specific source)
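The claim-by-claim breakdown could be backed by record types along these lines. Field and class names are illustrative assumptions, not the Rovo data model; a claim counts as attributed when it links to a supporting knowledge source page.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    """A single statement extracted from an agent response."""
    text: str
    source_url: Optional[str] = None  # link to the supporting page, if any

    @property
    def attributed(self) -> bool:
        return self.source_url is not None

@dataclass
class ResponseReview:
    """A flagged response as seen in the detail view."""
    prompt: str
    claims: list

    def unattributed(self):
        # The red claims in the detail view
        return [c for c in self.claims if not c.attributed]

    def attribution_score(self) -> float:
        if not self.claims:
            return 100.0
        grounded = sum(c.attributed for c in self.claims)
        return 100 * grounded / len(self.claims)
```

From such a record the detail view can render green claims with their source links, red claims without, and the overall score shown in the flagged responses table.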

      Design Reference

      See attached mockup (mockup-fabrication-detection.html) showing:

      1. Mockup 1 - Response attribution dashboard with summary metrics, and a flagged responses table showing four responses with low attribution scores and unattributed claim counts.
      2. Mockup 2 - Response detail view showing a claim-by-claim analysis. Attributed claims are shown in green with links to their source pages. Unattributed claims are shown in red. Actions allow adding the prompt to the accuracy dataset or dismissing.

      Use Cases

      Catching hallucinated limits: A user asks about Jira custom field limits. The agent correctly references the field governance process (attributed) but fabricates a specific number ("the default limit is 500") that doesn't appear in any knowledge source. The platform team reviews the flagged response, identifies the fabricated claim, and adds the prompt with the correct expected response to the accuracy dataset.

      Org-wide scope quality assurance: An agent with org-wide knowledge scope answers a question about a topic outside its primary domain. The response has 0% attribution because the agent synthesised an answer from general model knowledge rather than Telstra-specific content. The platform team reviews and determines whether to add domain-specific content to the knowledge sources or add deflection instructions for that topic.

      Knowledge source gap identification: Multiple flagged responses cluster around the same topic - all with low attribution. This indicates the agent is being asked about something not covered in its knowledge sources. Rather than fabricating answers, the agent should be deflecting. The platform team either adds relevant content to the knowledge sources or adds a deflection rule to the agent's instructions.

      Considerations

      • Performance and cost. Attribution analysis requires comparing agent output against retrieved source content at query time. This has a processing cost. Consider whether attribution analysis runs on every response or on a configurable sample rate (e.g. 10% of responses) to manage resource consumption.
      • Attribution threshold. The threshold for flagging should be configurable per agent. A general-purpose agent may reasonably include general knowledge statements that won't be fully attributed. A compliance-focused agent should have a higher attribution threshold.
      • General knowledge vs fabrication. Not all unattributed claims are fabrications. Statements like "Jira is a project management tool" are general knowledge. The dismiss action allows the platform team to clear these without cluttering the review queue. Consider allowing pattern-based auto-dismissal for common general knowledge claims.
      • Latency. Attribution analysis should not impact the response time experienced by the end user. Analysis should run asynchronously after the response is delivered, with results available in the dashboard within minutes.
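One way to implement the configurable sample rate is deterministic hash-based sampling, so a given response always gets the same in-or-out decision regardless of which worker handles it. A sketch under that assumption; the function name and scheme are hypothetical.

```python
import hashlib

def should_analyse(response_id: str, sample_rate: float) -> bool:
    """Decide whether a delivered response enters the async attribution queue.

    Hashing the response id makes the decision deterministic and
    reproducible; sample_rate=0.1 admits roughly 10% of responses.
    """
    digest = int(hashlib.sha256(response_id.encode()).hexdigest(), 16)
    return (digest % 10_000) < sample_rate * 10_000
```

Because the decision happens after the response is delivered, sampling adds no user-visible latency; only sampled responses incur the analysis cost.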

              Assignee: Neha Bora
              Reporter: Vindika D