User Feedback Loop with Governance Integration

    • Type: Suggestion
    • Resolution: Unresolved
    • Component/s: Agents

      Summary

      Extend the existing thumbs up/down feedback mechanism to include categorised reasons for negative feedback, and surface this feedback in the Agent Studio for platform team review. Provide a workflow for converting user-reported issues into evaluation dataset candidates.

      Problem Statement

      Rovo agents currently support thumbs up/down feedback on responses, but this feedback does not flow into the agent governance pipeline. The platform team has no visibility into which responses users are flagging as problematic, what categories of problems are occurring, or how feedback trends over time.

      This creates two gaps:

      1. Blind spots in evaluation datasets. Evaluation datasets are built from anticipated prompts, not observed failures. Real users ask questions the platform team didn't anticipate, and encounter problems that aren't covered by existing datasets.
      2. No early warning system. A spike in negative feedback on a specific topic could indicate a knowledge source has become outdated, an instruction gap has emerged, or a recent change has introduced a regression. Without visibility into feedback, the platform team only discovers these issues through ad-hoc reports or scheduled evaluations.

      Proposed Solution

      Enhanced user feedback (user-facing)

      When a user clicks the thumbs down button on an agent response, a popover appears asking them to select a reason:

      • Incorrect information - The response contained factual errors
      • Out of date - The information was correct previously but is no longer current
      • Didn't answer my question - The agent deflected or couldn't find relevant information
      • Inappropriate or harmful - The response contained content that shouldn't have been generated
      • Other - None of the above categories apply

      The reason selection is a single click - no free-text required. The user submits and continues their conversation. Thumbs up feedback remains a single click with no additional input.
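
      To keep the categories machine-readable end to end, the feedback event could carry the reason as an enumerated value. The sketch below is illustrative only: the FeedbackEvent shape, field names, and buildFeedbackEvent helper are assumptions for this request, not an existing Rovo API.

      ```typescript
      // Illustrative payload for categorised feedback; shape and names are
      // assumptions, not an existing Rovo API.
      type FeedbackReason =
        | 'incorrect_information'
        | 'out_of_date'
        | 'did_not_answer'
        | 'inappropriate'
        | 'other';

      interface FeedbackEvent {
        agentId: string;
        conversationId: string;
        messageId: string;                  // the agent response being rated
        sentiment: 'positive' | 'negative';
        reason?: FeedbackReason;            // required when sentiment is 'negative'
        submittedAt: string;                // ISO 8601 timestamp
      }

      function buildFeedbackEvent(
        base: Omit<FeedbackEvent, 'submittedAt'>,
      ): FeedbackEvent {
        // Thumbs up stays a single click: no reason attached.
        if (base.sentiment === 'negative' && base.reason === undefined) {
          throw new Error('Negative feedback requires a reason category');
        }
        return { ...base, submittedAt: new Date().toISOString() };
      }
      ```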

      Feedback review interface (platform team)

      A new "User feedback" item in the Agent Studio sidebar, with a badge showing the count of pending (unreviewed) negative feedback items.

      The page displays:

      Summary metrics (30 days):

      • Total interactions
      • Positive feedback count
      • Negative feedback count
      • Pending review count

      Reason breakdown: A horizontal bar chart showing the distribution of negative feedback by reason category over the last 30 days. This gives the platform team an at-a-glance view of whether the agent's primary issue is accuracy, staleness, coverage, or safety.
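
      A minimal sketch of how the summary metrics and reason breakdown could be computed from stored feedback events, reusing the illustrative FeedbackEvent shape above. The StoredFeedback type is an assumption, and "total interactions" is approximated here as rated interactions; in practice it would come from agent usage analytics.

      ```typescript
      // Minimal sketch of the 30-day summary metrics and reason breakdown.
      type ReviewStatus = 'pending' | 'reviewed' | 'added_to_dataset' | 'dismissed';

      interface StoredFeedback extends FeedbackEvent {
        status: ReviewStatus;
      }

      function summarise(events: StoredFeedback[], now: number = Date.now()) {
        const cutoff = now - 30 * 24 * 60 * 60 * 1000; // 30 days in milliseconds
        const recent = events.filter(e => Date.parse(e.submittedAt) >= cutoff);
        const negative = recent.filter(e => e.sentiment === 'negative');

        // Counts per reason category, feeding the horizontal bar chart.
        const byReason: Partial<Record<FeedbackReason, number>> = {};
        for (const e of negative) {
          if (e.reason) byReason[e.reason] = (byReason[e.reason] ?? 0) + 1;
        }

        return {
          // Approximated as rated interactions for this sketch; total volume
          // would come from usage analytics in practice.
          totalInteractions: recent.length,
          positive: recent.length - negative.length,
          negative: negative.length,
          pendingReview: negative.filter(e => e.status === 'pending').length,
          byReason,
        };
      }
      ```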

      Pending review table: A filterable table of negative feedback items awaiting review, showing:

      • Date and time
      • Reason category (colour-coded tag)
      • The user's original prompt (truncated preview)
      • The agent's response (truncated preview)
      • Actions: View (full conversation), + Dataset (add to evaluation dataset), Dismiss

      Filters:

      • By review status: pending, reviewed, added to dataset, dismissed
      • By reason category
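
      The two filters compose straightforwardly over the stored feedback items; a sketch using the illustrative StoredFeedback shape from the previous snippet:

      ```typescript
      // Sketch of the pending review table filters: review status and
      // reason category, both optional.
      interface FeedbackFilter {
        status?: ReviewStatus;
        reason?: FeedbackReason;
      }

      function filterFeedback(items: StoredFeedback[], f: FeedbackFilter): StoredFeedback[] {
        return items.filter(
          i =>
            i.sentiment === 'negative' &&
            (f.status === undefined || i.status === f.status) &&
            (f.reason === undefined || i.reason === f.reason),
        );
      }
      ```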

      Feedback-to-dataset workflow

      When a platform team member reviews a feedback item and clicks "+ Dataset," they are prompted to:

      1. Select which dataset to add the prompt to (accuracy, boundary, or a custom dataset)
      2. Write or edit the expected response (what the agent should have said)
      3. Confirm

      The prompt and expected response are added to the selected dataset. The next scheduled or manual evaluation will include this prompt, closing the loop from user report to automated testing.
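
      The conversion step could look roughly like the sketch below. The DatasetEntry shape and its source/feedbackId fields are assumptions about how evaluation datasets might store entries; they are not a defined schema.

      ```typescript
      // Illustrative "+ Dataset" conversion from a reviewed feedback item.
      interface DatasetEntry {
        prompt: string;           // the user's original prompt, copied verbatim
        expectedResponse: string; // written or edited by the reviewer
        source: 'user_feedback';  // distinguishes these from lab-crafted entries
        feedbackId: string;       // back-reference to the feedback item for audit
      }

      function toDatasetEntry(
        feedbackId: string,
        prompt: string,
        expectedResponse: string,
      ): DatasetEntry {
        if (expectedResponse.trim() === '') {
          throw new Error('Write an expected response before adding to a dataset');
        }
        return { prompt, expectedResponse, source: 'user_feedback', feedbackId };
      }
      ```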

      Feedback statuses

      • Pending: Feedback received, not yet reviewed by the platform team
      • Reviewed: Platform team has viewed the feedback
      • Added to dataset: The prompt has been added to an evaluation dataset
      • Dismissed: The platform team reviewed and determined no action is needed (e.g. user error, unreasonable expectation)
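
      One way to keep the lifecycle consistent is a forward-only transition map, sketched below; the assumption that "Added to dataset" and "Dismissed" are terminal states is ours, not stated in this request.

      ```typescript
      // Sketch of the status lifecycle as a forward-only transition map.
      const transitions: Record<ReviewStatus, ReviewStatus[]> = {
        pending: ['reviewed', 'added_to_dataset', 'dismissed'],
        reviewed: ['added_to_dataset', 'dismissed'],
        added_to_dataset: [], // terminal (assumption)
        dismissed: [],        // terminal (assumption)
      };

      function canTransition(from: ReviewStatus, to: ReviewStatus): boolean {
        return transitions[from].includes(to);
      }
      ```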

      Design Reference

      See attached mockup (mockup-user-feedback.html) showing:

      1. Mockup 1 - User view: chat interface with thumbs down selected and the categorised reason popover displayed. Shows the five reason categories with radio selection and submit button.
      2. Mockup 2 - Platform team view: User feedback page in the Agent Studio sidebar. Shows summary metrics, reason breakdown bar chart, and the pending review table with action buttons (View, + Dataset, Dismiss).

      Use Cases

      Knowledge source drift detection: The platform team notices a spike in "Out of date" feedback on a specific topic. Investigation reveals that the underlying Confluence page was updated two weeks ago with new process steps, but the agent is still giving the old instructions. The team updates the accuracy dataset with the corrected expected response and triggers a re-evaluation.

      Evaluation dataset enrichment: A user asks a question the platform team didn't anticipate ("How do I configure ConfiForms conditional field visibility?"). The agent fails to answer. The platform team reviews the feedback, writes the correct expected response, and adds it to the accuracy dataset. Future evaluations now cover this query.

      AUP bypass detection in production: A user reports an "Inappropriate" response where the agent inferred a colleague's emotional state from their Jira activity. The platform team reviews the feedback, confirms it's an AUP bypass, and adds the prompt to the AUP evaluation dataset. This expands the AUP dataset with a real-world prompt rather than only lab-crafted test cases.

      False positive identification: A user marks a correct response as "Incorrect" because they disagree with the process described (e.g. they want self-service space creation but the agent correctly directs them to the request process). The platform team reviews and dismisses the feedback - the agent responded correctly, the user's expectation was wrong.

      Interaction with Other Features

      • Evaluation datasets: The "+ Dataset" action directly adds prompts to existing evaluation datasets, which are then included in scheduled and re-verification evaluations.
      • Compliance dashboard: Feedback volume and sentiment could be surfaced as additional columns on the compliance dashboard, giving portfolio-level visibility into which agents have the most negative feedback.
      • Scheduled evaluations: New dataset entries from feedback are included in the next scheduled evaluation run automatically.

      Considerations

      • Privacy. User prompts captured via feedback may contain sensitive information. The feedback review interface should be restricted to agent and organisation administrators. Consider whether user identity should be visible to reviewers or anonymised.
      • Feedback volume. High-traffic agents may generate significant feedback volume. The pending review queue should support bulk actions (e.g. dismiss all "Other" feedback older than 30 days), and the reason breakdown chart helps the platform team triage by category rather than reviewing every item individually.
      • Positive feedback value. While this feature request focuses on negative feedback, positive feedback data is also valuable. Responses with high positive feedback rates could be used to validate that evaluation dataset expected responses are aligned with what users consider good answers.
      • Notification. Platform team administrators should receive a notification when negative feedback is received, configurable by threshold (e.g. notify immediately for "Inappropriate," daily digest for other categories); a possible rule shape is sketched below.
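
      A sketch of the per-category notification rules mentioned in the last bullet: immediate for "Inappropriate," daily digest for everything else. The NotificationMode type and rule table are assumptions for illustration.

      ```typescript
      // Illustrative per-category notification rules (assumed names).
      type NotificationMode = 'immediate' | 'daily_digest';

      const notificationRules: Record<FeedbackReason, NotificationMode> = {
        inappropriate: 'immediate',
        incorrect_information: 'daily_digest',
        out_of_date: 'daily_digest',
        did_not_answer: 'daily_digest',
        other: 'daily_digest',
      };

      function notificationModeFor(reason: FeedbackReason): NotificationMode {
        return notificationRules[reason];
      }
      ```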

              Assignee: Unassigned
              Reporter: Rachel Kim
              Votes: 0
              Watchers: 1