- Type: Suggestion
- Resolution: Unresolved
- Component/s: Agents
Issue Summary
Some LLM providers frequently update their models automatically (e.g. minor version bumps or silent model changes). Even when these changes are small, they can alter how prompts are interpreted and lead to unexpected differences in responses.
As a result, the following problems can occur:
- Prompt behavior can suddenly change for production users.
- Teams may see regressions in the quality or consistency of responses.
- It becomes hard to confidently adopt newer LLM versions, since any change is effectively “testing in production.”
This is particularly frustrating for teams who have carefully tuned prompts or workflows and want stability and predictability, but also need to keep up with newer LLM versions.
Suggestion
Introduce a Sandbox environment or mode in Rovo where admins can do the following:
- Configure or select a new LLM version/model that is not yet applied to production.
- Test prompts, workflows, and typical user scenarios against this new version in the Sandbox.
- Compare behavior between the current production LLM version and the candidate version (for example, via side‑by‑side responses or an A/B-style comparison of key prompts).
- Once the behavior is validated and accepted, promote the tested LLM version from Sandbox to Production in a controlled way.
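To illustrate the compare-then-promote flow described above, here is a minimal Python sketch. It is not based on any existing Rovo API: the `Model` type, `compare_models`, `needs_review`, and `can_promote` are hypothetical names, the models are stand-in callables, and plain text similarity (via the standard library's `difflib`) is only a crude proxy for response quality. A real Sandbox would likely rely on human review or task-specific evaluation instead.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical: a "model" is any callable that maps a prompt to a response.
Model = Callable[[str], str]


@dataclass
class Comparison:
    prompt: str
    production: str   # response from the current production version
    candidate: str    # response from the version under test
    similarity: float  # 0.0 (nothing in common) to 1.0 (identical text)


def compare_models(prompts: List[str], production: Model, candidate: Model) -> List[Comparison]:
    """Run every prompt against both versions and record a side-by-side result."""
    results = []
    for prompt in prompts:
        prod_out = production(prompt)
        cand_out = candidate(prompt)
        score = SequenceMatcher(None, prod_out, cand_out).ratio()
        results.append(Comparison(prompt, prod_out, cand_out, score))
    return results


def needs_review(results: List[Comparison], threshold: float = 0.8) -> List[Comparison]:
    """Return the comparisons whose responses diverge more than the threshold allows."""
    return [r for r in results if r.similarity < threshold]


def can_promote(results: List[Comparison], threshold: float = 0.8) -> bool:
    """Allow promotion only when no key prompt is flagged for review."""
    return not needs_review(results, threshold)


if __name__ == "__main__":
    # Stub models standing in for the production and candidate LLM versions.
    def production_model(prompt: str) -> str:
        return f"Answer to: {prompt}"

    def candidate_model(prompt: str) -> str:
        return f"Answer to: {prompt}"

    key_prompts = ["Summarize this page", "Draft a release note"]
    results = compare_models(key_prompts, production_model, candidate_model)
    for r in results:
        print(f"{r.prompt!r}: similarity={r.similarity:.2f}")
    print("Promote candidate:", can_promote(results))
```

The flagged comparisons returned by `needs_review` are the ones an admin would inspect side by side before deciding whether to promote the candidate version.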