One best practice is to model AI agents’ roles, workflows, and the user goals they are intended to help achieve. Developing end-user personas and evaluating whether AI agents meet their objectives can inform the testing of human-AI collaborative workflows and decision-making scenarios.
“AI agents are stochastic systems, and traditional testing methods based on well-defined test plans and tools that verify fixed outputs are not effective,” says Nirmal Mukhi, VP and head of engineering at ASAPP. “Realistic simulation involves modeling various customer profiles, each with a distinct personality, knowledge they may possess, and a set of goals around what they actually want to achieve during the conversation with the agent. Evaluation at scale then involves examining thousands of such simulated conversations against desired behaviors and policies, and checking whether the customer’s goals were achieved.”
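To make that concrete, here is a minimal sketch of what a persona-driven simulation and evaluation harness could look like in Python. Everything in it is hypothetical and illustrative: the `CustomerPersona` and `Policy` types, the keyword-based goal and policy checks, and the stand-in `echo_agent`. A production harness like the one Mukhi describes would drive both sides of the conversation with LLMs and use far richer evaluators.

```python
"""Minimal sketch of persona-driven simulation and evaluation at scale.
All names and checks here are hypothetical, not any vendor's actual tooling."""
import random
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CustomerPersona:
    name: str
    personality: str                                  # e.g. "impatient", "chatty"
    knowledge: dict = field(default_factory=dict)     # facts the customer already has
    goals: list[str] = field(default_factory=list)    # outcomes the customer wants

@dataclass
class Policy:
    name: str
    check: Callable[[list[str]], bool]                # True if the transcript complies

def simulate_conversation(agent: Callable[[str], str],
                          persona: CustomerPersona, turns: int = 3) -> list[str]:
    """Drive a short conversation; a real harness would use an LLM-backed user simulator."""
    transcript = []
    for goal in persona.goals[:turns]:
        user_msg = f"({persona.personality}) I need help with {goal}"
        transcript.append(user_msg)
        transcript.append(agent(user_msg))
    return transcript

def evaluate(transcript: list[str], persona: CustomerPersona,
             policies: list[Policy]) -> dict:
    """Score one conversation against policies and the customer's goals."""
    text = " ".join(transcript).lower()
    return {
        "violations": [p.name for p in policies if not p.check(transcript)],
        "goals_met": sum(goal.lower() in text for goal in persona.goals),
        "goals_total": len(persona.goals),
    }

def run_suite(agent, personas, policies, n: int = 1000) -> list[dict]:
    """Evaluation at scale: sample personas, simulate, and score each conversation."""
    return [
        evaluate(simulate_conversation(agent, p), p, policies)
        for p in (random.choice(personas) for _ in range(n))
    ]

if __name__ == "__main__":
    personas = [
        CustomerPersona("alex", "impatient", goals=["a refund"]),
        CustomerPersona("sam", "chatty", goals=["a plan upgrade", "billing history"]),
    ]
    policies = [Policy("no_profanity", lambda t: all("damn" not in m.lower() for m in t))]
    # Stand-in agent that simply acknowledges the request.
    echo_agent = lambda msg: f"Sure, let me help you with {msg.split('with', 1)[-1].strip()}."
    results = run_suite(echo_agent, personas, policies, n=100)
    print(f"avg goals met: {sum(r['goals_met'] for r in results) / len(results):.2f}")
```

Running a suite like this produces per-conversation scores that can be aggregated into policy-violation rates and goal-completion rates across thousands of simulated customers.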
Ramanathan of Mphasis adds, “The real differentiator is resilience, testing how agents fail, escalate, or recover. Winners will not chase perfection at launch; they will build trust as a living system through sandboxing, monitoring, and continuous adaptation.”
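That kind of resilience testing can be sketched as failure injection: break a dependency the agent relies on and assert that it escalates or degrades gracefully instead of fabricating an answer. The `SupportAgent` class, `ToolTimeout` exception, and escalation wording below are assumptions made for illustration, not any vendor's actual interface.

```python
"""Hypothetical failure-injection tests: does the agent escalate or recover when a tool fails?"""

class ToolTimeout(Exception):
    """Stand-in for a downstream tool or API failure."""

class SupportAgent:
    """Toy agent-under-test; a real one would wrap an LLM plus tool calls."""
    def __init__(self, lookup_order):
        self.lookup_order = lookup_order   # injected dependency, so tests can break it

    def respond(self, message: str) -> str:
        try:
            order = self.lookup_order(message)
            return f"Your order {order['id']} ships on {order['eta']}."
        except ToolTimeout:
            # Recover by degrading gracefully and escalating instead of guessing.
            return "I'm having trouble reaching the order system; let me connect you to a human agent."

def test_agent_escalates_on_tool_outage():
    def broken_lookup(_msg):
        raise ToolTimeout("order service unavailable")
    agent = SupportAgent(broken_lookup)
    reply = agent.respond("Where is my order 1234?")
    assert "human agent" in reply     # escalation path is taken
    assert "ships on" not in reply    # no fabricated delivery date

def test_agent_answers_when_tool_healthy():
    agent = SupportAgent(lambda _msg: {"id": "1234", "eta": "Friday"})
    assert "1234" in agent.respond("Where is my order 1234?")
```

Tests like these can run in a sandbox before launch and then become monitors in production, which is one way to treat trust as the living system Ramanathan describes.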



