End-to-end testing and observability for conversational AI. Run pre-production simulations across diverse personas and monitor production conversations to evaluate instruction following, tool calls, and conversational quality.