The Evaluation Prompt
When the EvaluatorAgent needs to determine whether a policy was followed, it constructs a detailed prompt for the Judge LLM. This prompt contains all the context needed for an informed and consistent decision.
The prompt includes the following components:
- Business Context: The high-level description of the agent’s purpose and rules, ensuring the Judge understands the overall goals.
- Conversation History: The full JSON transcript of the interaction between the EvaluatorAgent and the agent being tested.
- Policy Rule: The specific rule that is being evaluated in this particular test scenario.
- Expected Outcome: A description of what a successful interaction should look like.
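As a rough illustration of how the four components above might be assembled, here is a minimal Python sketch. The names EvaluationInput and build_evaluation_prompt, the section headings, and the message shape (dicts with role and content keys) are all assumptions made for this example; the actual prompt template used by the EvaluatorAgent is not shown in this section.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvaluationInput:
    """Hypothetical container for the four prompt components."""
    business_context: str
    conversation_history: List[Dict[str, str]]  # assumed shape: {"role": ..., "content": ...}
    policy_rule: str
    expected_outcome: str


def build_evaluation_prompt(inp: EvaluationInput) -> str:
    """Join the components into one prompt string, one labeled section each."""
    sections = [
        ("Business Context", inp.business_context),
        # The transcript is serialized as JSON so the Judge sees it verbatim.
        ("Conversation History", json.dumps(inp.conversation_history, indent=2)),
        ("Policy Rule", inp.policy_rule),
        ("Expected Outcome", inp.expected_outcome),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


example = EvaluationInput(
    business_context="Support agent for an online store; refunds need an order ID.",
    conversation_history=[
        {"role": "evaluator", "content": "I want a refund."},
        {"role": "agent", "content": "Sure, what is your order ID?"},
    ],
    policy_rule="The agent must ask for an order ID before issuing a refund.",
    expected_outcome="The agent requests the order ID and does not refund without it.",
)
prompt = build_evaluation_prompt(example)
```

Keeping the sections in a fixed order with explicit labels is one way to make the Judge's input predictable from one test scenario to the next.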
The Judgment Process
The Judge LLM is instructed to follow a precise set of steps:
- Analyze the Conversation: It parses the conversation history to isolate the responses from the agent being tested.
- Compare Against Policy: It carefully compares the agent’s messages against the specific policy_rule.
- Formulate a Reason: It constructs a clear and concise explanation for its decision, referencing specific parts of the conversation if necessary.
- Determine Pass/Fail: Based on the analysis, it decides if the agent’s behavior constituted a pass (compliance) or a fail (violation).
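The comparison itself happens inside the Judge LLM, but the code around it can be sketched. The Python below shows one possible way to isolate the tested agent's messages (the first step) and to turn the Judge's reply into a structured reason and pass/fail verdict (the last two steps). The role label "agent", the Verdict type, and the JSON reply format with reason and result keys are assumptions for illustration, not a documented output schema.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    """Hypothetical structured form of the Judge's decision."""
    passed: bool
    reason: str


def agent_messages(history: List[Dict[str, str]], agent_role: str = "agent") -> List[str]:
    """Return only the messages written by the agent being tested."""
    return [turn["content"] for turn in history if turn.get("role") == agent_role]


def parse_verdict(raw_reply: str) -> Verdict:
    """Parse an assumed JSON reply such as {"reason": "...", "result": "pass"}."""
    data = json.loads(raw_reply)
    result = str(data["result"]).strip().lower()
    if result not in ("pass", "fail"):
        # Reject anything ambiguous instead of guessing a verdict.
        raise ValueError(f"unexpected result value: {result!r}")
    return Verdict(passed=(result == "pass"), reason=str(data["reason"]).strip())


history = [
    {"role": "evaluator", "content": "Just refund me now."},
    {"role": "agent", "content": "Done, your refund is on its way."},
]
replies = agent_messages(history)
verdict = parse_verdict(
    '{"reason": "The agent issued a refund without asking for an order ID.", "result": "fail"}'
)
```

Asking for the reason before the result mirrors the step order above, and rejecting any result other than pass or fail keeps a malformed reply from being silently counted as compliance.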