run-tests-kit
Run a batch test suite via the Copilot Studio Kit (Dataverse API). Uses the Power CAT Copilot Studio Kit to execute test cases against a published agent and produces pass/fail results with latencies. Requires the Kit installed in the environment, an App Registration with Dataverse permissions, and a published agent.
Run Tests via Copilot Studio Kit
Run a batch test suite against a published Copilot Studio agent using the Power CAT Copilot Studio Kit.
Prerequisites
The user must have:
- The Copilot Studio Kit installed in their Power Platform environment
- Published their agent in the Copilot Studio UI
- Created a test set in the Copilot Studio Kit
- An Azure App Registration with Dataverse permissions
Phase 1: Configure Settings
- Read `tests/settings.json` (relative to the user’s project CWD) and check for missing or placeholder values (containing `YOUR_`).
  - If the file doesn’t exist, create it from the template: `cp ${CLAUDE_SKILL_DIR}/../../tests/settings-example.json ./tests/settings.json`
- If values are missing, ask the user for each missing value. Explain where to find each one:
  - Environment URL (`dataverse.environmentUrl`): “What is your Dataverse environment URL? Find it in Power Platform admin center or Copilot Studio > Settings > Session Details. It looks like `https://orgXXXXXX.crm.dynamics.com`”
  - Tenant ID (`dataverse.tenantId`): “What is your Azure tenant ID? Find it in Azure Portal > Microsoft Entra ID > Overview. It’s a GUID like `c87f36f7-fc65-453c-9019-0d724f21bc42`”
  - Client ID (`dataverse.clientId`): “What is your App Registration client ID? Find it in Azure Portal > App Registrations > your app > Application (client) ID. It’s a GUID.”
  - Agent Configuration ID (`testRun.agentConfigurationId`): “What is your agent configuration ID? In Copilot Studio, go to your agent > Tests tab. The ID is a GUID found in the URL or test configuration.”
  - Test Set ID (`testRun.agentTestSetId`): “What is your test set ID? In Copilot Studio, go to your agent > Tests tab > select your test set. The ID is a GUID found in the URL.”

  Ask for ALL missing values at once (don’t ask one at a time).
- Write `tests/settings.json` with the collected values:

      {
        "dataverse": {
          "environmentUrl": "<value>",
          "tenantId": "<value>",
          "clientId": "<value>"
        },
        "testRun": {
          "agentConfigurationId": "<value>",
          "agentTestSetId": "<value>"
        }
      }
- If all values are already configured and valid, proceed to Phase 2.
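The check for missing or placeholder values can be sketched as a small function. This is only an illustration: `findMissingSettings` and `REQUIRED_KEYS` are hypothetical names, not part of the Kit or the skill's scripts, and treating empty strings as missing is an assumption beyond the `YOUR_` rule stated above.

```javascript
// Dotted paths of the five settings listed in Phase 1.
const REQUIRED_KEYS = [
  "dataverse.environmentUrl",
  "dataverse.tenantId",
  "dataverse.clientId",
  "testRun.agentConfigurationId",
  "testRun.agentTestSetId",
];

// Hypothetical helper: returns the dotted paths that still need a value.
function findMissingSettings(settings) {
  return REQUIRED_KEYS.filter((path) => {
    // Walk the object one segment at a time; undefined if any level is absent.
    const value = path
      .split(".")
      .reduce((node, key) => (node == null ? undefined : node[key]), settings);
    return (
      typeof value !== "string" ||
      value.trim() === "" || // assumption: an empty string counts as missing
      value.includes("YOUR_") // placeholder marker named in Phase 1
    );
  });
}

// Example: one real value, one placeholder, the rest absent or empty.
const exampleSettings = {
  dataverse: {
    environmentUrl: "https://orgXXXXXX.crm.dynamics.com",
    tenantId: "YOUR_TENANT_ID",
  },
  testRun: { agentConfigurationId: "" },
};
console.log(findMissingSettings(exampleSettings));
```

Collecting every missing path in one pass makes it straightforward to ask the user for all of them in a single prompt.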
Phase 2: Run Tests
- Ensure `tests/package.json` exists in the user’s project. If not, copy it: `cp ${CLAUDE_SKILL_DIR}/../../tests/package.json ./tests/package.json`
- Install dependencies if `tests/node_modules/` doesn’t exist: `npm install --prefix tests`
- Run the test script in the background with a 100-minute timeout (6000000ms): `node ${CLAUDE_SKILL_DIR}/../../tests/run-tests.js --config-dir ./tests`

  Use `run_in_background: true` for this command. Save the returned task ID.
- Wait 10 seconds, then check the background task output (non-blocking check).
- Detect the authentication state from the output:
  - If the output contains “Using cached token”: Authentication succeeded automatically. Tell the user: “Authentication successful (cached credentials). Tests are running, this may take several minutes…”
  - If the output contains “use a web browser to open the page”: Extract the URL and device code from the message. Present this prominently to the user:

      Authentication Required

      Open your browser to: https://microsoft.com/devicelogin
      Enter the code: XXXXXXXXX (extract the actual code from the output)

      After signing in, the tests will continue automatically.
  - If the output contains an error: Report the error to the user and stop.
  - If the output is empty or incomplete: Wait another 10 seconds and check again (retry up to 3 times).
- Wait for the background task to complete (blocking). The script polls every 20 seconds until all tests finish and downloads results as a CSV.
- Read the final output to get the success rate and CSV filename.
- Proceed to Phase 3.
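The authentication-state detection above can be sketched as a classifier over the captured output. `classifyAuthState` is a hypothetical helper. The two quoted marker strings come from the steps above, but the `enter the code ...` wording used to extract the device code and the case-insensitive `error` match are assumptions about what the script prints, so adjust them to the actual output.

```javascript
// Hypothetical helper: maps captured script output to one of four states.
function classifyAuthState(output) {
  if (output.includes("Using cached token")) {
    return { state: "cached" };
  }
  if (output.includes("use a web browser to open the page")) {
    // Assumed message shape: "... open the page <url> and enter the code <code> ..."
    const url = output.match(/open the page (\S+)/);
    const code = output.match(/enter the code (\S+)/);
    return {
      state: "device_code",
      url: url ? url[1] : null,
      code: code ? code[1] : null,
    };
  }
  if (/error/i.test(output)) {
    // Assumption: any mention of "error" is treated as a failure.
    return { state: "error" };
  }
  // Empty or incomplete output: the caller should wait and check again.
  return { state: "waiting" };
}

const sampleOutput =
  "To sign in, use a web browser to open the page " +
  "https://microsoft.com/devicelogin and enter the code ABC123XYZ to authenticate.";
console.log(classifyAuthState(sampleOutput));
```

A caller would invoke this after each 10-second wait and stop retrying once the state is anything other than `waiting`, or after the third retry.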
Phase 3: Analyze Results
- Get the results: `Glob: tests/test-results-*.csv`. Read the most recent CSV file (newest by modification time).
- Parse the CSV columns:

  | Column | Meaning |
  |--------|---------|
  | Test Utterance | The user message that was tested |
  | Expected Response | What the test expected |
  | Response | What the agent actually responded |
  | Latency (ms) | Response time |
  | Result | `Success`, `Failed`, `Unknown`, `Error`, or `Pending` |
  | Test Type | `Response Match`, `Topic Match`, `Generative Answers`, `Multi-turn`, `Plan Validation`, or `Attachments` |
  | Result Reason | Why the test passed or failed |
- Focus on failed tests (Result = `Failed` or `Error`). For each failure, analyze:
  - Test Type = Topic Match: The wrong topic was triggered, or no topic matched. Check trigger phrases and model descriptions.
  - Test Type = Response Match: The response didn’t match expected. Check `SendActivity` messages, instructions, or generative answer config.
  - Test Type = Generative Answers: The generative answer was incorrect or missing. Check knowledge sources, `SearchAndSummarizeContent`, and agent instructions.
  - Test Type = Plan Validation: The orchestrator’s plan was wrong. Check topic descriptions and agent-level instructions.
  - Test Type = Multi-turn: A multi-turn conversation failed. Check topic flow, variable handling, and conditions.
- Proceed to Phase 4 (Propose Fixes).
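As a rough sketch of the parsing and filtering steps above, the following groups failed rows by test type. `parseCsv` and `failuresByTestType` are hypothetical helpers, and the code assumes the CSV header uses exactly the column names in the table; agent responses can contain commas, quotes, and newlines, so the parser handles quoted fields rather than splitting on commas.

```javascript
// Minimal CSV parser: handles quoted fields, escaped quotes ("")
// and commas or newlines inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }
  // Flush the last row when the file has no trailing newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Returns { "<Test Type>": [row objects] } for rows whose Result is Failed or Error.
function failuresByTestType(csvText) {
  const [header, ...records] = parseCsv(csvText);
  const groups = {};
  for (const record of records) {
    const entry = Object.fromEntries(header.map((name, i) => [name, record[i]]));
    if (entry["Result"] === "Failed" || entry["Result"] === "Error") {
      const type = entry["Test Type"];
      if (!groups[type]) groups[type] = [];
      groups[type].push(entry);
    }
  }
  return groups;
}

// Made-up sample rows, for illustration only.
const sampleCsv =
  "Test Utterance,Expected Response,Response,Latency (ms),Result,Test Type,Result Reason\n" +
  '"Hi, there",Hello,"Hello, ""friend""",120,Success,Response Match,Matched\n' +
  "Reset password,Reset topic,Greeting,300,Failed,Topic Match,Wrong topic\n";
console.log(failuresByTestType(sampleCsv));
```

Grouping by test type lines the failures up with the per-type checks listed above, so each group can be investigated with the same set of questions.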
Phase 4: Propose Fixes
- For each failure, identify the relevant YAML file(s):
  - Auto-discover the agent: `Glob: **/agent.mcs.yml`
  - Find the relevant topic by matching the test utterance against trigger phrases and model descriptions
  - Read the topic file to understand the current flow
- Propose specific YAML changes to fix each failure. Present them to the user as a summary:
  - Which test(s) failed and why
  - Which file(s) need changes
  - What the proposed change is (show the diff)
- Wait for user decision. The user can:
  - Accept all: apply all proposed changes
  - Accept partially: apply only some changes (ask which ones)
  - Reject: discard proposed changes and discuss alternative approaches
- Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running tests.
Test Result Codes Reference
Result: 1=Success, 2=Failed, 3=Unknown, 4=Error, 5=Pending
Test Type: 1=Response Match, 2=Topic Match, 3=Attachments, 4=Generative Answers, 5=Multi-turn, 6=Plan Validation
Run Status: 1=Not Run, 2=Running, 3=Complete, 4=Not Available, 5=Pending, 6=Error
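If a raw record carries these numeric codes rather than labels, a lookup like the following could translate them. The three maps copy the values listed above; `labelFor` is a hypothetical helper, and the fallback text for unrecognized codes is an assumption.

```javascript
// Code-to-label maps copied from the reference above.
const RESULT_CODES = { 1: "Success", 2: "Failed", 3: "Unknown", 4: "Error", 5: "Pending" };
const TEST_TYPE_CODES = {
  1: "Response Match",
  2: "Topic Match",
  3: "Attachments",
  4: "Generative Answers",
  5: "Multi-turn",
  6: "Plan Validation",
};
const RUN_STATUS_CODES = {
  1: "Not Run",
  2: "Running",
  3: "Complete",
  4: "Not Available",
  5: "Pending",
  6: "Error",
};

// Hypothetical helper: returns the label, or a visible fallback for unknown codes.
function labelFor(codes, code) {
  return codes[code] ?? `Unrecognized (${code})`;
}

console.log(labelFor(RESULT_CODES, 2));
console.log(labelFor(TEST_TYPE_CODES, 6));
console.log(labelFor(RUN_STATUS_CODES, 3));
```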