FIDIAN CLOUD / CUSTOMER VPC
agent sandbox #
evaluator agent ⚖
⌘ Client · Claude Code
edit + rerun
CODE CHANGES
+ prompt change
+ tool change
+ world-model change
❯ run_batch_evals
⮐ summary received
deploys the changes · dispatches each task
Waits for every task's evaluator, then clusters all verdicts into one analysis.
6 / 30 passing (20%) · failure mode: early traversal termination
results
insights
next steps
code changes+2 −1