Bench AI
Test suites
YAML evaluations against your enabled models
Run target
Enabled models
No models enabled — enable at least one in Settings.
Judge (llm-rubric)

Auto · Claude if ANTHROPIC_API_KEY is set on the server (secret “anthropic” is empty)

Recent suite runs
No saved runs yet. Each successful suite run is stored in this browser (up to 15).
Suite
Eval configuration (YAML)
Define prompts and assertions in YAML, then run against your enabled models. Expand rows in the results table to inspect outputs and rubric checks.
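As a rough illustration of what such a suite might look like: the exact schema Bench AI accepts is not shown on this page, so the sketch below assumes a promptfoo-style layout, which the "llm-rubric" judge label suggests but does not confirm. The key names (`description`, `prompts`, `tests`, `vars`, `assert`) and the sample prompt and text are assumptions for illustration only.

```yaml
# Hypothetical suite, assuming a promptfoo-style schema (not confirmed for Bench AI).
description: One-sentence summary check

# Prompt template; {{text}} is filled in from each test's vars.
prompts:
  - "Summarize the following in one sentence: {{text}}"

# No model list here: the page says suites run against the models enabled in Settings.
tests:
  - vars:
      text: "The cache is invalidated whenever the config file changes on disk."
    assert:
      # Deterministic check: the output must mention the word "cache".
      - type: contains
        value: cache
      # Rubric check: graded by the judge model (llm-rubric).
      - type: llm-rubric
        value: The answer is a single sentence and states that config changes invalidate the cache.
```

Pairing a deterministic assertion with a rubric keeps part of the result independent of the judge model, which is useful when no judge key is configured.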