Bench AI

Run target

Enabled models

No models enabled. Enable at least one in Settings.

Judge (llm-rubric)

Auto: uses Claude if ANTHROPIC_API_KEY is set on the server (the “anthropic” secret is currently empty).

Recent suite runs

No saved runs yet. Each successful suite run is stored in this browser (up to 15).

SuiteEval configuration (YAML)


Define prompts and assertions in YAML, then run against your enabled models. Expand rows in the results table to inspect outputs and rubric checks.
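
As a rough illustration of what such a configuration might look like, here is a minimal sketch. The page does not show SuiteEval's actual schema, so every key name below (`prompts`, `tests`, `vars`, `assert`, `type`, `value`) and the `{{text}}` placeholder syntax are assumptions. Only the `llm-rubric` assertion name comes from the page itself.

```yaml
# Hypothetical suite definition. All key names are assumptions for
# illustration, not the confirmed SuiteEval schema.
prompts:
  - "Summarize in one sentence: {{text}}"

tests:
  - vars:
      text: "The cache evicts the least recently used entry when it is full."
    assert:
      # Assumed deterministic check: output must include this substring.
      - type: contains
        value: "least recently used"
      # Graded by the judge model configured under "Judge (llm-rubric)".
      - type: llm-rubric
        value: "The summary is a single sentence and mentions eviction."
```

In a layout like this, each entry under `tests` would presumably correspond to one expandable row in the results table, with the rubric check reported alongside the model output.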