# Arena · Leaderboard

> Objectively compare and rank model quality with Arena blind comparison and the Elo leaderboard

<Info>Admin › Evaluation › Arena · Leaderboard</Info>

**Arena · Leaderboard** pits two models against each other in blind comparisons (**Arena**) and ranks them by accumulating the results into an **Elo rating** (**Leaderboard**). Because the rankings come from real users' choices, it offers an objective comparison of model quality.

***

## Arena

Arena lets users blind-evaluate two models by comparing their responses side by side.

### Setup

<Info>Admin › Evaluation › Arena</Info>

<Frame caption="Admin > Evaluation > Arena — Arena model toggle and comparison model management">
  <img src="https://mintcdn.com/cloocus/yJy1JSWKrTBw20B-/images/monitoring/evaluations-arena.png?fit=max&auto=format&n=yJy1JSWKrTBw20B-&q=85&s=f54a3ba49f89dc39f07166bd147472ab" alt="Arena setup — Arena model toggle, model management" width="2880" height="1800" data-path="images/monitoring/evaluations-arena.png" />
</Frame>

| Setting          | Description                                                                   |
| ---------------- | ----------------------------------------------------------------------------- |
| **Arena models** | Turns Arena mode on or off                                                    |
| **Manage**       | Configure the models to compare (use default Arena models or add custom ones) |

Click **+** next to **Manage** to add a custom comparison model. Name and ID are required; you can also set access permissions and choose which models to include. If you leave the model selection empty, all models are included.

<Frame caption="Admin > Evaluation > Arena > Manage > + — modal for adding a custom Arena comparison model">
  <img src="https://mintcdn.com/cloocus/z12HbjPvLk3VcOGS/images/monitoring/evaluations-arena-add-model.png?fit=max&auto=format&n=z12HbjPvLk3VcOGS&q=85&s=cdca944b4d4678aa4fc318bb5a5a1ea5" alt="Add Arena model modal — name, ID, description, permissions, model selection" width="1360" height="992" data-path="images/monitoring/evaluations-arena-add-model.png" />
</Frame>
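
The rule that an empty model selection means "all models" can be sketched as below. This is a minimal illustration, not the product's data model: the `ArenaModel` class, its field names, the `access` values, and `candidate_models` are all hypothetical, chosen only to mirror the fields shown in the modal.

```python
from dataclasses import dataclass, field


@dataclass
class ArenaModel:
    """Hypothetical shape of a custom Arena comparison model (mirrors the Add modal)."""
    id: str                                             # required
    name: str                                           # required
    description: str = ""
    access: str = "private"                             # hypothetical permission value
    model_ids: list[str] = field(default_factory=list)  # empty list -> all models

    def __post_init__(self) -> None:
        # Name and ID are the two required fields.
        if not self.id.strip() or not self.name.strip():
            raise ValueError("Name and ID are required")


def candidate_models(arena: ArenaModel, available: list[str]) -> list[str]:
    """Return the models this Arena entry may draw from."""
    if not arena.model_ids:
        # No explicit selection: every available model is a candidate.
        return list(available)
    # Otherwise keep only the selected models that actually exist.
    return [m for m in available if m in arena.model_ids]
```

For example, `candidate_models(ArenaModel(id="arena-all", name="All models"), ["a", "b"])` returns both models, while an entry with `model_ids=["a"]` restricts the pool to `"a"`.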

When Arena is enabled, users see two anonymous responses from different models side by side while chatting, and they select the better one.
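
A blind comparison can be sketched as: draw two distinct models at random, show their responses only as "A" and "B", then map the user's choice back to a winner and a loser. The sketch below assumes that flow; `BlindPair`, `draw_blind_pair`, and `record_vote` are hypothetical names for illustration, not the product's implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class BlindPair:
    left: str   # hidden model ID, shown to the user only as "A"
    right: str  # hidden model ID, shown to the user only as "B"


def draw_blind_pair(candidates: list[str], rng: random.Random) -> BlindPair:
    """Pick two distinct models; random.sample also randomizes their on-screen order."""
    if len(candidates) < 2:
        raise ValueError("Arena needs at least two candidate models")
    left, right = rng.sample(candidates, 2)
    return BlindPair(left, right)


def record_vote(pair: BlindPair, choice: str) -> tuple[str, str]:
    """Map the anonymous choice ("A" or "B") back to (winner, loser) model IDs."""
    if choice == "A":
        return pair.left, pair.right
    if choice == "B":
        return pair.right, pair.left
    raise ValueError("choice must be 'A' or 'B'")
```

Randomizing the order matters as much as hiding the names: if one model always appeared on the left, position bias could leak into the results.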

***

## Leaderboard

<Info>Admin › Evaluation › Leaderboard</Info>

The Leaderboard ranks models by an **Elo rating** calculated from Arena blind comparison results.

Each time a user picks the better response in Arena, the Elo scores of the compared models are updated: the preferred model gains points and the other loses them. Over many comparisons, this shows how models rank in quality under real usage.
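
The standard Elo update for a single vote is sketched below. The product's exact parameters are not documented here, so the K-factor of 32 is an assumption (a common default), and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one Arena vote.

    k=32 is an assumed K-factor; it caps how far a single vote can move a rating.
    """
    e_win = expected_score(winner, loser)
    delta = k * (1.0 - e_win)  # the winner scored 1 where it was expected to score e_win
    return winner + delta, loser - delta
```

With K = 32, a win between two equally rated models moves each rating by 16 points (`update_elo(1000.0, 1000.0)` gives `(1016.0, 984.0)`). An upset, where the lower-rated model wins, moves the ratings further, and an expected win moves them less.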

<Frame caption="Admin > Evaluation > Leaderboard — Elo rating-based model ranking table">
  <img src="https://mintcdn.com/cloocus/Nim6rqpdwJuim_F0/images/monitoring/evaluations-leaderboard.png?fit=max&auto=format&n=Nim6rqpdwJuim_F0&q=85&s=5fd80f28540525fadaba48c219cb0b8a" alt="Leaderboard — Elo rating-based model ranking table" width="1372" height="531" data-path="images/monitoring/evaluations-leaderboard.png" />
</Frame>

You can search rankings by model name in the search box at the top.

| Column         | Description                                              |
| -------------- | -------------------------------------------------------- |
| **RK**         | Rank, ordered by Evaluation score (highest first)        |
| **Model**      | Evaluated model                                          |
| **Evaluation** | Score derived from Arena comparison results (Elo rating) |
| **Wins**       | Number of wins in Arena comparisons                      |
| **Losses**     | Number of losses in Arena comparisons                    |

Example: RK 1 · Cloocus general model - GPT-oss-120B · Evaluation 1061 · Wins 4 · Losses 0
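
The table columns can be reproduced from a log of votes, as in the sketch below. It assumes every model starts at 1000 and uses a K-factor of 32 (both assumptions, not documented values), replays `(winner, loser)` votes in order, and applies a case-insensitive name filter like the search box. `build_leaderboard` is a hypothetical helper.

```python
from collections import defaultdict

INITIAL_RATING = 1000.0  # assumed starting rating for every model
K_FACTOR = 32.0          # assumed K-factor


def build_leaderboard(votes: list[tuple[str, str]], query: str = "") -> list[dict]:
    """Replay (winner, loser) votes in order and return RK/Model/Evaluation/Wins/Losses rows."""
    rating = defaultdict(lambda: INITIAL_RATING)
    wins = defaultdict(int)
    losses = defaultdict(int)
    for winner, loser in votes:
        # Standard Elo: expected score of the winner, then a zero-sum update.
        expected = 1.0 / (1.0 + 10 ** ((rating[loser] - rating[winner]) / 400))
        delta = K_FACTOR * (1.0 - expected)
        rating[winner] += delta
        rating[loser] -= delta
        wins[winner] += 1
        losses[loser] += 1
    # Rank by rating, highest first; ranks are assigned before filtering.
    ranked = sorted(rating, key=lambda m: rating[m], reverse=True)
    rows = []
    for rank, model in enumerate(ranked, start=1):
        if query.lower() in model.lower():
            rows.append({
                "RK": rank,
                "Model": model,
                "Evaluation": round(rating[model]),
                "Wins": wins[model],
                "Losses": losses[model],
            })
    return rows
```

Ranks are computed before the search filter is applied, so a model keeps its overall rank when you search for it. Because Elo depends on the order of votes, replaying the same results in a different order can produce slightly different scores.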

<Note>The Leaderboard is in beta, and its evaluation criteria may change as the algorithm is revised. Rankings update in real time as new Arena results arrive, using the Elo rating system.</Note>

***

## Use Cases

<Accordion title="Comparing Quality Across Models" icon="scale-balanced">
  1. Enable Arena evaluation to collect blind comparison data
  2. Compare the per-model average scores in the Auto-Evaluations statistics
  3. Set the model with the best cost-to-quality efficiency as the default model
</Accordion>
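
One way to make "best cost-to-quality efficiency" concrete is to divide a quality score by a unit cost and pick the highest ratio among models that clear a minimum quality bar. The sketch below is a hypothetical heuristic: `ModelStats`, its fields, the score scale, and the cost unit are assumptions you would replace with your own evaluation and pricing data.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    model: str
    avg_score: float           # e.g. average auto-evaluation score (scale is an assumption)
    cost_per_1k_tokens: float  # hypothetical unit cost from your own pricing data


def pick_default_model(stats: list[ModelStats], min_score: float = 0.0) -> str:
    """Among models scoring at least min_score, return the best score per unit cost."""
    eligible = [s for s in stats if s.avg_score >= min_score and s.cost_per_1k_tokens > 0]
    if not eligible:
        raise ValueError("no model meets the quality threshold")
    best = max(eligible, key=lambda s: s.avg_score / s.cost_per_1k_tokens)
    return best.model
```

Without a threshold, a cheap but weak model can win on ratio alone, so setting `min_score` to the lowest quality you would accept keeps the choice meaningful.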

***

## Related Pages

<Columns cols={3}>
  <Card title="Evaluation" icon="star" href="/en/monitoring/evaluations">
    Full overview and guide to evaluation methods
  </Card>

  <Card title="Auto-Evaluations" icon="robot" href="/en/monitoring/auto-evaluations">
    Automatic quality scoring by a judge LLM
  </Card>

  <Card title="Usage" icon="coins" href="/en/monitoring/usage">
    Check token usage per model
  </Card>
</Columns>
