Open-source benchmarking for AI agent skills

Will AI agents
choose your skill?

Every day, thousands of decisions happen inside AI context windows. Your skill vs. competitors. Find out who wins — and why.

LIVE BATTLE RESULT
WINNER
Your Search Skill

"Advanced web search with real-time results and structured output..."

72%
selection rate
VS
Competitor Skill

"Basic web scraping tool for fetching page contents..."

28%
selection rate
Agent's reasoning

"The winning skill clearly specified real-time capabilities and structured output, which directly matched the task of finding current news. The competitor only mentioned basic page fetching without real-time or formatting guarantees."

The battleground has moved

Developers don't search for tools anymore — they ask their AI assistant. Your skill is either chosen or invisible.

$ Developer: "Find the latest AI news and summarize the key points"
$ Agent evaluating available skills...
Your Search Skill ✓ selected
Competitor's Web Scraper
Built-in WebSearch
One skill wins the context window. The rest are forgotten.

How it works

Three steps to know if your skill wins

1

Submit your skill

Paste your skill description, upload a .md file, or search the skills.sh repository.

2

Battle the competition

Your skill enters the arena against real competitors. AI agents simulate real selection decisions between them.

3

See the verdict

Get your selection rate, detailed reasoning for each decision, and insights to improve.

Everything you need

Benchmark, compare, and optimize your agent skills

Head-to-head battles

Compare your skill directly against competitors in realistic agent scenarios.

Selection rate metrics

See exactly how often AI agents choose your skill — with clear percentages.

Public leaderboard

Rank your skill against others. See where you stand in the arena.

Actionable insights

AI-powered reasoning explains why your skill was chosen — or wasn't.

Open source SDK

Run evaluations locally or in CI/CD. Install with pip install skills-arena.

Multi-agent testing

Test across Claude, GPT, and more to see how different agents evaluate your skill.

Or use the SDK

Run evaluations locally, in CI/CD, or anywhere Python runs.

$ pip install skills-arena
>>> from skills_arena import Arena
>>> results = Arena().evaluate("./my-skill.md", task="web search")
>>> print(results.selection_rate)
0.72
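Under the hood, a selection rate is simply the fraction of simulated battles the skill wins. A minimal sketch of that calculation (our own illustration, not the skills-arena internals):

```python
def selection_rate(outcomes):
    """outcomes: list of booleans, True when the agent picked our skill."""
    # Wins divided by total trials; 0.0 when no battles were run yet.
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Ten simulated battles, seven of which our skill won.
battles = [True, True, False, True, False, True, True, False, True, True]
rate = selection_rate(battles)  # 0.7
```

In CI/CD, the same number can gate a release: fail the build whenever the rate drops below a threshold you choose.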

Ready to compete?

Find out if AI agents will choose your skill.
Two free evaluations — no signup needed.

Enter the Arena