Open-source benchmarking for AI agent skills

Will AI agents
choose your skill?

Every day, thousands of decisions happen inside AI context windows. Your skill vs. competitors. Find out who wins — and why.

LIVE BATTLE RESULT
WINNER
Your Search Skill

"Advanced web search with real-time results and structured output..."

72%
selection rate
VS
Competitor Skill

"Basic web scraping tool for fetching page contents..."

28%
selection rate
Agent's reasoning

"The winning skill clearly specified real-time capabilities and structured output, which directly matched the task of finding current news. The competitor only mentioned basic page fetching without real-time or formatting guarantees."

The battleground has moved

Developers don't search for tools anymore — they ask their AI assistant. Your skill is either chosen or invisible.

$ Developer: "Find the latest AI news and summarize the key points"
$ Agent evaluating available skills...
Your Search Skill ✓ selected
Competitor's Web Scraper
Built-in WebSearch
One skill wins the context window. The rest are forgotten.

How it works

Three steps to know if your skill wins

1

Submit your skill

Paste your skill description, upload a .md file, or search the skills.sh repository.

2

Battle the competition

Your skill enters the arena against real competitors. AI agents simulate real selection decisions between them.

3

See the verdict

Get your selection rate, detailed reasoning for each decision, and insights to improve.

Everything you need

Benchmark, compare, and optimize your agent skills

Head-to-head battles

Compare your skill directly against competitors in realistic agent scenarios.

Selection rate metrics

See exactly how often AI agents choose your skill — with clear percentages.

Public leaderboard

Rank your skill against others. See where you stand in the arena.

Actionable insights

AI-powered reasoning explains why your skill was chosen — or wasn't.

Open source SDK

Run evaluations locally or in CI/CD. Install with pip install skills-arena.

Multi-agent testing

Test across Claude, GPT, and more to see how different agents evaluate your skill.

Or use the SDK

Run evaluations locally, in CI/CD, or anywhere Python runs.

$ pip install skills-arena
>>> from skills_arena import Arena
>>> results = Arena().evaluate("./my-skill.md", task="web search")
>>> print(results.selection_rate)
0.72
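Under the hood, a selection rate is simply the fraction of simulated battles the skill wins. A minimal sketch of that calculation (our own illustration, not the skills-arena internals):

```python
def selection_rate(outcomes):
    """outcomes: list of booleans, True when the agent picked our skill."""
    # Wins divided by total trials; 0.0 when no battles were run yet.
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Ten simulated battles, seven of which our skill won.
battles = [True, True, False, True, False, True, True, False, True, True]
rate = selection_rate(battles)  # 0.7
```

In CI/CD, the same number can gate a release: fail the build whenever the rate drops below a threshold you choose.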

Ready to compete?

Find out if AI agents will choose your skill.
Two free evaluations — no signup needed.

Enter the Arena