March 21, 2026 model board

The model page people actually need.

There is no single best model. There is a best model for your role, budget, context window, latency target, and execution path. This page starts with real pricing and context numbers, then maps the best options by agent role instead of pretending one leaderboard solves everything.

Last updated: March 21, 2026
Lanes covered: accuracy-first, value-first, cheap and good enough, open / open-weight challengers, long context and tools.

Core model board

This is the one global board for the page: premium models, cheap defaults, local fallbacks, research-first models, and routers. Prices are OpenRouter starting prices per 1M tokens where available, checked March 21, 2026.

Best production options for crewswarm

Use this as the short-list for crewswarm engine and agent defaults. It is not only a coding board. It mixes premium frontier models, the cheaper models that are already practical in the swarm, local/offline fallbacks, and router/search options you may actually use in production.

| Model | Vendor | Lane | Context | Input / 1M | Output / 1M | Best crewswarm roles | Why it matters |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Premium coding | 1,000,000 | $5 | $25 | crew-coder, crew-fixer, crew-security | Safest premium choice when the task is expensive to get wrong. |
| Claude Sonnet 4.6 | Anthropic | Premium coding | 1,000,000 | $3 | $15 | crew-coder, crew-coder-front, crew-coder-back, crew-fixer | Frontier quality without Opus burn. |
| GPT-5.4 | OpenAI | Premium general | 1,050,000 | $2.50 | $15 | crew-lead, crew-architect, crew-ml, crew-main | Strong all-around flagship with giant context. |
| Gemini 3.1 Pro Preview | Google | Premium general | 1,048,576 | $2 | $12 | crew-lead, crew-architect, crew-researcher | High-context power with improving software-engineering focus. |
| Grok 4.20 Beta | xAI | Premium orchestration | 2,000,000 | $2 | $6 | crew-lead, crew-main, crew-orchestrator | Massive context and strong speed profile. |
| Grok 4.1 Fast | xAI | Cheap orchestration | 2,000,000 | $0.20 | $0.50 | crew-lead, crew-main, crew-orchestrator, crew-pm | One of the best speed-to-cost options for coordinators and high-context chat. |
| Kimi K2.5 | Moonshot | Value coding | 262,144 | $0.45 | $2.20 | crew-coder budget lane, crew-coder-front, L3 workers | One of the best capability-per-dollar plays. |
| GLM-5 | Z.ai | Value coding | 202,752 | $0.72 | $2.30 | crew-coder-back, crew-architect, budget coding lane | Strong open-weight alternative with agent focus. |
| MiniMax M2.5 | MiniMax | Cheap utility | 196,608 | $0.27 | $0.95 | crew-copywriter, crew-seo, L3 workers | Arguably the strongest value sleeper on the board. |
| DeepSeek R1 | DeepSeek | Reasoning | 64,000 | $0.70 | $2.50 | crew-pm, crew-judge, research-heavy analysis | Still relevant, but context is now small relative to 2026 leaders. |
| Gemini 2.5 Flash | Google | Cheap utility | 1,048,576 | $0.30 | $2.50 | crew-qa, crew-pm, crew-seo, crew-telegram, crew-loco | Excellent "good enough" workhorse. |
| Gemini 2.5 Flash Lite | Google | Cheapest utility | 1,048,576 | $0.10 | $0.40 | triage, glue work, cheap support lanes | Ridiculously cheap for a 1M-context utility model. |
| Groq Llama 3.3 70B Versatile | Groq | Cheap coordinator | 128,000 | Groq pricing | Groq pricing | crew-main, crew-orchestrator, crew-judge | Already practical in your live swarm and one of the easiest cheap defaults to keep hot. |
| Groq Llama 3.1 8B Instant | Groq | Ultra-cheap planning | 128,000 | Groq pricing | Groq pricing | crew-pm, triage, lightweight routing | Fastest lightweight option when speed matters more than prestige. |
| Perplexity Sonar Pro | Perplexity | Research-first | 200,000 | $3 | $15 | crew-researcher, crew-main research lane | Built-in live web search sets it apart from the pure foundation models on this board. |
| Ollama Qwen 2.5 Coder 7B | Ollama | Local / offline coding | 128,000 | Local only | Local only | offline worker lanes, private coding, cheapest fallback | Not frontier, but useful when privacy, cost, or offline operation beats raw quality. |
| Ollama Llama 3.1 8B | Ollama | Local / offline general | 128,000 | Local only | Local only | offline coordination, simple PM, bridge roles | Lowest-friction local fallback for a swarm that needs to stay cheap or private. |
| OpenRouter Auto | OpenRouter | Router | 2,000,000 | Varies | Varies | router layer for every role | Not a model. It routes to models like Claude, GPT, Gemini, Kimi, Grok, GLM, and others. |

Why OpenRouter is on this page

Because it solves a real production problem: one API, many vendors, provider failover, and an auto-router. It is access infrastructure, not a coding agent and not a foundation model.

What this means for crewswarm

crewswarm should treat OpenRouter as one transport layer. The product value is still orchestration, memory, task routing, and engine choice above the raw model API.
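A minimal sketch of what "OpenRouter as a transport layer" looks like from the crewswarm side: one chat-completions payload shape, with the fallback chain expressed in the request itself. The model slugs below are assumptions based on the names on this board, not verified OpenRouter identifiers, and the helper function is hypothetical.

```python
def build_openrouter_request(role: str, prompt: str,
                             primary: str, fallbacks: list[str]) -> dict:
    """Build one chat-completions payload with a provider-fallback chain.

    OpenRouter-style APIs accept a list of models to try in order; the
    orchestration logic above this call stays model-agnostic.
    """
    return {
        "models": [primary, *fallbacks],  # tried in order on failure
        "messages": [
            {"role": "system", "content": f"You are the {role} agent."},
            {"role": "user", "content": prompt},
        ],
    }

# Hypothetical crew-lead request with two fallbacks.
payload = build_openrouter_request(
    role="crew-lead",
    prompt="Summarize the open task queue.",
    primary="x-ai/grok-4.1-fast",                           # assumed slug
    fallbacks=["openai/gpt-5.4", "google/gemini-3.1-pro"],  # assumed slugs
)
```

The point is that swapping vendors is a one-line change to the model list, while memory, task routing, and engine choice stay in crewswarm code.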

Role picks by agent

One section, full roster. Expand the role you care about and use the top 3 picks for that lane.

`crew-lead` top 3

For orchestration, synthesis, routing, long context, and talking to the user without lagging the whole system.

| # | Model | Why it fits `crew-lead` | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Grok 4.1 Fast (xAI) | Fast enough for leadership/orchestration, cheap relative to premium coding models, and strong for routing and synthesis. | 2,000,000 | $0.20 / $0.50 | speed and conversational flow matter most |
| 2 | GPT-5.4 (OpenAI) | Best premium option when `crew-lead` needs giant context, better judgment, and broad mixed-task reasoning. | 1,050,000 | $2.50 / $15 | you want premium synthesis and deep context |
| 3 | Gemini 3.1 Pro (Google) | Very strong long-context lead brain for project state, docs, and multimodal orchestration. | 1,048,576 | $2 / $12 | you want huge context with better economics |

`crew-coder` / `crew-fixer` top 3

For actual code quality, repo surgery, debugging, and difficult implementation work.

| # | Model | Why it fits coders | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | Best practical default for day-to-day coding quality without Opus cost. | 1,000,000 | $3 / $15 | you want the safest premium coding default |
| 2 | GPT-5.3 Codex (OpenAI) | Excellent for targeted implementation and repair, especially in Codex-style harnesses. | 400,000 | $1.75 / $14 | you want strong repo surgery and OpenAI tooling |
| 3 | Kimi K2.5 (Moonshot) | Best cheaper coding challenger if you want strong output without premium closed-model pricing. | 262,144 | $0.45 / $2.20 | cost matters more than absolute peak quality |

`crew-pm` / `crew-qa` top 3

For planning, audits, roadmap work, triage, and cheap high-volume review passes.

| # | Model | Why it fits PM / QA | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash (Google) | Best cheap strong worker for PM, QA, SEO, and utility roles. | 1,048,576 | $0.30 / $2.50 | you need cheap volume without useless output |
| 2 | Grok 4.1 Fast Reasoning (xAI) | Good PM planning brain when you want speed plus reasoning. Same base pricing as Grok 4.1 Fast, with reasoning enabled. | 2,000,000 | $0.20 / $0.50 | planning quality matters more than raw cheapness |
| 3 | Gemini 2.5 Flash Lite (Google) | Ultra-cheap for repetitive review, summaries, and glue work. | 1,048,576 | $0.10 / $0.40 | you want the lowest-cost support lane |

`crew-coder-front` / `crew-frontend`

Frontend implementation, UI polish, and component-heavy coding work.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Safest premium default for frontend quality. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Strong targeted UI implementation and repair. | 400,000 | $1.75 / $14 |
| 3 | Kimi K2.5 | Best cheaper frontend coding challenger. | 262,144 | $0.45 / $2.20 |

`crew-coder-back`

Backend systems, APIs, infra glue, and long-horizon implementation.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Strong backend reasoning with good reliability. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Targeted backend implementation and fixer work. | 400,000 | $1.75 / $14 |
| 3 | GLM-5 | Open challenger for systems and backend-heavy coding. | 202,752 | $0.72 / $2.30 |

`crew-security`

Threat review, careful code reading, and security analysis where mistakes are expensive.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Premium review model when false confidence is expensive. | 1,000,000 | $5 / $25 |
| 2 | Claude Sonnet 4.6 | Cheaper default for strong security reading and reasoning. | 1,000,000 | $3 / $15 |
| 3 | GPT-5.4 | Good mixed security plus architecture fallback. | 1,050,000 | $2.50 / $15 |

`crew-github`

Git operations, PR prep, repo automation, and tool-oriented execution.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap and strong enough for git-oriented execution lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast operator model for repo routing and summaries. | 2,000,000 | $0.20 / $0.50 |
| 3 | GPT-5.3 Codex | Use when the git lane also needs real code judgment. | 400,000 | $1.75 / $14 |

`crew-copywriter` / `crew-seo`

Content, search pages, summaries, and high-volume marketing tasks.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Best cheap default for throughput-heavy content lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | MiniMax M2.5 | Cheap strong writer and productivity lane. | 196,608 | $0.27 / $0.95 |
| 3 | Claude Sonnet 4.6 | Use when quality matters more than token burn. | 1,000,000 | $3 / $15 |

`crew-researcher`

Live research, citations, search-grounded work, and changing-source synthesis.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Perplexity Sonar | Search-grounded research lane. Retrieval matters as much as raw model IQ here. | Varies | Varies |
| 2 | Gemini 3.1 Pro | Strong long-context synthesis when result sets are large. | 1,048,576 | $2 / $12 |
| 3 | GPT-5.4 | Premium mixed research and reasoning fallback. | 1,050,000 | $2.50 / $15 |

`crew-architect` / `crew-ml`

Systems design, ML planning, architecture docs, and deep technical judgment.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | GPT-5.4 | Best mixed architecture and systems reasoning flagship. | 1,050,000 | $2.50 / $15 |
| 2 | Gemini 3.1 Pro | Great high-context planning for large system state. | 1,048,576 | $2 / $12 |
| 3 | Claude Sonnet 4.6 | Strong architecture plus coding crossover. | 1,000,000 | $3 / $15 |

`crew-main` / `crew-orchestrator` / `crew-judge`

Synthesis, routing, judging whether to continue, and decision-gate work.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Groq Llama 3.3 70B Versatile | Cheap, fast, and already practical for routing and gate decisions. | 128,000 | Groq tier |
| 2 | Grok 4.1 Fast | Upgrade when coordinators need bigger context and stronger synthesis. | 2,000,000 | $0.20 / $0.50 |
| 3 | Ollama Qwen 2.5 3B / Llama 3.1 8B | Local fallback when you want near-zero cost, privacy, or offline coordination. | 128,000 | Local |

`crew-telegram` / `crew-loco`

Bridge roles, messaging surfaces, and lightweight conversational support.

| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap strong default for messaging and bridge lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast conversational fallback with huge context. | 2,000,000 | $0.20 / $0.50 |
| 3 | Gemini 2.5 Flash Lite | Lowest-cost utility lane for repetitive bridge work. | 1,048,576 | $0.10 / $0.40 |

`crew-cli` L1 / L2 / L3 stack

`crew-cli` is not one flat model call. The code splits responsibilities across L1 chat, L2 reasoning/planning, and L3 workers. This matters because the best model for chat is not the best model for decomposition, and the best model for decomposition is often too expensive for every small worker task.

L1 chat interface

Fast, cheap, user-facing

Best fits: Grok 4.1 Fast, Gemini 2.5 Flash, GPT-5 mini.

L1 should stay responsive and not burn premium reasoning tokens on greetings, clarifications, or simple chat turns. In the code, L1 does not execute tasks; it passes them to L2 and synthesizes results back.

L2 reasoning / planning

Heavy brain layer

Best fits: Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro.

This is where the router, planner, decomposer, and validator live. This is the right place for expensive reasoning models because L2 is deciding how to break down and validate the work graph.

L3 workers

Scoped execution

Best fits: Kimi K2.5, Gemini 2.5 Flash, GPT-5.3 Codex.

L3 workers should usually execute bounded tasks, not act like mini CEOs. That means cheap fast models are often correct here, with premium models reserved for hard coding workers or fallback lanes.

What the code says

L1 is chat only, L2 is router/reasoner/planner, and L3 is the worker execution layer. That matches the implementation in crew-cli, not just a marketing diagram.

Practical default

Use a cheap fast model for L1, a premium reasoning model for L2, and smaller faster workers for L3 unless a task clearly needs a premium coding model.
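That default can be sketched as a small tier map plus an escalation rule. The model names and the "hard coding task" escalation below are illustrative choices drawn from this board, not crew-cli's actual configuration.

```python
# Illustrative L1/L2/L3 defaults: chat, reasoning/planning, and workers.
TIER_DEFAULTS = {
    "L1": "grok-4.1-fast",      # chat: fast, cheap, user-facing
    "L2": "claude-sonnet-4.6",  # reasoning: router, planner, decomposer
    "L3": "gemini-2.5-flash",   # workers: bounded, high-volume tasks
}

# Hypothetical escalation: hard coding work at L3 gets a premium coding model.
PREMIUM_L3_CODER = "gpt-5.3-codex"

def pick_model(tier: str, hard_coding_task: bool = False) -> str:
    """Return the default model for a tier, escalating only hard L3 coding."""
    if tier == "L3" and hard_coding_task:
        return PREMIUM_L3_CODER
    return TIER_DEFAULTS[tier]
```

Keeping the escalation rule explicit means premium tokens are spent only where L2's decomposition says the task warrants them.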

Chat brain vs execution

This is the distinction most model pages miss. Some crewswarm roles are mostly direct LLMs. Coding roles usually need a real execution engine — Claude Code CLI, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or the native crew-cli executor — so the model gets tool access, file edits, and terminal capabilities. Assigning a powerful model like Claude Opus 4.6 or GPT-5.4 to a coding role via Direct API wastes its potential; route it through a CLI engine instead. The chat model you talk to and the model that actually executes work do not always need to be the same.

| Agent lane | Chat brain | Exec engine | Exec model | Why |
|---|---|---|---|---|
| crew-lead | Grok 4.1 Fast, GPT-5.4, Gemini 3.1 Pro | Direct API or Cursor CLI | Usually same as chat brain | Leadership is mostly synthesis and routing. It does not always need a coding CLI. |
| crew-coder / crew-fixer | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, Kimi K2.5 | These roles need real file edits, terminals, tools, and multi-step execution. |
| crew-coder-front / back | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, GLM-5, Kimi K2.5 | Same pattern as coders: strong chat model up top, real worker model underneath. |
| crew-pm / crew-qa / crew-seo | Gemini 2.5 Flash, Grok 4.1 Fast Reasoning | Direct API | Usually same as chat brain | These are mostly analysis, summaries, triage, and planning lanes. A CLI is often unnecessary. |
| crew-researcher | Perplexity Sonar, Gemini 3.1 Pro, GPT-5.4 | Direct API | Usually same as chat brain | Research-first roles care more about search and citations than about code execution. |
| crew-main / orchestrator / judge | Groq Llama 3.3 70B, Grok 4.1 Fast | Direct API | Usually same as chat brain | Coordinator roles are often cheap chat brains unless escalated. |
| Local / private lane | Ollama Llama 3.1 8B | Direct API or local OpenCode lane | Ollama Qwen 2.5 Coder 7B, Ollama Llama 3.1 8B | Useful when privacy, offline use, or near-zero cost matters more than raw quality. |

Chat & research → Direct API

crew-pm, crew-qa, crew-seo, crew-copywriter, crew-telegram, and coordinator roles run fine as cheap, fast Direct API lanes. Use Gemini 2.5 Flash, Grok 4.1 Fast, or Perplexity Sonar — no CLI overhead needed.

Coding & fixing → CLI engine

crew-coder, crew-fixer, crew-coder-front, and crew-coder-back should always run through a CLI engine (Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or crew-cli). Assign strong models here — Claude Sonnet 4.6, GPT-5.4, Kimi K2.5 — so they get tool access, file edits, and terminals.

Don't waste big models on Direct API

A model like Claude Opus 4.6 or GPT-5.4 behind a Direct API call can only generate text. Route it through a CLI engine and it can read files, run tests, edit code, and use 45+ built-in tools. Execution mode matters as much as model choice.
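The routing rule above is simple enough to write down. This is a sketch under the assumptions of this page: the role names are real, but the engine labels and the function itself are illustrative, not a crewswarm API.

```python
# Roles that need real execution: file edits, terminals, multi-step tools.
CODING_ROLES = {
    "crew-coder", "crew-fixer", "crew-coder-front", "crew-coder-back",
}

def execution_path(role: str) -> str:
    """Coding roles get a CLI engine; everything else stays on the cheaper
    Direct API path. Escalation logic would layer on top of this."""
    return "cli-engine" if role in CODING_ROLES else "direct-api"
```

A judge or PM lane stays on `direct-api` no matter how strong its chat brain is; a fixer lane always gets `cli-engine`, because the model's value there depends on tool access.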

Where Cursor fits

Cursor CLI is a good reasoning/chat surface, especially with composer-2-fast, but it should not be treated as the universal answer for every hard execution lane.

FAQ

Short answers to the actual search queries people type when choosing a model stack.

What is the best model for `crew-lead`?

Grok 4.1 Fast is a strong default when you want fast orchestration and lower burn. Use GPT-5.4 or Gemini 3.1 Pro when `crew-lead` needs giant context and stronger synthesis.

What is the best coding model for `crew-coder` and `crew-fixer`?

Claude Sonnet 4.6 is the best practical premium default for crew-coder and crew-fixer. Use Claude Opus 4.6 for the hardest jobs. Use Kimi K2.5 when you want a much cheaper coding challenger.

What is the cheapest good enough model for `crew-qa`, `crew-pm`, and support agents?

Gemini 2.5 Flash is the main answer. It is cheap, fast, and strong enough for crew-qa, crew-pm, crew-seo, triage, and summaries. Gemini 2.5 Flash Lite is the lower-cost fallback for repetitive glue work.

Is OpenRouter a model or a coding agent?

Neither. OpenRouter is a unified API and routing layer. It gives you one endpoint for many vendors and can auto-route or fail over across providers. The agent behavior still comes from tools like crewswarm, Cursor, Claude Code, or Codex CLI.

Should one model run the whole swarm?

No. crewswarm is better with per-agent assignments: one model for `crew-lead`, one for coders, and one cheap workhorse for QA and support roles. That keeps quality high without setting money on fire.