Why OpenRouter is on this page
Because it solves a real production problem: one API, many vendors, provider failover, and an auto-router. It is access infrastructure, not a coding agent and not a foundation model.
There is no single best model. There is a best model for your role, budget, context window, latency target, and execution path. This page starts with real pricing and context numbers, then maps the best options by agent role instead of pretending one leaderboard solves everything.
This is the one global board for the page: premium models, cheap defaults, local fallbacks, research-first models, and routers. Prices are OpenRouter starting prices per 1M tokens where available, checked March 21, 2026.
Use this as the short-list for crewswarm engine and agent defaults. It is not only a coding board. It mixes premium frontier models, the cheaper models that are already practical in the swarm, local/offline fallbacks, and router/search options you may actually use in production.
| Model | Vendor | Lane | Context | Input / 1M | Output / 1M | Best crewswarm roles | Why it matters |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Premium coding | 1,000,000 | $5 | $25 | crew-coder, crew-fixer, crew-security | Safest premium choice when the task is expensive to get wrong. |
| Claude Sonnet 4.6 | Anthropic | Premium coding | 1,000,000 | $3 | $15 | crew-coder, crew-coder-front, crew-coder-back, crew-fixer | Frontier quality without Opus burn. |
| GPT-5.4 | OpenAI | Premium general | 1,050,000 | $2.50 | $15 | crew-lead, crew-architect, crew-ml, crew-main | Strong all-around flagship with giant context. |
| Gemini 3.1 Pro Preview | Google | Premium general | 1,048,576 | $2 | $12 | crew-lead, crew-architect, crew-researcher | High-context power with improving software-engineering focus. |
| Grok 4.20 Beta | xAI | Premium orchestration | 2,000,000 | $2 | $6 | crew-lead, crew-main, crew-orchestrator | Massive context and strong speed profile. |
| Grok 4.1 Fast | xAI | Cheap orchestration | 2,000,000 | $0.20 | $0.50 | crew-lead, crew-main, crew-orchestrator, crew-pm | One of the best speed-to-cost options for coordinators and high-context chat. |
| Kimi K2.5 | Moonshot | Value coding | 262,144 | $0.45 | $2.20 | crew-coder budget lane, crew-coder-front, L3 workers | One of the best capability-per-dollar plays. |
| GLM-5 | Z.ai | Value coding | 202,752 | $0.72 | $2.30 | crew-coder-back, crew-architect, budget coding lane | Strong open-weight alternative with agent focus. |
| MiniMax M2.5 | MiniMax | Cheap utility | 196,608 | $0.27 | $0.95 | crew-copywriter, crew-seo, L3 workers | Probably the nastiest value sleeper on the board. |
| DeepSeek R1 | DeepSeek | Reasoning | 64,000 | $0.70 | $2.50 | crew-pm, crew-judge, research-heavy analysis | Still relevant, but context is now small relative to 2026 leaders. |
| Gemini 2.5 Flash | Google | Cheap utility | 1,048,576 | $0.30 | $2.50 | crew-qa, crew-pm, crew-seo, crew-telegram, crew-loco | Excellent “good enough” workhorse. |
| Gemini 2.5 Flash Lite | Google | Cheapest utility | 1,048,576 | $0.10 | $0.40 | triage, glue work, cheap support lanes | Ridiculously cheap for a 1M-context utility model. |
| Groq Llama 3.3 70B Versatile | Groq | Cheap coordinator | 128,000 | Groq pricing | Groq pricing | crew-main, crew-orchestrator, crew-judge | Already practical in your live swarm and one of the easiest cheap defaults to keep hot. |
| Groq Llama 3.1 8B Instant | Groq | Ultra-cheap planning | 128,000 | Groq pricing | Groq pricing | crew-pm, triage, lightweight routing | Fastest lightweight option when speed matters more than prestige. |
| Perplexity Sonar Pro | Perplexity | Research-first | 200,000 | $3 | $15 | crew-researcher, crew-main research lane | Built-in live web search makes it different from the pure foundation models on this board. |
| Ollama Qwen 2.5 Coder 7B | Ollama | Local / offline coding | 128,000 | Local only | Local only | offline worker lanes, private coding, cheapest fallback | Not frontier, but useful when privacy, cost, or offline operation beats raw quality. |
| Ollama Llama 3.1 8B | Ollama | Local / offline general | 128,000 | Local only | Local only | offline coordination, simple PM, bridge roles | Lowest-friction local fallback for a swarm that needs to stay cheap or private. |
| OpenRouter Auto | OpenRouter | Router | 2,000,000 | Varies | Varies | router layer for every role | Not a model. It routes to models like Claude, GPT, Gemini, Kimi, Grok, GLM, and others. |
crewswarm should treat OpenRouter as one transport layer. The product value is still orchestration, memory, task routing, and engine choice above the raw model API.
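As a concrete sketch of that transport layer: OpenRouter exposes an OpenAI-compatible chat endpoint, and its request schema accepts a `models` fallback list for provider failover. The model slugs below are hypothetical stand-ins for this board's picks, and the field names should be verified against OpenRouter's current docs before relying on them.

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(primary: str, fallbacks: list[str], prompt: str) -> dict:
    """Build an OpenRouter chat payload with provider failover.

    `models` is OpenRouter's fallback list: if the primary model is
    unavailable, the router tries the next entry in order.
    """
    return {
        "model": primary,
        "models": [primary, *fallbacks],
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request(
    "anthropic/claude-sonnet-4.6",                # hypothetical slug
    ["openai/gpt-5.4", "x-ai/grok-4.1-fast"],     # hypothetical slugs
    "Summarize the open PRs.",
)
# Send with any HTTP client, e.g.:
#   requests.post(OPENROUTER_URL, json=payload,
#                 headers={"Authorization": f"Bearer {api_key}"})
```

The point of keeping this a thin payload builder is that crewswarm's own value (routing, memory, task graphs) stays above this layer; swapping transports should never touch agent logic.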
One section, full roster. Expand the role you care about and use the top 3 picks for that lane.
For orchestration, synthesis, routing, long context, and talking to the user without lagging the whole system.
| # | Model | Why it fits `crew-lead` | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Grok 4.1 Fast (xAI) | Fast enough for leadership/orchestration, cheap relative to premium coding models, and strong for routing and synthesis. | 2,000,000 | $0.20 / $0.50 | speed and conversational flow matter most |
| 2 | GPT-5.4 (OpenAI) | Best premium option when `crew-lead` needs giant context, better judgment, and broad mixed-task reasoning. | 1,050,000 | $2.50 / $15 | you want premium synthesis and deep context |
| 3 | Gemini 3.1 Pro (Google) | Very strong long-context lead brain for project state, docs, and multimodal orchestration. | 1,048,576 | $2 / $12 | you want huge context with better economics |
For actual code quality, repo surgery, debugging, and difficult implementation work.
| # | Model | Why it fits coders | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | Best practical default for day-to-day coding quality without Opus cost. | 1,000,000 | $3 / $15 | you want the safest premium coding default |
| 2 | GPT-5.3 Codex (OpenAI) | Excellent for targeted implementation and repair, especially in Codex-style harnesses. | 400,000 | $1.75 / $14 | you want strong repo surgery and OpenAI tooling |
| 3 | Kimi K2.5 (Moonshot) | Best cheaper coding challenger if you want strong output without premium closed-model pricing. | 262,144 | $0.45 / $2.20 | cost matters more than absolute peak quality |
For planning, audits, roadmap work, triage, and cheap high-volume review passes.
| # | Model | Why it fits PM / QA | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash (Google) | Best cheap strong worker for PM, QA, SEO, and utility roles. | 1,048,576 | $0.30 / $2.50 | you need cheap volume without useless output |
| 2 | Grok 4.1 Fast Reasoning (xAI) | Good PM planning brain when you want speed plus reasoning. Same base pricing as Grok 4.1 Fast, with reasoning enabled. | 2,000,000 | $0.20 / $0.50 | planning quality matters more than raw cheapness |
| 3 | Gemini 2.5 Flash Lite (Google) | Ultra-cheap for repetitive review, summaries, and glue work. | 1,048,576 | $0.10 / $0.40 | you want the lowest-cost support lane |
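The per-1M prices in these tables turn into real budgets with simple arithmetic. A minimal helper (the token counts in the example are illustrative, the prices are the board's Gemini 2.5 Flash numbers):

```python
def job_cost_usd(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Estimate one job's cost from per-1M-token prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Gemini 2.5 Flash at $0.30 / $2.50: a 200k-in / 50k-out review pass
cost = job_cost_usd(200_000, 50_000, 0.30, 2.50)
# 0.2 * 0.30 + 0.05 * 2.50 = 0.06 + 0.125 = $0.185 per pass
```

At $0.185 a pass, a thousand daily review passes is roughly $185/day, which is why the cheap-utility lane matters more than leaderboard position for high-volume roles.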
Frontend implementation, UI polish, and component-heavy coding work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Safest premium default for frontend quality. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Strong targeted UI implementation and repair. | 400,000 | $1.75 / $14 |
| 3 | Kimi K2.5 | Best cheaper frontend coding challenger. | 262,144 | $0.45 / $2.20 |
Backend systems, APIs, infra glue, and long-horizon implementation.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Strong backend reasoning with good reliability. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Targeted backend implementation and fixer work. | 400,000 | $1.75 / $14 |
| 3 | GLM-5 | Open challenger for systems and backend-heavy coding. | 202,752 | $0.72 / $2.30 |
Threat review, careful code reading, and security analysis where mistakes are expensive.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Premium review model when false confidence is expensive. | 1,000,000 | $5 / $25 |
| 2 | Claude Sonnet 4.6 | Cheaper default for strong security reading and reasoning. | 1,000,000 | $3 / $15 |
| 3 | GPT-5.4 | Good mixed security plus architecture fallback. | 1,050,000 | $2.50 / $15 |
Git operations, PR prep, repo automation, and tool-oriented execution.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap and strong enough for git-oriented execution lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast operator model for repo routing and summaries. | 2,000,000 | $0.20 / $0.50 |
| 3 | GPT-5.3 Codex | Use when the git lane also needs real code judgment. | 400,000 | $1.75 / $14 |
Content, search pages, summaries, and high-volume marketing tasks.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Best cheap default for throughput-heavy content lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | MiniMax M2.5 | Cheap strong writer and productivity lane. | 196,608 | $0.27 / $0.95 |
| 3 | Claude Sonnet 4.6 | Use when quality matters more than token burn. | 1,000,000 | $3 / $15 |
Live research, citations, search-grounded work, and changing-source synthesis.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Perplexity Sonar | Search-grounded research lane. Retrieval matters as much as raw model IQ here. | Varies | Varies |
| 2 | Gemini 3.1 Pro | Strong long-context synthesis when result sets are large. | 1,048,576 | $2 / $12 |
| 3 | GPT-5.4 | Premium mixed research and reasoning fallback. | 1,050,000 | $2.50 / $15 |
Systems design, ML planning, architecture docs, and deep technical judgment.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | GPT-5.4 | Best mixed architecture and systems reasoning flagship. | 1,050,000 | $2.50 / $15 |
| 2 | Gemini 3.1 Pro | Great high-context planning for large system state. | 1,048,576 | $2 / $12 |
| 3 | Claude Sonnet 4.6 | Strong architecture plus coding crossover. | 1,000,000 | $3 / $15 |
Synthesis, routing, judging whether to continue, and decision-gate work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Groq Llama 3.3 70B Versatile | Cheap, fast, and already practical for routing and gate decisions. | 128,000 | Groq tier |
| 2 | Grok 4.1 Fast | Upgrade when coordinators need bigger context and stronger synthesis. | 2,000,000 | $0.20 / $0.50 |
| 3 | Ollama Qwen 2.5 3B / Llama 3.1 8B | Local fallback when you want near-zero cost, privacy, or offline coordination. | 128,000 | Local |
Bridge roles, messaging surfaces, and lightweight conversational support.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap strong default for messaging and bridge lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast conversational fallback with huge context. | 2,000,000 | $0.20 / $0.50 |
| 3 | Gemini 2.5 Flash Lite | Lowest-cost utility lane for repetitive bridge work. | 1,048,576 | $0.10 / $0.40 |
`crew-cli` is not one flat model call. The code splits responsibilities across L1 chat, L2 reasoning/planning, and L3 workers. This matters because the best model for chat is not the best model for decomposition, and the best model for decomposition is often too expensive for every small worker task.
Best fits: Grok 4.1 Fast, Gemini 2.5 Flash, GPT-5 mini.
L1 should stay responsive and not burn premium reasoning tokens on greetings, clarifications, or simple chat turns. In the code, L1 does not execute tasks; it passes them to L2 and synthesizes results back.
Best fits: Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro.
This is where the router, planner, decomposer, and validator live. This is the right place for expensive reasoning models because L2 is deciding how to break down and validate the work graph.
Best fits: Kimi K2.5, Gemini 2.5 Flash, GPT-5.3 Codex.
L3 workers should usually execute bounded tasks, not act like mini CEOs. That means cheap fast models are often correct here, with premium models reserved for hard coding workers or fallback lanes.
L1 is chat only, L2 is router/reasoner/planner, and L3 is the worker execution layer. That matches the implementation in crew-cli, not just a marketing diagram.
Use a cheap fast model for L1, a premium reasoning model for L2, and smaller faster workers for L3 unless a task clearly needs a premium coding model.
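That three-layer split can be sketched as a routing function. The defaults below are illustrative picks from this page's tables, and the slugs are hypothetical:

```python
# Hypothetical defaults for the L1/L2/L3 split described above.
LAYER_DEFAULTS = {
    "L1": "x-ai/grok-4.1-fast",            # cheap, fast chat surface
    "L2": "anthropic/claude-sonnet-4.6",   # premium router/planner/decomposer
    "L3": "google/gemini-2.5-flash",       # bounded worker tasks
}

def pick_model(layer: str, hard_coding_task: bool = False) -> str:
    """Route by layer; escalate only L3 workers that truly need a coding model."""
    if layer == "L3" and hard_coding_task:
        return "openai/gpt-5.3-codex"  # premium coding worker, per the coder table
    return LAYER_DEFAULTS[layer]
```

The escalation flag is the whole trick: it keeps premium spend tied to individual hard tasks instead of leaking into every small worker call.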
This is the distinction most model pages miss. Some crewswarm roles are mostly direct LLMs. Coding roles usually need a real execution engine — Claude Code CLI, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or the native crew-cli executor — so the model gets tool access, file edits, and terminal capabilities. Assigning a powerful model like Claude Opus 4.6 or GPT-5.4 to a coding role via Direct API wastes its potential; route it through a CLI engine instead. The chat model you talk to and the model that actually executes work do not always need to be the same.
| Agent lane | Chat brain | Exec engine | Exec model | Why |
|---|---|---|---|---|
| crew-lead | Grok 4.1 Fast, GPT-5.4, Gemini 3.1 Pro | Direct API or Cursor CLI | Usually same as chat brain | Leadership is mostly synthesis and routing. It does not always need a coding CLI. |
| crew-coder / crew-fixer | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, Kimi K2.5 | These roles need real file edits, terminals, tools, and multi-step execution. |
| crew-coder-front / back | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, GLM-5, Kimi K2.5 | Same pattern as coders: strong chat model up top, real worker model underneath. |
| crew-pm / crew-qa / crew-seo | Gemini 2.5 Flash, Grok 4.1 Fast Reasoning | Direct API | Usually same as chat brain | These are mostly analysis, summaries, triage, and planning lanes. A CLI is often unnecessary. |
| crew-researcher | Perplexity Sonar, Gemini 3.1 Pro, GPT-5.4 | Direct API | Usually same as chat brain | Research-first roles care more about search and citations than about code execution. |
| crew-main / orchestrator / judge | Groq Llama 3.3 70B, Grok 4.1 Fast | Direct API | Usually same as chat brain | Coordinator roles are often cheap chat brains unless escalated. |
| Local / private lane | Ollama Llama 3.1 8B | Direct API or local OpenCode lane | Ollama Qwen 2.5 Coder 7B, Ollama Llama 3.1 8B | Useful when privacy, offline use, or near-zero cost matters more than raw quality. |
crew-pm, crew-qa, crew-seo, crew-copywriter, crew-telegram, and coordinator roles run fine as cheap, fast Direct API lanes. Use Gemini 2.5 Flash, Grok 4.1 Fast, or Perplexity Sonar — no CLI overhead needed.
crew-coder, crew-fixer, crew-coder-front, and crew-coder-back should always run through a CLI engine (Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or crew-cli). Assign strong models here — Claude Sonnet 4.6, GPT-5.4, Kimi K2.5 — so they get tool access, file edits, and terminals.
A model like Claude Opus 4.6 or GPT-5.4 behind a Direct API call can only generate text. Route it through a CLI engine and it can read files, run tests, edit code, and use 45+ built-in tools. Execution mode matters as much as model choice.
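The Direct API vs CLI engine split is a one-line routing decision once the coding roles are enumerated. A minimal sketch (role names are from this page; the mode strings are placeholders for whatever the dispatcher expects):

```python
# Roles that need file edits, terminals, and tools, per the engine table above.
CODING_ROLES = {"crew-coder", "crew-fixer", "crew-coder-front", "crew-coder-back"}

def execution_mode(role: str) -> str:
    """Coding roles get a CLI engine (tools, file edits, terminal);
    analysis and coordination roles stay on cheap Direct API calls."""
    return "cli-engine" if role in CODING_ROLES else "direct-api"
```

Keeping this as an explicit set rather than a heuristic means a new role defaults to the cheap Direct API lane until someone deliberately promotes it.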
Cursor CLI is a good reasoning/chat surface, especially with composer-2-fast, but it should not be treated as the universal answer for every hard execution lane.
Short answers to the actual search queries people type when choosing a model stack.
Grok 4.1 Fast is a strong default when you want fast orchestration and lower burn. Use GPT-5.4 or Gemini 3.1 Pro when `crew-lead` needs giant context and stronger synthesis.
Claude Sonnet 4.6 is the best practical premium default for crew-coder and crew-fixer. Use Claude Opus 4.6 for the hardest jobs. Use Kimi K2.5 when you want a much cheaper coding challenger.
Gemini 2.5 Flash is the main answer. It is cheap, fast, and strong enough for crew-qa, crew-pm, crew-seo, triage, and summaries. Gemini 2.5 Flash Lite is the lower-cost fallback for repetitive glue work.
Neither. OpenRouter is a unified API and routing layer. It gives you one endpoint for many vendors and can auto-route or fail over across providers. The agent behavior still comes from tools like crewswarm, Cursor, Claude Code, or Codex CLI.
No. crewswarm is better with per-agent assignments: one model for `crew-lead`, one for coders, and one cheap workhorse for QA and support roles. That keeps quality high without setting money on fire.
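Those per-agent assignments can live in one small config. The mapping below is an illustrative reading of this page's recommendations, with hypothetical model slugs:

```python
# Illustrative per-agent assignments following the answers above.
AGENT_MODELS = {
    "crew-lead":  "x-ai/grok-4.1-fast",
    "crew-coder": "anthropic/claude-sonnet-4.6",
    "crew-fixer": "anthropic/claude-sonnet-4.6",
    "crew-qa":    "google/gemini-2.5-flash",
    "crew-pm":    "google/gemini-2.5-flash",
}

def model_for(agent: str, default: str = "google/gemini-2.5-flash-lite") -> str:
    """Look up an agent's assigned model; unknown agents fall back cheap."""
    return AGENT_MODELS.get(agent, default)
```

Note the fallback is the cheapest utility model, not the premium one: an unconfigured agent should degrade toward low cost, not toward quietly burning flagship tokens.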