Why OpenRouter is on this page
Because it solves a real production problem: one API, many vendors, provider failover, and an auto-router. It is access infrastructure, not a coding agent and not a foundation model.
There is no single best model. There is a best model for your role, budget, context window, latency target, and execution path. This page starts with real pricing and context numbers, then maps the best options by agent role instead of pretending one leaderboard solves everything.
This is the one global board for the page: premium models, cheap defaults, local fallbacks, research-first models, and routers. Prices are OpenRouter starting prices per 1M tokens where available, checked March 21, 2026.
Use this as the short-list for crewswarm engine and agent defaults. It is not only a coding board. It mixes premium frontier models, the cheaper models that are already practical in the swarm, local/offline fallbacks, and router/search options you may actually use in production.
| Model | Vendor | Lane | Context | Input / 1M | Output / 1M | Best crewswarm roles | Why it matters |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Premium coding | 1,000,000 | $5 | $25 | crew-coder, crew-fixer, crew-security | Safest premium choice when the task is expensive to get wrong. |
| Claude Sonnet 4.6 | Anthropic | Premium coding | 1,000,000 | $3 | $15 | crew-coder, crew-coder-front, crew-coder-back, crew-fixer | Frontier quality without Opus burn. |
| GPT-5.4 | OpenAI | Premium general | 1,050,000 | $2.50 | $15 | crew-lead, crew-architect, crew-ml, crew-main | Strong all-around flagship with giant context. |
| Gemini 3.1 Pro Preview | Premium general | 1,048,576 | $2 | $12 | crew-lead, crew-architect, crew-researcher | High-context power with improving software-engineering focus. | |
| Grok 4.20 Beta | xAI | Premium orchestration | 2,000,000 | $2 | $6 | crew-lead, crew-main, crew-orchestrator | Massive context and strong speed profile. |
| Grok 4.1 Fast | xAI | Cheap orchestration | 2,000,000 | $0.20 | $0.50 | crew-lead, crew-main, crew-orchestrator, crew-pm | One of the best speed-to-cost options for coordinators and high-context chat. |
| Kimi K2.5 | Moonshot | Value coding | 262,144 | $0.45 | $2.20 | crew-coder budget lane, crew-coder-front, L3 workers | One of the best capability-per-dollar plays. |
| GLM-5 | Z.ai | Value coding | 202,752 | $0.72 | $2.30 | crew-coder-back, crew-architect, budget coding lane | Strong open-weight alternative with agent focus. |
| MiniMax M2.5 | MiniMax | Cheap utility | 196,608 | $0.27 | $0.95 | crew-copywriter, crew-seo, L3 workers | Probably the nastiest value sleeper on the board. |
| DeepSeek R1 | DeepSeek | Reasoning | 64,000 | $0.70 | $2.50 | crew-pm, crew-judge, research-heavy analysis | Still relevant, but context is now small relative to 2026 leaders. |
| Gemini 2.5 Flash | Cheap utility | 1,048,576 | $0.30 | $2.50 | crew-qa, crew-pm, crew-seo, crew-telegram, crew-loco | Excellent “good enough” workhorse. | |
| Gemini 2.5 Flash Lite | Cheapest utility | 1,048,576 | $0.10 | $0.40 | triage, glue work, cheap support lanes | Ridiculously cheap for a 1M-context utility model. | |
| Groq Llama 3.3 70B Versatile | Groq | Cheap coordinator | 128,000 | Groq pricing | Groq pricing | crew-main, crew-orchestrator, crew-judge | Already practical in your live swarm and one of the easiest cheap defaults to keep hot. |
| Groq Llama 3.1 8B Instant | Groq | Ultra-cheap planning | 128,000 | Groq pricing | Groq pricing | crew-pm, triage, lightweight routing | Fastest lightweight option when speed matters more than prestige. |
| Perplexity Sonar Pro | Perplexity | Research-first | 200,000 | $3 | $15 | crew-researcher, crew-main research lane | Built-in live web search makes it different from the pure foundation models on this board. |
| Ollama Qwen 2.5 Coder 7B | Ollama | Local / offline coding | 128,000 | Local only | Local only | offline worker lanes, private coding, cheapest fallback | Not frontier, but useful when privacy, cost, or offline operation beats raw quality. |
| Ollama Llama 3.1 8B | Ollama | Local / offline general | 128,000 | Local only | Local only | offline coordination, simple PM, bridge roles | Lowest-friction local fallback for a swarm that needs to stay cheap or private. |
| OpenRouter Auto | OpenRouter | Router | 2,000,000 | Varies | Varies | router layer for every role | Not a model. It routes to models like Claude, GPT, Gemini, Kimi, Grok, GLM, and others. |
Because it solves a real production problem: one API, many vendors, provider failover, and an auto-router. It is access infrastructure, not a coding agent and not a foundation model.
crewswarm should treat OpenRouter as one transport layer. The product value is still orchestration, memory, task routing, and engine choice above the raw model API.
crew-cli is not one flat model call. The code splits responsibilities across L1 chat, L2 reasoning/planning, and L3 workers. This matters because the best model for chat is not the best model for decomposition, and the best model for decomposition is often too expensive for every small worker task.
Classifies tasks as direct-answer, execute-direct, or execute-parallel. Needs to be fast and cheap — runs on every input.
| Groq GPT-OSS 20B | $0.08/$0.30 |
| Gemini 2.5 Flash Lite | $0.10/$0.40 |
| Groq Llama 3.3 70B | $0.59/$0.79 |
| Claude Haiku 4.5 | $1.00/$5.00 |
Any model works — routing is a simple classification task. L1 should stay responsive and not burn premium reasoning tokens on greetings, clarifications, or simple chat turns.
Generates 8 artifacts (PDD, ROADMAP, ARCH, DESIGN, etc.) then decomposes into work units with dependencies and personas. 14 models tested at 90/100.
| GPT-OSS 20B (Groq) | $0.003/plan |
| Gemini 2.5 Flash Lite | $0.004/plan |
| DeepSeek Reasoner | $0.004/plan |
| Grok 3 Mini | $0.005/plan |
| Claude Sonnet 4.6 | $3.00/$15.00 |
| GPT-5.4 | $2.50/$15.00 |
Prompt engineering does the work — cheap models match GPT-5.4 quality. This is the right place for expensive reasoning models because L2 is deciding how to break down and validate the work graph.
Runs agentic tool-calling loops — reads files, writes code, runs tests. 5 models pass all tests (including 2 free local models). Must support structured tool calling.
| Gemini 2.5 Flash | $0.002/task |
| DeepSeek Chat | $0.001/task |
| Grok 4-1 Fast | $0.001/task |
| GPT-5.4 | $0.02/task |
| Claude Sonnet 4.6 | $0.02/task |
All produce identical quality code — pick by speed and cost. Cheap fast models are often correct here, with premium models reserved for hard coding tasks or fallback lanes.
L1: Groq GPT-OSS 20B ($0.0001) + L2: Gemini Flash Lite ($0.004) + L3: DeepSeek Chat ($0.001) = $0.006 total for a complete plan-and-execute cycle.
One section, full roster. Expand the role you care about and use the top 3 picks for that lane.
For orchestration, synthesis, routing, long context, and talking to the user without lagging the whole system.
| # | Model | Why it fits `crew-lead` | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Grok 4.1 Fast xAI | Fast enough for leadership/orchestration, cheap relative to premium coding models, and strong for routing and synthesis. | 2,000,000 | $0.20 / $0.50 | speed and conversational flow matter most |
| 2 | GPT-5.4 OpenAI | Best premium option when `crew-lead` needs giant context, better judgment, and broad mixed-task reasoning. | 1,050,000 | $2.50 / $15 | you want premium synthesis and deep context |
| 3 | Gemini 3.1 Pro | Very strong long-context lead brain for project state, docs, and multimodal orchestration. | 1,048,576 | $2 / $12 | you want huge context with better economics |
For actual code quality, repo surgery, debugging, and difficult implementation work.
| # | Model | Why it fits coders | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 Anthropic | Best practical default for day-to-day coding quality without Opus cost. | 1,000,000 | $3 / $15 | you want the safest premium coding default |
| 2 | GPT-5.3 Codex OpenAI | Excellent for targeted implementation and repair, especially in Codex-style harnesses. | 400,000 | $1.75 / $14 | you want strong repo surgery and OpenAI tooling |
| 3 | Kimi K2.5 Moonshot | Best cheaper coding challenger if you want strong output without premium closed-model pricing. | 262,144 | $0.45 / $2.20 | cost matters more than absolute peak quality |
For planning, audits, roadmap work, triage, and cheap high-volume review passes.
| # | Model | Why it fits PM / QA | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Best cheap strong worker for PM, QA, SEO, and utility roles. | 1,048,576 | $0.30 / $2.50 | you need cheap volume without useless output |
| 2 | Grok 4.1 Fast Reasoning xAI | Good PM planning brain when you want speed plus reasoning. Same base pricing as Grok 4.1 Fast, with reasoning enabled. | 2,000,000 | $0.20 / $0.50 | planning quality matters more than raw cheapness |
| 3 | Gemini 2.5 Flash Lite | Ultra-cheap for repetitive review, summaries, and glue work. | 1,048,576 | $0.10 / $0.40 | you want the lowest-cost support lane |
Frontend implementation, UI polish, and component-heavy coding work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Safest premium default for frontend quality. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Strong targeted UI implementation and repair. | 400,000 | $1.75 / $14 |
| 3 | Kimi K2.5 | Best cheaper frontend coding challenger. | 262,144 | $0.45 / $2.20 |
Backend systems, APIs, infra glue, and long-horizon implementation.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Strong backend reasoning with good reliability. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Targeted backend implementation and fixer work. | 400,000 | $1.75 / $14 |
| 3 | GLM-5 | Open challenger for systems and backend-heavy coding. | 202,752 | $0.72 / $2.30 |
Threat review, careful code reading, and security analysis where mistakes are expensive.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Premium review model when false confidence is expensive. | 1,000,000 | $5 / $25 |
| 2 | Claude Sonnet 4.6 | Cheaper default for strong security reading and reasoning. | 1,000,000 | $3 / $15 |
| 3 | GPT-5.4 | Good mixed security plus architecture fallback. | 1,050,000 | $2.50 / $15 |
Git operations, PR prep, repo automation, and tool-oriented execution.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap and strong enough for git-oriented execution lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast operator model for repo routing and summaries. | 2,000,000 | $0.20 / $0.50 |
| 3 | GPT-5.3 Codex | Use when the git lane also needs real code judgment. | 400,000 | $1.75 / $14 |
Content, search pages, summaries, and high-volume marketing tasks.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Best cheap default for throughput-heavy content lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | MiniMax M2.5 | Cheap strong writer and productivity lane. | 196,608 | $0.27 / $0.95 |
| 3 | Claude Sonnet 4.6 | Use when quality matters more than token burn. | 1,000,000 | $3 / $15 |
Live research, citations, search-grounded work, and changing-source synthesis.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Perplexity Sonar | Search-grounded research lane. Retrieval matters as much as raw model IQ here. | Varies | Varies |
| 2 | Gemini 3.1 Pro | Strong long-context synthesis when result sets are large. | 1,048,576 | $2 / $12 |
| 3 | GPT-5.4 | Premium mixed research and reasoning fallback. | 1,050,000 | $2.50 / $15 |
Systems design, ML planning, architecture docs, and deep technical judgment.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | GPT-5.4 | Best mixed architecture and systems reasoning flagship. | 1,050,000 | $2.50 / $15 |
| 2 | Gemini 3.1 Pro | Great high-context planning for large system state. | 1,048,576 | $2 / $12 |
| 3 | Claude Sonnet 4.6 | Strong architecture plus coding crossover. | 1,000,000 | $3 / $15 |
Synthesis, routing, judging whether to continue, and decision-gate work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Groq Llama 3.3 70B Versatile | Cheap, fast, and already practical for routing and gate decisions. | 128,000 | Groq tier |
| 2 | Grok 4.1 Fast | Upgrade when coordinators need bigger context and stronger synthesis. | 2,000,000 | $0.20 / $0.50 |
| 3 | Ollama Qwen 2.5 3B / Llama 3.1 8B | Local fallback when you want near-zero cost, privacy, or offline coordination. | 128,000 | Local |
Bridge roles, messaging surfaces, and lightweight conversational support.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap strong default for messaging and bridge lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast conversational fallback with huge context. | 2,000,000 | $0.20 / $0.50 |
| 3 | Gemini 2.5 Flash Lite | Lowest-cost utility lane for repetitive bridge work. | 1,048,576 | $0.10 / $0.40 |
This is the distinction most model pages miss. Some crewswarm roles are mostly direct LLMs. Coding roles usually need a real execution engine — Claude Code CLI, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or the native crew-cli executor — so the model gets tool access, file edits, and terminal capabilities. Assigning a powerful model like Claude Opus 4.6 or GPT-5.4 to a coding role via Direct API wastes its potential; route it through a CLI engine instead. The chat model you talk to and the model that actually executes work do not always need to be the same.
| Agent lane | Chat brain | Exec engine | Exec model | Why |
|---|---|---|---|---|
| crew-lead | Grok 4.1 Fast, GPT-5.4, Gemini 3.1 Pro | Direct API or Cursor CLI | Usually same as chat brain | Leadership is mostly synthesis and routing. It does not always need a coding CLI. |
| crew-coder / crew-fixer | Claude Sonnet 4.6, GPT-5.4 | Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, crew-cli | GPT-5.3 Codex, Claude Sonnet 4.6, Kimi K2.5 | These roles need real file edits, terminals, tools, and multi-step execution. All 6 engines work. crew-cli is the only option for Grok, DeepSeek, Qwen, Kimi, and local models. |
| crew-coder-front / back | Claude Sonnet 4.6, GPT-5.4 | Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, crew-cli | GPT-5.3 Codex, Claude Sonnet 4.6, GLM-5, Kimi K2.5 | Same as coders: all 6 engines work. crew-cli is the only option for Grok, DeepSeek, Qwen, Kimi, and local models. |
| crew-pm / crew-qa / crew-seo | Gemini 2.5 Flash, Grok 4.1 Fast Reasoning | Direct API, Gemini CLI, Codex CLI, crew-cli | Usually same as chat brain | Mostly analysis, summaries, triage, and planning. Direct API is fine for chat-only work, but routing through a CLI engine gives these agents file access and tool use when needed. |
| crew-researcher | Perplexity Sonar, Gemini 3.1 Pro, GPT-5.4 | Direct API | Usually same as chat brain | Research-first roles care more about search and citations than about code execution. |
| crew-main / orchestrator / judge | Groq Llama 3.3 70B, Grok 4.1 Fast | Direct API | Usually same as chat brain | Coordinator roles are often cheap chat brains unless escalated. |
| Local / private lane | Ollama Llama 3.1 8B | crew-cli or local OpenCode lane | Ollama Qwen 2.5 Coder 7B, Ollama Llama 3.1 8B | Useful when privacy, offline use, or near-zero cost matters more than raw quality. crew-cli gives local models the full 45+ tool system. |
crew-pm, crew-qa, crew-seo, crew-copywriter, and coordinator roles run fine as Direct API lanes for chat-only work. But routing them through Gemini CLI, Codex CLI, or crew-cli gives them file access and tools when tasks go beyond analysis.
crew-coder, crew-fixer, crew-coder-front, and crew-coder-back should always run through a CLI engine (Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or crew-cli). crew-cli is the only option for models without their own CLI — Grok, DeepSeek, Qwen, Kimi, Groq, and local Ollama models all get full agentic coding through crew-cli's 45+ tools and quality engine.
A model like Claude Opus 4.6 or GPT-5.4 behind a Direct API call can only generate text. Route it through a CLI engine and it can read files, run tests, edit code, and use 45+ built-in tools. Execution mode matters as much as model choice.
Cursor CLI is a good reasoning/chat surface, especially with composer-2-fast, but it should not be treated as the universal answer for every hard execution lane.
Short answers to the actual search queries people type when choosing a model stack.
Grok 4.1 Fast is a strong default when you want fast orchestration and lower burn. Use GPT-5.4 or Gemini 3.1 Pro when `crew-lead` needs giant context and stronger synthesis.
Claude Sonnet 4.6 is the best practical premium default for crew-coder and crew-fixer. Use Claude Opus 4.6 for the hardest jobs. Use Kimi K2.5 when you want a much cheaper coding challenger.
Gemini 2.5 Flash is the main answer. It is cheap, fast, and strong enough for crew-qa, crew-pm, crew-seo, triage, and summaries. Gemini 2.5 Flash Lite is the lower-cost fallback for repetitive glue work.
Neither. OpenRouter is a unified API and routing layer. It gives you one endpoint for many vendors and can auto-route or fail over across providers. The agent behavior still comes from tools like crewswarm, Cursor, Claude Code, or Codex CLI.
No. crewswarm is better with per-agent assignments: one model for `crew-lead`, one for coders, and one cheap workhorse for QA and support roles. That keeps quality high without setting money on fire.