Why OpenRouter is on this page
Because it solves a real production problem: one API, many vendors, provider failover, and an auto-router. It is access infrastructure, not a coding agent and not a foundation model.
There is no single best model. There is a best model for your role, budget, context window, latency target, and execution path. This page starts with real pricing and context numbers, then maps the best options by agent role instead of pretending one leaderboard solves everything.
This is the one global board for the page: premium models, cheap defaults, local fallbacks, research-first models, and routers. Prices are OpenRouter starting prices per 1M tokens where available, checked March 21, 2026.
Use this as the short-list for crewswarm engine and agent defaults. It is not only a coding board. It mixes premium frontier models, the cheaper models that are already practical in the swarm, local/offline fallbacks, and router/search options you may actually use in production.
| Model | Vendor | Lane | Context | Input / 1M | Output / 1M | Best crewswarm roles | Why it matters |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | Premium coding | 1,000,000 | $5 | $25 | crew-coder, crew-fixer, crew-security | Safest premium choice when the task is expensive to get wrong. |
| Claude Sonnet 4.6 | Anthropic | Premium coding | 1,000,000 | $3 | $15 | crew-coder, crew-coder-front, crew-coder-back, crew-fixer | Frontier quality without Opus burn. |
| GPT-5.4 | OpenAI | Premium general | 1,050,000 | $2.50 | $15 | crew-lead, crew-architect, crew-ml, crew-main | Strong all-around flagship with giant context. |
| Gemini 3.1 Pro Preview | Google | Premium general | 1,048,576 | $2 | $12 | crew-lead, crew-architect, crew-researcher | High-context power with improving software-engineering focus. |
| Grok 4.20 Beta | xAI | Premium orchestration | 2,000,000 | $2 | $6 | crew-lead, crew-main, crew-orchestrator | Massive context and strong speed profile. |
| Grok 4.1 Fast | xAI | Cheap orchestration | 2,000,000 | $0.20 | $0.50 | crew-lead, crew-main, crew-orchestrator, crew-pm | One of the best speed-to-cost options for coordinators and high-context chat. |
| Kimi K2.5 | Moonshot | Value coding | 262,144 | $0.45 | $2.20 | crew-coder budget lane, crew-coder-front, L3 workers | One of the best capability-per-dollar plays. |
| GLM-5 | Z.ai | Value coding | 202,752 | $0.72 | $2.30 | crew-coder-back, crew-architect, budget coding lane | Strong open-weight alternative with agent focus. |
| MiniMax M2.5 | MiniMax | Cheap utility | 196,608 | $0.27 | $0.95 | crew-copywriter, crew-seo, L3 workers | Probably the nastiest value sleeper on the board. |
| DeepSeek R1 | DeepSeek | Reasoning | 64,000 | $0.70 | $2.50 | crew-pm, crew-judge, research-heavy analysis | Still relevant, but context is now small relative to 2026 leaders. |
| Gemini 2.5 Flash | Google | Cheap utility | 1,048,576 | $0.30 | $2.50 | crew-qa, crew-pm, crew-seo, crew-telegram, crew-loco | Excellent “good enough” workhorse. |
| Gemini 2.5 Flash Lite | Google | Cheapest utility | 1,048,576 | $0.10 | $0.40 | triage, glue work, cheap support lanes | Ridiculously cheap for a 1M-context utility model. |
| Groq Llama 3.3 70B Versatile | Groq | Cheap coordinator | 128,000 | Groq pricing | Groq pricing | crew-main, crew-orchestrator, crew-judge | Already practical in your live swarm and one of the easiest cheap defaults to keep hot. |
| Groq Llama 3.1 8B Instant | Groq | Ultra-cheap planning | 128,000 | Groq pricing | Groq pricing | crew-pm, triage, lightweight routing | Fastest lightweight option when speed matters more than prestige. |
| Perplexity Sonar Pro | Perplexity | Research-first | 200,000 | $3 | $15 | crew-researcher, crew-main research lane | Built-in live web search makes it different from the pure foundation models on this board. |
| Ollama Qwen 2.5 Coder 7B | Ollama | Local / offline coding | 128,000 | Local only | Local only | offline worker lanes, private coding, cheapest fallback | Not frontier, but useful when privacy, cost, or offline operation beats raw quality. |
| Ollama Llama 3.1 8B | Ollama | Local / offline general | 128,000 | Local only | Local only | offline coordination, simple PM, bridge roles | Lowest-friction local fallback for a swarm that needs to stay cheap or private. |
| OpenRouter Auto | OpenRouter | Router | 2,000,000 | Varies | Varies | router layer for every role | Not a model. It routes to models like Claude, GPT, Gemini, Kimi, Grok, GLM, and others. |
crewswarm should treat OpenRouter as one transport layer. The product value is still orchestration, memory, task routing, and engine choice above the raw model API.
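As a concrete sketch of that transport layer: OpenRouter exposes an OpenAI-compatible chat endpoint, and its request schema accepts a `models` fallback list for provider failover. The model slugs below are hypothetical stand-ins for this board's picks, and the field names should be verified against OpenRouter's current docs before relying on them.

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(primary: str, fallbacks: list[str], prompt: str) -> dict:
    """Build an OpenRouter chat payload with provider failover.

    `models` is OpenRouter's fallback list: if the primary model is
    unavailable, the router tries the next entry in order.
    """
    return {
        "model": primary,
        "models": [primary, *fallbacks],
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request(
    "anthropic/claude-sonnet-4.6",                # hypothetical slug
    ["openai/gpt-5.4", "x-ai/grok-4.1-fast"],     # hypothetical slugs
    "Summarize the open PRs.",
)
# Send with any HTTP client, e.g.:
#   requests.post(OPENROUTER_URL, json=payload,
#                 headers={"Authorization": f"Bearer {api_key}"})
```

The point of keeping this a thin payload builder is that crewswarm's own value (routing, memory, task graphs) stays above this layer; swapping transports should never touch agent logic.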
One section, full roster. Expand the role you care about and use the top 3 picks for that lane.
For orchestration, synthesis, routing, long context, and talking to the user without lagging the whole system.
| # | Model | Why it fits `crew-lead` | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Grok 4.1 Fast (xAI) | Fast enough for leadership/orchestration, cheap relative to premium coding models, and strong for routing and synthesis. | 2,000,000 | $0.20 / $0.50 | speed and conversational flow matter most |
| 2 | GPT-5.4 (OpenAI) | Best premium option when `crew-lead` needs giant context, better judgment, and broad mixed-task reasoning. | 1,050,000 | $2.50 / $15 | you want premium synthesis and deep context |
| 3 | Gemini 3.1 Pro (Google) | Very strong long-context lead brain for project state, docs, and multimodal orchestration. | 1,048,576 | $2 / $12 | you want huge context with better economics |
For actual code quality, repo surgery, debugging, and difficult implementation work.
| # | Model | Why it fits coders | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 (Anthropic) | Best practical default for day-to-day coding quality without Opus cost. | 1,000,000 | $3 / $15 | you want the safest premium coding default |
| 2 | GPT-5.3 Codex (OpenAI) | Excellent for targeted implementation and repair, especially in Codex-style harnesses. | 400,000 | $1.75 / $14 | you want strong repo surgery and OpenAI tooling |
| 3 | Kimi K2.5 (Moonshot) | Best cheaper coding challenger if you want strong output without premium closed-model pricing. | 262,144 | $0.45 / $2.20 | cost matters more than absolute peak quality |
For planning, audits, roadmap work, triage, and cheap high-volume review passes.
| # | Model | Why it fits PM / QA | Context | Pricing | Pick when |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash (Google) | Best cheap strong worker for PM, QA, SEO, and utility roles. | 1,048,576 | $0.30 / $2.50 | you need cheap volume without useless output |
| 2 | Grok 4.1 Fast Reasoning (xAI) | Good PM planning brain when you want speed plus reasoning. Same base pricing as Grok 4.1 Fast, with reasoning enabled. | 2,000,000 | $0.20 / $0.50 | planning quality matters more than raw cheapness |
| 3 | Gemini 2.5 Flash Lite (Google) | Ultra-cheap for repetitive review, summaries, and glue work. | 1,048,576 | $0.10 / $0.40 | you want the lowest-cost support lane |
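The per-1M prices in these tables turn into real budgets with simple arithmetic. A minimal helper (the token counts in the example are illustrative, the prices are the board's Gemini 2.5 Flash numbers):

```python
def job_cost_usd(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Estimate one job's cost from per-1M-token prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Gemini 2.5 Flash at $0.30 / $2.50: a 200k-in / 50k-out review pass
cost = job_cost_usd(200_000, 50_000, 0.30, 2.50)
# 0.2 * 0.30 + 0.05 * 2.50 = 0.06 + 0.125 = $0.185 per pass
```

At $0.185 a pass, a thousand daily review passes is roughly $185/day, which is why the cheap-utility lane matters more than leaderboard position for high-volume roles.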
Frontend implementation, UI polish, and component-heavy coding work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Safest premium default for frontend quality. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Strong targeted UI implementation and repair. | 400,000 | $1.75 / $14 |
| 3 | Kimi K2.5 | Best cheaper frontend coding challenger. | 262,144 | $0.45 / $2.20 |
Backend systems, APIs, infra glue, and long-horizon implementation.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Strong backend reasoning with good reliability. | 1,000,000 | $3 / $15 |
| 2 | GPT-5.3 Codex | Targeted backend implementation and fixer work. | 400,000 | $1.75 / $14 |
| 3 | GLM-5 | Open challenger for systems and backend-heavy coding. | 202,752 | $0.72 / $2.30 |
Threat review, careful code reading, and security analysis where mistakes are expensive.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Premium review model when false confidence is expensive. | 1,000,000 | $5 / $25 |
| 2 | Claude Sonnet 4.6 | Cheaper default for strong security reading and reasoning. | 1,000,000 | $3 / $15 |
| 3 | GPT-5.4 | Good mixed security plus architecture fallback. | 1,050,000 | $2.50 / $15 |
Git operations, PR prep, repo automation, and tool-oriented execution.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap and strong enough for git-oriented execution lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast operator model for repo routing and summaries. | 2,000,000 | $0.20 / $0.50 |
| 3 | GPT-5.3 Codex | Use when the git lane also needs real code judgment. | 400,000 | $1.75 / $14 |
Content, search pages, summaries, and high-volume marketing tasks.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Best cheap default for throughput-heavy content lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | MiniMax M2.5 | Cheap strong writer and productivity lane. | 196,608 | $0.27 / $0.95 |
| 3 | Claude Sonnet 4.6 | Use when quality matters more than token burn. | 1,000,000 | $3 / $15 |
Live research, citations, search-grounded work, and changing-source synthesis.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Perplexity Sonar | Search-grounded research lane. Retrieval matters as much as raw model IQ here. | Varies | Varies |
| 2 | Gemini 3.1 Pro | Strong long-context synthesis when result sets are large. | 1,048,576 | $2 / $12 |
| 3 | GPT-5.4 | Premium mixed research and reasoning fallback. | 1,050,000 | $2.50 / $15 |
Systems design, ML planning, architecture docs, and deep technical judgment.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | GPT-5.4 | Best mixed architecture and systems reasoning flagship. | 1,050,000 | $2.50 / $15 |
| 2 | Gemini 3.1 Pro | Great high-context planning for large system state. | 1,048,576 | $2 / $12 |
| 3 | Claude Sonnet 4.6 | Strong architecture plus coding crossover. | 1,000,000 | $3 / $15 |
Synthesis, routing, judging whether to continue, and decision-gate work.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Groq Llama 3.3 70B Versatile | Cheap, fast, and already practical for routing and gate decisions. | 128,000 | Groq tier |
| 2 | Grok 4.1 Fast | Upgrade when coordinators need bigger context and stronger synthesis. | 2,000,000 | $0.20 / $0.50 |
| 3 | Ollama Qwen 2.5 3B / Llama 3.1 8B | Local fallback when you want near-zero cost, privacy, or offline coordination. | 128,000 | Local |
Bridge roles, messaging surfaces, and lightweight conversational support.
| # | Model | Why | Context | Pricing |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Cheap strong default for messaging and bridge lanes. | 1,048,576 | $0.30 / $2.50 |
| 2 | Grok 4.1 Fast | Fast conversational fallback with huge context. | 2,000,000 | $0.20 / $0.50 |
| 3 | Gemini 2.5 Flash Lite | Lowest-cost utility lane for repetitive bridge work. | 1,048,576 | $0.10 / $0.40 |
`crew-cli` is not one flat model call. The code splits responsibilities across L1 chat, L2 reasoning/planning, and L3 workers. This matters because the best model for chat is not the best model for decomposition, and the best model for decomposition is often too expensive for every small worker task.
Best fits: Grok 4.1 Fast, Gemini 2.5 Flash, GPT-5 mini.
L1 should stay responsive and not burn premium reasoning tokens on greetings, clarifications, or simple chat turns. In the code, L1 does not execute tasks; it passes them to L2 and synthesizes results back.
Best fits: Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro.
This is where the router, planner, decomposer, and validator live. This is the right place for expensive reasoning models because L2 is deciding how to break down and validate the work graph.
Best fits: Kimi K2.5, Gemini 2.5 Flash, GPT-5.3 Codex.
L3 workers should usually execute bounded tasks, not act like mini CEOs. That means cheap fast models are often correct here, with premium models reserved for hard coding workers or fallback lanes.
L1 is chat only, L2 is router/reasoner/planner, and L3 is the worker execution layer. That matches the implementation in crew-cli, not just a marketing diagram.
Use a cheap fast model for L1, a premium reasoning model for L2, and smaller faster workers for L3 unless a task clearly needs a premium coding model.
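That three-layer split can be sketched as a routing function. The defaults below are illustrative picks from this page's tables, and the slugs are hypothetical:

```python
# Hypothetical defaults for the L1/L2/L3 split described above.
LAYER_DEFAULTS = {
    "L1": "x-ai/grok-4.1-fast",            # cheap, fast chat surface
    "L2": "anthropic/claude-sonnet-4.6",   # premium router/planner/decomposer
    "L3": "google/gemini-2.5-flash",       # bounded worker tasks
}

def pick_model(layer: str, hard_coding_task: bool = False) -> str:
    """Route by layer; escalate only L3 workers that truly need a coding model."""
    if layer == "L3" and hard_coding_task:
        return "openai/gpt-5.3-codex"  # premium coding worker, per the coder table
    return LAYER_DEFAULTS[layer]
```

The escalation flag is the whole trick: it keeps premium spend tied to individual hard tasks instead of leaking into every small worker call.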
This is the distinction most model pages miss. Some crewswarm roles are mostly direct LLMs. Coding roles usually need a real execution engine — Claude Code CLI, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or the native crew-cli executor — so the model gets tool access, file edits, and terminal capabilities. Assigning a powerful model like Claude Opus 4.6 or GPT-5.4 to a coding role via Direct API wastes its potential; route it through a CLI engine instead. The chat model you talk to and the model that actually executes work do not always need to be the same.
| Agent lane | Chat brain | Exec engine | Exec model | Why |
|---|---|---|---|---|
| crew-lead | Grok 4.1 Fast, GPT-5.4, Gemini 3.1 Pro | Direct API or Cursor CLI | Usually same as chat brain | Leadership is mostly synthesis and routing. It does not always need a coding CLI. |
| crew-coder / crew-fixer | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, Kimi K2.5 | These roles need real file edits, terminals, tools, and multi-step execution. |
| crew-coder-front / back | Claude Sonnet 4.6, GPT-5.4 | Codex CLI, Claude Code, OpenCode | GPT-5.3 Codex, Claude Sonnet 4.6, GLM-5, Kimi K2.5 | Same pattern as coders: strong chat model up top, real worker model underneath. |
| crew-pm / crew-qa / crew-seo | Gemini 2.5 Flash, Grok 4.1 Fast Reasoning | Direct API | Usually same as chat brain | These are mostly analysis, summaries, triage, and planning lanes. A CLI is often unnecessary. |
| crew-researcher | Perplexity Sonar, Gemini 3.1 Pro, GPT-5.4 | Direct API | Usually same as chat brain | Research-first roles care more about search and citations than about code execution. |
| crew-main / orchestrator / judge | Groq Llama 3.3 70B, Grok 4.1 Fast | Direct API | Usually same as chat brain | Coordinator roles are often cheap chat brains unless escalated. |
| Local / private lane | Ollama Llama 3.1 8B | Direct API or local OpenCode lane | Ollama Qwen 2.5 Coder 7B, Ollama Llama 3.1 8B | Useful when privacy, offline use, or near-zero cost matters more than raw quality. |
crew-pm, crew-qa, crew-seo, crew-copywriter, crew-telegram, and coordinator roles run fine as cheap, fast Direct API lanes. Use Gemini 2.5 Flash, Grok 4.1 Fast, or Perplexity Sonar — no CLI overhead needed.
crew-coder, crew-fixer, crew-coder-front, and crew-coder-back should always run through a CLI engine (Claude Code, Codex CLI, Gemini CLI, Cursor CLI, OpenCode, or crew-cli). Assign strong models here — Claude Sonnet 4.6, GPT-5.4, Kimi K2.5 — so they get tool access, file edits, and terminals.
A model like Claude Opus 4.6 or GPT-5.4 behind a Direct API call can only generate text. Route it through a CLI engine and it can read files, run tests, edit code, and use 45+ built-in tools. Execution mode matters as much as model choice.
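The Direct API vs CLI engine split is a one-line routing decision once the coding roles are enumerated. A minimal sketch (role names are from this page; the mode strings are placeholders for whatever the dispatcher expects):

```python
# Roles that need file edits, terminals, and tools, per the engine table above.
CODING_ROLES = {"crew-coder", "crew-fixer", "crew-coder-front", "crew-coder-back"}

def execution_mode(role: str) -> str:
    """Coding roles get a CLI engine (tools, file edits, terminal);
    analysis and coordination roles stay on cheap Direct API calls."""
    return "cli-engine" if role in CODING_ROLES else "direct-api"
```

Keeping this as an explicit set rather than a heuristic means a new role defaults to the cheap Direct API lane until someone deliberately promotes it.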
Cursor CLI is a good reasoning/chat surface, especially with composer-2-fast, but it should not be treated as the universal answer for every hard execution lane.
Short answers to the actual search queries people type when choosing a model stack.
Grok 4.1 Fast is a strong default when you want fast orchestration and lower burn. Use GPT-5.4 or Gemini 3.1 Pro when `crew-lead` needs giant context and stronger synthesis.
Claude Sonnet 4.6 is the best practical premium default for crew-coder and crew-fixer. Use Claude Opus 4.6 for the hardest jobs. Use Kimi K2.5 when you want a much cheaper coding challenger.
Gemini 2.5 Flash is the main answer. It is cheap, fast, and strong enough for crew-qa, crew-pm, crew-seo, triage, and summaries. Gemini 2.5 Flash Lite is the lower-cost fallback for repetitive glue work.
Neither. OpenRouter is a unified API and routing layer. It gives you one endpoint for many vendors and can auto-route or fail over across providers. The agent behavior still comes from tools like crewswarm, Cursor, Claude Code, or Codex CLI.
No. crewswarm is better with per-agent assignments: one model for `crew-lead`, one for coders, and one cheap workhorse for QA and support roles. That keeps quality high without setting money on fire.
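Those per-agent assignments can live in one small config. The mapping below is an illustrative reading of this page's recommendations, with hypothetical model slugs:

```python
# Illustrative per-agent assignments following the answers above.
AGENT_MODELS = {
    "crew-lead":  "x-ai/grok-4.1-fast",
    "crew-coder": "anthropic/claude-sonnet-4.6",
    "crew-fixer": "anthropic/claude-sonnet-4.6",
    "crew-qa":    "google/gemini-2.5-flash",
    "crew-pm":    "google/gemini-2.5-flash",
}

def model_for(agent: str, default: str = "google/gemini-2.5-flash-lite") -> str:
    """Look up an agent's assigned model; unknown agents fall back cheap."""
    return AGENT_MODELS.get(agent, default)
```

Note the fallback is the cheapest utility model, not the premium one: an unconfigured agent should degrade toward low cost, not toward quietly burning flagship tokens.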