crew-cli is the portable execution engine inside CrewSwarm. It runs real agentic coding loops: route the task, plan when needed, use tools, write files, run commands, validate the result, and keep going until the work is done.
The point is not just "another CLI." The point is escaping the one-agent, one-vendor bottleneck. Bring whatever keys you have, mix in local lanes, and keep building when Claude Code, Codex, Gemini, or any other runtime hits its limits.
Use cheap or local models for routing and worker churn. Reserve premium models for planning, validation, and hard reasoning. Pay for intelligence where it matters, not on every token of every step.
npm install -g crewswarm-cli
Want the practical tradeoffs? Compare crew-cli with Claude, Codex, Cursor, Gemini, and OpenCode.
Most AI coding tools still assume one person driving one model in one lane. That breaks down fast when the work needs retries, tools, parallelism, or another provider.
A single chat thread is fine for small fixes. Real engineering work needs planning, execution, validation, retries, and sometimes multiple lanes moving at once.
Rate limits, quota, outages, and model drift should not stop the job. crew-cli is built to keep moving across providers, local models, and external runtimes.
Not every step needs premium reasoning. crew-cli lets you spend on the planning brains and keep cheap or local lanes doing the glue work.
Each command uses the same 3-tier pipeline underneath. Simple tasks skip planning. Complex tasks get full artifact generation.
One-shot task execution. Describe what you want, get files on disk. Routes automatically — simple tasks execute directly, complex ones get planned first.
crew chat "Add error handling to server.mjs" --apply
Generate 7 planning artifacts before writing code: PDD, ROADMAP, ARCH, SCAFFOLD, CONTRACT-TESTS, DOD, GOLDEN-BENCHMARKS. Dual-model validation with risk assessment.
crew plan "Build user auth with OAuth"
TDD pipeline: generates tests first, implements to pass them, then validates. Catches its own bugs. Three LLM calls, ~$0.0002.
crew test-first "write add(a,b) with full test coverage"
Blind code review. Scores correctness, security, performance, readability, test coverage. Returns a PASS/FIX verdict with actionable items.
crew validate src/auth.mjs
Autonomous mode. Iterates until the task is complete — reads files, writes code, runs commands, verifies output. No human in the loop.
crew auto "create a greet function with tests"
Interactive session with colored diff previews before changes are applied, plus session history, memory, and mode switching (manual/assist/autopilot). /preview to review, /apply or /rollback to act, /sessions to list past sessions, /resume [id] to continue one.
crew repl --mode autopilot
Apply sandbox changes with safety gates. Blast-radius analysis blocks risky diffs. --check runs your test suite and parses diagnostics — TSC, ESLint, Go, Rust, pytest errors get fed back to crew-fixer for targeted retry.
crew apply --check "npm test" --retries 3
Health check in 4 seconds: Node.js, Git, API keys, gateway connectivity, MCP servers, CLI updates. Suggests cheapest providers when no keys are configured.
crew doctor
crew-cli is built for the real constraint in AI terminals: one vendor hits a wall, but the job still needs to finish.
Use the providers you already pay for: OpenAI, Anthropic, Google, Groq, DeepSeek, xAI, OpenRouter, local models, and more. No middleman markup, no platform lock-in.
If one model path is rate-limited, out of quota, or temporarily unavailable, crew-cli can fall through to another configured provider instead of dying on a single stack.
Claude Code, Codex CLI, Cursor, Gemini, and direct APIs each have different strengths and failure modes. crew-cli is the portable execution lane when one runtime hits a wall mid-task.
Mix local models into the same workflow for effectively free parallel throughput. Keep premium APIs for the hard parts and offload glue work, summaries, and background execution to local lanes.
Use fast cheap lanes for L1 routing and L3 worker churn. Spend premium tokens on L2 planning, validation, and hard reasoning only when the task actually justifies it.
Every task flows through three stages. Simple tasks skip straight to execution. Complex tasks get full planning with risk validation.
L1 (Router): a fast, cheap lane decides how to handle the task: direct answer, single execution, or decomposition into more work. This is where cheap hosted models or local router lanes shine.
L2 (Planner): the expensive brain layer: planning, validation, risk checks, and hard reasoning. Use your premium context-heavy models here when the task actually needs them.
L3 (Worker): a tool-using worker lane with 45+ built-in tools: file I/O, shell, git, LSP, browser, and web search. Use strong cheap models, premium coders, or local workers depending on cost and quality needs.
Attach images via --image flag or /image REPL command. Native support for Gemini, GPT-4o, Claude 3, Grok Vision — no base64 text dumps.
File I/O, shell, git, LSP diagnostics, Jupyter notebooks, web search, Docker sandbox, memory, multi-turn sub-agents, worktree isolation — more tools than any competitor CLI.
10x more token-efficient than JSON-RPC. Stream-parseable, no escaping, and supports graceful partial execution if a model times out.
Execute independent tasks concurrently. Achieve a 2.96x wall-clock speedup over sequential implementation cycles.
Multi-agent waves get automatic git worktree isolation — each agent works on its own branch so parallel file edits never conflict. Merges back after the wave completes.
Language Server Protocol integration identifies syntax/type errors in the sandbox. Agents fix their own bugs before human review.
Persistent cross-session memory with MemoryBroker. Agents recall facts, decisions, and prior task results across conversations.
Not just pass/fail — --check parses error output into structured diagnostics (TSC, ESLint, GCC, Go, Rust, pytest). Feeds specific file:line:col errors to crew-fixer. Stops early when no progress is detected.
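As an illustration of the kind of extraction --check performs, here is a minimal sketch (not crew-cli's actual parser) that reduces a TypeScript compiler error line to a file:line:col diagnostic:

```shell
# Hypothetical sketch: pull file:line:col out of a tsc error line.
# crew-cli's real parser handles many formats; this only shows the idea.
line='src/auth.ts(42,7): error TS2339: Property "id" does not exist.'
echo "$line" | sed -E 's/^([^(]+)\(([0-9]+),([0-9]+)\).*/\1:\2:\3/'
# prints src/auth.ts:42:7
```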
Git stash snapshots every 60 seconds during long pipeline runs. Roll back to any point via git stash list. Configure with CREW_CHECKPOINT_INTERVAL_MS. Zero overhead when idle.
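The rollback itself is plain git. A self-contained sketch of what restoring a stash checkpoint looks like (the repo and file names here are made up):

```shell
# Toy repo demonstrating stash-based rollback; paths are illustrative.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email a@b.c && git config user.name crew
echo "v1" > app.mjs && git add app.mjs && git commit -qm init
echo "good state" > app.mjs              # in-progress work
git stash push -q -m "crew checkpoint"   # what the periodic snapshot does
echo "broken state" > app.mjs            # later, unwanted edits
git checkout -q -- app.mjs               # discard them
git stash pop -q                         # reapply the checkpoint
cat app.mjs                              # prints: good state
```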
Run competing implementation strategies in parallel branches. Compare architectural diffs side-by-side and merge the winner.
Headless Chrome integration for visual QA. Agents can inspect live DOM state and fix CSS/UX issues autonomously.
Token-by-token output as the LLM generates. All providers stream — Gemini, OpenAI, Anthropic, Grok, DeepSeek, Groq, and OpenRouter. No buffered waits.
Full conversation history persists across REPL sessions via SessionManager. Resume where you left off with /history, /status, /clear, /sessions, and /resume.
Built-in diagnostics: checks Node.js version, Git, API keys, gateway, MCP, and CLI updates in under 4 seconds. Suggests cheapest providers when no keys are set.
/sessions lists past sessions, /resume [id] picks up where you left off. JSONL crash-safe transcripts survive mid-write crashes — no lost context, ever.
.crew/hooks.json lets you intercept any tool call. PreToolUse can block dangerous commands, PostToolUse can log everything. Shell commands with JSON on stdin.
Agents work in isolated git worktrees on separate branches. No file conflicts during parallel work. Auto-cleanup if no changes, squash merge if changes made.
Context compression adapts to how full your context window is. Light compression at 50%, aggressive at 75%+. Per-model context window awareness keeps agents sharp.
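The threshold logic described above can be sketched in a few lines. The cut-offs come from this page; the code itself is illustrative, not crew-cli internals:

```shell
# Pick a compression level from context-window usage (percent).
usage=80
if   [ "$usage" -ge 75 ]; then level=aggressive
elif [ "$usage" -ge 50 ]; then level=light
else                           level=none
fi
echo "$level"   # prints aggressive
```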
Multi-wave pipelines use labeled tmux panes for cross-agent context sharing. Agent A's output, cwd, and env vars are handed off to Agent B via the session manager. Zero cold starts between pipeline waves.
Intercept any tool call with .crew/hooks.json. Block dangerous shell commands, log every file write, or transform tool input before execution. JSON piped on stdin to your shell scripts.
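A minimal PreToolUse hook might look like the following. The JSON shape on stdin and the "rm -rf" check are assumptions for illustration, not the documented schema:

```shell
#!/bin/sh
# pre_tool_use.sh: exit non-zero to block a tool call.
# Reads the tool-call JSON on stdin (shape assumed, not from the docs).
payload=$(cat)
case "$payload" in
  *'rm -rf'*)
    echo 'blocked: destructive shell command' >&2
    exit 1
    ;;
esac
exit 0
```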
Real-time token spend per model with prompt cache savings. Tracks Anthropic 90% cache discount, Groq 50%, Google free tier. Dashboard shows per-agent, per-model cost breakdown.
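As a back-of-envelope example of what a 90% cache discount means in practice (the $3.00/M input rate and token counts are illustrative, not quoted prices):

```shell
# 2M input tokens at $3.00/M; 1.5M of them hit a 90% prompt-cache discount.
awk 'BEGIN {
  full   = 2.0 * 3.00                      # cost with no cache
  cached = 0.5 * 3.00 + 1.5 * 3.00 * 0.10  # uncached + discounted portion
  printf "full=$%.2f cached=$%.2f saved=$%.2f\n", full, cached, full - cached
}'
# prints full=$6.00 cached=$1.95 saved=$4.05
```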
Detects stuck agents: questions instead of work, plans instead of code, incomplete bail-outs. Auto-corrects with targeted prompts. Not just backoff — adaptive recovery.
Built-in MCP server exposes the full swarm via JSON-RPC. dispatch_agent, run_pipeline, chat_send, crewswarm_status — any MCP client can orchestrate the fleet.
Most AI coding CLIs run a simple loop: prompt, tool call, repeat. crew-cli runs a quality-aware engine that learns from mistakes, proves its work, and picks the right specialist for each task.
When a tool call fails, the engine remembers. Same failing command? Blocked automatically. The model gets explicit "don't repeat this" context injected into every subsequent turn, forcing it to try a different approach instead of looping.
The engine extracts verification goals from your task ("tests pass", "build succeeds", "lint clean") and tracks them as first-class state. It won't declare success until every goal is proven. If verification fails, it gets extra turns to fix — then re-verifies.
Every file edit is evaluated in real time: was the file read before writing? Is the same file being churned repeatedly? Are edits staying in scope? The critic injects quality guidance into the next turn — no extra LLM call needed.
When work is split into parallel units, the engine scores each specialist persona against the task: language match, complexity, historical success rate, recent failures. Bug fixes route to the fixer. Docs route to the writer. Performance data improves rankings over time.
Instead of flattening tool results into text (losing context on every turn), the engine preserves rich state: which files were read vs written, what goals are active, what failed and why. This survives compaction — the model always knows what matters.
Each turn, the engine scores possible next actions (read, search, edit, test, verify) based on what's happened so far. Just edited without verifying? Verify ranks highest. Same search three times? It gets penalized. The model sees ranked suggestions, not just raw tools.
crew-cli is built for the complexity of professional software development, not just small demos.
| Feature | crew-cli | Claude Code | Codex CLI | Gemini CLI | Cursor |
|---|---|---|---|---|---|
| Multi-model routing | ✅ 10+ Providers | ❌ Anthropic Only | ❌ OpenAI Only | ❌ Google Only | ✅ Native |
| Multimodal (Images) | ✅ All Providers | ✅ Claude Vision | ❌ Text Only | ✅ Gemini Vision | ✅ Native |
| Built-in Tools | ✅ 45+ Tools | ✅ ~15 Tools | ✅ ~10 Tools | ✅ ~12 Tools | ✅ ~20 Tools |
| Sandbox + Branching | ✅ Professional | ❌ Direct Write | ✅ Sandbox | ❌ Direct Write | ❌ Passive |
| Parallel Dispatch | ✅ 21 Specialists | ✅ Subagents | ❌ Single Agent | ❌ Single Agent | ✅ Subagents |
| Agent Memory | ✅ Cross-Session | ❌ Per-Session | ❌ Per-Session | ✅ Gems | ❌ Per-Session |
| Diagnostic Lint-Loop | ✅ Parsed Errors | ❌ Manual | ❌ Manual | ❌ Manual | ✅ loop_on_lints |
| Browser Debugging | ✅ Headless Chrome | ❌ No UI Vision | ❌ No | ❌ No | ❌ Passive |
| Cost Tracking | ✅ Per-Session | ✅ Integrated | ❌ No | ❌ No | ❌ No Granularity |
| Streaming Output | ✅ All Providers | ✅ Native | ✅ Native | ✅ Native | ✅ Native |
| Diagnostics CLI | ✅ crew doctor | ❌ No | ❌ No | ❌ No | ❌ No |
| Session Memory | ✅ Persistent | ✅ Per-Conversation | ❌ Stateless | ✅ Gems | ❌ Per-Session |