Portable Execution Engine · 24 Providers · Local + Hosted Lanes

One agent is too sequential.
crew-cli keeps the work moving.

crew-cli is the portable execution engine inside CrewSwarm. It runs real agentic coding loops: route the task, plan when needed, use tools, write files, run commands, validate the result, and keep going until the work is done.

The point is not just "another CLI." The point is escaping the one-agent, one-vendor bottleneck. Bring whatever keys you have, mix in local lanes, and keep building when Claude Code, Codex, Gemini CLI, or another runtime hits limits.

Use cheap or local models for routing and worker churn. Reserve premium models for planning, validation, and hard reasoning. Pay for intelligence where it matters, not on every token of every step.

npm install -g crewswarm-cli

Want the practical tradeoffs? Compare crew-cli with Claude, Codex, Cursor, Gemini, and OpenCode.

$ crew chat "Build a REST API with JWT auth"
[Router] → EXECUTE-DIRECT (fast model, $0.0001)
[Executor] Trying provider: grok
[Grok] cache hit: 384/844 tokens (45%)
🔧 write_file(/tmp/api/server.mjs)
🔧 write_file(/tmp/api/auth.mjs)
🔧 run_cmd(node --test test.mjs)
✅ 3 files created, tests passing. Cost: $0.004

Why It Exists

Built for the real bottleneck

Most AI coding tools still assume one person driving one model in one lane. That breaks down fast when the work needs retries, tools, parallelism, or another provider.

One Agent Is Too Sequential

A single chat thread is fine for small fixes. Real engineering work needs planning, execution, validation, retries, and sometimes multiple lanes moving at once.

One Vendor Is Too Fragile

Rate limits, quota, outages, and model drift should not stop the job. crew-cli is built to keep moving across providers, local models, and external runtimes.

One Price Tier Is Wasteful

Not every step needs premium reasoning. crew-cli lets you spend on the planning brains and keep cheap or local lanes doing the glue work.

Commands

Eight ways to work

Each command uses the same 3-tier pipeline underneath. Simple tasks skip planning. Complex tasks get full artifact generation.

crew chat

One-shot task execution. Describe what you want, get files on disk. Routes automatically — simple tasks execute directly, complex ones get planned first.

crew chat "Add error handling to server.mjs" --apply

crew plan

Generate 7 planning artifacts before writing code: PDD, ROADMAP, ARCH, SCAFFOLD, CONTRACT-TESTS, DOD, GOLDEN-BENCHMARKS. Dual-model validation with risk assessment.

crew plan "Build user auth with OAuth"

crew test-first

TDD pipeline: generates tests first, implements to pass them, then validates. Catches its own bugs. Three LLM calls, ~$0.0002.

crew test-first "write add(a,b) with full test coverage"

crew validate

Blind code review. Scores correctness, security, performance, readability, test coverage. Returns a PASS/FIX verdict with actionable items.

crew validate src/auth.mjs

crew auto

Autonomous mode. Iterates until the task is complete — reads files, writes code, runs commands, verifies output. No human in the loop.

crew auto "create a greet function with tests"

crew repl

Interactive session with colored diff preview before applying changes, session history, memory, and mode switching (manual/assist/autopilot). Use /preview to review, /apply or /rollback to commit or revert, /sessions to list past sessions, and /resume [id] to continue one.

crew repl --mode autopilot

crew apply

Apply sandbox changes with safety gates. Blast-radius analysis blocks risky diffs. --check runs your test suite and parses diagnostics — TSC, ESLint, Go, Rust, pytest errors get fed back to crew-fixer for targeted retry.

crew apply --check "npm test" --retries 3

crew doctor

Health check in 4 seconds: Node.js, Git, API keys, gateway connectivity, MCP servers, CLI updates. Suggests cheapest providers when no keys are configured.

crew doctor

Resilience

No single-provider trap

crew-cli is built for the real constraint in AI terminals: one vendor hits a wall, but the job still needs to finish.

Bring Your Own Keys

Use the providers you already pay for: OpenAI, Anthropic, Google, Groq, DeepSeek, xAI, OpenRouter, local models, and more. No middleman markup, no platform lock-in.

Fail Over, Don’t Stall

If one model path is rate-limited, out of quota, or temporarily unavailable, crew-cli can fall through to another configured provider instead of dying on a single stack.
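The fall-through behavior can be sketched as a simple loop over configured providers. withFailover and callProvider are hypothetical stand-ins for illustration, not crew-cli's actual API:

```javascript
// Sketch: try each configured provider in order until one succeeds.
// withFailover and callProvider are illustrative names, not crew-cli internals.
async function withFailover(providers, prompt, callProvider) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await callProvider(provider, prompt);
    } catch (err) {
      // Rate limit, quota, or outage: note it and fall through to the next lane.
      errors.push(`${provider}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```

The key property is that a single 429 or outage costs one retry hop, not the whole task.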

CLI Limits Aren’t Terminal

Claude Code, Codex CLI, Cursor, Gemini, and direct APIs each have different strengths and failure modes. crew-cli is the portable execution lane when one runtime hits a wall mid-task.

Local Lanes Stay Cheap

Mix local models into the same workflow for effectively free parallel throughput. Keep premium APIs for the hard parts and offload glue work, summaries, and background execution to local lanes.

Pay For The Brain, Not The Glue

Use fast cheap lanes for L1 routing and L3 worker churn. Spend premium tokens on L2 planning, validation, and hard reasoning only when the task actually justifies it.

Architecture

The 3-tier pipeline

Every task flows through three stages. Simple tasks skip straight to execution. Complex tasks get full planning with risk validation.

🚦

Tier 1: Router

Fast, cheap lane decides how to handle the task: direct answer, single execution, or decomposition into more work. This is where cheap hosted models or local router lanes shine.

Groq / Grok / Gemini Flash / local router lanes

🗺️

Tier 2: Planner

This is the expensive brain layer: planning, validation, risk checks, and hard reasoning. Use your premium context-heavy models here when the task actually needs them.

GPT / Claude / Gemini Pro / premium reasoning models

Tier 3: Executor

Tool-using worker lane with 45+ built-in tools: file I/O, shell, git, LSP, browser, and web search. Use strong cheap models, premium coders, or local workers depending on cost and quality needs.

Gemini / Codex / Claude / Groq / local worker lanes

Technical Manual

Engineering-First Terminals

🖼️ Multimodal Vision

Attach images via --image flag or /image REPL command. Native support for Gemini, GPT-4o, Claude 3, Grok Vision — no base64 text dumps.

🔧 45+ Built-in Tools

File I/O, shell, git, LSP diagnostics, Jupyter notebooks, web search, Docker sandbox, memory, multi-turn sub-agents, worktree isolation — more tools than any competitor CLI.

ATAT Protocol

10x more token-efficient than JSON-RPC. Stream-parseable, no escaping, and supports graceful partial execution if a model times out.

View Spec →

Parallel Worker Pool

Execute independent tasks concurrently. Achieve a 2.96x wall-clock speedup over sequential implementation cycles.

Git Worktree Isolation

Multi-agent waves get automatic git worktree isolation — each agent works on its own branch so parallel file edits never conflict. After the wave completes, changed worktrees are squash-merged back and untouched ones are cleaned up automatically.

LSP Self-Healing

Language Server Protocol integration identifies syntax/type errors in the sandbox. Agents fix their own bugs before human review.

🧠 Agent Memory

Persistent cross-session memory with MemoryBroker. Agents recall facts, decisions, and prior task results across conversations.

Diagnostic Lint-Loop

Not just pass/fail — --check parses error output into structured diagnostics (TSC, ESLint, GCC, Go, Rust, pytest). Feeds specific file:line:col errors to crew-fixer. Stops early when no progress is detected.
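As a sketch of the parsing step, here is how TSC-style output could be turned into structured diagnostics. parseTscDiagnostics is a hypothetical helper for illustration, not crew-cli's internal API:

```javascript
// Sketch: parse TypeScript compiler output into structured diagnostics.
// parseTscDiagnostics is an illustrative helper, not crew-cli's real parser.
function parseTscDiagnostics(output) {
  // Matches lines like: src/auth.ts(42,7): error TS2345: <message>
  const pattern = /^(.+)\((\d+),(\d+)\): (error|warning) (TS\d+): (.+)$/;
  const diagnostics = [];
  for (const line of output.split("\n")) {
    const m = line.match(pattern);
    if (m) {
      diagnostics.push({
        file: m[1],
        line: Number(m[2]),
        col: Number(m[3]),
        severity: m[4],
        code: m[5],
        message: m[6],
      });
    }
  }
  return diagnostics;
}

const sample = "src/auth.ts(42,7): error TS2345: Argument of type 'string' is not assignable.";
console.log(parseTscDiagnostics(sample));
```

Structured file:line:col records are what make a targeted retry possible, versus re-prompting with a raw error dump.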

Checkpoint-at-Interval

Git stash snapshots every 60 seconds during long pipeline runs. Roll back to any point via git stash list. Configure with CREW_CHECKPOINT_INTERVAL_MS. Zero overhead when idle.

Speculative Explore

Run competing implementation strategies in parallel branches. Compare architectural diffs side-by-side and merge the winner.

Browser Debugging

Headless Chrome integration for visual QA. Agents can inspect live DOM state and fix CSS/UX issues autonomously.

⚡ Real-time Streaming

Token-by-token output as the LLM generates. All providers stream — Gemini, OpenAI, Anthropic, Grok, DeepSeek, Groq, and OpenRouter. No buffered waits.

🔄 Session Continuity

Full conversation history persists across REPL sessions via SessionManager. Resume where you left off with /history, /status, /clear, /sessions, and /resume.

🩺 crew doctor

Built-in diagnostics: checks Node.js version, Git, API keys, gateway, MCP, and CLI updates in under 4 seconds. Suggests cheapest providers when no keys are set.

🔄 Session Resume

/sessions lists past sessions, /resume [id] picks up where you left off. JSONL crash-safe transcripts survive mid-write crashes — no lost context, ever.

🪝 Tool Hooks

.crew/hooks.json lets you intercept any tool call. PreToolUse hooks can block dangerous commands or transform tool input before execution; PostToolUse hooks can log every file write. Your shell scripts receive each call as JSON on stdin.
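A hooks file might look like the following. The PreToolUse/PostToolUse structure comes from the description above, but the exact field names (matcher, command) are assumptions, not crew-cli's documented schema:

```json
{
  "PreToolUse": [
    { "matcher": "run_cmd", "command": "./scripts/block-dangerous.sh" }
  ],
  "PostToolUse": [
    { "matcher": "write_file", "command": "./scripts/log-writes.sh" }
  ]
}
```

Each matched script would receive the tool call as JSON on stdin; treat this as a shape sketch and consult the project docs for the real schema.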

📊 Token-Aware Compaction

Context compression adapts to how full your context window is. Light compression at 50%, aggressive at 75%+. Per-model context window awareness keeps agents sharp.
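The thresholds above can be sketched as a small selector function; the names and return values here are illustrative, not crew-cli's internals:

```javascript
// Sketch: pick a compaction level from the context-window fill ratio.
// Thresholds mirror the description above; names are illustrative.
function compactionLevel(usedTokens, contextWindow) {
  const fill = usedTokens / contextWindow;
  if (fill >= 0.75) return "aggressive"; // trim old tool output hard
  if (fill >= 0.5) return "light";       // summarize older turns
  return "none";                          // plenty of headroom left
}

console.log(compactionLevel(96000, 128000)); // 0.75 full → "aggressive"
```

Keying the decision to each model's own context window is what keeps a 32k local lane and a 200k premium lane equally sharp.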

🖥️ tmux Session Handoff

Multi-wave pipelines use labeled tmux panes for cross-agent context sharing. Agent A's output, cwd, and env vars are handed off to Agent B via the session manager. Zero cold starts between pipeline waves.

💰 Cost Tracking

Real-time token spend per model with prompt cache savings. Tracks Anthropic 90% cache discount, Groq 50%, Google free tier. Dashboard shows per-agent, per-model cost breakdown.
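The cache math works roughly like this; the function, rate, and discount below are illustrative, not crew-cli's actual pricing table:

```javascript
// Sketch: estimate prompt cost when some tokens are served from cache.
// promptCost and the numbers used are illustrative assumptions.
function promptCost(tokens, cachedTokens, pricePerMTok, cacheDiscount) {
  const fresh = tokens - cachedTokens;            // billed at full rate
  const cached = cachedTokens * (1 - cacheDiscount); // billed at discount
  return ((fresh + cached) * pricePerMTok) / 1e6;
}

// Example: 844 prompt tokens, 384 cached at a 90% discount, $3 per MTok.
console.log(promptCost(844, 384, 3.0, 0.9).toFixed(6));
```

The same formula explains why cache-heavy loops (like the 384/844 hit shown earlier) stay cheap even on premium models.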

🔄 Intelligent Retry

Detects stuck agents: questions instead of work, plans instead of code, incomplete bail-outs. Auto-corrects with targeted prompts. Not just backoff — adaptive recovery.

🔌 64 MCP Tools

Built-in MCP server exposes the full swarm via JSON-RPC. dispatch_agent, run_pipeline, chat_send, crewswarm_status — any MCP client can orchestrate the fleet.

Under the Hood

Execution Quality Engine

Most AI coding CLIs run a simple loop: prompt, tool call, repeat. crew-cli runs a quality-aware engine that learns from mistakes, proves its work, and picks the right specialist for each task.

Failure Memory

When a tool call fails, the engine remembers. Same failing command? Blocked automatically. The model gets explicit "don't repeat this" context injected into every subsequent turn, forcing it to try a different approach instead of looping.
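The behavior can be sketched as a small failure-memory class. FailureMemory and its methods are hypothetical illustrations of the idea, not crew-cli's API:

```javascript
// Sketch: remember failing tool calls, block exact repeats, and surface
// "don't repeat this" context for the next turn. Illustrative only.
class FailureMemory {
  constructor() {
    this.failures = new Map();
  }

  record(tool, args, error) {
    this.failures.set(`${tool}:${JSON.stringify(args)}`, error);
  }

  isBlocked(tool, args) {
    return this.failures.has(`${tool}:${JSON.stringify(args)}`);
  }

  // Lines injected into subsequent turns so the model tries something new.
  contextHint() {
    return [...this.failures].map(
      ([call, err]) => `Do not repeat ${call} (it failed with: ${err})`
    );
  }
}

const memory = new FailureMemory();
memory.record("run_cmd", { cmd: "npm test" }, "exit code 1");
console.log(memory.isBlocked("run_cmd", { cmd: "npm test" })); // true
```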

Verification-First

The engine extracts verification goals from your task ("tests pass", "build succeeds", "lint clean") and tracks them as first-class state. It won't declare success until every goal is proven. If verification fails, it gets extra turns to fix — then re-verifies.
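A minimal sketch of goal extraction and tracking, assuming a simple phrase list; the patterns and function names are illustrative, not the engine's real implementation:

```javascript
// Sketch: pull verification goals out of a task description and track
// whether each has been proven. Patterns and names are illustrative.
const GOAL_PATTERNS = [
  [/tests? pass/i, "tests pass"],
  [/build succeeds?/i, "build succeeds"],
  [/lint clean/i, "lint clean"],
];

function extractGoals(task) {
  return GOAL_PATTERNS.filter(([re]) => re.test(task)).map(([, goal]) => ({
    goal,
    proven: false,
  }));
}

// Success is only declared once every extracted goal is proven.
function allProven(goals) {
  return goals.every((g) => g.proven);
}

const goals = extractGoals("Refactor auth so tests pass and lint clean");
console.log(allProven(goals)); // false until verification actually runs
```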

Patch Critic

Every file edit is evaluated in real time: was the file read before writing? Is the same file being churned repeatedly? Are edits staying in scope? The critic injects quality guidance into the next turn — no extra LLM call needed.

Smart Delegation

When work is split into parallel units, the engine scores each specialist persona against the task: language match, complexity, historical success rate, recent failures. Bug fixes route to the fixer. Docs route to the writer. Performance data improves rankings over time.

Structured History

Instead of flattening tool results into text (losing context on every turn), the engine preserves rich state: which files were read vs written, what goals are active, what failed and why. This survives compaction — the model always knows what matters.

Action Ranking

Each turn, the engine scores possible next actions (read, search, edit, test, verify) based on what's happened so far. Just edited without verifying? Verify ranks highest. Same search three times? It gets penalized. The model sees ranked suggestions, not just raw tools.
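A toy version of those ranking heuristics looks like this; the scores and rules are illustrative, and crew-cli's real scorer is certainly richer:

```javascript
// Sketch: score candidate next actions from recent history.
// Base scores, boosts, and penalties are illustrative assumptions.
function rankActions(history) {
  const scores = { read: 1, search: 1, edit: 1, test: 1, verify: 1 };
  const last = history[history.length - 1];

  // Just edited without verifying: push verification to the top.
  if (last && last.action === "edit") scores.verify += 3;

  // Penalize a search query that has already been tried repeatedly.
  const counts = {};
  for (const h of history) {
    if (h.action === "search") counts[h.query] = (counts[h.query] || 0) + 1;
  }
  if (Object.values(counts).some((n) => n >= 3)) scores.search -= 2;

  // Return actions sorted best-first.
  return Object.entries(scores).sort((a, b) => b[1] - a[1]);
}

const ranked = rankActions([
  { action: "search", query: "auth" },
  { action: "edit", file: "auth.mjs" },
]);
console.log(ranked[0][0]); // "verify"
```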

Comparison

Competitive Parity

crew-cli is built for the complexity of professional software development, not just small demos.

| Feature | crew-cli | Claude Code | Codex CLI | Gemini CLI | Cursor |
| --- | --- | --- | --- | --- | --- |
| Multi-model routing | ✅ 10+ Providers | ❌ Anthropic Only | ❌ OpenAI Only | ❌ Google Only | ✅ Native |
| Multimodal (Images) | ✅ All Providers | ✅ Claude Vision | ❌ Text Only | ✅ Gemini Vision | ✅ Native |
| Built-in Tools | ✅ 45+ Tools | ✅ ~15 Tools | ✅ ~10 Tools | ✅ ~12 Tools | ✅ ~20 Tools |
| Sandbox + Branching | ✅ Professional | ❌ Direct Write | ✅ Sandbox | ❌ Direct Write | ❌ Passive |
| Parallel Dispatch | ✅ 21 Specialists | ✅ Subagents | ❌ Single Agent | ❌ Single Agent | ✅ Subagents |
| Agent Memory | ✅ Cross-Session | ❌ Per-Session | ❌ Per-Session | ✅ Gems | ❌ Per-Session |
| Diagnostic Lint-Loop | ✅ Parsed Errors | ❌ Manual | ❌ Manual | ❌ Manual | ✅ loop_on_lints |
| Browser Debugging | ✅ Headless Chrome | ❌ No UI Vision | ❌ No | ❌ No | ❌ Passive |
| Cost Tracking | ✅ Per-Session | ✅ Integrated | ❌ No | ❌ No | ❌ No Granularity |
| Streaming Output | ✅ All Providers | ✅ Native | ✅ Native | ✅ Native | ✅ Native |
| Diagnostics CLI | ✅ crew doctor | ❌ No | ❌ No | ❌ No | ❌ No |
| Session Memory | ✅ Persistent | ✅ Per-Conversation | ❌ Stateless | ✅ Gems | ❌ Per-Session |

Quick Start

One command to launch the crew

# Install the high-performance CLI
npm install -g crewswarm-cli

# Or run from the main repo installer
git clone https://github.com/crewswarm/crewswarm && cd crewswarm && bash install.sh