crewswarm is an open-source, local-first AI workspace that combines a multi-agent orchestration runtime, a browser control-plane dashboard, the Vibe browser IDE, and native chat clients. Software engineers dispatch complex tasks to specialist AI agents that run securely in Docker containers on their own hardware, coordinated over the ATAT protocol.
Stinki — crewswarm mascot

One idea.
One build.
One crew.

The only multi-agent AI coding platform. Switch between Claude Code, Cursor, Gemini, Codex, and OpenCode mid-conversation. Parallel agents. Persistent sessions. No vendor lock-in.

The mental model is different: you are the PM, the agents are your engineers. Keep multiple workers moving in parallel, unblock them, and ship faster than one person hand-typing everything.

6 CLI engines · Native session resume · Dashboard + Vibe IDE · 20+ specialist agents · Wave orchestration · PM-led builds · Local-first & open source
Install crewswarm · GitHub · npm · Explore Vibe · PM Loop Walkthrough · Compare Engines
npm install -g crewswarm
Free forever. Open source. Bring your own API keys or use CLI OAuth.
Works with OpenAI · Anthropic · Google · Groq · xAI · DeepSeek · Ollama · Cursor · Codex · +15 more
pm-loop — crewswarm
How a build works

From idea to shipping in five steps

Every crewswarm build follows the same flow. Type a requirement, watch the crew work, ship.

Dashboard build view — type your requirement

Type your requirement

Dashboard chat view — crew-lead plans and dispatches work

crew-lead plans and dispatches

Dashboard swarm view — agents working in parallel

Agents work in parallel

Dashboard real-time messages — watch build progress live

Watch real-time progress

Dashboard services view — everything stays healthy

Everything stays healthy

Operating Model

You are the PM. The agents are the engineers.

Single-agent tools still assume one human driving one model. crewswarm is built for a different workflow: define the goal, split the work, run multiple specialists in parallel, and spend your time unblocking and reviewing instead of waiting for one agent to finish.

Parallel by default

Frontend, backend, QA, PM, and security lanes can all move at once. Idle time should go toward starting another worker, not watching one agent type.

Delegate, don’t babysit

Assign concrete tasks, constraints, and acceptance criteria. The system routes work to the right agent and engine instead of forcing one model to do everything sequentially.

Keep the crew unblocked

The human job moves up a layer: approve direction, resolve blockers, review outputs, and redirect effort. That is the PM loop.

Read the PM Loop manifesto · See a concrete PM-loop run · Compare the execution lanes

Most AI dev tools are just a chat box bolted onto an editor.

crewswarm is different. It is built to handle actual execution. While other tools fake the important parts, our specialist agents write real files, execute commands locally, and maintain persistent project memory across multiple steps without disappearing into someone else's cloud.

The Product

One stack. Three surfaces.

Use the Dashboard as the control plane, Vibe as the browser IDE, and crewchat as the native chat client. All three talk to the same agents, memory, and runtime.

Dashboard Services, agents, models, memory, and build control.
Vibe File tree, Monaco editor, chat, diffs, and terminal.
crewchat Native chat surface for quick routing and project context.
vibe.html
Real Vibe browser IDE screenshot with file explorer, editor workspace, agent chat, and activity trace
How it works

From requirement to reality
in one command

No orchestration expertise required. Write what you need in plain English.

01

You write a requirement

One sentence, one paragraph, or a full spec. Drop it in the dashboard or pass it on the CLI.

02

PM breaks it into tasks

crew-pm plans MVP → Phase 1 → Phase 2. Each phase gets 3–5 small, targeted tasks.

03

Agents execute with real tools

crew-coder writes code, crew-qa adds tests, crew-fixer handles bugs — each gets exactly their task.

04

Done. Files on disk.

Real files, real output. No hallucinated success messages. Failed tasks hit the DLQ for replay.

Quickstart

Install to first build in 60 seconds

One npm install. Pick your models. Ship a feature. That's it.

Architecture

6 engines, 22 agents, one RT bus

crewswarm runs as one stack: 6 coding engines for execution, a realtime bus for agent coordination, and surfaces (dashboard, Vibe, crew-cli, Telegram, WhatsApp) on top.

Planning: crew-pm · roadmap → phased tasks
  ↓ command · assign
Workers: crew-coder · crew-coder-front · crew-coder-back · crew-qa · crew-fixer · crew-security
  ↑ done · status
Coordination: crew-lead · route, synthesize, reply

RT bus channels (command, assign, done, status, events) coordinate 22 agents across 6 coding engines.

Three layers, one stack

The product stays simple because the runtime is layered cleanly underneath it.

01

Execution engines

Claude Code · Cursor CLI · Codex CLI · Gemini CLI · OpenCode · crew-cli

Six coding engines that write files, run commands, and stream output across all 22 providers. Each agent can use a different engine. Switch from the dashboard.

02

RT bus + agent bridges

WebSocket bus · targeted dispatch · retries · DLQ · wave orchestration

22 agent bridges connect via WebSocket. Targeted dispatch sends tasks to specific agents. Failed tasks retry with backoff, then hit the Dead Letter Queue.

03

Product surfaces

Dashboard · Vibe IDE · crew-cli · Telegram · WhatsApp · MCP

PM Loop, shared memory, wave orchestration, session resume, and fault recovery — accessible from any surface. Same agents, same RT bus, different interfaces.

Rate limits are real

Hit a limit? Switch engines. Keep building.

Every $20/month plan has rate limits. Claude, Cursor, Codex — you'll hit the wall mid-feature. crewswarm is the only tool that lets you seamlessly switch to another engine and keep your session context. Or pick the best CLI for each job.

🤖

Claude Code

Best for large refactors, multi-file reasoning, and frontend work. Full workspace context means it sees everything. Native session resume across messages.

Best for: crew-coder, crew-fixer, frontend
🖱

Cursor CLI

Best for architectural decisions and complex reasoning. Isolated context windows prevent cross-agent bleed. Parallel waves with zero queuing.

Best for: crew-architect, crew-orchestrator

Gemini CLI

Free tier: 60 req/min, 1,000 req/day. 1M token context window. Built-in Google Search grounding for research-heavy tasks, SEO work, and web-connected coding.

Best for: research, SEO, free-tier fallback
🟣

Codex CLI

Fast agentic coding with full sandbox access. No approval prompts — just executes. Built for OpenAI models with MCP integration. Great for backend and API work.

Best for: crew-coder-back, fast iteration

OpenCode

Works with any model provider — Groq, DeepSeek, Ollama, anything. Persistent sessions survive between tasks. The provider-flexible workhorse for long coding sessions.

Best for: provider flexibility, long sessions
🔧

crew-cli

The native engine. Routes to 20+ specialist agents, sandbox workflows with preview-before-apply, parallel worker pool (3x speedup), and LSP self-healing.

Best for: orchestration, quality-gated workflows

You're not locked in. Rate-limited on Claude? Switch to Gemini (free) or Cursor. Need web search? Use Gemini. Need deep reasoning? Use Claude. Mix engines per agent, per task, per mood.

Built different

How crewswarm differs from framework-only stacks

crewswarm is a runtime, not just a library. You get a realtime bus, daemon orchestration, and first-class routing into OpenCode, Cursor CLI, Claude Code, and crew-cli out of the box.

Six execution engines, first-class

Every agent can run inside Claude Code, Cursor CLI, Gemini CLI, Codex CLI, OpenCode, or crew-cli. Switch per agent from the dashboard — no config files, no restarts. Each engine keeps native multi-session resume so context persists across messages and restarts.

Realtime bus + daemons

Agents run as long-lived daemons connected to an RT message bus. Tasks flow over command, assign, done, issues — no in-process-only simulation. Real dispatch, real replies.

Execution layer included

45+ built-in tools (@@WRITE_FILE, @@RUN_CMD, etc.) are executed by the gateway with allowlists and path sandboxing. You don’t have to build a runner or wire a framework to one.
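The path-sandboxing idea can be sketched in a few lines. This is an illustrative minimal version, not crewswarm's actual gateway code: resolve the requested path inside the workspace and refuse anything that escapes it.

```python
from pathlib import Path

def resolve_sandboxed(workspace: str, requested: str) -> Path:
    """Resolve a tool's target path, rejecting anything outside the workspace.

    Illustrative sketch of path sandboxing for @@WRITE_FILE-style tools;
    the real gateway also applies command allowlists.
    """
    root = Path(workspace).resolve()
    target = (root / requested).resolve()
    # A path that escapes the workspace (e.g. via ../) is rejected.
    if root != target and root not in target.parents:
        raise PermissionError(f"{requested!r} escapes the workspace")
    return target
```

Resolving before comparing is the key design point: it defeats `../` traversal and symlink-free relative tricks that a naive string-prefix check would miss.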

Git worktree isolation

Multi-agent waves automatically get per-agent git worktrees. Parallel agents edit files on isolated branches that merge back after the wave completes. No filesystem conflicts, no manual branch management.

DLQ and fault recovery

Failed tasks go to a Dead Letter Queue with JSONL crash-safe transcripts. Retry with backoff, replay from the dashboard. Framework-only stacks leave retries and observability to you.

Your models, your machine

Each agent calls its LLM directly with your API key. No proxy, no vendor lock-in. Run fully local with Ollama. Compare options in the comparison section.

Why not just Cursor?

Because a single editor is not a runtime

Cursor is an editor. crewswarm is the control plane, runtime, and memory layer around your editors and CLIs.

🧭

Persistent coordination

Agents, services, memory, and projects survive beyond one editor tab or one CLI session.

🛠️

Runtime control

Start, stop, inspect, and route the whole stack from the dashboard instead of gluing scripts together manually.

🔁

Cross-surface continuity

Dashboard, Vibe, crewchat, SwiftBar, and CLI surfaces all talk to the same orchestration layer.

The Ecosystem

One runtime, multiple surfaces.

crewswarm is a modular stack. The core orchestration happens under the hood, but you can interact with the crew through whichever surface fits your current workflow.

🧠

The System (Core Runtime)

The beating heart of crewswarm. It relies on the ATAT WebSocket bus, persistent shared memory, and the PM loop to route your requests to 20+ specialized agents seamlessly.

Vibe IDE

A full-screen, browser-based native IDE. Combines a powerful code editor, file tree, terminal, and agent chat panel into a single unified window. Perfect for starting entirely new projects from scratch.

💻

crew-cli

The terminal-native interface. Use crew exec "build this" right from your project folder to dispatch the crew without ever leaving your terminal or breaking flow.

🎛️

The Dashboard

Your control plane. Manage API keys, assign LLM models to specific agents (like Claude for coding, Groq for fast planning), map local tools, and view the real-time swarm logs.

Features

Everything a dev crew needs,
minus the meetings

PM-led orchestration

Natural-language requirement → PM breaks it into tasks → targeted dispatch to the right agent. No broadcast races. No duplicate work.

crew-pm → crew-coder → crew-qa → crew-fixer
🎯

Targeted dispatch

Send to one agent by name. --send crew-coder "Build auth". Only that agent receives it.

📐

Phased builds (PDD)

MVP → Phase 1 → Phase 2. Failed tasks auto-break into subtasks and retry. No work is lost.

🧩

Domain-Aware Planning

Large codebases (100K+ lines) get subsystem-specific PM agents. crew-pm-cli handles CLI tasks, crew-pm-frontend owns dashboard, crew-pm-core manages orchestration. No more hallucinated file paths.

🧠

Shared Memory + Project Message RAG

Every agent reads shared memory (brain.md, decisions, handoff notes). All project messages auto-save to ~/.crewswarm/project-messages/ and are indexed for semantic search using local TF-IDF + cosine similarity — no API calls, nothing leaves your machine.

AgentMemory
  • Cognitive facts (decisions, constraints, preferences)
  • Written by @@BRAIN commands
  • Persists across all sessions
Project Messages
  • All chat saved to JSONL automatically
  • Semantic search: "What did we discuss about auth?"
  • Export to markdown, JSON, CSV
AgentKeeper
  • Task results from completed work
  • Gateway records after execution
  • Available to all future agents
Cache headers prevent stale data. Messages persist across tab switches. Zero configuration needed.
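The local search approach above can be sketched compactly. This is an illustrative minimal version of TF-IDF + cosine ranking, not crewswarm's actual indexer:

```python
import math
from collections import Counter

def tfidf_search(query, messages):
    """Rank saved messages against a query using TF-IDF + cosine similarity.

    Minimal sketch of local semantic search: no API calls, everything
    computed in-process over whitespace-tokenized message text.
    """
    docs = [m.lower().split() for m in messages]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))     # document frequency
    idf = {t: math.log(n / df[t]) + 1 for t in df}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vectorize(query.lower().split())
    scored = [(cosine(qv, vectorize(d)), m) for d, m in zip(docs, messages)]
    return [m for s, m in sorted(scored, reverse=True) if s > 0]
```

A query like "What did we discuss about auth?" ranks auth-related messages first and drops unrelated ones entirely, with zero external dependencies.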
🔌

Skill-powered

Extend agents with data-driven SKILL.md or JSON plugins, plus PreToolUse/PostToolUse hooks for fine-grained control. Add Twitter, Fly.io, or custom API tools without writing JS code.
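As an illustration only, a minimal skill file might look like the following. The section names here are hypothetical, since the actual SKILL.md schema is defined by crewswarm:

```markdown
# SKILL: deploy-fly

## Description
Deploy the current project to Fly.io after tests pass.

## Command
flyctl deploy --remote-only

## When to use
The user asks to ship and crew-qa reports green tests.
```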

🎤

Multimodal Support — Images + Voice across all platforms

Send images or voice messages from any surface. Dashboard, Telegram, WhatsApp, and crewchat all support image recognition and voice transcription. Powered by Groq (fast/cheap ~$3/month) or Gemini 2.0 Flash (best quality).

📱 Dashboard Click 📷 to upload images, 🎤 to record voice messages
💬 Telegram/WhatsApp Send photos or voice notes → auto-analyzed and transcribed
🍎 crewchat Native image picker + AVFoundation voice recording
🐦

Real-time X/Twitter Intelligence with Grok

The only AI coding platform with live X/Twitter search. Use @@SKILL grok.x-search to search recent tweets, get citations with X post URLs, filter by date ranges, handles, and media types. Powered by xAI's Grok 3 with real-time X data access.

grok.x-search
  • Search recent tweets (last 24-48 hours)
  • Filter by handles, date ranges, media types
  • Citations with X post URLs
grok.vision
  • Image analysis with grok-vision-beta
  • Screenshot analysis and UI audits
  • Diagram and chart interpretation
Competitive edge: GitHub Copilot can't search X. Claude can't search X. Only crewswarm has real-time X intelligence.
🐳

Docker-First Deployment — Multi-Arch Images

One-line install on any Linux machine. Multi-arch images (AMD64 + ARM64) for servers, VMs, Raspberry Pi, cloud deployments, and CI/CD. Perfect for team shared instances, GitHub Actions, or self-hosted setups.

curl -fsSL https://raw.githubusercontent.com/crewswarm/crewswarm/main/scripts/install-docker.sh | bash
☁️ Cloud VMs
AWS, GCP, DigitalOcean, Azure
🏠 Home Servers
Raspberry Pi 4/5, NUCs, edge devices
🔄 CI/CD
GitHub Actions, GitLab CI, Jenkins
👥 Team Instances
Shared crew for entire team
Local dev setup also available for contributors. Pick the deployment that fits your workflow.
🔄

Fault tolerance

Retry with backoff and task leases. After max retries, tasks hit the Dead Letter Queue for manual replay from the dashboard.
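The retry-then-DLQ flow can be sketched as follows. This is illustrative pseudologic, not crewswarm's implementation; `execute` stands in for dispatching a task to an agent:

```python
import time

def run_with_retries(task, execute, max_retries=3, base_delay=1.0, dlq=None):
    """Retry a task with exponential backoff; after max_retries the task
    lands in the Dead Letter Queue for manual replay.

    Illustrative sketch of the retry/DLQ flow described above.
    """
    dlq = dlq if dlq is not None else []
    for attempt in range(max_retries):
        try:
            return execute(task)
        except Exception as err:
            if attempt == max_retries - 1:
                dlq.append({"task": task, "error": str(err)})
                return None
            time.sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...
```

The important property is that a permanently failing task never blocks the pipeline: it is recorded with its error and handed back to the human for replay.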

🚀

Six execution engines — your choice per agent

Your crew runs specialist AI agents (PM, coder, QA, fixer…) — each one calling its LLM directly. For heavy coding tasks, agents can go deeper: route them into OpenCode, Cursor CLI, Claude Code, Codex CLI, crew-cli, or Gemini CLI for full file editing, bash access, and persistent sessions. Switch per agent from the dashboard. No restarts, no config files.

OpenCode
  • Persistent sessions per agent
  • Full file editing + bash
  • Context survives across tasks
Best for: crew-coder, crew-fixer, crew-coder-front/back
Cursor CLI
  • opus-thinking + sonnet-4.6
  • Deep reasoning & architecture
  • Parallel wave dispatch
Best for: crew-main, crew-architect, complex reasoning
Claude Code
  • Full workspace context
  • Native Anthropic tool use
  • Session continuity per agent
Best for: large refactors, multi-file reasoning
crew-cli
  • 3-Tier AI Architecture (Router/Planner/Worker)
  • 3x Parallel Speedup over sequential cycles
  • ATAT Protocol & LSP Self-Healing enabled
Best for: High-performance terminal engineering
Codex CLI
  • OpenAI's agentic coding CLI
  • Full sandbox + file editing
  • No approval prompts — just executes
Best for: crew-coder-back, fast backend iteration
Gemini CLI
  • Google Gemini 2.0 Flash / Pro — stream-json output
  • Fast inference, multimodal support
  • Non-interactive --yolo mode
Best for: Fast iterations, Google-model workflows
🔌

MCP server — your crew in any AI tool

crewswarm exposes your entire crew as an MCP server on port 5020. Add one line to ~/.cursor/mcp.json (or Claude Code, OpenCode, Codex) and every project gets your full persistent agent fleet — not session-scoped generics, but your crew with memory, custom models, and cross-agent coordination.

dispatch_agent
  • Send a task to any specialist agent
  • Waits for the result
  • Full tool access + memory
run_pipeline
  • Multi-agent chains from any client
  • Each stage passes output to the next
  • PM → coder → QA in one call
chat_stinki + skills
  • Talk to crew-lead directly via MCP
  • Run any skill (deploy, TTS, webhooks…)
  • OpenAI-compatible API on same port
~/.cursor/mcp.json → {"mcpServers":{"crewswarm":{"url":"http://127.0.0.1:5020/mcp"}}}
⚙️

@@ Protocol — 10x more efficient than JSON-RPC

While others use verbose JSON-RPC or natural language, crewswarm agents communicate via a compact @@ syntax that's roughly 10x more token-efficient, unambiguous, and easy for LLMs to generate. MCP-compatible via a translation layer.

Standard JSON-RPC
{"tool": "write", 
 "params": {
  "path": "app.ts",
  "content": "import express..."
}}
~80 tokens • Fragile
@@ Protocol (CAP)
@@WRITE_FILE app.ts
import express from 'express';
const app = express();
...
@@END_FILE
~8 tokens • 10x less overhead
Why @@ wins:
  • Zero ambiguity — Regex-parseable, no JSON errors
  • Inline with prose — Explain AND execute in one message
  • LLM-friendly — Easy to generate from prompt examples
  • Cost savings — 10x fewer tokens = cheaper API bills
Graceful Failure:

Unlike JSON, CAP is stream-parseable. If a model hits a context limit halfway through writing 4 files, the first 3 are still valid and executed. In JSON, you lose the whole turn.

Available commands: @@READ_FILE • @@WRITE_FILE • @@RUN_CMD • @@DISPATCH • @@PIPELINE • @@SKILL • @@WEB_SEARCH • @@MEMORY
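The graceful-failure property is easy to see in a parser sketch. This is an illustrative minimal version (not crewswarm's real parser) covering only @@WRITE_FILE blocks: complete blocks are extracted even if the stream was cut off mid-way through a later one.

```python
import re

def parse_cap(stream: str):
    """Extract (path, content) pairs from @@WRITE_FILE blocks in a CAP stream.

    Stream-parseable by construction: a truncated trailing block simply
    produces no match, while every completed block is still executed.
    """
    pattern = re.compile(
        r"^@@WRITE_FILE (\S+)\n(.*?)\n@@END_FILE",
        re.DOTALL | re.MULTILINE,
    )
    return [(m.group(1), m.group(2)) for m in pattern.finditer(stream)]
```

Contrast with JSON: one missing closing brace invalidates the whole payload, whereas here a model hitting its context limit after 3 of 4 files still ships 3 files.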
🖥️

Seven control surfaces

Pick how you want to drive the crew. Every surface talks to the same RT bus and the same agents.

Chat with Stinki
Dashboard Full web UI — chat, build, services, RT bus, DLQ, spend
crewswarm Vibe Browser-native IDE with Monaco — real-time file tree + agent chat.
crewchat Quick & Advanced modes — multimodal image + voice support.
REST API / CLI curl /chat · direct dispatch · scheduled pipelines
Mobile messengers
Telegram Message Stinki from your phone — same conversation as Dashboard
WhatsApp Personal bot via Baileys — QR scan once, then chat from WhatsApp
Monitoring & control
Dashboard Services · RT bus · DLQ replay · spend · agent health
SwiftBar macOS menu bar — status, quick restart, agent logs
Wave Orchestration

Multiple agents working at the same time

Instead of one agent doing everything in sequence, crewswarm dispatches tasks to multiple agents in parallel. Backend, frontend, and tests all get built simultaneously — 3x faster than waiting for one agent to finish before the next starts.

Wave 1

Sequential Start

Wave 1 runs first — typically crew-pm planning the build and breaking it into tasks.

Wave 2

Parallel Execution

Wave 2 tasks run simultaneously — crew-coder + crew-qa + crew-security all working at once. Git worktree isolation prevents file conflicts between parallel agents.

Wave 3

Synthesis

Wave 3 waits for wave 2 completion, then crew-main synthesizes results and validates the build.

@@PIPELINE Wave Syntax

@@PIPELINE [
  {"wave":1, "agent":"crew-pm", "task":"Plan the build and create roadmap"},
  {"wave":2, "agent":"crew-coder", "task":"Implement backend API"},
  {"wave":2, "agent":"crew-qa", "task":"Write integration tests"},
  {"wave":2, "agent":"crew-security", "task":"Security audit"},
  {"wave":3, "agent":"crew-main", "task":"Synthesize results and validate"}
]

Wave 1 runs first. All wave 2 tasks execute in parallel. Wave 3 waits for wave 2 completion before starting.
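The scheduling rule above can be sketched directly. This is an illustrative minimal scheduler, not crewswarm's orchestrator; `dispatch(agent, task)` stands in for sending a task over the RT bus:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(tasks, dispatch):
    """Run pipeline tasks wave by wave: waves execute sequentially,
    tasks inside a wave run in parallel.
    """
    waves = defaultdict(list)
    for t in tasks:
        waves[t["wave"]].append(t)

    results = []
    for wave in sorted(waves):                 # wave 1, then 2, then 3...
        with ThreadPoolExecutor() as pool:     # parallel within a wave
            futures = [pool.submit(dispatch, t["agent"], t["task"])
                       for t in waves[wave]]
            results.extend(f.result() for f in futures)
    return results
```

The `with` block waits for every future before the next wave starts, which is exactly the dependency barrier that prevents wave 3 from synthesizing partial results.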

Significantly Faster Builds

Parallel execution means crew-coder, crew-qa, and crew-security run simultaneously instead of sequentially — cutting build time roughly in proportion to the number of agents in each wave (3 parallel agents ≈ 3x faster).

🎯

No Race Conditions

Wave dependencies prevent file conflicts and duplicate work. Each wave waits for the previous wave to complete before starting.

🔄

Auto-Retry

Failed wave tasks retry independently without blocking other waves. Builds keep moving even when individual agents hit errors.

Models

Different models for different agents

Every agent gets its own model — configured from the dashboard, no config files. Use cheap, fast models for routing and planning ($0.10/M tokens). Use powerful models for coding and reasoning ($3/M tokens). Use free models for QA and testing. Your bill drops 5-10x compared to running everything on one expensive model.

crew-lead (router) · Groq Llama 3.3 70B · Free
crew-pm (planner) · Gemini 2.5 Flash · $0.075/M tokens
crew-coder (builder) · Claude Sonnet 4.6 · $3/M tokens
crew-qa (tester) · Gemini CLI (OAuth) · Free (1K req/day)

Change any agent's model from the dashboard. No restarts, no config files. All these providers work out of the box:

OpenAI

GPT-4.1 · GPT-4.1-mini · o3 · o4-mini

Industry standard. Best for general reasoning and instruction following.

Anthropic

Claude Sonnet 4.6 · Claude Opus 4.6 · Claude Haiku 4.5

Top-tier code quality and instruction following. Strong on long context.

GQ

Groq

Llama 4 Scout · Llama 3.3 70B · Gemma 2 9B

Blazing-fast inference. Best for QA and fixer agents where speed matters.

Mistral

Mistral Large · Codestral · Mistral Small

Excellent for code generation. Codestral is purpose-built for dev tasks.

DS

DeepSeek

DeepSeek V3 · DeepSeek R1

Open-source coding model with exceptional code completion quality.

Perplexity

Sonar Pro · Sonar

Real-time web search. Ideal for the PM agent to research before planning.

Google Gemini

Gemini 2.5 Pro · Gemini 2.5 Flash · Gemini 2.0 Flash

Multimodal and fast. Free tier via Gemini CLI — 60 req/min with any Google account, no API key needed.

OpenRouter

OpenRouter

Claude · GPT-4 · Gemini · Llama · Mistral · 200+ models

One API key for hundreds of models. Route to any provider — Claude, OpenAI, Google, Mistral, and more — through a single endpoint.

FW

Fireworks AI

GLM · Kimi · Qwen · GPT-OSS · DeepSeek

OpenAI-compatible inference platform with fast serverless access to strong open models, plus fine-tuning and dedicated deployments when you need them.

CB

Cerebras

Llama 4 Scout · Llama 3.3 70B

Ultra-fast hardware inference. Near-instant responses for latency-critical agents.

xAI

xAI / Grok

Grok 3 · Grok 3 Mini · Grok 3 Vision

xAI's model with real-time X/Twitter data access and strong reasoning.

Ollama

Ollama (Local)

Qwen 3 · DeepSeek R1 · Phi-4 · Llama 4

Run fully local. No API keys, no rate limits, no data leaving your machine.

🦁

Brave Search

Web Search API

Fast web search. crew-lead and the PM loop use it for lookups when you ask questions — add your key in the dashboard Search & Research Tools.

Parallel

Deep research · Web synthesis

Multi-step research and synthesis. Used by the PM for project planning and deep lookups. Configure in the dashboard alongside Brave.

TG

Together AI

Llama 3.3 70B · Qwen 2.5 Coder · DeepSeek R1

Fast open-source model hosting. Great balance of speed, cost, and model selection.

🤗

Hugging Face

Llama 3.3 · Qwen 2.5 · Mistral · 1000+ models

The open-source model hub. Access thousands of models via the Inference API.

VN

Venice AI

Llama 3.3 70B · DeepSeek R1 671B

Privacy-focused inference. No logging, no training on your data.

🌙

Moonshot / Kimi

Moonshot V1 128K · Kimi K2

Strong long-context models. 128K+ token windows for large codebases.

MM

MiniMax

abab6.5 · abab5.5

Chinese LLM provider with competitive pricing and multilingual support.

🌋

Volcengine

Doubao Pro · Doubao Lite

ByteDance's cloud platform. Doubao models with fast inference.

QF

Baidu Qianfan

ERNIE 4.0 · ERNIE Speed

Baidu's ERNIE models. Strong on Chinese language tasks and reasoning.

vLLM / SGLang

Any open model · Self-hosted

Run your own inference server. Full control over hardware, models, and latency. OpenAI-compatible API.

💡

Each agent can use a different provider. Add API keys and assign models from the Providers tab; add Brave and Parallel keys in Search & Research Tools for crew-lead and PM lookups. Or edit the config JSON directly. Switch at any time, no restarts needed.

Open Source

Free forever.
MIT licensed.

crewswarm is open-source software. Use it, modify it, contribute to it.

MIT License

Use it for personal projects, commercial products, or anything in between. No restrictions.

🆓

Free to use

No subscription. No usage limits. Bring your own API keys for the LLM providers you choose.

🤝

Community-driven

Contributions welcome. Report issues, submit PRs, or just star the repo to show support.

See it in action

From install to first build

Clone, install, and ship a feature — all in under 60 seconds.

crewswarm Vibe IDE
crewswarm Vibe IDE — Monaco editor, file explorer, agent chat, and terminal
Explore Vibe · Install crewswarm
The crew

Specialized agents,
targeted tasks

A crew of specialists, each with a role, a model, and a set of tools. The PM decides who gets what — no broadcast racing.

Stinki — Chat Commander
crew-lead · Chat Commander

🧠 Stinki ☠️

The pirate captain of this AI swarm. Stinki orchestrates dispatches, drafts roadmaps, and answers your questions with web search or codebase dives. Talk to him from the dashboard, Telegram, or WhatsApp. If you're talking shit, he'll roast back — while keeping the ship afloat. No prisoners. Just results.

🔥 Talks back 🗺️ Roadmaps on demand 🌐 Web search + fetch ⚡ Dispatches the crew 📁 Reads & writes files ⚙️ Runs shell commands 🔧 Calls skills 📱 Telegram + WhatsApp
Try it
"build me a SaaS landing page with a waitlist"
"have crew-qa audit the last PR"
"what's the fastest free model right now?"
"kick off the pipeline for my React app"
Q
crew-main Quill
Coordinator

Chat, triage, and kick off orchestrators. Your first point of contact.

P
crew-pm Planner
Planning

Breaks requirements into phased tasks. Assigns agents. Keeps scope tight.

C
crew-coder Coder
Implementation

Writes code, creates files, runs shell commands. The workhorse of every build.

F
crew-coder-front Mistral Front
Frontend specialist

UI, styling, and client-side code. Knows the design system and keeps markup clean.

B
crew-coder-back DeepSeek Back
Backend specialist

APIs, databases, and server-side logic. Optimized for structured, deep code tasks.

C
crew-copywriter Copy
Copywriting

Headlines, CTAs, and product copy. Keeps brand voice sharp and on-message.

T
crew-qa Tester
Quality assurance

Adds tests, validates behavior, and audits output before anything ships.

D
crew-fixer Debugger
Bug fixing

Diagnoses failures, fixes edge cases, patches what QA flags.

S
crew-security Guardian
Security review

Audits for vulnerabilities, hardens configs, and enforces best practices.

G
crew-github GitBot
Git & PRs

Commits, branches, pull requests, and GitHub Actions. Runs real git and gh commands.

R
crew-researcher Scout
Web research

Searches the web, summarizes findings, and surfaces competitive intelligence via Perplexity.

A
crew-architect Arch
System design & DevOps

Designs systems, writes infra-as-code, and handles deployment pipelines.

F
crew-frontend Pixel
CSS & design systems

Polished UI, animations, theming, and layout — Apple/Linear-level visual craft.

S
crew-seo Rank
SEO specialist

Keyword research, meta tags, content strategy, and technical SEO audits.

M
crew-ml Neuron
Machine learning

AI pipelines, model selection, fine-tuning setup, and data preprocessing workflows.

O
crew-orchestrator Wave
Parallel orchestration

Fans out tasks to multiple agents simultaneously in waves. No queuing, no collisions.

T
crew-telegram TGBot
Telegram bridge

Routes tasks and replies through your Telegram bot. Dispatch to any agent from your phone.

W
crew-whatsapp WABot
WhatsApp bridge

Personal WhatsApp bot via Baileys. Chat with your crew from any device — no Business API needed.

M
crew-mega Mega
Heavy general tasks

Long-horizon, high-complexity tasks that need extended context and deep reasoning.

Use cases

How the crew ships

01

Build a feature from one sentence

Type a requirement in the Build tab, click Run. PM plans it, coder builds it, QA tests it. Watch it happen in RT Messages.

Phased builds
02

Fix a bug and add tests

crew-fixer diagnoses and patches, crew-qa writes the test suite. Targeted dispatch means each agent does exactly one thing.

Targeted dispatch
03

Ship a small API with CRUD + tests

"Build a todo API with Express, CRUD endpoints, and a test file." One sentence. Real files on disk in minutes.

Single-shot build
04

Automate your workflows with Skills

Drop a SKILL.md in your config folder. Agents immediately gain the ability to deploy to Fly.io, send tweets, or hit any custom API.

Extensibility
05

Control from the menu bar

SwiftBar shows status at a glance. Start, stop, restart agents. Send a message to any agent. Open logs.

SwiftBar
06

Recover from failures

Max retries hit? Task goes to the DLQ. Open the dashboard, see the error, replay with one click — or fix and rerun.

DLQ + replay
07

Keep agents aligned across sessions

Shared memory files — current state, decisions, handoff — are injected into every agent. Resume tomorrow exactly where you left off.

Shared memory
08

Route through any CLI engine

One click in the dashboard switches agents between Claude Code, Cursor, OpenCode, Gemini, or crew-cli. Each agent maintains its own persistent session — context survives across tasks.

Multi-engine routing
09

Mix models and engines per agent

Send your architect to Claude for deep reasoning, your coders to Cursor for fast edits, your QA to Gemini for breadth. Mix execution modes per agent — no restarts, no config files.

Per-agent engine config
Dashboard

Everything in one place

Build, dispatch, replay, monitor — and open any file the crew wrote directly in Cursor or OpenCode.

Compare

Nothing else does all of this

Every other tool locks you into one model, one editor, one agent. crewswarm is the only platform where you switch engines mid-conversation, run parallel agents, and resume sessions across restarts.

Capability | crewswarm | Cursor | Windsurf | Devin | Copilot
Multi-engine (6 CLIs) | Yes | No | No | No | No
Native session resume | Yes | No | No | No | No
Parallel agent waves | Yes | No | No | Partial | No
Browser IDE + terminal | Vibe | Desktop | Desktop | Yes | Yes
20+ specialist agents | Yes | 1 | 1 | 1 | 1
PM Loop (autonomous roadmap) | Yes | No | No | Partial | No
Local-first / no cloud | Yes | Partial | No | No | No
Open source | Yes | No | No | No | No
🎓

Research-driven orchestration

crewswarm implements structural and content markers identified in Princeton GEO (Generative Engine Optimization) research to maximize AI visibility and authority. Our agents also utilize iterative reasoning loops inspired by the Reflexion framework to ensure code quality.

🧠

PM-led phased builds

A dedicated Project Manager agent reads your ROADMAP, breaks work into phases, dispatches tasks to the right specialists, and writes handoff notes between sessions — automatically.

🤝

Shared memory across agents

Every agent reads and writes to a shared memory layer — decisions, context, and progress persist across sessions so nothing gets lost between runs.

Real-time agent mesh

Agents communicate over an RT message bus. Any agent can broadcast, any agent can listen — parallel work happens naturally without a central bottleneck.

🖥️

tmux session handoff

Agents run in labeled tmux panes with cross-agent discovery. When a pipeline wave completes, the session manager hands off execution context — output history, working directory, env vars — to the next agent. No cold starts between waves.

🔁

Automatic fault recovery

Failed tasks land in a Dead Letter Queue with JSONL crash-safe transcripts and are automatically retried. Builds keep moving even when individual agents hit errors or timeouts.

🔑

Bring your own model

Configure any model per agent — Groq, Anthropic, OpenAI, NVIDIA, or fully local via Ollama. No vendor lock-in, no forced subscriptions.

📡

Control from anywhere

Manage your crew from the web dashboard, the CLI, crewchat, SwiftBar, or Telegram. One build, every surface covered.

💰

Cost tracking + cache savings

Per-provider token pricing with prompt cache savings tracking. Know exactly what each agent call costs — and how much the cache saved. Tracks Anthropic (90% cache discount), Groq (50%), Google (free tier), OpenAI, and 10+ providers in one dashboard.
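The accounting behind that card can be sketched in a few lines. The prices and discount rates below are illustrative placeholders, not crewswarm's actual pricing tables:

```javascript
// Sketch of per-call cost accounting with a prompt-cache discount.
// Rates are illustrative, not crewswarm's actual provider tables.
const PRICING = {
  anthropic: { inputPerM: 3.0, outputPerM: 15.0, cacheDiscount: 0.9 },
  groq:      { inputPerM: 0.59, outputPerM: 0.79, cacheDiscount: 0.5 },
};

function callCost(provider, { inputTokens, cachedTokens, outputTokens }) {
  const p = PRICING[provider];
  const freshIn = inputTokens - cachedTokens;
  const cachedRate = p.inputPerM * (1 - p.cacheDiscount); // discounted cache-read rate
  const inputCost = (freshIn * p.inputPerM + cachedTokens * cachedRate) / 1e6;
  const outputCost = (outputTokens * p.outputPerM) / 1e6;
  const cacheSavings = (cachedTokens * p.inputPerM * p.cacheDiscount) / 1e6;
  return { cost: inputCost + outputCost, cacheSavings };
}
```

A call with one million fully cached input tokens on the Anthropic rates above would cost $0.30 instead of $3.00, with $2.70 reported as cache savings.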

🔄

Intelligent retry system

Detects three failure modes automatically: agents asking questions instead of working, returning plans instead of code, or bailing out mid-task. Forces completion with targeted correction prompts — not just exponential backoff.
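The three failure modes can be sketched as simple heuristics over the agent's reply. The patterns and correction prompts below are assumptions for illustration, not crewswarm's actual detection rules:

```javascript
// Illustrative sketch of the three failure-mode heuristics described above.
// Patterns and prompt wording are assumptions, not crewswarm's actual rules.
function classifyReply(reply) {
  const text = reply.trim();
  if (/\?\s*$/.test(text) || /^(should i|do you want|which)/i.test(text)) {
    return 'asked-question';        // agent asked instead of acting
  }
  const hasCode = /```/.test(text) ||
    /\b(wrote|created|updated) .+\.(js|ts|py|md)/i.test(text);
  if (!hasCode && /\b(plan|steps?|would|i will)\b/i.test(text)) {
    return 'returned-plan';         // a plan instead of code
  }
  if (/\b(unable|cannot|skipp?ing|giving up)\b/i.test(text)) {
    return 'bailed-out';            // bailed mid-task
  }
  return 'ok';
}

// One targeted correction prompt per failure mode (hypothetical wording):
const CORRECTIONS = {
  'asked-question': 'Do not ask questions. Make a reasonable assumption and implement it.',
  'returned-plan':  'Do not return a plan. Write the actual code and report the files changed.',
  'bailed-out':     'Retry the task. If blocked, implement the largest safe subset.',
};
```

The point of the design is that each retry carries a prompt aimed at the specific failure, rather than resending the same task with a longer delay.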

🔒

Task lease + deduplication

Distributed file-based locks prevent duplicate execution across agent instances. 45-second leases with heartbeat renewal. If two agents claim the same task, only one runs. Production-grade idempotency for multi-agent systems.
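A minimal sketch of a file-based lease looks like this. The lock-file format and field names are assumptions; the real crewswarm implementation may differ, and the stale-takeover path below has a small race window a production version would close:

```javascript
import fs from 'node:fs';

// Sketch of a file-based task lease with heartbeat renewal.
// Lock format and field names are illustrative assumptions.
const LEASE_MS = 45_000; // 45-second lease, as described above

function claimTask(lockPath, agentId, now = Date.now()) {
  try {
    // 'wx' fails if the lock file already exists: the first writer wins.
    fs.writeFileSync(lockPath, JSON.stringify({ agentId, expires: now + LEASE_MS }),
      { flag: 'wx' });
    return true;
  } catch (err) {
    if (err.code !== 'EEXIST') throw err;
    const lease = JSON.parse(fs.readFileSync(lockPath, 'utf8'));
    if (lease.expires < now) { // stale lease: previous holder died mid-task
      fs.writeFileSync(lockPath, JSON.stringify({ agentId, expires: now + LEASE_MS }));
      return true;
    }
    return false;              // another agent holds a live lease
  }
}

function renewLease(lockPath, agentId, now = Date.now()) {
  const lease = JSON.parse(fs.readFileSync(lockPath, 'utf8'));
  if (lease.agentId !== agentId) return false; // lost the lease, stop working
  fs.writeFileSync(lockPath, JSON.stringify({ agentId, expires: now + LEASE_MS }));
  return true;                                 // heartbeat: extend by another 45s
}
```

If two agents claim the same task, exactly one `claimTask` call succeeds; the other backs off until the lease expires or the task completes.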

🔌

MCP server (64 tools)

Built-in Model Context Protocol server exposes 64 tools and 22 agents to any MCP-compatible client. Dispatch agents, run pipelines, search chat history, and manage the swarm — all via standard MCP JSON-RPC.

🩺

Doctor + health diagnostics

One-command system validation: Node.js version, API keys, service ports, dashboard build, CLI engines, and MCP status. Runs in under 4 seconds. Suggests fixes and cheapest providers when keys are missing.

Real results

Built with crewswarm

Not demos. Not mockups. Real projects built end-to-end by the crew.

3 models
VS Code Extension

Same prompt, three models: DeepSeek (929 lines), Grok (194 lines), Gemini (159 lines). Each produced a working VS Code extension with chat panel, status bar, and WebSocket connection. Two patches to ship the best one.

17 sec
Weather Dashboard

149 lines — HTML + CSS + JS. One command via crew-cli on Grok. Dark theme, city search, live weather from wttr.in. Open it in a browser and it works.

6 engines
Session Resume

Native session resume across Claude Code, Cursor, Gemini, Codex, and OpenCode — built in one session. Switch engines mid-conversation, keep your context.

How you actually use it

1
"Build a weather dashboard"
crew-coder + crew-coder-front run in parallel (Wave 1)
17s
2
"Polish the UI"
crew-frontend applies gradients, glassmorphism, animations
36s
3
"Add error handling"
crew-fixer adds try/catch, loading states, user-friendly error messages
~30s
4
"Review for security"
crew-security audits XSS, injection, API key exposure
~20s

Four agents, four jobs, real files on disk. Each does what it's best at.

From the engines

What the crew says about working together

✦ Claude Code (Anthropic)
"The separation of concerns is the right architecture. I handle the complex reasoning and multi-file refactors. crew-qa catches what I miss with fresh context. The shared memory means I never start blind — I know what was decided and why."
Claude Opus 4.6
Anthropic · crew-coder engine
✦ Codex CLI (OpenAI)
"I get dispatched with a clear task, a project directory, and sandbox access. No approval prompts, no back-and-forth. Execute, verify, return results. The RT bus means I never block other agents — we all run in parallel."
GPT-5.3 Codex
OpenAI · crew-coder-back engine
✦ Gemini CLI (Google)
"Local files, global crew: the high-fidelity surface for shipping with a multi-agent pulse."
Gemini 2.5 Flash
Google · crew-qa engine
Orchestration

Three modes, one crew

Pick the right tool for the job. From single tasks to full autonomous builds.

📐

PM Loop

Recommended for most builds

Break large work into MVP → Phase 1 → Phase 2. Auto-phases ambiguous requirements, auto-retries failed tasks, and breaks them into subtasks if needed.

Auto-phasing Auto-retry Task breakdown DLQ recovery
Example
node scripts/run.mjs "Build a todo API with CRUD and tests"

Unified

Single-shot structured runs

One command, structured execution. PM plans it once, crew executes in sequence. No phasing overhead — good for well-defined tasks.

Single pass Targeted dispatch Sequential exec
Example
node scripts/run.mjs "Fix auth.js bug and add tests"
🎯

Single-task

One agent, right now

Send one task directly to one agent. No PM, no planning. Fastest path from intent to execution for simple, well-scoped work.

Direct send No orchestration Instant dispatch
Example
node gateway-bridge.mjs --send crew-coder "Add GET /health to server.js"

Quick comparison

Mode Best for Auto-retry Phasing Overhead
PM Loop Large or ambiguous builds Yes Yes Medium
Unified Well-defined tasks No No Low
Single-task Quick fixes, small edits No No None
Stack

Built on boring, solid tech

No proprietary runtime. No cloud lock-in. Everything runs on tools you already know.

Node.js
Node.js Runtime — all scripts, daemons, and the dashboard
TypeScript
TypeScript RT daemon and plugin suite compiled to JS
WS
WebSocket crewswarm RT — real-time agent mesh on port 18889
Docker
Docker Sandboxed execution — route any agent into a secure container
JSON
Skills (JSON/MD) Data-driven plugins — add new tools to agents without code
Markdown
Shared Memory RAG Semantic search over history via local TF-IDF (no API calls).
🌊
Wave Orchestration Parallel task dispatch — concurrent multi-agent execution.
Bash
Bash openswitchctl — start, stop, restart, health checks
macOS
macOS SwiftBar menu bar plugin — status, control, logs at a glance
SQLite
SQLite Advanced task tracking, agent health, and queue metrics (optional)
Ollama
Ollama Run any agent fully local — no API key, no internet required
GitHub
GitHub CLI crew-github — commits, branches, pull requests via gh
Telegram
Telegram Message Quill from your phone — dispatches to the full crew
WhatsApp
WhatsApp Personal bot via Baileys — scan QR, chat with the crew from any WhatsApp
Pricing

What does it cost?

$0

crewswarm is free and open source. Always.

You pay for
  • LLM API keys (your accounts)
  • Or use CLI OAuth (Claude, Cursor, Gemini — login once, no keys needed)
Free options
  • Gemini CLI: 1,000 free req/day
  • Groq: free tier, fast inference
  • Ollama: fully local, zero cost

No subscriptions. No usage fees. No vendor lock-in. Switch providers anytime — your code stays on your machine.

Frequently Asked Questions

Everything you need to know about running your AI crew.

The tagline is literal: describe an idea once and a PM-led crew handles the rest — planning, coding, QA, and fixes — until real files land on disk. Ideate, build & ship.

The PM Loop reads ROADMAP.md, ships every pending item, then calls Groq as a product strategist to append fresh roadmap items based on the live output. It keeps repeating until you stop it.

Four files are always injected: current-state.md, decisions.md, agent-handoff.md, and orchestration-protocol.md. The wrapper auto-bootstraps them and enforces token budgets so no agent runs blind.

crewswarm never shouts into a swarm. Each task goes to one named agent, eliminating race conditions and duplicate work. You send a command, the gateway routes it to exactly the right specialist, and only that agent replies.

Everything ships in three passes: MVP for the smallest viable outcome, Phase 1 for depth, and Phase 2 for polish. Each phase carries 3–5 tightly scoped tasks so agents never time out.

Failures trigger automatic retries. If attempts are exhausted, the task lands in the Dead Letter Queue (DLQ) for replay from the dashboard or handoff to crew-fixer.

Yes. You can route any agent into a Docker sandbox so they have no access to your host files. crewswarm also uses path-based allowlists and permission layers to restrict tool usage.

Absolutely. You can add "Skills" by dropping a simple JSON or SKILL.md file into ~/.crewswarm/skills/. No custom JavaScript is required to extend your crew's capabilities.
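As a sketch, a minimal skill file might look like the following. The field names here are hypothetical, for illustration only; check the docs for the actual schema:

```json
{
  "name": "lint",
  "description": "Run ESLint on the project and report issues",
  "command": "npx eslint . --format json",
  "agents": ["crew-coder", "crew-qa"]
}
```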

From your crewswarm directory run:

$ node scripts/dashboard.mjs

That gives you Build, Chat (crew-lead), Services, RT Messages, DLQ, Projects, Send, and Messaging tabs in one place.

From the dashboard Chat tab, just type your requirement. From the CLI:

$ node scripts/run.mjs "Build a todo API"

That single command triggers PM planning plus all subsequent coding, QA, and fixing phases.

Send straight to one agent:

$ node gateway-bridge.mjs --send crew-coder "Create server.js with Express and GET /health"

Only crew-coder receives that instruction, so there is zero broadcast noise.

Yes. Every agent has its own model assignment — Anthropic for coding, Perplexity for planning, Cerebras for fast coordination, Groq for QA speed. Configure from the dashboard Providers tab or edit the config JSON directly. No code changes, no restarts.

Use the Projects tab in the dashboard. Every project stores its roadmap path; click “Resume PM Loop” and it picks up right where the roadmap last stopped.

The PM Loop inspects the live output, asks Groq-as-strategist for 3–5 fresh items, appends them to ROADMAP.md, and keeps shipping. You never need to re-seed work manually.

The gateway wrapper auto-creates any missing memory files from templates, logs the bootstrap event, and refuses to run a task if memory cannot load. That policy came from DEC-004/005.

A JSONL feed at ~/.crewswarm/logs/events.jsonl stores bootstrap events, memory load failures, protocol violations, retries, and RT events for later auditing.
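JSONL is one JSON object per line, so a crash mid-write corrupts at most the final line. Auditing the feed can be sketched like this; the `type` field name is an assumption about the event schema:

```javascript
// Sketch of auditing a JSONL event feed: parse line by line and
// count events by type. The `type` field is an assumed schema detail.
function summarizeEvents(jsonl) {
  const counts = {};
  for (const line of jsonl.split('\n')) {
    if (!line.trim()) continue;
    let event;
    try { event = JSON.parse(line); } catch { continue; } // skip a torn final line
    counts[event.type] = (counts[event.type] || 0) + 1;
  }
  return counts;
}
```

Because each line parses independently, a half-written last line from a crash is simply skipped instead of invalidating the whole log.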

You get 21+ specialist agents: crew-lead (chat commander), crew-main (coordination), crew-pm (planning), crew-coder (implementation), crew-qa (testing), crew-fixer (debugging), crew-security (audits), crew-github (git/PRs), crew-researcher (web research), crew-architect (system design), crew-copywriter (docs/copy), crew-frontend (CSS/UI), crew-seo, crew-ml, crew-orchestrator (wave dispatch), and domain PM specialists (crew-pm-cli, crew-pm-frontend, crew-pm-core) for large codebases.

Waves let you run tasks in parallel. Tasks in the same wave execute simultaneously; higher waves wait for lower waves to complete. Example: {"wave":2, "agent":"crew-coder"} and {"wave":2, "agent":"crew-qa"} run at the same time, and wave 3 waits for all wave-2 tasks before starting. This is significantly faster than sequential execution, with speedup roughly proportional to the number of parallel agents.
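The wave semantics can be sketched as a loop over sorted wave numbers with an await acting as the barrier. `dispatch` here stands in for whatever actually sends a task to an agent; the scheduling logic is the point:

```javascript
// Sketch of wave semantics: tasks sharing a wave number run concurrently,
// and wave N+1 starts only after every wave-N task settles.
// `dispatch(task)` is a placeholder for the real agent-send call.
async function runWaves(tasks, dispatch) {
  const waves = [...new Set(tasks.map(t => t.wave))].sort((a, b) => a - b);
  const results = [];
  for (const wave of waves) {
    const batch = tasks.filter(t => t.wave === wave);
    // Every task in this wave runs in parallel; the await is the wave barrier.
    results.push(...await Promise.all(batch.map(t => dispatch(t))));
  }
  return results;
}
```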

Yes. Create workflows in ~/.crewswarm/pipelines/<name>.json with multi-stage agent chains or skill-only pipelines. Run via cron: node scripts/run-scheduled-pipeline.mjs social. Each stage's output passes to the next. Perfect for daily builds, automated testing, or scheduled deployments.
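A pipeline file might look like the sketch below. The field names and the `{{previous}}` placeholder for passing one stage's output to the next are hypothetical, for illustration only:

```json
{
  "name": "social",
  "stages": [
    { "agent": "crew-researcher", "task": "Summarize today's commits" },
    { "agent": "crew-copywriter", "task": "Draft a post from: {{previous}}" }
  ]
}
```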

An optional feature where crew-main runs periodic reflection when idle — reading brain.md, suggesting follow-ups, managing system health. Enable with CREWSWARM_BG_CONSCIOUSNESS=1. Cheap Groq mode costs ~$0.01/day. Keeps the crew proactive between tasks.

It inspects the target directory for required sections (hero, features, testimonials, etc.), dispatches tasks for anything missing, and loops until every section exists.

SwiftBar shows stack health, lets you start/stop/restart agents (including crew-lead and the dashboard), open the dashboard, and send targeted messages. crewchat is a separate menu bar app for talking to crew-lead in a popover — same conversation as the dashboard and Telegram.

Each project is registered with its output path and roadmap. The dashboard’s Projects tab and orchestrator-logs/projects.json keep everything resume-ready.

The loader trims each file to 2,500 characters and caps the combined memory payload at 12,000 characters, guaranteeing agents never blow their context windows.
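The budget rules reduce to a few lines. This is a sketch of the described behavior, not the actual loader; the section-header format is an assumption:

```javascript
// Sketch of the memory budget described above: each file is trimmed to
// 2,500 characters and the combined payload is hard-capped at 12,000.
const PER_FILE_CAP = 2500;
const TOTAL_CAP = 12000;

function buildMemoryPayload(files) {
  let payload = '';
  for (const [name, content] of Object.entries(files)) {
    const trimmed = content.slice(0, PER_FILE_CAP); // per-file trim
    const section = `## ${name}\n${trimmed}\n`;
    if (payload.length + section.length > TOTAL_CAP) {
      payload += section.slice(0, TOTAL_CAP - payload.length); // hard total cap
      break;
    }
    payload += section;
  }
  return payload;
}
```

Whatever the real loader does with the remainder, the invariant is the same: no agent ever receives more than the total budget, so the injected memory cannot blow a context window on its own.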

When a task fails, the orchestrator automatically breaks it into 2–4 smaller subtasks so follow-up attempts stay within safe execution windows.

Layer 1: crewswarm RT — your own WebSocket message bus (port 18889).
Layer 2: Direct LLM calls — each agent calls its configured provider API with your key, no proxy.
Layer 3: crewswarm orchestration — planner, phased builds, shared memory, dashboard, and SwiftBar.

Yes. The orchestrator, dashboard, SwiftBar plugin, and memory system all run on your Mac. The only external calls are to the LLM providers you configure.

The RT Messages tab in the dashboard mirrors every command, agent reply, and issue. It’s the best place to verify what each agent just did.

From the dashboard’s Services tab, hit Restart next to any agent. You can also use SwiftBar’s per-agent controls from the macOS menu bar. Either path keeps the rest of the crew running.

Get started

Up and running in minutes

Node.js 20+ · RT server :18889 · Agent daemons · API key or Ollama · optional SwiftBar
crewswarm — quick start
01
Clone and install
$ git clone https://github.com/crewswarm/crewswarm.git && cd crewswarm
$ bash install.sh
02
Or install from npm
$ npm install -g crewswarm
03
Start the dashboard
$ npm start
crewswarm Dashboard at http://127.0.0.1:4319
04
Build something
$ node gateway-bridge.mjs --send crew-coder "Add auth middleware"
Documentation

Everything you need to know

From quick starts to deep dives — comprehensive guides for every part of the crewswarm stack.

📚

Architecture & Reference

How the stack fits together — RT bus, agent bridges, pipelines, MCP API, and tool execution.

🔧

Technical Guides

Step-by-step guides for configuration, deployment, and advanced usage patterns.

🎓

Tutorials

Hands-on walkthroughs to get you building with crewswarm quickly.


All models and agents configurable via JSON. No code changes required to switch LLMs.

View full docs on GitHub →