There is no single winner for every job. Some runtimes win on raw coding quality, some on polish, some on price, and some on system flexibility. The useful question is not "best overall?" but "best for this lane?"
CrewSwarm treats these as execution lanes inside a larger PM loop. That means you can pick the strongest engine for each lane instead of forcing one runtime to do every step.
Claude Code still feels strongest for raw autonomous coding and multi-step repo work.
Codex CLI is extremely strong on runtime discipline, tool use, and headless execution.
crew-cli wins on provider flexibility, failover, local-model mixing, and PM-loop compatibility.
These are practical ratings for agentic coding and tool execution, not benchmark scores.
| Runtime | Best at | Agentic coding | Tool use | Polish | System fit |
|---|---|---|---|---|---|
| Claude Code | Strongest raw coding lane | 9.5 | 9.5 | 8.8 | 9.0 |
| Codex CLI | Polished execution lane | 9.2 | 9.4 | 9.1 | 8.6 |
| Cursor CLI | Fast composer-style coding lane | 8.8 | 8.7 | 8.7 | 8.7 |
| Gemini CLI | Open official runtime with broad tooling | 8.2 | 8.8 | 8.3 | 8.8 |
| crew-cli | Portable execution engine for CrewSwarm | 8.1 | 8.6 | 8.2 | 9.7 |
| OpenCode | Open-source hacker workflow | 7.8 | 8.3 | 8.4 | 9.0 |
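One way to read the table is as input to a lane-selection step: weight the columns that matter for a given lane and take the top-scoring runtime. The sketch below is only an illustration of that idea. The scores are copied from the table, but `Rating`, `RATINGS`, and `best_for` are hypothetical names and do not describe how CrewSwarm actually routes work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One row of the ratings table (column names are assumptions)."""
    agentic_coding: float
    tool_use: float
    polish: float
    system_fit: float


# Scores copied from the table above.
RATINGS = {
    "Claude Code": Rating(9.5, 9.5, 8.8, 9.0),
    "Codex CLI": Rating(9.2, 9.4, 9.1, 8.6),
    "Cursor CLI": Rating(8.8, 8.7, 8.7, 8.7),
    "Gemini CLI": Rating(8.2, 8.8, 8.3, 8.8),
    "crew-cli": Rating(8.1, 8.6, 8.2, 9.7),
    "OpenCode": Rating(7.8, 8.3, 8.4, 9.0),
}


def best_for(weights: dict[str, float]) -> str:
    """Return the runtime with the highest weighted score.

    `weights` maps a column name to how much that column matters for the lane.
    """
    def score(rating: Rating) -> float:
        return sum(getattr(rating, column) * w for column, w in weights.items())

    return max(RATINGS, key=lambda name: score(RATINGS[name]))


if __name__ == "__main__":
    print(best_for({"agentic_coding": 1.0}))  # lane that only cares about coding
    print(best_for({"system_fit": 1.0}))      # lane that only cares about system fit
```

Changing the weights per lane is what makes "best for this lane?" a different question from "best overall?".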
If you are giving one runtime a hard repo task, Claude Code and Codex CLI still set the pace. They feel the most mature as autonomous solo coding lanes.
Once you are running a PM loop, the question changes. Provider failover, local-model mixing, runtime cost control, and execution portability matter as much as raw coding quality.
crew-cli is not trying to win purely as a vendor-native coding lane. Its strength is that it can keep work moving across providers, local models, and runtime modes without locking the system to one stack.
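The "keep work moving" idea can be sketched as an ordered failover: try the preferred lane first, and fall through to the next one when it is rate-limited or offline. This is a minimal illustration, not crew-cli's implementation. `LaneUnavailable`, `run_with_failover`, and the two example lanes are hypothetical names invented for the sketch.

```python
from typing import Callable, Sequence


class LaneUnavailable(Exception):
    """Raised by a (hypothetical) lane that is rate-limited or offline."""


# A lane is a display name plus a callable that runs a task and returns output.
Lane = tuple[str, Callable[[str], str]]


def run_with_failover(task: str, lanes: Sequence[Lane]) -> tuple[str, str]:
    """Try each lane in order; return (lane_name, result) from the first success."""
    errors = []
    for name, run in lanes:
        try:
            return name, run(task)
        except LaneUnavailable as exc:
            # Record why this lane was skipped, then fall through to the next.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all lanes failed: " + "; ".join(errors))


def hosted_lane(task: str) -> str:
    """Stand-in for a hosted provider that is currently rate-limited."""
    raise LaneUnavailable("rate limited")


def local_lane(task: str) -> str:
    """Stand-in for a local model that is always available."""
    return f"done locally: {task}"


if __name__ == "__main__":
    lane, result = run_with_failover(
        "fix tests", [("hosted", hosted_lane), ("local", local_lane)]
    )
    print(lane, "->", result)
```

The ordering of `lanes` is where cost and quality preferences would live: put the strongest or cheapest acceptable lane first and let the rest act as fallbacks.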
The point of CrewSwarm is not that one runtime wins forever. The point is that the PM loop can route each lane to the runtime that fits the task best.
Choose Claude Code if you want the strongest raw single-agent coding lane and do not mind vendor dependence.
Choose Codex CLI if you want a polished execution lane with strong headless behavior and OpenAI-side workflow integration.
Choose Cursor CLI if you like the composer workflow and want a strong practical coding lane tied to the Cursor ecosystem.
Choose Gemini CLI if you want a broad open official tool surface and good value for execution-heavy work.
Choose crew-cli if you want one portable execution engine that can survive rate limits, mix local and hosted lanes, and fit cleanly into CrewSwarm agents and Vibe.
You are no longer optimizing one agent. You are optimizing the whole engineering workflow.