Welcome back to Agentic Coding Weekly. Here are the updates on agentic coding tools, models, and workflows for the week of Apr 19-25, 2026.
Executive Summary:
Kimi K2.6 and DeepSeek-V4-Pro became the first open-weight models to cross 80% on SWE-bench Verified.
DeepSeek, OpenAI, Moonshot, Qwen, and Xiaomi all shipped new models in the same week.
Zed added support for running multiple coding agents in parallel from the same editor window.
SpaceX has secured an option to acquire Cursor for $60B or pay $10B for a partnership.
Anthropic published a postmortem on recent Claude Code quality issues.
Roo Code is shutting down May 15th.
Worth reading: the over-editing problem in coding models, and using AI to finish abandoned side projects.
1. Tooling and Model Updates
Current State of Benchmarks
While SWE-bench Verified is not a perfect benchmark for coding, Kimi K2.6 (80.2%) and DeepSeek-V4-Pro (80.6%), both released last week, are the first open-weight models to cross the 80% threshold on it. For context, Opus 4.5 (80.9%) was the first proprietary model to do so, back in Nov 2025.
Here's where things stand on the SWE-bench Pro and Terminal-Bench 2.0 benchmarks across newly released and existing frontier models:
| Model | SWE-bench Pro | Terminal-Bench 2.0 | Pricing ($ per 1M tokens, input/output) |
|---|---|---|---|
| **DeepSeek-V4-Pro-Max** | 55.4% | 67.9% | $1.74 / $3.48 |
| **GPT-5.5-xhigh** | 58.6% | 82.7% | $5 / $30 |
| **Kimi K2.6** | 58.6% | 66.7% | $0.95 / $4 |
| **Qwen3.6-27B** | 53.5% | 59.3% | $0.6 / $3.6 |
| **Qwen3.6-Max-Preview** | 57.3% | 65.4% | $1.3 / $7.8 |
| **MiMo-V2.5-Pro** | 57.2% | 68.4% | $1.3 / $7.8 |
| GLM-5.1 | 58.4% | 63.5% | $1.4 / $4.4 |
| Claude Opus 4.7 | 64.3% | 69.4% | $5 / $25 |
| Gemini 3.1 Pro | 54.2% | 68.5% | $2 / $12 |
Model names in bold are newly released last week.
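To make the table easier to compare, here's a quick sketch that ranks these models by SWE-bench Pro score per blended dollar. The 3:1 input-to-output token mix is my own illustrative assumption (agent workloads tend to be input-heavy), not a standard metric:

```python
# Rank the models from the table above by SWE-bench Pro score per dollar
# of blended price. The 3:1 input:output blend is an illustrative assumption.
models = {
    # name: (swe_bench_pro_pct, input_price, output_price) per 1M tokens
    "DeepSeek-V4-Pro-Max": (55.4, 1.74, 3.48),
    "GPT-5.5-xhigh":       (58.6, 5.00, 30.00),
    "Kimi K2.6":           (58.6, 0.95, 4.00),
    "Qwen3.6-27B":         (53.5, 0.60, 3.60),
    "Qwen3.6-Max-Preview": (57.3, 1.30, 7.80),
    "MiMo-V2.5-Pro":       (57.2, 1.30, 7.80),
    "GLM-5.1":             (58.4, 1.40, 4.40),
    "Claude Opus 4.7":     (64.3, 5.00, 25.00),
    "Gemini 3.1 Pro":      (54.2, 2.00, 12.00),
}

def blended_price(inp: float, out: float, ratio: float = 3.0) -> float:
    """Blended $/1M tokens assuming `ratio` input tokens per output token."""
    return (ratio * inp + out) / (ratio + 1)

ranked = sorted(
    models.items(),
    key=lambda kv: kv[1][0] / blended_price(kv[1][1], kv[1][2]),
    reverse=True,
)
for name, (score, inp, out) in ranked:
    print(f"{name:22s} {score:5.1f}% @ ${blended_price(inp, out):5.2f}/1M blended")
```

Score-per-dollar is obviously a crude lens: a model that can't finish a task at all costs more than its token bill suggests.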
DeepSeek V4
DeepSeek released the V4 series, two open-weight MoE models both supporting 1M token context. V4-Pro is the flagship: 1.6T total parameters, 49B active. V4-Flash is the lightweight option: 284B total, 13B active, at $0.14/$0.28. Check the announcement and the technical report.
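For a feel of what these prices mean in practice, here's a back-of-the-envelope cost sketch. The session token counts are made up for illustration, and I'm using the V4-Pro-Max prices from the table above as a stand-in for the flagship:

```python
# Back-of-the-envelope cost for a hypothetical agent session using the
# DeepSeek V4 prices quoted above. Token counts are illustrative assumptions.
PRICES = {  # $ per 1M tokens: (input, output)
    "V4-Pro":   (1.74, 3.48),  # V4-Pro-Max pricing from the table
    "V4-Flash": (0.14, 0.28),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a long agent run: 2M input tokens (context re-reads), 200K output tokens
for model in PRICES:
    print(f"{model}: ${session_cost(model, 2_000_000, 200_000):.2f}")
```

At roughly 12x cheaper, Flash-class pricing is what makes "leave an agent running all day" workflows viable.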
GPT-5.5
OpenAI released GPT-5.5. Both performance and pricing have increased: on Terminal-Bench 2.0 and the Artificial Analysis Intelligence Index, GPT-5.5-medium appears to match GPT-5.4-xhigh, while pricing has doubled from GPT-5.4 to $5/$30 per million input/output tokens. Check the announcement.
Kimi K2.6
Newest open-weight release from Moonshot AI: 1T total parameters, 32B active, 262K context window. Considering its benchmark scores and $0.95/$4 per million token pricing, it's arguably the most cost-competitive coding model, open-weight or proprietary. Check the announcement.
Qwen3.6-27B and Qwen3.6-Max-Preview
The Qwen team released two models. Qwen3.6-27B is a dense 27B open-weight model that beats Qwen3.5-397B-A17B on most benchmarks, scoring 53.5% on SWE-bench Pro and 59.3% on Terminal-Bench 2.0. Qwen3.6-Max-Preview is a proprietary preview model; it scored 57.3% on SWE-bench Pro and 65.4% on Terminal-Bench 2.0.
MiMo-V2.5-Pro
Newest release from Xiaomi, available via API with weights expected soon. Coding benchmark scores look comparable to current SOTA open-weight models and Opus 4.6: 57.2% on SWE-bench Pro and 68.4% on Terminal-Bench 2.0. Check the announcement.
Zed: Parallel Agents in One Window
Zed added the ability to run multiple agents in parallel within a single window. All the agent threads are accessible through a new Threads Sidebar, whether they're working on the same repo, different repos, or across worktrees.
Previously you'd need multiple Zed windows, similar to juggling tmux windows and panes. The Codex and Claude desktop apps already support parallel agents, but they're not editors. Zed's advantage is that you can run different agent backends (Codex, Claude, Cursor, whatever) from one place while keeping full editor capabilities.
Anthropic's Claude Code Postmortem
Over the past couple of months, people have complained about quality degradation, usage limits, and sneaky changes to Claude Code's behavior. Anthropic published a postmortem covering the recent issues and what they've changed. Overall, a better way to communicate: it's at least an official blog post rather than scattered tweets.
Quick Updates
SpaceX has secured an option to either acquire Cursor for $60B later this year or pay $10B for a new partnership.
Anthropic A/B tested removing Claude Code from the Pro ($20/month) plan for new signups. People were not happy with the lack of transparency around yet another sneaky change.
GitHub Copilot made changes to individual plans. New signups are paused, Opus models are no longer available in the $10/month Pro plan, and Pro plans now have stricter usage limits.
Roo Code is shutting down on May 15th. I have fond memories of using it in VS Code since early last year.
2. Community Picks
Over-Editing Problem: Coding Models Are Doing Too Much
Over-editing refers to a model modifying code beyond what is necessary. The author explored and benchmarked this problem, finding that among the latest frontier models, GPT-5.4 over-edits the most and Opus 4.6 the least. The author also shows that reinforcement learning can steer models toward more minimal edits without hurting general coding performance.
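If you want to eyeball over-editing in your own workflows, one crude way to quantify it is the diff footprint of a model's rewrite. This metric is my own illustration using stdlib `difflib`, not the author's methodology:

```python
# A minimal sketch of one way to quantify "edit footprint": count changed
# lines between the original file and the model's rewrite using difflib.
import difflib

def edit_footprint(before: str, after: str) -> int:
    """Number of lines added or removed between two versions."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(
        1
        for line in diff
        if (line.startswith("+") or line.startswith("-"))
        and not line.startswith(("+++", "---"))  # skip file headers
    )

# A one-line type-hint request that came back with an extra gratuitous comment:
before = "def add(a, b):\n    return a + b\n"
after = "def add(a: int, b: int) -> int:\n    # gratuitous comment\n    return a + b\n"
print(edit_footprint(before, after))  # prints 3
```

Tracking this number per task, relative to the minimal diff you'd have written yourself, gives a rough over-editing signal without any benchmark harness.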
Use AI Coding to Finish Projects You Were Never Going to Finish
Paraphrasing the author: some personal projects you do to learn and grow, and some exist purely because you wish a solution existed. AI coding is well-suited for the second kind, the utilitarian projects where the learning isn't the point, the result is. The author details implementing a connector between YouTube Music and OpenSubsonic with Claude Code.
That’s it for this week. I publish this newsletter every Monday. If this was useful, subscribe below:




