Welcome back to Agentic Coding Weekly. Here are the updates on agentic coding tools, models, and workflows for the week of May 17-23, 2026.
Executive Summary:
Gemini 3.5 Flash launched. Beats 3.1 Pro on most benchmarks, but the pricing is now close to Pro-tier pricing.
Antigravity 2.0 and Antigravity CLI announced at Google I/O. Gemini CLI sunsets June 18, 2026.
Composer 2.5 released by Cursor. It's a fine-tuned Kimi K2.5 that benchmarks close to Opus 4.7 and GPT-5.5 but is much cheaper.
Qwen 3.7 Max released. Scores 69.7% on Terminal Bench 2.0.
DeepSeek V4 Pro discount now permanent. $0.435 / $0.87 per million input / output tokens is the new standard pricing.
Worth reading: Simon Willison’s six-month LLM recap, workflows for using LLMs as a staff engineer, and a token-speed visualizer.
Meanwhile, Codex, Claude Code, and OpenCode all have slightly different ways to resume, update, and run in yolo mode, which annoys me to no end. If it bothers you as well, setting these 12 aliases on all the machines you work on should do it.

1. Tooling and Model Updates
Gemini 3.5 Flash
Google's new Flash model beats Gemini 3.1 Pro on almost all benchmarks. Priced at $1.5 / $9 per million input / output tokens, it's 3x more expensive than the previous Flash model, Gemini 3 Flash, and close to 3.1 Pro's $2 / $12. Read the announcement.
It's quite fast (over 200 tokens per second) but also more verbose. Running the Artificial Analysis test suite cost $1551 with 3.5 Flash vs $892 with 3.1 Pro, and 3.5 Flash still ranked lower than 3.1 Pro.
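To see how a model with lower per-token prices can still produce the bigger bill, here is a rough sketch of the arithmetic. The prices are the ones quoted above. The token counts and the model keys are made-up assumptions for illustration, not measurements from the Artificial Analysis run.

```python
# Per-million-token prices in USD quoted above: (input, output).
# The dictionary keys are illustrative labels, not official API model IDs.
PRICES = {
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3.1-pro": (2.00, 12.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one workload."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: identical prompts, but the more verbose model
# is assumed to emit 2.5x as many output tokens.
INPUT_TOKENS = 10_000_000
pro_cost = cost_usd("gemini-3.1-pro", INPUT_TOKENS, 20_000_000)
flash_cost = cost_usd("gemini-3.5-flash", INPUT_TOKENS, 50_000_000)

print(f"3.1 Pro:   ${pro_cost:,.2f}")
print(f"3.5 Flash: ${flash_cost:,.2f}")

# Ratio of the suite costs reported above ($1551 vs $892).
reported_ratio = 1551 / 892
print(f"Reported suite cost ratio: {reported_ratio:.2f}x")
```

At these prices, output tokens on 3.5 Flash cost 0.75x what they cost on 3.1 Pro, so, ignoring input tokens, Flash becomes the more expensive option once it emits more than about 1.33x as many output tokens for the same task.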
Coding benchmark comparison:
| Model | Terminal-Bench 2.1 | SWE-Bench Pro |
|---|---|---|
| Gemini 3.5 Flash | 76.2% | 55.1% |
| Gemini 3.1 Pro | 70.3% | 54.2% |
| GPT-5.5 | 78.2% | 58.6% |
| Claude Opus 4.7 | 66.1% | 64.3% |
Antigravity 2.0 and Antigravity CLI
Alongside Gemini 3.5 Flash, Google also launched Antigravity 2.0, a parallel agent manager similar to the Claude and Codex desktop apps. Version 2.0 has no built-in IDE.
Antigravity CLI (closed source), which uses the same harness as Antigravity 2.0, is replacing Gemini CLI. Gemini CLI will stop working on June 18, 2026.
The 2.0 auto-update was poorly handled and broke people's workflows without any warning.
Composer 2.5
Cursor's further fine-tuned version of Kimi K2.5. Benchmarks put it close to Opus 4.7 and GPT-5.5, but it is priced much lower at $0.50/M input and $2.50/M output tokens. Context window is 200k tokens. Read the announcement.
| Benchmark | Composer 2.5 | Opus 4.7 | GPT-5.5 | Composer 2 |
|---|---|---|---|---|
| Terminal Bench 2.0 | 69.3% | 69.4% | 82.7% | 61.7% |
| SWE-Bench Multilingual | 79.8% | 80.5% | 77.8% | 73.7% |
Quick Updates
Qwen 3.7 Max is the latest release in the Max series, the proprietary variant of the Qwen models. It scores 69.7% on Terminal Bench 2.0 and 60.6% on SWE Bench Pro. Pricing is $2.5 / $7.5 per million input / output tokens, and the context size is 1 million tokens.
DeepSeek V4 Pro’s 75% API discount was originally scheduled to run until May 31. That discount is now permanent. API pricing stays at $0.435 / $0.87 per million input / output tokens.
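For context on what the discount means, the pre-discount price can be backed out from the figures above. This is a small sketch of that arithmetic; the resulting numbers are derived from the stated 75% discount and are not quoted from DeepSeek's pricing page.

```python
def list_price(discounted: float, discount: float) -> float:
    """Recover the pre-discount price from a discounted price and a discount fraction."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return discounted / (1.0 - discount)


DISCOUNT = 0.75  # the 75% discount stated above

# Discounted prices per million tokens, as stated above.
derived = {
    label: list_price(price, DISCOUNT)
    for label, price in (("input", 0.435), ("output", 0.87))
}

for label, price in derived.items():
    print(f"{label}: ${price:.2f} per million tokens before the discount")
```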
2. Community Picks
The Last Six Months in LLMs in Five Minutes
Annotated slides from Simon Willison's 5-minute lightning talk at PyCon US 2026. Found a phone recording of the talk on YouTube. Per Simon, the two main themes: coding agents got really good, and local models wildly outperform expectations.
How I Use LLMs as a Staff Engineer in 2026
The author details what he uses AI for now, compared to a year ago, and what he still doesn't use it for. The main shift, per the author, is that agents are now good enough to create entire PRs for him in most cases. Read the post.
How Fast Is N Tokens Per Second Really?
A simple HTML page that visualizes token generation speed. You can move a slider from 0.05 to 2000 tokens per second and see what that speed feels like in practice. Check out the tool.
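If you'd rather see the numbers than the animation, the underlying arithmetic is just tokens divided by speed. A minimal sketch, assuming a hypothetical 1,000-token response; it is not taken from the tool's source.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return how long it takes to stream num_tokens at a constant speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 1_000  # hypothetical response length

# The slider's two endpoints plus two speeds in between.
for tps in (0.05, 20, 200, 2000):
    seconds = seconds_to_generate(RESPONSE_TOKENS, tps)
    print(f"{tps:>7} tok/s -> {seconds:,.1f} s for {RESPONSE_TOKENS} tokens")
```

At the 200 tokens per second quoted for Gemini 3.5 Flash above, a response of that length would stream in about 5 seconds.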
That’s it for this week. I write this weekly on Mondays. If this was useful, subscribe below:




