Welcome back to Agentic Coding Weekly. Here are the updates on agentic coding tools, models, and workflows worth your attention for the week of Dec 14 - 20, 2025.

Tooling and Model Updates

Gemini 3 Flash

Google released Gemini 3 Flash. It comes close to 3 Pro on most benchmarks and actually beats it on a few, including SWE-bench Verified, where Flash scored 78.0% versus Pro's 76.2%. For context, GPT-5.1 Codex Max sits at 77.9% and Opus 4.5 leads at 80.9%.

Pricing is $0.50 / $3 per million input / output tokens. That's a 66% increase on input and 20% on output compared to 2.5 Flash ($0.30 / $2.50), a similar jump to the one we saw going from 2.5 Pro to 3 Pro.
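To put the change in per-task terms, take a hypothetical agentic run that burns 200K input and 10K output tokens:

3 Flash: 0.2M × $0.50 + 0.01M × $3 = $0.10 + $0.03 = $0.13
2.5 Flash: 0.2M × $0.30 + 0.01M × $2.50 = $0.06 + $0.025 = $0.085

For an input-heavy workload like that, you're paying roughly 50% more per run.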

Read the announcement and the model card.

Nemotron 3 Nano

Nvidia released Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer model. It beats similar-sized models like Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B on SWE-Bench (OpenHands). Supports context lengths up to 1M tokens.

Notably, unlike most "open" models recently, Nvidia released the weights, training recipe, and data alongside the model.

Support for 3 Nano has been added to llama.cpp and ollama, so you can start running it locally for simple coding tasks.
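If you want to poke at it through ollama, it's the usual pull-and-run flow. The model tag below is a placeholder I made up, so check the ollama model library for the name it was actually published under:

ollama pull nemotron-3-nano    # placeholder tag, look up the real one in the ollama library
ollama run nemotron-3-nano "Write a bash one-liner that counts TODO comments in a repo"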

Quick Updates

  • OpenAI released GPT‑5.2‑Codex, a GPT‑5.2 variant optimized for agentic coding. It's available now for paid ChatGPT users, but not yet through the API.

  • Codex CLI and GitHub Copilot added support for Anthropic’s Agent Skills. Quick refresher: Skills let you extend AI agents like Claude Code by writing specialized knowledge and workflows in markdown files with optional scripts and assets. Agents read a skill's short description at startup and load the rest only when needed, keeping context usage efficient. Anthropic also released Skills as an open standard. A minimal example of a skill file follows this list.

  • You can now use Claude Code with OpenRouter, including with non‑Anthropic models. Read setup instructions.
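For a concrete picture of what a skill is, here's a minimal made-up example. The directory layout and the name/description frontmatter follow Anthropic's published format as I understand it, but the skill itself is invented for illustration, so check the Skills docs before copying it verbatim.

.claude/skills/changelog-writer/SKILL.md:

---
name: changelog-writer
description: Drafts a CHANGELOG entry from recent git history. Use when the user asks for release notes or a changelog.
---

When asked for a changelog entry:
1. Run git log --oneline since the last tag.
2. Group the commits into Added / Changed / Fixed.
3. Write the entry in Keep a Changelog format and show it for review.

The description is what the agent sees at startup; the body only enters the context window once the agent decides the skill is relevant.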

Workflow of the Week

Elevator Music Plugin for Claude Code plays soothing elevator music while Claude Code waits for user input. See demo and installation guide in the readme.

If you're on Mac and want something simpler, this HN comment shows how to add a notification hook using osascript.

osascript -e 'display notification "Ready" with title "Claude Code" sound name "Glass"'
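If you'd rather have Claude Code trigger the sound itself, a hook in ~/.claude/settings.json along these lines should work. Treat this as a sketch based on my reading of the hooks feature (Notification fires when Claude Code is waiting on you), and double-check the event names and schema against the hooks documentation:

{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "osascript -e 'display notification \"Ready\" with title \"Claude Code\" sound name \"Glass\"'"
          }
        ]
      }
    ]
  }
}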

I often get distracted while Claude Code is "thinking" and start browsing Reddit or Hacker News, so when Claude Code finishes, it just sits there waiting for my input. Playing a sound makes the hand-off impossible to miss and is enough to get me to close those Reddit tabs.


Community Picks

2025 LLM Year in Review by Andrej Karpathy

Self-explanatory. Just go read it.

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

A good showcase of what AI agents can do on complex, spec-compliant code generation with minimal supervision when the task is constrained by a robust, implementation-independent test suite. Read more.

AI Coding Reports on Productivity and Quality

Greptile's study on recent trends in AI software development highlights significant productivity gains: median PR size increased 33%, and lines of code per developer grew from 4,450 to 7,839 per month. The report page's design is quite beautiful; it's worth a look just for that. Read report.

CodeRabbit analyzed 470 open-source GitHub pull requests, 320 AI-co-authored and 150 human-only. Their findings: AI PRs show roughly 1.4–1.7× more critical and major issues. AI-generated code often omits null checks, early returns, guardrails, and comprehensive exception logic. Read report.

Your job is to deliver code you have proven to work

Argues that AI-assisted coding has led to a problematic trend: submitting large, untested pull requests. The core point is that a developer's primary duty is to deliver code that's proven to work, not shift the burden of verification to reviewers.

Key quote: "We need to deliver code that works—and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code."

Prediction: AI will make formal verification go mainstream

Martin Kleppmann suggests that formal verification, historically restricted to high-stakes research due to the PhD-level effort required, might finally go mainstream thanks to LLMs.

For industrial software, he argues that the expected cost of bugs is lower than the expected cost of using the proof techniques that would eliminate those bugs. But now LLM-based coding assistants are getting pretty good at writing proof scripts, making formal verification vastly cheaper. Read more.
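If "proof script" sounds abstract, here's the smallest possible flavor of one, a Lean 4 snippet of my own (not from Kleppmann's post). The statement is the spec, the tactic after := by is the proof, and if the file compiles, the property is machine-checked:

-- Spec: appending the empty list doesn't change a list's length.
-- The simp tactic discharges the proof; the compiler verifies it.
example (xs : List Nat) : (xs ++ []).length = xs.length := by
  simp

Kleppmann's bet is that LLMs can now write the much longer versions of this for real code, which changes the cost calculation above.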

That’s it for this week. I’ll be back next Monday with the latest agentic coding updates.
