2025 was a busy year for agentic coding. We started the year with the disruption of DeepSeek R1 and the "vibe coding" being coined by Andrej Karpathy, and we ended it with models finally crossing the 80% threshold on SWE-bench Verified.
We went from copy‑pasting snippets out of a chat window into our editor to running CLI agents that can navigate a real repo, edit multiple files, and carry changes through a non-trivial codebase.

To keep track of it all, I've put together a list of milestones from the year below, along with a chart showing how model performance on coding benchmarks improved throughout the year.
Milestones
Jan 20: DeepSeek R1, an open-weights model that matched the capabilities of the frontier LLMs, was launched. Caused a major disruption in the AI world and was considered "Sputnik Moment" for AI.
Feb 2: Andrej Karpathy coined the phrase "vibe coding" in a tweet that dominated the entirety of 2025.
❝There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists ...
— Andrej KarpathyFeb 25: Claude 3.7 Sonnet and Claude Code launched. 3.7 Sonnet was the first hybrid reasoning model.
Mar 25: Gemini models finally became good with 2.5 Pro (Experimental 03-25) launch.
Apr 16: o3 and o4-mini along with Codex CLI launched.
May 23: Claude Opus 4 and Sonnet 4 launched. Both of these models crossed the 70% score on SWE-bench Verified and became the first models to do so.
Jun 4: Cursor 1.0 release.
Jun 19: Software in the era of AI keynote by Andrej Karpathy. Kicked off the discussion of software 3.0 and rewatch of the Rain Man movie.
Jun 25: Gemini CLI launch.
Jul 14: Kiro IDE launch.
Jul 22: Qwen3-Coder-480B-A35B-Instruct launched along with Qwen Code, forked from Gemini CLI.
Jul 28: GLM-4.5 and GLM-4.5-Air launched.
Aug 5: gpt-oss-120b and gpt-oss-20b, open-weights model release from OpenAI.
Aug 6: Claude Opus 4.1 launch.
Aug 7: Long awaited GPT-5 launch to combine the GPT and o series of model from OpenAI.
Aug 12: Claude Sonnet 4 starts supporting 1M tokens of context through API.
Sep 15: GPT‑5-Codex launch.
Sep 30: Claude Sonnet 4.5 launch.
Oct 16: Agent Skills launch from Anthropic. Has now been adopted or being adopted by other coding agents like Codex, Gemini CLI, OpenCode, etc.
Nov 1: OpenCode 1.0 release.
Nov 6: Kimi K2 Thinking launch. Became the first open-weights model to cross the 70% score on SWE-bench Verified.
Nov 12: GPT-5.1 launch.
Nov 18: Gemini 3 and Antigravity AI IDE from Google.
Nov 19: GPT-5.1-Codex-Max launch.
Nov 25: Claude Opus 4.5 launch. The first model to cross the 80% score on SWE-bench Verified.
Dec 1: DeepSeek-V3.2 release.
Dec 9: Devstral 2 and Vibe CLI launch from Mistral.
Dec 11: GPT-5.2 launch.
Dec 17: Gemini 3 Flash launch. Surpassed Gemini 3 Pro on SWE-bench Verified.
Dec 18: GPT-5.2-Codex launch. Along with Opus 4.5 and Gemini 3 Pro, it's the strongest model available for agentic coding at the end of 2025.
Dec 22: GLM-4.7 launch.
Dec 23: MiniMax M2.1 launch. Together with GLM-4.7, it's the strongest open-weights available for agentic coding at the end of 2025.
Progress on SWE-bench Verified

That’s it for 2025. Keep reading the weekly issues to stay on top of model releases, tooling updates, and workflows throughout 2026.



