Welcome back to Agentic Coding Weekly. Here are the updates on agentic coding tools, models, and workflows worth your attention for the week of Jan 25-31, 2026.

1. Tooling and Model Updates

Kimi K2.5

Moonshot AI released Kimi K2.5, a native multimodal agentic model that's now the best open-weights model for coding. It’s a MoE model with 1T total params / 32B active and a 262k context window.

The model is pre-trained on vision-language tokens, so it handles visual knowledge, cross-modal reasoning, and tool use grounded in visual inputs natively. It can generate code from UI designs or video workflows and orchestrate tools for visual data processing autonomously.

There's also an "Agent Swarm" capability where K2.5 decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents. This moves from single-agent scaling to coordinated, swarm-like execution.

K2.5 scores 76.8% on SWE-bench Verified. For context, among open-weights models GLM 4.7 sits at 73.8% and Minimax M2.1 at 74%. On the proprietary side, Gemini 3 Pro is at 76.2%, GPT-5.2 Thinking at 80%, and Opus 4.5 leads at 80.9%.

Available via API at $0.6 / $3 per million input / output tokens.

Early user reports on Hacker News suggest performance is closer to Sonnet 4.5 than Opus 4.5, but still impressive given the pricing and open-weights nature.

Qwen3-Max-Thinking

Proprietary reasoning model from the Qwen team, positioned as a model that competes with top proprietary options and the newest open-weights releases for coding. Scores 75.3% on SWE-bench Verified at $1.2 / $6 per million input / output tokens. Read the announcement.

Mistral Vibe 2.0

V2 release of Vibe CLI, the CLI coding agent from Mistral. It now supports custom subagents (e.g., PR review, test generation), slash-command skill invocation, and custom agent configurations beyond the built-ins (default/plan/accept-edit/auto-approve).

Claude Code Updates

Keybindings are now configurable. Run /keybindings to create or open your config at ~/.claude/keybindings.json. Seems to be rolling out slowly since it's not available to me yet. I've always wanted to reconfigure Enter for new lines instead of sending messages, so this is a nice QoL improvement.

Boris Cherny, creator of Claude Code, shared how the Claude Code team uses Claude Code. Lots of useful bits in the thread, but the one I really like is "turn whatever you find yourself doing multiple times a day into a skill". That could be tech debt cleanup, testing changes in dev, or generating a report from the last 7 days of Slack/Jira/GitHub activity. This is a nice follow-up to his tweet last month where he shared his personal Claude Code setup.
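To make the tip concrete, here's a minimal sketch of what such a skill could look like: a directory containing a SKILL.md with YAML frontmatter. The skill name and the report steps are my own illustration, not something from Boris's thread:

```markdown
---
name: weekly-activity-report
description: Generate a markdown report from the last 7 days of Slack, Jira, and GitHub activity. Use when asked for a weekly status summary.
---

# Weekly Activity Report

1. Collect the last 7 days of activity from Slack, Jira, and GitHub.
2. Group items by project; summarize each group in 2-3 bullets.
3. Highlight blockers and open questions in a separate section.
4. Output the result as markdown, one section per project.
```

The frontmatter description is what gets surfaced to the agent up front, so it should say both what the skill does and when to use it; the body only loads once the skill is activated.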

2. Community Picks

Notes from Claude Coding by Andrej Karpathy

Long tweet worth reading in full. Some of my notes from the tweet:

  • Went from 80% manual coding to 80% agent-based coding in ~4 weeks in December

  • "No IDE needed" and "agent swarm" hype is premature; for code you care about, watch changes in an IDE like a hawk

  • Models no longer make syntax errors, but they do make subtle conceptual mistakes: wrong assumptions, failing to ask for clarification, over-complicated implementations, and bloated abstractions

  • The main gain isn't speedup but expansion: coding things that weren't worth the effort before, and working on code previously blocked by knowledge/skill gaps

  • Already noticing atrophy in his ability to write code manually

AGENTS.md Outperforms Skills in Vercel's Evals

Vercel packaged the Next.js documentation as a skill but noticed the skill wasn't getting invoked where it should have been. No matter how they changed the instructions, benchmark scores plateaued.

Instead, they put the docs index directly in AGENTS.md, adding a brief description of each doc and where to find it. Doing what skills do through their frontmatter, but in AGENTS.md, produced much better results. Read the post.
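As a rough sketch of the technique (the file paths and descriptions below are illustrative, not Vercel's actual index), the AGENTS.md entry is just a short annotated map of where the docs live:

```markdown
## Next.js docs index

Consult the relevant doc before writing Next.js code:

- docs/routing.md — App Router conventions: layouts, pages, route groups
- docs/data-fetching.md — fetch caching, revalidation, and server actions
- docs/rendering.md — server vs. client components, streaming
```

The effect is similar to skill frontmatter descriptions, except the index sits in context unconditionally instead of depending on the agent deciding to invoke a skill.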

Claude Code Opus 4.5 Performance Tracker

People complain a lot on Reddit and HN that models get nerfed or dumbed down, or that sometimes they're secretly serving quantized models. These complaints aren't always invalid. There was an issue with Claude Code back in December and another one last week.

This is an independent tracker that detects statistically significant degradations in the performance of Claude Code with Opus 4.5 on SWE tasks. It runs daily benchmarks against a curated subset of SWE-Bench-Pro. View the tracker and read the HN discussion.

One Human + One Agent = One Browser From Scratch

Case study where the author collaborated with an AI agent to build a functional web browser engine from scratch in 3 days. After Cursor's browser experiment about a week ago, this was a much better showcase of building a browser from scratch. Read the post and the HN discussion.

How AI Assistance Impacts Coding Skills

A randomized controlled trial from Anthropic of 52 (mostly junior) software engineers found that using AI assistance led to a statistically significant decrease in mastery.

Developers using AI scored 17% lower on mastery quizzes, particularly in debugging, when learning a new Python library (Trio). While AI slightly accelerated task completion, the productivity gain was not statistically significant. Read the study and the HN discussion.

3. Weekly Quiz

The theme this week is Agent Skills.

Question 1: What is the minimum required file structure for a valid Agent Skill?
A) A directory containing SKILL.md and README.md
B) A directory containing only SKILL.md
C) A directory containing SKILL.md and a scripts/ folder
D) A directory containing config.yaml and SKILL.md

Question 2: Which of the following is NOT a standard optional subdirectory in the Agent Skills specification?
A) scripts/
B) references/
C) hooks/
D) assets/

Question 3: In the progressive disclosure model, when does an agent load the full SKILL.md body?
A) At startup, along with all other skills
B) When the skill is activated
C) Only when explicitly requested by the user
D) After all reference files are loaded

Question 4: Which of the following is a valid skill name?
A) pdf_processing
B) pdfTools
C) code.review
D) code-review

Question 5: How should you reference other files within your skill?
A) Use absolute paths from the system root
B) Use relative paths from the project root where the skill is installed
C) Use relative paths from the skill root
D) Use path aliases defined in the frontmatter metadata

That’s it for this week. I write this weekly on Mondays. If this was useful, subscribe below:
