Welcome back to Agentic Coding Weekly. Here are the updates on agentic coding tools, models, and workflows worth your attention for the week of Feb 8-14, 2026.

Executive Summary:

  • GLM-5 from Z.ai scores 77.8% on SWE-bench Verified, taking the open-weight lead from Kimi-K2.5. 744B parameters (40B active), $1/$3.2 per million tokens.

  • MiniMax M2.5 hits 80.2% on SWE-bench Verified. First non-OpenAI/Anthropic model to cross 80%. Proprietary, no weights released.

  • GPT-5.3-Codex-Spark delivers 1000+ tokens per second. Research preview for ChatGPT Pro users. Scores 58.4% on Terminal-Bench 2.0.

  • Cursor releases Composer 1.5, their in-house coding model. Benchmarks between Sonnet 4.5 and Opus 4.6 on Terminal-Bench 2.0.

  • Worth reading: A novel edit tool implementation using line hashes that improved success rates across 15 models.

1. Tooling and Model Updates

GLM-5

Z.ai's latest open-weight release after GLM-4.7 in December. Scales from GLM-4.5's 355B parameters (32B active) to 744B parameters (40B active).

Scores 77.8% on SWE-bench Verified and 56.2% on Terminal-Bench 2.0, taking the open-weight lead from Kimi-K2.5 (76.8% and 50.8%, respectively). For context, among last week's proprietary releases, Opus 4.6 hit 80.8% and 65.4% on the same two benchmarks, while GPT-5.3-Codex hit 77.3% on Terminal-Bench 2.0.

Available via API at $1/$3.2 per million input/output tokens. 200k context, 128k max output. Check the announcement.

MiniMax M2.5

MiniMax's latest model scores 80.2% on SWE-bench Verified and 51.7% on Terminal-Bench 2.0. First model outside OpenAI and Anthropic to cross the 80% threshold on SWE-bench Verified.

Two versions are available: M2.5 at $0.15/$1.2 per million input/output tokens, and M2.5 Lightning at $0.3/$2.4 with double the token throughput. Weights have not been released; the model is proprietary-only for now. Check the announcement.

GPT-5.3-Codex-Spark

OpenAI's research preview of a smaller, faster version of GPT-5.3-Codex, and the first model designed for real-time coding, delivering 1000+ tokens per second. It is also the first milestone from the Cerebras partnership announced in January.

Scores 58.4% on Terminal-Bench 2.0, compared to GPT-5.3-Codex at 77.3% and Sonnet 4.5 at 50%. 128k context, text-only.

Available for ChatGPT Pro users in the Codex app, CLI, and VS Code extension. Has separate rate limits during the research preview. Check the announcement.

Quick Updates

  • Composer 1.5: Cursor released an update to their in-house coding model. Limited benchmarks available, but Terminal-Bench 2.0 performance falls somewhere between Sonnet 4.5 and Opus 4.6.

  • Claude Code customization: Boris Cherny (Claude Code creator) posted a thread on how teams customize Claude Code, covering hooks, plugins, LSPs, MCPs, skills, effort levels, custom agents, status lines, and output styles.

2. Community Picks

I Improved 15 LLMs at Coding in One Afternoon. Only the Harness Changed.

Coding agents like Claude Code, Codex, and Cursor each implement their edit tool differently, and there is no agreed-upon best design.

The author, who maintains oh-my-pi, implemented a novel approach that tags each line of a file with a simple hash. Referencing lines by hash eliminated most edit failures, which improved success rates and reduced token usage. For most of the 15 models tested, it matched or beat the unique-string-replacement approach used by tools like Claude Code.
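The article's exact scheme isn't reproduced here, but the core idea, keying each line on a short content hash so a model can specify an edit unambiguously instead of restating a unique string, can be sketched like this (function names and hash length are my own assumptions):

```python
import hashlib

def line_hashes(text: str) -> list[tuple[str, str]]:
    """Pair each line with a short content hash the model can cite."""
    return [
        (hashlib.sha256(line.encode()).hexdigest()[:8], line)
        for line in text.splitlines()
    ]

def apply_edit(text: str, target_hash: str, replacement: str) -> str:
    """Replace the single line whose hash matches; fail loudly otherwise."""
    hashed = line_hashes(text)
    matches = [i for i, (h, _) in enumerate(hashed) if h == target_hash]
    if len(matches) != 1:
        raise ValueError(f"hash {target_hash!r} matched {len(matches)} lines")
    lines = text.splitlines()
    lines[matches[0]] = replacement
    return "\n".join(lines)

src = "def add(a, b):\n    return a - b\n"
h = line_hashes(src)[1][0]  # hash of the buggy second line
fixed = apply_edit(src, h, "    return a + b")
print(fixed)
```

Because a hash pins down exactly one line, the agent never has to emit enough surrounding context to make a search string unique, which is where the token savings come from.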

MicroGPT

Andrej Karpathy published code to train and run inference on a GPT in 243 lines of pure Python, with no dependencies, not even NumPy. See the implementation on GitHub.

3. Weekly Quiz

CLAUDE.md is a markdown file that Claude Code loads at the start of every session. We use it to set coding standards, architecture decisions, preferred libraries, and conventions. These five questions cover what you should know about managing it in your projects.
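For readers who haven't set one up, a minimal project-level CLAUDE.md might look like this (contents purely illustrative):

```markdown
# Project conventions

## Coding standards
- TypeScript strict mode; no `any` without a justifying comment.
- Run `npm test` before committing.

## Architecture
- API handlers live in `src/api/`; shared logic in `src/lib/`.

## Preferred libraries
- Use `zod` for validation, not hand-rolled checks.
```

Keeping entries short and declarative works best, since the whole file is loaded into context at the start of every session.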

Q1. Your team wants to share project-level Claude Code instructions via version control, but you also have personal sandbox URLs and test credentials you use locally. Where should you put your private, project-specific preferences?

A) ~/.claude/CLAUDE.md
B) ./CLAUDE.md with a comment marking them as personal
C) ./CLAUDE.local.md
D) ./.claude/rules/personal.md

Q2. Your project's instructions are getting big. What's the most maintainable way to structure them without creating a single massive CLAUDE.md?

A) Split topic-specific guidance into ./.claude/rules/*.md
B) Keep everything in ./CLAUDE.md and rely on headings
C) Put everything into README and tell Claude to read it
D) In ~/.claude/rules/ so they apply everywhere

Q3. You want certain rules to apply only when Claude is working on TypeScript files under src/ and lib/. How do you scope a rule file to specific paths?

A) Name the rule file after the directory, e.g., src.md
B) Place the rule file inside the target directory
C) Add a paths field in YAML frontmatter at the top of the rule file
D) Use an @include directive in the project CLAUDE.md

Q4. Your CLAUDE.md contains the line @docs/git-instructions.md to pull in shared git workflow instructions. How is this path resolved?

A) Relative to the git repository root
B) Relative to the file containing the import
C) As an absolute path from the system root
D) Relative to the current working directory where Claude Code was launched

Q5. You want Claude to load CLAUDE.md files from an extra directory provided via --add-dir. What do you need to do?

A) Nothing, --add-dir automatically loads all memory files from that directory
B) Copy those CLAUDE.md files into your main repo
C) Only organization-managed policy can enable cross-directory memory loading
D) Set the CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 environment variable

That’s it for this week. This newsletter goes out every Monday. If it was useful, subscribe below:
