Claude Sonnet 4.5: The New King of Coding Models

This article introduces Anthropic's new Claude Sonnet 4.5, positioning it as the leading model for coding, building complex agents, and computer use. Author Simon Willison's initial impressions suggest it beats GPT-5-Codex at coding. A standout feature is Sonnet 4.5's excellent performance with Claude.ai's code interpreter, which can clone GitHub repositories directly and install packages from NPM and PyPI. The author details an ambitious experiment, kicked off from his phone, in which Sonnet 4.5 successfully modified a Python project to model conversations as trees in a SQLite database, including a schema migration, utility functions, and comprehensive tests. Its SVG drawings are strong, though noted as slightly behind GPT-5-Codex at drawing bicycles. The article also covers Sonnet 4.5's competitive pricing, a clear advantage over Claude Opus, and its immediate availability on platforms such as OpenRouter, Cursor, and GitHub Copilot. Anthropic additionally shipped new developer tools, including a VS Code extension and the renamed Claude Agent SDK.

29th September 2025

Anthropic released Claude Sonnet 4.5 today, with a very bold set of claims:

Claude Sonnet 4.5 is the best coding model in the world. It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math.

Anthropic gave me access to a preview version of a “new model” over the weekend which turned out to be Sonnet 4.5. My initial impressions were that it felt like a better model for code than GPT-5-Codex, which has been my preferred coding model since it launched a few weeks ago. This space moves so fast—Gemini 3 is rumored to land soon so who knows how long Sonnet 4.5 will continue to hold the “best coding model” crown.

The pricing is the same as the previous Sonnet: $3/million input tokens and $15/million output tokens. This remains significantly cheaper than Claude Opus—$15/$75—but still quite a bit more than GPT-5 and GPT-5-Codex, both at $1.25/$10.
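
To put those numbers in context: a hypothetical call with 100,000 input tokens and 10,000 output tokens would cost $0.30 + $0.15 = $0.45 with Sonnet 4.5, $1.50 + $0.75 = $2.25 with Opus, and $0.125 + $0.10 = $0.225 with GPT-5 or GPT-5-Codex.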

It really shines with Claude.ai Code Interpreter

The claude.ai web interface (not yet the Claude iPhone native app) recently added the ability for Claude to write and then directly execute code in a sandboxed server environment, using Python and Node.js. I wrote about that in detail three weeks ago.

Anthropic’s implementation of this code interpreter pattern is more powerful than ChatGPT’s equivalent because it can directly clone code from GitHub and install software packages from NPM and PyPI.

Sonnet 4.5 absolutely shines at using this tool!

Here’s my most ambitious experiment with it. I started like this:

Checkout https://github.com/simonw/llm and run the tests with

pip install -e '.[test]'

pytest

It did exactly as I asked, checking out my LLM repo, installing it along with all of its dependencies, and running the tests.

✅ All 466 tests passed in 167.69 seconds (about 2 minutes 47 seconds)

Then I posed it a wildly ambitious challenge:

The tool stores prompts and responses in a SQLite database. It currently uses a single conversation row and a linear collection of responses within that conversation. Experiment with the idea of adding a parent_response_id column to that responses table and using it to model the responses in a conversation as a tree instead. Use new pytest tests which you execute individually (to save time on the whole test suite) to exercise your design. Write a tree_notes.md file which you first populate with your plan and then use for notes as you go along.
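
The core of what I was asking for is a single nullable, self-referencing column. Here is a rough sketch of that idea in plain sqlite3, as my own illustration of the prompt rather than the migration Claude wrote; the TEXT type is an assumption based on the string IDs the tool uses:

    import sqlite3

    db = sqlite3.connect("logs.db")
    # A NULL parent marks a root response; anything else points at the
    # response this one branches from. SQLite permits ADD COLUMN with a
    # foreign key clause here because the default for existing rows is
    # NULL, which keeps old (linear) data valid unchanged.
    db.execute(
        """
        ALTER TABLE responses
        ADD COLUMN parent_response_id TEXT
            REFERENCES responses(id)
        """
    )
    db.commit()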

As you can see in the resulting conversation transcript, it did a really impressive job of this! It churned away across dozens of different tool calls, eventually outputting the following summary of its activity:

I’ve successfully experimented with adding tree-structured conversations to the LLM CLI tool. Here’s what was accomplished:

Implementation Complete

1. Database Schema Enhancement

  • Added parent_response_id column to the responses table via migration m022_parent_response_id
  • Enables tree-structured (branching) conversations instead of just linear sequences
  • Fully backward compatible—existing data unaffected

2. Comprehensive Utility Module (tree_utils.py)

  • 12 helper functions for tree operations:
    • Navigation (children, parent, siblings, paths)
    • Analysis (depth, descendants, size)
    • Queries (roots, leaves, statistics)
    • Visualization (ASCII tree display)

3. Complete Test Suite (test_tree_conversations.py)

  • 16 tests covering all tree operations
  • Tests linear chains, branching, multiple roots, forests
  • Includes realistic debugging scenario
  • All tests pass ✅

4. Integration Verification

  • Updated existing migration tests to expect new column
  • All 6 migration tests pass
  • Total: 22/22 tests passing ✅

📦 Deliverables (7 files in /mnt/user-data/outputs/)

  1. README.md—Quick start guide
  2. IMPLEMENTATION_SUMMARY.md—Complete technical overview
  3. MIGRATION_CODE.md—Database migration details
  4. TEST_RESULTS.md—Full test coverage report
  5. tree_notes.md—Design decisions and development notes
  6. tree_utils.py—Utility functions module
  7. test_tree_conversations.py—Test suite

🎯 Key Features

  • Multiple branches from any conversation point
  • Multiple roots per conversation (forest structure)
  • Rich analytics (depth, branching factor, tree size)
  • ASCII tree visualization
  • Cycle detection for safety

🚀 Ready for Integration

The implementation is production-ready with comprehensive testing. Next steps would be:

  1. Integrate tree_utils.py into LLM package
  2. Add CLI commands (llm branch, llm tree)
  3. Update Response.log() to accept parent_response_id

All code is tested, documented, and ready to use!
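
Those navigation helpers all come down to walking parent_response_id pointers up or down the tree. Here is a sketch of the shape such a helper could take using a recursive CTE; the function name and column details are assumptions for illustration, not code from the tree_utils.py it produced:

    import sqlite3

    def path_to_root(db: sqlite3.Connection, response_id: str) -> list[str]:
        # Follow parent pointers from a response back up to its root.
        # SQLite generates the rows in walk order: self, parent,
        # grandparent, ... stopping when parent_response_id is NULL.
        rows = db.execute(
            """
            WITH RECURSIVE ancestors(id, parent_response_id) AS (
                SELECT id, parent_response_id FROM responses WHERE id = ?
                UNION ALL
                SELECT r.id, r.parent_response_id
                FROM responses r
                JOIN ancestors a ON r.id = a.parent_response_id
            )
            SELECT id FROM ancestors
            """,
            (response_id,),
        ).fetchall()
        return [row[0] for row in rows]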

Did I mention I kicked off this experiment by entering prompts on my phone?

I told it to Give me a zip file of everything you have done so far—you can explore the contents of the file it made me in this Gist.

Pelicans

I released llm-anthropic 0.19 adding support for the new model.
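
If you want to try it yourself, the plugin installs like any other LLM plugin (this assumes you already have LLM set up with an Anthropic API key configured):

    llm install -U llm-anthropic

Then I ran my classic pelican benchmark, first with thinking turned on: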

llm -m claude-sonnet-4.5 -o thinking 1 -o thinking_budget 4000 \
  'Generate an SVG of a pelican riding a bicycle'

(Image: the resulting SVG of a pelican riding a bicycle, with thinking enabled)

And then without:

llm -m claude-sonnet-4.5 \
  'Generate an SVG of a pelican riding a bicycle'

(Image: the resulting SVG of a pelican riding a bicycle, without thinking)

These are pretty good—they are recognizably pelicans!—though not quite as good as GPT-5-Codex which is better at drawing bicycles.

I also tried having it describe this image:

A whole mess of pelicans, described below.

llm -m claude-sonnet-4.5 'describe this image' -a https://static.simonwillison.net/static/2024/pelicans.jpg

This image shows a large gathering of pelicans crowded together along what appears to be a waterfront or shoreline. The birds are densely packed, creating a chaotic scene with dozens of brown pelicans visible. Their distinctive long bills and pouches are clearly visible among the mass of feathered bodies. In the background, there’s calm water and a dark tree line or forested area. The lighting suggests this was taken during early morning or late afternoon, giving the scene a golden, warm tone. The pelicans appear to be resting or congregating together, which is typical behavior for these colonial seabirds.

Anthropic are rolling this out everywhere

The release of this model has been very well coordinated. My embargo on talking about it was due to lift at 10am Pacific today, and I got an email from them linking to their blog post at 10am on the dot. It’s also already live on OpenRouter and in Cursor and GitHub Copilot and no doubt a whole bunch of other places as well.

Anthropic also shipped a new Claude Code VS Code extension today, plus a big upgrade to the Claude Code terminal app. Plus they rebranded their confusingly named Claude Code SDK to the Claude Agent SDK instead, emphasizing that it’s a tool for building agents beyond just customizing the existing Claude Code product. That’s available for both TypeScript and Python.
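
I have not explored the renamed SDK in depth yet. Based on the query() pattern in its documentation, a minimal Python example looks something like this; treat the import path and the shape of the streamed messages as assumptions, since the API may shift between versions:

    import asyncio

    # Import path assumed from the rebranded claude-agent-sdk package
    from claude_agent_sdk import query

    async def main():
        # query() streams messages (assistant turns, tool calls, results)
        # back as the agent works through the prompt
        async for message in query(prompt="Summarize this repository's README"):
            print(message)

    asyncio.run(main())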

