Kimi K2.5: Visual Agentic Intelligence

AI 前线
January 31

勇敢牛牛

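This post looks at Kimi K2.5 from Moonshot AI, a major upgrade over the text-only K2 models. The key additions are native multimodal support, built through continued pretraining on roughly 15 trillion mixed visual and text tokens, and an "agent swarm" architecture that can coordinate up to 100 sub-agents working in parallel. The author tests the model's visual generation by asking for an SVG, then probes its reasoning depth by having it break a complex software plugin project into components that can be built in parallel. The model shows impressive coding and planning ability, but the author also notes its 595GB size and its non-standard "modified MIT" license, which requires commercial users above certain usage or revenue thresholds to display the model's name in their UI.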

Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined in November by Kimi K2 Thinking, which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handle image inputs as well:

Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.

The "self-directed agent swarm paradigm" claim there means improved long-sequence tool calling and training on how to break down tasks for multiple agents to work on at once:

For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.
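The fan-out pattern described in that quote can be illustrated with a minimal sketch. This is not how Kimi K2.5 implements its swarm, which the announcement does not detail. `run_sub_agent` is a hypothetical stand-in for whatever actually executes a sub-task, and the only number taken from the source is the cap of 100 sub-agents.

```python
import asyncio

MAX_SUB_AGENTS = 100  # cap quoted in the announcement


async def run_sub_agent(task: str) -> str:
    # Hypothetical stand-in for a real sub-agent: a real one would call a
    # model and tools here. The sleep simulates I/O-bound work.
    await asyncio.sleep(0.01)
    return f"done: {task}"


async def run_swarm(tasks: list[str], max_agents: int = MAX_SUB_AGENTS) -> list[str]:
    # The semaphore limits how many sub-agents are active at once.
    semaphore = asyncio.Semaphore(max_agents)

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_sub_agent(task)

    # gather() runs the coroutines concurrently and returns results
    # in the same order as the input tasks.
    return await asyncio.gather(*(bounded(t) for t in tasks))


def swarm(tasks: list[str], max_agents: int = MAX_SUB_AGENTS) -> list[str]:
    # Synchronous entry point for callers that are not already async.
    return asyncio.run(run_swarm(tasks, max_agents))


if __name__ == "__main__":
    print(swarm(["write schema", "build upload form", "add tests"]))
```

Because the simulated sub-agents wait on I/O rather than compute, running them concurrently takes roughly as long as the slowest one, which is the general reason a parallel setup can finish sooner than a single agent working through the same tasks in sequence.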

I used the OpenRouter Chat UI to have it "Generate an SVG of a pelican riding a bicycle", and it did quite well:

Image: cartoon illustration of a white pelican with a large orange beak and yellow throat pouch riding a green bicycle with yellow feet on the pedals, set against a light blue sky with soft bokeh circles and a green grassy hill. The bicycle frame is a little questionable. The pelican is quite good. The feet do not quite align with the pedals, which are floating clear of the frame.

As a more interesting test, I decided to exercise the claims around multi-agent planning with this prompt:

I want to build a Datasette plugin that offers a UI to upload files to an S3 bucket and stores information about them in a SQLite table. Break this down into ten tasks suitable for execution by parallel coding agents.

Here's the full response. It produced ten realistic tasks and reasoned through the dependencies between them. For comparison here's the same prompt against Claude Opus 4.5 and against GPT-5.2 Thinking.
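To make "reasoned through the dependencies" concrete, here is a small sketch of how a dependency map between tasks can be turned into waves of work that parallel agents could pick up together. The task names and dependencies below are hypothetical illustrations, not the ones the model produced. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run in parallel.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical task graph for an upload plugin (not the model's output).
EXAMPLE_TASKS = {
    "schema": set(),
    "s3_client": set(),
    "docs": set(),
    "upload_api": {"schema", "s3_client"},
    "upload_ui": {"upload_api"},
}

if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(EXAMPLE_TASKS), start=1):
        print(number, wave)
```

The fewer waves a plan needs, the more of it can run at once, so this gives one rough way to compare how parallel-friendly different task breakdowns are.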

The Hugging Face repository is 595GB. The model uses Kimi's janky "modified MIT" license, which adds the following clause:

Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
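Read literally, the clause sets two thresholds and either one triggers the branding requirement. The sketch below encodes that reading. The function and constant names are hypothetical, the numbers come from the quoted clause, and treating "more than" as a strict comparison is an assumption about how the wording would be interpreted.

```python
# Thresholds taken from the quoted license clause.
MAU_THRESHOLD = 100_000_000  # monthly active users
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000  # US dollars per month


def must_display_branding(monthly_active_users: int, monthly_revenue_usd: float) -> bool:
    # "More than" is read as strictly greater than (an assumption), and
    # exceeding either threshold is enough to trigger the requirement.
    return (
        monthly_active_users > MAU_THRESHOLD
        or monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD
    )


if __name__ == "__main__":
    print(must_display_branding(5_000_000, 1_500_000))
    print(must_display_branding(150_000_000, 0))
```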
