Gemini 2.0 Flash Native Image Generation Experiment: A New Option for Developers

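Google has released an experimental version of Gemini 2.0 Flash that introduces native image generation and is available to developers in all regions supported by Google AI Studio. Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to generate images that match what users ask for. The article uses several examples to show Gemini 2.0 Flash's strengths in combining text and images, conversational image editing, world knowledge, and text rendering. Developers can start using Gemini 2.0 Flash through the Gemini API and learn more about image generation in the official documentation. Google encourages developers to share feedback to help finalize the production version.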

In December we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we're making it available for developer experimentation across all regions currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.

Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images that give you exactly what you ask for.

Here are some examples of where 2.0 Flash’s multimodal outputs shine:

1. Text and images together

Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.

Story and illustration generation in Google AI Studio

2. Conversational image editing

Gemini 2.0 Flash helps you edit images through many turns of natural language dialogue, which is great for iterating towards a perfect image or exploring different ideas together.

Multi-turn conversation image editing maintaining context throughout the conversation in Google AI Studio
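
Through the API, the same multi-turn flow can be approximated with a chat session. The following is a minimal sketch, not an official recipe: `run_edit_session` is a hypothetical helper, and it assumes the chat object behaves like a google-genai chat session, with a `send_message` method whose response exposes `candidates[0].content.parts`. The prompts in the usage comment are made up for illustration.

```python
def run_edit_session(chat, instructions):
    """Send each edit instruction in order and collect the returned images.

    Assumption: `chat` behaves like a google-genai chat session. It keeps the
    conversation history itself, so each instruction builds on the last result.
    """
    images = []
    for instruction in instructions:
        response = chat.send_message(instruction)
        for part in response.candidates[0].content.parts:
            # Image parts carry raw bytes in `inline_data.data`; text parts do not.
            blob = getattr(part, "inline_data", None)
            if blob is not None and blob.data:
                images.append(blob.data)
    return images


# Hypothetical usage (requires the google-genai package and an API key):
#
# from google import genai
# from google.genai import types
#
# client = genai.Client(api_key="GEMINI_API_KEY")
# chat = client.chats.create(
#     model="gemini-2.0-flash-exp",
#     config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
# )
# images = run_edit_session(
#     chat, ["Draw a red kite over a beach.", "Make the kite blue."]
# )
```

Passing the chat object in, rather than creating it inside the helper, keeps the loop easy to exercise with a stand-in object before spending API calls.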

3. World understanding

Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the right image. This makes it well suited to creating detailed, realistic imagery, such as illustrating a recipe. While it strives for accuracy, its knowledge, like that of all language models, is broad and general, not absolute or complete.

Interleaved text and image output for a recipe in Google AI Studio

4. Text rendering

Most image generation models struggle to accurately render long sequences of text, often producing poorly formatted or illegible characters, or misspellings. Internal benchmarks show that 2.0 Flash has stronger text rendering than leading competing models, which makes it great for creating advertisements, social posts, or even invitations.

Image outputs with long text rendering in Google AI Studio

Start making images with Gemini today

Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs.

from google import genai
from google.genai import types

# Replace the placeholder with your own Gemini API key.
client = genai.Client(api_key="GEMINI_API_KEY")

# Request both text and image output so the story comes back illustrated.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
)
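
The response interleaves text and image parts, so a natural next step is to print the text and save each image. Here is a minimal sketch of one way to do that. The helper name `save_parts`, the `out_dir` parameter, and the `scene_N` file naming are all hypothetical, and the code assumes each part exposes `text` and `inline_data` (with `data` bytes and a `mime_type`), as parts in the google-genai SDK do.

```python
import mimetypes
from pathlib import Path


def save_parts(parts, out_dir="output"):
    """Print text parts, write image parts to disk, and return the saved paths.

    Assumption: each part has `.text` and `.inline_data` attributes, where
    `inline_data` carries raw image bytes in `.data` and a `.mime_type`.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        text = getattr(part, "text", None)
        blob = getattr(part, "inline_data", None)
        if text:
            print(text)
        elif blob is not None and blob.data:
            # Derive the file extension from the MIME type; fall back to .png.
            ext = mimetypes.guess_extension(blob.mime_type or "") or ".png"
            path = out / f"scene_{len(saved) + 1}{ext}"
            path.write_bytes(blob.data)
            saved.append(path)
    return saved
```

With the `response` from the request above, this could be called as `save_parts(response.candidates[0].content.parts)`.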

Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash allows you to add text and image generation with just a single model. We're eager to see what developers create with native image output and your feedback will help us finalize a production-ready version soon.
