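This article announces that Google's Gemini 2.5 Flash Image model is now available for production use, and highlights its image generation and editing capabilities. New features include support for 10 aspect ratios, which makes it easier to create content for film, social media, and other platforms, plus image-only output. The model lets users blend multiple images, keep characters consistent for richer storytelling, make targeted edits in natural language, and draw on Gemini's broad world knowledge. Developers can use it through the Gemini API, in Google AI Studio, and on Vertex AI for enterprise use. The article presents real-world examples from Cartwheel and Volley, showing character control, pose fidelity, and low-latency, aesthetically guided image generation for live applications. It also points developers to documentation, usage guides, example AI-powered apps built with the model, pricing details, and a Python code sample.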
Gemini 2.5 Flash Image 🍌, our state-of-the-art image generation and editing model that has captured the world's imagination, is now generally available and ready for production environments. It comes with new features, including a wider range of aspect ratios and the ability to specify image-only output.
Gemini 2.5 Flash Image empowers users to seamlessly blend multiple images, maintain consistent characters for richer storytelling, perform targeted edits with natural language, and leverage Gemini's extensive world knowledge for image generation and modification. The model is accessible through the Gemini API on Google AI Studio and on Vertex AI for enterprise use.
Further expanding creative possibilities, the model now supports 10 different aspect ratios. This allows for effortless content creation across various formats, from cinematic landscapes to vertical social media posts.
Supported ratios include:
- Landscape: 21:9, 16:9, 4:3, 3:2
- Square: 1:1
- Portrait: 9:16, 3:4, 2:3
- Flexible: 5:4, 4:5
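As a rough illustration of working with the list above, the following sketch picks the supported ratio closest to a target canvas size. The helper names (`ratio_value`, `closest_aspect_ratio`) are hypothetical and not part of any Google SDK; only the ratio strings come from this post. Comparing ratios on a log scale is a design choice made here so that landscape and portrait shapes are treated symmetrically.

```python
import math

# Ratio strings taken from the list in this post.
SUPPORTED_ASPECT_RATIOS = [
    "21:9", "16:9", "4:3", "3:2",  # landscape
    "1:1",                         # square
    "9:16", "3:4", "2:3",          # portrait
    "5:4", "4:5",                  # flexible
]


def ratio_value(ratio: str) -> float:
    """Convert a 'W:H' string into width divided by height."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio string nearest to width/height.

    Hypothetical helper: distance is measured as |log(r / target)|, so a
    shape that is 10% too wide counts the same as one 10% too tall.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    return min(
        SUPPORTED_ASPECT_RATIOS,
        key=lambda r: abs(math.log(ratio_value(r) / target)),
    )


if __name__ == "__main__":
    print(closest_aspect_ratio(1920, 1080))
    print(closest_aspect_ratio(1080, 1350))
```

The returned string could then be passed as the `aspect_ratio` value in the request configuration.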
What people are building
Cartwheel is harnessing AI to move beyond the "slot machine user experience" of many image generators, giving artists direct control to bring their creative vision to life. After months of building their "Pose Mode" feature and finding that other models failed to deliver, the team found a solution in Gemini 2.5 Flash Image. By combining Cartwheel's 3D posing tool with Gemini 2.5 Flash Image, they have created a new image creation system that delivers strong character control and consistency.
“Other models couldn't render characters from arbitrary camera angles or maintain faithfulness to a pose without sacrificing "world knowledge". The new Gemini 2.5 Flash Image model was the first that could provide both.” - Andrew Carr, Co-founder of Cartwheel
Volley, the creators of the AI-powered dungeon crawler Wit's End, use Gemini 2.5 Flash Image to generate and edit visuals in-session: character portraits, dynamic scene stills, multi-character compositions, and quick iterative edits from chat or voice.
“The model demonstrates state-of-the-art rule-following to aesthetic guidance while retaining latency under <10s, unlocking many live applications, for example, letting players select styles and refine outputs in multi-turn loops.” - James Wilsterman, CTO at Volley
It's been incredible to see the community's creativity in action during recent hackathons with Kaggle and Cerebral Valley, which saw hundreds of submissions showcasing the model's capabilities in diverse fields like STEM education, marketing collateral, and real-time augmented reality.
Start building
Developers can begin building with Gemini 2.5 Flash Image today. Check out the developer docs and cookbook for guidance on the new features, including the expanded aspect ratios and the ability to specify image-only output. The model is available via the Gemini API and for testing in Google AI Studio.
Building with Gemini 2.5 Flash Image is easy with Google AI Studio’s “build mode.” Instantly create and remix custom AI-powered apps from a single prompt, like "Build me an image editing app with filters." When you're ready, deploy your creation directly from AI Studio or save the code to GitHub—all for free. Try out and remix some of our example apps:
- Bananimate: Create animated GIFs with Nano Banana from your images and prompts.
- Enhance: Infinitely zoom into any photograph with our creative upscaler. See if you can find the Easter egg (hint: 🍌).
- Fit check: Upload a photo of yourself and an outfit to see how it looks on you. A virtual fitting room powered by Nano Banana.
Gemini 2.5 Flash Image is priced at $30.00 per 1 million output tokens, which comes to $0.039 per image. Pricing for other input and output modalities follows standard Gemini 2.5 Flash pricing.
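To make the pricing arithmetic concrete, here is a small back-of-the-envelope sketch. The two constants are the figures quoted above; the function names are hypothetical, and the tokens-per-image number is only what those two figures imply (roughly 1,300), not an official specification. It also ignores input-token charges, so treat the result as a lower bound on a real bill.

```python
# Figures quoted in this post.
PRICE_PER_IMAGE_USD = 0.039
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.00


def implied_tokens_per_image() -> float:
    """Output tokens per image implied by the two quoted prices."""
    price_per_token = PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000
    return PRICE_PER_IMAGE_USD / price_per_token


def estimate_image_cost(num_images: int) -> float:
    """Estimated output cost in USD for generating num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * PRICE_PER_IMAGE_USD


if __name__ == "__main__":
    print(f"Implied tokens per image: {implied_tokens_per_image():.0f}")
    print(f"Estimated cost for 1,000 images: ${estimate_image_cost(1000):.2f}")
```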
Here is some sample code to get you started:
from google import genai
from google.genai import types
from PIL import Image

# The client reads the API key from the environment.
client = genai.Client()

prompt = "Create a photograph of the subject in this image as if they were living in the 1980s. The photograph should capture the distinct fashion, hairstyles, and overall atmosphere of that time period."
image = Image.open('/path/to/image.png')

# Request image-only output in a 16:9 aspect ratio.
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[prompt, image],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
        )
    )
)

# Display every image part returned by the model.
for part in response.parts:
    if part.inline_data is not None:
        generated_image = part.as_image()
        generated_image.show()
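In a production service you would more likely write results to disk than open a viewer window. The sketch below is one way to do that, assuming each returned part exposes `inline_data.data` (raw bytes) and `inline_data.mime_type`, as the response parts in the sample above do. The helper name `save_inline_images`, the extension table, and the file-naming scheme are hypothetical choices for illustration.

```python
from pathlib import Path

# Hypothetical mapping from MIME type to file extension.
_EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/webp": ".webp",
}


def save_inline_images(parts, out_dir, stem="generated"):
    """Write the bytes of every image part to out_dir and return the paths.

    Assumes each part has an `inline_data` attribute that is either None
    or an object with `data` (bytes) and `mime_type` (str). Parts without
    image data are skipped.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue
        extension = _EXTENSIONS.get(blob.mime_type, ".bin")
        target = out_path / f"{stem}_{len(saved)}{extension}"
        target.write_bytes(blob.data)
        saved.append(target)
    return saved
```

With the sample above, calling `save_inline_images(response.parts, "outputs")` would write files such as `outputs/generated_0.png`, depending on the MIME type the model returns.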
We are continually amazed by the creativity of our developer community. We can't wait to see what you build next!
