Gemini API Batch Mode: handle more tasks at lower cost

This article introduces the new Batch Mode of the Google Gemini API, an asynchronous endpoint built for high-throughput, latency-tolerant AI workloads. Compared with the synchronous API, its main advantages are a 50% cost reduction, much higher rate limits for large jobs, and simpler API calls that offload client-side queuing and retry logic. The workflow is straightforward: package all requests into a single file, submit it through the API, and retrieve results within 24 hours. Real-world cases illustrate the feature: Reforged Labs pairs Gemini 2.5 Pro with Batch Mode for large-scale video analysis and labeling, cutting costs and improving scalability, while Vals AI uses it to run large-scale foundation-model evaluations without hitting the usual rate limits. A Python SDK snippet helps developers get started, and links to full documentation, how-to guides, and pricing are provided. Batch Mode is rolling out to all users, with more capabilities planned — a notable step forward for scalable AI processing.


Gemini models are now available in Batch Mode

Today, we’re excited to introduce a batch mode in the Gemini API, a new asynchronous endpoint designed specifically for high-throughput, non-latency-critical workloads. The Gemini API Batch Mode allows you to submit large jobs, offload the scheduling and processing, and retrieve your results within 24 hours—all at a 50% discount compared to our synchronous APIs.

Batch Mode is the perfect tool for any task where you have your data ready upfront and don’t need an immediate response. By separating these large jobs from your real-time traffic, you unlock three key benefits:

  • Cost savings: Batch jobs are priced at 50% less than the standard rate for a given model

  • Higher throughput: Batch Mode offers substantially higher rate limits than the synchronous API

  • Easy API calls: No need to manage complex client-side queuing or retry logic. Available results are returned within a 24-hour window.
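To see what Batch Mode takes off your plate, here is a minimal sketch (the helper name and parameters are illustrative, not part of any SDK) of the kind of client-side retry logic a synchronous pipeline typically needs — and a batch job does not:

```python
import random
import time

def call_with_retries(send_request, payload, max_attempts=5, base_delay=1.0):
    """Retry a synchronous API call with exponential backoff and jitter.

    `send_request` is any callable that raises on a transient failure
    (e.g. a 429 rate-limit error). Batch Mode makes this loop unnecessary:
    you submit once and the service owns the scheduling.
    """
    for attempt in range(max_attempts):
        try:
            return send_request(payload)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff: base, 2x base, 4x base, ... plus jitter.
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

With Batch Mode, you submit the whole job once and check back later, instead of babysitting each request through loops like this.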


A simple workflow for large jobs

We’ve designed the API to be simple and intuitive. You package all your requests into a single file, submit it, and retrieve your results once the job is complete. Here are some ways developers are leveraging Batch Mode for tasks today:

  • Bulk content generation and processing: Specializing in deep video understanding, Reforged Labs uses Gemini 2.5 Pro to analyze and label vast quantities of video ads monthly. Implementing Batch Mode has revolutionized their operations by significantly cutting costs, accelerating client deliverables, and enabling the massive scalability needed for meaningful market insights.
  • Model evaluations: Vals AI benchmarks foundation models on real-world use cases, including legal, finance, tax and healthcare. They’re using Batch Mode to submit large volumes of evaluation queries without being constrained by rate limits.
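For bulk jobs like these, the request file is usually generated rather than written by hand. Here is a minimal sketch (the file name and prompts are illustrative) that writes one JSONL line per prompt, using the same key-plus-request schema as the SDK example:

```python
import json

def write_batch_requests(prompts, path="batch_requests.jsonl"):
    """Write one batch request per prompt, one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts, start=1):
            request_line = {
                "key": f"request_{i}",  # used to match results back to requests
                "request": {"contents": [{"parts": [{"text": prompt}]}]},
            }
            f.write(json.dumps(request_line) + "\n")
    return path

# Example: label two (hypothetical) video-ad transcripts in one job.
write_batch_requests([
    "Label the tone of this ad: 'Upbeat music, fast cuts...'",
    "Label the tone of this ad: 'Calm narration, soft piano...'",
])
```

Because each request carries its own key, results can be joined back to their source rows no matter how the service orders them.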

Get started in just a few lines of code

You can start using Batch Mode today with the Google GenAI Python SDK:

from google import genai
import time

client = genai.Client()  # reads the GEMINI_API_KEY environment variable

# Create a JSONL file (batch_requests.jsonl) that contains these lines:
# {"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
# {"key": "request_2", "request": {"contents": [{"parts": [{"text": "Explain how quantum computing works in a few words"}]}]}}

uploaded_batch_requests = client.files.upload(file="batch_requests.jsonl")

batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_batch_requests.name,
    config={
        'display_name': "batch_job-1",
    },
)

print(f"Created batch job: {batch_job.name}")

# Poll until the job finishes; results are returned within a 24-hour window
while batch_job.state.name not in (
    'JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'
):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')

    for line in file_content.splitlines():
        print(line)
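Each line of the downloaded result file is itself a JSON object. Here is a minimal sketch of pulling the generated text out of one line, assuming a response shape with candidates/content/parts (check the official documentation for the exact result schema):

```python
import json

def extract_text(result_line):
    """Return (key, text) from one JSONL result line.

    Assumes each line carries the request's "key" and a "response" whose
    first candidate holds the generated parts; adjust to the real schema.
    """
    record = json.loads(result_line)
    parts = record["response"]["candidates"][0]["content"]["parts"]
    text = "".join(part.get("text", "") for part in parts)
    return record["key"], text
```

Joining on the key lets you write results back into whatever table or store produced the original prompts.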

To learn more, check out the official documentation and pricing pages.

We're rolling out Batch Mode for the Gemini API today and tomorrow to all users. This is just the start for batch processing, and we're actively working on expanding its capabilities. Stay tuned for more powerful and flexible options!
