Cloudflare Scales Infrastructure as Code with Shift-Left Security Practices

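Cloudflare has scaled its Infrastructure as Code (IaC) practice with a shift-left security approach, eliminating manual configuration errors across hundreds of production accounts. Because a single misconfiguration could propagate globally, the company treats all infrastructure configuration as code, requires peer review, and integrates automated security checks into its CI/CD pipeline. The system handles approximately 30 merge requests per day and applies approximately 50 critical security policies to block violations before deployment. The implementation uses Terraform with the Cloudflare Terraform Provider, Atlantis, GitLab, and a custom Go program (tfstate-butler) for secure state management. Policy enforcement relies on Open Policy Agent (OPA) and Rego. Key lessons include using the cf-terraforming utility to accelerate adoption, preventing configuration drift through automated detection, and keeping the Terraform Provider aligned with the API by generating it from OpenAPI specifications. This proactive model prevents incidents, raises engineering confidence and velocity, and matches the broader industry trend toward continuous, automated security validation.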

Cloudflare has eliminated manual configuration errors across hundreds of production accounts by implementing Infrastructure as Code with automated policy enforcement, processing approximately 30 merge requests daily while catching security violations before deployment rather than after incidents occur.

The company's Customer Zero team faced a critical problem: a single misconfiguration could propagate across Cloudflare's global edge in seconds, potentially locking out employees or taking down production services. Manual dashboard management across hundreds of accounts created too many opportunities for human error at this scale.

The solution centered on treating all infrastructure configurations as code with mandatory peer review and automated security checks. Every production change now goes through a validation pipeline that enforces approximately 50 security policies before deployment. Teams still use the dashboard for analytics and observability, but critical production changes require code commits tied to users, tickets, and automated compliance checks.

According to Chase Catelli, Ryan Pesek, and Derek Pitts from Cloudflare's team, this shift-left approach moves security validation to the earliest stages of development, catching issues when remediation costs are lowest. The model prevents incidents rather than responding to them, while actually increasing engineering velocity by giving teams confidence that their changes are compliant.

The implementation centers on Terraform with the Cloudflare Terraform Provider, integrated into a custom continuous integration and deployment pipeline running on Atlantis with GitLab. All production account configurations live in a centralized monorepo, with individual teams owning and deploying their specific sections as designated code owners.

Cloudflare's Infrastructure as Code data flow diagram

A custom Go program called tfstate-butler acts as an HTTP backend for Terraform, serving as a secure state file broker. The design prioritizes security by ensuring unique encryption keys per state file, limiting the potential blast radius from any compromise.
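The per-state-file key idea can be illustrated with a minimal sketch. The article does not describe how tfstate-butler generates or stores keys, and tfstate-butler itself is written in Go, so everything below is an assumption for illustration only: the `StateKeyRegistry` class, the in-memory storage, and the state paths are hypothetical.

```python
import secrets


class StateKeyRegistry:
    """Hypothetical in-memory registry that hands out one key per state file."""

    def __init__(self) -> None:
        self._keys = {}

    def key_for(self, state_path: str) -> bytes:
        # Generate a fresh random 256-bit key the first time a path is seen,
        # then keep returning the same key for that path.
        if state_path not in self._keys:
            self._keys[state_path] = secrets.token_bytes(32)
        return self._keys[state_path]


registry = StateKeyRegistry()
# Hypothetical state paths; each one gets its own independent key.
dns_key = registry.key_for("accounts/prod-dns/terraform.tfstate")
waf_key = registry.key_for("accounts/prod-waf/terraform.tfstate")
```

Because the keys are generated independently rather than derived from a shared secret, leaking `dns_key` reveals nothing about `waf_key`, which is the blast-radius property the design aims for. A real broker would also need durable, access-controlled key storage, which this sketch omits.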

Policy enforcement uses the Open Policy Agent framework and its Rego policy language to validate security requirements. Policies run automatically on every merge request and operate in one of two modes: a warning, which adds a comment but allows deployment, or a denial, which blocks the change entirely. Exceptions require formal Jira-based approval, followed by a pull request that documents the deviation.
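The two enforcement modes can be sketched as follows. Cloudflare's actual policies are written in Rego and are not shown in the article, so this Python version only illustrates the warn/deny logic. The resource type `example_zone_settings`, its attributes, and both rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    name: str
    mode: str  # "warn" lets the change deploy; "deny" blocks it
    check: Callable[[dict], Optional[str]]  # returns a message on violation


def require_modern_tls(resource: dict) -> Optional[str]:
    # Hypothetical rule: zone settings must use TLS 1.2 or newer.
    if resource["type"] != "example_zone_settings":
        return None
    version = resource["values"].get("min_tls_version", "1.0")
    if version not in {"1.2", "1.3"}:
        return f"{resource['address']}: min_tls_version {version} is too low"
    return None


def require_description(resource: dict) -> Optional[str]:
    # Hypothetical rule: every resource should carry a description.
    if not resource["values"].get("description"):
        return f"{resource['address']}: missing description"
    return None


def evaluate(resources: list, policies: list) -> dict:
    """Run every policy against every resource and sort findings by mode."""
    warnings, denials = [], []
    for resource in resources:
        for policy in policies:
            message = policy.check(resource)
            if message is None:
                continue
            target = denials if policy.mode == "deny" else warnings
            target.append(f"[{policy.name}] {message}")
    # A change is deployable only when no deny-mode policy fired.
    return {"allowed": not denials, "warnings": warnings, "denials": denials}


policies = [
    Policy("tls-minimum", "deny", require_modern_tls),
    Policy("has-description", "warn", require_description),
]
planned = [
    {"address": "example_zone_settings.shop", "type": "example_zone_settings",
     "values": {"min_tls_version": "1.0", "description": "shop zone"}},
    {"address": "example_zone_settings.blog", "type": "example_zone_settings",
     "values": {"min_tls_version": "1.3"}},
]
result = evaluate(planned, policies)
```

In this example the `shop` resource triggers a denial, so `result["allowed"]` is false, while the `blog` resource only produces a warning that a pipeline could post as a merge request comment.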

The migration revealed critical lessons about scaling Infrastructure as Code. High barriers to entry initially stalled adoption as Terraform fluency varied across teams. The cf-terraforming command-line utility, which automatically generates Terraform code from the Cloudflare API, significantly accelerated onboarding by eliminating manual resource imports.

Configuration drift emerged when teams made urgent dashboard changes during incidents, leaving Terraform state out of sync. Cloudflare implemented automated drift detection, which continuously compares state files with deployed configurations and automatically creates remediation tickets with service-level agreements when discrepancies are detected.
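A drift check of this kind can be sketched as a comparison between what the state file records and what is actually deployed. The article does not describe Cloudflare's implementation, so the data shapes, the `detect_drift` and `build_ticket` helpers, and the five-day SLA below are all hypothetical.

```python
from datetime import date, timedelta


def detect_drift(state: dict, deployed: dict) -> list:
    """Compare recorded state with deployed config, keyed by resource ID."""
    findings = []
    for resource_id in sorted(state.keys() | deployed.keys()):
        expected = state.get(resource_id)
        actual = deployed.get(resource_id)
        if expected is None:
            # Exists in production but was never captured in code.
            findings.append({"resource": resource_id, "kind": "unmanaged", "fields": []})
        elif actual is None:
            # Recorded in state but no longer deployed.
            findings.append({"resource": resource_id, "kind": "missing", "fields": []})
        else:
            changed = sorted(
                key for key in expected.keys() | actual.keys()
                if expected.get(key) != actual.get(key)
            )
            if changed:
                findings.append({"resource": resource_id, "kind": "modified", "fields": changed})
    return findings


def build_ticket(finding: dict, opened: date, sla_days: int = 5) -> dict:
    # Hypothetical ticket payload; the SLA length is an assumed value.
    return {
        "summary": f"Drift ({finding['kind']}): {finding['resource']}",
        "fields": finding["fields"],
        "due": opened + timedelta(days=sla_days),
    }


state = {
    "waf_rule.block_bots": {"action": "block", "enabled": True},
    "dns_record.www": {"content": "192.0.2.10", "proxied": True},
}
deployed = {
    # Someone switched this rule to "log" in the dashboard during an incident.
    "waf_rule.block_bots": {"action": "log", "enabled": True},
    "dns_record.www": {"content": "192.0.2.10", "proxied": True},
    "dns_record.debug": {"content": "192.0.2.99", "proxied": False},
}
findings = detect_drift(state, deployed)
tickets = [build_ticket(f, opened=date(2025, 1, 6)) for f in findings]
```

Here the dashboard edit to `waf_rule.block_bots` and the unmanaged `dns_record.debug` each yield one finding and one ticket, while the unchanged `dns_record.www` yields nothing.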

The Cloudflare Terraform Provider lagged behind the API's capabilities, which created friction as Cloudflare's rapid product innovation outpaced Terraform support. The v5 provider release resolved this by automatically generating provider code from OpenAPI specifications, keeping infrastructure code capabilities continuously aligned with the product APIs.

The shift-left model demonstrates how organizations can scale Infrastructure as Code while maintaining strict security governance. By moving validation from reactive audits to proactive automated checks, Cloudflare achieved both increased security and engineering velocity.

Many companies are adopting the shift-left approach. Google Cloud points out that discovering security issues in production can lead to significant financial penalties, such as GDPR fines of up to 4% of global annual revenue. Early detection through automated CI/CD security checks can greatly lower remediation costs and reduce the need for architectural changes. OpsMx notes challenges such as implementation barriers, gaps in automation, complex tools, and organizational silos, while emphasizing that automated policy enforcement based on frameworks like NIST and OWASP helps teams identify and prioritize risks without burdening developers. According to Splunk's research, 73% of companies see a lack of automation as their main challenge in shift-left practices, but AI-driven tools are quickly improving security testing through smart automation, with adoption rates rising from 64% to 78% in just one year.

The shift-left movement has evolved beyond simply moving security checks earlier. Organizations are now pursuing continuous security validation through automated scanning (SAST, SCA, DAST, secrets management), policy-as-code enforcement, and AI-driven vulnerability prioritization that provides developers with immediate, actionable feedback within their existing workflows.
