Dagger in Practice: Define CI/CD Pipelines in Code and Leave YAML Hell Behind

The Pain: YAML Pipelines as an Operations Nightmare

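If you have maintained CI/CD for more than 10 microservices, you have probably run into these:

GitHub Actions / GitLab CI YAML bloat: files under .github/workflows/ run to hundreds of lines, with nested if conditions, matrix strategies, and duplicated step definitions, so changing one parameter means a long hunt
No local reproduction: the pipeline only runs in the CI environment, local debugging is guesswork, and every push costs a 10-minute wait for the result
Hard to reuse across projects: shared logic is spread by copy-paste or composite actions, and versioning gets messy
Unpredictable caching: CI cache hit rates are low, and build times keep growing as the project grows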

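Dagger's core idea: define the pipeline in a real programming language (Go/Python/TypeScript) and run every step in containers, so the same pipeline code behaves the same way locally and in CI.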

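The Approach: Dagger's Core Architecture

Dagger is built from three key components:

Component	Role
Dagger Engine	Containerized execution engine that manages the build graph, caching, and sandboxing
SDKs (8 languages, including Go, Python, and TypeScript)	Type-safe APIs for defining pipelines in a language you already know
Dagger Cloud (optional)	Distributed caching plus OpenTelemetry-based observability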

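Core principle: every pipeline operation is a node in a directed acyclic graph (DAG). Identical inputs produce an identical result, so the engine can cache each node and re-run only the nodes whose inputs changed.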
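To make the caching idea concrete, here is a minimal sketch in plain Python. It is not how the Dagger Engine is implemented; the `Engine` class, the `pipeline` function, and the operation names are all hypothetical. It only illustrates the principle that an operation keyed by its name and inputs can be skipped when neither has changed.

```python
import hashlib
import json


class Engine:
    """Toy engine: memoizes each operation by a hash of its name and inputs."""

    def __init__(self):
        self.cache = {}      # content key -> result
        self.executed = []   # names of operations that actually ran

    def run(self, name, func, *inputs):
        # The key depends only on the operation name and its input values,
        # so identical inputs always map to the same cache entry.
        key = hashlib.sha256(json.dumps([name, inputs]).encode()).hexdigest()
        if key not in self.cache:
            self.executed.append(name)
            self.cache[key] = func(*inputs)
        return self.cache[key]


def pipeline(engine, requirements, source):
    # "test" depends on the result of "install" and on the source.
    deps = engine.run("install", lambda r: f"deps({r})", requirements)
    lint = engine.run("lint", lambda s: f"lint({s})", source)
    test = engine.run("test", lambda d, s: f"test({d},{s})", deps, source)
    return lint, test


engine = Engine()
pipeline(engine, "requests==2.32", "v1")
first_run = list(engine.executed)

engine.executed.clear()
pipeline(engine, "requests==2.32", "v2")  # only the source changed
second_run = list(engine.executed)

print(first_run)   # ['install', 'lint', 'test']
print(second_run)  # ['lint', 'test']
```

On the second run the install node is served from the cache because its input (the requirements string) is unchanged, while lint and test re-run because the source differs.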

Hands-On: Building a Complete Pipeline with the Python SDK

Step 1: Install the Dagger CLI

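# One-line install on Linux/macOS
# (the script installs into ./bin unless BIN_DIR is set; make sure the binary is on your PATH)
curl -fsSL https://dl.dagger.io/dagger/install.sh | sh
# Verify
dagger version
# dagger v0.18.x linux/amd64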

The only dependency is Docker (or a compatible container runtime) on the local machine.

Step 2: Initialize a Dagger Module

mkdir my-pipeline && cd my-pipeline
dagger init --sdk=python --name=my-ci

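Generated directory layout (exact file and package names may differ between Dagger versions):

my-pipeline/
├── dagger.json         # module metadata
├── pyproject.toml
└── src/
    └── main/__init__.py  # pipeline code entry point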

Step 3: Write the Pipeline Code

Edit src/main/__init__.py:

import dagger
from dagger import dag, function, object_type


@object_type
class MyCi:
    @function
    async def build(self, source: dagger.Directory) -> dagger.Container:
        """Build the Python application image and run its tests."""
        return (
            dag.container()
            .from_("python:3.12-slim")
            .with_directory("/app", source)
            .with_workdir("/app")
            .with_exec(["pip", "install", "--no-cache-dir", "-r", "requirements.txt"])
            .with_exec(["python", "-m", "pytest", "tests/", "-v"])
        )

    @function
    async def lint(self, source: dagger.Directory) -> str:
        """Run code quality checks."""
        return await (
            dag.container()
            .from_("python:3.12-slim")
            .with_exec(["pip", "install", "ruff"])
            .with_directory("/app", source)
            .with_workdir("/app")
            .with_exec(["ruff", "check", "."])
            .stdout()
        )

    @function
    async def publish(
        self, source: dagger.Directory, registry: str, tag: str
    ) -> str:
        """Build the image and push it to a registry."""
        container = await self.build(source)
        addr = await (
            container
            .with_entrypoint(["python", "-m", "myapp"])
            .publish(f"{registry}:{tag}")
        )
        return f"Published: {addr}"

Step 4: Run Locally and Reuse in CI

# Run lint locally
dagger call lint --source=.

# Run build locally
dagger call build --source=.

# The same call works in GitHub Actions
# .github/workflows/ci.yml
# jobs:
#   ci:
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v4
#       - uses: dagger/dagger-for-github@v7
#         with:
#           verb: call
#           args: build --source=.

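Key point: local runs and CI execute the same containerized pipeline code, which removes most "it only fails in CI" surprises. Differences outside the containers, such as which secrets are injected or the runner's CPU architecture, can still matter.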

Pitfalls to Avoid

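1. Invalidated cache layers cause repeated dependency installs

Problem: every code change triggers a full pip install, so builds are slow. The cause is step order: when the whole source directory is copied in before pip install, any file change invalidates the install step.

Fix: copy requirements.txt and install dependencies before copying the rest of the source, and persist pip's download cache with Dagger's with_mounted_cache:

@function
async def build(self, source: dagger.Directory) -> dagger.Container:
    # Named cache volume that persists across runs
    pip_cache = dag.cache_volume("python-pip")
    return (
        dag.container()
        .from_("python:3.12-slim")
        .with_mounted_cache("/root/.cache/pip", pip_cache)
        .with_workdir("/app")
        # Copy only requirements.txt first so source edits keep the install step cached
        .with_file("/app/requirements.txt", source.file("requirements.txt"))
        .with_exec(["pip", "install", "-r", "requirements.txt"])
        # Copy the full source last
        .with_directory("/app", source)
    )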
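Step order matters for cache reuse because each step's cache key effectively covers everything that came before it. The sketch below models that with a simple hash chain in plain Python; `chain_keys`, `reused_steps`, and the step names are hypothetical and only approximate how a layered build cache behaves.

```python
import hashlib


def chain_keys(steps):
    """Return one cache key per step; each key also covers all earlier steps."""
    keys, prev = [], ""
    for op, content in steps:
        prev = hashlib.sha256(f"{prev}|{op}|{content}".encode()).hexdigest()
        keys.append(prev)
    return keys


def reused_steps(before, after):
    """Names of steps whose cache key is unchanged between two runs."""
    pairs = zip(before, chain_keys(before), chain_keys(after))
    return [op for (op, _), old, new in pairs if old == new]


def source_first(reqs, src):
    # The whole source (requirements included) is copied before installing.
    return [("base", "python:3.12-slim"),
            ("copy-source", reqs + src),
            ("pip-install", "")]


def requirements_first(reqs, src):
    # Only requirements.txt is copied before installing; source comes last.
    return [("base", "python:3.12-slim"),
            ("copy-requirements", reqs),
            ("pip-install", ""),
            ("copy-source", reqs + src)]


reqs = "requests==2.32"
naive = reused_steps(source_first(reqs, "print(1)"),
                     source_first(reqs, "print(2)"))
ordered = reused_steps(requirements_first(reqs, "print(1)"),
                       requirements_first(reqs, "print(2)"))

print(naive)    # ['base']
print(ordered)  # ['base', 'copy-requirements', 'pip-install']
```

With the source copied first, a one-line code edit leaves only the base image reusable. With requirements copied first, the install step stays cached and only the final copy re-runs.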

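2. Mishandled secrets

Problem: the registry password is hard-coded or passed in as a plaintext environment variable.

Fix: use Dagger's built-in Secret type, which is injected at runtime and masked in logs. The command below assumes publish has been extended with a registry_password: dagger.Secret parameter (the Step 3 version does not declare one) whose value is passed, together with a username, to with_registry_auth before publishing:

dagger call publish --source=. --registry=ghcr.io/myorg/myapp --tag=v1.0.0 \
  --registry-password=env:REGISTRY_PASSWORD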
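The two properties that matter here, an opaque secret handle and log masking, can be sketched in a few lines of plain Python. This is an illustration only, not Dagger's implementation; the `Secret` class and the `scrub` helper are hypothetical stand-ins.

```python
class Secret:
    """Opaque wrapper: the value is only reachable through an explicit call."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def plaintext(self):
        return self._value

    def __repr__(self):
        # Printing the object never reveals the value.
        return f"Secret({self.name!r})"


def scrub(text, secrets, mask="***"):
    """Replace every occurrence of a known secret value in a log line."""
    for secret in secrets:
        value = secret.plaintext()
        if value:
            text = text.replace(value, mask)
    return text


password = Secret("REGISTRY_PASSWORD", "hunter2")
log_line = "docker login -u ci -p hunter2 ghcr.io"
masked = scrub(log_line, [password])

print(masked)          # docker login -u ci -p *** ghcr.io
print(repr(password))  # Secret('REGISTRY_PASSWORD')
```

Passing a handle instead of a string means the value does not end up in function arguments, cache keys, or accidental print statements, and the scrubber catches it if a tool echoes it anyway.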

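3. Slow builds in large monorepos

Problem: the entire source directory is passed into the container, including irrelevant content such as node_modules and .git.

Fix: filter the directory before using it. This filters after the directory has already been loaded; Dagger also supports ignore patterns on the Directory argument itself (the Ignore annotation in the Python SDK), which keep those files from being uploaded at all:

source_clean = source.without_directory(".git").without_directory("node_modules")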
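In a monorepo, directories like node_modules often sit several levels deep, so it helps to check what a filter actually keeps. The sketch below is plain Python over a hypothetical file list; the `keep` helper and `EXCLUDED_DIRS` are illustrative names. Matching an excluded name at any depth is an assumption of this sketch, not a statement about how without_directory resolves paths.

```python
from pathlib import PurePosixPath

# Directory names to drop wherever they appear in the tree (sketch assumption).
EXCLUDED_DIRS = {".git", "node_modules"}


def keep(path, excluded=EXCLUDED_DIRS):
    """True if no component of the path is an excluded directory name."""
    return not any(part in excluded for part in PurePosixPath(path).parts)


files = [
    "services/api/main.py",
    ".git/config",
    "services/web/node_modules/react/index.js",
    "services/web/src/app.ts",
    "node_modules/.bin/tsc",
]
kept = [f for f in files if keep(f)]

print(kept)  # ['services/api/main.py', 'services/web/src/app.ts']
```

Running a check like this against a real file listing is a quick way to confirm that nested dependency folders are excluded before the directory is handed to the pipeline.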

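Dagger vs. Traditional CI YAML

Dimension	YAML Pipeline	Dagger
Language	YAML (untyped, little IDE support)	Go/Python/TS (type-safe, testable)
Local debugging	❌ Depends on the CI environment	✅ Runs locally with `dagger call`
Caching	Manually configured keys, inconsistent hit rates	Automatic content-addressed caching
Reuse	Composite actions / templates (brittle)	Modular functions, callable across languages
Observability	CI platform logs	End-to-end OpenTelemetry traces
Vendor lock-in	Tied to GitHub/GitLab/Jenkins	Needs only a container runtime, independent of the CI platform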

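Summary

Dagger addresses three long-standing pain points in CI/CD: unmaintainable YAML, pipelines that cannot be reproduced locally, and lock-in to a CI platform.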

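Key benefits:
1. Pipeline as code: written in a real programming language, with type checking, unit tests, and code review
2. Local first: developers can run the full pipeline on a laptop and know the result before pushing
3. Incremental execution: with content-addressed caching, a one-line change re-runs only the affected steps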

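Adoption advice: start with a pilot on a small project. Rewrite the most complex workflow in your existing CI YAML with Dagger, see what local debugging and cache hits do for turnaround time, then roll it out gradually.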